Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated into the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

7,769 datasets found
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting...
  • Metadata: 1/5

    Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in ...

    Finnish Supreme Court (KKO) decisions in Swedish from 1980-2018 and Supreme Administrative Court (KHO) decisions from 2001-2018 in Swedish. The decisions are available in vrt format. KKO decisions: 5688. KHO decisions: 2603. For most decisions, the language used in court has been Finnish. In that case, the document contains just an abstract in Swedish.
  • Metadata: 2/5

    Wanca 2016, Korp Version (BETA)

    The Korp version of Wanca 2016 is a collection of web corpora in small Uralic languages. The collection is composed of 29 sentence corpora in different languages. The corpora have been collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI) supported by the Kone Foundation from their...
  • Metadata: 1/5

    Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in ...

    A collection of Finnish Supreme Court (KKO) decisions from 1980-2018 and Supreme Administrative Court (KHO) decisions from 2001-2018. The decisions are in Swedish. The decisions are available in the Korp interface korp.csc.fi. KKO decisions: 5688. KHO decisions: 2603. For most decisions, the language used in court has been Finnish; in that case, there is...
  • Metadata: 1/5

    Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in ...

    Finnish Supreme Court (KKO) decisions in Finnish from 1980-2018 and Supreme Administrative Court (KHO) decisions from 1987-2018 in Finnish. The decisions are available in vrt format. KKO decisions: 5651. KHO decisions: 7633. For most decisions, the language used in court has been Finnish. In that case, the document contains the whole decision. If the...
  • Metadata: 1/5

    Finnish Parliament original statutes from 1734-2018 in Finnish, Korp version

    Finnish Parliament original statutes in Finnish from 1734, 1868, 1889, 1895, 1896, 1898, 1901, 1906, 1907 and 1917-2018. The statutes are available in the Korp interface korp.csc.fi. NB! 2019-09-13 Discrepancies in dependency parses of the Finnish data: The dependency parses and relations differ significantly from the parses in other corpora parsed...
  • Metadata: 1/5

    Finnish Parliament original statutes from 1920-2018 in Swedish, Korp version

    A collection of Finnish Parliament original statutes in Swedish from 1920-2018. The statutes are available in the Korp interface korp.csc.fi.
  • Metadata: 1/5

    Finnish Parliament original statutes from 1920-2018, Korp version (Finnish-Sw...

    A collection of Finnish Parliament original statutes in Finnish and Swedish from 1920-2018. The statutes are available in the Language Bank of Finland Korp service korp.csc.fi. NB! 2019-09-13 Discrepancies in dependency parses of the Finnish data: The dependency parses and relations differ significantly from the parses in other corpora parsed earlier with...
  • Metadata: 1/5

    Finnish Parliament original statutes from 1734-2018, downloadable version

    Finnish Parliament original statutes in Finnish from 1734, 1868, 1889, 1895, 1896, 1898, 1901, 1906, 1907 and 1917-2018 and in Swedish from 1920-2018. The statutes are published in the Language Bank's Download service at korp.csc.fi/download in vrt format. NB! 2019-09-13 Discrepancies in dependency parses of the Finnish data: The dependency parses and...
  • Metadata: 1/5

    Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in ...

    Finnish Supreme Court (KKO) decisions from 1980-2018 and Supreme Administrative Court (KHO) decisions from 1987-2018. The decisions are in Finnish. The decisions are available in the Korp interface korp.csc.fi. KKO decisions: 5651. KHO decisions: 7633. For some decisions, the language used in court has been Swedish; in that case the Finnish version...
  • Metadata: 2/5

    Fenno-Ugrica Kielipankki Downloadable Version

    The Kielipankki downloadable version of Fenno-ugrica (http://urn.fi/urn:nbn:fi:lb-2014073056) is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2019032501
  • Metadata: 2/5

    Relative frequencies of part-of-speech n-grams in native and translated Finni...

    These files contain data from the MA thesis study "Then shall I know fully: Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose" by Matias Tamminen (2018), University of Helsinki. The material is available at the Language Bank of Finland (Kielipankki) download service, access location...
  • Metadata: 2/5

    Digital Morphology Archives

    The database is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2016032102. DMA is a digital database containing 401,729 morphologically coded dialectal clauses from 160 parish dialects of Finnish. The coded clauses are licensed under Creative Commons Attribution 4.0 International. In addition, access to the word...
  • Metadata: 2/5

    Finnish TreeBank 3

    The corpus is available in Kielipankki - the Language Bank of Finland (https://korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2016051001, and downloadable at http://urn.fi/urn:nbn:fi:lb-2016011501. The FinnTreeBank project is creating a treebank and a parsebank for Finnish. This work is licensed under Creative Commons Attribution 3.0. A parsebank for Finnish...
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 1

    The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-201711021. Reference instructions for this older version: University of Helsinki (2016). Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Version 1 [text corpus]. Kielipankki. Retrieved from...
  • Metadata: 2/5

    Collection of OTA Texts in Public Use

    This is a snapshot of the Oxford Text Archive, for testing purposes. For more up-to-date versions of the archive, see http://ota.ox.ac.uk/. The snapshot is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, /appl/kielipankki/ota). For information regarding corpus access, contact kielipankki@csc.fi.
  • Metadata: 2/5

    The Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus

    This sub-corpus of the Helsinki Korp Version of the Finnish TreeBank 3 (http://urn.fi/urn:nbn:fi:lb-2016042602) is available in Kielipankki, the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2016042709
  • Metadata: 2/5

    Finnish TreeBank 2

    The FinnTreeBank project is creating a treebank and a parsebank for Finnish. This work is licensed under Creative Commons Attribution 3.0. The second version of the treebank is annotated by hand and based on 17,000 model sentences in the Large Grammar of Finnish VISK - Iso Suomen Kielioppi. Brief samples of text from other sources, e.g. news items and...
  • Metadata: 2/5

    The Swedish N-grams 1770-1940 of the Newspaper and Periodical Corpus of the N...

    The corpus is available in Kielipankki - the Language Bank of Finland, download: https://korp.csc.fi/download/SNC1/. The National Library of Finland has digitized a large proportion of Finland’s Swedish newspapers, magazines, and periodicals published between 1770 and 1940. This resource contains sets of unigrams, bigrams and trigrams extracted from a...