Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin (etsin.avointiede.fi) will be migrated into new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

5 datasets found
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus is available for download. The corpus contains all Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and annotated with morpho-syntactic analysis...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus is available for download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting can be found in the original...
  • Metadata: 2/5

    Psycholinguistic Descriptives

    The material is available at the Language Bank of Finland (Kielipankki) download service, access location http://urn.fi/urn:nbn:fi:lb-2018081602. This material comprises a dataset and a query tool for acquiring commonly used psycholinguistic descriptives for Finnish words. The dataset is based on six large corpora from sources such as magazines,...
  • Metadata: 2/5

    Finnish Wikipedia 2017, Kielipankki Korp Version

    The Finnish Wikipedia 2017 Corpus will be available in the concordance tool Korp. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and annotated with morpho-syntactic...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, Kielipankki Korp Version

    The corpus will be available in Kielipankki through the Korp interface. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting can be found in the original publication...