Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin will be migrated into new Etsin at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in a new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into new Etsin.

Search for a Dataset

15 datasets found
  • Metadata: 2/5

    The Morpho-Syntactic Database of Mikael Agricola's Works

    The database will be available through the Korp interface. The Morpho-Syntactic Database of Mikael Agricola's Works contains the Finnish parts of Mikael Agricola’s works (Abckiria, Rukouskiria, Se Wsi testamenti, Käsikiria, Messu, Piina, Psaltari, Veisut, Profeetat). The database was created from 2004 to 2008, when the texts offered...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, Kielipankki Korp Version

    The corpus will be available for non-commercial use in the concordance tool Korp, where the context is restricted to sentences or paragraphs. The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of...
  • Metadata: 2/5

    Opus Subtitles Corpus

    The corpus, containing the OpenSubtitles sub-corpora of the Opus open parallel corpus, will be made available for download.
  • Metadata: 2/5

    Finnish Folk Poetry

    The corpus is available in Kielipankki - the Language Bank of Finland. A 34-volume collection of Finnic oral poetry, lyric, short rhymes, incantations etc., collected and recorded from the 16th century to the 1930s and published mostly between 1908 and 1948, with a supplement volume published in 1997....
  • Metadata: 2/5

    Aleksis Kivi Corpus (SKS)

    The corpus is available in Kielipankki - the Language Bank of Finland. All the known letters, manuscripts and published works by Finnish author Aleksis Kivi (1834–1872). Most of the texts were written in Finnish, while some of the letters and manuscripts are in Swedish. More information:...
  • Metadata: 2/5

    The Downloadable Version of the Ylilauta Corpus

    The resource contains the VRT data of the Ylilauta Corpus.
  • Metadata: 2/5

    Opus ECB Corpus

    The corpus, containing the European Central Bank sub-corpus of the Opus open parallel corpus, will be made available for download.
  • Metadata: 2/5

    Opusparcus: Open Subtitles Paraphrase Corpus for Six Languages (version 1.0)

    Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows. The data in Opusparcus has been extracted from OpenSubtitles2016, which is in...
  • Metadata: 2/5

    Ylilauta Corpus

    The corpus is available in Kielipankki - the Language Bank of Finland. The corpus contains text from discussions of the Ylilauta online discussion board from 2012 to 2014. Short fragments from the discussions, e.g. sentences or paragraphs, are publicly available in Korp...
  • Metadata: 2/5

    A Multimodal Corpus of Tourist Brochures Produced by the City of Helsinki, Fi...

    The corpus is available in Kielipankki - the Language Bank of Finland. This multimodal corpus, which consists of the tourist brochures produced by the city of Helsinki, Finland, is fully annotated using the XML schema provided for the Genre and Multimodality (GeM) model. The GeM...
  • Metadata: 4/5

    Secondary-aged learners' practices for information seeking and evaluation in ...

    Interviews with students. The data collected in the CogAHealth project, funded by the Academy of Finland, consist of, among other data, transcripts of interviews with secondary school students (Grades 8, 9, and 10) in northern Finland. The interviews (N8 = 17, N9 = 16, N10 = 4) were semi-structured, based on tailored interview frameworks, focusing on either...
  • Metadata: 4/5

    NMR data from continuous SABRE polarizer

    NMR spectra and MR images obtained from SABRE-hyperpolarized samples.
  • Metadata: 3/5

    The 2nd Automatic Speaker Verification Spoofing and Countermeasures Challenge...

    This is a database used for the Second Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017 for short), organized by Tomi Kinnunen, Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, and Kong Aik Lee in 2017. The ASVspoof challenge aims to encourage further progress...
  • Metadata: 3/5

    Densely sampled light fields

    Dataset containing pre-rectified horizontal-parallax multi-perspective images of 3D scenes. The dataset consists of 193 camera views (images), positioned equidistantly on a line, where the disparity range between adjacent views is 1 pixel at most. Images are of size 1280×720 pixels and are stored in 8-bit RGB format (PNG). Several other densely sampled light...
  • Metadata: 4/5

    Semantic Finlex

    This dataset contains Linked Data regarding Finnish legislation and case law. The RDF data has been converted from legacy XML formats used within the Finlex online service. RDF data models used in the converted data conform to European URI and metadata standards, namely ELI (European Legislation Identifier) and ECLI (European Case Law Identifier). The...