Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin will be migrated into new Etsin at the end of June 2019. After the migration all PUBLISHED datasets will be visible in new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into new Etsin.

Search for a Dataset

59 datasets found
  • Metadata: 2/5

    The "Hallituskausi 2007–2011" Translation Memory

    The "Hallituskausi 2007–2011" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites. The memory features some 58,000 Finnish-to-English translation segments. The tmx format requires an SDL Trados Studio programme. The...
  • Metadata: 2/5

    Opusparcus: Open Subtitles Paraphrase Corpus for Six Languages (version 1.0)

    Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows. The data in Opusparcus has been extracted from OpenSubtitles2016, which is in...
  • Metadata: 2/5

    ERME Erzya and Moksha Extended Corpora

    ERME contains predominantly Erzya and Moksha literature. It consists of several media publications from the 19th to the 20th century. ERME was mapped in Saransk in 1997-2004, while in Helsinki it has been mapped since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity...
  • Metadata: 2/5

    Samples of Northern Saami

    The corpus contains audio samples of spoken Northern Saami dialects (Sea Saami, Finnmark Saami and Torne Saami). It is available in LAT. Each audio file contains one interview. The material has been morphologically glossed and the transcripts have been translated into Finnish and English. log 26.11.2018 link...
  • Metadata: 2/5

    Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland, access rights instructions: Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty The Khanty computer corpus contains the following sub-corpora: Khanty, Atlym dialect, 519...
  • Metadata: 3/5

    University of Oulu Kikosa Collection

    The Kikosa Collection consists of video recorded everyday interactions among multicultural families and groups of friends. The collection is housed at the University of Oulu Department of Languages and Literature and it can be used for studies of language and interaction.
  • Metadata: 2/5

    The "Hallituskausi 2011–2015" Translation Memory

    The "Hallituskausi 2011–2015" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites during the ongoing electoral period. The memory features some 11,000 Finnish-to-English translation segments. The translation memory runs...
  • Metadata: 2/5

    Corpus of Age-related Voice Disguise

    This corpus includes normal and age-related disguised speech uttered by 60 native Finnish speakers (31 females and 29 males). The speakers were asked to read the same text fragments several times, in their modal voice and in two disguised voices, first pretending to be an elderly speaker and then pretending to be a child. The texts consisted of the...
  • Metadata: 3/5

    Academic publisher costs in Finland 2010–2017

    This dataset includes academic publisher costs paid by Finnish research organizations to publishers and suppliers during the years 2010–2017. Dataset includes total costs of license contracts made with individual publishers or suppliers. Dataset also includes information on the different materials and types the contracts included. Also included is the...
  • Metadata: 4/5

    Data for the article “Modernization of Russian district heating systems with ...

    Interview recordings, newspaper articles
  • Metadata: 2/5

    Citation Database of Fennistic Dialect Dissertations

    The citation database will be published in the Download service in Kielipankki, the Language Bank of Finland. The citation database consists of 41 bibliographies of dissertations on dialects in the field of Finnish language. The database contains the following information about each reference: author; publication year; title,...
  • Metadata: 2/5

    CEFLING Project Corpus

    Finnish as a second language and English as a foreign language writing performances collected from comprehensive school students (grades 7 - 9) in the project CEFLING - Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. Data from several hundred learners; 4-5 writing tasks from each learner; background information,...
  • Metadata: 2/5

    MULCOLD, Multilingual Parallel Corpus of Legal Texts

    The corpus is available in Kielipankki - the Language Bank of Finland. The sub-corpora containing the Russian, German and Russian texts respectively are available separately. The corpus contains international conventions and treaties arranged as a parallel corpus aligned...
  • Metadata: 2/5

    Information on Social Security Benefits in Finnish Sign Language

    Information available also in Finnish Sign Language on social security benefits provided by Kela (the Finnish Social Insurance Institution). log 26.11.2018 access location link moved to the description and the link removed
  • Metadata: 2/5

    The National Certificates Corpus

    The NC test results, background information, speaking and writing performances in 9 foreign / second languages. A web-based database (html files). The corpus contains background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances. More...
  • Metadata: 2/5

    AddictionLink in Finnish Sign Language

    Information available also in Finnish Sign Language on alcohol, drugs and addictions, on independent change programs and a self-assessment test on the use of alcohol. log 26.11.2018 link removed
  • Metadata: 2/5

    Information in Sign Language on the Tasks of the Parliamentary Ombudsman of F...

    Information available also in Finnish and Finland Swedish sign language on the tasks of the Parliamentary Ombudsman of Finland. log 26.11.2018 link removed
  • Metadata: 2/5

    Corpora of Newspaper Texts

    Computer corpora in Finnish, Swedish and English languages (newspaper texts), with requests and relevance information used in information retrieval evaluation. log 25.11.2018 link removed
  • Metadata: 2/5

    Finnish Telegraphese Corpus

    Computer corpus of Finnish telegraphese language (with English interlinears and translation). The Finnish Telegraphese Corpus is a product of a cross-linguistic study of telegraphic language produced by normal adult subjects (university students) to describe a set of states of affairs. The responses were gathered in written questionnaire format, and the...
  • Metadata: 2/5

    Consumer Information in Finnish Sign Language

    Advice to consumers in Finnish sign language with regard to, e.g., product defects, service-related complaints, canceling orders and online shopping. log 25.11.2018 links removed