Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated into the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

29 datasets found
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: elicited narratives, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises elicited narratives from 21 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers three fixed tasks performed by the signers: narrating about short cartoon strips,...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Team

    The mutable-team subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which contains video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-team subcorpus consists of approx. 25 h of video of authentic teamwork and the respective...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Art

    The mutable-art subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which contains video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-art subcorpus consists of approx. 2 h of video of authentic live audio description in art...
  • Metadata: 2/5

    Wanca 2016, Korp Version (BETA)

    The Korp version of Wanca 2016 is a collection of web corpora in small Uralic languages. The collection is composed of 29 sentence corpora in different languages. The corpora have been collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI) supported by the Kone Foundation from their...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: elicited narratives

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises elicited narratives from 21 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers three fixed tasks performed by the signers: narrating about short cartoon strips,...
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki Korp version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. Access and license: This version of the...
  • Metadata: 2/5

    Corpus of Contemporary American English - Kielipankki Korp version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into spoken, fiction, magazine, newspaper, and academic genres (~88 million words each). Access and license: This...
  • Metadata: 2/5

    Erzya and Moksha Mordvin Word List Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/mordvin-lgs Contents: The Erzya corpus contains a historical word list of Erzya Mordvin documented in...
  • Metadata: 2/5

    Corpus of Global Web-Based English - Kielipankki Korp version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words and 1 800 000 texts from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60 % of the texts come from blogs. Access and license: This version of the...
  • Metadata: 2/5

    Classics of Finnish Literature, download version

    The corpus will be available in Kielipankki - the Language Bank of Finland.
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki

    The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License: The corpus is available for searching by logged-in staff and students of the FIN-CLARIN member organizations via the...
  • Metadata: 2/5

    Corpus of Contemporary American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into spoken, fiction, magazine, newspaper, and academic genres (~88 million words each). License details: Researchers in...
  • Metadata: 2/5

    Corpus of Finnish Sign Language

    Finnish Sign Language material collected in the CFINSL project. The material consists of video files and the annotations of the videos in ELAN format as well as the metadata about the signers and the content and format of the videos. The material comprises conversations and elicited narratives from 21 Finnish Sign Language signers who belong to different...
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License details: Researchers in the...
  • Metadata: 2/5

    Learning material for speech analysis

    The material is currently being transferred to Kielipankki's LAT platform (http://urn.fi/urn:nbn:fi:lb-100110018959), where parts of it are already available. The material contains Finnish-language audio intended for studying speech analysis methods: words, sentences, and stories read aloud. The entire material is available under the Creative Commons Attribution license, in its latest...
  • Metadata: 3/5

    The Corpus of Border Karelia

    The Corpus of Border Karelia contains the audio recordings and transcripts of dialects spoken in the area of Border Karelia, where the very closely related varieties of eastern Finnish dialects and Karelian were in contact. The informants are evacuees who were mainly moved to eastern Finland after World War II. The original interviews were recorded in the...
  • Metadata: 2/5

    FinRead Corpus of Read-aloud Finnish Speech

    The FinRead corpus is a subcorpus of FinINTAS. FinRead consists of read-aloud speech from the same speakers whose conversations are included in the FinDialogue subcorpus. The corpus includes audio files (WAV) and phonetic annotation files (Praat TextGrid). FinRead will be made available at http://lat.csc.fi, along with FinDialogue. The speakers were...
  • Metadata: 2/5

    Route to A wing

    The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi: http://urn.fi/urn:nbn:fi:lb-2015050502; lat.csc.fi: http://urn.fi/urn:nbn:fi:lb-100110012813). A freely available public demo corpus. Contains a video conversation between two people who are discussing the route to a specific room in Metsätalo at the University of...