Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin (etsin.avointiede.fi) will be migrated to the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated to the new Etsin.

Search for a Dataset

9,870 datasets found
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting...
  • Metadata: 1/5

    Finnish News Agency Archive

    The Finnish News Agency Archive corpus comprises newswire articles made public by the Finnish News Agency (STT) from 1992 to 2018. The corpora will be available through the corpus interface Korp (korp.csc.fi) as scrambled sentences (CC BY NC) and in the download service as whole texts (CLARIN RES).
  • Metadata: 2/5

    Finnish Wikipedia 2017, Kielipankki Korp Version

    The Finnish Wikipedia 2017 Corpus will be available in the concordance tool Korp. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and annotated with morpho-syntactic...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, Kielipankki Korp Version

    The corpus will be available in Kielipankki through the Korp interface. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting can be found in the original publication...
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Download...

    The resource, containing entire newspaper and magazine articles, has been made available for download in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-201712201. The data consists of source data in PDF form or as plain text and is not annotated. An annotated version (lehdet90ff-vrt-v2) is available, see links below Relations on...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401, and the information page for the University of Oulu Päätalo collection is on the Kielipankki website at https://www.kielipankki.fi/aineistot/oulun-yliopiston-paatalo-kokoelma/ License page: http://urn.fi/urn:nbn:fi:lb-2019102106 The material has been published in the concordance tool Korp...
  • Metadata: 1/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki TDPP Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401. License page: http://urn.fi/urn:nbn:fi:lb-2019102106 The 26 books of the series have been parsed in Kielipankki with two different parsers. Both versions are published in Kielipankki's Korp concordance service (korp.csc.fi). This material has been parsed with the Turku Dependency Parser Pipeline (TDPP)...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection

    The Iijoki corpus is the autobiographical main work of the author Kalle Päätalo (11.11.1919-20.11.2000), deposited in Kielipankki by the University of Oulu. Päätalo can be characterized as a unique portrayer of recent Finnish history and working life and as a recorder of the Koillismaa dialect. The themes of his books included, among others, the years of famine, times of scarcity, forestry work,...
  • Metadata: 3/5

    Everyday Experiences of Poverty 2012: Follow-up Study

    The data consist of new texts written in 2012 by people who took part in the 'Arkipäivän kokemuksia köyhyydestä' (Everyday Experiences of Poverty) writing competition in 2006. The invitation to write was sent selectively to participants of the 2006 writing competition. The aim was to find out what the people who took part in the poverty writing competition...
  • Metadata: 3/5

    Everyday Experiences of Poverty: Self-administered Writings 2006

    The data consist of texts collected through the "Arkipäivän kokemuksia köyhyydestä" (Everyday Experiences of Poverty) writing competition. Writings arrived from all over Finland, and the writers represent a wide range of population groups, such as families with children, single parents, mental health rehabilitees, people with long-term illnesses, low-income workers, small business owners, people in debt,...
  • Metadata: 2/5

    The Swedish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Swedish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Swedish includes 3354 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 2/5

    The Finnish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Finnish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Finnish includes 1157 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Download Version

    The corpus, which is the Download version of The Finnish Dialect Syntax Archive's Helsinki Korp Version (http://urn.fi/urn:nbn:fi:lb-2016040702), is available in Kielipankki - the Language Bank of Finland Download service korp.csc.fi/download under the license CC BY ND 4.0. For more information see the metadata of The Finnish Dialect Syntax Archive...
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s (VRT), Ve...

    The corpus is available for download in Kielipankki - the Language Bank of Finland. The data is annotated and identical to the data used as basis for lehdet90ff-v2. A short documentation of the VRT file format can be found via the Documentation section. Reference instructions: See Attribution Details under Documentation. When quoting, also the name of the...
  • Metadata: 2/5

    Yle News Archive Easy-to-read Finnish 2011-2018, source

    This dataset consists of the selkouutiset in Finnish (Yle Easy-to-read Finnish News) published on the Yle news website https://yle.fi. The dataset was created by FIN-CLARIN from the contents of the Yle News Archive harvested on 2019-03-08 for the language code "fi" for each month from the year 2011 to the year 2018, inclusive. The Easy-to-read-Finnish...
  • Metadata: 2/5

    Finnish News Corpus for Named Entity Recognition

    The corpus consists of 953 articles (193,742 word tokens) with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The data sets are available at https://github.com/mpsilfve/finer-data and will be available in the download...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, source

    The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of the material is news articles that vary from short “news flashes” to telegrams and longer articles. News articles are categorized by department...
  • Metadata: 2/5

    Fenno-Ugrica Kielipankki Downloadable Version

    The Kielipankki downloadable version of Fenno-Ugrica (http://urn.fi/urn:nbn:fi:lb-2014073056) is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2019032501
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...