Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated into the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

289 datasets found
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Downloadable Version, September 2017

    This version contains the data that were available on the Oracc project website in September 2017. Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. This version of ORACC contains the following Oracc projects: Corpus of Ancient Mesopotamian Scholarship;...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, Kielipankki Korp Version

    The corpus will be available for non-commercial use in the concordance tool Korp, where the context is restricted to sentences or paragraphs. The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of...
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401, and the information page for the University of Oulu Päätalo collection is on the Kielipankki website at https://www.kielipankki.fi/aineistot/oulun-yliopiston-paatalo-kokoelma/ License page: http://urn.fi/urn:nbn:fi:lb-2019102106 The material has been published in the concordance tool Korp...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Download Version

    The corpus, which is the Download version of The Finnish Dialect Syntax Archive's Helsinki Korp Version (http://urn.fi/urn:nbn:fi:lb-2016040702), is available in the Download service of Kielipankki - the Language Bank of Finland (korp.csc.fi/download) under the license CC BY ND 4.0. For more information see the metadata of The Finnish Dialect Syntax Archive...
  • Metadata: 1/5

    Hundred Finnish Linguistic Life Stories

    More information about the project is available at https://blogs.helsinki.fi/100finnish/
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s (VRT), Ve...

    The corpus is available for download in Kielipankki - the Language Bank of Finland. The data is annotated and identical to the data used as the basis for lehdet90ff-v2. A short documentation of the VRT file format can be found via the Documentation section. Reference instructions: See Attribution Details under Documentation. When quoting, also the name of the...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Team

    The mutable-team subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which comprises video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-team subcorpus consists of approx. 25 h of video of authentic teamwork and the respective...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Art

    The mutable-art subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which comprises video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-art subcorpus consists of approx. 2 h of video of authentic live audio description in art...
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019

    Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also offers statistical information and comparison of the search results....
  • Metadata: 2/5

    Yle News Archive Easy-to-read Finnish 2011-2018, source

    This dataset consists of the selkouutiset in Finnish (Yle Easy-to-read Finnish News) published on the Yle news website https://yle.fi. The dataset was created by FIN-CLARIN from the contents of the Yle News Archive harvested on 2019-03-08 for the language code "fi" for each month from the year 2011 to the year 2018, inclusive. The Easy-to-read-Finnish...
  • Metadata: 2/5

    Finnish News Corpus for Named Entity Recognition

    The corpus consists of 953 articles (193,742 word tokens) with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The data sets are available at https://github.com/mpsilfve/finer-data and will be available in the download...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection

    The Iijoki corpus is the autobiographical main work of the author Kalle Päätalo (11.11.1919-20.11.2000), deposited in Kielipankki by the University of Oulu. Päätalo can be characterized as a unique chronicler of recent Finnish history and working life, and as a recorder of the Koillismaa dialect. The subjects of his books included, among others, the famine years, times of economic depression, forestry work,...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, source

    The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of the material is news articles that vary from short “news flashes” to telegrams and longer articles. News articles are categorized by department...
  • Metadata: 2/5

    Fenno-Ugrica Kielipankki Downloadable Version

    The Kielipankki downloadable version of Fenno-ugrica (http://urn.fi/urn:nbn:fi:lb-2014073056) is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2019032501
  • Metadata: 2/5

    Relative frequencies of part-of-speech n-grams in native and translated Finni...

    These files contain data from the MA thesis study "Then shall I know fully: Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose" by Matias Tamminen (2018), University of Helsinki. The material is available at the Language Bank of Finland (Kielipankki) download service, access location...
  • Metadata: 2/5

    Citation Database of Fennistic Dialect Dissertations

    The citation database will be published in the Download service of Kielipankki, the Language Bank of Finland (korp.csc.fi/download). The citation database consists of 41 bibliographies of dissertations on dialects in the field of Finnish language. The database contains the following information about each reference: author; publication year; title,...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Korp Version

    The corpus, which is the Korp version of The Finnish Dialect Syntax Archive (http://urn.fi/urn:nbn:fi:lb-2014052716), is available in Kielipankki - the Language Bank of Finland, http://urn.fi/urn:nbn:fi:lb-2014052715, under the licence CC BY ND 4.0. For more information see http://urn.fi/urn:nbn:fi:lb-2014052716
  • Metadata: 2/5

    Digital Morphology Archives

    The database is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2016032102. DMA is a digital database containing 401,729 morphologically coded dialectal clauses from 160 parish dialects of Finnish. The coded clauses are licensed under Creative Commons Attribution 4.0 International. In addition, access to the word...