Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin (etsin.avointiede.fi) will be migrated into new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in a new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

9,880 datasets found
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Downloadable Version, September 2017

    This version contains the data that were available on the Oracc project website in September 2017. Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. This version of ORACC contains the following Oracc projects: Corpus of Ancient Mesopotamian Scholarship;...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, Kielipankki Korp Version

    The corpus will be available for non-commercial use in the concordance tool Korp, where the context is restricted to sentences or paragraphs. The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of...
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor. The corpus has been tokenized and...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus will be available in the download service korp.csc.fi/download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401, and the information page for the University of Oulu Päätalo collection is on the Kielipankki website at https://www.kielipankki.fi/aineistot/oulun-yliopiston-paatalo-kokoelma/. License page: http://urn.fi/urn:nbn:fi:lb-2019102106. The material has been published in the concordance tool Korp...
  • Metadata: 1/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki TDPP Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401. License page: http://urn.fi/urn:nbn:fi:lb-2019102106. The 26 books of the series have been parsed in Kielipankki with two different parsers. Both versions are published in Kielipankki's Korp concordance service (korp.csc.fi). This material has been parsed with the Turku Dependency Parser Pipeline (TDPP)...
  • Metadata: 2/5

    The Swedish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Swedish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Swedish includes 3354 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 2/5

    The Finnish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Finnish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Finnish includes 1157 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 1/5

    Corpus of Translated Finnish

    The Corpus of Translated Finnish was compiled in 1999 at the University of Eastern Finland (at the time, the University of Joensuu and its School of Translation Studies) in the project Translation Universals, led by Professor Anna Mauranen. The corpus comprises two parts: texts originally written in Finnish and texts translated into Finnish from different...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: elicited narratives, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises elicited narratives from 21 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers three fixed tasks performed by the signers: narrating short cartoon strips,...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Download Version

    The corpus, which is the Download version of The Finnish Dialect Syntax Archive's Helsinki Korp Version (http://urn.fi/urn:nbn:fi:lb-2016040702), is available in the Download service of Kielipankki - the Language Bank of Finland (korp.csc.fi/download) under the license CC BY-ND 4.0. For more information see the metadata of The Finnish Dialect Syntax Archive...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 1/5

    Hundred Finnish Linguistic Life Stories

    More information about the project is available at https://blogs.helsinki.fi/100finnish/
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s (VRT), Ve...

    The corpus is available for download in Kielipankki - the Language Bank of Finland. The data is annotated and identical to the data used as the basis for lehdet90ff-v2. A short documentation of the VRT file format can be found via the Documentation section. Reference instructions: See Attribution Details under Documentation. When quoting, also the name of the...
  • Metadata: 2/5

    The Downloadable Version of the Finnish Text Collection - Commercial Use

    This downloadable subcorpus of FTC is available for commercial use. The resource is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-201908071. For information about the licence, see http://urn.fi/urn:nbn:fi:lb-20150304139. The corpus available for commercial use is a subcorpus of the Finnish Text Collection. More...
  • Metadata: 1/5

    The Yle MeMAD Media Corpus

    The corpus contains TV programs and videos from the archives of Yle, the Finnish Broadcasting Company. Journalistic programs (news, current affairs, etc.; no drama) have been selected on various topics and from a time period ranging from 1966 to 2018. Each browse-quality video file is accompanied by its descriptive metadata and subtitles. Main audio and...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Team

    The mutable-team subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which comprises video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-team subcorpus consists of approx. 25 h of video of authentic teamwork and the respective...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Art

    The mutable-art subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which comprises video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-art subcorpus consists of approx. 2 h of video of authentic live audio description in art...
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019

    Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also offers statistical information and comparison of the search results....