Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin (etsin.avointiede.fi) will be migrated into the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

9,931 datasets found
  • Metadata: 2/5

    The Swedish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Swedish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Swedish includes 3354 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 2/5

    The Finnish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Finnish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Finnish includes 1157 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 1/5

    Corpus of Translated Finnish

    The Corpus of Translated Finnish was compiled in 1999 at the University of Eastern Finland (at the time the University of Joensuu and its School of Translation Studies) in the project Translation Universals, led by professor Anna Mauranen. The corpus comprises two parts: texts originally written in Finnish and texts translated into Finnish from different...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: elicited narratives, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises elicited narratives from 21 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers three fixed tasks performed by the signers: narrating about short cartoon strips,...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Download Version

    The corpus, which is the Download version of The Finnish Dialect Syntax Archive's Helsinki Korp Version (http://urn.fi/urn:nbn:fi:lb-2016040702), is available in the Download service of Kielipankki - the Language Bank of Finland (korp.csc.fi/download) under the license CC BY-ND 4.0. For more information, see the metadata of The Finnish Dialect Syntax Archive...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 1/5

    Hundred Finnish Linguistic Life Stories

    More information about the project is available at https://blogs.helsinki.fi/100finnish/
  • Metadata: 2/5

    Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s (VRT), Ve...

    The corpus is available for download in Kielipankki - the Language Bank of Finland. The data is annotated and identical to the data used as the basis for lehdet90ff-v2. A short documentation of the VRT file format can be found in the Documentation section. Reference instructions: See Attribution Details under Documentation. When quoting, also the name of the...
  • Metadata: 2/5

    The Downloadable Version of the Finnish Text Collection - Commercial Use

    This downloadable subcorpus of FTC is available for commercial use. The resource is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-201908071. For information about the licence, see http://urn.fi/urn:nbn:fi:lb-20150304139. The corpus available for commercial use is a subcorpus of the Finnish Text Collection. More...
  • Metadata: 1/5

    The Yle MeMAD Media Corpus

    The corpus contains TV programs and videos from the archives of Yle, the Finnish Broadcasting Company. Journalistic programs (news, current affairs, etc., no drama) have been selected on various topics from a time period ranging from 1966 to 2018. Each browse-quality video file is accompanied by its descriptive metadata and subtitles. Main audio and...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Team

    The mutable-team subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which contains video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-team subcorpus consists of approx. 25 h of video of authentic teamwork and the respective...
  • Metadata: 2/5

    Multimodal Translation with the Blind: Art

    The mutable-art subcorpus is part of the MUTABLE corpus (Multimodal Translation with the Blind), which contains video recordings of the work processes related to audio description as well as of the interaction between sighted and blind participants. The mutable-art subcorpus consists of approx. 2 h of video of authentic live audio description in art...
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019

    Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also offers statistical information and comparison of the search results....
  • Metadata: 1/5

    Finnish Supreme and Supreme Administrative Court decisions from 1980-2018 in ...

    The Semfinlex corpora published in the Language Bank of Finland are based on the open data made available by the Semantic Finlex project (https://data.finlex.fi/en/project). The resource comprises original statutes of the Parliament of Finland, decisions by the Finnish Supreme Court and Supreme Administrative Court in Finnish and in Swedish, and also a...
  • Metadata: 2/5

    Wanca 2016, Korp Version (BETA)

    The Korp version of Wanca 2016 is a collection of web corpora in small Uralic languages. The collection is composed of 29 sentence corpora in different languages. The corpora have been collected from the Internet using the automated system developed in the Finno-Ugric Languages and the Internet project (SUKI), supported by the Kone Foundation, from their...
  • Metadata: 2/5

    Yle News Archive Easy-to-read Finnish 2011-2018, source

    This dataset consists of the selkouutiset in Finnish (Yle Easy-to-read Finnish News) published on the Yle news website https://yle.fi. The dataset was created by FIN-CLARIN from the contents of the Yle News Archive harvested on 2019-03-08 for the language code "fi" for each month from the year 2011 to the year 2018, inclusive. The Easy-to-read-Finnish...
  • Metadata: 2/5

    Finnish News Corpus for Named Entity Recognition

    The corpus consists of 953 articles (193,742 word tokens) with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The data sets are available at https://github.com/mpsilfve/finer-data and will be available in the download...