Research data finder
IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated to the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated to the new Etsin.

Search for a Dataset

20 datasets found
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 2/5

    Citation Database of Fennistic Dialect Dissertations

    The citation database will be published in the Download service in Kielipankki, the Language Bank of Finland (korp.csc.fi/download). The citation database consists of 41 bibliographies of dissertations on dialects in the field of Finnish language. The database contains the following information about each reference: author; publication year; title,...
  • Metadata: 2/5

    Professor Marjatta Wis' Corpus

    The corpus contains, among other things, press cuttings, hand-written notes, manuscripts, microfilms and photographs, all in non-electronic format, that belonged to Professor Marjatta Wis (1915-2008).
  • Metadata: 2/5

    Written and Oral Data of the TAITO-project

    The corpus contains: a) Texts written by students of German, French, Italian, Swedish or English, who have just started their studies or who are at the end of their first year of study. b) Videos of partially transcribed discussions. In most of the cases the participants in the discussions are two students and one native speaker. The corpus contains...
  • Metadata: 2/5

    The National Certificates Corpus

    The NC test results, background information, and speaking and writing performances in 9 foreign / second languages. A web-based database (HTML files). The corpus contains background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances. More...
  • Metadata: 2/5

    The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (http://urn.fi/urn:nbn:fi:lb-2015062301). The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora are: The Helsinki Korp JRC-Acquis Finnish-English Corpus, The Helsinki Korp JRC-Acquis Finnish-Swedish Corpus, The Helsinki Korp JRC-Acquis Finnish-German Corpus, The Helsinki Korp JRC-Acquis...
  • Metadata: 2/5

    Opusparcus: Open Subtitles Paraphrase Corpus for Six Languages (version 1.0)

    Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows. The data in Opusparcus has been extracted from OpenSubtitles2016 (http://opus.nlpl.eu/OpenSubtitles2016.php), which is in...
  • Metadata: 2/5

    Corpus of Spoken Modern French

    Corpus of Spoken Modern French, transcriptions included. More information: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/FinClarinSiteJY (log 25.11.2018: link islrn.org/resources/802-128-132-924-0 removed)
  • Metadata: 2/5

    Lists of Words Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/words (only Finnish available) The lists of words located at the University of Helsinki Language Corpus Server were generated from the corpora of the following languages:...
  • Metadata: 2/5

    Opus, Helsinki Korp Version

    The Helsinki Korp version of the Opus open parallel corpus (http://opus.lingfil.uu.se/), containing scrambled sentences, has been published in Korp, http://urn.fi/urn:nbn:fi:lb-2016012101 The subcorpora of Opus, Helsinki Korp Version are: OPUS Finnish–Czech OPUS Finnish–Danish OPUS Finnish–Dutch OPUS Finnish–English OPUS Finnish–Estonian OPUS...
  • Metadata: 2/5

    JRC-Acquis Multilingual Parallel Corpus

    The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 official languages. The Acquis...
  • Metadata: 2/5

    The Helsinki Korp Europarl Bilingual Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (https://korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2015043012. The Helsinki Korp Europarl Bilingual Corpora are: The Helsinki Korp Europarl Finnish-English Corpus, The Helsinki Korp Europarl Finnish-Swedish Corpus, The Helsinki Korp Europarl Finnish-German Corpus, The Helsinki Korp...
  • Metadata: 2/5

    Information in Sign Language on the Tasks of the Parliamentary Ombudsman of F...

    Information on the tasks of the Parliamentary Ombudsman of Finland, available also in Finnish and Finland-Swedish Sign Language. (Log 26.11.2018: link http://islrn.org/resources/881-206-946-973-5 removed)
  • Metadata: 2/5

    Lists of Words Corpus (UHLCS), Helsinki Korp Version

    The resource, a variant of Lists of Words Corpus (UHLCS) (http://urn.fi/urn:nbn:fi:lb-201406042), will be made available at korp.csc.fi.
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...
  • Metadata: 2/5

    The University of Helsinki's French E-thesis, Korp Version

    The corpus is available in Kielipankki - the Language Bank of Finland in Korp (http://urn.fi/urn:nbn:fi:lb-2016102803). The corpus contains the University of Helsinki's French master's theses as well as the doctoral theses and their summaries published at https://ethesis.helsinki.fi by September 2016.
  • Metadata: 4/5

    Phrase database for the chatting program Psyk

    This is a data file used by a conversation program (a.k.a. "chatterbot") called Psyk. Psyk is a learning program that remembers every line said to it, as well as the conversational context in which it was said. The data is formatted as one phrase per line. Every line has the format (context phrase) where context is a specification of the situation...