Research data finder

FIN-CLARIN

423 datasets found
  • Metadata: 2/5

    Yle Swedish News Archive 2012-2018, Korp (BETA)

    The corpus, containing the articles from Svenska YLE https://svenska.yle.fi from 2012 through 2018, is available at Korp as a Beta version, which means it may be subject to unannounced changes. The licence is available at http://urn.fi/urn:nbn:fi:lb-2019120401.
  • Metadata: 2/5

    Yle Swedish News Archive 2012-2018, scrambled, Korp (BETA)

    The corpus, containing the articles from Svenska YLE https://svenska.yle.fi from 2012 through 2018 as scrambled sentences, is available at Korp as a Beta version, which means it may be subject to unannounced changes.
  • Metadata: 2/5

    AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

    AI2D-RST is a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural science, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowd-sourced descriptions. Building on...
  • Metadata: 2/5

    The "Hallituskausi 2011–2015" Translation Memory

    The "Hallituskausi 2011–2015" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites during the 2011–2015 electoral period. The memory features some 11,000 Finnish-to-English translation segments. The translation memory runs...
  • Metadata: 2/5

    The "Hallituskausi 2007–2011" Translation Memory

    The "Hallituskausi 2007–2011" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites. The memory features some 58,000 Finnish-to-English translation segments. The TMX format requires an SDL Trados Studio programme. The...
  • Metadata: 2/5

    Official transcripts of the Plenary Sessions of the Parliament of Finland (an...

    Official transcripts of the Plenary Sessions of the Parliament of Finland, edited and published by the Parliament of Finland.
  • Metadata: 2/5

    Original video recordings of the Plenary Sessions of the Parliament of Finlan...

    The videos of the plenary sessions of the Parliament of Finland from the year 2008 onwards, maintained by the Parliament of Finland.
  • Metadata: 2/5

    Open Richly Annotated Cuneiform Corpus, Downloadable Version, September 2017

    This version contains the data that were available on the Oracc project website in September 2017. The Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. This version of Oracc contains the following Oracc projects: Corpus of Ancient Mesopotamian Scholarship;...
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, Kielipankki Korp Version

    The corpus will be available for non-commercial use in the concordance tool Korp, where the context is restricted to sentences or paragraphs. The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992 and 2018. The corpus includes about 2.8 million items in total. Most of...
  • Metadata: 2/5

    Finnish Wikipedia 2017, source

    The Finnish Wikipedia 2017 source material corpus is available for download. The corpus contains all the Finnish articles from the online encyclopedia Wikipedia available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia dumps with WikiExtractor. The corpus has been tokenized and annotated with morpho-syntactic analysis...
  • Metadata: 2/5

    Finnish OpenSubtitles 2017, source

    The Finnish OpenSubtitles 2017 source material corpus is available for download. The corpus contains Finnish subtitles for movies and TV series from http://www.opensubtitles.org/. The corpus is a derivative of the OPUS OpenSubtitles2018 multilingual corpus. Information on the material processing up to sentence splitting can be found in the original...
  • Metadata: 2/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401, and the information page of the University of Oulu Päätalo collection is on the Kielipankki website at https://www.kielipankki.fi/aineistot/oulun-yliopiston-paatalo-kokoelma/. Licence page: http://urn.fi/urn:nbn:fi:lb-2019102106. The data has been published in the concordance tool Korp...
  • Metadata: 1/5

    Iijoki, the University of Oulu Päätalo collection, Kielipankki TDPP Korp version

    A description of the Iijoki series can be found at http://urn.fi/urn:nbn:fi:lb-2019041401. Licence page: http://urn.fi/urn:nbn:fi:lb-2019102106. The 26 books of the series have been parsed in Kielipankki with two different parsers. Both versions are published in the Kielipankki Korp concordance service (korp.csc.fi). This data has been parsed with the Turku Dependency Parser Pipeline (TDPP)...
  • Metadata: 2/5

    The Swedish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Swedish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Swedish includes 3354 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 2/5

    The Finnish sub-corpus of Elias Lönnrot Letters Online - Kielipankki version

    This corpus will be made available at korp.csc.fi. It comprises letters and drafts written in Finnish, which are part of the correspondence corpus 'Elias Lönnrot Letters Online'. The data set in Finnish includes 1157 letters and drafts out of the whole data set of 4511 letters written in Finnish and Swedish. The letters and drafts of letters belong to the...
  • Metadata: 1/5

    Corpus of Translated Finnish

    The Corpus of Translated Finnish was compiled in 1999 at the University of Eastern Finland (at the time the University of Joensuu and its School of Translation Studies) in the project Translation Universals, led by Professor Anna Mauranen. The corpus comprises two parts: texts originally written in Finnish and texts translated into Finnish from different...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: conversations, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises conversations from 18 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers four fixed tasks performed by the signers: introductions, discussing work/hobbies,...
  • Metadata: 2/5

    Corpus of Finnish Sign Language: elicited narratives, Download version

    This subcorpus is part of the Corpus of Finnish Sign Language collected in the CFINSL project. The subcorpus comprises elicited narratives from 21 Finnish Sign Language signers who belong to different age groups and live in different parts of Finland. The material covers three fixed tasks performed by the signers: narrating about short cartoon strips,...
  • Metadata: 2/5

    The Finnish Dialect Syntax Archive's Helsinki Download Version

    The corpus, which is the Download version of The Finnish Dialect Syntax Archive's Helsinki Korp Version (http://urn.fi/urn:nbn:fi:lb-2016040702), is available in the Download service of Kielipankki - the Language Bank of Finland (korp.csc.fi/download) under the CC BY-ND 4.0 licence. For more information, see the metadata of The Finnish Dialect Syntax Archive...
  • Metadata: 1/5

    The INA MeMAD Media Corpus

    The corpus contains television and radio programs from the archives of INA, the French National Audiovisual Institute. The corpus consists of 8 full days of programs on six French public television channels and radio stations (May 19th to 26th, 2014), corresponding to the 2014 European elections. The corpus has been created and licensed for the MeMAD project,...