Research data finder

Search for a Dataset

664 datasets found
  • Metadata: 2/5

    FinDe Corpus

    This contrastive language corpus contains German and Finnish literature and press texts and their respective translations into the other language. Change log: 25.11.2018 link removed
  • Metadata: 2/5

    Corpus of Age-related Voice Disguise

    This corpus includes normal and age-related disguised speech uttered by 60 native Finnish speakers (31 females and 29 males). The speakers were asked to read the same text fragments several times, in their modal voice and in two disguised voices, first pretending to be an elderly speaker and then pretending to be a child. The texts consisted of the...
  • Metadata: 2/5

    Samples of Northern Saami

    The corpus contains audio samples of spoken Northern Saami dialects (Sea Saami, Finnmark Saami and Torne Saami). It will be published in LAT. Each audio file contains one interview. The material has been morphologically glossed and the transcripts have been translated into Finnish and English. Change log: 26.11.2018 link...
  • Metadata: 2/5

    The Advanced Finnish Learners’ Corpus, Downloadable Version

    The resource is the downloadable version of The Advanced Finnish Learners’ Corpus. The purpose of the resource use must be outlined in a research plan. Distribution of copies is not allowed. If the resource is used as material...
  • Metadata: 3/5

    The News and Comments Corpus

    The corpus is available in the Language Bank's Korp service. The News and Comments Corpus contains the domestic news of the Helsingin Sanomat website and their comments from 5.9.2011 to 4.9.2012. The corpus starts with the first news item of 5.9.2011 and ends with a news item published in the morning on 3.9.2012 and...
  • Metadata: 3/5

    Academic publisher costs in Finland 2010–2017

    This dataset includes academic publisher costs paid by Finnish research organizations to publishers and suppliers during the years 2010–2017. The dataset includes the total costs of license contracts made with individual publishers or suppliers, as well as information on the different materials and types the contracts included. Also included is the...
  • Metadata: 2/5

    The Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus, Downlo...

    The downloadable version of the Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus will be made available. Change log: 16.1.2019 updated URN
  • Metadata: 2/5

    Finnish Verbal Colorative Constructions

    The resource contains Finnish verbal colorative constructions from the database of the word notes used when creating the dictionaries Nykysuomen sanakirja and Kielitoimiston sanakirja, from various literary works, from a query test made by Maria-Magdalena Jürvetson as well as from different Internet sources. The...
  • Metadata: 1/5

    Finnish News Agency Archive

    The Finnish News Agency Archive corpus comprises newswire articles made public by the Finnish News Agency (STT) during the 2000s. More detailed information about the time frame will be available on the publication of the corpus. The corpus will be available through the corpus interface Korp as scrambled sentences (CC BY NC) and in the...
  • Metadata: 4/5

    Data for the article “Modernization of Russian district heating systems with ...

    Interview recordings, newspaper articles
  • Metadata: 2/5

    Finnish TreeBank 3

    The corpus is available in Kielipankki - the Language Bank of Finland. The FinnTreeBank project is creating a treebank and a parsebank for Finnish. This work is licensed under Creative Commons Attribution 3.0. A parsebank for Finnish...
  • Metadata: 2/5

    Yle News Archive 2011-

    The corpus, containing articles from YLE from 2011-2017, will be made available.
  • Metadata: 4/5

    Landsat 1999-2002, TIF

    Landsat is a joint project of NASA and the U.S. Geological Survey. The goal of the project is to provide a comprehensive image collection of the whole Earth. The project began in the early 1970s, and since then seven satellites have been launched. The last Landsat satellite - Landsat 7 - made observations with the ETM sensor in eight different wavelength...
  • Metadata: 2/5

    Citation Database of Fennistic Dialect Dissertations

    The citation database will be published in the Download service in Kielipankki, the Language Bank of Finland. The citation database consists of 41 bibliographies of dissertations on dialects in the field of Finnish language. The database contains the following information about each reference: author; publication year; title,...
  • Metadata: 4/5

    Helsinki Region Travel Time Matrix 2018

    Helsinki Region Travel Time Matrix contains travel time and distance information for routes between all 250 m x 250 m grid cell centroids (n = 13231) in the Capital Region of Helsinki by walking, cycling, public transportation and car. The grid cells are compatible with the statistical grid cells used by Statistics Finland and the YKR (yhdyskuntarakenteen...
  • Metadata: 3/5

    International Corpus of Learner Finnish (ICLFI)

    The International Corpus of Learner Finnish (ICLFI) is a corpus of written learner language. The corpus is available in Korp in Kielipankki - the Language Bank of Finland. Access rights instructions: (in Finnish:...
  • Metadata: 2/5

    CEFLING Project Corpus

    Finnish as a second language and English as a foreign language writing performances collected from comprehensive school students (grades 7 - 9) in the project CEFLING - Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. Data from several hundred learners; 4-5 writing tasks from each learner; background information,...
  • Metadata: 2/5

    Classics of Finnish Literature, Kielipankki Version

    Works of established Finnish authors published from the 1880s to the 1940s. Includes prose fiction, plays, poetry and aphorisms, some written originally in Swedish. The corpus is available in Kielipankki - the Language Bank of Finland. Change log: 26.11.2018 link removed
  • Metadata: 2/5

    MULCOLD, Multilingual Parallel Corpus of Legal Texts

    The corpus is available in Kielipankki - the Language Bank of Finland. The sub-corpora containing the Russian, German and Russian texts respectively are available separately. The corpus contains international conventions and treaties arranged as a parallel corpus aligned...
  • Metadata: 2/5

    Finnish Corpus (Literature) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland. Contents: HKV corpus: consists of samples of Finnish literature representing various text types. The corpus is documented in the following publication: Auli Hakulinen & Fred Karlsson &...