Research data finder
165 datasets found
  • Metadata: 3/5

    Catalogue of Stone Age hunter-gatherer earth grave finds from Mainland Finlan...

    The catalogue consists of all find material currently (2017) known from Stone Age hunter-gatherer earth graves on mainland Finland. It comprises ca. 3900 finds deriving from 137 graves located at 45 sites. The sites include both cemeteries and single or smaller groups of earth graves discovered in contemporary or multi-periodic settlement sites. The data...
  • Metadata: 4/5

    Benchmark Dataset for Mid-Price Prediction of Limit Order Book Data

    LOB-dataset synopsis: Here we provide the normalized datasets as .txt files. The datasets are divided into two main categories: datasets that include the auction period and datasets that do not. For each of these two categories we provide three normalization set-ups based on z-score, min-max, and decimal-precision normalization. Since we followed the...
  • Metadata: 2/5

    CEFLING Project Corpus

    Finnish as a second language and English as a foreign language writing performances collected from comprehensive school students (grades 7 - 9) in the project CEFLING - Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. Data from several hundred learners; 4-5 writing tasks from each learner; background information,...
  • Metadata: 4/5

    Highly oxygenated molecules (HOMs), sulfuric acid in ambient ions and as neut...

    Sulfuric acid and Highly Oxygenated Molecules were measured by CI-APi-TOF and corresponding ion clusters were measured by APi-TOF at SMEAR II station in Hyytiälä during April-June 2013. The data is combined into one .zip file and contains time series of different groups of compounds as well as their averaged diurnal cycle.
  • Metadata: 2/5

    Information on Social Security Benefits in Finnish Sign Language

    Information available also in Finnish Sign Language on social security benefits provided by Kela (the Finnish Social Insurance Institution).
  • Metadata: 2/5

    The Tampere Bilingual Corpus of Finnish and English

    The Tampere Bilingual Corpus of Finnish and English consists of: a) a fiction sub-corpus, meaning long extracts (15,000 words/50 pages) from 16 English novels (plus their Finnish translations) and similar extracts from 16 Finnish novels (plus their English translations); b) a non-fiction sub-corpus, meaning long extracts (and their translations) from...
  • Metadata: 2/5

    Corpus vasorum antiquorum Finlandiae I

    The corpus contains slides and researchers' notes in non-electronic format on ancient Graeco-Roman vases.
  • Metadata: 2/5

    Chuvash Corpus (UHLCS), Helsinki Korp Version

    The resource, a variant of the Chuvash Corpus (UHLCS) (see http://urn.fi/urn:nbn:fi:lb-2014032625), will be made available at korp.csc.fi.
  • Metadata: 2/5

    The University of Helsinki's English E-thesis, Korp Version

    The corpus is available in Kielipankki - the Language Bank of Finland in Korp, http://urn.fi/urn:nbn:fi:lb-2016102101. The corpus contains the University of Helsinki's English master's theses as well as the doctoral theses and their summaries published at https://ethesis.helsinki.fi by September 2016.
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki

    The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License: The corpus is available for searching by logged-in staff and students of the FIN-CLARIN member organizations via the...
  • Metadata: 2/5

    The von Wright and Wittgenstein Archives (WWA)

    The archives consist of two parts: the Wittgenstein Archives maintained by Georg Henrik von Wright since the 1960s and von Wright's own literary estate, including a vast amount of letters mainly relating to his work as one of Ludwig Wittgenstein's three literary executors 1951-2003. The main part was donated by G.H. von Wright to the University of...
  • Metadata: 2/5

    JRC-Acquis Multilingual Parallel Corpus

    The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 official languages. The Acquis...
  • Metadata: 2/5

    The Hanken Corpus of Academic Writing

    Contains the written production of students in economic sciences (most of them Masters students) enrolled at the Hanken School of Economics, in Helsinki, Finland. The corpus is designed to be a dynamic 'monitor corpus', in the sense that new texts will periodically be added to it. The corpus will be published in GitHub with a Creative Commons license.
  • Metadata: 2/5

    ERME Erzya and Moksha Extended Corpora

    ERME contains predominantly Erzya and Moksha literature. It consists of several media publications from the 19th to the 20th century. ERME was mapped in Saransk in 1997-2004, while in Helsinki it has been mapped since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity...
  • Metadata: 2/5

    The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (http://urn.fi/urn:nbn:fi:lb-2015062301). The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora are: The Helsinki Korp JRC-Acquis Finnish-English Corpus, The Helsinki Korp JRC-Acquis Finnish-Swedish Corpus, The Helsinki Korp JRC-Acquis Finnish-German Corpus, The Helsinki Korp JRC-Acquis...
  • Metadata: 2/5

    The National Certificates Corpus

    NC test results, background information, and speaking and writing performances in 9 foreign / second languages. A web-based database (HTML files). The corpus contains background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances.
  • Metadata: 2/5

    Swedish Telegraphese Corpus

    Computer corpus of Swedish telegraphese language (with English interlinears and translation), compiled by Elisabeth Ahlsén (Linguistics, U. Göteborg), and analyzed (tagged & translated) and finalized by Jussi Niemi. The Swedish Telegraphese Corpus is the product of a cross-linguistic study of telegraphic language produced by normal adult subjects...
  • Metadata: 2/5

    The Helsinki Korp Europarl Bilingual Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (https://korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2015043012. The Helsinki Korp Europarl Bilingual Corpora are: The Helsinki Korp Europarl Finnish-English Corpus, The Helsinki Korp Europarl Finnish-Swedish Corpus, The Helsinki Korp Europarl Finnish-German Corpus, The Helsinki Korp...
  • Metadata: 3/5

    Helsinki Archive of Regional English Speech – Cambridgeshire Sampler

    HARES is a collection of audio-recorded interviews that were gathered in England in the 1970s and 1980s. The fieldworkers were Finnish graduate and post-graduate students from the University of Helsinki, who shared a common interest in the study of dialect syntax. The informants were elderly persons who had lived in the region all their lives and who had...
  • Metadata: 2/5

    Finnish Telegraphese Corpus

    Computer corpus of Finnish telegraphese language (with English interlinears and translation). The Finnish Telegraphese Corpus is a product of a cross-linguistic study of telegraphic language produced by normal adult subjects (university students) to describe a set of states of affairs. The responses were gathered in written questionnaire format, and the...