Research data finder

164 datasets found
  • Metadata: 4/5

    Benchmark Dataset for Mid-Price Prediction of Limit Order Book Data

    LOB-dataset synopsis: Here we provide the normalized datasets as .txt files. The datasets are divided into two main categories: datasets that include the auction period and datasets that do not. For each of these two categories we provide three normalization set-ups based on z-score, min-max, and decimal-precision normalization. Since we followed the...
  • Metadata: 2/5

    Information on Social Security Benefits in Finnish Sign Language

    Information available also in Finnish Sign Language on social security benefits provided by Kela (the Finnish Social Insurance Institution).
  • Metadata: 2/5

    The Tampere Bilingual Corpus of Finnish and English

    The Tampere Bilingual Corpus of Finnish and English consists of: a) a fiction sub-corpus, meaning long extracts (15,000 words/50 pages) from 16 English novels (plus their Finnish translations) and similar extracts from 16 Finnish novels (plus their English translations); b) a non-fiction sub-corpus, meaning long extracts (and their translations) from...
  • Metadata: 2/5

    Corpus vasorum antiquorum Finlandiae I

    The corpus contains slides and researchers' notes in non-electronic format on ancient Graeco-Roman vases.
  • Metadata: 2/5

    Chuvash Corpus (UHLCS), Helsinki Korp Version

    The resource, a variant of the Chuvash Corpus (UHLCS) (see http://urn.fi/urn:nbn:fi:lb-2014032625), will be made available at korp.csc.fi.
  • Metadata: 2/5

    The University of Helsinki's English E-thesis, Korp Version

    The corpus is available in Kielipankki - the Language Bank of Finland in Korp, http://urn.fi/urn:nbn:fi:lb-2016102101. The corpus contains the University of Helsinki's English master's theses as well as the doctoral theses and their summaries published at https://ethesis.helsinki.fi by September 2016.
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki

    The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License: The corpus is available for searching by logged-in staff and students of the FIN-CLARIN member organizations via the...
  • Metadata: 2/5

    The von Wright and Wittgenstein Archives (WWA)

    The archives consist of two parts: the Wittgenstein Archives maintained by Georg Henrik von Wright since the 1960s and von Wright's own literary estate, including a vast amount of letters mainly relating to his work as one of Ludwig Wittgenstein's three literary executors 1951-2003. The main part was donated by G.H. von Wright to the University of...
  • Metadata: 2/5

    JRC-Acquis Multilingual Parallel Corpus

    The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 official languages. The Acquis...
  • Metadata: 2/5

    The Hanken Corpus of Academic Writing

    Contains the written production of students in economic sciences (most of them Master's students) enrolled at the Hanken School of Economics in Helsinki, Finland. The corpus is designed to be a dynamic 'monitor corpus', in the sense that new texts will periodically be added to it. The corpus will be published on GitHub with a Creative Commons license.
  • Metadata: 2/5

    ERME Erzya and Moksha Extended Corpora

    ERME contains predominantly Erzya and Moksha literature. It consists of several media publications from the 19th to the 20th century. ERME was mapped in Saransk in 1997-2004, while in Helsinki it has been mapped since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity...
  • Metadata: 2/5

    The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (http://urn.fi/urn:nbn:fi:lb-2015062301). The Helsinki Korp JRC-Acquis Bilingual Parallel Corpora are: The Helsinki Korp JRC-Acquis Finnish-English Corpus; The Helsinki Korp JRC-Acquis Finnish-Swedish Corpus; The Helsinki Korp JRC-Acquis Finnish-German Corpus; The Helsinki Korp JRC-Acquis...
  • Metadata: 2/5

    The National Certificates Corpus

    The corpus comprises NC test results, background information, and speaking and writing performances in 9 foreign / second languages. It is a web-based database (html files). The corpus contains background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances.
  • Metadata: 2/5

    Swedish Telegraphese Corpus

    Computer corpus of Swedish telegraphese language (with English interlinears and translation), compiled by Elisabeth Ahlsén (Linguistics, U. Göteborg), and analyzed (tagged & translated) and finalized by Jussi Niemi. The Swedish Telegraphese Corpus is the product of a cross-linguistic study of telegraphic language produced by normal adult subjects...
  • Metadata: 2/5

    The Helsinki Korp Europarl Bilingual Corpora

    The corpora are available in Kielipankki - the Language Bank of Finland (https://korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2015043012. The Helsinki Korp Europarl Bilingual Corpora are: The Helsinki Korp Europarl Finnish-English Corpus; The Helsinki Korp Europarl Finnish-Swedish Corpus; The Helsinki Korp Europarl Finnish-German Corpus; The Helsinki Korp...
  • Metadata: 3/5

    Helsinki Archive of Regional English Speech – Cambridgeshire Sampler

    HARES is a collection of audio-recorded interviews that were gathered in England in the 1970s and 1980s. The fieldworkers were Finnish graduate and post-graduate students from the University of Helsinki, who shared a common interest in the study of dialect syntax. The informants were elderly persons who had lived in the region all their lives and who had...
  • Metadata: 2/5

    Finnish Telegraphese Corpus

    Computer corpus of Finnish telegraphese language (with English interlinears and translation). The Finnish Telegraphese Corpus is a product of a cross-linguistic study of telegraphic language produced by normal adult subjects (university students) to describe a set of states of affairs. The responses were gathered in written questionnaire format, and the...
  • Metadata: 2/5

    The Susanne Corpus (UHLCS)

    The resource is a sub-corpus of the English Corpus (UHLCS). License: http://urn.fi/urn:nbn:fi:lb-2016051201 (in Finnish: http://urn.fi/urn:nbn:fi:lb-2016051202). For more information see http://urn.fi/urn:nbn:fi:lb-2014032610
  • Metadata: 2/5

    The Helsinki Korp Version of the ELFA Corpus

    The corpus, which is the Korp version of the ELFA Corpus (http://urn.fi/urn:nbn:fi:lb-201403262), is available in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-2016061301. For more information see http://urn.fi/urn:nbn:fi:lb-201403262
  • Metadata: 2/5

    Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). The Khanty computer corpus contains the following sub-corpora: Khanty, Atlym dialect, 519 words, 3967 characters; Khanty, Kazym dialect, 62766 words, 585659 characters; Khanty, Konda dialect, 1115 words,...