Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! Old Etsin (etsin.avointiede.fi) will be migrated to the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12 June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated to the new Etsin.

Search for a Dataset

201 datasets found
  • Metadata: 2/5

    Helsinki Corpus TEI-XML Edition (2011), Korp

    The Helsinki Corpus TEI-XML Edition (2011) is a structured multi-genre diachronic corpus, which includes periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is useful particularly in the study of the change of...
  • Metadata: 2/5

    Parsed Corpus of Early English Correspondence

    The Parsed Corpus of Early English Correspondence contains 4970 personal letters by 666 writers, altogether 2.2 million words of running text from the years 1410-1681. The letters have been selected to be as socially representative of the literate social ranks of the time as possible. In addition to the flat text version, the corpus has also been provided...
  • Metadata: 2/5

    Opusparcus: Open Subtitles Paraphrase Corpus for Six Languages (version 1.0)

    Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows. The data in Opusparcus has been extracted from OpenSubtitles2016 (http://opus.nlpl.eu/OpenSubtitles2016.php), which is in...
  • Metadata: 2/5

    A Multimodal Corpus of Tourist Brochures Produced by the City of Helsinki, Fi...

    The corpus is available in Kielipankki - the Language Bank of Finland (ling.helsinki.fi), download location: http://urn.fi/urn:nbn:fi:lb-2015030301 This multimodal corpus, which consists of the tourist brochures produced by the city of Helsinki, Finland, is fully annotated using XML schema provided for the Genre and Multimodality (GeM) model. The GeM...
  • Metadata: 2/5

    ERME Erzya and Moksha Extended Corpora

    ERME contains predominantly Erzya and Moksha literature. It consists of several media publications from the 19th to the 20th century. ERME was mapped in Saransk in 1997-2004, while in Helsinki it has been mapped since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity...
  • Metadata: 2/5

    Samples of Northern Saami

    The corpus contains audio samples of spoken Northern Saami dialects (Sea Saami, Finnmark Saami and Torne Saami). It is available in LAT (https://lat.csc.fi/). Each audio file contains one interview. The material has been morphologically glossed and the transcripts have been translated into Finnish and English. log 26.11.2018 link...
  • Metadata: 2/5

    Helsinki Corpus of English Texts (1991)

    The Helsinki Corpus of English Texts is a structured multi-genre diachronic corpus, which includes periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is useful particularly in the study of the change of linguistic...
  • Metadata: 4/5

    Product, Manufacturing Resource and Capability Ontologies

    OWL-based information models (ontologies) for representing process taxonomy, product model, manufacturing resources and their capabilities.
  • Metadata: 2/5

    Uzbek-English Dictionary (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/turkic-lgs/south-east-turkic-lgs/uzbek The Uzbek-English dictionary was compiled by Daniel Kimmage. Size of the dictionary: approx....
  • Metadata: 2/5

    Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty The Khanty computer corpus contains the following sub-corpora: Khanty, Atlym dialect, 519...
  • Metadata: 2/5

    English Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/indo-european-lgs/germanic-lgs/english The English Corpus is a part of the UHLCS corpus collection. Contents: The English Gutenberg Corpora...
  • Metadata: 2/5

    Chuvash Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: https://www.kielipankki.fi/access/). The corpus contains the following documents: Gebräuche und Volksdichtung der Tschuwassen. Gesammelt von Heikki Paasonen, herausgegeben von Eino Karahka und Matti Räsänen. Mémoires de la Société...
  • Metadata: 4/5

    Wind data from South-Karelia

    Wind data was measured in South Karelia in two locations, Joutseno and Puumala. The measurements were started during the project Development of wind power knowledge and utilization of wind power potential in South Karelia (Tuulivoimaosaamisen kehittäminen ja tuulivoimapotentiaalin hyödyntäminen Etelä-Karjalassa) by LUT University. The measurements were...
  • Metadata: 4/5

    Aššur and His Friends: A Statistical Analysis of Neo-Assyrian Texts

    This is the data used for and generated during our research for the article "Aššur and His Friends: A Statistical Analysis of Neo-Assyrian Texts", published in Journal of Cuneiform Studies 71 (2019). Our data comes from the Open Richly Annotated Cuneiform Corpus (http://oracc.museum.upenn.edu/). Our research and the creation of this dataset were...
  • Metadata: 2/5

    Corpus of Contemporary American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into five genres: spoken, fiction, magazine, newspaper, and academic (~88 million words each). License details: Researchers in...
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License details: Researchers in the...
  • Metadata: 2/5

    Corpus of Global Web-Based English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words and 1 800 000 texts from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60 % of the texts come from blogs. License details: Researchers in the...
  • Metadata: 3/5

    Data for Äijälä et al., ACP 2019: Constructing a data-driven receptor model f...

    Contains aerosol chemical composition results from the r-CMB receptor model, for SMEAR II station 2008-2011.
  • Metadata: 3/5

    University of Oulu Kikosa Collection

    The Kikosa Collection consists of video recorded everyday interactions among multicultural families and groups of friends. The collection is housed at the University of Oulu Department of Languages and Literature and it can be used for studies of language and interaction.
  • Metadata: 2/5

    The "Hallituskausi 2011–2015" Translation Memory

    The "Hallituskausi 2011–2015" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites during the ongoing electoral period. The memory features some 11,000 Finnish-to-English translation segments. The translation memory runs...