Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated into the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated into the new Etsin.

Search for a Dataset

199 datasets found
  • Metadata: 2/5

    Samples of Northern Saami

    The corpus contains audio samples of spoken Northern Saami dialects (Sea Saami, Finnmark Saami and Torne Saami). It is available in LAT (https://lat.csc.fi/). Each audio file contains one interview. The material has been morphologically glossed and the transcripts have been translated into Finnish and English. log 26.11.2018 link...
  • Metadata: 2/5

    Helsinki Corpus TEI-XML Edition (2011), Korp

    Information on the corpus: http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/HC_XML.html. The corpus will be made available at https://korp.csc.fi/. For detailed information on the license of the resource, see http://urn.fi/urn:nbn:fi:lb-2019061301
  • Metadata: 2/5

    Helsinki Corpus of English Texts (1991)

    The Helsinki Corpus of English Texts is a structured multi-genre diachronic corpus, which includes periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is useful particularly in the study of the change of linguistic...
  • Metadata: 4/5

    Product, Manufacturing Resource and Capability Ontologies

    OWL-based information models (ontologies) for representing process taxonomy, product model, manufacturing resources and their capabilities.
  • Metadata: 2/5

    Uzbek-English Dictionary (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/turkic-lgs/south-east-turkic-lgs/uzbek The Uzbek-English dictionary was compiled by Daniel Kimmage. Size of the dictionary: approx....
  • Metadata: 2/5

    Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty The Khanty computer corpus contains the following sub-corpora: Khanty, Atlym dialect, 519...
  • Metadata: 2/5

    English Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/indo-european-lgs/germanic-lgs/english The English Corpus is a part of the UHLCS corpus collection. Contents: The English Gutenberg Corpora...
  • Metadata: 2/5

    Chuvash Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: https://www.kielipankki.fi/access/). The corpus contains the following documents: Gebräuche und Volksdichtung der Tschuwassen. Gesammelt von Heikki Paasonen, herausgeben von Eino Karahka und Matti Räsänen. Mémoires de la Société...
  • Metadata: 4/5

    Wind data from South-Karelia

    Wind data was measured in South Karelia in two locations, Joutseno and Puumala. The measurements were started during the project Development of wind power knowledge and utilization of wind power potential in South Karelia (Tuulivoimaosaamisen kehittäminen ja tuulivoimapotentiaalin hyödyntäminen Etelä-Karjalassa) by LUT University. The measurements were...
  • Metadata: 4/5

    Aššur and His Friends: A Statistical Analysis of Neo-Assyrian Texts

    This is the data used for and generated during our research for the article "Aššur and His Friends: A Statistical Analysis of Neo-Assyrian Texts", published in Journal of Cuneiform Studies 71 (2019). Our data comes from the Open Richly Annotated Cuneiform Corpus (http://oracc.museum.upenn.edu/). Our research and the creation of this dataset were...
  • Metadata: 2/5

    Corpus of Contemporary American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into spoken, fiction, magazine, newspaper, and academic genres (~88 million words each). License details: Researchers in...
  • Metadata: 2/5

    Corpus of Historical American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. License details: Researchers in the...
  • Metadata: 2/5

    Corpus of Global Web-Based English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words and 1 800 000 texts from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60 % of the texts come from blogs. License details: Researchers in the...
  • Metadata: 3/5

    Data for Äijälä et al., ACP 2019: Constructing a data-driven receptor model f...

    Contains aerosol chemical composition results from the r-CMB receptor model, for SMEAR II station 2008-2011.
  • Metadata: 3/5

    University of Oulu Kikosa Collection

    The Kikosa Collection consists of video recorded everyday interactions among multicultural families and groups of friends. The collection is housed at the University of Oulu Department of Languages and Literature and it can be used for studies of language and interaction.
  • Metadata: 2/5

    The "Hallituskausi 2011–2015" Translation Memory

    The "Hallituskausi 2011–2015" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites during the ongoing electoral period. The memory features some 11,000 Finnish-to-English translation segments. The translation memory runs...
  • Metadata: 2/5

    The "Hallituskausi 2007–2011" Translation Memory

    The "Hallituskausi 2007–2011" translation memory is intended for those translating administrative texts between Finnish and English. It includes key policy reports published by the Finnish ministries on their websites. The memory features some 58,000 Finnish-to-English translation segments. The tmx format requires an SDL Trados Studio programme, whereas...
  • Metadata: 2/5

    Corpus of Age-related Voice Disguise

    This corpus includes normal and age-related disguised speech uttered by 60 native Finnish speakers (31 females and 29 males). The speakers were asked to read the same text fragments several times, in their modal voice and in two disguised voices, first pretending to be an elderly speaker and then pretending to be a child. The texts consisted of the...
  • Metadata: 3/5

    PGC 30591 MUSE reduced data-cube and stellar and gas kinematics

    This dataset is a tar file that contains the reduced MUSE datacube of PGC 30591. The reduction was made using the MUSE pipeline under the Reflex environment. The individual exposures were manually aligned and the sky residuals were treated with ZAP. The data set also includes the data required to reproduce the stellar and gas kinematic maps as obtained...
  • Metadata: 3/5

    PGC 28308 MUSE reduced data-cube and stellar and gas kinematics

    This dataset is a tar file that contains the reduced MUSE datacube of PGC 28308. The reduction was made using the MUSE pipeline under the Reflex environment. The individual exposures were manually aligned and the sky residuals were treated with ZAP. The data set also includes the data required to reproduce the stellar and gas kinematic maps as obtained...