Research data finder

IMPORTANT INFORMATION ABOUT ETSIN! The old Etsin (etsin.avointiede.fi) will be migrated to the new Etsin (etsin.fairdata.fi) at the end of June 2019. After the migration, all PUBLISHED datasets will be visible in the new Etsin.
Describing datasets in Etsin will not be possible after 12th June 2019. Instead, datasets will be described in the new metadata tool, Qvain, which will be launched at the beginning of July 2019.
Note! Remember to publish your dataset if you want it to be migrated to the new Etsin.

Search for a Dataset

5 datasets found
  • Metadata: 2/5

    ERME Erzya and Moksha Extended Corpora

    ERME contains predominantly Erzya and Moksha literature. It consists of several media publications from the 19th to the 20th century. ERME was mapped in Saransk in 1997-2004, while in Helsinki it has been mapped since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity...
  • Metadata: 2/5

    Helsinki Corpus TEI-XML Edition (2011), Korp

    The Helsinki Corpus TEI-XML Edition (2011) is a structured multi-genre diachronic corpus, which includes periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is useful particularly in the study of the change of...
  • Metadata: 2/5

    Helsinki Corpus of English Texts (1991)

    The Helsinki Corpus of English Texts is a structured multi-genre diachronic corpus, which includes periodically organized text samples from Old, Middle and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is useful particularly in the study of the change of linguistic...
  • Metadata: 2/5

    Nganasan Speech Corpus

    The corpus contains video and audio recordings from 1986-2013 of fairy tales, songs, biographies, recollections and stories, as well as discussions on everyday issues in Nganasan and their linguistic transcripts. The corpus also contains photographs. The Nganasan Speech Corpus will be made available in LAT (https://lat.csc.fi). For detailed information on...
  • Metadata: 2/5

    Helsinki Corpus of Scottish Correspondence (1540-1750)

    The Helsinki Corpus of Scottish Correspondence comprises circa 0.4 million words (0.5 million tokens) of early Scottish correspondence by male and female writers dating from the period 1540-1750. Unlike the majority of digital resources available for historical linguistics at present, the corpus consists of transcripts of original letter manuscripts,...