Skip to content
Research data finder
FI|EN

Search for a Dataset

109 datasets found
  • Metadata: 2/5

    Nenets Corpus (Tundra Nenets) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/samoyedic-lgs/nenets Contents: Fragments of the Gospel of Luke in the Nenets Language. Translation: Barmich, Mariya...
  • Metadata: 2/5

    Komi Zyrian Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/permic-lgs/komi Contents: 1. Jesus Friend of Children. ISBN 91-88394-64-6, ISBN 952-9790-13-9. Institute...
  • Metadata: 2/5

    Uzbek-English Dictionary (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/turkic-lgs/south-east-turkic-lgs/uzbek The Uzbek-English dictionary was compiled by Daniel Kimmage. Size of the dictionary: approx....
  • Metadata: 2/5

    Lists of Words Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/words (only Finnish available) The lists of words located at the University of Helsinki Language Corpus Server were generated from the corpora of the following languages:...
  • Metadata: 2/5

    North Saami Corpus (Sámikultuvradoaibmagotti smiehttamush) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/saami-lgs/north-saami/report The corpus contains a fragment of the Report of the Saami Cultural Committee...
  • Metadata: 2/5

    North Saami Corpus (Literature) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/saami-lgs/north-saami The North Saami Corpus contains Kerttu Vuolab's novel Cheppari cháráhus written in...
  • Metadata: 2/5

    Ume Saami Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/saami-lgs/ume-saami The corpus contains a morphologically analyzed document of the Ume Sami language. The...
  • Metadata: 2/5

    Lude (Ludian) Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/lude The corpus contains samples of folklore of the Lude (Ludian) dialect of Karelian....
  • Metadata: 2/5

    Finnish News Agency Archive 1992-2018, source

    The Finnish News Agency Archive corpus comprises newswire articles in Finnish sent to media outlets by the Finnish News Agency (STT) between 1992-2018. The corpus includes about 2,8 million items in total. Most of the material is news articles that vary from short “news flashes” to telegrams and longer articles. News articles are categorized by department...
  • Metadata: 2/5

    Erzya and Moksha Mordvin Word List Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/mordvin-lgs Contents: The Erzya corpus contains a historical word list of Erzya Mordvin documented in...
  • Metadata: 2/5

    Corpus of Erzya and Moksha Mordvin Literature and Journals and Komi Zyrian Li...

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: https://www.kielipankki.fi/access). Locations: - /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/mordvin-lgs -...
  • Metadata: 2/5

    Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty The Khanty computer corpus contains the following sub-corpora: Khanty, Atlym dialect, 519...
  • Metadata: 2/5

    English Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/indo-european-lgs/germanic-lgs/english The English Corpus is a part of the UHLCS corpus collection. Contents: The English Gutenberg Corpora...
  • Metadata: 2/5

    Finnish Corpus (Literature) (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/finnish Contents: HKV corpus: consists of samples of the Finnish literature representing various...
  • Metadata: 2/5

    Chuvash Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: https://www.kielipankki.fi/access/). The corpus contains the following documents: Gebräuche und Volksdichtung der Tschuwassen. Gesammelt von Heikki Paasonen, herausgeben von Eino Karahka und Matti Räsänen. Mémoires de la Société...
  • Metadata: 2/5

    Estonian Corpus 1 (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/estonian/viro1 The corpus contains excerpts from articles published in Estonian newspapers,...
  • Metadata: 2/5

    Ingrian Corpus (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/ingrian The Ingrian text corpus on Heva dialect consists of samples collected by Arvo...
  • Metadata: 2/5

    Estonian Corpus 2 (UHLCS)

    The corpus is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: http://www.kielipankki.fi/access). Location: /appl/kielipankki/mrc-uhlcs/general-linguistics/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/estonian/viro2 The corpus contains 20 short stories (or, in some cases, novel excerpts) published...
  • Metadata: 2/5

    The Finnish Parole Corpus

    The resource is a sub-corpus of the Finnish Corpus (Literature) (UHLCS) (http://urn.fi/urn:nbn:fi:lb-2014032622) and is available in Kielipankki - the Language Bank of Finland (taito-shell.csc.fi, access rights instructions: https://www.kielipankki.fi/support/access/). License: http://urn.fi/urn:nbn:fi:lb-2016051101 (in Finnish:...
  • Metadata: 2/5

    Corpus of Contemporary American English - Kielipankki download version 2017H1

    The corpus is available in Kielipankki - the Language Bank of Finland for download. The Corpus of Contemporary American English (COCA) contains about 440 million words and 190 000 texts from the years 1990-2012. The corpus is evenly divided into spoken, fiction, magazine, newspaper, academic genres (~88 million words each). License details: Researchers in...