This web holds topics deemed old or irrelevant for KitWiki. If you think this topic does not belong here, please check that it is named properly (as a WikiWord) and descriptively and that it contains relevant data, then move it back to a relevant web.


Multilingual resource collection of UHLCS


Mrc-uhlcs is a multilingual data bank located at CSC and maintained by the University of Helsinki Department of General Linguistics. It was founded late in 1980 as the University of Helsinki Language Corpus Server (UHLCS).

Mrc-uhlcs is mainly used for research on minority languages and on corpus practices. It contains most of the texts in UHLCS, although some of these are maintained in a more up-to-date form elsewhere, for example in the Finnish text collection.

At present, mrc-uhlcs contains computer corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types. The use of the corpora and the software is restricted to research and teaching. There are also tools that can be used in analyzing the corpora.

Home Page:

Version and Size

Version: 2007-09-01

Size: 2500MB of text and other materials, with approximately 230 million words.

Content and Structure

Corpora of avar, azerbaijan, balkar, bashkir, chukchi, chuvash, crimean-turkish, enets, english, estonian, even, evenki, finnish, finnish, gagauz, greek, hebrew, ingrian, kalmyk, kamas, karelian, khakas, khanty, kirghiz, komi, koryak, kurdish, lak, latin, livonian, mansi, mari, mordvin, nanai, nenets, ossete, quechua-cuzco, russian, saami, selkup, somali, swahili, swedish, swedish, swedish, tabassaran, tajik, tatar, turkmen, tuvin, udmurt, uighur, ukrainian, uzbek, vepsian, yakut, and yiddish.

Directory in the Corpus Server


Directory Listing

general-linguistics
general-linguistics/cushitic-lgs
general-linguistics/semitic-lgs
general-linguistics/uralic-lgs
general-linguistics/indo-european-lgs
general-linguistics/multilingual-data
general-linguistics-kotus/
general-linguistics-kotus/indo-european-lgs
general-linguistics-kotus/uralic-lgs
language-departments/
language-departments/niger-congo-lgs
language-departments/germanic-lgs
language-departments/slavonic-and-baltic-lgs
linguistics-permission/
linguistics-permission/uralic-lgs
linguistics-permission/mongolic-lgs
linguistics-permission/tungusic-lgs
linguistics-permission/turkic-lgs
linguistics-permission/chukotko-kamchatkan-lgs
linguistics-permission/caucasian-lgs
linguistics-permission/indo-european-lgs
linguistics-permission/quechuan-lgs


Access Rights and Conditions

Conditions of Use.

The Group of Unix Users Having Access to the Resource: ld1-a1, ld1-c3, ld2-a2, ld3-c3, li-a2, li-adm, li-b2, li-c2, li-c3, lik-a1, lipr-a2, lipr-adm, lipr-c2, lipr-c3


Making Bibliographical Reference to the Material:

Referring to mrc-uhlcs.

Other References

Release Notes and Details

Sending Bug Reports

To be copied to:
To be seen at:
*See also other resources: in KitWiki, in
All users may add their comments to Resource__Comments

Topic revision: r2 - 2008-11-07 - HennaRiikkaLaitinen