Contact information

FIN-CLARIN
Department of Modern Languages
P.O. Box 24 (Unioninkatu 40)
00014 University of Helsinki
tel. 02941 40599 / 02941 29317
fin-clarin ( ATT ) helsinki.fi


The Language Bank (Kielipankki) currently offers
137 datasets
5 tools

90 language resources are forthcoming

Tell us about your own language data!




FIN-CLARIN    CSC - IT Center for Science
International CLARIN project

META-NORD network

FIN-CLARIN Site Description: The Research Institute for the Languages of Finland

  • Contact persons: Pirkko Nuolijärvi, Elisa Stenvall and Toni Suutari, firstname.lastname[a]kotus.fi, http://www.kotus.fi

The main resources and activities are listed with

  • an informative name,
  • a short description of what the material is,
  • the size of the resource (in recorded hours, word tokens, source program lines or other measures),
  • contact person, contact information,
  • a link to further information about the resource.

Text corpora

Contact persons: Mikko Lounela and Toni Suutari
  1. Corpus of Old Literary Finnish
  2. Corpus of Early Literary Finnish
  3. Corpus of Finnish Literary Classics
  4. Corpus of Magazines and Periodicals
    • period: 20th century
    • about 8.6 million word tokens in total
    • Mikko Lounela, firstname.lastname[a]kotus.fi
    • requires user authorisation
    • http://www.kotus.fi/index.phtml?s=222 (in Finnish)
  5. Corpus of the Finnish Language = Finnish Text Collection (CSC, Language Bank)
    • This corpus contains written Finnish from the 1990s. The collection has been gathered by the Research Institute for the Languages of Finland, the Department of General Linguistics of the University of Helsinki and the Foreign Languages Department of the University of Joensuu. Web user interfaces are available at the Scientist's Interface (CSC). The corpus can also be accessed via a Unix server (corpus.csc.fi).
    • about 180 million word tokens in total
    • Mikko Lounela, firstname.lastname[a]kotus.fi
    • requires user authorisation, access via Language Bank (CSC)
    • http://www.csc.fi/english/research/software/ftc
    • https://hotpage.csc.fi/ (Scientist's Interface)
  6. Finland Swedish Text Corpus = Finnish-Swedish Textcollection (CSC, Language Bank)
  7. Swedish-Finnish Parallel Text Corpus (CSC, Language Bank)
  8. Syntax Archive Data (= Lauseopin arkisto)
    • The data is owned by the Research Institute for the Languages of Finland and the Department of Finnish and General Linguistics at the University of Turku. The Syntax Archive Data contains dialects from 132 Finnish parishes (one hour from each parish) and literary Finnish (40 units).
    • about 1 million word tokens in total
    • Mikko Lounela and Toni Suutari, firstname.lastname[a]kotus.fi
    • http://www.hum.utu.fi/oppiaineet/suomi/arkistot/lauseopin_arkisto.html (in Finnish)
  9. Oulu Corpus (CSC, Language Bank)
    • The corpus is a representative sample of the Finnish language used in the media in the 1960s.
    • 5 800 short samples, 429 058 word tokens and some 29 000 sentences
    • Mikko Lounela, firstname.lastname[a]kotus.fi
    • requires user authorisation, access via Language Bank (CSC)
    • http://www.csc.fi/english/research/software/oulu
  10. Corpus of Proverbs and Other Colloquial Expressions
  11. Text from the Samples of Finnish Dialects Collection
  12. New Year Speeches of the President of the Republic of Finland

Frequency lists

Contact person: Mikko Lounela
  1. Frequency list: Old Literary Finnish
  2. Frequency list: Early Modern Finnish
  3. Parole frequency list

Digital audio and video

Contact person: Toni Suutari
  1. Audio Recordings Archive
    • The Audio Recordings Archive holds over 23,000 hours of recordings collected since 1959, providing authentic samples of Finnish dialects, languages related to Finnish, and other world languages. The collection additionally includes samples of Finnish dialects spoken in Sweden, Norway, Ingria, the United States and Australia. Digitisation of the audio bank was undertaken in 1999. Over half of its content has been digitised, totalling about 13,000 hours of recordings.
    • about 13,000 hours of digitised audio recordings and 150 hours of digitised video recordings (+ keyword lists, contents lists, transcripts)
    • Toni Suutari, firstname.lastname[a]kotus.fi
    • more information: http://www.kotus.fi/?l=en&s=193
    • collection database: http://kaino.kotus.fi/naark (in Finnish)

Electronic dictionaries

Contact person: Toni Suutari
  1. Álgu – Origins of Saami Words
    • The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.
    • about 86 000 words, 180 000 relations
    • Klaas Ruppel, firstname.lastname[a]kotus.fi
    • freely accessible on-line data service, Kaino
    • more information: http://www.kotus.fi/index.phtml?l=en&s=143
    • database: http://kaino.kotus.fi/algu/
  2. Origin of Finnish Words (= Suomen sanojen alkuperä)
    • Klaas Ruppel, firstname.lastname[a]kotus.fi
  3. Dictionary of Finnish Dialects (= Suomen murteiden sanakirja)
    • Ulla Takala and Outi Lehtinen, firstname.lastname[a]kotus.fi
  4. Dictionary of Old Literary Finnish (= Vanhan kirjasuomen sanakirja)
  5. Dictionary of Carelian (= Karjalan kielen sanakirja)
    • Marja Torikka and Jari Vihtari, firstname.lastname[a]kotus.fi
  6. Dictionary of Finno-Swedish dialects (= Ordbok över Finlands svenska folkmål)
    • Peter Slotte, firstname.lastname[a]focis.fi

Electronic lexicons

Contact person: Toni Suutari
  1. Modern Finnish Lexicon
  2. Headwords in the Dictionary of Modern Finnish (= Nykysuomen sanakirja 1–6, 1951–1961)
    • about 210 000 headwords
    • Toni Suutari, firstname.lastname[a]kotus.fi
    • requires user authorisation, access via Unix server (suomi.kotus.fi)
  3. Digital Listing of Headwords in the Dictionary of Carelian (= Karjalan kielen sanakirja 1–6, 1968–2005)
    • 94 534 headwords
    • Jari Vihtari and Marja Torikka, firstname.lastname[a]kotus.fi
    • freely accessible on-line data service, Kaino
    • http://kaino.kotus.fi/sanat/kkss/ (in Finnish)
  4. Electronic Vepsian Word List
  5. Lexicon of Finno-Swedish place name endings (= Namnledslexicon)

Other databases and lists

Contact persons: Elisa Stenvall and Toni Suutari
  1. Lexical Data from the Archive of Modern Finnish
    • Elisa Stenvall, firstname.lastname[a]kotus.fi
    • A detailed account of the intended use should be given when applying for user authorisation.
  2. Toponymic Database
  3. Johan Haberman's land survey register of the Pien-Savo district from the 1620s (= Ed. Timo Alanen 2004: Johan Habermanin maantarkastusluettelo Pien-Savosta 1620-luvulta)
  4. Land survey register of the Sääminki and Rantasalmi parishes from the years 1562–1563 (= Ed. Timo Alanen [2006]: Säämingin ja Rantasalmen maantarkastusluettelo vuosilta 1562–1563)
  5. Geographic Names Register of the National Land Survey
    • 720 000 names are Finnish, 75 000 Swedish, 4 500 North Saami, 3 800 Inari Saami and 150 Skolt Saami
    • Toni Suutari, firstname.lastname[a]kotus.fi
    • The register may be accessed for research purposes at the Research Institute.
    • http://www.maanmittauslaitos.fi/en/default.asp?id=829#a1
  6. Population Register Centre’s register of personal names
    • Toni Suutari, firstname.lastname[a]kotus.fi
    • The register is available only for scientific research purposes. A detailed account of the intended use should be given when applying for user authorisation.

Reference databases

Contact person: Toni Suutari
  1. Etymological Reference Database
  2. AV collection database

Maps and pictures

Contact person: Terhi Ainiala
  1. Atlas of Place Names

...


For the purpose of transferring resource data from the KitWiki to the CLARIN ad hoc inventory, the data is gathered here in TWiki forms. Create a new topic with the following tools, and fill in the data. You may add extra information on the page, but only the data in the form will be transferred to the ad hoc inventory.

Note: Some of the forms contain date or year fields. Enter dates in the correct format (2008-12-01 or 1 Dec 2008) or use the calendar next to the field. Give years with four digits, e.g. 2008.
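
The format rules in the note above can be checked before a form is submitted. The following is a minimal sketch in Python, not part of the TWiki forms themselves: the function names `is_valid_date` and `is_valid_year` are hypothetical, and the month abbreviation check assumes an English (or default C) locale.

```python
from datetime import datetime

# The two date formats named in the note: 2008-12-01 and 1 Dec 2008.
# Assumption: "%b" is matched against English month abbreviations,
# which holds in the default C locale.
DATE_FORMATS = ("%Y-%m-%d", "%d %b %Y")


def is_valid_date(text):
    """Return True if text parses in one of the two accepted formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def is_valid_year(text):
    """Return True if text is exactly four ASCII digits, e.g. 2008."""
    text = text.strip()
    return len(text) == 4 and text.isascii() and text.isdigit()


if __name__ == "__main__":
    for sample in ("2008-12-01", "1 Dec 2008", "12/01/2008"):
        print(sample, is_valid_date(sample))
    for sample in ("2008", "08"):
        print(sample, is_valid_year(sample))
```

Note that `strptime` tolerates a missing leading zero in the day and month, so this check is slightly more lenient than the examples in the note; the form itself may be stricter.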

Add a Multimodal Corpus, Spoken Corpus, Written Corpus, Aligned Corpus, Treebank, or N-gram Model:

Add a Terminological Resource, Lexicon / Knowledge Source:

Add a Web Service:

Add a Grammar or any other resource:

The following resources have been added on this site: