
FIN-CLARIN Site Description: University of Oulu

This page lists the departments and other parties involved in collecting, producing or using language resources. For each, the main resources and activities are listed with

  • an informative name,
  • a short description of what the material is,
  • the size of the resource (in recorded hours, word tokens, source program lines or other measures),
  • how much labour (in person months) has been invested at the site in collecting, producing or improving the resource,
  • a contact person and contact information,
  • a link to further information about the resource (please enter more detailed entries in AddMaterialEn).

Finnish language (Faculty of Humanities)

1. The Audio Recordings Archive of Oulu

  • The Audio Recordings Archive of Oulu stores analogue and digital recordings. The recordings are samples of Finnish dialects, cultural history, modern colloquial language, child language, Finnic minority languages and Saami languages.
  • The archive holds a total of 7000 hours of recordings, of which approximately 5000 hours are unique material. The oldest recordings are from the early 1960s. The copies are from the Research Centre for Languages in Finland. All of the analogue recordings have been digitised.
  • The Audio Recordings Archive of Oulu (Oulun nauhoitearkisto = ONA) was founded in 1967. The recordings were made partly by students and partly by staff of Finnish Language.

  • Harri Mantila, harri.mantila(at)oulu.fi
  • (please enter more detailed entries in MaterialEn).

Finnish language (Faculty of Humanities)

1. ICLFI - International Corpus of Learner Finnish

  • The International Corpus of Learner Finnish (ICLFI) has been compiled at the University of Oulu since 2007. The data is collected with the help of foreign universities in which Finnish is studied as a foreign language. The corpus consists of texts that language learners have spontaneously produced in language learning situations. The corpus helps to clarify the specific characteristics of learner language and also supports the production of learning materials, such as dictionaries and textbooks.
  • The size of the data is approximately 1 million words (as of 11/2011).

  • Jarmo Jantunen, jarmo.jantunen(at)oulu.fi
  • (please enter more detailed entries in MaterialEn).


For the purpose of transferring resource data from the KitWiki to the CLARIN ad hoc inventory, the data is gathered here in TWiki forms. Create a new topic with the following tools, and fill in the data. You may add extra information on the page, but only the data in the form will be transferred to the ad hoc inventory.

Note: Some of the forms contain date or year fields. Enter dates in the correct format (2008-12-01 or 1 Dec 2008) or use the calendar next to the field. Give years with four digits, e.g. 2008.
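The two accepted date formats and the four-digit year rule can be checked before a value is typed into a form. The following is only a minimal sketch in Python: the function names `is_valid_date` and `is_valid_year` are hypothetical and not part of TWiki or the CLARIN inventory, and the sketch assumes English month abbreviations (the default C locale).

```python
from datetime import datetime
import re

# Assumption: these two patterns mirror the examples given on this page,
# 2008-12-01 and 1 Dec 2008. The forms themselves may accept other variants.
ACCEPTED_DATE_FORMATS = ("%Y-%m-%d", "%d %b %Y")


def is_valid_date(text):
    """Return True if text parses in one of the two accepted date formats."""
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            datetime.strptime(text.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def is_valid_year(text):
    """Return True if text is a year written with exactly four digits."""
    return re.fullmatch(r"\d{4}", text.strip()) is not None


if __name__ == "__main__":
    for value in ("2008-12-01", "1 Dec 2008", "01.12.2008"):
        print(value, is_valid_date(value))
    for value in ("2008", "08"):
        print(value, is_valid_year(value))
```

A value such as 01.12.2008 is rejected by this check, as is a two-digit year such as 08.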

Add a Multimodal Corpus, Spoken Corpus, Written Corpus, Aligned Corpus, Treebank, or N-gram Model:

Add a Terminological Resource, Lexicon / Knowledge Source:

Add a Web Service:

Add a Grammar or any other resource:

The following resources have been added on this site: