FIN-CLARIN Site Description: University of Eastern Finland

Tentative version: significant items may be missing. Please report corrections to Jussi Niemi.

Departments and other parties involved in collecting, producing or using language resources. For each, the main resources and activities are listed with

  • an informative name,
  • a short description of what the material is,
  • the size of the resource (in recorded hours, word tokens, source program lines or other measures),
  • how much labour (in person months) has been invested at the site for collecting, producing or improving the resource,
  • contact person and contact information,
  • a link to further information about the resource (please, enter more detailed entries in AddMaterialEn).

Department of Linguistics (Philosophical Faculty)

Contact persons: Jussi Niemi (jussi.niemi (ät) uef.fi) and Stefan Werner (stefan.werner (ät) uef.fi)
  1. The Karjalainen Corpus
    • computer corpus of Finnish newspaper texts of the 1990s (newspaper Karjalainen, Joensuu)
    • about 35.8 million word tokens; about 24 person months
    • available through the Language Bank of Finland at http://www.csc.fi/english (SGML transformation carried out by the Department of General Linguistics, University of Helsinki)
    • Relevant publications based on the corpus: used as the basis of frequency counts for psycholinguistic studies of Finnish morphology and lexicon by Jussi Niemi and Matti Laine (Psychology, Åbo Akademi University) and their associates
    • Contact information at Joensuu: jussi.niemi (ät) uef.fi; contact information at the Language Bank: ling (ät) csc.fi
  2. Joensuu Corpus of Finnish Compounds
    • computer corpus (full list) of compounds of CD-perussanakirja (electronic version of Suomen kielen perussanakirja, the most comprehensive dictionary of contemporary Finnish, see http://www2.lingsoft.fi/cdps/), with morphological category information
    • about 52000 word tokens in total, about 2 person months
    • Relevant publication(s) using the corpus: J. Niemi: Compounds in Finnish. Lingue e Linguaggio 8: 237-256. Part of cross-linguistic study of compounds, co-ordinated by Sergio Scalise (Linguistics, U. Bologna), see http://morbo.lingue.unibo.it/mmm/enlm.php
    • Contact person: jussi.niemi (ät) uef.fi
  3. Joensuu Language Acquisition Corpus
    • computer corpus of spoken language output of a child acquiring Finnish (age 2;4 to 6;7)
    • about 6000 word tokens in total, about 6 person months
    • Relevant publication(s) using the corpus: Niemi, Jussi & Sinikka Niemi: Acquisition of inflectional marking: A case study of Finnish. Nordic Journal of Linguistics 10: 59-89 (1987).
    • Contact person: jussi.niemi (ät) uef.fi
  4. Corpus of Spoken Southwestern Finnish
    • audio corpus of spoken Finnish across the traditional Tavastia - Southwest dialect boundary, speakers: over 300 schoolchildren in 12 communities, recorded 1978 and 2006 by Sinikka Niemi and Jussi Niemi
    • about 10 hrs. of audio recordings (one structured text transformed to PRAAT format for acoustic analysis)
    • Relevant publication(s) using the corpus: Niemi, Jussi & Sinikka Niemi: Word tone and related matters in the Finnish Southwest. In: C.-Ch. Elert, I. Johansson & E. Strangert (eds.): Nordic Prosody III, pp. 187-200. Umeå 1984.
    • Contact person: jussi.niemi (ät) uef.fi
  5. Joensuu Wernicke Aphasia Corpora
    • computer corpora of semi-spontaneous speech of two Finnish Wernicke aphasics (the transcriptions of one speaker include English morphological interlinears and translations)
    • about 20000 word tokens in total, about 14 person months
    • Relevant publication(s) using the corpus: Niemi, Jussi & Matti Laine: Syntax and Inflectional Morphology in Aphasia: Quantitative Aspects of Wernicke Speakers' Narratives. Journal of Quantitative Linguistics 4: 181-189 (1997).
    • Contact person: jussi.niemi (ät) uef.fi
  6. Joensuu Agrammatic Aphasia Corpus
    • computer corpora of semi-spontaneous speech of two Finnish agrammatic (Broca) aphasics (with English morphological interlinears and translations)
    • about 2000 word tokens in total, about 14 person months
    • Relevant publication(s) using the corpus: Niemi, Jussi, Matti Laine, Ritva Hänninen & Päivi Koivuselkä-Sallinen: Agrammatism in Finnish: Two Case Studies. In: L. Menn & L. K. Obler (eds.): Agrammatic Aphasia: A Cross-Language Narrative Sourcebook, pp. 1013-1085. Benjamins, Amsterdam 1990. Supplement to Chapter 14 - Finnish-Language Materials: Control Subjects, pp. 1775-1818.
    • Contact person: jussi.niemi (ät) uef.fi
  7. Finnish Telegraphese Corpus
    • computer corpus of Finnish telegraphese language (with English interlinears and translation)
    • about 3000 word tokens in total, about 4 person months
    • Relevant publication(s) using the corpus: Tesak, Jürgen, Elisabeth Ahlsén, Gábor Györi, Päivi Koivuselkä-Sallinen, Jussi Niemi & Livia Tonelli: Patterns of ellipsis in telegraphese: A study of six languages. Folia Linguistica 24: 297-316 (1995); Tesak, Jürgen & Jussi Niemi: Telegraphese and agrammatism: A cross-linguistic study. Aphasiology 11: 145-155 (1997).
    • Contact person: jussi.niemi (ät) uef.fi
  8. Swedish Telegraphese Corpus
    • computer corpus of Swedish telegraphese language (with English interlinears and translation), compiled by Elisabeth Ahlsén (Linguistics, U. Göteborg), and analyzed (tagged & translated) and finalized by Jussi Niemi
    • about 5000 word tokens in total, about 5 person months
    • Relevant publication(s) using the corpus: Tesak, Jürgen, Elisabeth Ahlsén, Gábor Györi, Päivi Koivuselkä-Sallinen, Jussi Niemi & Livia Tonelli: Patterns of ellipsis in telegraphese: A study of six languages. Folia Linguistica 24: 297-316 (1995)
    • Contact person: jussi.niemi (ät) uef.fi

Department of Swedish (Philosophical Faculty)

Contact person: Sinikka Niemi (sinikka.niemi (ät) uef.fi)
  1. Joensuu Corpus of Swedish Compounds
    • computer corpus (list) of Swedish compounds drawn from a 24.2 million word token database of Göteborgs-Posten (a Swedish newspaper); the database was originally collected by Elisabeth Ahlsén (Linguistics, Göteborg University), and the compounds were subsequently tagged morphologically with Matti Laine’s and Patrick Virtanen’s WordMill Lexical Search program (Center for Cognitive Neuroscience, U. Turku)
    • about 3800 compound tokens, with their WordMill variables (incl. frequency of use in the Göteborgs-Posten), about 3 person months
    • Relevant publication(s) using the corpus: S. Niemi: Compounds in Swedish. Lingue e Linguaggio 8: 257-269. Part of cross-linguistic study of compounds, co-ordinated by Sergio Scalise (Linguistics, U. Bologna), see http://morbo.lingue.unibo.it/mmm/enlm.php
    • Contact person: sinikka.niemi (ät) uef.fi

Department of Translation Studies (Faculty of Humanities)

Contact person: Jukka Mäkisalo (jukka.makisalo (at) uef.fi)
  1. The Finnish Broadcasting Company Corpus of Subtitles
    • Digital research material of translated subtitles compiled by Jukka Mäkisalo and Sonja Tirkkonen-Condit, 2005
    • Size: ca. 100 million word tokens
    • Languages: mostly Finnish, Finland-Swedish and Saame
    • Contact person: jukka.makisalo (at) uef.fi

... ...


For the purpose of transferring resource data from the KitWiki to the CLARIN ad hoc inventory, the data is gathered here in TWiki forms. Create a new topic with the following tools, and fill in the data. You may add extra information on the page, but only the data in the form will be transferred to the ad hoc inventory.

Note: Some of the forms contain date or year fields. Enter dates in one of the accepted formats (2008-12-01 or 1 Dec 2008), or use the calendar next to the field. Give years with four digits, e.g. 2008.
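For anyone preparing form data in bulk before entering it, the two accepted date formats and the four-digit year rule can be checked with a small script. The following is a minimal sketch only: the function names `parse_form_date` and `is_valid_year` are hypothetical, the month abbreviations are assumed to be the English three-letter forms (as in "1 Dec 2008"), and the actual validation performed by the TWiki forms may differ.

```python
import re
from datetime import date

# Assumption: month names are the English three-letter abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

ISO_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")      # 2008-12-01
DMY_RE = re.compile(r"([0-9]{1,2}) ([A-Z][a-z]{2}) ([0-9]{4})")  # 1 Dec 2008
YEAR_RE = re.compile(r"[0-9]{4}")                              # 2008


def parse_form_date(text):
    """Return a date for '2008-12-01' or '1 Dec 2008', otherwise None."""
    text = text.strip()
    match = ISO_RE.fullmatch(text)
    if match:
        year, month, day = (int(group) for group in match.groups())
    else:
        match = DMY_RE.fullmatch(text)
        if match is None or match.group(2) not in MONTHS:
            return None
        day = int(match.group(1))
        month = MONTHS[match.group(2)]
        year = int(match.group(3))
    try:
        # date() rejects impossible calendar dates such as 30 February.
        return date(year, month, day)
    except ValueError:
        return None


def is_valid_year(text):
    """Return True only for a year written with exactly four digits."""
    return YEAR_RE.fullmatch(text.strip()) is not None
```

With this sketch, `parse_form_date("1 Dec 2008")` and `parse_form_date("2008-12-01")` yield the same `date` value, while a two-digit year such as `"08"` is rejected by `is_valid_year`.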

Add a Multimodal Corpus, Spoken Corpus, Written Corpus, Aligned Corpus, Treebank, or N-gram Model:

Add a Terminological Resource, Lexicon / Knowledge Source:

Add a Web Service:

Add a Grammar or any other resource:

The following resources have been added on this site: