FIN-CLARIN Site Description: University of Jyväskylä

Tentative version - significant items may be missing - please report corrections to Ari Huhta

This page lists the departments and other parties involved in collecting, producing or using language resources. For each, the main resources and activities are listed with:

  • an informative name,
  • a short description of what the material is,
  • the size of the resource (in recorded hours, word tokens, source program lines or other measures),
  • how much labour (in person months) has been invested at the site for collecting, producing or improving the resource,
  • contact person and contact information,
  • a link to further information about the resource (please enter more detailed entries in MaterialEn).

Centre for Applied Language Studies (Faculty of Humanities)

  1. The National Certificates corpus
    • The NC test results, background information, speaking and writing performances in 9 foreign / second languages. A web-based database (HTML files).
    • background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances
    • Tiina Lammervo tiina.lammervo(at)jyu.fi
    • http://yki-korpus.jyu.fi/

Department of Languages (Faculty of Humanities)

  1. VARIENG survey corpus
    • A national survey on Finns’ uses of and attitudes to English by the Jyväskylä unit of VARIENG (the Centre of Excellence for the Study of Variation, Contacts and Change in English). The survey was carried out in cooperation with Statistics Finland.
    • 1495 respondents, 15-74-year-olds
    • Sirpa Leppänen, sirpa.leppanen(at)jyu.fi
    • https://www.jyu.fi/hum/laitokset/kielet/varieng/en/survey/
  2. CEFLING project corpus
    • Finnish as a second language and English as a foreign language writing performances collected from comprehensive school students (grades 7 - 9) in the project CEFLING - Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. Data from several hundred learners; 4-5 writing tasks from each learner; background information, self-assessments of proficiency
    • several hundred learners
    • Maisa Martin, maisa.martin(at)jyu.fi
    • http://www.jyu.fi/cefling/
  3. Northern multilingualism
    • digitized interviews (children, adults; in pairs or in groups), written narratives, drawings and pictures from children
    • over 30 hours of speech; 15 narratives
    • Sari Pietikäinen sari.pietikainen(at)campus.jyu.fi
    • http://www.northernmultilingualism.fi/
  4. CLIL (content-and-language-integrated learning) corpus
    • videotaped lessons (history, religion, chemistry, physics) conducted in English in lower secondary schools; six biology lessons in Finnish; in .avi and .mpg2 formats
    • 30 hours
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  5. EFL (English as a Foreign Language) corpus
    • videotaped English lessons from lower secondary and upper secondary (gymnasium) schools; in .avi and .mpg2 formats
    • 24 hours
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  6. Talk show corpus
    • recordings of talk shows (Yölento, HardTalk, Newsnight)
    • 9 hours
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  7. Reality TV corpus
    • recordings of two weeks of Big Brother 2006; in .vow format
    • two weeks' programmes
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  8. Weblog corpus
    • weblogs written by Finns; as HTTrack files
    • about 300 blogs
    • Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  9. Fan fiction corpus
    • fan fiction texts written by Finns
    • 700 texts
    • Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  10. IRC corpus
    • IRC discussion data written by Finns, in several channels; as HTTrack files
    • 110 hours (actual discussions comprise only part of these)
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  11. Gaming corpus
    • videotaped PC and game console game sessions by young Finnish boys, speaking Finnish and English; in .avi and .mpg2 formats
    • 17 hours
    • Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi
  12. Corpora of spoken Finnish
    • several corpora of spoken Finnish in open reel tape or C-cassette format (Finnish dialects, American Finnish, modern spoken Finnish from 1970s to 1990s)
    • from about 12 hours (American Finnish) to about 1 200 hours (Finnish dialects)
    • Maisa Martin, maisa.martin(at)jyu.fi
  13. Corpora of Finno-Ugric languages
    • three corpora of Finno-Ugric languages (Karelian, Sami, Votic of Joenperä) in open reel format
    • 24 hours, 12 hours and 30 minutes, respectively
    • Jouko Koivisto koivisto(at)campus.jyu.fi
  14. Corpus of Middle French
    • a digitized corpus for the study of the lexis and syntax of Middle French (1300s and 1400s) and for text editions
    • 29 texts; about 1 000 000 words
    • Terho Joutsen Terho.Joutsen(at)jyu.fi; also available via http://www.csc.fi/
  15. Corpus of spoken modern French
    • corpus of spoken modern French; transcriptions included
    • 20 hours
    • Terho Joutsen Terho.Joutsen(at)jyu.fi
  16. Intas corpus
  17. FinSveStud 79-80 (Studentsvenska 79-80) corpus
    • Swedish language essays / compositions written by Finnish-speaking students taking the Matriculation examination in 1979-80; tagged in a number of ways
    • 799 compositions, 120 000 words
    • Matti Rahkonen mrahkone(at)gmail.com
  18. FinStud86 corpus
    • Finnish language essays / compositions written by Finnish-speaking students taking the Matriculation examination in 1986
    • 210 compositions, 100 000 words
    • Matti Rahkonen mrahkone(at)gmail.com
  19. Longi corpus
    • a longitudinal corpus of Swedish-language compositions written by Finnish-speaking upper secondary school (gymnasium) students in 1991-93; part-of-speech tagged
    • 100 students; 8 compositions from each, a total of 150 000 words
    • Matti Rahkonen mrahkone(at)gmail.com
  20. Corpora of Swedish language textbooks
    • three corpora of popular Swedish language textbooks (Toppen, Nya vindar (1980s), and Medvind (1991-93)); morphological and syntactic tagging
    • three textbooks
    • Matti Rahkonen mrahkone(at)gmail.com
  21. FinDE corpus

Department of Music (Faculty of Humanities)

  1. Digital archive of Finnish Folk Tunes
    • Digitized versions of Finnish folk tunes and their relevant details (notation, key, meter, place of collection, lyrics, collector)
    • 8613 Finnish folk tunes (including part of the lyrics)
    • Petri Toiviainen ptoiviai(at)campus.jyu.fi
    • http://esavelmat.jyu.fi//

Department of Psychology (Faculty of Social Sciences)

  1. From a child to an adult: the interview of the 14-year-olds in 1974
  2. From a child to an adult: the interview of the teachers of the 14-year-olds in 1974
  3. From a child to an adult: the interview of the 20-year-olds in 1980
  4. From a child to an adult: the interview of the 27-year-olds in 1986
  5. Human Development and Its Risk Factors
    • Corpus gathered in the Jyväskylä Longitudinal Study of Dyslexia. Large database of video and audiotaped sessions (tests, reading tasks) with 200 dyslexic and non-dyslexic children; tested twice a year over several years; most of the information converted into SPSS files
    • about 2 000 tapes
    • Kenneth.Eklund(at)psyka.jyu.fi
    • http://www.jyu.fi/humander/dyslexia.shtml

Department of Educational Sciences (Faculty of Education)

  1. Argumentation in studying problem-solving skills in social work education in Finnish Polytechnics
    • Essays and discussions by students in Polytechnics and comprehensive schools; video and audio recordings; text files
    • 75 essays, 216 online discussion turns, 260 minutes of video and audio recordings; 2-3 months
    • Kati Vapalahti kati.vapalahti(at)mamk.fi
    • http://www.jyu.fi/coalition/
  2. Collaborative writing
    • Corpus of group discussions in Finnish by university students when writing a text as a group; recordings and transcriptions; text files
    • 188 pages; 8 177 turns of speech; two weeks
    • miika.marttunen(at)edu.jyu.fi, minna.pulkkinen(at)edu.jyu.fi
    • http://www.jyu.fi/coalition/
  3. Argumentation and argument visualisation in promoting strategic reading and decision-making

For the purpose of transferring resource data from the KitWiki to the CLARIN ad hoc inventory, the data is gathered here in TWiki forms. Create a new topic with the following tools, and fill in the data. You may add extra information on the page, but only the data in the form will be transferred to the ad hoc inventory.

Note: Some of the forms contain date or year fields. Take care to enter dates in the correct format (2008-12-01 or 1 Dec 2008) or use the calendar next to the field. Give years with four digits, e.g. 2008.
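The two date layouts and the four-digit year rule can be checked before the values are typed into a form. The following is a minimal Python sketch, not the validation the TWiki forms themselves perform (that is not described here); the function names `parse_form_date` and `is_four_digit_year` are hypothetical and chosen only for illustration.

```python
import re
from datetime import date, datetime

# The two layouts named in the note: 2008-12-01 and 1 Dec 2008.
# "%b" matches abbreviated month names in the current locale; this sketch
# assumes the default (English) month names.
DATE_LAYOUTS = ("%Y-%m-%d", "%d %b %Y")


def parse_form_date(text):
    """Return a date if text fits one of the two layouts, otherwise None."""
    cleaned = text.strip()
    for layout in DATE_LAYOUTS:
        try:
            return datetime.strptime(cleaned, layout).date()
        except ValueError:
            continue  # try the next layout
    return None


def is_four_digit_year(text):
    """Return True only for a year written with exactly four ASCII digits."""
    return re.fullmatch(r"[0-9]{4}", text.strip()) is not None


if __name__ == "__main__":
    for value in ("2008-12-01", "1 Dec 2008", "12/01/2008"):
        print(value, "->", parse_form_date(value))
    for value in ("2008", "08"):
        print(value, "->", is_four_digit_year(value))
```

A value for which `parse_form_date` returns `None` is best entered with the calendar next to the field instead.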

Add a Multimodal Corpus, Spoken Corpus, Written Corpus, Aligned Corpus, Treebank, or N-gram Model:

Add a Terminological Resource, Lexicon / Knowledge Source:

Add a Web Service:

Add a Grammar or any other resource:

The following resources have been added on this site: