title,field_resource_type,field_languages,field_languages_other,field_description,field_country,field_institute,field_creator,field_year,field_end_creation_date,field_format,field_metadata_link,field_publications,field_reference_link,field_ethical_reference,field_legal_reference,field_license_type,field_description_0,field_contact_person,field_longterm_preservation,field_working_languages,field_location_0,field_content_type,field_format_detailed,field_quality,field_applications,field_project,field_size,field_distribution_form,field_access,field_source_0,field_date_0,field_type,field_format_detailed_1,field_schema_reference,field_size_0,field_working_languages_0,field_access_1,field_date_2,field_location_webservice,field_interface_specification,field_input,field_input_schema_reference_0,field_output,field_output_schema_0,field_devdescription_0,field_access_3
"Argumentation and argument visualisation in promoting strategic reading and decision-making","Spoken Corpus","Finnish",""," Corpus of upper secondary school students' think-aloud performances when searching the Internet for information","Finland","Department of Educational Sciences, University of Jyväskylä","Carita Kiili capasuki@cc.jyu.fi","","","recordings and transcriptions; text files","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Argumentation in studying problem-solving skills in social work education in Finnish Polytechnics","Spoken Corpus||Written Corpus","Finnish","","Essays and discussions by students in Polytechnics and comprehensive schools","Finland","Department of Educational Sciences, University of Jyväskylä","Kati Vapalahti kati.vapalahti(at)mamk.fi","","","video and audio recordings; text files","","","","","","","","","","","","","","","","","75 essays, 216 online discussion turns, 260 minutes of video and audio recordings; 2-3 months","","","","","","","","","","","","","","","","","","",""
"Audio Recordings Archive","Multimodal Corpus||Spoken Corpus","Finnish","","The Audio Recordings Archive holds over 23,000 hours of recordings collected since 1959, providing authentic samples of Finnish dialects, languages related to Finnish, and other world languages. The collection additionally includes samples of Finnish dialects spoken in Sweden, Norway, Ingria, the United States and Australia. Digitisation of the audio bank was undertaken in 1999. Over half of its content has been digitised, totalling about 13,000 hours of recordings.","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/naark/","","","","","","","","","","","","","","about 13,000 hours digitised audio recordings and 150 hours digitised video recordings (+ key word lists, contents lists, transcripts)","","","","","","","","","","","","","","","","","","",""
"CEFLING project corpus","Written Corpus","English||Finnish","","Finnish as a second language and English as a foreign language writing performances collected from comprehensive school students (grades 7 - 9) in the project CEFLING - Linguistic Basis of the Common European Framework for L2 English and L2 Finnish. Data from several hundred learners; 4-5 writing tasks from each learner; background information, self-assessments of proficiency","","Department of Languages, University of Jyväskylä","Maisa Martin, mmartin(at)campus.jyu.fi","","","","","","","","","","","","","","","","","","","","several hundred learners (data gathering not completed yet)","","","","","","","","","","","","","","","","","","",""
"CLIL (content-and-language-integrated learning) corpus","Multimodal Corpus||Spoken Corpus","Finnish","","videotaped lessons (history, religion, chemistry, physics) conducted in English in lower secondary schools; six biology lessons in Finnish","Finland","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi","","","video","","","","","","","","","","","","",".avi and .mpg2","","","","30 hours","","","","","","","","","","","","","","","","","","",""
"Collaborative writing","Spoken Corpus||Written Corpus","Finnish","","Corpus of group discussions in Finnish by university students when writing a text as a group","Finland","Department of Educational Sciences, University of Jyväskylä","miika.marttunen(at)edu.jyu.fi, minna.pulkkinen(at)edu.jyu.fi","","","recordings and transcriptions; text files","","","","","","","","","","","","","","","","","188 pages; 8 177 turns of speech; two weeks","","","","","","","","","","","","","","","","","","",""
"Comparable Russian-Finnish corpus of juridical texts","Written Corpus","Finnish||Russian","","","","School of Modern Languages and Translation Studies, University of Tampere ","Nina Isolahti, Mihail Mihailov (ät uta.fi) ","","","","","","https://mustikka.uta.fi","","","","","","","","","","","","","","About 2,000,000 word tokens. ","","Registration required. ","","","","","","","","","","","","","","","","",""
"Corpora of Finno-Ugric languages","Spoken Corpus","-- language not in list --","Karelian, Sami, Joenperän vatja","three corpora of Finno-Ugric languages (Karelian, Sami, Joenperän vatja) in open reel format","","Department of Languages, University of Jyväskylä","Jouko Koivisto koivisto(at)campus.jyu.fi ","","","open reel","","","","","","","","","","","","","","","","","24 hours, 12 hours, 30 minutes, respectively","","","","","","","","","","","","","","","","","","",""
"Corpora of Newspaper Texts","Written Corpus","English||Finnish||Swedish","","Computer corpora in Finnish, Swedish and English languages (newspaper texts), with requests and relevance information used in information retrieval evaluation.","","Department of Information Studies, University of Tampere","Eija Airio (ät) uta.fi","","","","","","","","","","","","","","","","","","","","About 142.2, 42.5, and 251 million word tokens respectively; or 1088MB, 281 MB, and 1530 MB respectively.","","","","","","","","","","","","","","","","","","",""
"Corpora of spoken Finnish","Spoken Corpus","Finnish","","several corpora of spoken Finnish in open reel tape or C-cassette format (Finnish dialects, American Finnish, modern spoken Finnish from 1970s to 1990s)","Finland||United States","Department of Languages, University of Jyväskylä","Jouko Koivisto koivisto(at)campus.jyu.fi ","","","open reel tape or C-cassette","","","","","","","","","","","","","","","","","from about 12 hours (American Finnish) to about 1 200 hours (Finnish dialects)","","","","","","","","","","","","","","","","","","",""
"Corpora of Swedish language textbooks","Written Corpus","Swedish","","three corpora of popular Swedish language textbooks (Toppen, Nya vindar (1980s), and Medvind (1991-93)); morphological and syntactic tagging","Finland","Department of Languages, University of Jyväskylä","Matti Rahkonen mrahkone(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","three textbooks","","","","","","","","","","","","","","","","","","",""
"Corpus of Early Literary Finnish","Written Corpus","Finnish","","period: 1809–1899","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/korpus/1800/meta/1800_coll_rdf.xml","","","","","","","","","","","","","","about 7 million word tokens in total","freely accessible on-line data service, Kaino","freely accessible on-line data service, Kaino","","","","","","","","","","","","","","","","",""
"Corpus of Finnish Literary Classics","Written Corpus","Finnish","","period: 1880s–1930s","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/korpus/klassikot/meta/klassikot_coll_rdf.xml","","","","","","","","","","","","","","about 1,4 million word tokens in total","freely accessible on-line data service, Kaino","freely accessible on-line data service, Kaino","","","","","","","","","","","","","","","","",""
"Corpus of Magazines and Periodicals","Written Corpus","Finnish","","period: 20th century","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","about 8,6 million word tokens in total","","requires user authorisation","","","","","","","","","","","","","","","","",""
"Corpus of Middle French","Written Corpus","-- language not in list --||French","Middle French","a digitized corpus for the study of the lexis and syntax of Middle French (1300s and 1400s) and for text editions","","Department of Languages, University of Jyväskylä","Terho Joutsen Terho.Joutsen(at)campus.jyu.fi; also available via http://www.csc.fi/","","","","","","","","","","","","","","","","","","","","29 texts; about 1 000 000 words","","","","","","","","","","","","","","","","","","",""
"Corpus of Old Literary Finnish","Written Corpus","Finnish","","period: 1543–1809","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/korpus/vks/meta/vks_coll_rdf.xml","","","","","","","","","","","","","","about 3,4 million word tokens in total","freely accessible on-line data service, Kaino","freely accessible on-line data service, Kaino","","","","","","","","","","","","","","","","",""
"Corpus of Proverbs and Other Colloquial Expressions","Written Corpus","Finnish","","","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/korpus/sp/meta/sp_coll_rdf.xml ","","","","","","","","","","","","","","about 86 000 proverbs, 1,3 million word tokens in total","freely accessible on-line data service, Kaino","freely accessible on-line data service, Kaino","","","","","","","","","","","","","","","","",""
"Corpus of spoken modern French","Spoken Corpus","French","","corpus of spoken modern French; transcriptions included","","Department of Languages, University of Jyväskylä","Terho Joutsen Terho.Joutsen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","20 hours","","","","","","","","","","","","","","","","","","",""
"Corpus of Spoken Southwestern Finnish","Spoken Corpus","Finnish","","audio corpus of spoken Finnish across the traditional Tavastia - Southwest dialect boundary, speakers: over 300 schoolchildren in 12 communities, recorded 1978 and 2006 by Sinikka Niemi and Jussi Niemi","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi","","","","","Niemi, Jussi & Sinikka Niemi: Word tone and related matters in the Finnish Southwest. In: C.- Ch. Elert, I. Johansson & E. Strangert (eds.): Nordic Prosody III, pp. 187-200. Umeå 1984.","","","","","","","","","","","","","","","about 10 hrs. of audio recordings (one structured text transformed to PRAAT format for acoustic analysis)","","","","","","","","","","","","","","","","","","",""
"Corpus of the Finnish Language = Finnish Text Collection (CSC, Language Bank)","Written Corpus","Finnish","","This corpus contains written Finnish from 1990s. The collection has been gathered by the Research Institute for the Languages in Finland, the Department of General Linguistics of the University of Helsinki and the Foreign Languages Department of the University of Joensuu. Web user interfaces available at the Scientist's Interface (CSC). Access also via Unix server (corpus.csc.fi).","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","https://hotpage.csc.fi/","","","","","","","","","","","","","","about 180 million word tokens in total","","requires user authorisation, access via Language Bank (CSC)","","","","","","","","","","","","","","","","",""
"Early English Books Online","Written Corpus","English||Latin","","Digitized printed texts from 1480-1700; mainly in English but also in Latin. Full text search not available.","","Department of History and Ethnology, University of Jyväskylä","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","about 100 000 entries","","","","","","","","","","","","","","","","","","",""
"EFL (English as a Foreign Language) corpus","Multimodal Corpus||Spoken Corpus","English||Finnish","","video taped English lessons from lower secondary and upper secondary (gymnasium) schools","Finland","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi","","","video","","","","","","","","","","","","",".avi and .mpg2","","","","24 hours","","","","","","","","","","","","","","","","","","",""
"Eighteenth Century Collections Online","Written Corpus","English||French||Latin","","Largest text database in the world. Digitized printed texts; mainly in English but also in other languages such as French and Latin, in particular. Full text search.","","Department of History and Ethnology, University of Jyväskylä ","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","33 million pages, about 150 000 entries","","","","","","","","","","","","","","","","","","",""
"Fan fiction corpus","Written Corpus","English||Finnish","","fan fiction texts written by Finns","Finland","Department of Languages, University of Jyväskylä","Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","700 texts","","","","","","","","","","","","","","","","","","",""
"FinDE corpus","Written Corpus","Finnish||German","","A two-way parallel corpus comprising mainly literary texts (Finnish / German), both original texts and their translations","","Department of Languages, University of Jyväskylä","Kirsi Pakkanen-Kilpiä Kirpakk(at)campus.jyu.fi","","","","","","","","","","","","","","","","","","","","791 171 words","","","","","","","","","","","","","","","","","","",""
"FinStud86 corpus","Written Corpus","Swedish","","Swedish language essays / compositions written by Finnish-speaking students taking the Matriculation examination in 1986","Finland","Department of Languages, University of Jyväskylä","Matti Rahkonen mrahkone(at)campus.jyu.fi ","1986","","","","","","","","","","","","","","","","","","","210 compositions, 100 000 words","","","","","","","","","","","","","","","","","","",""
"FinSveStud 79-80 (Studentsvenska 79-80) corpus","Written Corpus","Swedish","","Swedish language essays / compositions written by Finnish-speaking students taking the Matriculation examination in 1979-80; tagged in a number of ways","Finland","Department of Languages, University of Jyväskylä","Matti Rahkonen mrahkone(at)campus.jyu.fi ","1979","","","","","","","","","","","","","","","","","","","799 compositions, 120 000 words","","","","","","","","","","","","","","","","","","",""
"Finland Swedish Text Corpus = Finnish-Swedish Textcollection (CSC, Language Bank)","Written Corpus","Swedish","","period: 1997–2000","Finland","Research Institute for the Languages of Finland","Nina Martola, firstname.lastname[a]focis.fi","","","","","","https://hotpage.csc.fi/","","","","","","","","","","","","","","about 34 million word tokens in total","","requires user authorisation, access via Language Bank (CSC)","","","","","","","","","","","","","","","","",""
"Finnish Telegraphese Corpus","Written Corpus","Finnish","","computer corpus of Finnish telegraphese language (with English interlinears and translation)","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi ","","","","","Tesak, Jürgen, Elisabeth Ahlsén, Gábor Györi, Päivi Koivuselkä-Sallinen, Jussi Niemi & Livia Tonelli: Patterns of ellipsis in telegraphese: A study of six languages. Folia Linguistica 24: 297-316 (1995); Tesak, Jürgen & Jussi Niemi: Telegraphese and agrammatism: A cross-linguistic study. Aphasiology 11: 145-155 (1997).","","","","","","","","","","","","","","","about 3000 word tokens in total","","","","","","","","","","","","","","","","","","",""
"From a child to an adult: the interview of the 14-year-olds in 1974","Written Corpus","Finnish","","Interviews of 14-year-olds on their social behaviour and general circumstances of growing; transcribed","Finland","Department of Psychology, University of Jyväskylä","Katja Kokko katja.kokko(at)psyka.jyu.fi","1974","","rtf-format","","","","","","","","","","","","","","","","","786 pages of text (154 + 152 interviews)","","","","","","","","","","","","","","","","","","",""
"From a child to an adult: the interview of the 20-year-olds in 1980","Written Corpus","Finnish","","Interviews of the 20-year-olds in this longitudinal study on different aspects of their lives; transcribed","Finland","Department of Psychology, University of Jyväskylä","Katja Kokko katja.kokko(at)psyka.jyu.fi","1980","","rtf-format","","","","","","","","","","","","","","","","","740 pages; 134 files","","","","","","","","","","","","","","","","","","",""
"From a child to an adult: the interview of the 27-year-olds in 1986","Written Corpus","Finnish","","Interviews of the 27-year-olds in this longitudinal study on different aspects of their lives; transcribed","Finland","Department of Psychology, University of Jyväskylä","Katja Kokko katja.kokko(at)psyka.jyu.fi","1986","","rtf-format","","","","","","","","","","","","","","","","","4012 pages; 292 files","","","","","","","","","","","","","","","","","","",""
"From a child to an adult: the interview of the teachers of the 14-year-olds in 1974","Written Corpus","Finnish","","Interviews of teachers of the 14-year-olds; transcribed","Finland","Department of Psychology, University of Jyväskylä","Katja Kokko katja.kokko(at)psyka.jyu.fi","1974","","rtf-format","","","","","","","","","","","","","","","","","279 interviews; 1-3 pages each","","","","","","","","","","","","","","","","","","",""
"Gallica corpus of the French National Library","Written Corpus","French","","Digitized printed French texts from the collections of the French National Library; especially good coverage of the period of the revolution. Full text search not available.","","Department of History and Ethnology, University of Jyväskylä","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","tens of thousands of entries","","","","","","","","","","","","","","","","","","",""
"Gaming corpus","Multimodal Corpus||Spoken Corpus","English||Finnish","","video taped PC and game console game sessions by young Finnish boys, speaking Finnish and English","Finland","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi ","","","video","","","","","","","","","","","","",".avi and .mpg2","","","","17 hours","","","","","","","","","","","","","","","","","","",""
"Hansard Online 1804-2004","Written Corpus","English","","Digitized British parliamentary discussions / debates, mainly from 1890-2000 (in progress)","","Department of History and Ethnology, University of Jyväskylä","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Human Development and Its Risk Factors","Multimodal Corpus||Spoken Corpus","Finnish","","Corpus gathered in the Jyväskylä Longitudinal Study of Dyslexia. Large database of video and audiotaped sessions (tests, reading tasks) with 200 dyslexic and non-dyslexic children; tested twice a year over several years","Finland","Department of Psychology, University of Jyväskylä","Kenneth.Eklund(at)psyka.jyu.fi","","","most of the information converted into SPSS files","","","","","","","","","","","","","","","","","about 2 000 tapes","","","","","","","","","","","","","","","","","","",""
"Russian INTAS corpus","Spoken Corpus","Russian","","A corpus of spontaneous discussions and read-aloud performances from native Russian speakers of different ages. Contains audio files (WAV), the phonetic annotations (Praat TextGrid) and text files.","Russia","Department of Languages, University of Jyväskylä","Riikka Ullakonoja, Riikka.Ullakonoja(at)campus.jyu.fi","2002","2009","audio files (WAV), the phonetic annotations (Praat TextGrid) and text files","http://www.speech.pu.ru/instructions_intas.php ","","http://www.speech.pu.ru/results.php","","","","research use","","","","St. Petersburg, Russia","","","","","","10 minutes of spontaneous speech per person (5 men, 5 women per language) + the read-aloud tasks","","http://www.speech.pu.ru/results.php","","","","","","","","","","","","","","","","",""
"ICLFI - International Corpus of Learner FInnish","Written Corpus","Finnish","","The International Corpus of Learner Finnish (ICLFI) is being compiled during the project in 2008 - 2011. The data for the corpus will be compiled with the help of the foreign universities around the world in which Finnish is studied as a foreign language. The corpus will consist of Finnish learners’ spontaneously produced texts in language learning situations. The corpus will help to clarify the specific characteristics of LL and also in order to make more relevant support material for learning, such as dictionaries and educational material.","","Finnish language, University of Oulu","Jarmo Jantunen jarmo.jantunen(at)oulu.fi ","","","","","","","","","","","","","","","","","","","","The size of the data is 137.000 words at the moment. ","","","","","","","","","","","","","","","","","","",""
"IRC corpus","Written Corpus","English||Finnish","","IRC discussion data written by Finns, in several channels","Finland","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi ","","","HT track files","","","","","","","","","","","","","","","","","110 hours (actual discussions comprise only part of these)","","","","","","","","","","","","","","","","","","",""
"Joensuu Agrammatic Aphasia Corpus","Spoken Corpus","Finnish","","computer corpora of semi-spontaneous speech of two Finnish agrammatic (Broca) aphasics (with English morphological interlinears and translations)","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi ","","","","","Niemi, Jussi, Matti Laine, Ritva Hänninen & Päivi Koivuselkä- Sallinen: Agrammatism in Finnish: Two Case Studies. In: L. Menn & L. K. Obler (eds.): Agrammatic Aphasia: A Cross-Language Narrative Sourcebook. Pp. 1013 - 1085. Benjamins, Amsterdam 1990. Supplement to Chapter 14 - Finnish-Language Materials: Control Subjects, pp. 1775-1818.","","","","","","","","","","","","","","","about 2000 word tokens in total","","","","","","","","","","","","","","","","","","",""
"Joensuu Language Acquisition Corpus","Spoken Corpus","Finnish","","computer corpus of spoken language output of a child acquiring Finnish (age 2;4 to 6;7)","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi ","","","","","Niemi, Jussi & Sinikka Niemi: Acquisition of inflectional marking: A case study of Finnish. Nordic Journal of Linguistics 10: 59-89 (1987).","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Joensuu Wernicke Aphasia Corpora","Spoken Corpus","Finnish","","computer corpora of semi-spontaneous speech of two Finnish Wernicke aphasics (one's transcriptions with English morphological interlinears and translations)","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi ","","","","","Niemi, Jussi & Matti Laine: Syntax and Inflectional Morphology in Aphasia: Quantitative Aspects of Wernicke Speakers' Narratives. Journal of Quantitative Linguistics 4: 181- 189 (1997).","","","","","","","","","","","","","","","about 20000 word tokens in total","","","","","","","","","","","","","","","","","","",""
"The Karjalainen Corpus","Written Corpus","Finnish","","computer corpus of Finnish newspaper texts of the 1990s (newspaper Karjalainen, Joensuu)","Finland","Department of Linguistics, University of Joensuu, SGML transformation carried out by the Department of General Linguistics, University of Helsinki","Language Bank, contact information at Joensuu: jussi.niemi (ät) uef.fi ","","","","","Used as basis of frequency counts for psycholinguistic studies of Finnish morphology/lexicon by Jussi Niemi and Matti Laine (Psychology, Åbo Akademi University) and their associates","","","","","","","","","availability through the Language Bank (of Finland) at http://www.csc.fi/english","","","","","","about 35.8 million word tokens","","","","","","","","","","","","","","","","","","",""
"Longi corpus","Written Corpus","Swedish","","a longitudinal corpus Swedish language compositions written by Finnish-speaking upper secondary school (gymnasium) students in 1991-93; parts of speech tagged","Finland","Department of Languages, University of Jyväskylä","Matti Rahkonen mrahkone(at)campus.jyu.fi ","1991","","","","","","","","","","","","","","","","","","","100 students; 8 compositions from each, a total of 150 000 words","","","","","","","","","","","","","","","","","","",""
"Multilingual corpus of juridical texts","Written Corpus","-- language not in list --","multilingual","","","School of Modern Languages and Translation Studies, University of Tampere ","Nina Isolahti, Mihail Mihailov (ät uta.fi) ","","","","","","https://mustikka.uta.fi","","","","","","","","","","","","","","About 1,200,000 word tokens.","","Registration required. ","","","","","","","","","","","","","","","","",""
"The National Certificates corpus","Spoken Corpus||Written Corpus","-- language not in list --||English||Finnish||French||German||Italian||Russian||Spanish||Swedish","Sami","The NC test results, background information, speaking and writing performances in 9 foreign / second languages. A web-based data base (html files).","Finland","Centre for Applied Language Studies, University of Jyväskylä","Mirja Tarnanen tarnanen(at)campus.jyu.fi","","","","","","http://yki-korpus.jyu.fi/","","","","","","","","","","","","","","background information and test results (5 sub-tests, 9 different languages) from 14 000 test takers as SPSS files, 2 000 writing performances, and 700 speaking performances","A web-based data base (html files).","","","","","","","","","","","","","","","","","",""
"New Year Speechs of the President of the Republic of Finland","Written Corpus","Finnish","","text corpus, period 1935–2007","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/korpus/teko/meta/presidentti/presidentti_coll_rdf.xml","","","","","","","","","","","","","","63 110 word tokens","freely accessible on-line data service, Kaino","freely accessible on-line data service, Kaino","","","","","","","","","","","","","","","","",""
"Northern multilingualism","Multimodal Corpus||Spoken Corpus||Written Corpus","-- language not in list --","","digitized interviews (children, adults; in pairs or in groups), written narratives, drawings and pictures from children","","","Sari Pietikäinen sari.pietikainen(at)campus.jyu.fi","","","","","","","","","","","","","","","","","","","","over 30 hours of speech; 15 narratives","","","","","","","","","","","","","","","","","","",""
"Old Bailey Proceedings","Written Corpus","English","","Digitized database of texts from trials at Old Bailey from 1600s to 1800s.","","Department of History and Ethnology, University of Jyväskylä","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Oulu Corpus (CSC, Language Bank)","Written Corpus","Finnish","","The corpus is a representative sample of the Finnish language in the 1960s media.","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","5 800 short samples, 429 058 word tokens and some 29 000 sentences","","requires user authorisation, access via Language Bank (CSC)","","","","","","","","","","","","","","","","",""
"Reality tv corpus","Multimodal Corpus||Spoken Corpus","Finnish","","recordings of two weeks of Big Brother 2006","Finland","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi","","","video","","","","","","","","","","","","",".vow","","","","two weeks' programmes","","","","","","","","","","","","","","","","","","",""
"Resource Collection of human-computer dialogues","Spoken Corpus","English||Finnish","","Computer corpora of different spoken dialogue applications (e.g., timetable systems) in English and Finnish, collected both in laboratory experiments and real usage.","","Department of Computer Sciences, Tampere Unit for Computer-Human Interaction, University of Tampere ","Markku Turunen (mturunen ät cs.uta.fi)","","","","","","","","","","","","","","","","","","","","Thousands of dialogues (depending on the application)","","","","","","","","","","","","","","","","","","",""
"Russian-Finnish parallel corpus of literary texts","Written Corpus","Finnish||Russian","","","","School of Modern Languages and Translation Studies, University of Tampere ","Mihail Mihailov (ät) uta.fi ","","","","","","https://mustikka.uta.fi","","","","","","","","","","","","","","About 5,000,000 word tokens.","","Registration required.","","","","","","","","","","","","","","","","",""
"Swedish-Finnish Parallel Text Corpus (CSC, Language Bank)","Written Corpus","Finnish||Swedish","","period: 21th century","","Research Institute for the Languages of Finland","Nina Martola, firstname.lastname[a]focis.fi","","","","","","","","","","","","","","","","","","","","about 4 million word tokens in total","","requires user authorisation, access via Language Bank (CSC)","","","","","","","","","","","","","","","","",""
"Swedish Telegraphese Corpus","Written Corpus","Swedish","","computer corpus of Swedish telegraphese language (with English interlinears and translation), compiled by Elisabeth Ahlsén (Linguistics, U. Göteborg), and analyzed (tagged & translated) and finalized by Jussi Niemi","Sweden","Department of Linguistics, University of Joensuu; Linguistics, U. Göteborg","jussi.niemi (ät) uef.fi ","","","","","Tesak, Jürgen, Elisabeth Ahlsén, Gábor Györi, Päivi Koivuselkä-Sallinen, Jussi Niemi & Livia Tonelli: Patterns of ellipsis in telegraphese: A study of six languages. Folia Linguistica 24: 297-316 (1995)","","","","","","","","","","","","","","","about 5000 word tokens in total","","","","","","","","","","","","","","","","","","",""
"Syntax Archive Data (= Lauseopin arkisto)","Spoken Corpus||Written Corpus","Finnish","","The data is owned by the Research Institute for the Languages in Finland and the Department of Finnish and Generel Linguistics at the University of Turku. The Syntax Archive Data contains dialects from 132 Finnish parishes (one hour from each parish) and literary Finnish (40 units).","Finland","Research Institute for the Languages of Finland","Mikko Lounela and Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","about 1 million word tokens in total","","","","","","","","","","","","","","","","","","",""
"Talk show corpus","Multimodal Corpus||Spoken Corpus","English||Finnish","","recordings of talk shows (Yölento, HardTalk, Newsnight)","","Department of Languages, University of Jyväskylä","Terhi Paakkinen, terpaak(at)campus.jyu.fi and Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi ","","","video","","","","","","","","","","","","","","","","","9 hours","","","","","","","","","","","","","","","","","","",""
"Text from the Samples of Finnish Dialects Collection","Written Corpus","Finnish","","forthcoming (2009?)","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","text and audio, about 100 hours","","","","","","","","","","","","","","","","","","",""
"The Audio Recordings Archive of Oulu","Spoken Corpus","-- language not in list --||Finnish","Sami","The Audio Recordings Archive of Oulu stores analogical and digital recordings. The recordings are samples of Finnish dialects, cultural history, modern colloquial language, child language, Finnic minority languages and Saami languages. The oldest recordings are from early 1960s. Copies are from Research Centre for Languages in Finland. Less than half of the analogical recordings is digitised. The Audio Recordings Archive of Oulu (Oulun nauhoitearkisto = ONA) is founded in 1967. Recordings are partly made by studentes and partly by staff of Finnish Language. ","Finland","Finnish language, University of Oulu","Marketta Harju-Autti marketta.harju-autti(at)oulu.fi","","","","","","","","","","","","","","","","","","","","Total amount of recordings is 7000 hours of which approximately 5000 hours is unique material.","","","","","","","","","","","","","","","","","","",""
"The Finnish Broadcasting Company Corpus of Subtitles","Written Corpus","-- language not in list --||Finnish||Swedish","Sami","Digital research material of translated subtitles compiled by Jukka Mäkisalo and Sonja Tirkkonen-Condit, 2005","Finland","Department of Translation Studies, University of Joensuu","jukka.makisalo (at) uef.fi ","2005","","","","","","","","","","","","","","","","","","","ca. 100 million word tokens","","","","","","","","","","","","","","","","","","",""
"The Making of the Modern Economy","Written Corpus","Dutch||English||French||German||Swedish","","Digitized printed texts related to economic history from 1520 - 1820, mainly in English but also in French, German, Dutch, and Swedish. Limited full text search.","","Department of History and Ethnology, University of Jyväskylä","Pasi Ihalainen pasi.ihalainen(at)campus.jyu.fi ","","","","","","","","","","","","","","","","","","","","about 65 000 entries","","","","","","","","","","","","","","","","","","",""
"Weblog corpus","Written Corpus","English||Finnish","","weblogs written by Finns","Finland","Department of Languages, University of Jyväskylä","Leila Kääntä, Leila.Kaanta(at)campus.jyu.fi ","","","HT track files","","","","","","","","","","","","","","","","","about 300 blogs","","","","","","","","","","","","","","","","","","",""
"Atlas of Place Names","other","Finnish","","","Finland","Research Institute for the Languages of Finland","Terhi Ainiala, firstname.lastname[a]kotus.fi","","","freely accessible on-line data service, Kaino","","","http://kaino.kotus.fi/nikar/","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Digital archive of Finnish Folk Tunes","other","Finnish","","Digitalized versions of Finnish folk tunes and their relevant details (notation, key, meter, place of collection, lyrics, collector), 8613 Finnish folk tunes (including part of the lyrics)","Finland","Department of Music, University of Jyväskylä","Petri Toiviainen ptoiviai(at)campus.jyu.fi","","","","","","http://esavelmat.jyu.fi// ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Etymological Reference Database","other","Finnish","","","Finland","Research Institute for the Languages of Finland","Klaas Ruppel and Toni Suutari, firstname.lastname[a]kotus.fi","","","freely accessible on-line data service, Kaino","","","http://kaino.kotus.fi/sanat/evita/","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Frequency list: Early Modern Finnish","other","Finnish","","Frequency list of the Corpus of Early Modern Finnish, 4 862 190 words","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","freely accessible on-line data service, Kaino","","","http://kaino.kotus.fi/sanat/taajuuslista/vns.php","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Frequency list: Old Literary Finnish","other","Finnish","","Frequency list of the Corpus of Old Literary Finnish, 3 425 382 words","Finland","Research Institute for the Languages of Finland","Mikko Lounela, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/sanat/taajuuslista/vks.php","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Parole frequency list","other","Finnish","","frequency list of the Parole corpus, 1 339 787 words","Finland","","","","","","","","http://kaino.kotus.fi/sanat/taajuuslista/parole.php","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Álgu – Origins of Saami Words","Lexicon / Knowledge Source","-- language not in list --","Sami","The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.","Finland","Research Institute for the Languages of Finland","Klaas Ruppel, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/algu/ ","","","","","","","","","","","","","","","","","","","","","","about 86 000 words, 180 000 relations","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Dictionary of Carelian (= Karjalan kielen sanakirja)","Lexicon / Knowledge Source","-- language not in list --","Carelian","","","Research Institute for the Languages of Finland","Marja Torikka and Jari Vihtari, firstname.lastname[a]kotus.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Dictionary of Finnish Dialects (= Suomen murteiden sanakirja)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Ulla Takala and Outi Lehtinen, firstname.lastname[a]kotus.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Dictionary of Finno-Swedish dialects (= Ordbok över Finlands svenska folkmål)","Lexicon / Knowledge Source","Swedish","","","Finland","Research Institute for the Languages of Finland","Peter Slotte, firstname.lastname[a]focis.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Dictionary of Old Literary Finnish (= Vanhan kirjasuomen sanakirja)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Pirkko Kuutti and Risto Widenius, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Digital Listing of Headwors in the Dictionary of Carelian (= Karjalan kielen sanakirja 1–6, 1968–2005)","Lexicon / Knowledge Source","-- language not in list --","Carelian","","","Research Institute for the Languages of Finland","Jari Vihtari and Marja Torikka, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/sanat/kkss/","","","","","","","","","","","","","","","","","","","","","","94 534 headwords","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Electronic Vepsian Word List","Lexicon / Knowledge Source","-- language not in list --","Vepsian","","","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/sanat/vepsa/","","","","","","","","","","","","","","","","","","","","","","about 15 000 headwords","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Geographic Names Register of the National Land Survey","Lexicon / Knowledge Source","Finnish||Swedish","North Saami, Inari Saami, Skolt Saami","","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","720 000 names are Finnish, 75 000 Swedish, 4 500 North Saami, 3 800 Inari Saami and 150 Skolt Saami","","The register may be accessed for research purposes at the Research Institute.","","","","","","","","",""
"Headwords in the Dictionary of Modern Finnish (= Nykysuomen sanakirja 1–6, 1951–1961)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","about 210 000 headwords","","requires user authorisation, access via Unix server (suomi.kotus.fi) ","","","","","","","","",""
"Joensuu Corpus of Finnish Compounds","Lexicon / Knowledge Source","Finnish","","computer corpus (full list) of compounds of CD-perussanakirja (electronic version of Suomen kielen perussanakirja, the most comprehensive dictionary of contemporary Finnish, see http://www2.lingsoft.fi/cdps/), with morphological category information","Finland","Department of Linguistics, University of Joensuu","jussi.niemi (ät) uef.fi ","","","","","To be used in a comparative study of compounds co-ordinated by Sergio Scalise (Linguistics, U. Bologna), see http://morbo.lingue.unibo.it/mmm/enlm.php","","","","","","","","","","","","","","","","","","","","","","","about 52000 word tokens in total","","","","","","","","","","",""
"Joensuu Corpus of Swedish Compounds","Lexicon / Knowledge Source","Swedish","","computer corpus (list) of Swedish compounds collected from XXX","","Department of Swedish, University of Joensuu","sinikka.niemi (ät) uef.fi ","","","","","To be used in a comparative study of compounds co-ordinated by Sergio Scalise (Linguistics, U. Bologna), see http://morbo.lingue.unibo.it/mmm/enlm.php","","","","","","","","","","","","","","","","","","","","","","","about 30000 (to be checked) word tokens in total","","","","","","","","","","",""
"Johan Habermans land survey register of Pien-Savo district from 1620s (= Ed. Timo Alanen 2004: Johan Habermanin maantarkastusluettelo Pien-Savosta 1620-luvulta)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Timo Alanen, firstname.lastname[a]kotus.fi","","","","","","http://scripta.kotus.fi/www/verkkojulkaisut/julk2/","","","","","","","","","","","","","","","","","","","","","","about 7000 place names, 3 000 personal names","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Land survey register of Sääminki and Rantasalmi parishes from years 1562–1563 (= Ed. Timo Alanen [2006]: Säämingin ja Rantasalmen maantarkastusluettelo vuosilta 1562–1563)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Timo Alanen, firstname.lastname[a]kotus.fi","","","","","","http://scripta.kotus.fi/www/verkkojulkaisut/julk4/","","","","","","","","","","","","","","","","","","","","","","about 6 000 place names, 1 500 personal names","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Lexical Data from the Archive of Modern Finnish","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Elisa Stenvall, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","A detailed account of the intended use should be given when applying for user authorisation. ","","","","","","","","",""
"Lexicon of the Finno-Swedish place name endigs (= Namnledslexicon)","Lexicon / Knowledge Source","Swedish","","","Finland","Research Institute for the Languages of Finland","Outi Lehtinen, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/svenska/ledlex/","","","","","","","","","","","","","","","","","","","","","","","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Modern Finnish Lexicon","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Minna Haapanen, firstname.lastname[a]kotus.fi","","","","","","http://kaino.kotus.fi/sanat/nykysuomi/","","","","","","","","","","","","","","","","","","","","","","about 94 000 headwords","","freely accessible on-line data service, Kaino","","","","","","","","",""
"Origin of the Finnish words (= Suomen sanojen alkuperä)","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Klaas Ruppel, firstname.lastname[a]kotus.fi ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
"Population Register Centre’s register of personal names","Lexicon / Knowledge Source","-- language not in list --||Finnish||Swedish","","","Finland","Research Institute for the Languages of Finland","Toni Suutari, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","The register is available only for scientific research purposes. A detailed account of the intended use should be given when applying for user authorisation.","","","","","","","","",""
"Toponymic Database","Lexicon / Knowledge Source","Finnish","","","Finland","Research Institute for the Languages of Finland","Raija Miikkulainen, firstname.lastname[a]kotus.fi","","","","","","","","","","","","","","","","","","","","","","","","","","","","91 570 place names","","requires user authorisation, access via Unix server (suomi.kotus.fi)","","","","","","","","",""

Topic revision: r4 - 2011-11-14 - KimmoKoskenniemi
 