SPRÅKVIS - VISMANSRAPPORT - EXPERT PANEL REPORT

The Nordic Countries - A Leading Region in Language Technology

Edited by Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

The Nordic Council of Ministers has commissioned a ten-year plan in the form of an expert panel report (= vismansrapport) for making the Nordic Countries a leading region in language technology (LT). LT covers a number of technologies used by computers for processing human language, e.g. spell-checking, machine translation and speech recognition, to name only the best known. Applications are diverse. The aim of the report is to identify the common key areas that need to be addressed to make the Nordic countries a leading region. The report highlights key areas, magnitudes of investments, suggested partners, modes of cooperation and some initial key actions.

The Nordic Council of Ministers has recently concluded a successful LT Research Programme, which is briefly outlined as background information. This investment should be seen in relation to the investments the Nordic Countries have made in university-led LT development projects in Denmark, Finland, Iceland, Norway and Sweden. Information on these was collected from public databases in the Nordic Countries and circulated for comments among the contributors to the report.

We sent a questionnaire to 70 invited experts from the Nordic Countries, collecting comments on an initial vision for LT in 2016 and its prerequisites, on current obstacles to LT development, and on general trends influencing LT development and its applications. In the questionnaire we also asked for recommendations on the order of magnitude of investments and the modes of cooperation needed. Of the invited experts, 30 contributed their comments, which we hereby gratefully acknowledge. When analyzing the background and the comments on the questionnaire, we identified six key areas: LT Policy, LT Resources, LT Research and Development, LT Training and Education, LT Legislation and LT Business Aspects, for which we present our recommendations and an action plan in this Expert Panel Report.


SpråkVis - Språkteknologisk vismansrapport

by Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

Summary

The Nordic Council of Ministers has commissioned a ten-year plan in the form of an expert panel report (vismansrapport) for making the Nordic countries a leading region in language technology. Six key areas have been identified: Policy, Resources, Research and Development, Training and Education, Legislation, and Business Aspects, for which we present recommendations and an action plan in this expert panel report.

Policy: We must spread the insight that language technology holds a key position in preserving and maintaining our languages and our culture. Language technology is needed, for example, in the digital infrastructure for research in the humanities and the social sciences. It makes no difference whether the language technology has been developed academically, as open source or commercially, as long as it exists and the language technology modules are compatible and available for building large systems and applications. Small language communities will not get language technology on commercial grounds, so most (or all) languages in the region need at least some public support, and some may be entirely dependent on it. At the Nordic level, we need to agree on recommendations for how to act at the national level. To assess the situation for language-specific and language-independent resources for the languages of the region, a BLARK report should be prepared that maps the basic language resources in the Nordic countries. The Nordic region needs to keep up to date with developments in the EU in order not to repeat work already done and in order to focus on what is specifically Nordic. The participants of NODALIDA 2005 decided to found an association for speech and language technology, to be called NEALT (Northern European Association for Language Technology). Such an organization would be ideal for coordinating various initiatives and networks.

Resources: The most obvious and most important investment would be to create a suitable infrastructure with sufficient language technology resources for the relevant languages of the region. The resources should be freely usable for research and teaching as well as for commercial product development. Based on the assessment of the situation that emerges from the BLARK report, the most important corpora should be created at the national level, with cooperation at the Nordic level on developing and exchanging important language-independent tools and methods.

Research and Development: Funders of academic research should adopt recommendations and rules for language resources that are created (or have been created) with public funds. It should be normal practice for researchers to make the language resources available to other researchers under conditions and licenses that are as free as possible. Common interfaces and tools should be created in cooperation with both commercial and academic parties.

Training and Education: More cooperation on academic education is needed between the universities in the Nordic and Baltic region. A sufficient number of specialists with doctoral and Master's degrees should master the most advanced skills, and all countries and language groups of the region should take part, including minorities and small language groups.

Legislation: Current copyright legislation makes it unnecessarily difficult and expensive to collect corpora. Certain privileges are currently granted to a few national libraries for archiving electronic copies of books, newspapers, etc., and a similar privilege is needed for creating language technology resources. The legislation should be changed so that it becomes possible to collect text and speech corpora used for research on and development of language technology tools. Using such corpora should be considered compatible with the principles of copyright when republication of the corpora is excluded.

Business Aspects: The license conditions for language technology resources must permit and encourage both commercial and academic use. Medium-term applied research in cooperation between universities and industry should be encouraged.

Action Plan: The aim of the report was to identify key areas, the size of the funding, the parties concerned and forms of cooperation. To realize the goals and to work out more detailed plans and time frames for the areas in the 10-year plan, we propose that resources be allocated for:

  1. establishing NEALT and its working groups
  2. a mandate to prepare BLARK reports for the Nordic languages
  3. Nordic funding of cooperation in language technology training and education
  4. national funding of medium-term applied research in cooperation between universities and industry

Once the BLARK reports have been completed, resources should be allocated under NEALT's coordination for:

  1. Nordic funding of language technology tools based on the recommendations of the BLARK reports
  2. Nordic and national funding of corpora, treebanks and lexicons in accordance with the BLARK reports

-- KristerLinden - 01 Jun 2006


SpråkVis - Language Technology Expert Panel Report

by Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

Summary

The Nordic Council of Ministers has commissioned a ten-year plan in the form of an expert panel report for making the Nordic Countries a leading region in language technology (LT). Six key areas were identified: LT Policy, LT Resources, LT Research and Development, LT Training and Education, LT Legislation and LT Business Aspects, for which we present recommendations and an action plan in this Expert Panel Report.

LT Policy: We need to raise awareness that LT has a key position in protecting and maintaining our languages and our culture. LT is necessary, e.g., for developing a digital infrastructure for research in the humanities and the social sciences. It does not matter whether LT is academic, open source or commercial, as long as it exists and its modules are compatible and available for building large systems and applications. Small language communities will not get LT on a commercial basis alone, so most (or all) languages in the area need at least some public support, and some may be totally dependent on it. At the Nordic level, we need to establish recommendations for action at the national level. To assess the situation for language-specific and language-independent resources for the languages in the area, a Basic Language Resource Kit (BLARK) report for the Nordic languages should be prepared. The Nordic region needs to stay abreast of developments in the EU in order not to duplicate efforts and in order to focus on the aspects that are specifically Nordic. The participants of NODALIDA 2005 decided to establish an association for speech and language technology, to be called NEALT (Northern European Association for Language Technology). Such an association would be ideal for coordinating various initiatives and networks.

LT Resources: The most obvious and substantial investment would be to create an appropriate infrastructure which has sufficient LT resources for the relevant languages of the area. The resources belonging to the infrastructure should be freely available for research and training as well as for commercial product development. Based on the assessment of the situation in the BLARK report, the most urgent gaps in the availability of corpora should be filled using national funding, with cooperation at the Nordic level on developing and exchanging language-independent tools and methods.

LT Research and Development: The academic funding institutions ought to adopt recommendations or rules concerning linguistic resources which will be (or have been) developed using public funding. It ought to be a normal requirement that researchers make the linguistic resources available to the rest of the research community under conditions or licenses that are as free as possible. Common interfaces and tools must be created in cooperation between commercial and academic parties.

LT Training and Education: More cooperation is needed in academic training among the universities in the Nordic/Baltic region. A sufficient number of highly skilled PhDs and Masters ought to be trained with the best possible LT skills, and all countries and language groups should participate, including minorities and small language communities.

LT Legislation: Current copyright legislation makes the collection of resources unnecessarily difficult and costly. Certain privileges are currently granted to a few national libraries for archiving electronic copies of books, journals etc. and similar privileges are needed for creating LT resources. The legislation should be changed so that the collection of text and speech corpora for the purposes of research and development is possible. The use of such corpora should be deemed to conform to the principles of copyright when excluding republication.

LT Business Aspects: The licensing conditions of LT resources must allow and encourage both their commercial and academic use. Medium-term applied research projects involving university and industrial partners should be encouraged.

Action Plan: The aim of the report was to identify key areas, magnitude of funding, parties involved and modes of cooperation. To implement the goals and to further specify the areas and their time-frames in the 10-year plan, we suggest that resources be allocated for:

  1. Establishing NEALT and its working groups
  2. Commissioning BLARK reports for the Nordic languages
  3. Nordic funding for cooperation on LT training and education
  4. National funding of medium-term applied research projects involving university and industrial partners

When the BLARK reports have been delivered, resources coordinated by NEALT should be allocated for:

  1. Nordic funding of LT tools according to the recommendations of the BLARK reports
  2. Nordic and national funding of corpora, treebanks and lexicons based on the BLARK report recommendations

-- KristerLinden - 01 Jun 2006


SpråkVis - Språkteknologisk vismansrapport

Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

Extended Summary

Mandate

The Nordic Council of Ministers and Nordens Språkråd (the Nordic Language Council) commissioned a ten-year plan, in the form of an expert panel report by Prof. Kimmo Koskenniemi and Prof. Torbjørn Nordgård, on how the Nordic (and Baltic) countries can be made a leading region in language technology.

Language technology refers to technology used by computers to process and support the use of human language. Traditional language technologies are spelling and grammar checking, machine translation and speech recognition. End-user applications are many and varied, e.g. writing support in word processing, information retrieval in government portals, dialogues in computer games and consumer electronics, computer-assisted language learning, etc.

The purpose of the report is to identify common key areas for different forms of language technology, the size of the necessary investments, and the cooperation partners and forms of cooperation that create the conditions for making the Nordic region a leading region.

Working Method

We collected financial background information on earlier projects at the Nordic level and in the individual Nordic countries (Denmark, Finland, Iceland, Norway, Sweden) to get an overview of earlier investments. The information was retrieved from public databases in the Nordic countries and verified by invited experts. We also collected policy documents and reports.

We compiled a questionnaire in which we asked experts to comment on and formulate a vision for 2016 and to identify obstacles and trends. We also asked the experts to indicate the size of the necessary measures and investments. We invited 70 experts, of whom 30 responded. Based on these responses, we identified a number of key areas.

We identified six key areas: policy, resources, research and development, training and education, legislation, and business aspects, for which we put forward recommendations in the expert panel report. Finally, we also propose a sequence of actions.

Background

The Nordic Council has just concluded a research programme, "Nordisk Sprogteknologisk Forskningsprogram 2000-2004", intended to raise the profile of the Nordic language community and to ensure good Nordic language technology for users. More specifically, this meant three goals in support of research and research-based teaching:

  • improve communication among Nordic researchers in language technology,
  • improve cooperation in researcher training,
  • establish documentation centers to guarantee access to and dissemination of research results, collected data and developed tools.

To reach these goals, three specific priority areas were chosen:

  • CALL (Computer-Aided Language Learning),
  • CLIM (Cross-Lingual Information Management),
  • NLHCI (Natural Language Human Computer Interaction), i.e. communicating with computers in natural language.

To achieve this, the Nordic Council allocated about 5 million DKK annually (23,278,500 DKK in total), i.e. 0.6 MEUR per year at the Nordic level (3.1 MEUR in total), during 2001-2004.

Investments in the Nordic Countries

To compare research funding in the individual Nordic countries, we searched the public databases of the Nordic countries and chose to look at state funding of university-led projects, since it was available for all the Nordic countries for the period 2003-2005. The figures were verified by circulating them among the experts involved in the report. In general, the basic investments in Sweden, Norway and Denmark have been at the same level per capita. Norway and Iceland, however, have made additional strategic investments in language technology during the period. Compared with the national investments, the Nordic investment has contributed roughly one tenth per capita.

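| Country | Annually | Per inhabitant |
| Denmark | 0.9 MEUR | 0.2 EUR |
| Finland | 2.1 MEUR | 0.4 EUR |
| Iceland | 0.2 MEUR | 0.7 EUR |
| Norway | 3.1 MEUR | 0.7 EUR (0.2 EUR without the additional strategic investment) |
| Sweden | 1.6 MEUR | 0.2 EUR |
| Nordic level | 0.6 MEUR | 0.02 EUR |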

These figures do not include state grants to commercially led research, nor EU-funded research. In total, the individual Nordic countries funded university-led research projects for about 24 MEUR during 2003-2005.
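As a quick consistency check on the total just quoted, the following minimal Python sketch multiplies the annual national amounts from the table by the three years of the 2003-2005 period and sums them. The names `ANNUAL_MEUR`, `YEARS` and `period_total`, and the rounding to one decimal, are illustrative assumptions and not part of the report.

```python
# Annual state funding of university-led LT projects in million EUR,
# copied from the per-country rows of the table (Nordic-level row excluded).
ANNUAL_MEUR = {
    "Denmark": 0.9,
    "Finland": 2.1,
    "Iceland": 0.2,
    "Norway": 3.1,
    "Sweden": 1.6,
}

YEARS = 3  # 2003, 2004 and 2005


def period_total(annual_meur, years=YEARS):
    """Return the summed funding over the period, rounded to 0.1 MEUR."""
    return round(sum(annual_meur.values()) * years, 1)


if __name__ == "__main__":
    print(period_total(ANNUAL_MEUR))
```

With the table values, the sum comes to about 23.7 MEUR, which agrees with the rounded total of about 24 MEUR.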

What Was Done with the Money?

The countries have, however, emphasized different types of language technology. A rough picture of the investments can be obtained by dividing them into, for example, text-based and speech-based technologies. All countries have done something in both categories, but only Norway has invested roughly equally in both.

| Country | Text | Speech |
| Denmark | x | (x) |
| Finland | (x) | x |
| Iceland | x | (x) |
| Norway | x | x |
| Sweden | x | (x) |
| Nordic level | x | (x) |

Denmark

In Denmark, Videnskabsministeriet (the Ministry of Science) funds language technology research through the agency for research, technology and innovation, which provides secretariat services for a number of independent councils. The two councils that handle language technology are the Danish Council for Independent Research and the Danish Council for Strategic Research. During 2003-2005, Denmark spent about 2.6 MEUR, mainly on text-based language technology research.

Finland

In Finland, the two main state funders of research are Finlands Akademi (the Academy of Finland) and TEKES (Finnish Funding Agency for Technology and Innovation). The Academy of Finland is funded by the Ministry of Education, and TEKES is funded by the Ministry of Trade and Industry. During 2003-2005, Finland spent about 6.3 MEUR, with an emphasis on speech technology research.

Iceland

In Iceland, about 0.7 MEUR was invested during 2003-2005, with an emphasis on basic text-based tools and resources.

Norway

In Norway, the main funder of university-led research is Norges forskningsråd (Norwegian Research Council). During 2003-2005, Norway had a strategic research programme for language technology, "Kunnskapsutvikling for norsk språkteknologi" (KUNSTI, 2001-2006), which accounts for 70 % of the funding during the period. In addition, Norway has a number of independent projects. During 2003-2005, Norway spent about 9.2 MEUR, with fairly even coverage of text-based and speech-based language technology research.

Sweden

In Sweden, funding is handled by several bodies, the main ones being the Swedish Research Council, VINNOVA (Swedish Governmental Agency for Innovation Systems) and, to a somewhat lesser extent, Kunskapsstiftelsen (Knowledge Foundation). A strategic investment in language technology was concluded before the chosen comparison period. During 2003-2005, Sweden spent about 4.8 MEUR, mainly on text-based language technology research.

What Should Be Done?

One may ask whether it is appropriate to do at the Nordic level exactly what is done in the individual Nordic countries. Can the work be divided among the countries? There is certainly no shortage of tasks. Are there specifically Nordic and intergovernmental tasks? What should and can be done with public funds at the Nordic level that benefits all parties and at the same time fosters a market for language technology in the Nordic region?

We have identified certain common key areas at the intergovernmental level that create the conditions for making the Nordic region a leading region for different forms of language technology. These key areas are:

  • policy
  • resources
  • research and development
  • training and education
  • legislation and
  • business aspects

Policy

We must spread the insight that language technology holds a key position in preserving and maintaining our languages and our culture. Language technology is needed, for example, in the digital infrastructure for research in the humanities and the social sciences. It makes no difference whether the language technology has been developed academically, as open source or commercially, as long as it exists and the language technology modules are compatible and available for building large systems and applications. We need a language technology infrastructure.

Small language communities will not get language technology on commercial grounds, so most (or all) languages in the region need at least some public support, and some may be entirely dependent on it.

At the Nordic level, we need to agree on recommendations for how to act at the national level. To assess the situation for language-specific and language-independent resources for the languages of the region, a BLARK (Basic Language Resource Kit) report should be prepared that maps the basic language resources in the Nordic countries (10-25 kEUR per language). The Nordic region needs to keep up to date with developments in the EU in order not to repeat work already done and in order to focus on what is specifically Nordic. At the Nordic level we can support what everyone benefits from, i.e. methods, standards and model agreements, whereas corpora and data should be collected at the national level.

The participants of NODALIDA 2005 decided to found an association for speech and language technology, to be called NEALT (Northern European Association for Language Technology). Such an organization would be ideal for coordinating various initiatives and networks (50 kEUR). Of specifically Nordic interest are:

  • starting up and establishing NEALT and an electronic publication under its direction,
  • some form of continuation for the NorDocNet centers (cf. Training and Education),
  • some form of continuation for NGSLT via NordForsk (cf. Training and Education), and
  • individual small projects (coordinated and possibly carried out by NEALT), e.g. to prepare more detailed recommendations for
    • changing the legislation on intellectual property rights (IPR, cf. Legislation),
    • funding institutions, to guarantee access to and reuse of language technology resources created with public funds (cf. Research and Development), and
    • research and/or commercial use of dictionaries and word lists created as part of publicly funded dictionary compilation (cf. Resources).

Resources

The most obvious and most important investment would be to create a suitable infrastructure with sufficient language technology resources for the relevant languages of the region. The resources should be freely usable for research and teaching as well as for commercial product development. Based on the assessment of the situation that emerges from the BLARK report, the most important corpora should be created at the national level, with cooperation at the Nordic level on developing and exchanging important language-independent tools and methods.

Resources for a language technology infrastructure:

  • a ready-made set of modules such as morphological and syntactic analyzers and generators (2-5 MEUR),
  • tools for building modules (2-5 MEUR),
  • corpora, annotated and unannotated (10-15 MEUR per language),
  • lexicons for spoken and written language (10 MEUR per language).

N.B. We must do something to bring down the development costs of corpora and lexicons for language technology research and product development, e.g. through legislation and agreements.

Modules

Both commercially and academically created language technology modules need compatibility and common interfaces so that stand-alone modules and resources can be reused. Language-independent tools can be used to create both modules and resources. Common software interfaces make it possible to use combinations of modules, which promotes interoperable and multilingual products and systems.

Tools

Freely usable and updatable language-independent tools are needed so that investments in language technology are not lost in the long run. Interoperable components and multilingual products can be achieved with such tools. For example, the theory and technology of finite-state automata provide the basis for very efficient and modular implementations of a number of different tasks.
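To make the finite-state idea concrete, here is a minimal Python sketch of a deterministic finite-state transducer that analyzes a few Swedish nouns with the plural ending "-ar". It is an illustration under stated assumptions, not a tool described in the report: the toy lexicon, the tags `+N+Sg` and `+N+Pl`, and the function names `build_noun_analyzer` and `analyze` are all hypothetical, and the builder assumes that no stem is empty or a prefix of another stem.

```python
START, STEM_END, PLURAL_A, PLURAL_END = 0, 1, 2, 3


def build_noun_analyzer(stems):
    """Build a transducer for stems followed by an optional plural '-ar'.

    Assumes no stem is empty or a prefix of another stem (toy lexicon only).
    """
    transitions = {}  # (state, input char) -> (next state, output string)
    next_free = 4     # states 0-3 are reserved above
    for stem in stems:
        state = START
        for i, char in enumerate(stem):
            is_last = i == len(stem) - 1
            key = (state, char)
            if key not in transitions:
                # All stems end in the shared state STEM_END, so the
                # suffix part of the machine is stored only once.
                target = STEM_END if is_last else next_free
                if not is_last:
                    next_free += 1
                transitions[key] = (target, char)
            state = transitions[key][0]
    # The plural suffix is consumed without copying it to the output.
    transitions[(STEM_END, "a")] = (PLURAL_A, "")
    transitions[(PLURAL_A, "r")] = (PLURAL_END, "")
    # Accepting states and the tag string emitted on acceptance.
    finals = {STEM_END: "+N+Sg", PLURAL_END: "+N+Pl"}
    return transitions, finals


def analyze(word, transitions, finals):
    """Return the analysis string for word, or None if it is rejected."""
    state, output = START, []
    for char in word:
        step = transitions.get((state, char))
        if step is None:
            return None
        state, emitted = step
        output.append(emitted)
    if state not in finals:
        return None
    return "".join(output) + finals[state]


if __name__ == "__main__":
    fst = build_noun_analyzer(["bil", "stol", "båt"])
    for word in ["bil", "bilar", "båtar", "hus"]:
        print(word, "->", analyze(word, *fst))
```

The modularity comes from the shared `STEM_END` state: adding a stem extends only the stem part of the machine, while the suffix transitions are reused unchanged, and lookup time is linear in the length of the word.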

Corpora

Speech and text corpora and their combinations are necessary as a starting point for many types of language technology modules and applications. The required quantity of processed corpus data has grown by several orders of magnitude in recent years, as methods have been developed by which computers can learn automatically from data. Different types of annotation of corpus data are necessary for different methods and research purposes. Access to corpus material often excludes commercial use of the end result, which makes it impossible to develop reusable language modules. Common model contracts for collecting copyright-protected corpus data, guaranteeing that the material can be used in appropriate ways, should be created for all the Nordic countries; this could reduce the development costs of language modules considerably.

Lexicons

Dictionaries and dictionary material developed with public funds should be published as open source so that they can be used to create language technology modules such as morphological and syntactic analyzers. More specifically, word lists with part of speech and inflection class should be made usable as freely as possible for both academic and commercial purposes. The full text of published dictionaries may be reserved for academic use, but there must be no restrictions on methods, rules and programs developed on the basis of such material, provided they do not contain pieces protected by the copyright of the original.

Research and Development

Funders of academic research should adopt recommendations and rules for language resources that are created (or have been created) with public funds. It should be normal practice for researchers to make the language resources available to other researchers under conditions and licenses that are as free as possible, which can be supported with model agreements (50 kEUR).

In addition, we should consider opening up language technology resources developed with public funds in order to build a Nordic language technology infrastructure. By way of comparison, we do not build publicly funded roads solely for private use either!

Common interfaces and tools should be created in cooperation with both commercial and academic parties. We should develop API standards, quality standards and test methods for quality assessment of finished modules (15 MEUR).

At the national level, there should also be investment in applications and further development for various specialist areas where the individual countries have core competence, divided between basic research (15 MEUR) and applied research (50-80 MEUR).

Training and Education

More cooperation on academic education is needed between the universities in the Nordic and Baltic region. As part of the Nordic language technology research programme, NorDocNet was started in the five Nordic countries; it should be continued and extended to a more international dimension, such as http://www.lt-world.org/, or as a Baltic or a joint Nordic-Baltic effort.

A sufficient number of specialists with doctoral and Master's degrees should master the most advanced skills, and all countries and language groups of the region should take part, including minorities and small language groups.

To support training and education, we should:

  • document existing resources (1 MEUR),
  • develop material for teaching formal language knowledge in schools (1 MEUR),
  • produce introductory material for distance training of IT industry staff in language technology (50 kEUR),
  • publish a scientific journal on the Internet for NEALT (50 kEUR),
  • diversify and specialize Master's education through distance teaching, exchange programmes and joint degree programmes (2 MEUR),
  • coordinate doctoral education: NGSLT (1 MEUR).

Legislation

Current copyright legislation makes it unnecessarily difficult and expensive to collect and annotate text and speech corpora. Certain privileges are currently granted to a few national libraries for archiving electronic copies of books, newspapers, etc., and a similar privilege is needed for creating language technology resources. The legislation should be changed so that it becomes possible to collect text and speech corpora used for research on and development of language technology tools. Using such corpora should be considered compatible with the principles of copyright when republication of the corpora is excluded. A working group to pursue the matter should be set up (10 kEUR). This could make it more productive to collect speech and text corpora by guaranteeing wider dissemination and better possibilities for using research material collected by different centers (e.g. national language banks) or by allowing individual researchers to exchange material.

In addition, we must in various ways counteract the tendency to grant software patents on obvious or already published solutions and ideas.

Business Aspects

The license conditions for language technology resources must permit and encourage both commercial and academic use. Medium-term applied research in cooperation between universities and industry should be encouraged nationally in order to create applications that make use of language technology (5 MEUR).

The market for more ambitious language technology applications could be stimulated by allocating funds for the public sector to develop services with language technology aids for its own use (5 MEUR).

Action Plan

The aim of the report was to identify key areas, the size of the funding, the parties concerned and forms of cooperation. To realize the goals and to work out more detailed plans and time frames for the areas in the 10-year plan, we propose that resources be allocated for:

  1. establishing NEALT and its working groups,
  2. a mandate to prepare BLARK reports for the Nordic languages, taking stock of existing language resources and resource needs,
  3. Nordic funding of cooperation in language technology training and education,
  4. national funding of medium-term applied research in cooperation between universities and industry.

Once the BLARK reports have been completed, resources should be allocated under NEALT's coordination for:

  1. Nordic funding of language technology tools based on the recommendations of the BLARK reports,
  2. Nordic and national funding of corpora, treebanks and lexicons in accordance with the BLARK reports.

-- KristerLinden - 21 Aug 2006


SpråkVis - Language Technology Expert Panel Report

by Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

Extended Summary

The Nordic Council of Ministers has commissioned a ten-year plan in the form of an expert panel report for making the Nordic Countries a leading region in language technology (LT). Six key areas were identified: LT Policy, LT Resources, LT Research and Development, LT Training and Education, LT Legislation and LT Business Aspects, for which we present recommendations in this Expert Panel Report. Finally, we also suggest an action plan.

LT Policy

We need to raise awareness that LT has a key position in protecting and maintaining our languages and our culture. LT is necessary, e.g., for developing a digital infrastructure for research in the humanities and the social sciences. It does not matter whether LT is academic, open source or commercial, as long as it exists and its modules are compatible and available for building large systems and applications. Small language communities will not get LT on a commercial basis alone, so most (or all) languages in the area need at least some public support, and some may be totally dependent on it. At the Nordic level, we need to establish recommendations for action at the national level. To assess the situation for language-specific and language-independent resources for the languages in the area, a Basic Language Resource Kit (BLARK) report for the Nordic languages should be prepared. The Nordic region needs to stay abreast of developments in the EU in order not to duplicate efforts and in order to focus on the aspects that are specifically Nordic. The participants of NODALIDA 2005 decided to establish an association for speech and language technology, to be called NEALT (Northern European Association for Language Technology). Such an association would be ideal for coordinating various initiatives and networks.

Action areas, where Nordic funding is needed instead of national funding, are:

  • establishing and starting NEALT and establishing a scientific electronic journal by NEALT,
  • some form of continuation for the Nordic LT documentation centers, see awareness under LT Training and Education,
  • some continuity for the NGSLT, by NordForsk, see LT Training and Education, and
  • individual small-scale projects (possibly carried out and coordinated by NEALT) e.g. to prepare more detailed recommendations for
    • altering the legislation of intellectual property rights (IPR, see LT Legislation),
    • guidelines for funding agencies to guarantee access and reuse of LT resources created with public funding (see LT Research and Development), and
    • guidelines for research and/or commercial use of dictionaries and word lists created as part of publicly funded dictionary compilation (see LT Resources).

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| NEALT start-up | 50 kEUR | NMR for funding | association, working groups |
| BLARK Report | 10-25 kEUR per language | NorDokNet, NEALT | national projects coordinated at the Nordic level |

LT Resources

The most obvious and substantial investment would be to create an appropriate infrastructure which has sufficient LT resources for the relevant languages of the area. The resources belonging to the infrastructure should be freely available for research and training as well as for commercial product development. Based on the assessment of the situation in the BLARK report, the most urgent gaps in the availability of corpora should be filled using national funding, with cooperation at the Nordic level for developing and exchanging language-independent tools and methods.

LT modules

Both commercially and academically created LT modules need compatibility and capabilities for reusing other modules and resources. Language-independent tools can be used for creating both kinds of modules, and common API interfaces make it possible to utilize module combinations in order to facilitate interoperable and multilingual products and systems.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Openly available LT modules and common APIs | 2-5 MEUR | open source community, universities, public and private institutions, NEALT | Nordic LT network |

LT tools

Freely usable, language-independent, state-of-the-art tools are needed so that investments in LT modules are not lost in the long term. Interoperable components and multilingual products and systems can be achieved through such tools. For example, finite-state technology provides very efficient and modular implementations for a number of tasks.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Openly available LT tool | 2-5 MEUR | open source community, universities, public and private institutions, NEALT | Nordic LT network |

LT corpora

Speech and text corpora and their combinations are necessary starting points for many types of LT modules and applications. The required quantities have grown in magnitude. Different levels of annotation are necessary for various methods and research topics. The availability of corpus material is often too restricted, excluding all commercial use and, at the same time, any development of LT modules. Model contracts for collections of copyright-protected corpora should be created for all countries, and these model contracts should guarantee the necessary ways to use the materials.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Model contracts | 50 kEUR | research organizations, lawyers, NEALT | networking across countries |
| Corpus collection, written text | 10-15 MEUR per language | universities, NEALT | networking across countries |
| Corpus collection, spoken data | 10-20 MEUR per language | universities, NEALT | networking across countries |

LT lexicons

Dictionary materials which have been developed with public funding ought to be published as open source material so that they can be used for creating LT modules such as parsers and analyzers. More specifically, lists of headwords annotated with part of speech and inflectional class should be made available under very free conditions permitting their use in both academic and commercial contexts. The full text of dictionaries published as books may be reserved for academic use, but there must not be limitations on further use of methods, rules or programs which have been developed using such material, provided that they do not contain parts infringing on the copyright of the original work.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Lexicon development | 10 MEUR per language | universities, NEALT | networking across countries |

LT Research and Development

The academic funding institutions ought to adopt recommendations or rules concerning linguistic resources which will be (or have been) developed using public funding. It ought to be a normal requirement that the researchers make the linguistic resources available for the rest of the research community with as free conditions or licenses as possible. In addition we may need to open up language resources on all levels (lexicons, grammars, written language corpora and speech corpora, etc.) which have been created through public funding. Common interfaces and tools must be created in cooperation between both commercial and academic parties.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Recommendations for research result materials | 50 kEUR | funding organizations, universities, NEALT | working groups |
| Joint effort for standardization | 15 MEUR | universities, industry, NEALT | academia/industry collaboration |
| Basic technology research | 15 MEUR | universities | joint programme, researcher exchange, workshop, division of research tasks |
| R&D Funding | 50-80 MEUR | universities, research institutes, industry | Nordic projects |

The R&D funding can be further specified into various fields of services and applications for society.

LT Training and Education

As a part of the Nordic Language Technology Research Program 2000-2004, an LT documentation centre was established in each of the five Nordic countries. Some continuation for them is needed, either in conjunction with some world-wide effort such as LT World or as a Nordic or Nordic-Baltic effort. More cooperation is needed in academic training among the universities in the Nordic/Baltic region. A sufficient number of PhDs and Masters ought to be trained to the highest possible level of LT skills, and all countries and language groups should participate, including minorities and small language communities.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Nordic LT documentation | 1 MEUR | NMR | network of LT documentation centres |
| NEALT Journal start-up | 50 kEUR | NEALT, Nordisk Publiceringsnämnd | scientific electronic journal |
| Coordinated PhD education | 1 MEUR | Nordic/Baltic universities | NGSLT |
| Master's level education | 2 MEUR | Nordic/Baltic universities | distance education, exchange programs for teachers and students, common curriculum |
| Distance learning courses for commercial developers | 50 kEUR | Nordic/Baltic universities | production of the material |
| Popularization | 1 MEUR | R&D, Government, Industry, Secondary Education | professional PR assignment |

LT Legislation

The development of LT tools depends on the availability of language resources such as corpora. Current copyright legislation makes the collection of resources unnecessarily difficult and costly. Certain privileges are currently granted to a few national libraries for archiving electronic copies of books, journals etc., and similar privileges are needed for creating LT resources. The legislation should be changed so that collecting, annotating and sharing text and speech corpora for the purposes of research and development becomes easier. The use of such corpora should be deemed to conform to the principles of copyright as long as republication is excluded. Changing the copyright legislation would make collecting corpora more productive by guaranteeing that corpora and annotated material are available for research and development purposes. Availability can be achieved either by allowing centres (such as national language banks) to share materials with each other or by allowing individual researchers to share them.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Preparation of changes in the legislation | 10 kEUR | relevant ministries, universities, NEALT | working groups |

LT Business Aspects

The licensing conditions of LT resources must allow and encourage both their commercial and academic use. Medium-term applied research projects together with industrial partners should continue. Funding should be provided for creating and purchasing LT applications and services for the public sector. This funding is intended to stimulate market uptake of LT services and applications. Such services could include more ambitious goals using LT-enhanced applications.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| LT module uptake | 5 MEUR | industry and universities | action plan managed at Nordic level |
| Web services | 5 MEUR | industry and universities | academia/industry collaboration |

Action plan

The aim of the report was to identify key areas, magnitude of funding, parties involved and modes of cooperation. However, we are still left with questions regarding further specification of the plans as well as priorities and time-frames within the 10-year period. Some answers have been sketched for the organization of the work, but more detail is needed as well as some further consideration of the division of national and Nordic funding. To implement the goals and to further specify the areas and their time-frames in the 10-year plan, we suggest the following steps in allocating resources:

  1. Establishing NEALT and its working groups
  2. Commissioning BLARK reports for the Nordic languages
  3. Nordic funding for cooperation on LT training and education
  4. National funding of medium-term applied research projects involving university and industrial partners

When the BLARK reports have been delivered, resources coordinated by NEALT should be allocated for

  1. Nordic funding of LT tools according to the recommendations of the BLARK reports
  2. Nordic and national funding of corpora, treebanks and lexicons based on the BLARK report recommendations

-- KristerLinden - 18 Jun 2006


Mandate

To assess the situation for language-specific and language-independent resources for the languages in the area, a Basic Language Resource Kit (BLARK) report for the Nordic languages should be prepared and the most urgent gaps in availability of corpora should be filled in using national funding with cooperation on the Nordic level for exchanging best practices, whereas gaps in tools and methods could be filled in using funding on a Nordic level (see LT Resources). There are plenty of gaps and they must be filled with public funding in most cases. Some languages exist in several countries and it is especially important that the allocated resources be coordinated on a Nordic level for these languages.

The Nordic region needs to stay abreast of developments in the EU in order not to duplicate efforts and in order to focus on the aspects that are specifically Nordic. For this purpose it is important to keep in contact with organizations like CLARIN, whose aim is to establish an integrated and interoperable research infrastructure of language resources and technology by overcoming the current fragmentation and offering a stable, persistent, accessible and extendable digital language infrastructure.

Comment:

  • "Language technology is important for developing a digital infrastructure for the whole field of research in the humanities (and to some extent also the social sciences). Language technology can contribute methods and tools for collecting, structuring, annotating, storing, managing and making available large digital text and speech databases of importance to many disciplines such as linguistics, literary studies, philosophy, philology, etc. Language technology can also contribute knowledge about how to find and search in them. CLARIN's vision is that language technology should gain such a key role in the infrastructure of humanities research within the EU. This would change the view of language technology from an odd and marginal area into an important area with consequences for the progress of humanities research. This is something that offers great opportunities for Nordic language technology as well." -- Rickard Domeij

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| NEALT start-up | 50 kEUR | NMR for funding | association |
| BLARK Report | 10-25 kEUR per language | NorDokNet, NEALT | national projects coordinated at the Nordic level |

-- KristerLinden - 12 Jun 2006


LT resources

Current situation in 2006

On the whole, there is a shortage of adequate LT resources both in terms of their quantity and quality. There are not enough speech and text corpora, especially those with proper annotation, i.e. treebanks. Programs or LT modules exist for many languages, but they are incompatible. Some necessary tools for building LT modules and parsers are not available or they have severe restrictions on their use. On the whole, the environment is far from favorable for LT research and product development.

Language resources are an essential part of the LT infrastructure, and they are necessary for building further parts of the infrastructure. Corpora and dictionaries are necessary and useful in building parsers and analyzers, and they are equally useful for statistically oriented and rule based LT methods whether they are used for academic or commercial purposes. The language resources are also needed for creating new applications and products. Furthermore, language resources are often needed for evaluating the performance and quality of applications and systems.

In most countries, there are few public funding channels suitable for building LT infrastructure and LT resources, because LT resources are neither like machinery or equipment, nor comparable to commercial product development, nor even like ordinary basic research. LT infrastructure is more like ongoing public service processes or road building and maintenance, so new forms of funding are needed.

Comments:

  • Obstacles are the availability of adequate language resources and the access to existing language resources.
  • The proprietary nature of many LT resources for the region's languages is a major weakness: language processing resources as well as lexica and other databases are only made available to a few persons and groups, often at very high price levels (remarkably, this also applies to resources that have been developed with public funding).
  • Existing resources are not necessarily adapted to LT purposes.
  • For further development, we need willingness to fund and maintain and renew already established resources.
  • An infrastructure to support the distribution of the language resources will also be needed; it may be centralized or distributed, but it has to be set up. This could be a Nordic effort, or it could be done at a European level (e.g. by making special agreements with ELRA, or by joining other initiatives).
  • It is also important to assist smaller language communities in building basic resources.

Currently there are ongoing efforts to create open-source runtime support for LT modules, e.g. spellers for OpenOffice. We need additional efforts to create open source tools for building LT modules. The LT modules built with open source tools can either be proprietary or open source.

Comments:

  • Business-friendly open source alternatives such as MIT or LGPL licenses should be promoted.
  • We should remember that open source does not necessarily imply free of charge; it only implies access to the source code.
  • When financing research, it is important to have explicit requirements on making the results and resources available.
  • To be able to share information and speed up development the infrastructure development needs to be accompanied by analysis software and methods for easy access.
  • The announcement of an open source project does not necessarily create a community of users who take part in the development, and national funding programmes would not be sufficient to support 'various application areas'. What is needed is one or two focused projects that invite (i) public funding, (ii) private funding, and (iii) public interest (i.e. a community of 'volunteers'). An example may be something like a talking robot that any user could teach new words, or new languages.
  • A coordinating function is an important prerequisite for organizing cooperation and resolving conflicts of interest between researchers, industry, and IPR owners when making resources publicly available.
  • Quality assessment and quality assurance of LT resources and products are underdeveloped disciplines.

LT modules

Parsers, analyzers, taggers, recognizers, generators and other LT modules exist for major Nordic languages - for some languages there are even several competing modules. Most of them are proprietary and some can be licensed either for academic use or for commercial use - but usually as binaries which cannot and may not be modified. For different applications and for research, the ability to modify and tune would often be necessary. There seem to be excessive obstacles in the further development and integration of LT modules.

Comments:

  • The LT modules are often incompatible with each other using different application programming interfaces and different tags and tagging principles.
  • The further development and variation of existing LT modules for research or production purposes is mostly possible only for the owner, who usually has no interest in developing the product further at its own cost and initiative. Development may be possible if a customer pays the costs.
  • Using LT modules in different applications might require changes or further development, but this may result in a stalemate.
  • Even if the source code is available, the LT modules are often built on different principles, using different tools.
  • Applications for a wide Nordic audience presuppose that LT modules are developed for the smaller Nordic communities (Greenlandic, Faroese, Sámi, etc.).

LT tools

The tools include generic programs for building parsers, analyzers, taggers, recognizers, generators and other LT modules. Several tools represent substantial development efforts, sometimes up to 100 person years. Currently, many widely used LT tools are proprietary. Open source tools exist, but they represent lesser efforts (maybe 2 to 5 person years per tool). Even if they are less complete and mature, their availability is guaranteed with no time limits and there are no restrictions on the use of LT modules created with them.

Comments:

  • There are no guarantees for the long term availability of proprietary tools. Even big companies may lose their interest in them while still preventing others from getting them. In the worst case, those companies may go bankrupt, and it may become extremely difficult or impossible to extend the licenses.
  • SMEs do not have enough capacity to develop good LT tools or compile full dictionaries themselves even for official languages, not to mention languages for smaller communities.
  • We lack learner tools and tools adapted to the requirements of the mobile handset industries.
  • Proprietary solutions and tools will always exist, and innovative applications will often require that new tools and methods are developed.
  • One reason why the tools are incompatible is that we disagree on what is the best solution, but the disagreement shrinks as the functionality criterion grows in importance.

LT corpora and treebanks

Corpus resources include at least written language corpora, speech corpora, and multimedia corpora combining text and/or speech with video recording. Corpora may contain annotation to varying degrees including e.g. morphological, syntactic and pragmatic information. All Nordic corpus and treebank collections are modest in their volume. Some languages lack treebanks almost entirely.

Comments:

  • Parallel texts and corpora (raw as well as annotated) are important because they are necessary in order to further develop or evaluate monolingual and multilingual lexicons, taggers, parsers, and many other resources and tools.
  • Currently one of the most significant obstacles is lack of linguistically annotated data.
  • Large annotated and manually checked corpora with e.g. syntactic and semantic information are scarce or non-existent.
  • Linguistic research is needed on spoken language varieties (registers, dialects, non-native) and on non-standard written varieties (computer-mediated communication, non-native, borderline literate).
  • Other language resources, i.e. huge amounts of speech and text, also need to be available.

LT lexicons

Lexicons contain lexical information. In simpler cases they are just word lists containing entry words from some (possibly printed) dictionary and their part-of-speech and inflectional codes. Sometimes the full text of the word definitions is included. Dictionaries may be monolingual or bilingual. Publishers and compilers of dictionaries usually do not provide their dictionary material for academic purposes, because they fear that electronic copies of their dictionaries might be used for competing products or publications. On the whole, the lack of electronic dictionaries with sufficiently free terms for modification is severe.

Comments:

  • To the extent that there are proprietary lexicon resources, it should be considered if, and how (and to what extent) such resources can be made publicly available.
  • SMEs do not have the capacity to develop tools or dictionaries on their own even for official languages, not to mention languages for smaller communities.
  • Dictionaries for LT research and LT module development must often be created from scratch (and they remain less comprehensive). Current methods in LT can make the collection of dictionary content easier, but still, the duplication is a waste of effort.
  • Lexicon development should be done with speech technology in mind, i.e. lexicons should include phonetic information, such as phonetic transcriptions and stress.

Vision for 2016

In 2016, a common understanding has been reached about the domain of LT infrastructure vs. applications and products, and an understanding of the roles of the public and commercial sectors has been established. The public sector has found ways to allocate the necessary and sufficient funds to develop the resources of the LT infrastructure. A relevant infrastructure has been developed for both text and speech to cover all languages and dialects in the region, and the data has been properly annotated at all levels. Building on the open-source lexicons and open-source tools, the next step would naturally be to harmonize these resources to really benefit from one another.

Recommendations

The most obvious and substantial investment would be to create an appropriate infrastructure which has sufficient LT resources for the relevant languages of the area, in such a manner that they can be used freely for research, for training and for creating commercial products. The function of assessing quality and setting up quality standards should be part of the coordination and reviewing work by NEALT.

Based on the assessment of the situation in a Basic Language Resource Kit (BLARK) report for the Nordic languages, the most urgent gaps in the availability of corpora should be filled using national funding, with cooperation at the Nordic level for exchanging best practices, whereas gaps in tools and methods could be filled using funding at the Nordic level. In addition, one should consider opening up language resources on all levels (lexicons, grammars, written language corpora and speech corpora, etc.) which have been created through public funding.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Basic Language Resource Kit | 5-10 MEUR per language | Universities, research institutes, industry, NEALT | National projects coordinated at the Nordic level, exchange of researchers |

These investments can be further subdivided into the areas related to LT modules, LT tools, LT corpora and LT lexicons.

LT modules

Both commercially and academically created LT modules need compatibility and capabilities for reusing other modules and resources. Language-independent tools can be used for creating both kinds of modules, and common API interfaces make it possible to utilize module combinations in order to facilitate interoperable and multilingual products and systems.

  • Distributed openly available modules and APIs
  • Interoperability of language modules and tools
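
As a purely illustrative sketch of what a common API could mean in practice (the report does not define one), the Python fragment below assumes a hypothetical `Analyzer` interface, a hypothetical `Token` record and a shared toy tagset. All names and the two miniature lexicons are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Token:
    surface: str  # word form as it appears in the text
    lemma: str    # base form
    pos: str      # part-of-speech tag from a shared (hypothetical) tagset


class Analyzer(Protocol):
    """Hypothetical common interface implemented by every LT module."""

    language: str

    def analyze(self, text: str) -> List[Token]:
        ...


class ToyLexiconAnalyzer:
    """Toy module: looks each word up in a small word list."""

    def __init__(self, language: str, lexicon: Dict[str, Tuple[str, str]]) -> None:
        self.language = language
        self._lexicon = lexicon

    def analyze(self, text: str) -> List[Token]:
        tokens = []
        for word in text.lower().split():
            # Unknown words keep their surface form and get the tag "X".
            lemma, pos = self._lexicon.get(word, (word, "X"))
            tokens.append(Token(word, lemma, pos))
        return tokens


# Modules from different providers can be combined because they share the API.
MODULES: List[Analyzer] = [
    ToyLexiconAnalyzer("sv", {"hundar": ("hund", "NOUN"), "springer": ("springa", "VERB")}),
    ToyLexiconAnalyzer("fi", {"koirat": ("koira", "NOUN"), "juoksevat": ("juosta", "VERB")}),
]
REGISTRY: Dict[str, Analyzer] = {module.language: module for module in MODULES}


def analyze_with(registry: Dict[str, Analyzer], language: str, text: str) -> List[Token]:
    """Dispatch to whichever module is registered for the language."""
    return registry[language].analyze(text)
```

An application written against `Analyzer` would not need to change when a module for another language, or a better module for the same language, is added to the registry.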

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Openly available modules with common APIs | 2-5 MEUR | open source community, universities, public and private institutions, NEALT | Nordic LT network |

LT tools

Freely usable, language-independent, state-of-the-art tools are needed so that investments in LT modules are not lost in the long term. Interoperable components and multilingual products and systems can be achieved through such tools. For example, finite-state technology provides very efficient and modular implementations for a number of tasks.
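
To make the finite-state remark concrete, here is a minimal sketch of a finite-state transducer in Python. It is a toy written for this illustration, not one of the tools the report has in mind: the state names, the `+Sg`/`+Pl` tags and the three-noun Swedish lexicon are all assumptions. The engine (`generate`) is language-independent, while the language-specific knowledge lives entirely in the transition table.

```python
from typing import Dict, List, Optional, Tuple

# States of the toy transducer. AR_CLASS and OR_CLASS are inflection classes
# shared by all nouns that take the same plural ending.
START, AR_CLASS, OR_CLASS, FINAL = 0, 1, 2, 3

# (state, input symbol) -> (next state, output string)
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, str]] = {
    (START, "bil"): (AR_CLASS, "bil"),
    (START, "stol"): (AR_CLASS, "stol"),
    (START, "flicka"): (OR_CLASS, "flick"),
    (AR_CLASS, "+Sg"): (FINAL, ""),
    (AR_CLASS, "+Pl"): (FINAL, "ar"),
    (OR_CLASS, "+Sg"): (FINAL, "a"),
    (OR_CLASS, "+Pl"): (FINAL, "or"),
}


def generate(symbols: List[str]) -> Optional[str]:
    """Map an analysis such as ["bil", "+Pl"] to a surface form.

    Returns None if the input is not accepted by the transducer.
    """
    state = START
    output: List[str] = []
    for symbol in symbols:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return None  # no transition: unknown lemma or misplaced tag
        state, piece = step
        output.append(piece)
    return "".join(output) if state == FINAL else None
```

Adding a noun costs one line that points it at an existing inflection class, and the same `generate` function would serve a transition table for any other language, which is the kind of modularity the paragraph above refers to.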

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Openly available LT tool | 2-5 MEUR | Open source community, universities, public and private institutions, NEALT | Nordic LT network |

LT corpora

Speech and text corpora and their combinations are necessary starting points for many types of LT modules and applications. The required quantities have grown in magnitude. Different levels of annotation are necessary for various methods and research topics. The availability of corpus material is often too restricted, excluding all commercial use and, at the same time, any development of LT modules. Changing the copyright legislation would make the collecting, annotating and sharing of corpora for research purposes more fruitful (see LT Legislation).

Model contracts for collections of copyright-protected corpora should be created for all countries, and these model contracts should guarantee the necessary ways to use the materials including:

  • sufficient rights for the end users to create LT modules and other results (which do not infringe on the copyright of the works),
  • permission to create LT modules both for academic and for commercial purposes,
  • ability to deposit the compiled corpus with one (or a restricted number of) computing centre(s) protecting the corpora from unauthorized access, and
  • permission to use the corpora according to an agreement granted by the compiling party.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Model contracts | 50 kEUR | Research organizations, lawyers, NEALT | Networking across countries |
| Corpus collection, written text | 10-15 MEUR per language | Universities, NEALT | Networking across countries |
| Corpus collection, spoken data | 10-20 MEUR per language | Universities, NEALT | Networking across countries |

LT lexicons

Dictionaries which have been developed with public funding ought to be published as open source material so that they can be used for creating LT modules such as parsers and analyzers. Lexemes including the part of speech and inflectional codes as well as other mark-up should be moved to the open source domain so that anybody can alter and make use of them for research or commercial purposes. More specifically, lists of headwords annotated with part of speech and inflectional class should be made available under very free conditions permitting their use in both academic and commercial contexts. The full text of dictionaries published as books may be reserved for academic use, but there must not be limitations on further use of methods, rules or programs which have been developed using such material, provided that they do not contain parts infringing on the copyright of the original work.

| Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation |
| Lexicon development | 10 MEUR per language | Universities, NEALT | Networking across countries |

-- KristerLinden - 12 Jun 2006


LT Research and Development

Current situation in 2006

Multilingualism and the interplay between academic and research parties make reuse and interoperability more difficult and demanding than is customary in other environments. Several aspects have to be taken care of:

  • Awareness of existing standards, recommendations and standardization efforts should be promoted.
  • Documentation of the resources and the annotation and coding used in them is vital.
  • Standardization of resources and APIs, as well as tools for interchange and conversion of data from one format to another should be readily available.
  • Knowledge and information for integrating LT with other technologies and design disciplines should be easily accessible.
  • Lack of low cost language resources for most small languages is a major obstacle for both research and development.
  • Lack of cooperation between different research groups is a weakness in the region (both nationally and regionally).

We need stimulating LT research for various application areas. National funding programs should provide the basis, and a Nordic/Baltic framework program for networking could provide the necessary regional infrastructure and communication.

Comments:

  • Preference should be given to research funding that integrates all research groups in a given area for a given country or for the Nordic area, rather than to a centralized funding approach.
  • Sufficient funding is needed both for long-term (university) research and for support of industrial development.
  • Good progress in the LT field needs support for joint projects and networks on the Nordic level.
  • In addition to open source, we also need open standards and publicly available APIs.

Vision for 2016

In 2016, basic tools and resources are available as open source and provide a platform for further innovation and new products, thanks to a substantial economic effort by the governments of the Nordic and Baltic countries. The availability of the necessary language resources improves the quality of LT research and application development, and LT research and applications can develop freely in several directions in a stimulating research and business environment. Mono- and multilingual LT modules with uniform APIs for a wide array of languages are smooth and easy to integrate into software products and services. LT modules are integrated in multimedia systems (e.g. aligned with video systems for video retrieval) and the quality of use of LT systems is high, so that the citizens of the region are able to access software-mediated services in their mother tongue. Permanent LT research and development forums have been set up in the bigger Nordic countries in support of Nordic and Baltic languages with lesser volume in economic as well as human terms. For public funding of research and development projects, it is required that the projects either make the publicly funded efforts openly available or contribute resources to some ongoing open source software project.

Recommendations

The academic funding institutions ought to adopt recommendations or rules concerning linguistic resources which will be (or have been) developed using public funding. It ought to be a normal requirement that the researchers make the linguistic resources (e.g. tools and annotated corpora) available for the rest of the research community with as free conditions or licenses as possible. There ought to be a common goal in all Nordic countries to collect, produce and make available linguistic resources using terms which allow both academic use and the use of the resources for creating language technological products, even commercial ones, provided that the resources are used within the limits of copyright laws. In addition we may need to open up language resources on all levels (lexicons, grammars, written language corpora and speech corpora, etc.) which have been created through public funding. Common interfaces and tools should be created in cooperation between both commercial and academic parties.

Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation
Recommendations for research result materials | 50 kEUR | funding organizations, universities, NEALT | working groups
Joint effort for standardization | 15 MEUR | universities and industry | Academia/industry collaboration
Basic technology research | 15 MEUR | Universities | Joint programme, Researcher exchange, workshop, division of research tasks
R&D Funding | 50-80 MEUR | Universities, Research institutes, industry | Nordic projects

The R&D funding can be further specified into various fields of services and applications for the society:

  • (statistical) machine translation and automatic methods for multilingual information processing
  • information retrieval
    • public information tools adapted to the mobile life of users
    • cross-language information retrieval (CLIR) tools, focused CLIR tools for recent immigrants
    • bioinformatics
  • speech technology in multimodal applications
  • language learning

Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation
Several | 5-10 MEUR per area | public bodies, research partners, industry | projects

-- KristerLinden - 12 Jun 2006


LT Training and Education

Current situation in 2006

All parties, i.e. researchers, teachers and students as well as developers of applications and products in commercial companies, need to be aware of the basic possibilities of LT and of where to find resources, partners and other information. Contact information and references must be available with up-to-date facts and pointers. The Nordic area, and especially the Nordic-Baltic area, is not so small that all parties would know each other in advance.

Comments:

  • In several Nordic countries, formal language knowledge in schools has been a low priority over several decades, which may hamper LT development and market uptake in the long run due to lack of basic formal linguistics skills.
  • Potential users in all sectors and walks of life must be convinced that LT is something they need. Only powerful demand from the public will make politicians prioritize the area in question.
  • It is necessary to raise public awareness about the importance of LT in our daily lives, and to get commercial companies interested in LT research and development.
  • We need commercial and industrial recognition of the advantages of LT and a broad involvement of these parties through all phases of development.
  • Development and deployment of LT modules presupposes a technical staff with a high level of competency in computational linguistics.
  • Documentation of language resources is a prerequisite, if they are to be open source. If the user does not understand the categories used, he/she will fail in the use of the data and in their further development.

Each Nordic (and Baltic) country is a rather small unit for creating curricula for Master's level and PhD level teaching in language and speech technology. Some have well-established Bologna system Bachelor's and Master's level studies available, but perhaps equally many cannot offer such education in their own country. The first level PhD courses offered by the Swedish GSLT have actually been courses which could be part of a Master's programme in LT, and they have been used by students from countries where LT is not offered at the MA/MSc level. By adjusting university teaching to these needs, we may achieve better quality and wider availability of teaching and supervision in all special areas through cooperation in Master's level teaching (perhaps as a Nordic/Baltic Master's programme that begins as cooperation between neighboring universities) and in a Nordic/Baltic PhD teaching network (NGSLT).

Comments:

  • For fruitful cooperation involving all the Nordic languages, it is necessary to create some minimal common ground by funding exchange of education.
  • One should reach people already working in the industry that will integrate LT modules, and universities must create programmes for lifelong learning in LT.
  • There is a need for cooperation in master's level teaching - both cooperation between universities and countries, and also cooperation between different fields such as linguistics, computer science, statistics, etc.
  • We should include the BA-level as well and try to develop common teaching material, compendia and curricula using the idea of a common core with local variations.

Two kinds of problems can be identified:

  1. not enough students receive the training needed for development of the LT field and
  2. an unnecessarily large effort is needed for creating materials and delivering similar courses at different sites.

Vision for 2016

In 2016, skilled IT staff have a high level of LT competency for careful tuning of the modules to the application context. There is a focus on language awareness and multilingual awareness in primary and secondary schools, as well as better school training in the analytical and formal aspects of native and foreign languages, as a prerequisite for strong LT competency in the upcoming generation of application builders.

Recommendations

As a part of the Nordic Language Technology Research Program 2000-2004, an LT documentation centre was established in each of the five Nordic countries. Some continuation of them is needed, either in conjunction with a world-wide effort such as the LT world or as a Nordic or Nordic-Baltic effort. In contrast to the previous effort, a single implementation for collecting, storing and disseminating the data would be preferable, possibly based on Wiki techniques. This would let the national units concentrate on keeping the information up to date and accurate. It would be quite natural to apply the best methods of LT to make this kind of information easier to access and use. Such a site might also serve as a showroom for the infrastructure, applications and products.

More cooperation in academic training is needed among the universities in the Nordic/Baltic region. A sufficient number of highly skilled PhDs and Masters ought to be trained, and all countries and language groups should participate, including minorities and small communities:

  • Coordinated PhD education: NGSLT
  • Master's level education: Distance education, exchange programs for teachers and students, common curriculum, programming skills with LT competency
  • A set of introductory distance learning courses on LT directed to commercial developers and decision makers in all Nordic and Baltic countries.
  • Language awareness and formal language knowledge in schools: development and empirical studies in a cross-institutional framework
  • Strengthen and modernize formal mother tongue training at all levels in education: national and Nordic support at the attitude level
  • Popularization: Professional PR assignment, 'sell' the idea of diversity to a much wider audience

Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation
Nordic LT documentation | 1 MEUR | NMR, NEALT | network of LT documentation centres
Journal start-up | 50 kEUR | NEALT, Nordisk Publiceringsnämnd | scientific electronic journal
Coordinated PhD education | 1 MEUR | Nordic/Baltic universities | NGSLT
Master's level education | 2 MEUR | Nordic/Baltic universities | Distance education, exchange programs for teachers and students, common curriculum
Distance learning courses for commercial developers | 50 kEUR | Nordic/Baltic universities | Production of the material
Popularization | 1 MEUR | R&D, Government, Industry, Secondary Education | Professional PR assignment

-- KristerLinden - 12 Jun 2006


LT Legislation

Current situation in 2006

Copyright and other IPR legislation has been an obstacle to collecting research materials and sharing them for academic purposes. Schemes and model contracts exist for collecting text and speech corpora, but they are laborious to use and often limit the use of the materials. Some recent changes in copyright legislation have made it even more difficult to collect and digitize material (by overlooking research and development uses).

Patenting of computer programs and algorithms has become harmful for LT. Early publishing of research results and applying open source policies will help in part but do not fully solve the problem. Lots of careful study and new research is needed because some patents protect the most obvious ways to solve common problems. It is beyond the financial resources of researchers and the small and medium-sized enterprises to resolve software patent conflicts even if the patent is obviously invalid.

Comments:

  • Current copyright law and IPRs are an obstacle to the creation of quality resources.
  • LT modules require complicated and costly licensing.
  • The tools for creating LT modules are difficult and costly to acquire.
  • Many development efforts are at a standstill, as others will not or cannot develop proprietary resources or products owned by a competitor.

Vision for 2016

In 2016, there is legislation and an infrastructure where text and speech corpora can be freely collected, annotated and used for the purposes of research and development. The arrangements make it possible for any published source to be stored and processed for the purpose of creating research results and LT products without compromising the copyright of the source. In addition, patenting obvious ways of solving problems with programs is no longer possible, and such patents have been declared invalid.

Recommendations

The survival of cultures and languages with a relatively small number of speakers depends on the ability to use the language in daily life. This depends more and more on the availability of LT. The development of LT tools in turn depends on the availability of language resources such as corpora. Copyright legislation should enable the collecting, annotating and sharing of resources for research purposes. Currently certain privileges are granted to a few national libraries to archive electronic copies of books, journals etc., and similar privileges are needed for developing LT resources. For example, the Finnish library for the blind has a privilege to make electronic copies of copyrighted materials for the purposes of that library. In a similar vein, it is recommended that the legislation be changed so that the collection of text and speech corpora for the purposes of research and production of LT tools is possible. The use of such corpus collections would be deemed to conform to the principles of copyright as long as no longer passages are republished. Changing the copyright legislation would make collecting corpora more productive by guaranteeing that corpora and annotated material are available for research and development purposes. Availability can be achieved either by allowing centres (such as national language banks) to share materials with each other or by allowing individual researchers to share them.

Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation
Preparation of changes in the legislation | 10 kEUR | Relevant Ministries, Universities, NEALT | working groups

-- KristerLinden - 12 Jun 2006


LT Business Aspects

Current situation in 2006

There are quite a number of small (and medium-sized) commercial enterprises in the Nordic and the Baltic area. Many of them have an academic or research origin. Few of them are capable of major investments in LT tools or resources.

The roles of the public and the commercial sectors need clarification, and their cooperation and interplay should be strengthened. The public sector needs to know its responsibility and provide adequate funding and continuity. The commercial sector is essential for creating some of the products and applications. Commercial entrepreneurs use the infrastructure for building products. The infrastructure and the applications and services must fit together in a well-understood way, with no significant gaps between the two. The following might serve as a guideline for this division:

  • Long and medium term research of LT is and will be funded by various public sources and part of it will contribute to the building of the infrastructure. The research feeds the industry with new methods and ideas for new applications.
  • Short term applied research and product development is funded by the commercial side with possible partial support from the public industrial funding agencies.
  • The development of the LT infrastructure ought to be coordinated and mostly funded by the public sector on open source principles, with shared efforts from the commercial side. Collecting corpora for languages with as few speakers as the Nordic languages is clearly a public matter for the local governments and the Nordic Council of Ministers. The initial investments in the open source software tools of the infrastructure are a matter of public funding, but later investments will be shared with the commercial players.
  • Publicly funded resources are freely available on equal terms for everybody.
  • The opportunity to be able to make money on LT IPRs must be protected to attract people and money to this field.

Comments:

  • It is also important to increase cooperation between universities and research institutes on the one hand and private companies on the other.
  • Few LT endeavors and LT entrepreneurial businesses have found the means to grow and prosper.
  • Currently the market for LT is small. We need to develop viable business models.
  • If LT is to be a viable option for attracting talent and funds, the business potential will need to be developed and represent an interesting enough prospect.

Vision for 2016

In 2016, the availability of compatible LT modules and interfaces gives the software industry and the service providers in the Nordic/Baltic region a competitive edge in the global marketplace by facilitating the process of tailoring products and services to language-specific requirements in new international markets. The Nordic language councils continue their long and successful cooperation and have extended it to cooperation with LT companies. Applications develop freely in a business-friendly environment, but applications for the benefit of people with special needs, e.g. the elderly and impaired, may develop in a non-competitive environment with public support.

The principles of open source are widely understood and various parties are aware of the practices. Commercial enterprises have adopted viable business strategies for living side by side with and benefiting from the open source efforts, which are seen as an important part of the third sector in the language communities of the Nordic/Baltic region. We have viable business models for sustaining the LT business despite small market sizes and the limited availability of common resources. The joint efforts in the Nordic countries have resulted in healthy industries that can support applications in all Nordic languages with a command of spontaneous spoken interaction.

Recommendations

The licensing conditions of LT resources must allow and encourage both their commercial and academic use. Medium term applied research projects together with industrial partners should continue. Funding should be provided for creating and purchasing LT applications and services for the public sector. This funding is intended to stimulate the LT service and application market by allowing for competition (and possible cooperation) among commercial players while aiming for real and useful public service. Such services could include more ambitious goals using LT-enhanced applications.

  • Web services: tool sharing, hosted products
  • LT module distribution

Key Area | Magnitude of funding needed | Parties involved | Mode of cooperation
LT module uptake | 5 MEUR | industry, universities and language councils | Action plan managed at Nordic level
Web services | 5 MEUR | industry and universities | Academia/industry collaboration

-- KristerLinden - 12 Jun 2006


Initial Action plan

The aim of the report was to identify key areas, magnitude of funding, parties involved and modes of cooperation. However, we are still left with questions regarding further specification of the plans as well as priorities and time-frames within the 10-year period. Some answers have been sketched for the organization of the work, but more detail is needed as well as some further consideration of the division of national and Nordic funding. To implement the goals and to further specify the areas and their time-frames in the 10-year plan, we suggest the following steps in allocating resources:

  1. Establishing NEALT and its working groups
  2. Commissioning BLARK reports for the Nordic languages
  3. Nordic funding for cooperation on LT training and education
  4. National funding of medium-term applied research projects involving university and industrial partners

When the BLARK reports have been delivered, resources coordinated by NEALT should be allocated for

  1. Nordic funding of LT tools according to the recommendations of the BLARK reports
  2. Nordic and national funding of corpora, treebanks and lexicons based on the BLARK report recommendations

-- KristerLinden - 18 Jun 2006


Acknowledgements

We are grateful to the following persons for contributing time and comments to this Expert Panel Report. Their original ideas and contributions were given as comments on an initial vision for LT in 2016 and its prerequisites, on current obstacles for LT development and on general trends influencing LT development and its applications. In the questionnaire we also asked for their recommendations. The synthesis of these opinions is that of the editors of the report.

Name Affiliation
Knut Aasrud Microsoft Norway a.s.
Lars Ahrenberg Linköping University
Eckhard Bick University of Southern Denmark
Lars Borin Dept. of Swedish Language and Språkbanken, Göteborg University
Bernt A. Bremdal CognIT a.s, Norway
Rolf Carlson KTH, Royal Technical University, Stockholm
Rickard Domeij Svenska språknämnden
Tron Espeli Research Council of Norway (Innovation Division)
Björn Gambäck SICS, Swedish Institute of Computer Science, Stockholm
Arnor Gudmundsson Ministry of Education, Science and Culture, Norway
Henrik Holmboe Aarhus School of Business
Timo Honkela Helsinki University of Technology
Jan Hoel The Norwegian language council
Janne Bondi Johannessen University of Oslo
Jussi Karlgren SICS, Stockholm
Kimmo Koskenniemi University of Helsinki
Mikko Kurimo Helsinki University of Technology, Finland
Per Langgård Oqaasileriffik, Greenland
Krister Lindén University of Helsinki
Bente Maegaard University of Copenhagen
Sjur Nørstebø Moshagen Sámi Diggi
Joakim Nivre Växjö University and Uppsala University
Torbjørn Nordgård NTNU Trondheim, Norway
Mikael Reuter Forskningscentralen för de inhemska språken, Finland
Eiríkur Rögnvaldsson University of Iceland
Koenraad de Smedt University of Bergen
Torbjørn Svendsen NTNU Trondheim, Norway
Trond Trosterud University of Tromsø, Norway
Martti Vainio Department of Speech Sciences, University of Helsinki
Martin Volk Stockholm University

We are also grateful to the Nordic Council of Ministers for sponsoring the Department of Linguistics at the University of Helsinki when working on the Report.

-- KristerLinden - 01 Jun 2006


References

Nordisk Sprogteknologisk Forskningsprogram 2000-2004. Epilog. Editor: Henrik Holmboe. Copenhagen.

Nordisk Sprogteknologi 2001/2002/2003/2004/2005. Editor: Henrik Holmboe. Copenhagen.

Språk i Norden 2006. Språknemndene i Norden. Oslo.

Eckhard Bick. LT-tools such as parsers and corpora for 8 languages (Research tools). [http://beta.visl.sdu.dk]

Alea M. Fairchild and Bruno de Vuyst. 2004. Hot Spot Implosion: The Decline and Fall of Flanders Language Valley. [http://portal.acm.org/citation.cfm?id=962756.963192]

Survey of the State of the Art in Human Language Technology. Eds. Ron Cole, Joseph Mariani, Hans Uszkoreit, Giovanni Battista Varile, Annie Zaenen, Antonio Zampolli, Victor Zue. Cambridge University Press and Giardini 1997. [http://www.dfki.de/~hansu/HLT-Survey.pdf]

Steven Krauwer. 2003. The Basic Language Resource Kit (BLARK) as the First Milestone for the Language Resources Roadmap. [http://www.elsnet.org/dox/krauwer-specom2003.pdf]

Joakim Nivre and Koenraad de Smedt and Martin Volk. 2005. Treebanking in Northern Europe: A White Paper. [http://ling.uib.no/desmedt/papers/whitepaper-yearbook2004.pdf]

Hans Uszkoreit. Language Technology. A First Overview. Accessed 2006. [http://www.dfki.de/~hansu/LT.pdf]

Nordic Organizations

NordForsk - Institution for Nordic cooperation within research and research training. [http://www.nordforsk.org/index.cfm]

NorDocNet - Nordic network of documentation centers for language technology. [http://www.nordoknet.org/]

NGSLT - Nordic Graduate School of Language Technology. [http://ngslt.org/]

Language Policy Documents

Denmark

Sprog- og Taleteknologi. Ministeriet for Videnskab, Teknologi og Udvikling, Danmark. [http://www.vtu.dk/cgi-bin/theme-list.cgi?theme_id=9835]

Strategisk satsning på dansk sprogteknologi. 2005. Forskningsrådet for Kultur og Kommunikation, Danmark. [http://forsk.dk/pls/portal/docs/PAGE/FORSKNINGSSTYRELSEN/FORSKNINGSSTYRELSEN_FORSIDE/DET_FRIE_FORSKNINGSRAAD/FORSKNINGSRAADET_KULTUR_KOMMUNIKATION/FKK_PUBLIKATIONER/STRATEGISK%20SATSNING.PDF]

Finland

Kieliteknologia Suomessa (Language Technology in Finland). Ed. Manne Miettinen, Report No. R02/98, CSC.

Kieliteknologian koulutuksen laajentaminen (Extending Language Technology Education), Report No 23:1999. Ministry of Education. [http://www.ling.helsinki.fi/users/koskenni/kieliteknologia/opm-raportti.html]

Puheentutkimuksen resurssit Suomessa (Speech Research Resources in Finland). Eds. Manne Miettinen and Juhani Toivanen. 2001. CSC. [http://www.csc.fi/raportit/puhe/]

Norway

Planer og utredninger. Språkrådet, Norge. [http://www.sprakradet.no/templates/Page.aspx?id=684]

Norsk språkbank. Språkrådet, Norge. [http://www.sprakrad.no/templates/Page.aspx?id=685]

Artiklar og utgreiingar. Språkrådet, Norge. [http://www.sprakradet.no/templates/Page.aspx?id=3166]

Sweden

Språkpolitiska dokument. Språkteknologi.se. [http://sprakteknologi.se/dokument]

-- KristerLinden - 01 Jun 2006


APPENDIXES

Invited Experts for the Expert Panel Report

Country
Person Agency E-mail
Denmark
Grete Kladakis Danish Agency for Science, Technology and Innovation gk@forsk.dk
Sidse Ægidius Ministry of Science, Technology and Innovation, Dep. International ICT policy sae@vtu.dk
Jørn Lund Det Danske Sprog- og Litteraturselskab jl@dsl.dk
Bente Maegaard Center for Sprogteknologi bente@cst.dk
Børge Lindberg Aalborg Universitet, Taleteknologi lindberg@cpk.auc.dk
Henrik Holmboe Aarhus School of Business hh@asb.dk
Eckhard Bick Aarhus University lineb@hum.au.dk
Sabine Kirchmeier Hansen Copenhagen Business School ska@id.cbs.dk
Daniel Hardt Copenhagen Business School dh@id.cbs.dk
Frans Gregersen Københavns Universitet fg@hum.ku.dk
Per Langgaard Oqaasileriffik - Grønlands sprogsekretariat pela@gh.gl
Hulda Zober Holm Nordic Council of Ministers hzh@norden.org
Finland
Marja Granlund Finansministeriet, Avdelningen för utvecklande av förvaltningen marja.granlund@vm.fi
Kristiina Pietikäinen Ministry of Transport and Communications kristiina.pietikainen@mintc.fi
Anita Lehikoinen Ministry of Education anita.lehikoinen@minedu.fi
Mikael Reuter Kotimaisten kielten tutkimuskeskus mikael.reuter@kotus.fi
Gyrid Högman Ålands lyceum gyrid.hogman@lyceum.aland.fi
Matti Sihto TEKES - Finnish Funding Agency for Technology and Innovation matti.sihto@tekes.fi
Arto Mustajoki Academy of Finland anneli.pauli@aka.fi
Kimmo Koskenniemi University of Helsinki kimmo.koskenniemi@helsinki.fi
Lauri Carlson University of Helsinki, Kouvola lauri.carlson@helsinki.fi
Martti Vainio University of Helsinki martti.vainio@helsinki.fi
Helena Ahonen-Myka University of Helsinki helena.ahonen-myka@cs.helsinki.fi
Timo Honkela Technical University of Helsinki timo.honkela@hut.fi
Mikko Kurimo Technical University of Helsinki mikko.kurimo@helsinki.fi
Tero Ojanperä Nokia tero.ojanpera@nokia.com
Iceland
Gudbjörg Sigurdardottir Prime Minister's Office, Department of Information society gudbjorg.sigurdardottir@for.stjr.is
Eiríkur Rögnvaldsson University of Iceland eirikur@hi.is
Norway
Torbjørg Breivik Språkrådet torbjorg.breivik@sprakradet.no
Sylfest Lomheim Språkrådet lomheim@sprakradet.no
Bernt Erik Heid The Research Council of Norway beh@forskningsradet.no
Tron Espeli The Research Council of Norway te@forskningsradet.no
Eivind Lorentzen Ministry of Trade and Industry eivind.lorentzen@nhd.dep.no
Fred-Arne Ødegaard Fornyings- og administrasjonsministeriet Fred-Arne.Odegaard@fad.dep.no
Espen Dennis Kristoffersen Fornyings- og administrasjonsministeriet espen.dennis.kristoffersen@mod.dep.no
Risten Aleksandersen Sámediggi risten.aleksandersen@samediggi.no
Torbjørn Nordgård Norwegian University of Science and Technology torbjorn@hf.ntnu.no
Torbjørn Svendsen Norwegian University of Science and Technology torbjorn@iet.ntnu.no
Lars Hellan Norwegian University of Science and Technology lars.hellan@hf.ntnu.no
Jon Atle Gulla Norwegian University of Science and Technology jon.atle.gulla@idi.ntnu.no
Tor Andre Myrvoll Norwegian University of Science and Technology myrvoll@iet.ntnu.no
Helge Dyvik University of Bergen helge.dyvik@lili.uib.no
Koenraad de Smedt University of Bergen desmedt@uib.no
Britt Helle Aarskog University of Bergen brit@ifi.uib.no
Gisle Andersen University of Bergen gisle.andersen@aksis.uib.no
Janne Bondi Johannessen University of Oslo jannebj@hedda.uio.no
Jan Tore Lønning University of Oslo jtl@ifi.uio.no
Stephan Oepen University of Oslo, Stanford oe@csli.stanford.edu
Trond Trosterud University of Tromsø Trond.Trosterud@hum.uit.no
Bernt Bremdal CognIT bernt.bremdal@cognit.no
Bente Moxness LingIT bente@lingit.no
Knut Morten Aasrud Microsoft Norge knutaa@microsoft.com
Bjørn Seljebotn Nynodata bjorn@nynodata.no
Knut Kvale Telenor Taleteknologi knut.kvale@telenor.com
Sweden
Staffan Jonson Näringsdepartementet, Enheten för IT, forskning och utveckling staffan.jonson@industry.ministry.se
Rickard Domeij Språknämnden Rickard.Domeij@spraknamnden.se
Ola Karlsson Språknämnden Ola.Karlsson@spraknamnden.se
Lars Borin Göteborg University lars.borin@svenska.gu.se
Robin Cooper Göteborg University cooper@gslt.hum.gu.se
Kirsti Hansen Göteborg University kirsti.hansen@svenska.gu.se
Rolf Carlson KTH Talteknologi rolf@speech.kth.se
Lars Ahrenberg Linköping University lah@ida.liu.se
Björn Gambäck SICS - Swedish Institute of Computer Science gamback@sics.se
Jussi Karlgren SICS - Swedish Institute of Computer Science jussi.karlgren@sics.se
Martin Volk University of Stockholm volk@ling.su.se
Anna Sågvall Hein University of Uppsala anna@lingfil.uu.se
Joakim Nivre University of Växjö joakim.nivre@lingfil.uu.se
Veikko Hara TeliaSonera veikko.hara@teliasonera.com

-- KristerLinden - 18 Aug 2006


Danish LT projects

(Note: This is not necessarily an exhaustive list of projects, but it is the best we could do in the time available, and initial feedback confirms that it gives a fair view of the activities.)

Denmark | 2003 | 2004 | 2005 | Project
Rigsarkivet | 2000 | | | Udvikling af emnebaserede søgemuligheder til Statens Arkivers samlinger
Handelshøjskolen i København | 4500 | | | Center for Computational Modelling of Language (CMOL)
Københavns Universitet | 300 | | | Tillægsbevilling til: Den medieuafhængige tekst og den elektroniske boghandel
Syddansk Universitet | 1700 | | | Global kommunikation i danske virksomheder
Københavns Universitet | 400 | | | IDANNA - IDentifikation og ANonymisering af NAvne
Handelshøjskolen i København | | 750 | | Language technology derived from spoken language resources
Københavns Universitet | | 2500 | | Oversættelse fra leksem- til tekstniveau. Innovation via synergi mellem sprogteknologi og komparativ forskning inden for vesteuropæiske sprog
Københavns Universitet | | 3000 | | Dansk leksikalsk-semantisk ordnet (DanNet)
Roskilde Universitetscenter | | 420 | | CONTROL: CONstraint based Tools for RObust Language processing
Københavns Universitet | | | 2718 | Center for Computational Cognitive Modeling
Københavns Universitet | | | 740 | Vidensbaseret leksikalsk disambiguering
Sum kDKK | 8900 | 6670 | 3458 | Total 19.0 MDKK
Sum kEUR | 1194 | 895 | 464 | Total 2.6 MEUR
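
The sums in the table above can be re-derived from the individual grants. The following is only a minimal sketch of that arithmetic: the report does not state which exchange rate was used, so the rate of 7.454 DKK per EUR is an assumption, chosen because it reproduces the kEUR row. The names `grants_kdkk`, `ASSUMED_DKK_PER_EUR`, `yearly_sums` and `to_keur` are illustrative and do not come from the report.

```python
# Grant amounts in kDKK per year, copied from the table of Danish LT projects.
grants_kdkk = {
    2003: [2000, 4500, 300, 1700, 400],
    2004: [750, 2500, 3000, 420],
    2005: [2718, 740],
}

# ASSUMPTION: the report does not state its exchange rate. This value is
# hypothetical and was picked only because it reproduces the kEUR row.
ASSUMED_DKK_PER_EUR = 7.454


def yearly_sums(grants):
    """Return the total amount in kDKK for each year."""
    return {year: sum(amounts) for year, amounts in grants.items()}


def to_keur(kdkk, rate=ASSUMED_DKK_PER_EUR):
    """Convert kDKK to kEUR, rounded to the nearest whole kEUR."""
    return round(kdkk / rate)


sums_kdkk = yearly_sums(grants_kdkk)
sums_keur = {year: to_keur(amount) for year, amount in sums_kdkk.items()}

total_kdkk = sum(sums_kdkk.values())
total_mdkk = round(total_kdkk / 1000, 1)
total_meur = round(total_kdkk / ASSUMED_DKK_PER_EUR / 1000, 1)

for year in sorted(sums_kdkk):
    print(year, sums_kdkk[year], "kDKK", sums_keur[year], "kEUR")
print("Total", total_mdkk, "MDKK", total_meur, "MEUR")
```

With a different exchange rate the kDKK sums stay the same, while the kEUR figures can shift by a unit or so.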

Topic revision: r14 - 2006-08-22 - KristerLinden
 