SpråkVis - Language Technology Expert Panel Report

by Krister Lindén, Kimmo Koskenniemi and Torbjørn Nordgård

Summary

The Nordic Council of Ministers has commissioned a ten-year plan, in the form of an expert panel report, for making the Nordic countries a leading region in language technology (LT). Six key areas were identified: LT Policy, LT Resources, LT Research and Development, LT Training and Education, LT Legislation and LT Business Aspects. For each of these, this Expert Panel Report presents recommendations and an action plan.

LT Policy: We need to raise awareness that LT plays a key role in protecting and maintaining our languages and our culture. LT is necessary, e.g., for developing a digital infrastructure for research in the humanities and the social sciences. It does not matter whether LT is academic, open source or commercial, as long as it exists and its modules are compatible and available for building large systems and applications. Small language communities will not get LT on a commercial basis alone, so most (or all) languages in the area need at least some public support and some may be totally dependent on it. At the Nordic level, we need to establish recommendations for actions at the national level. To assess the situation for language-specific and language-independent resources for the languages in the area, a Basic Language Resource Kit (BLARK) report for the Nordic languages should be prepared. The Nordic region needs to stay abreast of developments in the EU in order not to duplicate efforts and in order to focus on the aspects that are specifically Nordic. The participants of NODALIDA 2005 decided to establish an association for speech and language technology, to be called NEALT (Northern European Association for Language Technology). Such an association would be ideal for coordinating various initiatives and networks.

LT Resources: The most obvious and substantial investment would be to create an appropriate infrastructure with sufficient LT resources for the relevant languages of the area. The resources belonging to the infrastructure should be freely available for research and training as well as for commercial product development. Based on the assessment of the situation in the BLARK report, the most urgent gaps in the availability of corpora should be filled using national funding, with cooperation at the Nordic level for developing and exchanging language-independent tools and methods.

LT Research and Development: The academic funding institutions ought to adopt recommendations or rules concerning linguistic resources which will be (or have been) developed using public funding. It ought to be a normal requirement that researchers make these linguistic resources available to the rest of the research community under conditions or licenses that are as free as possible. Common interfaces and tools must be created in cooperation between commercial and academic parties.

LT Training and Education: More cooperation is needed in academic training among the universities in the Nordic/Baltic region. A sufficient number of highly skilled PhDs and Masters ought to be trained with the best possible LT skills, and all countries and language groups should participate, including minorities and small language communities.

LT Legislation: Current copyright legislation makes the collection of resources unnecessarily difficult and costly. Certain privileges are currently granted to a few national libraries for archiving electronic copies of books, journals, etc., and similar privileges are needed for creating LT resources. The legislation should be changed so that the collection of text and speech corpora for the purposes of research and development is possible. The use of such corpora should be deemed to conform to the principles of copyright, provided that republication is excluded.

LT Business Aspects: The licensing conditions of LT resources must allow and encourage both their commercial and academic use. Medium-term applied research projects involving university and industrial partners should be encouraged.

Action Plan: The aim of the report was to identify key areas, magnitude of funding, parties involved and modes of cooperation. To implement the goals and to further specify the areas and their time-frames in the 10-year plan, we suggest that resources be allocated for:

  1. Establishing NEALT and its working groups
  2. Commissioning BLARK reports for the Nordic languages
  3. Nordic funding for cooperation on LT training and education
  4. National funding of medium-term applied research projects involving university and industrial partners

When the BLARK reports have been delivered, resources coordinated by NEALT should be allocated for:

  1. Nordic funding of LT tools according to the recommendations of the BLARK reports
  2. Nordic and national funding of corpora, treebanks and lexicons based on the BLARK report recommendations

-- KristerLinden - 01 Jun 2006

Topic revision: r10 - 2006-08-21 - KristerLinden