Prerequisites for implementing the Vision

(Note: Some prerequisites for the initial vision were circulated among leading experts on language technology (LT) in the Nordic countries; their comments can be found below.)

Opening up lexicon resources that have been created through public funding. Lexemes, including their part-of-speech and inflectional codes as well as other markup, should be moved into the open-source domain so that anybody can modify and use them for research or commercial purposes.

Cooperation in creating open-source tools for building further LT modules, which in turn can be either proprietary or open source. Cooperation in creating open-source runtime support for the LT modules built with those tools.

Stimulating LT research for various application areas. National funding programs would provide the basis, and a Nordic/Baltic framework program for networking would provide the necessary regional infrastructure and communication.

Adapting university teaching to these needs. Better quality and wider availability of teaching and supervision in all specialist areas through cooperation in master's-level teaching (perhaps as a Nordic/Baltic master's program) and in a Nordic/Baltic PhD teaching network (NGSLT).

Do you find that there are additional prerequisites for progress in LT? Are any of the above irrelevant?

(Quotes in order of submission:)

I agree and would like to add opening up language resources on all levels (lexicons, grammars, written language corpora, speech corpora, etc.) that have been created through public funding. ... I like the Nordic cooperation in PhD education. NGSLT is very good. But I am skeptical about coordinated master's programs. I assume such coordination will rather happen at the local level (i.e. between neighboring universities).
-- Martin Volk

Education should not stop with a masters or PhD degree, but one needs to reach people already working in the industries that will integrate LT modules. Universities must create programmes for lifelong learning in HLT.
-- Koenraad de Smedt

Lexicon resources are important, but parallel texts and corpora (raw as well as annotated) are even more important because they are necessary in order to develop further monolingual and multilingual lexicons, taggers, parsers, and many other resources and tools.
-- Janne Bondi Johannessen

Alternatively, or in addition, the prospect of benefiting financially from language technology should not be jeopardized by open-sourcing too much IP. The opportunity to make money on LT IPR must be protected in order to attract people and money to this field.
-- Knut Aasrud

Regarding 'lexicon resources', they should be made available with no requirement to share additions, i.e. under the MIT license or Extended GPL.
-- Torbjørn Nordgård

Open source is a good idea, but announcing an open-source project does not necessarily create a community of users who take part in the development. National funding programmes would not be sufficient to support 'various application areas'. I believe a better idea is a few (one or two) focused projects that invite (i) public funding, (ii) private funding, and in the best of all worlds (iii) public interest (i.e. a community of 'volunteers'): say, something like a talking robot that any user could teach new words or new languages. I would also add integrating LT with other technologies and design disciplines.
-- Lars Ahrenberg

Opening up (and creating) resources should include more than lexicon resources, notably corpora, possibly also grammar resources.
-- Joakim Nivre

I think all of this is very important. Especially, I want to emphasize the need for cooperation in master's level teaching - both cooperation between universities and countries, and also cooperation between different fields such as linguistics, computer science, statistics, etc. It is also important to assist smaller language communities in building basic resources, as pointed out above. Furthermore, it is necessary to raise public awareness about the importance of LT in our daily lives in the future, and to get commercial companies interested in LT research and development. It is also important to increase cooperation between universities and research institutes on one hand and private companies on the other.
-- Eiríkur Rögnvaldsson

I think we should include the BA level as well, and try to develop common teaching material, compendia and curricula based on the idea of a common core with local variations.
-- Henrik Holmboe

If we want to change the status of Language Resources to Open Source, there is absolutely no need to limit ourselves to lexical Language Resources. Grammars, parsers, named entity recognizers, etc. are no different. So I disagree that tools made under the same conditions (i.e. with public funding) could be left proprietary. However, the current policy of the Danish ministries urges universities to invoice for everything! We may of course recommend that they behave differently. We should also remember that Open Source does not necessarily imply FREE; it only implies access to the source code. Promotion of standards would be beneficial, but is not a necessity. Documentation of Language Resources is a prerequisite if they are to be Open Source: if users do not understand the categories used, they will fail both in using the data and in developing it further. An infrastructure to support the distribution of Language Resources and tools will also be needed. It may be centralised or distributed, but it has to be set up. This could be a Nordic effort, or it could be done at a European level (e.g. by making special agreements with ELRA, or by joining other initiatives).
-- Bente Maegaard

  • Availability of other language resources, i.e. huge amounts of speech and text.
  • Sufficient funding for both long term (university) research and support for industrial development.
-- Torbjørn Svendsen

I'm all for open source but open standards and open APIs are more important. Bring in the industrial players.
-- Jussi Karlgren

Preference should be given to research funding that integrates all research groups in a given area for a given country, or the Nordic area as such, rather than supporting a centralized (e.g. capital university based) funding approach.
-- Eckhard Bick

Potential users in all sectors and walks of life must be convinced that LT is something they need. Only powerful demand from the public will make politicians prioritize the area in question.
-- Jan Hoel

Standardization of resources and APIs, as well as tools for interchanging and converting data from one format to another. Building on open-source lexicons and open-source tools, the natural next step would be to harmonize these resources so that we can really benefit from what is available.
-- Sjur Nørstebø Moshagen

A coordinating function is an important prerequisite for organizing cooperation, not least for overcoming conflicts of interest and problems between researchers, industry and rights holders when making resources available. When research is funded, there must be clear requirements that results and resources be made available.
-- Rickard Domeij

To the extent that lexicon resources have been established with private funds, it should be considered whether, how and to what extent these resources could be made publicly available.
-- Tron Espeli

Commercial and industrial recognition of the advantages of LT, and broad involvement of these parties in all phases.
-- Bernt A. Bremdal

The 'go-west' maxim that prevails at the political level, and even in many academic institutions, must be addressed. As long as actual policy and public opinion regard uniformity in English as a necessity, while viewing diversity as a beautiful but slightly anachronistic dream of academic ivory towers, LT as a road-paver for multilingualism will never obtain the support it needs.
-- Per Langgård

Lexicons should be developed with speech technology in mind. That is, they should include phonetic information, such as phonetic transcriptions and stress.
-- Martti Vainio

Good progress in the LT field requires support for joint projects and networks at the Nordic level. To share information and speed up development, infrastructure development needs to be accompanied by analysis software and methods for easy access. This is a research topic in itself.
-- Rolf Carlson

  • Opening up all kinds of linguistic resources (not only lexicons): corpora, grammars, speech databases, lexicons, etc.
  • Linguistic research on spoken language varieties (registers, dialects, non-native) and on non-standard written varieties (CMC, non-native, borderline literate)
-- Lars Borin

Obviously, the first two points apply to all types of language processing resources, and they are very important. Whether national funding is important depends on which direction EU research funding takes.
-- Björn Gambäck

