Weaknesses in, or obstacles to, LT development

(Note. Some initially identified weaknesses were circulated among leading experts on LT in the Nordic countries and their comments on these can be found below.)

Presently there are LT modules for most of the languages widely used in the Nordic/Baltic area. However,

  • The LT modules are often incompatible with each other, built on different principles, using different tools.
  • The tools for creating such LT modules are difficult and costly to acquire, and there is no long-term guarantee for the availability of the tools.
  • No common runtime code or application interface exists for the Nordic/Baltic and the major world languages. For modules built with some proprietary tools, the runtime requires complicated and costly licensing.
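
To make the last point concrete, the following is a minimal sketch of what a shared application interface could look like. It is purely illustrative: `LTModule`, `WhitespaceTokenizer`, `run_pipeline` and the `analyze` method are hypothetical names invented for this example and do not correspond to any existing standard or product.

```python
from typing import Protocol


class LTModule(Protocol):
    """Hypothetical common interface; not an existing standard."""

    language: str  # language code, e.g. "fi"

    def analyze(self, text: str) -> list[str]:
        """Return one analysis string per token."""
        ...


class WhitespaceTokenizer:
    """Toy stand-in for a real module: splits on whitespace only."""

    def __init__(self, language: str) -> None:
        self.language = language

    def analyze(self, text: str) -> list[str]:
        return text.split()


def run_pipeline(modules: dict[str, LTModule], language: str, text: str) -> list[str]:
    """Dispatch to whichever module is registered for the language."""
    if language not in modules:
        raise KeyError(f"no LT module registered for {language!r}")
    return modules[language].analyze(text)


# Assumed example registry: one toy module per language code.
modules = {code: WhitespaceTokenizer(code) for code in ("fi", "se", "is")}
```

With such an interface, an application would call `run_pipeline(modules, "is", "góðan daginn")` in the same way for every language, and a module built with different tools could be swapped in without changing the application code.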

The further development and variation of existing LT modules for research and production purposes is mostly possible only for the owner. Proprietary LT modules can be licensed for research and development purposes, but not improved or altered by the researchers or others.

SMEs do not have the capacity to develop tools or dictionaries on their own even for official languages, not to mention minority languages. Many efforts are at a standstill, as others will not or cannot develop proprietary resources or products owned by a competitor.

Are there other significant obstacles you know should be removed to realize the vision? Are any of the above of lesser importance?

(Quotes in order of submission:)

I wholeheartedly agree with the above. LT modules with clear interfaces are urgently needed. Moreover we need large annotated and manually checked corpora with syntactic and semantic information.
-- Martin Volk

Development and deployment of LT modules in different contexts presupposes a technical staff with a high level of competency in computational linguistics, a solid schooling in the LT modules' capabilities and limitations, as well as profound knowledge of the application needs. Applications for a wide and inclusive Nordic audience presuppose that new LT modules are developed for the lesser languages (Greenlandic, Faroese, Sámi, etc.).
-- Koenraad de Smedt

LT endeavours and LT entrepreneurial businesses have not found the means to grow and prosper. A solid business potential is currently not visible, outside certain areas where the public invest money to seed development and create tools to remedy problems. If LT is to be a viable option for attracting talent and funds, the business potential will need to be developed and represent an interesting enough prospect.
-- Knut Aasrud

Proprietary solutions and tools will always exist, and innovative applications will often require that new tools and methods are developed. Again, the most significant obstacle is lack of linguistic data for these languages, not tools and standardized APIs.
-- Torbjørn Nordgård

  1. So-called 'lesser used languages', e.g. minority languages in the Nordic countries, do not have sufficient LT resources, not even in terms of data.
  2. Copyright law and IPRs (or perhaps rather the actions of copyright holders) are an obstacle to the creation of quality resources.
-- Lars Ahrenberg

One problem is that some of the smaller language communities in the area still do not have all basic LT modules and resources. It is just as expensive to build these modules and resources for the small language communities as for the larger ones, and enough national funding for such development may not be available. For fruitful cooperation involving all the languages in question to be possible, it is necessary to create some minimal common ground, and that means that the smaller language communities need some external support in the beginning. This support can be in the form of direct funding from Nordic funds or programs, but it can also involve exchange of research and knowledge.
-- Eiríkur Rögnvaldsson

For further development, we need willingness to fund and maintain and renew already established resources.
-- Henrik Holmboe

I believe that we do not always know of the existence of all language resources and tools, because there is no incentive to make such information available, and there is no common format (metadata) for it.
-- Bente Maegaard

Lack of low cost language resources for most small languages is a major obstacle for both research and development.
-- Torbjørn Svendsen

Speech tools. Learner tools. Tools adapted to requirements of the mobile handset industries (desktop interaction will continue to grow but at a lesser rate than other interaction modes!)
-- Jussi Karlgren

In several Nordic countries, formal language knowledge in schools has been a low priority over several decades. This can potentially affect the recruiting base for LT-related education and research in the adult Nordic community.
-- Eckhard Bick

What you say about tools is good. One reason why the tools are incompatible is that we disagree on what is the best solution. The disagreement shrinks as the functionality criterion grows in importance, though. The accessibility of linguistic resources is a further obstacle.
-- Trond Trosterud

It is necessary to convince politicians that LT is vital for the viability, and even survival, of 'smaller' languages, even more so today than only a few decades ago. In this context it is also of paramount importance that politicians with budgetary power are made to realize that coordinated and publicly financed efforts to accumulate large language resource banks are vital nationally, and that development and use of standardized and interoperable technical methods is a prerequisite in a Nordic context.
-- Jan Hoel

Copyright issues and licence agreements are a major problem, especially for published material in electronic form. There is no consolidated information about which resources exist and how they can be accessed. Much of the material is also not adapted for language technology purposes. Good methods are lacking for assessing and quality-assuring language technology resources and products, especially from a linguistic perspective.
-- Rickard Domeij

Both 'the availability of adequate language resources' and 'the access to existing language resources' could also be listed as obstacles.
-- Tron Espeli

Lack of industrial recognition of Nordic language capacities does not provide the necessary focus for R&D, product development and standardization, which seems to be driven from abroad, primarily the US.
-- Bernt A. Bremdal

Small market for LT and a need to develop viable business models.
-- Arnor Gudmundsson

There are two kinds of incompatibility, one that has more to do with software and one that has more to do with (conceptualizations of) knowledge of language and linguistic interaction. Both must be addressed.
-- Lars Borin

The main obstacles to progress in LT in the Nordic region have always been two:

  1. the proprietary nature of the LT resources for the region's languages: language processing resources as well as lexica and other databases are only made available to a few persons and groups, and at what are often ridiculously high price levels (most amazingly, this also applies to resources that have been developed with public funding), and
  2. the lack of cooperation between different research groups in the region (both nationally and regionally).
-- Björn Gambäck

-- KristerLinden - 01 Jun 2006

Topic revision: r8 - 2006-06-11 - KristerLinden
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.