OpenAutomata: Project Abstract

Title: Open and Language Independent Automata-Based Resource Production Methods for Common Language Research Infrastructure


The hearts of Europeans speak 50-100 languages, all too important to be ignored in research, content production or education. This project addresses the lack of resources needed to implement our shared vision of a multilingual society. Out of a similar concern, the EU Commission is funding the preparatory phase of the “Common Language Resource and Language Technology Research Infrastructure (CLARIN)”, a pan-European initiative that aims to establish an interoperable and integrated research infrastructure. Finland is one of the main partners of CLARIN, and the current application would form a significant national contribution to its success. The proposed research would build on two particular strengths of Finnish research: language-independent finite-state technology and open-source technology.

  • Finite-state technology is very useful in natural language processing, and the applicant’s recent results are renewing it with powerful and easily implementable algorithms.
  • The use of open-source technology ensures the widest applicability of the language technology infrastructure.

The purpose of the proposed basic research is to create a renewed theory of compilation of linguistic knowledge into finite-state automata and transducers.

  • The commercial grammar formalisms for finite-state morphology are based on 10-25-year-old algorithms that do not take advantage of all known useful structural properties of natural language. Their complex proprietary algorithms have not been adopted in free and open-source software.
  • The new theory would be based on the applicant’s recent results, which have improved the elegance and generality of compilation while using more parsimonious computational machinery. A more elegant theory means more open-source implementations and a higher global capacity to produce morphological resources.

Open-source finite-state technology based on the new theory would empower language communities to build their own morphological resources for CLARIN. Such resources enable competitive research and multilingual applications such as grammar checking, machine translation, computer-aided language learning and information extraction, and they support multilingual education, language development, ICT localization and document management. Wider multilingual education reduces inequality, poverty and insecurity and builds a better future for our children.

-- AnssiYliJyra - 08 Oct 2008

Topic revision: r2 - 2008-10-29 - AnssiYliJyra