OMorFi: Outline of tasks for the lexicon compiler

OMorFi lexicons and rule sets

A first version of the Finnish lexicon has been developed by TommiPirinen, but additional material from the KOTUS word list still needs to be added; see e.g. OMorFiRoadmap and some of the activities in HyClt240s2007.

The current SFST rule and regular-expression compiler only handles single rules. Compiling rule sets will be investigated by MiikkaSilfverberg.
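For orientation, one standard way to combine a rule set in two-level morphology is to compile each rule into an automaton over input:output symbol pairs (such as "k:k" or "k:0") and then intersect these automata, so that a correspondence is accepted only if every rule accepts it. The sketch below shows that product construction on two hypothetical toy rules. The DFA class, the rules and the pair alphabet are illustrative assumptions; this is not SFST code and not necessarily the approach that will be adopted.

```python
from collections import deque

# Hypothetical minimal DFA over symbol pairs such as "k:k" or "k:0".
# A missing transition means rejection (an implicit dead state).
class DFA:
    def __init__(self, start, finals, delta):
        self.start = start
        self.finals = set(finals)
        self.delta = delta  # dict: (state, pair_symbol) -> next state

    def alphabet(self):
        return {sym for (_, sym) in self.delta}

    def accepts(self, pairs):
        state = self.start
        for sym in pairs:
            if (state, sym) not in self.delta:
                return False
            state = self.delta[(state, sym)]
        return state in self.finals


def intersect(a, b):
    """Product construction: accept exactly the pair strings accepted by both a and b."""
    alphabet = a.alphabet() & b.alphabet()
    start = (a.start, b.start)
    delta, finals = {}, set()
    seen, queue = {start}, deque([start])
    while queue:  # explore only reachable state pairs
        p, q = queue.popleft()
        if p in a.finals and q in b.finals:
            finals.add((p, q))
        for sym in alphabet:
            if (p, sym) in a.delta and (q, sym) in b.delta:
                nxt = (a.delta[(p, sym)], b.delta[(q, sym)])
                delta[((p, q), sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return DFA(start, finals, delta)


# Toy rule 1 (hypothetical): "k:0" may not end a word.
no_final_deletion = DFA(
    start=0, finals={0},
    delta={(0, "a:a"): 0, (0, "k:k"): 0, (0, "k:0"): 1,
           (1, "a:a"): 0, (1, "k:k"): 0, (1, "k:0"): 1})

# Toy rule 2 (hypothetical): at most one "k:0" per word.
at_most_one_deletion = DFA(
    start=0, finals={0, 1},
    delta={(0, "a:a"): 0, (0, "k:k"): 0, (0, "k:0"): 1,
           (1, "a:a"): 1, (1, "k:k"): 1})

# The compiled rule set enforces both rules at once.
rule_set = intersect(no_final_deletion, at_most_one_deletion)
```

Here `rule_set.accepts(["k:0", "a:a"])` is True, while `["a:a", "k:0"]` (violates rule 1) and `["k:0", "a:a", "k:0", "a:a"]` (violates rule 2) are rejected.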

Notational variants

To promote migration to the lexicon compiler, we need to change the notation of SFST to something slightly more readable and closer to a standard such as XFST. In the long run we also wish to have weighted transducers, so we may as well look at useful notations for weighted FST expressions, e.g. WFST.
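To make concrete what weights add, the sketch below assumes the tropical semiring (the default semiring of OpenFst's standard arc type): weights along a path are added, and among alternative paths the minimum is kept. The arc representation and the function name `best_output` are hypothetical and chosen only for illustration. The search is exhaustive and assumes no epsilon arcs, so it is meant to show the semantics rather than serve as an efficient algorithm.

```python
import math

# Hypothetical weighted transducer: arcs[state] -> list of (in_sym, out_sym, weight, next_state).
# Tropical semiring: weights along a path are added, alternatives keep the minimum.
def best_output(arcs, start, finals, inp):
    """Return (weight, output) of the cheapest path consuming `inp`, or (inf, None).

    `finals` maps final states to their final weights. No epsilon arcs are assumed,
    so every arc consumes one input symbol and the search always terminates.
    """
    best = (math.inf, None)
    # Agenda items: (state, position in input, accumulated weight, output so far)
    agenda = [(start, 0, 0.0, "")]
    while agenda:
        state, pos, w, out = agenda.pop()
        if pos == len(inp):
            # Whole input consumed: add the final weight and keep the minimum.
            if state in finals and w + finals[state] < best[0]:
                best = (w + finals[state], out)
            continue
        for in_sym, out_sym, arc_w, nxt in arcs.get(state, []):
            if in_sym == inp[pos]:
                agenda.append((nxt, pos + 1, w + arc_w, out + out_sym))
    return best


# Two competing outputs for the input "ab": "xb" costs 1.0, "yb" costs 0.25 + 1.5 = 1.75.
arcs = {
    0: [("a", "x", 1.0, 1), ("a", "y", 0.25, 2)],
    1: [("b", "b", 0.0, 3)],
    2: [("b", "b", 1.5, 3)],
}
print(best_output(arcs, 0, {3: 0.0}, "ab"))  # (1.0, 'xb')
```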

Rule, expression and replace compilation

Currently the rule, expression and replace compilation is implemented in the SFST package using FST library routines. It will remain there, extended to handle weights by means of libraries for weighted transducers. The rule, replace and expression compilation is the core contribution of SFST to our project; some of the rule compilation formulae in this code originated from AnssiYliJyra.

Library of FST routines

We currently aim to compare the performance of three different FST libraries on the OMorFi lexicon by creating an API through which the SFST, OpenFST and Vaucanson code libraries can be plugged into the lexicon compiler. This will be done by ErikAxelson. The goal is to choose one of the weighted transducer libraries, provided it is not significantly slower than the current SFST code library. More information at OMorFiWithOpenFst.
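The plug-in idea can be sketched as an abstract interface that the lexicon compiler calls, with one adapter per library, plus a timing harness that runs the same compilation through each adapter. All class and method names below (`FstBackend`, `from_word_pair`, `union`, `lookup`, `ToyBackend`, `compile_lexicon`, `benchmark`) are hypothetical and do not correspond to the actual SFST, OpenFst or Vaucanson APIs; the toy backend merely stores a finite relation so that the example runs without any FST library.

```python
import time
from abc import ABC, abstractmethod

# Hypothetical interface: method names are illustrative only and are not the
# real SFST, OpenFst or Vaucanson APIs. Each library would get its own adapter.
class FstBackend(ABC):
    name = "abstract"

    @abstractmethod
    def from_word_pair(self, inp, out):
        """Build a transducer mapping the string `inp` to the string `out`."""

    @abstractmethod
    def union(self, a, b):
        """Return the disjunction of two transducers."""

    @abstractmethod
    def lookup(self, fst, word):
        """Return the set of outputs for the input `word`."""


class ToyBackend(FstBackend):
    """Stand-in adapter that represents a 'transducer' as a finite set of (input, output) pairs."""
    name = "toy"

    def from_word_pair(self, inp, out):
        return frozenset({(inp, out)})

    def union(self, a, b):
        return a | b

    def lookup(self, fst, word):
        return {out for (inp, out) in fst if inp == word}


def compile_lexicon(backend, entries):
    """Compile (input, output) lexicon entries using only the backend interface."""
    result = None
    for inp, out in entries:
        fst = backend.from_word_pair(inp, out)
        result = fst if result is None else backend.union(result, fst)
    return result


def benchmark(backends, entries):
    """Time lexicon compilation with each backend; returns {backend name: seconds}."""
    timings = {}
    for backend in backends:
        t0 = time.perf_counter()
        compile_lexicon(backend, entries)
        timings[backend.name] = time.perf_counter() - t0
    return timings
```

For example, compiling `[("talo", "talo+N+Sg+Nom"), ("talot", "talo+N+Pl+Nom")]` with `ToyBackend()` and looking up `"talot"` yields `{"talo+N+Pl+Nom"}`. Because the compiler only talks to `FstBackend`, swapping in another library means writing one adapter class, and `benchmark` can then compare the adapters on identical input.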

-- KristerLinden - 19 Oct 2007
Topic revision: r4 - 2008-03-17 - ErikAxelson