OMorFi: Demo Outline

The online analyzer and generator demos are intended to demonstrate our ability to morphologically analyze and generate word forms in various languages. Some of the languages also have a component for guessing the paradigms of words that are unknown to the analyzer.

Below is a specification of the two demos. Each works differently depending on whether the language has a statistical paradigm-guessing component.

No Guessing Component

Analysis

Input: kokeilla

Output:

| base form | paradigm | analysis tags |
| koe | 48-d | nominal, plural, adessive |
| kokeilla | 67 | verb, active, a-infinitive, singular, lative |
| kokeilla | 67 | verb, passive, indicative, present, 4th person, negative |
| kokki | 5-a | nominal, plural, adessive |

Method:

hanalyse(Input)

See: HAnalyseMethod
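A minimal sketch of what the analysis demo could do with hanalyse, assuming each analysis is a record holding a base form, a paradigm tag and a tuple of analysis tags. The `Analysis` record, the toy lexicon `TOY_ANALYSES` and `format_row` are hypothetical stand-ins; the toy lexicon only contains the example rows above and is not the real analyzer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Analysis:
    """Hypothetical analysis record: base form, paradigm tag, analysis tags."""
    baseform: str
    paradigm: str
    tags: Tuple[str, ...]


# Hypothetical toy lexicon standing in for the real analyzer;
# the entries are the example rows from the output table.
TOY_ANALYSES: Dict[str, List[Analysis]] = {
    "kokeilla": [
        Analysis("koe", "48-d", ("nominal", "plural", "adessive")),
        Analysis("kokeilla", "67",
                 ("verb", "active", "a-infinitive", "singular", "lative")),
        Analysis("kokeilla", "67",
                 ("verb", "passive", "indicative", "present", "4th person", "negative")),
        Analysis("kokki", "5-a", ("nominal", "plural", "adessive")),
    ],
}


def hanalyse(word: str) -> List[Analysis]:
    """Return every analysis of a word form; an empty list means the word is unknown."""
    return list(TOY_ANALYSES.get(word, []))


def format_row(analysis: Analysis) -> str:
    """Render one output line: base form, paradigm tag and analysis tags, tab-separated."""
    return f"{analysis.baseform}\t{analysis.paradigm}\t{', '.join(analysis.tags)}"


for analysis in hanalyse("kokeilla"):
    print(format_row(analysis))
```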

Generation

Input: kokeilla

Output:

| base form | paradigm | model forms |
| kokeilla | 67 | kokeilla, kokeilen, kokeili, kokeilisi, kokeillee, kokeilkoon, kokeillut, kokeiltiin |

Method:

# Generate only for analyses whose base form is the input word itself
for Analysis in hanalyse(Input):
    if Analysis.baseform == Input:
        hgenerate(Analysis)

See: HGenerateMethod
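A minimal sketch of the generation step, assuming hanalyse returns records with `baseform` and `paradigm` fields and hgenerate returns the model forms for one analysis. Both functions and the data tables here are hypothetical stand-ins built from the example output above, not the real analyzer or generator.

```python
from typing import Dict, List, NamedTuple, Tuple


class Analysis(NamedTuple):
    """Hypothetical analysis record (analysis tags omitted for brevity)."""
    baseform: str
    paradigm: str


# Hypothetical stand-in for the analyzer: analyses of the example input.
TOY_ANALYSES: Dict[str, List[Analysis]] = {
    "kokeilla": [Analysis("koe", "48-d"), Analysis("kokeilla", "67"),
                 Analysis("kokeilla", "67"), Analysis("kokki", "5-a")],
}

# Hypothetical stand-in for the generator: model forms keyed by
# (base form, paradigm), copied from the example output.
TOY_MODEL_FORMS: Dict[Tuple[str, str], List[str]] = {
    ("kokeilla", "67"): ["kokeilla", "kokeilen", "kokeili", "kokeilisi",
                         "kokeillee", "kokeilkoon", "kokeillut", "kokeiltiin"],
}


def hanalyse(word: str) -> List[Analysis]:
    return list(TOY_ANALYSES.get(word, []))


def hgenerate(analysis: Analysis) -> List[str]:
    return TOY_MODEL_FORMS.get((analysis.baseform, analysis.paradigm), [])


def generate_rows(word: str) -> List[str]:
    """Generate model forms for each analysis whose base form is the input itself."""
    rows = []
    for analysis in hanalyse(word):
        # Skip analyses with other base forms (here: koe, kokki).
        if analysis.baseform == word:
            forms = ", ".join(hgenerate(analysis))
            rows.append(f"{analysis.baseform}\t{analysis.paradigm}\t{forms}")
    return rows


for row in generate_rows("kokeilla"):
    print(row)
```

With this toy data both verb analyses of kokeilla produce the same row, so the sketch prints it twice; the demo itself prints each output line only once.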

With Guessing Component

Analysis

Input: xkokeilla

Output:

| base form | paradigm | analysis tags |
| xkoe | 48-d | nominal, plural, adessive |
| xkokeilla | 67 | verb, active, a-infinitive, singular, lative |
| xkokeilla | 67 | verb, passive, indicative, present, 4th person, negative |
| xkokki | 5-a | nominal, plural, adessive |
...

Method:

Analyses = hanalyse(Input)
# Fall back to the guesser only if the analyzer knows nothing about the word
if len(Analyses) == 0:
    Analyses = hguess(Input)
# Output: Analyses

See: HGuessMethod, HAnalyseMethod
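A minimal sketch of the analyze-then-guess fallback. The guesser here is a hypothetical toy that uses plain suffix analogy: it finds the longest known word that is a suffix of the input and prepends the leftover prefix to each base form. This is NOT the statistical guesser the demos use; it only reproduces the shape of the example output for xkokeilla.

```python
from typing import Dict, List, NamedTuple


class Analysis(NamedTuple):
    """Hypothetical analysis record; tags kept as one string for brevity."""
    baseform: str
    paradigm: str
    tags: str


# Hypothetical toy lexicon standing in for the real analyzer.
TOY_ANALYSES: Dict[str, List[Analysis]] = {
    "kokeilla": [
        Analysis("koe", "48-d", "nominal, plural, adessive"),
        Analysis("kokeilla", "67", "verb, active, a-infinitive, singular, lative"),
        Analysis("kokki", "5-a", "nominal, plural, adessive"),
    ],
}


def hanalyse(word: str) -> List[Analysis]:
    return list(TOY_ANALYSES.get(word, []))


def hguess(word: str) -> List[Analysis]:
    """Toy suffix-analogy guesser (hypothetical, not the statistical guesser)."""
    # Try the longest proper suffix first.
    for start in range(1, len(word)):
        suffix = word[start:]
        if suffix in TOY_ANALYSES:
            prefix = word[:start]
            return [a._replace(baseform=prefix + a.baseform)
                    for a in TOY_ANALYSES[suffix]]
    return []


def analyse_with_guessing(word: str) -> List[Analysis]:
    analyses = hanalyse(word)
    if len(analyses) == 0:
        analyses = hguess(word)
    return analyses


for a in analyse_with_guessing("xkokeilla"):
    print(a.baseform, a.paradigm, a.tags, sep="\t")
```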

Generation

Input: xkokeilla

Output:

| base form | paradigm | model forms |
| xkokeilla | 67 | xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin |

Method:

Analyses = hanalyse(Input)
# Use the guesser's analyses only for words unknown to the analyzer
if len(Analyses) == 0:
    Analyses = hguess(Input)
# Generate only for analyses whose base form is the input word itself
for Analysis in Analyses:
    if Analysis.baseform == Input:
        hgenerate(Analysis)

See: HAnalyseMethod, HGuessMethod, HGenerateMethod
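Because the demo logic should not depend on the natural language, one option is to pass the per-language analyzer, guesser and generator in as functions. The sketch below assumes that design; `generation_demo`, the `Analysis` record and the stub components in the usage example are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Analysis(NamedTuple):
    """Hypothetical analysis record (analysis tags omitted for brevity)."""
    baseform: str
    paradigm: str


# Per-language components are passed in, keeping the demo language-agnostic.
Analyser = Callable[[str], List[Analysis]]
Generator = Callable[[Analysis], List[str]]


def generation_demo(word: str, hanalyse: Analyser, hguess: Analyser,
                    hgenerate: Generator) -> List[str]:
    """Analyse (falling back to the guesser), then generate for matching base forms."""
    analyses = hanalyse(word)
    if len(analyses) == 0:
        analyses = hguess(word)
    rows = []
    for analysis in analyses:
        if analysis.baseform == word:
            forms = ", ".join(hgenerate(analysis))
            rows.append(f"{analysis.baseform}\t{analysis.paradigm}\t{forms}")
    return rows


# Usage with hypothetical stubs: the word is unknown to the analyzer,
# so the guesser's analyses are used instead.
rows = generation_demo(
    "xkokeilla",
    hanalyse=lambda w: [],
    hguess=lambda w: [Analysis("xkoe", "48-d"), Analysis("xkokeilla", "67")],
    hgenerate=lambda a: ["xkokeilla", "xkokeilen", "xkokeili"],
)
print(rows)
```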


NOTE. The demos print only unique output lines: each line is printed the first time it appears and later duplicates are skipped, keeping the original order, since the order may be significant.
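A minimal sketch of order-preserving de-duplication; `unique_lines` and its optional `key` parameter are hypothetical names. By default lines are compared as exact strings; the `key` hook is one possible place to plug in a normalization that decides when two lines count as the same.

```python
from typing import Callable, Hashable, Iterable, List


def unique_lines(lines: Iterable[str],
                 key: Callable[[str], Hashable] = lambda line: line) -> List[str]:
    """Return lines in their original order, keeping only the first occurrence of each."""
    seen = set()
    result = []
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            result.append(line)
    return result


print(unique_lines(["a", "b", "a", "c", "b"]))
```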

ISSUE: Given two analysis strings A and B, deciding whether they are equivalent is not trivial. For example, in (my current and past implementations of) Finnish, the following analyses:

talon<wb>poika<noun><10><d><sg><gen>
talonpoika<noun><10><d><sg><gen>
talo<sg><gen>poika<noun><10><d><sg><gen>
talonpoika<noun><10><d><sg><acc>

are the same; however, when the guesser is used, the following:

talonpoika<noun><9><d><sg><gen>
talonpoika<adjective><10><d><sg><gen>

are not. Since the algorithms should in principle be agnostic of the natural language, the issue is not easily solved. Two approaches are envisaged: 1. supporting only analysis strings whose format is formally specified, and 2. requiring the linguist who adds a new natural language to the demos to write the analysis-string equality test themselves (this can be done with transducers or, e.g., in the programming-language framework shared by the demos and other software, currently Python).
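A minimal sketch of approach 2, assuming a per-language registry of equality tests with exact string comparison as the default. The registry, `normalise_fi` and `analyses_equal` are hypothetical. The Finnish rules are assumptions for illustration: they drop `<wb>` word-boundary markers and treat singular accusative as equal to singular genitive. They do not handle `talo<sg><gen>poika`, since recovering "talon" from "talo<sg><gen>" requires generating the inflected form, e.g. with a transducer.

```python
import re
from typing import Callable, Dict

# Hypothetical per-language registry of analysis-string equality tests.
EqualityTest = Callable[[str, str], bool]
EQUALITY_TESTS: Dict[str, EqualityTest] = {}


def normalise_fi(analysis: str) -> str:
    """Hypothetical Finnish normalisation covering only some of the cases above."""
    s = analysis.replace("<wb>", "")           # drop compound word-boundary markers
    s = re.sub(r"<sg><acc>$", "<sg><gen>", s)  # assume sg accusative == sg genitive
    return s


def equal_fi(a: str, b: str) -> bool:
    return normalise_fi(a) == normalise_fi(b)


EQUALITY_TESTS["fi"] = equal_fi


def analyses_equal(lang: str, a: str, b: str) -> bool:
    """Use the language's registered test, else fall back to exact comparison."""
    test = EQUALITY_TESTS.get(lang, lambda x, y: x == y)
    return test(a, b)


print(analyses_equal("fi", "talon<wb>poika<noun><10><d><sg><gen>",
                     "talonpoika<noun><10><d><sg><acc>"))
```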

-- TommiPirinen - 30 Apr 2008


-- KristerLinden - 22 Apr 2008

Topic revision: r7 - 2008-04-30 - TommiPirinen
 