Difference: OMorFiDemo (1 vs. 12)

Revision 12 2008-05-05 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: HDemo Outline

Line: 146 to 146
 Analysis.baseform='talonpoika' Analysis.paradigm='<10>' Analysis.tags=''
Added:
>
>

Web Demo API Interface

The following code is used for calling the web interface:

#! /usr/bin/env python
# -*- coding: utf8 -*-

from omorcgidemo import *

def main():
    initialize_variables(
        analyser         = './OMorFiAnalyser',
        generator        = './OMorFiGenerator',
        guesser          = './OMorFiGuesser',
        language         = 'Finnish',
        lang_code        = 'fi',
        title            = 'Omorfi - Demo of Finnish Morphology',
        lexicon_source   = 'http://kaino.kotus.fi/sanat/nykysuomi/',
        lexicon_name     = 'Nykysuomen sanalista',
        script           = 'omorfi-cgi-demo.py')
    interact()

main()
 

Revision 11 2008-05-02 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: HDemo Outline

Line: 127 to 127
 talonpoika <10>
Changed:
<
<
where the lines will be interpreted by HDemo as:
>
>
The lines will be interpreted by HDemo as:
 
Analysis.baseform='talonpoika'  Analysis.paradigm='<10><d>'  Analysis.tags='<noun><sg><gen>'
Analysis.baseform='talonpoika'  Analysis.paradigm='<10><d>'  Analysis.tags='<noun><sg><acc>'
Added:
>
>
It is possible to indicate word and morpheme boundaries in HAnalyse with a '|' without affecting the interpretation of the analysis:

<base>talon|poika</base> <par><10><d></par> <anl><noun><sg><gen></anl>

This line will also be interpreted by HDemo as:

Analysis.baseform='talonpoika'  Analysis.paradigm='<10><d>'  Analysis.tags='<noun><sg><gen>'
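
The interpretation described above could be sketched in Python roughly as follows. This is only an illustration, not the HDemo implementation: the `Analysis` class, the `parse_analysis` function and the regular expression are hypothetical names, and the line layout is assumed from the single example line shown above.

```python
import re
from dataclasses import dataclass

# Assumed layout, taken from the example line above:
#   <base>...</base> <par>...</par> <anl>...</anl>
LINE_RE = re.compile(
    r"<base>(.*?)</base>\s*<par>(.*?)</par>\s*<anl>(.*?)</anl>"
)


@dataclass(frozen=True)
class Analysis:
    """Hypothetical container mirroring Analysis.baseform/.paradigm/.tags."""
    baseform: str
    paradigm: str
    tags: str


def parse_analysis(line: str) -> Analysis:
    """Split one analyser output line into base form, paradigm and tags."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised analysis line: {line!r}")
    base, paradigm, tags = match.groups()
    # '|' only marks word and morpheme boundaries, so it is dropped
    # before the base form is interpreted.
    return Analysis(base.replace("|", ""), paradigm, tags)


line = "<base>talon|poika</base> <par><10><d></par> <anl><noun><sg><gen></anl>"
print(parse_analysis(line))
```

With this sketch, a line with and without '|' in the base form yields the same `Analysis` value, which is the behaviour the text describes.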
 

Revision 10 2008-05-01 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: HDemo Outline

Line: 51 to 51
  Method:
for Analysis in HAnalyse(Word,OmorXXAnalyser): 
Changed:
<
<
if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)):
>
>
if Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output:
  Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
Line: 76 to 76
  Method:
Analyses = HAnalyse(Word,OmorXXAnalyser)
Changed:
<
<
if(not Analyses):
>
>
if not Analyses:
  HGuess(Word,OmorXXGuesser)

See: HGuess, HAnalyse
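
As a rough illustration, the analysis method with a guessing component could look like this in Python. The function name `analyse_with_fallback` is hypothetical, and `analyse` and `guess` are stand-ins for HAnalyse and HGuess with the transducer argument already bound. Returning the analyser's own result when it is non-empty is assumed from the earlier revisions of the outline (`else Analyses`).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def analyse_with_fallback(
    word: str,
    analyse: Callable[[str], List[T]],
    guess: Callable[[str], List[T]],
) -> List[T]:
    """Return the analyser's analyses, or the guesser's if there are none."""
    analyses = analyse(word)
    if not analyses:
        # The word is unknown to the lexicon, so ask the guesser instead.
        return guess(word)
    return analyses
```

The guesser is only consulted when the analyser returns nothing, so known words are never affected by guessed paradigms.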

Line: 92 to 92
  Method:
Analyses = HAnalyse(Word,OmorXXAnalyser)
Changed:
<
<
if(not Analyses):
>
>
if not Analyses :
  for Analysis in HGuess(Word,OmorXXGuesser):
Changed:
<
<
if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)):
>
>
if Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output:
  Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
else:
  for Analysis in Analyses:
Changed:
<
<
if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)):
>
>
if Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output:
  Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
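
The generation method with a guessing component, as outlined in the hunks above, might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the demo's actual code: `demo_generate` and the `Analysis` tuple are hypothetical, the input word is treated as a plain string, and `analyse`, `guess` and `generate` stand in for HAnalyse, HGuess and HGenerate with their transducers already bound.

```python
from collections import namedtuple

# Hypothetical record with the three fields used in the outline.
Analysis = namedtuple("Analysis", ["baseform", "paradigm", "tags"])


def demo_generate(word, analyse, guess, generate):
    """Generate model forms once per (base form, paradigm) combination."""
    analyses = analyse(word)
    if not analyses:
        # Unknown word: fall back to the guesser.
        analyses = guess(word)

    seen = []    # (baseform, paradigm) pairs already output, in order
    output = []  # one entry per output line
    for analysis in analyses:
        key = (analysis.baseform, analysis.paradigm)
        # Only analyses whose base form is the input word are used,
        # and each base form/paradigm combination is output once.
        if analysis.baseform == word and key not in seen:
            seen.append(key)
            output.append((analysis.baseform, analysis.paradigm,
                           generate(analysis)))
    return output
```

Keeping the seen combinations in a list preserves the order in which analyses arrive, which the outline notes may be significant.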

Revision 9 2008-04-30 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: HDemo Outline

Line: 8 to 8
 of unknown words in the language.

Ideally, all language related information is encoded in the transducers:

Changed:
<
<
OmorXXAnalyser, OmorXXWeightedAnalyser and OmorXXGenerator
>
>
OmorXXAnalyser, OmorXXGuesser and OmorXXGenerator
 (where XX is the language code) run with the hfst software. However, the lexicographer may have created language-specific components for some other purpose that can be reused, in which case the HAnalyse, HGuess and HGenerate methods may be implemented
Line: 76 to 76
  Method:
Analyses = HAnalyse(Word,OmorXXAnalyser)
Changed:
<
<
if(len(Analyses)=0): HGuess(Word,OmorXXWeightedAnalyser) else Analyses
>
>
if(not Analyses): HGuess(Word,OmorXXGuesser)
  See: HGuess, HAnalyse
Line: 93 to 92
  Method:
Analyses = HAnalyse(Word,OmorXXAnalyser)
Changed:
<
<
if(len(Analyses)=0): for Analysis in HGuess(Word,OmorXXWeightedAnalyser):
>
>
if(not Analyses): for Analysis in HGuess(Word,OmorXXGuesser):
  if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)): Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
Changed:
<
<
else for Analysis in Analyses:
>
>
else: for Analysis in Analyses:
  if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)): Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)

Revision 8 2008-04-30 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"
Changed:
<
<

OMorFi: Demo Outline

>
>

OMorFi: HDemo Outline

  The analyzer and generator demos on the internet are intended to demonstrate our capabilities to morphologically analyze and generate various languages. Some of the languages also contain a component for guessing paradigms of unknown words in the language.
Added:
>
>
Ideally, all language related information is encoded in the transducers: OmorXXAnalyser, OmorXXWeightedAnalyser and OmorXXGenerator (where XX is the language code) run with the hfst software. However, the lexicographer may have created language-specific components for some other purpose that can be reused, in which case the HAnalyse, HGuess and HGenerate methods may be implemented as language dependent shell scripts transforming the existing output to the required format. In this case, the shell scripts are better named e.g. OmorXXAnalyse, OmorXXGuess and OmorXXGenerate, but they are still expected to take the same input and produce output in the same format as the generic methods.
 Below is a specification of the two demos depending on whether the language has a statistical paradigm guessing component.
Added:
>
>
 

No Guessing Component

Analysis

Line: 25 to 36
 
kokki 5-a nominal, plural, adessive

Method:

Changed:
<
<
hanalyse(Input)
>
>
HAnalyse(Word,OmorXXAnalyser)
 
Changed:
<
<
See: HAnalyseMethod
>
>
See: HAnalyse
 

Generation

Changed:
<
<
Input: kokeilla
<-- Input: kokeilla 67 -->
>
>
Input: kokeilla
  Output:
Line: 40 to 50
 
kokeilla 67 kokeilla, kokeilen, kokeili, kokeilisi, kokeillee, kokeilkoon, kokeillut, kokeiltiin

Method:

Changed:
<
<
for Analysis in hanalyse(Input): if(Analysis.baseform=Input.baseform): hgenerate(Analysis)
>
>
for Analysis in HAnalyse(Word,OmorXXAnalyser): if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)): Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
 
Changed:
<
<
<-- if(Input.baseform=Analysis.baseform and (Input.paradigm=Analysis.paradigm or Input.paradigm="")):  -->
>
>
NOTE. A word form may have several analyses with an identical base form and paradigm combination, for which only one model word is output
 
Changed:
<
<
See: HGenerateMethod
>
>
See: HGenerate
 

With guessing component

Line: 64 to 75
 
...  

Method:

Changed:
<
<
Analyses = hanalyse(Input)
>
>
Analyses = HAnalyse(Word,OmorXXAnalyser)
 if(len(Analyses)=0):
Changed:
<
<
hguess(Input)
>
>
HGuess(Word,OmorXXWeightedAnalyser)
 else Analyses
Changed:
<
<
See: HGuessMethod, HAnalyseMethod
>
>
See: HGuess, HAnalyse
 

Generation

Changed:
<
<
Input: xkokeilla
<-- Input: xkokeilla 67 -->
>
>
Input: xkokeilla
  Output:
Line: 82 to 92
 
xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin

Method:

Changed:
<
<
Analyses = hanalyse(Input)
>
>
Analyses = HAnalyse(Word,OmorXXAnalyser)
 if(len(Analyses)=0):
Changed:
<
<
for Analysis in hguess(Input): if(Analysis.baseform=Input.baseform): hgenerate(Analysis)
>
>
for Analysis in HGuess(Word,OmorXXWeightedAnalyser): if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)): Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
 else for Analysis in Analyses:
Changed:
<
<
if(Analysis.baseform=Input.baseform): hgenerate(Analysis)
>
>
if(Analysis.baseform=Word.baseform and not [Analysis.baseform, Analysis.paradigm] in Output)): Output = Output + Analysis.baseform, Analysis.paradigm HGenerate(Analysis,OmorXXGenerator)
 
Changed:
<
<
<-- if(Input.baseform=Analysis.baseform and (Input.paradigm=Analysis.paradigm or Input.paradigm="")): -->
>
>
NOTE. A word form may have several analyses with an identical base form and paradigm combination, for which only one model word is output
 
Changed:
<
<
See: HAnalyseMethod, HGuessMethod, HGenerateMethod
>
>
See: HAnalyse, HGuess, HGenerate
 
Changed:
<
<

>
>

On Equivalent Base Forms

 
Changed:
<
<
NOTE. The demo only prints unique Output lines, i.e. a line is printed only the first time it appears as the order may be significant
>
>
The demo only prints unique Output lines for unique base form and paradigm combinations, but it is up to the lexicographer to decide which base forms are equivalent.
 
Changed:
<
<
ISSUE: Given two analysis strings A and B, deciding whether they are equivalent is not trivial; e.g. for (my current and past implementations of) Finnish the following:
>
>
A lexicon may give several analyses for a word form, e.g. "talonpojan":
 
talon<wb>poika<noun><10><d><sg><gen>
Line: 109 to 120
 talonpoika<10>
Changed:
<
<
are the same; however, given the use of the guesser, the following:
>
>
If the lexicographer wishes these to be considered equivalent, the lexicon output should be unified in HAnalyse (or in that case preferably OmorXXAnalyse), outputting e.g.
 
Changed:
<
<
talonpoika<9> talonpoika<10>
>
>
talonpoika <10> talonpoika <10>
 
Changed:
<
<
are not. Since the algorithms should in principle be agnostic to the natural language, the issue is not easily solved. Two possible approaches are: 1. supporting only analysis strings whose format is formally specified, and 2. requiring the linguist who adds a new natural language to the demos to write the analysis string equality test themselves (this can be done with transducers or, e.g., with the programming language framework that the demos and other software use, currently Python).
>
>
where the lines will be interpreted by HDemo as:
 
Changed:
<
<
-- TommiPirinen - 30 Apr 2008
>
>
Analysis.baseform='talonpoika'  Analysis.paradigm='<10><d>'  Analysis.tags='<noun><sg><gen>'
Analysis.baseform='talonpoika'  Analysis.paradigm='<10><d>'  Analysis.tags='<noun><sg><acc>'
 

Revision 7 2008-04-30 - TommiPirinen

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 99 to 99
  NOTE. The demo only prints unique Output lines, i.e. a line is printed only the first time it appears as the order may be significant
Added:
>
>
ISSUE: Given two analysis strings A and B, deciding whether they are equivalent is not trivial; e.g. for (my current and past implementations of) Finnish the following:

talon<wb>poika<noun><10><d><sg><gen>
talonpoika<noun><10><d><sg><gen>
talo<sg><gen>poika<noun><10><d><sg><gen>
talonpoika<noun><10><d><sg><acc>

are the same; however, given the use of the guesser, the following:

talonpoika<noun><9><d><sg><gen>
talonpoika<adjective><10><d><sg><gen>

are not. Since the algorithms should in principle be agnostic to the natural language, the issue is not easily solved. Two possible approaches are: 1. supporting only analysis strings whose format is formally specified, and 2. requiring the linguist who adds a new natural language to the demos to write the analysis string equality test themselves (this can be done with transducers or, e.g., with the programming language framework that the demos and other software use, currently Python).

-- TommiPirinen - 30 Apr 2008

 

Revision 6 2008-04-24 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 18 to 18
  Output:
Added:
>
>
base form paradigm tags analysis tags
 
koe 48-d nominal, plural, adessive
kokeilla 67 verb, active, a-infinitive, singular, lative
kokeilla 67 verb, passive, indicative, present, 4th person, negative
Line: 30 to 31
 

Generation

Changed:
<
<
Input: kokeilla
>
>
Input: kokeilla
<-- Input: kokeilla 67 -->
  Output:
Added:
>
>
base form paradigm tags model forms
 
kokeilla 67 kokeilla, kokeilen, kokeili, kokeilisi, kokeillee, kokeilkoon, kokeillut, kokeiltiin

Method:

for Analysis in hanalyse(Input): 
Changed:
<
<
if(Analysis.baseform=Input):
>
>
if(Analysis.baseform=Input.baseform):
  hgenerate(Analysis)
Added:
>
>
<-- if(Input.baseform=Analysis.baseform and (Input.paradigm=Analysis.paradigm or Input.paradigm="")):  -->
 See: HGenerateMethod

With guessing component

Line: 51 to 56
  Output:
Added:
>
>
base form paradigm tags analysis tags
 
xkoe 48-d nominal, plural, adessive
xkokeilla 67 verb, active, a-infinitive, singular, lative
xkokeilla 67 verb, passive, indicative, present, 4th person, negative
Line: 67 to 73
 

Generation

Changed:
<
<
Input: xkokeilla
>
>
Input: xkokeilla
<-- Input: xkokeilla 67 -->
  Output:
Added:
>
>
base form paradigm tags model forms
 
xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin

Method:

Analyses = hanalyse(Input)
if(len(Analyses)=0): 
    for Analysis in hguess(Input): 
Changed:
<
<
if(Analysis.baseform=Input):
>
>
if(Analysis.baseform=Input.baseform):
  hgenerate(Analysis) else for Analysis in Analyses:
Changed:
<
<
if(Analysis.baseform=Input):
>
>
if(Analysis.baseform=Input.baseform):
  hgenerate(Analysis)
Added:
>
>
<-- if(Input.baseform=Analysis.baseform and (Input.paradigm=Analysis.paradigm or Input.paradigm="")): -->
 See: HAnalyseMethod, HGuessMethod, HGenerateMethod


Revision 5 2008-04-24 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 58 to 58
 
...  

Method:

Changed:
<
<
if(len(hanalyse(Input)=0): hguess(Input)
>
>
Analyses = hanalyse(Input) if(len(Analyses)=0): hguess(Input) else Analyses
  See: HGuessMethod, HAnalyseMethod
Line: 72 to 74
 
xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin

Method:

Changed:
<
<
if(len(hanalyse(Input))=0):
>
>
Analyses = hanalyse(Input) if(len(Analyses)=0):
  for Analysis in hguess(Input): if(Analysis.baseform=Input):
Added:
>
>
hgenerate(Analysis) else for Analysis in Analyses: if(Analysis.baseform=Input):
  hgenerate(Analysis)

See: HAnalyseMethod, HGuessMethod, HGenerateMethod

Revision 4 2008-04-24 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 26 to 26
 Method:
hanalyse(Input)
Added:
>
>
See: HAnalyseMethod
 

Generation

Input: kokeilla

Line: 39 to 41
  if(Analysis.baseform=Input): hgenerate(Analysis)
Added:
>
>
See: HGenerateMethod
 

With guessing component

Analysis

Line: 54 to 58
 
...  

Method:

Changed:
<
<
if(no hanalyse(Input)): hwanalyse(Input)
>
>
if(len(hanalyse(Input)=0): hguess(Input)
 
Changed:
<
<
NOTE. Need to modify current hparadigm to output not only paradigms but also the potential analysis, i.e. hwanalyse
>
>
See: HGuessMethod, HAnalyseMethod
 

Generation

Line: 68 to 72
 
xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin

Method:

Changed:
<
<
if(no hanalyse(Input)): for Analysis in hwanalyse(Input):
>
>
if(len(hanalyse(Input))=0): for Analysis in hguess(Input):
  if(Analysis.baseform=Input): hgenerate(Analysis)
Added:
>
>
See: HAnalyseMethod, HGuessMethod, HGenerateMethod
 

NOTE. The demo only prints unique Output lines, i.e. a line is printed only the first time it appears as the order may be significant

Revision 3 2008-04-23 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 23 to 23
 
kokeilla 67 verb, passive, indicative, present, 4th person, negative
kokki 5-a nominal, plural, adessive
Changed:
<
<
Method: hanalyse
>
>
Method:
hanalyse(Input)
 

Generation

Line: 33 to 34
 
kokeilla 67 kokeilla, kokeilen, kokeili, kokeilisi, kokeillee, kokeilkoon, kokeillut, kokeiltiin
Changed:
<
<
Methods: if(hanalyse=Input) -> hgenerate
>
>
Method:
for Analysis in hanalyse(Input): 
    if(Analysis.baseform=Input): 
        hgenerate(Analysis)
 

With guessing component

Line: 49 to 53
 
xkokki 5-a nominal, plural, adessive
...  
Changed:
<
<
Methods: if(hanalyse=0): hwanalyse
>
>
Method:
if(no hanalyse(Input)): 
    hwanalyse(Input)
 
Changed:
<
<
Need to modify hparadigm to output not only paradigms but also the potential analysis, i.e. hwanalyse
>
>
NOTE. Need to modify current hparadigm to output not only paradigms but also the potential analysis, i.e. hwanalyse
 

Generation

Line: 61 to 67
 
xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin
Changed:
<
<
Methods: if(hanalyse=0): if(hwanalyse=Input) -> hgenerate
>
>
Method:
if(no hanalyse(Input)): 
    for Analysis in hwanalyse(Input): 
        if(Analysis.baseform=Input): 
            hgenerate(Analysis)


NOTE. The demo only prints unique Output lines, i.e. a line is printed only the first time it appears as the order may be significant

 

Revision 2 2008-04-23 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

Line: 10 to 10
 Below is a specification of the two demos depending on whether the language has a statistical paradigm guessing component.
Changed:
<
<

No guessing component

>
>

No Guessing Component

 

Analysis

Changed:
<
<
kokeilla
>
>
Input: kokeilla
 
Changed:
<
<
koe48-d nominal, plural, adessive
kokeilla67 verb, active, a-infintive, singular, lative
kokeilla67 verb, passive, indicative, present, 4th person, negative
kokki5-a nominal, plural, adessive
>
>
Output:
 
Added:
>
>
koe 48-d nominal, plural, adessive
kokeilla 67 verb, active, a-infinitive, singular, lative
kokeilla 67 verb, passive, indicative, present, 4th person, negative
kokki 5-a nominal, plural, adessive

Method: hanalyse

 

Generation

Added:
>
>
Input: kokeilla

Output:

kokeilla 67 kokeilla, kokeilen, kokeili, kokeilisi, kokeillee, kokeilkoon, kokeillut, kokeiltiin

Methods: if(hanalyse=Input) -> hgenerate

 

With guessing component

Analysis

Added:
>
>
Input: xkokeilla

Output:

xkoe 48-d nominal, plural, adessive
xkokeilla 67 verb, active, a-infinitive, singular, lative
xkokeilla 67 verb, passive, indicative, present, 4th person, negative
xkokki 5-a nominal, plural, adessive
...  

Methods: if(hanalyse=0): hwanalyse

Need to modify hparadigm to output not only paradigms but also the potential analysis, i.e. hwanalyse

 

Generation

Added:
>
>
Input: xkokeilla

Output:

xkokeilla 67 xkokeilla, xkokeilen, xkokeili, xkokeilisi, xkokeillee, xkokeilkoon, xkokeillut, xkokeiltiin

Methods: if(hanalyse=0): if(hwanalyse=Input) -> hgenerate

 

Revision 1 2008-04-22 - KristerLinden

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="OMorFiHome"

OMorFi: Demo Outline

The analyzer and generator demos on the internet are intended to demonstrate our capabilities to morphologically analyze and generate various languages. Some of the languages also contain a component for guessing paradigms of unknown words in the language.

Below is a specification of the two demos depending on whether the language has a statistical paradigm guessing component.

No guessing component

Analysis

kokeilla

koe48-d nominal, plural, adessive
kokeilla67 verb, active, a-infintive, singular, lative
kokeilla67 verb, passive, indicative, present, 4th person, negative
kokki5-a nominal, plural, adessive

Generation

With guessing component

Analysis

Generation


-- KristerLinden - 22 Apr 2008

 
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.