Difference: HfstFinnishOTProsody (1 vs. 6)

Revision 6 - 2016-05-18 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

Line: 283 to 283
 
-- ErikAxelson - 2011-09-23
Added:
>
>
META PREFERENCE name="VIEW_TEMPLATE" title="VIEW_TEMPLATE" type="Set" value="FinCLARIN.ViewFinClarinWideEngTemplate"

Revision 5 - 2013-01-16 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

Changed:
<
<
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst. This solution also uses lenient composition. The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
>
>
NOTE: The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen

Revision 4 - 2011-09-28 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst.

Changed:
<
<
This solution also uses lenient composition.
>
>
This solution also uses lenient composition. The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen

Revision 3 - 2011-09-26 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

Changed:
<
<
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst.
>
>
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst. This solution also uses lenient composition.
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen
Line: 15 to 16
 $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.
Changed:
<
<
>
>
 
Changed:
<
<
# -*- coding: utf-8 -*-

# finnish-ot-prosody.script

# Copyright (C) 2004 Lauri Karttunen # # This program is free software; you can redistribute it and/or modify # it under the terms of GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details.

# This script maps Finnish words into a prosodic representation # that splits the words into syllables, adds primary and secondary # stress marks, and organizes the syllables into feet. For example, # the input "ilmoittautumisesta" 'registering' (Sg. Elative) becomes # # (íl.moit).(tàu.tu).mi.(sès.ta) # # where the acute accent on the first vowel indicates primary stress, # the grave accents mark secondary stress and feet are enclosed in # parentheses.

# Note that this script is encoded in utf-8. To run it, # you should start fst in utf-8 mode: # # xfst -utf8 -l finnish-ot-prosody.fst

# The version of xfst that comes with the Book is not utf8-enabled # To check about the availability of a utf8-enabled version of xfst, # please write to karttunen@parc.com.

# The analysis that the script implements comes from Paul Kiparsky's paper # "Finnish Noun Inflection" in Generative Approaches to Finnic and # Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp.109-161, # CSLI Publications, 2003. It covers only the basic system presented # on pages 111-112 of the paper and does not cover the extensions # in the latter part of the paper.

# The system encoded in the script consists of a Gen function # that produces a vast number of output candidates for each input # word. The candidates are subject to nine ranked optimality # constraints: Clash, AlignLeft, MainStress, FootBin, Lapse, NonFinal, # StressToWeight, Parse, and AllFeetFirst.

# Lenient composition is used to guarantee that at least one output # form survives, no matter how suboptimal it is.

# Lenient composition, an operation written .O. (capital O), is part of the xfst # language, but it is not described in the Beesley & Karttunen book. To # learn about lenient composition, please consult Karttunen's paper # "The Proper Treatment of Optimality in Phonology".

# You may find it interesting to compare this OT implementation of # Kiparsky's analysis with a non-OT account for the same data. # See the Finnish Non-OT Prosody script.

>
>
You may find it interesting to compare this OT implementation of Kiparsky's analysis with a non-OT account of the same data. See the Finnish Non-OT Prosody solution.
 
Changed:
<
<
################################## DATA ##################################
>
>
Data:
 
Added:
>
>
 echo "{kalastelet} | {kalasteleminen} | {ilmoittautuminen} | {järjestelmättömyydestänsä} | {kalastelemme} | {ilmoittautumisesta} | {järjestelmällisyydelläni} |
Line: 91 to 36
  {matematiikka} | {puhelimistani} | {rakastajattariansa} | {kuningas} | {kainostelijat} | {ravintolat} | {merkonomin}" | hfst-regexp2fst -f $FORMAT > FinnWords
Added:
>
>
 
Changed:
<
<
######################### BASIC DEFINITIONS #############################
>
>
Basic definitions:
 
Added:
>
>
 echo '[u | y | i]' | hfst-regexp2fst -f $FORMAT > HighV # High vowel echo '[e | o | ö]' | hfst-regexp2fst -f $FORMAT > MidV # Mid vowel echo '[a | ä]' | hfst-regexp2fst -f $FORMAT > LowV # Low vowel
Line: 102 to 49
 echo '[b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z]' | hfst-regexp2fst -f $FORMAT > C # Consonant
Changed:
<
<
echo '[á | é | í | ó | ú | ý | "ä&#769' | hfst-regexp2fst -f $FORMAT > MSV " | "ö́"]; echo '[à | è | ì | ò | ù | "y&#768' | hfst-regexp2fst -f $FORMAT > SSV " | "ä̀" | "ö̀"]; echo '[MSV | SSV]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel echo '[USV | SV] ' | hfst-regexp2fst -f $FORMAT > V # Vowel
>
>
echo '[á | é | í | ó | ú | ý | ä´ | ö´]' | hfst-regexp2fst -f $FORMAT > MSV echo '[à | è | ì | ò | ù | y` | ä` | ö`]' | hfst-regexp2fst -f $FORMAT > SSV echo '[@"MSV" | @"SSV"]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel echo '[@"USV" | @"SV"] ' | hfst-regexp2fst -f $FORMAT > V # Vowel
 
Changed:
<
<
echo '[V | C]' | hfst-regexp2fst -f $FORMAT > P # Phone echo '[[\P+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
>
>
echo '[@"V" | @"C"]' | hfst-regexp2fst -f $FORMAT > P # Phone echo '[[\@"P"+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
  echo '.#. | "."' | hfst-regexp2fst -f $FORMAT > E # Edge echo '[~$"." "." ~$"."]' | hfst-regexp2fst -f $FORMAT > SB # At most one syllable boundary
Changed:
<
<
echo '[C* V]' | hfst-regexp2fst -f $FORMAT > Light # Light syllable echo '[Light P+]' | hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable

echo '[Heavy | Light]' | hfst-regexp2fst -f $FORMAT > S # Syllable echo '[S & $SV]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable echo '[S & ~$SV]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable echo '[S & $MSV] ' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress echo '[S "." S]' | hfst-regexp2fst -f $FORMAT > BF # Binary foot

>
>
echo '[@"C"* @"V"]' | hfst-regexp2fst -f $FORMAT > Light # Light syllable echo '[Light @"P"+]' | hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable
 
Added:
>
>
echo '[@"Heavy" | @"Light"]' | hfst-regexp2fst -f $FORMAT > S # Syllable echo '[@"S" & $@"SV"]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable echo '[@"S" & ~$@"SV"]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable echo '[@"S" & $@"MSV"] ' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress echo '[@"S" "." @"S"]' | hfst-regexp2fst -f $FORMAT > BF # Binary foot
 
Changed:
<
<
################################### GEN ##################################
>
>
Gen:
 
Added:
>
>
 # A diphthong is a combination of two unlike vowels that together form # the nucleus of a syllable. In general, Finnish diphthongs end in a high vowel. # However, there are three exceptional high-mid diphthongs: ie, uo, and yö # that historically come from long ee, oo, and öö, respectively. # All other adjacent vowels must be separated by a syllable boundary.
Changed:
<
<
echo '[ [. .] -> "." || [HighV | MidV] _ LowV, i _ [MidV - e], u _ [MidV - o], y _ [MidV - ö] ]' | hfst-regexp2fst -f $FORMAT > MarkNonDiphtongs
>
>
echo '[ [. .] -> "." || [@"HighV" | @"MidV"] _ @"LowV", i _ [@"MidV" - e], u _ [@"MidV" - o], y _ [@"MidV" - ö] ]' | hfst-regexp2fst -f $FORMAT > MarkNonDiphthongs
  # The general syllabification rule has exceptions. In particular, loan # words such as ate.isti 'atheist' must be partially syllabified in the # lexicon.
Changed:
<
<
echo 'C* V+ C* @-> ... "." || _ C V' | hfst-regexp2fst -f $FORMAT > Syllabify
>
>
echo '@"C"* @"V"+ @"C"* @-> ... "." || _ @"C" @"V"' | hfst-regexp2fst -f $FORMAT > Syllabify
  # Optionally adds primary or secondary stress to the first vowel # of each syllable.

echo 'a (->) á|à, e (->) é|è, i (->) í|ì, o (->) ó|ò,

Changed:
<
<
u (->) ú|ù, y (->) ý|"ỳ", ä (->) "ä́"|"ä̀", ö (->) "ö́"|"ö̀" || E C* _' | hfst-regexp2fst -f $FORMAT > Stress
>
>
u (->) ú|ù, y (->) ý|y`, ä (->) ä´|ä`, ö (->) ö´|ö` || @"E" @"C"* _' | hfst-regexp2fst -f $FORMAT > Stress
  # Scan the word, optionally dividing it to any combination of # unary, binary, and ternary feet. Each foot must contain at least # one stressed syllable.
Changed:
<
<
echo '[[S ("." S ("." S)) & $SS] (->) "(" ... ")" || E _ E]' | hfst-regexp2fst -f $FORMAT > Scan
>
>
echo '[[@"S" ("." @"S" ("." @"S")) & $@"SS"] (->) "(" ... ")" || @"E" _ @"E"]' | hfst-regexp2fst -f $FORMAT > Scan
  # In keeping with the idea of "richness of the base", the Gen # function produces a great number of output candidates for # even short words. Long words have millions of possible outputs.
Changed:
<
<
echo '[MarkNonDiphthongs .o. Syllabify .o. Stress .o. Scan]' | hfst-regexp2fst -f $FORMAT > Gen
>
>
echo '[@"MarkNonDiphthongs" .o. @"Syllabify" .o. @"Stress" .o. @"Scan"]' | hfst-regexp2fst -f $FORMAT > Gen
 
Changed:
<
<
######################### OT CONSTRAINTS #############################
>
>
OT constraints:
 
Added:
>
>
 # We use asterisks to mark constraint violations. Ordinary constraints # such as Lapse assign single asterisks as the violation marks and the # candidate with the fewest marks is selected. Gradient constraints
Line: 177 to 127
 # candidates that violate the constraint provided that at least # one output candidate survives.
Changed:
<
<
echo '~Viol' | hfst-regexp2fst -f $FORMAT > Viol0 # No violations echo '~[Viol^2]' | hfst-regexp2fst -f $FORMAT > Viol1 # At most one violation echo '~[Viol^3]' | hfst-regexp2fst -f $FORMAT > Viol2 # At most two violations echo '~[Viol^4]' | hfst-regexp2fst -f $FORMAT > Viol3 # etc. echo '~[Viol^5]' | hfst-regexp2fst -f $FORMAT > Viol4 echo '~[Viol^6]' | hfst-regexp2fst -f $FORMAT > Viol5 echo '~[Viol^7]' | hfst-regexp2fst -f $FORMAT > Viol6 echo '~[Viol^8]' | hfst-regexp2fst -f $FORMAT > Viol7 echo '~[Viol^9]' | hfst-regexp2fst -f $FORMAT > Viol8 echo '~[Viol^10]' | hfst-regexp2fst -f $FORMAT > Viol9 echo '~[Viol^11]' | hfst-regexp2fst -f $FORMAT > Viol10 echo '~[Viol^12]' | hfst-regexp2fst -f $FORMAT > Viol11 echo '~[Viol^13]' | hfst-regexp2fst -f $FORMAT > Viol12 echo '~[Viol^14]' | hfst-regexp2fst -f $FORMAT > Viol13 echo '~[Viol^15]' | hfst-regexp2fst -f $FORMAT > Viol14 echo '~[Viol^16]' | hfst-regexp2fst -f $FORMAT > Viol15
>
>
echo '~@"Viol"' | hfst-regexp2fst -f $FORMAT > Viol0 # No violations echo '~[@"Viol"^2]' | hfst-regexp2fst -f $FORMAT > Viol1 # At most one violation echo '~[@"Viol"^3]' | hfst-regexp2fst -f $FORMAT > Viol2 # At most two violations echo '~[@"Viol"^4]' | hfst-regexp2fst -f $FORMAT > Viol3 # etc. echo '~[@"Viol"^5]' | hfst-regexp2fst -f $FORMAT > Viol4 echo '~[@"Viol"^6]' | hfst-regexp2fst -f $FORMAT > Viol5 echo '~[@"Viol"^7]' | hfst-regexp2fst -f $FORMAT > Viol6 echo '~[@"Viol"^8]' | hfst-regexp2fst -f $FORMAT > Viol7 echo '~[@"Viol"^9]' | hfst-regexp2fst -f $FORMAT > Viol8 echo '~[@"Viol"^10]' | hfst-regexp2fst -f $FORMAT > Viol9 echo '~[@"Viol"^11]' | hfst-regexp2fst -f $FORMAT > Viol10 echo '~[@"Viol"^12]' | hfst-regexp2fst -f $FORMAT > Viol11 echo '~[@"Viol"^13]' | hfst-regexp2fst -f $FORMAT > Viol12 echo '~[@"Viol"^14]' | hfst-regexp2fst -f $FORMAT > Viol13 echo '~[@"Viol"^15]' | hfst-regexp2fst -f $FORMAT > Viol14 echo '~[@"Viol"^16]' | hfst-regexp2fst -f $FORMAT > Viol15
  # This eliminates the violation marks after the candidate set has # been pruned by a constraint.

echo '{*} -> 0' | hfst-regexp2fst -f $FORMAT > Pardon

Added:
>
>
 
Changed:
<
<
########################## CONSTRAINTS ##############################
>
>
Constraints:
 
Added:
>
>
 # In this section we define nine constraints for Finnish prosody, # listed in the order of their ranking: MainStress, Clash, AlignLeft, # FootBin, Lapse, NonFinal, StressToWeight, Parse, and AllFeetFirst.
Line: 211 to 163
 # Main Stress: The primary stress in Finnish is on the first # syllable. This is an inviolable constraint.
Changed:
<
<
define MainStress [B MSS ~$MSS];
>
>
echo '[@"B" @"MSS" ~$@"MSS"]' | hfst-regexp2fst -f $FORMAT > MainStress
 

# Clash: No stress on adjacent syllables.

Changed:
<
<
define Clash SS -> ... {*} || SS B _ ;
>
>
echo '@"SS" -> ... {*} || @"SS" @"B" _ ' | hfst-regexp2fst -f $FORMAT > Clash
 

# Align-Left: The stressed syllable is initial in the foot.

Changed:
<
<
define AlignLeft SV -> ... {*} || .#. ~[?* "(" C*] _ ;
>
>
echo '@"SV" -> ... {*} || .#. ~[?* "(" @"C"*] _ ' | hfst-regexp2fst -f $FORMAT > AlignLeft
 

# Foot-Bin: Feet are minimally bimoraic and maximally bisyllabic.

Changed:
<
<
define FootBin ["(" Light ")" | "(" S ["." S]^>1] -> ... {*} ;
>
>
echo '["(" @"Light" ")" | "(" @"S" ["." @"S"]^>1] -> ... {*} ' | hfst-regexp2fst -f $FORMAT > FootBin
 

# Lapse: Every unstressed syllable must be adjacent to a stressed # syllable.

Changed:
<
<
define Lapse US -> ... {*} || [B US B] _ [B US B];
>
>
echo '@"US" -> ... {*} || [@"B" @"US" @"B"] _ [@"B" @"US" @"B"]' | hfst-regexp2fst -f $FORMAT > Lapse
 

# Non-Final: The final syllable is not stressed.

Changed:
<
<
define NonFinal SS -> ... {*} || _ ~$S .#.;
>
>
echo '@"SS" -> ... {*} || _ ~$@"S" .#.' | hfst-regexp2fst -f $FORMAT > NonFinal
 

# Stress-To-Weight: Stressed syllables are heavy.

Changed:
<
<
define StressToWeight [SS & Light] -> ... {*} || _ ")"| E;
>
>
echo '[@"SS" & @"Light"] -> ... {*} || _ ")"| @"E"' | hfst-regexp2fst -f $FORMAT > StressToWeight
 

# License-σ: Syllables are parsed into feet.

Changed:
<
<
define Parse S -> ... {*} || E _ E;
>
>
echo '@"S" -> ... {*} || @"E" _ @"E"' | hfst-regexp2fst -f $FORMAT > Parse
 

# All-Ft-Left: Every foot starts at the beginning of a # prosodic word.

Changed:
<
<
define AllFeetFirst [ "(" -> ... {*} || .#. SB _
>
>
echo '[ "(" -> ... {*} || .#. @"SB" _
  .o.
Changed:
<
<
"(" -> ... {*}^2 || .#. SB^2 _
>
>
"(" -> ... {*}^2 || .#. @"SB"^2 _
  .o.
Changed:
<
<
"(" -> ... {*}^3 || .#. SB^3 _
>
>
"(" -> ... {*}^3 || .#. @"SB"^3 _
  .o.
Changed:
<
<
"(" -> ... {*}^4 || .#. SB^4 _
>
>
"(" -> ... {*}^4 || .#. @"SB"^4 _
  .o.
Changed:
<
<
"(" -> ... {*}^5 || .#. SB^5 _
>
>
"(" -> ... {*}^5 || .#. @"SB"^5 _
  .o.
Changed:
<
<
"(" -> ... {*}^6 || .#. SB^6 _
>
>
"(" -> ... {*}^6 || .#. @"SB"^6 _
  .o.
Changed:
<
<
"(" -> ... {*}^7 || .#. SB^7 _
>
>
"(" -> ... {*}^7 || .#. @"SB"^7 _
  .o.
Changed:
<
<
"(" -> ... {*}^8 || .#. SB^8 _ ];
>
>
"(" -> ... {*}^8 || .#. @"SB"^8 _ ]' | hfst-regexp2fst -f $FORMAT > AllFeetFirst
 
Changed:
<
<
########################## Evaluation ######################################
>
>
Evaluation:
 
Changed:
<
<
echo ### Computing the prosody for FinnWords
>
>
# Computing the prosody for FinnWords
  # Some constraints can always be satisfied; some constraints are # violated many times. The limits have been chosen to produce # a unique winner in all the 25 test cases in FinnWords.
Changed:
<
<
regex [FinnWords .o. Gen
>
>
echo '[@"FinnWords" .o. @"Gen"
  .o. @"MainStress" .o. @"Clash" .O. @"Viol0" .o. @"Pardon" .o. @"AlignLeft" .O. @"Viol0"
Line: 290 to 244
  @"Viol12" .O. @"Viol11" .O. @"Viol10" .O. @"Viol9" .O. @"Viol8" .O. @"Viol7" .O. @"Viol6" .O. @"Viol5" .O. @"Viol4" .O. @"Viol3" .O. @"Viol2" .O. @"Viol1" .O. @"Viol0" .o. @"Pardon"
Changed:
<
<
];

print lower-words

>
>
]' | hfst-regexp2fst -f $FORMAT | hfst-project -p output | hfst-fst2strings
 
Changed:
<
<
# This final command produces the following output. The two errors # indicate that there is a problem in Kiparsky's analysis.
>
>
This final command produces the following output. The two errors indicate that there is a problem in Kiparsky's analysis.
 
Added:
>
>
 # (ón.nit).(tè.le).(mà.ni).kin # (ó.pis).(kè.li).ja # (ó.pet).ta.(màs.sa)
Line: 322 to 276
 # (jä́r.jes).tel.(mä̀l.li).syy.(dèl.lä).ni <===== Error # (jä́r.jes).(tèl.mät).tö.(mỳy.des).(tä̀n.sä) # (jä́r.jes).(tèl.mäl).(lìs.tä).mä.(tö̀n.tä)
Changed:
<
<
>
>
 

Revision 2 - 2011-09-23 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

Revision 1 - 2011-09-23 - ErikAxelson

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HfstAllPages"

HFST: Finnish OT Prosody

NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst.

We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen that maps Finnish words into a prosodic representation that splits the words into syllables, adds primary and secondary stress marks, and organizes the syllables into feet.

For more information on the representation, see the original solution. $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.

cat F | perl -pe "s/define ([^ ]*) ([^;]*)\;(.*)/echo \'\2\' \| hfst-regexp2fst -f DOLLARFORMAT \> \1 \3/;"
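The perl one-liner above turns each xfst `define Name Expr;` line into an `echo ... | hfst-regexp2fst` command. Because `([^;]*)` stops at the first semicolon, an expression containing an HTML entity such as `&#769;` is cut in two, which is presumably how the MSV and SSV lines further down this revision were garbled. The following is a minimal sed sketch of the same conversion that runs to the last semicolon on the line instead. The function name `define2echo` is hypothetical (it is not an HFST tool), and it assumes one definition per line, no single quote inside the expression and no semicolon inside the trailing comment. Like the perl version, it does not add the `@"..."` file references that later revisions insert by hand.

```sh
# define2echo (hypothetical helper, not part of HFST): rewrite lines of the form
#   define Name Expr;   # optional comment
# into  echo 'Expr' | hfst-regexp2fst -f $FORMAT > Name # optional comment
# Assumes one definition per line, no ' in Expr and no ';' in the comment.
# Lines that do not start with "define " pass through unchanged.
define2echo() {
  # The greedy (.*) can only end at the LAST ';' on the line, so entities
  # such as &#769; stay inside the expression.
  # Inside double quotes, \$ reaches sed as a plain $: once as the
  # end-of-line anchor and once as the literal text $FORMAT.
  sed -E "s/^define ([^ ]+) (.*);[[:space:]]*(#.*)?\$/echo '\2' | hfst-regexp2fst -f \$FORMAT > \1 \3/"
}
```

Usage: `define2echo < F > F.sh`, where `F` is the xfst script as in the perl command above.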

# -*- coding: utf-8 -*-

# finnish-ot-prosody.script

# Copyright (C) 2004 Lauri Karttunen # # This program is free software; you can redistribute it and/or modify # it under the terms of GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details.

# This script maps Finnish words into a prosodic representation # that splits the words into syllables, adds primary and secondary # stress marks, and organizes the syllables into feet. For example, # the input "ilmoittautumisesta" 'registering' (Sg. Elative) becomes # # (íl.moit).(tàu.tu).mi.(sès.ta) # # where the acute accent on the first vowel indicates primary stress, # the grave accents mark secondary stress and feet are enclosed in # parentheses.

# Note that this script is encoded in utf-8. To run it, # you should start fst in utf-8 mode: # # xfst -utf8 -l finnish-ot-prosody.fst

# The version of xfst that comes with the Book is not utf8-enabled # To check about the availability of a utf8-enabled version of xfst, # please write to karttunen@parc.com.

# The analysis that the script implements comes from Paul Kiparsky's paper # "Finnish Noun Inflection" in Generative Approaches to Finnic and # Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp.109-161, # CSLI Publications, 2003. It covers only the basic system presented # on pages 111-112 of the paper and does not cover the extensions # in the latter part of the paper.

# The system encoded in the script consists of a Gen function # that produces a vast number of output candidates for each input # word. The candidates are subject to nine ranked optimality # constraints: Clash, AlignLeft, MainStress, FootBin, Lapse, NonFinal, # StressToWeight, Parse, and AllFeetFirst.

# Lenient composition is used to guarantee that at least one output # form survives, no matter how suboptimal it is.

# Lenient composition, an operation written .O. (capital O), is part of the xfst # language, but it is not described in the Beesley & Karttunen book. To # learn about lenient composition, please consult Karttunen's paper # "The Proper Treatment of Optimality in Phonology".

# You may find it interesting to compare this OT implementation of # Kiparsky's analysis with a non-OT account for the same data. # See the Finnish Non-OT Prosody script.

################################## DATA ##################################

echo "{kalastelet} | {kalasteleminen} | {ilmoittautuminen} | {järjestelmättömyydestänsä} | {kalastelemme} | {ilmoittautumisesta} | {järjestelmällisyydelläni} | {järjestelmällistämätöntä} | {voimisteluttelemasta} | {opiskelija} | {opettamassa} | {kalastelet} | {strukturalismi} | {onnittelemanikin} | {mäki} | {perijä} | {repeämä} | {ergonomia} | {puhelimellani} | {matematiikka} | {puhelimistani} | {rakastajattariansa} | {kuningas} | {kainostelijat} | {ravintolat} | {merkonomin}" | hfst-regexp2fst -f $FORMAT > FinnWords

######################### BASIC DEFINITIONS #############################

echo '[u | y | i]' | hfst-regexp2fst -f $FORMAT > HighV # High vowel echo '[e | o | ö]' | hfst-regexp2fst -f $FORMAT > MidV # Mid vowel echo '[a | ä]' | hfst-regexp2fst -f $FORMAT > LowV # Low vowel echo '[HighV | MidV | LowV]' | hfst-regexp2fst -f $FORMAT > USV # Unstressed Vowel

echo '[b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z]' | hfst-regexp2fst -f $FORMAT > C # Consonant

echo '[á | é | í | ó | ú | ý | "ä&#769' | hfst-regexp2fst -f $FORMAT > MSV " | "ö́"]; echo '[à | è | ì | ò | ù | "y&#768' | hfst-regexp2fst -f $FORMAT > SSV " | "ä̀" | "ö̀"]; echo '[MSV | SSV]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel echo '[USV | SV] ' | hfst-regexp2fst -f $FORMAT > V # Vowel

echo '[V | C]' | hfst-regexp2fst -f $FORMAT > P # Phone echo '[[\P+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary

echo '.#. | "."' | hfst-regexp2fst -f $FORMAT > E # Edge echo '[~$"." "." ~$"."]' | hfst-regexp2fst -f $FORMAT > SB # At most one syllable boundary

echo '[C* V]' | hfst-regexp2fst -f $FORMAT > Light # Light syllable echo '[Light P+]' | hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable

echo '[Heavy | Light]' | hfst-regexp2fst -f $FORMAT > S # Syllable echo '[S & $SV]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable echo '[S & ~$SV]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable echo '[S & $MSV] ' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress echo '[S "." S]' | hfst-regexp2fst -f $FORMAT > BF # Binary foot

################################### GEN ##################################

# A diphthong is a combination of two unlike vowels that together form # the nucleus of a syllable. In general, Finnish diphthongs end in a high vowel. # However, there are three exceptional high-mid diphthongs: ie, uo, and yö # that historically come from long ee, oo, and öö, respectively. # All other adjacent vowels must be separated by a syllable boundary.

echo '[ [. .] -> "." || [HighV | MidV] _ LowV, i _ [MidV - e], u _ [MidV - o], y _ [MidV - ö] ]' | hfst-regexp2fst -f $FORMAT > MarkNonDiphthongs
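As a rough, string-level illustration of MarkNonDiphthongs, the sed function below inserts the syllable boundary "." in the same four contexts as the rule: a high or mid vowel before a low vowel, and i, u, y before a mid vowel other than e, o, ö respectively. It is a sketch under simplifying assumptions, not an HFST replacement: `mark_non_diphthongs` is a hypothetical name, it works on plain lowercase words rather than on transducers, and it applies the contexts as four successive global passes instead of one parallel replace rule.

```sh
# mark_non_diphthongs (hypothetical): string-level sketch of MarkNonDiphthongs.
# HighV = u y i, MidV = e o ö, LowV = a ä, as in the definitions above.
# Alternations (a|ä) are used instead of bracket expressions so that the
# two-byte UTF-8 letters ä and ö are matched whole even in a non-UTF-8 locale.
mark_non_diphthongs() {
  sed -E \
    -e 's/(u|y|i|e|o|ö)(a|ä)/\1.\2/g' \
    -e 's/i(o|ö)/i.\1/g' \
    -e 's/u(e|ö)/u.\1/g' \
    -e 's/y(e|o)/y.\1/g'
}
```

For example, `echo ergonomia | mark_non_diphthongs` prints `ergonomi.a`, while `kainostelijat`, whose ai is a diphthong, comes out unchanged.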

# The general syllabification rule has exceptions. In particular, loan # words such as ate.isti 'atheist' must be partially syllabified in the # lexicon.

echo 'C* V+ C* @-> ... "." || _ C V' | hfst-regexp2fst -f $FORMAT > Syllabify
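The Syllabify rule inserts "." after the longest C* V+ C* stretch that is followed by a consonant and a vowel, so, apart from a word-initial cluster, the consonant directly before each vowel begins a new syllable. The sed loop below is a minimal sketch of that behaviour on a word that has already passed through MarkNonDiphthongs. The function name `syllabify` is hypothetical, and it assumes lowercase, unstressed input over the vowel set a e i o u y ä ö; like the rule, it never matches across an existing ".".

```sh
# syllabify (hypothetical): string-level sketch of
#   C* V+ C* @-> ... "." || _ C V
# Each pass puts "." before the consonant that precedes a vowel, provided a
# vowel occurs earlier in the same chunk ("." is excluded from both classes).
# The "t a" branch repeats the substitution until nothing more changes.
syllabify() {
  sed -E \
    -e ':a' \
    -e 's/([aeiouyäö][^aeiouyäö.]*)([^aeiouyäö.][aeiouyäö])/\1.\2/' \
    -e 'ta'
}
```

For example, `echo ilmoittautumisesta | syllabify` prints `il.moit.tau.tu.mi.ses.ta`, the syllable structure of the example in the header comment.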

# Optionally adds primary or secondary stress to the first vowel # of each syllable.

echo 'a (->) á|à, e (->) é|è, i (->) í|ì, o (->) ó|ò, u (->) ú|ù, y (->) ý|"ỳ", ä (->) "ä́"|"ä̀", ö (->) "ö́"|"ö̀" || E C* _' | hfst-regexp2fst -f $FORMAT > Stress

# Scan the word, optionally dividing it to any combination of # unary, binary, and ternary feet. Each foot must contain at least # one stressed syllable.

echo '[[S ("." S ("." S)) & $SS] (->) "(" ... ")" || E _ E]' | hfst-regexp2fst -f $FORMAT > Scan

# In keeping with the idea of "richness of the base", the Gen # function produces a great number of output candidates for # even short words. Long words have millions of possible outputs.

echo '[MarkNonDiphthongs .o. Syllabify .o. Stress .o. Scan]' | hfst-regexp2fst -f $FORMAT > Gen
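To put a number on the "richness of the base" remark: Stress optionally gives the first vowel of every syllable an acute or a grave mark, so each syllable has three independent choices, and the optional bracketing in Scan only adds candidates on top of those. The hypothetical helper below computes that lower bound, 3^n for a word of n syllables, from a syllabified string; it is a back-of-the-envelope sketch, not a count taken from the Gen transducer itself.

```sh
# stress_variants (hypothetical): lower bound on the number of Gen outputs
# for a syllabified word: 3^n for n syllables (no stress, acute or grave on
# the first vowel of each syllable), before Scan adds any feet.
stress_variants() {
  # Syllables = number of "." boundaries + 1.
  dots=$(printf '%s' "$1" | tr -cd '.' | wc -c | tr -d ' ')
  n=$((dots + 1))
  total=1
  i=0
  while [ "$i" -lt "$n" ]; do
    total=$((total * 3))
    i=$((i + 1))
  done
  echo "$total"
}
```

For example, `stress_variants il.moit.tau.tu.mi.ses.ta` prints 2187 (3^7) before any foot structure is taken into account.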

######################### OT CONSTRAINTS #############################

# We use asterisks to mark constraint violations. Ordinary constraints # such as Lapse assign single asterisks as the violation marks and the # candidate with the fewest marks is selected. Gradient constraints # such as AllFeetFirst mark violations with sequences of asterisks. # The number increases with distance from the word edge.

# Every instance of * in an output candidate is a violation.

echo '${*}' | hfst-regexp2fst -f $FORMAT > Viol

# We prune candidates with "lenient composition" that eliminates # candidates that violate the constraint provided that at least # one output candidate survives.

echo '~Viol' | hfst-regexp2fst -f $FORMAT > Viol0 # No violations echo '~[Viol^2]' | hfst-regexp2fst -f $FORMAT > Viol1 # At most one violation echo '~[Viol^3]' | hfst-regexp2fst -f $FORMAT > Viol2 # At most two violations echo '~[Viol^4]' | hfst-regexp2fst -f $FORMAT > Viol3 # etc. echo '~[Viol^5]' | hfst-regexp2fst -f $FORMAT > Viol4 echo '~[Viol^6]' | hfst-regexp2fst -f $FORMAT > Viol5 echo '~[Viol^7]' | hfst-regexp2fst -f $FORMAT > Viol6 echo '~[Viol^8]' | hfst-regexp2fst -f $FORMAT > Viol7 echo '~[Viol^9]' | hfst-regexp2fst -f $FORMAT > Viol8 echo '~[Viol^10]' | hfst-regexp2fst -f $FORMAT > Viol9 echo '~[Viol^11]' | hfst-regexp2fst -f $FORMAT > Viol10 echo '~[Viol^12]' | hfst-regexp2fst -f $FORMAT > Viol11 echo '~[Viol^13]' | hfst-regexp2fst -f $FORMAT > Viol12 echo '~[Viol^14]' | hfst-regexp2fst -f $FORMAT > Viol13 echo '~[Viol^15]' | hfst-regexp2fst -f $FORMAT > Viol14 echo '~[Viol^16]' | hfst-regexp2fst -f $FORMAT > Viol15
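The Viol0 ... Viol15 filters are applied with lenient composition. A cascade such as `.O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0` therefore keeps, for each input word, the candidates with the fewest marks. The exception is when even the best candidate exceeds the first limit: then no filter finds a passing candidate and every candidate survives that constraint. The awk sketch below simulates this on an explicit candidate list instead of on transducers, and finishes with the effect of Pardon by deleting the marks. The function name `lenient_cascade` and the one `input<TAB>candidate` pair per line format are assumptions made for illustration.

```sh
# lenient_cascade MAX (hypothetical): simulate
#   C .O. ViolMAX .O. ... .O. Viol1 .O. Viol0 .o. Pardon
# on a candidate list read from stdin, one "input<TAB>candidate" per line.
lenient_cascade() {
  awk -F'\t' -v max="$1" '
    {
      n = NR
      inp[n] = $1; cand[n] = $2; alive[n] = 1
      s = $2
      viol[n] = gsub(/[*]/, "", s)        # number of violation marks
    }
    END {
      for (k = max + 0; k >= 0; k--) {
        # ViolK keeps candidates with at most k marks ...
        split("", passes)
        for (i = 1; i <= n; i++)
          if (alive[i] && viol[i] <= k) passes[inp[i]] = 1
        # ... but leniently: an input with no passing candidate keeps them all.
        for (i = 1; i <= n; i++)
          if (alive[i] && (inp[i] in passes) && viol[i] > k) alive[i] = 0
      }
      # Pardon: delete the marks from the surviving candidates.
      for (i = 1; i <= n; i++)
        if (alive[i]) { s = cand[i]; gsub(/[*]/, "", s); print inp[i] "\t" s }
    }'
}
```

With MAX = 3, an input whose candidates carry two, one and three marks keeps only the one-mark candidate, while an input whose candidates all carry five or more marks keeps every candidate, because nothing passes Viol3.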

# This eliminates the violation marks after the candidate set has # been pruned by a constraint.

echo '{*} -> 0' | hfst-regexp2fst -f $FORMAT > Pardon

########################## CONSTRAINTS ##############################

# In this section we define nine constraints for Finnish prosody, # listed in the order of their ranking: MainStress, Clash, AlignLeft, # FootBin, Lapse, NonFinal, StressToWeight, Parse, and AllFeetFirst. # For the one inviolable constraint, we assign no violation marks. # Clash, Align-Left and Foot-Bin are always satisfiable in Finnish # but we assign violation marks so as not to depend on that knowledge.

# Main Stress: The primary stress in Finnish is on the first # syllable. This is an inviolable constraint.

define MainStress [B MSS ~$MSS];

# Clash: No stress on adjacent syllables.

define Clash SS -> ... {*} || SS B _ ;

# Align-Left: The stressed syllable is initial in the foot.

define AlignLeft SV -> ... {*} || .#. ~[?* "(" C*] _ ;

# Foot-Bin: Feet are minimally bimoraic and maximally bisyllabic.

define FootBin ["(" Light ")" | "(" S ["." S]^>1] -> ... {*} ;
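FootBin puts one mark on a foot that consists of a single light syllable (C* V) and one on a foot of more than two syllables. The awk function below is a hypothetical counter for candidates written in this page's notation. It assumes the stress marks have been stripped so that the vowels are plain ASCII a e i o u y; it is an illustration of the constraint, not part of the HFST solution.

```sh
# foot_bin (hypothetical): count FootBin marks, one candidate per line.
# Assumes unstressed ASCII vowels a e i o u y; "." separates syllables.
foot_bin() {
  awk '{
    s = $0; marks = 0
    while (match(s, /\([^)]*\)/)) {
      foot = substr(s, RSTART + 1, RLENGTH - 2)   # text between "(" and ")"
      inner = gsub(/\./, ".", foot)               # boundaries inside the foot
      if (inner >= 2) marks++                     # more than two syllables
      else if (inner == 0 && foot ~ /^[^aeiouy]*[aeiouy]$/) marks++   # a single light syllable
      s = substr(s, RSTART + RLENGTH)
    }
    print marks
  }'
}
```

For example, `(ma).(kas)` gets one mark for the light unary foot, while the long vowel in `(tee)` makes that foot heavy and unmarked.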

# Lapse: Every unstressed syllable must be adjacent to a stressed # syllable.

define Lapse US -> ... {*} || [B US B] _ [B US B];

# Non-Final: The final syllable is not stressed.

define NonFinal SS -> ... {*} || _ ~$S .#.;

# Stress-To-Weight: Stressed syllables are heavy.

define StressToWeight [SS & Light] -> ... {*} || _ ")"| E;

# License-σ: Syllables are parsed into feet.

define Parse S -> ... {*} || E _ E;
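Parse (`S -> ... {*} || E _ E`) marks every syllable that has a syllable edge, a "." or the word boundary, on both sides, i.e. every syllable that does not touch a foot bracket. Splitting a candidate on "." therefore gives a direct count. As in the rule, the middle syllable of a ternary foot is counted too. The function name below is hypothetical, and the sketch works on one candidate string per line rather than on a transducer.

```sh
# parse_violations (hypothetical): count Parse marks, one candidate per line.
# A field between dots that contains no "(" or ")" is a syllable with an
# edge (".", or the start or end of the word) on both sides.
parse_violations() {
  awk -F'.' '{
    n = 0
    for (i = 1; i <= NF; i++)
      if ($i != "" && $i !~ /[()]/) n++
    print n
  }'
}
```

For example, `(ka.las).te.(le.mi).nen` gets two marks, one each for `te` and `nen`.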

# All-Ft-Left: Every foot starts at the beginning of a # prosodic word.

define AllFeetFirst [ "(" -> ... {*} || .#. SB _ .o. "(" -> ... {*}^2 || .#. SB^2 _ .o. "(" -> ... {*}^3 || .#. SB^3 _ .o. "(" -> ... {*}^4 || .#. SB^4 _ .o. "(" -> ... {*}^5 || .#. SB^5 _ .o. "(" -> ... {*}^6 || .#. SB^6 _ .o. "(" -> ... {*}^7 || .#. SB^7 _ .o. "(" -> ... {*}^8 || .#. SB^8 _ ];
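AllFeetFirst is a gradient constraint. Because SB contains exactly one ".", the k-th rule gives k asterisks to a "(" that is preceded by exactly k syllable boundaries, for k = 1 to 8, so the total grows with each foot's distance from the start of the word. The hypothetical awk counter below mirrors this on candidate strings, including the cut-off at eight boundaries that the definition above implies.

```sh
# all_feet_first (hypothetical): total AllFeetFirst marks, one candidate per line.
# Each "(" earns as many marks as there are "." before it; the script only
# defines the cases k = 1..8, so feet further right get no marks at all.
all_feet_first() {
  awk '{
    total = 0; dots = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == ".") dots++
      else if (c == "(" && dots <= 8) total += dots
    }
    print total
  }'
}
```

For example, `(ká.las).te.(lè.mi).nen` scores 3, whereas a parse into three binary feet, `(ká.las).(tè.le).(mì.nen)`, scores 2 + 4 = 6.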

########################## Evaluation ######################################

echo ### Computing the prosody for FinnWords

# Some constraints can always be satisfied; some constraints are # violated many times. The limits have been chosen to produce # a unique winner in all the 25 test cases in FinnWords.

regex [FinnWords .o. Gen .o. MainStress .o. Clash .O. Viol0 .o. Pardon .o. AlignLeft .O. Viol0 .o. FootBin .O. Viol0 .o. Pardon .o. Lapse .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0 .o. Pardon .o. NonFinal .O. Viol0 .o. Pardon .o. StressToWeight .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0 .o. Pardon .o. Parse .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0 .o. Pardon .o. AllFeetFirst .O. Viol15 .O. Viol14 .O. Viol13 .O. Viol12 .O. Viol11 .O. Viol10 .O. Viol9 .O. Viol8 .O. Viol7 .O. Viol6 .O. Viol5 .O. Viol4 .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0 .o. Pardon ];

print lower-words

# This final command produces the following output. The two errors # indicate that there is a problem in Kiparsky's analysis.

# (ón.nit).(tè.le).(mà.ni).kin # (ó.pis).(kè.li).ja # (ó.pet).ta.(màs.sa) # (íl.moit).(tàu.tu).mi.(sès.ta) # (íl.moit).(tàu.tu).(mì.nen) # (ér.go).(nò.mi).a # (vói.mis).te.(lùt.te).le.(màs.ta) # (strúk.tu).ra.(lìs.mi) # (ré.pe).(ä̀.mä) # (rá.vin).(tò.lat) # (rá.kas).ta.(jàt.ta).ri.(àn.sa) # (pú.he).li.(mìs.ta).ni # (pú.he).li.(mèl.la).ni # (pé.ri).jä # (mä́.ki) # (mér.ko).(nò.min) # (má.te).ma.(tìik.ka) # (kú.nin).gas # (kái.nos).(tè.li).jat # (ká.las).te.(lèm.me) # (ká.las).te.(lè.mi).nen <==== Error # (ká.las).(tè.let) # (jä́r.jes).tel.(mä̀l.li).syy.(dèl.lä).ni <===== Error # (jä́r.jes).(tèl.mät).tö.(mỳy.des).(tä̀n.sä) # (jä́r.jes).(tèl.mäl).(lìs.tä).mä.(tö̀n.tä)


-- ErikAxelson - 2011-09-23
 