Difference: HfstFinnishProsody (1 vs. 13)

Revision 13 - 2016-05-18 - KristerLinden

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 188 to 188
 
<--  
-->
-- ErikAxelson - 2011-09-19 \ No newline at end of file
Added:
>
>
META PREFERENCE name="VIEW_TEMPLATE" title="VIEW_TEMPLATE" type="Set" value="FinCLARIN.ViewFinClarinWideEngTemplate"

Revision 12 - 2013-01-16 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Deleted:
<
<
 NOTE: The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
Deleted:
<
<
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen

Revision 11 - 2013-01-14 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Changed:
<
<
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst. The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
>
>
NOTE: The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
 

We exemplify the use of HFST command line tools with an example taken from

Revision 10 - 2011-09-28 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Changed:
<
<
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst.
>
>
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst. The character ´ shows as &#180; inside the verbatim sections, probably due to a bug in KitWiki formalism.
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen
Line: 165 to 168
 (strúk.tu.ra).(lìs.mi)
(rá.kas.ta).(jàt.ta.ri).(àn.sa)
(rá.vin).(tò.lat)
Changed:
<
<
(ré.pe).(ä̀.mä)
>
>
(ré.pe).(ä`.mä)
 (pé.ri.jä)
(pú.he.li).(mèl.la.ni)
(pú.he.li).(mìs.ta.ni)
Changed:
<
<
(ḿ.ki)
>
>
(mä´.ki)
 (má.te.ma).(tìik.ka)
(mér.ko).(nò.min)
(kái.nos).(tè.li).jàt
Line: 179 to 182
 (ká.las.te).(lèm.me)
(kú.nin).gàs
(kú.nin).gas
Changed:
<
<
(jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
(jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
(jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)
>
>
(jä´r.jes).(tèl.mäl).(lìs.tä.mä).(tö`n.tä)
(jä´r.jes).(tèl.mäl.li).(sy`y.del).(lä`.ni)
(jä´r.jes).(tè.le).(mä`t.tö).(my`y.des).(tä`n.sä)
 


Revision 9 - 2011-09-26 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 53 to 53
 echo "[ b | c | d | f | g | h | j | k | l | m | n | p " "| q | r | s | t | v | w | x | z ]" | hfst-regexp2fst -f $FORMAT > C # Consonant
Changed:
<
<
echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]'
>
>
echo '[á | é | í | ó | ú | ý | ä´ | ö´]'
 | hfst-regexp2fst -f $FORMAT > MSV
Changed:
<
<
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]'
>
>
echo '[à | è | ì | ò | ù | y` | ä` | ö`]'
 | hfst-regexp2fst -f $FORMAT > SSV
echo '[ @"MSV" | @"SSV" ]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo '[ @"USV" | @"SV" ]' | hfst-regexp2fst -f $FORMAT > V # Vowel
Line: 106 to 106
 # Assign the primary stress to the first vowel of the first syllable.

echo ' a -> á, e -> é, i -> í, o -> ó,'

Changed:
<
<
'u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" @"C"* _'
>
>
'u -> ú, y -> ý, ä -> ä´, ö -> ö´ || .#. "(" @"C"* _'
 | hfst-regexp2fst -f $FORMAT > MainStress

# Assign secondary stress to all initial vowels of non-initial syllables.

echo ' a -> à, e -> è, i -> ì, o -> ò,'

Changed:
<
<
'u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀"'
>
>
'u -> ù, y -> y`, ä -> ä`, ö -> ö`'
  '|| "(" @"C"* _ ' | hfst-regexp2fst -f $FORMAT > SecondaryStress

# Assign an optional secondary stress to an unfooted final syllable # if it is heavy.

echo 'a (->) à, e (->) è, i (->) ì,'

Changed:
<
<
'o (->) ò, u (->) ù, y (->) "ỳ",' 'ä (->) "ä̀", ö (->) "ö̀" || "." @"C"* _ @"P" .#. '
>
>
'o (->) ò, u (->) ù, y (->) y`,' 'ä (->) ä`, ö (->) ö` || "." @"C"* _ @"P" .#. '
 | hfst-regexp2fst -f $FORMAT > OptFinalStress

Revision 8 - 2011-09-23 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 6 to 6
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen
Changed:
<
<
that maps Finnish words into a prosodic representation that splits the words into syllables, adds primary and secondary stress marks, and organizes the syllables into feet.

For more information on the representation,

>
>
that maps Finnish words into a prosodic representation. For more information on the representation,
 see the original solution. $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.

Revision 7 - 2011-09-23 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 6 to 6
  We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen
Changed:
<
<
that maps Finnish words into a prosodic representation. For more information on the representation,
>
>
that maps Finnish words into a prosodic representation that splits the words into syllables, adds primary and secondary stress marks, and organizes the syllables into feet.

For more information on the representation,

 see the original solution. $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.

Revision 6 - 2011-09-21 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Added:
>
>
NOTE: This solution does not work at the moment, because rules are not yet implemented in hfst-regexp2fst.
 We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen that maps Finnish words into a prosodic representation. For more information on the representation,

Revision 5 - 2011-09-21 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Changed:
<
<
# This script maps Finnish words into a prosodic representation
# that splits the words into syllables, adds primary and secondary
# stress marks, and organizes the syllables into feet. For example,
# the input "ilmoittautumisesta" 'registering' (Sg. Elative) becomes
#
#     (íl.moit)(tàu.tu.mi)(sès.ta)
#
# where the acute accent on the first vowel indicates primary stress,
# the grave accents mark secondary stress, and feet are enclosed in
# parentheses.

# Note that this script is encoded in utf-8. To run it,
# you should start xfst in utf-8 mode:
#
#     xfst -utf8 -l finnish-prosody.xfst

# The version of xfst that comes with the Book is not utf8-enabled
# To check about the availability of a utf8-enabled version of xfst,
# please write to karttunen@parc.com.

# The descriptive generalizations come from Paul Kiparsky's paper
# "Finnish Noun Inflection" in Generative Approaches to Finnic and
# Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp.109-161,
# CSLI Publications, 2003. Kiparsky writes (p. 111): "Speaking for the
# moment in derivational terms, Finnish stress is assigned by laying down
# binary feet from left to right. Final syllables are not stressed if
# they are light, and only optionally if they are heavy. An important
# phenomenon is the LH` effect: when the left-to-right scansion
# encounters a Light-Heavy sequence, the light syllable is skipped
# with the result that a ternary foot is formed. At the edge of a
# word the LH` effect is superseded by the inviolable requirement that
# a word must have initial stress."

# For an OT account of the same generalizations, see the Finnish OT Prosody
# script.

################################## DATA ##################################
>
>
We exemplify the use of HFST command line tools with an example taken from Beesley & Karttunen that maps Finnish words into a prosodic representation. For more information on the representation, see the original solution. $FORMAT is the implementation type of the transducer. The solution given on this page can also be executed with a single script.
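A possible preamble for running the commands on this page, sketched under the assumption that the HFST command line tools are on PATH. It gives $FORMAT a default value and checks that the tools used below exist. The default openfst-tropical and the helper name require_tools are illustrative choices, not part of the original solution.

```shell
# Implementation type passed to every -f option. openfst-tropical is an
# assumed default; use another type (e.g. sfst or foma) if your HFST build
# supports it. A FORMAT already set in the environment is kept.
FORMAT=${FORMAT:-openfst-tropical}

# require_tools TOOL...: report each TOOL that is not found on PATH and
# return 1 if at least one is missing, 0 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling require_tools hfst-strings2fst hfst-regexp2fst hfst-compose hfst-project hfst-fst2strings || exit 1 at the top of the script makes it stop early with a readable message instead of failing halfway through the rule compilation.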
 
Added:
>
>
The data:
 
Added:
>
>
 echo "kalastelet
kalasteleminen
ilmoittautuminen
Changed:
<
<
järjestelemättömyydestänsä
>
>
järjestelemättömyydestänsä
 kalastelemme
ilmoittautumisesta
Changed:
<
<
järjestelmällisyydelläni
järjestelmällistämätöntä
>
>
järjestelmällisyydelläni
järjestelmällistämätöntä
 voimisteluttelemasta
opiskelija
opettamassa
kalastelet
strukturalismi
onnittelemanikin
Changed:
<
<
mäki
perijä
repeämä
>
>
mäki
perijä
repeämä
 ergonomia
puhelimellani
matematiikka
Line: 71 to 40
 merkonomin" | hfst-strings2fst -f $FORMAT -j > FinnWords.hfst
Changed:
<
<
######################### BASIC DEFINITIONS #############################
>
>
Some definitions:
 
Changed:
<
<
echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV.hfst # High vowel
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst # Low vowel
echo '[ @"HighV" | @"MidV" | @"LowV" ]' | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel

echo "[b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z] | hfst-regexp2fst -f $FORMAT > C.hfst # Consonant

echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV

>
>
echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV # High vowel
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV # Low vowel
echo '[ @"HighV" | @"MidV" | @"LowV" ]' | hfst-regexp2fst -f $FORMAT > USV # Unstressed Vowel

echo "[ b | c | d | f | g | h | j | k | l | m | n | p " "| q | r | s | t | v | w | x | z ]" | hfst-regexp2fst -f $FORMAT > C # Consonant

echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV

 echo '[ @"MSV" | @"SSV" ]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo '[ @"USV" | @"SV" ]' | hfst-regexp2fst -f $FORMAT > V # Vowel
Line: 101 to 72
 echo '[ @"S" & ~$@"SV" ]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable (contains no stressed vowel)
echo '[ @"S" & $@"MSV" ]' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress
Changed:
<
<
echo '[ @"S" "." @"S" ] | hfst-regexp2fst -f $FORMAT > BF # Binary foot
>
>
echo '[ @"S" "." @"S" ]' | hfst-regexp2fst -f $FORMAT > BF # Binary foot
 
Changed:
<
<
######################### RULES FOR PROSODY #############################
>
>
Rules for prosody:
 
Changed:
<
<
echo '[ [. .] -> "." || [\read_file(HighV) | \read_file(MidV)] _ \read_file(LowV), # i.a, e.a
                       i _ [\read_file(MidV) - e],  # i.o, i.ö
                       u _ [\read_file(MidV) - o],  # u.e
                       y _ [\read_file(MidV) - ö] ]' | hfst-rules2fst -f $FORMAT > MarkNonDiphtongs.hfst # y.e
>
>
echo '[ [. .] -> "." || [ @"HighV" | @"MidV" ]' '_ @"LowV",' 'i _ [@"MidV" - e],' 'u _ [@"MidV" - o],' 'y _ [@"MidV" - ö] ]'| hfst-regexp2fst -f $FORMAT > MarkNonDiphtongs # y.e
  # The general syllabification rule has exceptions. In particular, loan
# words such as ate.isti 'atheist' must be partially syllabified in the
# lexicon.
Changed:
<
<
define Syllabify C* V+ C* @-> ... "." || _ C V ;
>
>
echo ' @"C"* @"V"+ @"C"* @-> ... "." || _ @"C" @"V" ' | hfst-regexp2fst -f $FORMAT > Syllabify
 
Changed:
<
<
define TernaryFeet BF "." Light @-> "(" ... ")" // [{).} | .#.] [BF "."]* _ ["." Heavy "." S ] | .#. ;
>
>
echo ' @"BF" "." @"Light" @-> "(" ... ")" ' '// [{).} | .#.] [@"BF" "."]* _' '["." @"Heavy" "." @"S" ] | .#. ' | hfst-regexp2fst -f $FORMAT > TernaryFeet
  # Scan all the unfooted material into binary feet.
Changed:
<
<
define BinaryFeet BF @-> "(" ... ")" || .#.|"." _ .#.|".";
>
>
echo ' @"BF" @-> "(" ... ")" || .#.|"." _ .#.|"." ' | hfst-regexp2fst -f $FORMAT > BinaryFeet
  # Assign the primary stress to the first vowel of the first syllable.
Changed:
<
<
define MainStress a -> á, e -> é, i -> í, o -> ó, u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" C* _ ;
>
>
echo ' a -> á, e -> é, i -> í, o -> ó,' 'u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" @"C"* _' | hfst-regexp2fst -f $FORMAT > MainStress
  # Assign secondary stress to all initial vowels of non-initial syllables.
Changed:
<
<
define SecondaryStress a -> à, e -> è, i -> ì, o -> ò, u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀" || "(" C* _ ;
>
>
echo ' a -> à, e -> è, i -> ì, o -> ò,' 'u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀"' '|| "(" @"C"* _ ' | hfst-regexp2fst -f $FORMAT > SecondaryStress
  # Assign an optional secondary stress to an unfooted final syllable
# if it is heavy.
Changed:
<
<
define OptFinalStress a (->) à, e (->) è, i (->) ì, o (->) ò, u (->) ù, y (->) "ỳ", ä (->) "ä̀", ö (->) "ö̀" || "." C* _ P .#.;

define FinnProsody [ MarkNonDiphthongs .o. Syllabify .o. TernaryFeet .o. BinaryFeet .o. MainStress .o. SecondaryStress .o. OptFinalStress ];

echo ### Computing [FinnWords .o. FinnProsody]

regex FinnWords .o. FinnProsody;

print lower-words

################################ END ######################################

# Here is the output produced by the script:

# (ón.nit).(tè.le).(mà.ni).kìn
# (ón.nit).(tè.le).(mà.ni).kin
# (ó.pet.ta).(màs.sa)
# (ó.pis).(kè.li.ja)
# (ér.go).(nò.mi.a)
# (íl.moit).(tàu.tu).(mì.nen)
# (íl.moit).(tàu.tu.mi).(sès.ta)
# (vói.mis.te).(lùt.te.le).(màs.ta)
# (strúk.tu.ra).(lìs.mi)
# (rá.kas.ta).(jàt.ta.ri).(àn.sa)
# (rá.vin).(tò.lat)
# (ré.pe).(ä̀.mä)
# (pé.ri.jä)
# (pú.he.li).(mèl.la.ni)
# (pú.he.li).(mìs.ta.ni)
# (mä́.ki)
# (má.te.ma).(tìik.ka)
# (mér.ko).(nò.min)
# (kái.nos).(tè.li).jàt
# (kái.nos).(tè.li).jat
# (ká.las).(tè.let)
# (ká.las).(tè.le).(mì.nen)
# (ká.las.te).(lèm.me)
# (kú.nin).gàs
# (kú.nin).gas
# (jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
# (jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
# (jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)

>
>
echo 'a (->) à, e (->) è, i (->) ì,' 'o (->) ò, u (->) ù, y (->) "ỳ",' 'ä (->) "ä̀", ö (->) "ö̀" || "." @"C"* _ @"P" .#. ' | hfst-regexp2fst -f $FORMAT > OptFinalStress

Compose the rules in order, from MarkNonDiphtongs to OptFinalStress, and then compose the lexicon with the resulting rule transducer.

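# Start from the first rule and compose each remaining rule onto it in turn.
# The backslash after "in" is needed: without it the word list is empty.
cp MarkNonDiphtongs Rules
for i in \
  Syllabify \
  TernaryFeet \
  BinaryFeet \
  MainStress \
  SecondaryStress \
  OptFinalStress
do
  cat Rules | hfst-compose "$i" > TMP
  mv TMP Rules
done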

cat FinnWords.hfst | hfst-compose Rules > FinnProsody
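The cp/for/mv loop above can also be wrapped in a small function that stops at the first failing composition. This is a sketch, not part of the original solution: compose_chain is a hypothetical helper name, and it passes both operands to hfst-compose as file arguments so that the left-to-right order of the composition is explicit.

```shell
# compose_chain OUT FIRST [NEXT...]
# Builds FIRST .o. NEXT1 .o. NEXT2 ... into the file OUT, composing left to
# right. Returns 1 as soon as a step fails, removing the partial temp file.
compose_chain() {
  out=$1
  first=$2
  shift 2
  cp "$first" "$out" || return 1
  for next in "$@"; do
    if ! hfst-compose "$out" "$next" > "$out.tmp"; then
      rm -f "$out.tmp"
      return 1
    fi
    mv "$out.tmp" "$out" || return 1
  done
}
```

With the file names used on this page, compose_chain Rules MarkNonDiphtongs Syllabify TernaryFeet BinaryFeet MainStress SecondaryStress OptFinalStress builds the rule cascade, after which hfst-compose FinnWords.hfst Rules > FinnProsody applies it to the word list.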

Print the output side of the result, i.e. the words with their prosody marked.

cat FinnProsody | hfst-project -p output | hfst-fst2strings

Here is the output:

(ón.nit).(tè.le).(mà.ni).kìn
(ón.nit).(tè.le).(mà.ni).kin
(ó.pet.ta).(màs.sa)
(ó.pis).(kè.li.ja)
(ér.go).(nò.mi.a)
(íl.moit).(tàu.tu).(mì.nen)
(íl.moit).(tàu.tu.mi).(sès.ta)
(vói.mis.te).(lùt.te.le).(màs.ta)
(strúk.tu.ra).(lìs.mi)
(rá.kas.ta).(jàt.ta.ri).(àn.sa)
(rá.vin).(tò.lat)
(ré.pe).(ä̀.mä)
(pé.ri.jä)
(pú.he.li).(mèl.la.ni)
(pú.he.li).(mìs.ta.ni)
(mä́.ki)
(má.te.ma).(tìik.ka)
(mér.ko).(nò.min)
(kái.nos).(tè.li).jàt
(kái.nos).(tè.li).jat
(ká.las).(tè.let)
(ká.las).(tè.le).(mì.nen)
(ká.las.te).(lèm.me)
(kú.nin).gàs
(kú.nin).gas
(jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
(jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
(jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)
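One way to sanity-check the output above is to strip the prosodic markup again and compare the result with the input words: the rules should only insert foot brackets, syllable dots and stress marks, never change letters. The filter below is a hypothetical helper sketched for that purpose. It assumes UTF-8 text and covers both the combining accents used in this revision and the spacing ´ and ` marks adopted in revision 9.

```shell
# strip_prosody: read prosodic strings (one per line) on stdin and print the
# bare words by deleting foot brackets, syllable dots and stress marks.
strip_prosody() {
  acute=$(printf '\314\201')   # U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes)
  grave=$(printf '\314\200')   # U+0300 COMBINING GRAVE ACCENT (UTF-8 bytes)
  sed -e 's/[().]//g' \
      -e "s/$acute//g" -e "s/$grave//g" \
      -e 's/´//g' -e 's/`//g' \
      -e 's/á/a/g' -e 's/é/e/g' -e 's/í/i/g' \
      -e 's/ó/o/g' -e 's/ú/u/g' -e 's/ý/y/g' \
      -e 's/à/a/g' -e 's/è/e/g' -e 's/ì/i/g' \
      -e 's/ò/o/g' -e 's/ù/u/g' -e 's/ỳ/y/g'
}
```

For example, cat FinnProsody | hfst-project -p output | hfst-fst2strings | strip_prosody | sort -u should list each word of the data once, since the optionally stressed variants collapse into the same bare word.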
 
-- ErikAxelson - 2011-09-19
 \ No newline at end of file
Added:
>
>
--> -- ErikAxelson - 2011-09-19

Revision 4 - 2011-09-21 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 9 to 8
 # stress marks, and organizes the syllables into feet. For example,
# the input "ilmoittautumisesta" 'registering' (Sg. Elative) becomes
#
Changed:
<
<
# (íl.moit)(tàu.tu.mi)(sès.ta)
>
>
# (íl.moit)(tàu.tu.mi)(sès.ta)
 #
# where the acute accent on the first vowel indicates primary stress,
# the grave accents mark secondary stress, and feet are enclosed in
Line: 41 to 40
 # script.

################################## DATA ##################################

Deleted:
<
<
 
Changed:
<
<
>
>
 echo "kalastelet
kalasteleminen
ilmoittautuminen
Changed:
<
<
järjestelemättömyydestänsä
>
>
järjestelemättömyydestänsä
 kalastelemme
ilmoittautumisesta
Changed:
<
<
järjestelmällisyydelläni
järjestelmällistämätöntä
>
>
järjestelmällisyydelläni
järjestelmällistämätöntä
 voimisteluttelemasta
opiskelija
opettamassa
kalastelet
strukturalismi
onnittelemanikin
Changed:
<
<
mäki
perijä
repeämä
>
>
mäki
perijä
repeämä
 ergonomia
puhelimellani
matematiikka
Line: 76 to 75
 
echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV.hfst                          # High vowel
Changed:
<
<
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst # Low vowel
echo '[@"HighV") | @"MidV") | @"LowV")]' | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
>
>
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst # Low vowel
echo '[ @"HighV" | @"MidV" | @"LowV" ]' | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
 
Deleted:
<
<
 echo "[b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z] | hfst-regexp2fst -f $FORMAT > C.hfst # Consonant
Changed:
<
<
echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV
echo '[@"MSV") | @"SSV")]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo '[@"USV") | @"SV")]' | hfst-regexp2fst -f $FORMAT > V # Vowel
>
>
echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV
echo '[ @"MSV" | @"SSV" ]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo '[ @"USV" | @"SV" ]' | hfst-regexp2fst -f $FORMAT > V # Vowel
 
Changed:
<
<
echo '[@"V") | @"C")]' | hfst-regexp2fst -f $FORMAT > P # Phone
echo '[[\@"P")+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
>
>
echo '[ @"V" | @"C" ]' | hfst-regexp2fst -f $FORMAT > P # Phone
echo '[[\@"P"+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
 echo '[.#. | "."]' | hfst-regexp2fst -f $FORMAT > E # Edge
echo '[~$"." "." ~$"."]' | hfst-regexp2fst -f $FORMAT > SB # At most one syllable boundary
Changed:
<
<
echo '[@"C"* @"V")]'| hfst-regexp2fst -f $FORMAT > Light # Light syllable
echo '[@"Light" @"P")+]'| hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable
>
>
echo '[ @"C"* @"V" ]'| hfst-regexp2fst -f $FORMAT > Light # Light syllable
echo '[ @"Light" @"P"+ ]'| hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable
  echo '[ @"Heavy" | @"Light" ]' | hfst-regexp2fst -f $FORMAT > S # Syllable
echo '[@"S" & $@"SV"]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable
Line: 109 to 108
 
echo '[ [. .] -> "." || [\read_file(HighV) | \read_file(MidV)] _ \read_file(LowV), # i.a, e.a
Changed:
<
<
i _ [\read_file(MidV) - e], # i.o, i.ö
>
>
i _ [\read_file(MidV) - e], # i.o, i.ö
  u _ [\read_file(MidV) - o], # u.e
Changed:
<
<
y _ [\read_file(MidV) - ö] ]' | hfst-rules2fst -f $FORMAT > MarkNonDiphtongs.hfst # y.e
>
>
y _ [\read_file(MidV) - ö] ]' | hfst-rules2fst -f $FORMAT > MarkNonDiphtongs.hfst # y.e
  # The general syllabification rule has exceptions. In particular, loan
# words such as ate.isti 'atheist' must be partially syllabified in the
Line: 129 to 128
  # Assign the primary stress to the first vowel of the first syllable.
Changed:
<
<
define MainStress a -> á, e -> é, i -> í, o -> ó, u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" C* _ ;
>
>
define MainStress a -> á, e -> é, i -> í, o -> ó, u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" C* _ ;
  # Assign secondary stress to all initial vowels of non-initial syllables.
Changed:
<
<
define SecondaryStress a -> à, e -> è, i -> ì, o -> ò, u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀" || "(" C* _ ;
>
>
define SecondaryStress a -> à, e -> è, i -> ì, o -> ò, u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀" || "(" C* _ ;
  # Assign an optional secondary stress to an unfooted final syllable
# if it is heavy.
Changed:
<
<
define OptFinalStress a (->) à, e (->) è, i (->) ì, o (->) ò, u (->) ù, y (->) "ỳ", ä (->) "ä̀", ö (->) "ö̀" || "." C* _ P .#.;
>
>
define OptFinalStress a (->) à, e (->) è, i (->) ì, o (->) ò, u (->) ù, y (->) "ỳ", ä (->) "ä̀", ö (->) "ö̀" || "." C* _ P .#.;
  define FinnProsody [ MarkNonDiphthongs .o.
Line: 169 to 168
  # Here is the output produced by the script:
Changed:
<
<
# (ón.nit).(tè.le).(mà.ni).kìn
# (ón.nit).(tè.le).(mà.ni).kin
# (ó.pet.ta).(màs.sa)
# (ó.pis).(kè.li.ja)
# (ér.go).(nò.mi.a)
# (íl.moit).(tàu.tu).(mì.nen)
# (íl.moit).(tàu.tu.mi).(sès.ta)
# (vói.mis.te).(lùt.te.le).(màs.ta)
# (strúk.tu.ra).(lìs.mi)
# (rá.kas.ta).(jàt.ta.ri).(àn.sa)
# (rá.vin).(tò.lat)
# (ré.pe).(ä̀.mä)
# (pé.ri.jä)
# (pú.he.li).(mèl.la.ni)
# (pú.he.li).(mìs.ta.ni)
# (mä́.ki)
# (má.te.ma).(tìik.ka)
# (mér.ko).(nò.min)
# (kái.nos).(tè.li).jàt
# (kái.nos).(tè.li).jat
# (ká.las).(tè.let)
# (ká.las).(tè.le).(mì.nen)
# (ká.las.te).(lèm.me)
# (kú.nin).gàs
# (kú.nin).gas
# (jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
# (jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
# (jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)
>
>
# (ón.nit).(tè.le).(mà.ni).kìn
# (ón.nit).(tè.le).(mà.ni).kin
# (ó.pet.ta).(màs.sa)
# (ó.pis).(kè.li.ja)
# (ér.go).(nò.mi.a)
# (íl.moit).(tàu.tu).(mì.nen)
# (íl.moit).(tàu.tu.mi).(sès.ta)
# (vói.mis.te).(lùt.te.le).(màs.ta)
# (strúk.tu.ra).(lìs.mi)
# (rá.kas.ta).(jàt.ta.ri).(àn.sa)
# (rá.vin).(tò.lat)
# (ré.pe).(ä̀.mä)
# (pé.ri.jä)
# (pú.he.li).(mèl.la.ni)
# (pú.he.li).(mìs.ta.ni)
# (mä́.ki)
# (má.te.ma).(tìik.ka)
# (mér.ko).(nò.min)
# (kái.nos).(tè.li).jàt
# (kái.nos).(tè.li).jat
# (ká.las).(tè.let)
# (ká.las).(tè.le).(mì.nen)
# (ká.las.te).(lèm.me)
# (kú.nin).gàs
# (kú.nin).gas
# (jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
# (jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
# (jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)
 
-- ErikAxelson - 2011-09-19
 \ No newline at end of file
Added:
>
>
--> -- ErikAxelson - 2011-09-19
 \ No newline at end of file

Revision 3 - 2011-09-20 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 78 to 78
 echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV.hfst # High vowel
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst # Low vowel
Changed:
<
<
echo "[\read_file(HighV) | \read_file(MidV) | \read_file(LowV)]" | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
>
>
echo '[@"HighV") | @"MidV") | @"LowV")]' | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
 
Line: 86 to 86
  echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV
Changed:
<
<
echo "[\read_file(MSV) | \read_file(SSV)]" | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo "[\read_file(USV) | \read_file(SV)]" | hfst-regexp2fst -f $FORMAT > V # Vowel
>
>
echo '[@"MSV") | @"SSV")]' | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo '[@"USV") | @"SV")]' | hfst-regexp2fst -f $FORMAT > V # Vowel
 
Changed:
<
<
echo '[\read_file(V) | \read_file(C)]' | hfst-regexp2fst -f $FORMAT > P # Phone
echo '[[\\read_file(P)+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
>
>
echo '[@"V") | @"C")]' | hfst-regexp2fst -f $FORMAT > P # Phone
echo '[[\@"P")+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
 echo '[.#. | "."]' | hfst-regexp2fst -f $FORMAT > E # Edge
echo '[~$"." "." ~$"."]' | hfst-regexp2fst -f $FORMAT > SB # At most one syllable boundary
Changed:
<
<
echo '[\read_file(C)* \read_file(V)]'| hfst-regexp2fst -f $FORMAT > Light # Light syllable
echo '[\read_file(Light) \read_file(P)+]'| hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable
>
>
echo '[@"C"* @"V")]'| hfst-regexp2fst -f $FORMAT > Light # Light syllable
echo '[@"Light" @"P")+]'| hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable
 
Changed:
<
<
echo '[\read_file(Heavy) | \read_file(Light)]' | hfst-regexp2fst -f $FORMAT > S # Syllable
echo '[\read_file(S) & $\read_file(SV)]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable
echo '[\read_file(S) & ~$\read_file(SV)]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable
echo '[\read_file(S) & $\read_file(MSV)]' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress
>
>
echo '[ @"Heavy" | @"Light" ]' | hfst-regexp2fst -f $FORMAT > S # Syllable
echo '[@"S" & $@"SV"]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable
echo '[@"S" & ~@"SV" ]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable
echo '[@"S" & $@"MSV" ]' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress
 
Changed:
<
<
echo '[\read_file(S) "." \read_file(S)] | hfst-regexp2fst -f $FORMAT > BF # Binary foot
>
>
echo '[@"S" "." @"S"] | hfst-regexp2fst -f $FORMAT > BF # Binary foot
 

######################### RULES FOR PROSODY #############################

Revision 2 - 2011-09-19 - ErikAxelson

Line: 1 to 1
 
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

Line: 78 to 78
 echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV.hfst # High vowel
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst # Low vowel
Changed:
<
<
echo "[HighV | MidV | LowV]" | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
>
>
echo "[\read_file(HighV) | \read_file(MidV) | \read_file(LowV)]" | hfst-regexp2fst -f $FORMAT > USV.hfst # Unstressed Vowel
 
Changed:
<
<
define C [b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z]; # Consonant
>
>
echo "[b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z] | hfst-regexp2fst -f $FORMAT > C.hfst  # Consonant
 
Changed:
<
<
define MSV [á | é | í | ó | ú | ý | "ä́" | "ö́"];
define SSV [à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"];
define SV [MSV | SSV];        # Stressed vowel
define V [USV | SV] ;         # Vowel

define P [V | C];             # Phone
define B [[\P+] | .#.];       # Boundary
define E .#. | ".";           # Edge
define SB [~$"." "." ~$"."];  # At most one syllable boundary

define Light [C* V];          # Light syllable
define Heavy [Light P+];      # Heavy syllable

define S [Heavy | Light];     # Syllable
define SS [S & $SV];          # Stressed syllable
define US [S & ~$SV];         # Unstressed syllable
define MSS [S & $MSV] ;       # Syllable with main stress

>
>
echo '[á | é | í | ó | ú | ý | "ä́" | "ö́"]' | hfst-regexp2fst -f $FORMAT > MSV
echo '[à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"]' | hfst-regexp2fst -f $FORMAT > SSV
echo "[\read_file(MSV) | \read_file(SSV)]" | hfst-regexp2fst -f $FORMAT > SV # Stressed vowel
echo "[\read_file(USV) | \read_file(SV)]" | hfst-regexp2fst -f $FORMAT > V # Vowel

echo '[\read_file(V) | \read_file(C)]' | hfst-regexp2fst -f $FORMAT > P # Phone
echo '[[\\read_file(P)+] | .#.]' | hfst-regexp2fst -f $FORMAT > B # Boundary
echo '[.#. | "."]' | hfst-regexp2fst -f $FORMAT > E # Edge
echo '[~$"." "." ~$"."]' | hfst-regexp2fst -f $FORMAT > SB # At most one syllable boundary

echo '[\read_file(C)* \read_file(V)]'| hfst-regexp2fst -f $FORMAT > Light # Light syllable
echo '[\read_file(Light) \read_file(P)+]'| hfst-regexp2fst -f $FORMAT > Heavy # Heavy syllable

echo '[\read_file(Heavy) | \read_file(Light)]' | hfst-regexp2fst -f $FORMAT > S # Syllable
echo '[\read_file(S) & $\read_file(SV)]' | hfst-regexp2fst -f $FORMAT > SS # Stressed syllable
echo '[\read_file(S) & ~$\read_file(SV)]' | hfst-regexp2fst -f $FORMAT > US # Unstressed syllable
echo '[\read_file(S) & $\read_file(MSV)]' | hfst-regexp2fst -f $FORMAT > MSS # Syllable with main stress

 
Changed:
<
<
define BF [S "." S]; # Binary foot
>
>
echo '[\read_file(S) "." \read_file(S)] | hfst-regexp2fst -f $FORMAT > BF # Binary foot
  ######################### RULES FOR PROSODY #############################
Changed:
<
<
define MarkNonDiphthongs [ [. .] -> "." || [HighV | MidV] _ LowV,  # i.a, e.a
                                           i _ [MidV - e],         # i.o, i.ö
                                           u _ [MidV - o],         # u.e
                                           y _ [MidV - ö] ];       # y.e
>
>
echo '[ [. .] -> "." || [\read_file(HighV) | \read_file(MidV)] _ \read_file(LowV), # i.a, e.a
                                           i _ [\read_file(MidV) - e],        # i.o, i.ö
                                           u _ [\read_file(MidV) - o],        # u.e
                                           y _ [\read_file(MidV) - ö] ]' | hfst-rules2fst -f $FORMAT > MarkNonDiphtongs.hfst      # y.e
  # The general syllabification rule has exceptions. In particular, loan
# words such as ate.isti 'atheist' must be partially syllabified in the

Revision 1 - 2011-09-19 - ErikAxelson

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HfstAllPages"

HFST: Finnish Prosody

# This script maps Finnish words into a prosodic representation
# that splits the words into syllables, adds primary and secondary
# stress marks, and organizes the syllables into feet. For example,
# the input "ilmoittautumisesta" 'registering' (Sg. Elative) becomes
#
#     (íl.moit)(tàu.tu.mi)(sès.ta)
#
# where the acute accent on the first vowel indicates primary stress,
# the grave accents mark secondary stress, and feet are enclosed in
# parentheses.

# Note that this script is encoded in utf-8. To run it,
# you should start xfst in utf-8 mode:
#
#     xfst -utf8 -l finnish-prosody.xfst

# The version of xfst that comes with the Book is not utf8-enabled
# To check about the availability of a utf8-enabled version of xfst,
# please write to karttunen@parc.com.

# The descriptive generalizations come from Paul Kiparsky's paper
# "Finnish Noun Inflection" in Generative Approaches to Finnic and
# Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp.109-161,
# CSLI Publications, 2003. Kiparsky writes (p. 111): "Speaking for the
# moment in derivational terms, Finnish stress is assigned by laying down
# binary feet from left to right. Final syllables are not stressed if
# they are light, and only optionally if they are heavy. An important
# phenomenon is the LH` effect: when the left-to-right scansion
# encounters a Light-Heavy sequence, the light syllable is skipped
# with the result that a ternary foot is formed. At the edge of a
# word the LH` effect is superseded by the inviolable requirement that
# a word must have initial stress."

# For an OT account of the same generalizations, see the Finnish OT Prosody
# script.

################################## DATA ##################################

echo "kalastelet
kalasteleminen
ilmoittautuminen
järjestelemättömyydestänsä
kalastelemme
ilmoittautumisesta
järjestelmällisyydelläni
järjestelmällistämätöntä
voimisteluttelemasta
opiskelija
opettamassa
kalastelet
strukturalismi
onnittelemanikin
mäki
perijä
repeämä
ergonomia
puhelimellani
matematiikka
puhelimistani
rakastajattariansa
kuningas
kainostelijat
ravintolat
merkonomin" | hfst-strings2fst -f $FORMAT -j > FinnWords.hfst

######################### BASIC DEFINITIONS #############################

echo "[u | y | i]" | hfst-regexp2fst -f $FORMAT > HighV.hfst                          # High vowel
echo "[e | o | ö]" | hfst-regexp2fst -f $FORMAT > MidV.hfst                        # Mid vowel
echo "[a | ä]" | hfst-regexp2fst -f $FORMAT > LowV.hfst                            # Low vowel
echo "[HighV | MidV | LowV]" | hfst-regexp2fst -f $FORMAT > USV.hfst                  # Unstressed Vowel

define C [b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z]; # Consonant

define MSV [á | é | í | ó | ú | ý | "ä́" | "ö́"];
define SSV [à | è | ì | ò | ù | "ỳ" | "ä̀" | "ö̀"];
define SV [MSV | SSV];        # Stressed vowel
define V [USV | SV] ;         # Vowel

define P [V | C];             # Phone
define B [[\P+] | .#.];       # Boundary
define E .#. | ".";           # Edge
define SB [~$"." "." ~$"."];  # At most one syllable boundary

define Light [C* V];          # Light syllable
define Heavy [Light P+];      # Heavy syllable

define S [Heavy | Light];     # Syllable
define SS [S & $SV];          # Stressed syllable
define US [S & ~$SV];         # Unstressed syllable
define MSS [S & $MSV] ;       # Syllable with main stress

define BF [S "." S]; # Binary foot

######################### RULES FOR PROSODY #############################

define MarkNonDiphthongs [ [. .] -> "." || [HighV | MidV] _ LowV,  # i.a, e.a
                                           i _ [MidV - e],         # i.o, i.ö
                                           u _ [MidV - o],         # u.e
                                           y _ [MidV - ö] ];       # y.e

# The general syllabification rule has exceptions. In particular, loan
# words such as ate.isti 'atheist' must be partially syllabified in the
# lexicon.

define Syllabify C* V+ C* @-> ... "." || _ C V ;

define TernaryFeet BF "." Light @-> "(" ... ")" // [{).} | .#.] [BF "."]* _ ["." Heavy "." S ] | .#. ;

# Scan all the unfooted material into binary feet.

define BinaryFeet BF @-> "(" ... ")" || .#.|"." _ .#.|".";

# Assign the primary stress to the first vowel of the first syllable.

define MainStress a -> á, e -> é, i -> í, o -> ó, u -> ú, y -> ý, ä -> "ä́", ö -> "ö́" || .#. "(" C* _ ;

# Assign secondary stress to all initial vowels of non-initial syllables.

define SecondaryStress a -> à, e -> è, i -> ì, o -> ò, u -> ù, y -> "ỳ", ä -> "ä̀", ö -> "ö̀" || "(" C* _ ;

# Assign an optional secondary stress to an unfooted final syllable
# if it is heavy.

define OptFinalStress a (->) à, e (->) è, i (->) ì, o (->) ò, u (->) ù, y (->) "ỳ", ä (->) "ä̀", ö (->) "ö̀" || "." C* _ P .#.;

define FinnProsody [ MarkNonDiphthongs .o. Syllabify .o. TernaryFeet .o. BinaryFeet .o. MainStress .o. SecondaryStress .o. OptFinalStress ];

echo ### Computing [FinnWords .o. FinnProsody]

regex FinnWords .o. FinnProsody;

print lower-words

################################ END ######################################

# Here is the output produced by the script:

# (ón.nit).(tè.le).(mà.ni).kìn
# (ón.nit).(tè.le).(mà.ni).kin
# (ó.pet.ta).(màs.sa)
# (ó.pis).(kè.li.ja)
# (ér.go).(nò.mi.a)
# (íl.moit).(tàu.tu).(mì.nen)
# (íl.moit).(tàu.tu.mi).(sès.ta)
# (vói.mis.te).(lùt.te.le).(màs.ta)
# (strúk.tu.ra).(lìs.mi)
# (rá.kas.ta).(jàt.ta.ri).(àn.sa)
# (rá.vin).(tò.lat)
# (ré.pe).(ä̀.mä)
# (pé.ri.jä)
# (pú.he.li).(mèl.la.ni)
# (pú.he.li).(mìs.ta.ni)
# (mä́.ki)
# (má.te.ma).(tìik.ka)
# (mér.ko).(nò.min)
# (kái.nos).(tè.li).jàt
# (kái.nos).(tè.li).jat
# (ká.las).(tè.let)
# (ká.las).(tè.le).(mì.nen)
# (ká.las.te).(lèm.me)
# (kú.nin).gàs
# (kú.nin).gas
# (jä́r.jes).(tèl.mäl).(lìs.tä.mä).(tö̀n.tä)
# (jä́r.jes).(tèl.mäl.li).(sỳy.del).(lä̀.ni)
# (jä́r.jes).(tè.le).(mä̀t.tö).(mỳy.des).(tä̀n.sä)


<--  
-->
-- ErikAxelson - 2011-09-19
 
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.