TWiki> KitWiki Web>FinnishActivities>HfstHome>HfstAPIGeneralizedRestrictionLayerOLD (2009-09-30, ErikAxelson) EditAttach

**THIS TOPIC IS NOT UP TO DATE**

- rewriting, replace and markup rules, constraint rules, contextual rules and grammars in general
- Two-Level Phonology, Generative Phonology, Optimality Theoretic Phonology, and Autosegmental Phonology, and
- lexicon formalisms and surface syntactic grammars.

- the Generalized Restriction framework (Yli-Jyrä & Koskenniemi 2004, 2006, 2007; Yli-Jyrä 2007a, 2007b)
- the Regular Expressions and Predicate Logic (Hulden 2008).

Overt FSTs keep track of the set of the known auxiliary markers (or their keys).

- The basic and common notion in both streams is the notion of an auxiliary symbol that is used to indicate a boundary of a substring. This symbol has two names: a
*diamond*`◊`

and*variable symbol*`()`

. More such symbols can be numbered or named according to letters using different notational conventions:`◊`

. These auxiliary symbols occur in strings of marker-augmented FSTs that we call_{1}, ◊_{2},..., <x>, <y>, ... (x), (y),..., x, y, ...*overt FSTs*(interfaced through`OvertFstHandle`

). The data type of an uncompiled generalized restriction is`OvertFstPair`

. - The size of th auxiliary symbol alphabet may vary dynamically.
- Most applications use only one or two kinds of auxiliary symbols, but they may need a few more (a few dozens of them).
- In addition to these auxiliary symbols that are employed by an application programmer, we need to prepare for hundreds of such auxiliary symbols that are created by special
*algorithms*that speed up the compilation of the GR operation. These symbols are added in certain paths in order to make FSTs look simpler. It is not yet certain whether these symbols are internalized to an algorithm or whether they need to be stored to the marker alphabet of the*overt FSTs*. - The number may be even larger in the future when we have a weighted GR operation. Each rule and non-zero weight in the representation of the rules may need a separate auxiliary symbol. For example, a nondeterministic FST can often be transformed to a deterministic one by adding such auxiliary symbols.

- Unknown auxiliary symbols have to added by
*harmonization*in some situations:- When an FST is combined under
*intersection*with other FSTs: loops for unknown auxiliaries are simply added to all FSTs that are being combined under intersection. - The
*complement*or*relative complement*of an*overt FST*is computed w.r.t. the universal language over all pairs of known and unknown normal symbols and all the identity pairs of known auxiliary symbols.

- When an FST is combined under

All auxiliary symbol are neither used to mark things in similar ways and they are not used to mark similar things.

- The number of auxiliary symbols in the strings may vary since there are many ways ("usage modes") to use auxiliary markers. All these should be supported:
- It occurs in specified positions and is used as a joiner rather than a substring marker.
- It occurs a bounded number of times (typically 2) and mark substrings.
- It occurs freely in specified positions that may be finite of infinit.

- All auxiliary symbols used in the current layer have are interfaced through a
`VarHandle`

.- Each
`VarHandle`

contains a`Key`

. In transitions, these*Keys*are used. The auxiliary symbols can be removed from FSTs one by one or once for all. Accordingly, auxiliary symbols are quite similar to keys of normal symbols. - The
`VarHandle`

that is stored to the marker alphabet of*over FSTs*containts additional information on usage mode of the symbol.

- Each
- The functions in the GR layer may test that the relevant
`VarHandles`

have a correct usage mode.- The current layer supports only one kind of auxiliary symbols: those that are used in pairs to mark substrings.
- Yli-Jyrä (2007a, 2007b) and Yli-Jyrä and Koskenniemi (2004) already employ auxiliary symbols that do not occur in pairs but as a joiners or freely in specified positions.

The total set of possible pairs have to be defined somehow when a complement of an *overt FST* is computed.

- Access to related =other=-symbol (or key) conventions would make the complementation more compact and efficient. Therefore, the layer supporting unknown keys and pairs would be below the current layer.
- We do not need an =other=-symbol that would account for unknown auxiary markers, since unknown auxiliary symbols are treated as epsilons except in intersection and relative complementation.

Excepting the harmonization and carrying the auxiliary symbol alphabet, most of the functionalities of this layer is based on normal FST operations provided by the TransducerLayer. Accordingly, the large number of operations in the current layer does not imply much programming.

Type | Comment | hfst-lexc | hfst-twolc |
---|---|---|---|

VarHandle |
a variable denoting positions in strings and used in different ways according to the variable type (on boundaries, sprinkled, members) | ||

VarSetHandle |
a set of variables | ||

OvertFstHandle |
an FST using variables that are not yet bound to any value | ||

OvertFstPair |
a pair of FSTs using variables | ||

TransducerHandle |
an HFST transducer without variable symbols |

The current layer contains a few sets of functions:

- Those that need to assume a particular usage of auxiliary symbols that are given as
`VarHandle`

parameters:- functions for joiner variables (minimun level of support)
- functions for substring variables
- functions for auxiliary symbols that occur freely in specified positions

- Those that operate
*overt FSTs*and their pairs without paying attention to the usage modes of the known auxiliary symbols.

Type | Function | Comment | hfst-lexc | hfst-twolc |
---|---|---|---|---|

VarHandle |
`create_var_type1` () |
Create a FO variable used for a boundary between two substrings | ||

VarSetHandle |
`create_var_set` () |
Create an empty set of FO variables | ||

VarSetHandle |
`insert_var` (VarSetHandle s, VarHandle x) |
Insert an FO variable to set `s` |

Type | Function | Comment | hfst-lexc | hfst-twolc |
---|---|---|---|---|

OvertFstHandle |
`var_create_joiner_fst` (VarHandle x, TransducerHandle t) |
Asserts that position `x` is a boundary between empty prefix and empty suffix. |

Type | Function | Comment | hfst-lexc | hfst-twolc |
---|---|---|---|---|

OvertFstHandle |
`convert_to_overt_fst` (TransducerHande e) |
Rises the type of a transducer to an overt FST. | ||

OvertFstHandle |
`connect_with_concatenation` (OvertFstHande e, OvertFstHande f) |
Combines two overt FSTs under concatenation. | ||

OvertFstHandle |
`connect_with_and` (OvertFstHande e, OvertFstHande f) |
Combines two overt FSTs under conjunction. | ||

OvertFstHandle |
`connect_with_relative_complement` (OvertFstHande e, OvertFstHande f) |
Combine two overt FSTs as `e ∪ ¬f` . |
||

OvertFstHandle |
`quantify_existentially` (VarHandle m, OvertFstHandle φ) |
Restricts `φ` to those strings that contain an `m` and hides then `m` in `φ` by removing them like epsilons, i.e. computes `(∃ x)φ(x)` . |
||

TransducerHandle |
`quantify_existentially_all` (OvertFstHande e) |
Removes all auxiliary symbols. | ||

OvertFstHandle |
`quantify_universally` (VarHandle m, OvertFstHandle φ) |
Restricts `x` to all substrings in `φ` by evaluating`Σ` i.e. computes `¬ ((∃ x) ¬φ(x))` that is the same as `(∀ x)φ(m)` . |
||

TransducerHandle |
`quantify_universally_all` (OvertFstHandle φ) |
The essential part of the ordinary GR operation that removes all variables at once like repeatly applying `quantify_universally` for all of them. |

These functions are implemented in a latter stage. These are currently moved to HfstAPIGeneralizedRestrictionLayer2OLD in order to prioritize the work.

- C. Elgot: Decision problems of finite automata and related arithmetics. Trans. Amer. Math. Soc. 98: 21-51 (1961)
- R. Cohen and J. Brzozowski: Dot-depth of star-free events. J. Comput. Syst. Sci. 5:1-16 (1971)
- W. Thomas: Classifying Regular Events in Symbolic Logic. J. C. S. Sci. 25(3): 360-376 (1982)
- N. Vaillette: Logical specification of regular relations for NLP. Natural Language Engineering 9(1): 65-85. (2003)
- A. Yli-Jyrä: Describing syntax with star-free expressions. In EACL 2003. 379-386. (2003)
- --: Contributions to the Theory of Finite-State Based Grammars. Dissertation. (2005a)
- --: Linguistic Grammars with Very Low Complexity. In Arppe et al., Inquiries into Words, Constraints and Contexts. CSLI Publications. (2005b)
- --: Applications of Diamonded Double Negation. Invited talk (original title: Hidden Jewels of Double Negation). FSMNLP 2007 postproceedings. (2007a)
- --: Transducers from Parallel Replace Rules and Modes with Generalized Lenient Composition. FSMNLP 2007 postproceedings. (2007b)
- -- and K. Koskenniemi: Compiling contextual restrictions on strings into finite-state automata. FASTAR Days. Eindhoven. (2004)
- -- and --: Compiling Generalized Two-Level Rules and Grammars. FinTAL 2006: 174-185. (2006)
- -- and --: A New Method for Compiling Parallel Replacement Rules. CIAA 2007: 320-321 (2007c)
- M. Hulden. Regular expressions and predicate logic in finite-state language processing. FSMNLP 2008. (2008)

-- AnssiYliJyra - 06 Oct 2008

Topic revision: r11 - 2009-09-30 - ErikAxelson

**TWiki Reference**

- ATasteOfTWiki
- TextFormattingRules
- TWikiVariables
- FormattedSearch
- TWikiDocGraphics
- InstalledPlugins
- TWikiReferenceManual

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback