HFST: API for Generalized Restriction (GR) and Predicate Logic in FST Calculus

Introduction

Purpose of the layer

This extension to the basic HFST API provides support for new letter-FST operations that are useful for compilation of
  • rewriting, replace and markup rules, constraint rules, contextual rules and grammars in general
  • Two-Level Phonology, Generative Phonology, Optimality Theoretic Phonology, and Autosegmental Phonology, and
  • lexicon formalisms and surface syntactic grammars.

Prior references

The current definitions are based on the convergence of two streams of work:
  • the Generalized Restriction framework (Yli-Jyrä & Koskenniemi 2004, 2006, 2007; Yli-Jyrä 2007a, 2007b)
  • the Regular Expressions and Predicate Logic (Hulden 2008).
The more general discussion on FST related arithmetics goes back to the work of Buchi (1960), Elgot (1961), etc. See Section References below.

Basic Notions

Auxiliary Marker Symbols

  1. The basic and common notion in both streams is the notion of an auxiliary symbol that is used to indicate a boundary of a substring. This symbol has two names: a diamond and variable symbol (). More such symbols can be numbered or named according to letters using different notational conventions: 1, ◊2,..., <x>, <y>, ... (x), (y),..., x, y, .... These auxiliary symbols occur in strings of marker-augmented FSTs that we call below over FSTs (interfaced through OvertFstHandle). The data type of an uncompiled generalized restriction is OvertFstPair.
  2. In addition to those e.g. dozen auxiliary symbols that have been employed by an application programmer, we need to prepare for hundreds of such auxiliary symbols that are created by special algorithms that speed up the compilation of the GR operation. These symbols are added in certain paths in order to make FSTs look simpler. Their number can be large since we want to more auxiliary symbols when we have a weighted GR operation in the future. For example, nondeterministic FST can be transformed to a deterministic one by adding such auxiliary symbols. Regardless of the purpose, the auxiliary symbols have the same type and it is interfaced through a VarHandle.
  3. There are many ways to use auxiliary markers. All these should be supported: (i) it occurs only once in every path of over FSTs, (ii) it occurs twice, (iii) it occurs 4 times, (iv) it occurs freely in specified positions. They can be removed one by one or once for all. Accordingly, auxiliary symbols are quite similar to normal symbols (or keys).
  4. Overt FSTs keep track of the known auxiliary markers that vary from FST to FST even if they could share the same key table. Those symbols that an FST does not know need to be added by harmonization when the FST is combined under intersection with other FSTs. There, loops for unknown auxiliaries are simply added.
  5. The complementation of over FST is computed w.r.t. the universal language over all pairs of known and unknown normal symbols and all the identity pairs of known auxiliary symbols.

Nondeterministic Letter Transducers

Often letter transducers are kept as deterministic and minimal. However, in the case of overt FSTs, we may want to avoid determinization of intermediate results.

Operations, Functions and Related Layers

  1. VarHandle can contain a KeyHandle rather than a SymbolHandle and it probably more feasible to implement that way.
  2. Excepting the harmonization and carrying the auxiliary symbol alphabet, most of the functionalities of this layer is based just normal FST operations provided by the TransducerLayer. Accordingly, the large number of operations in the current layer does not imply much programming.
  3. The functions in the current layer are all derived and their are mostly of similar complexity. We see all operations that use only overt FSTs as primitive while those build on over FST pairs are derived.
  4. However, the total set of possible pairs have to be defined somehow. Therefore, optional access to related =other=-symbol (or key) conventions would make the complementation more compact and efficient.

Data Types

Type Comment hfst-lexc hfst-twolc
VarHandle a variable denoting positions in strings and used in different ways according to the variable type (on boundaries, sprinkled, members)    
VarSetHandle a set of variables    
OvertFstHandle an FST using variables that are not yet bound to any value    
OvertFstPair a pair of FSTs using variables    
TransducerHandle an HFST transducer without variable symbols    

Function Primitives

Definition of Substring Variables and other Diamond Symbols

Type Function Comment hfst-lexc hfst-twolc
VarHandle create_var_type2 () Create a FO variable used for boundaries of a substring    
VarSetHandle create_var_set () Create an empty set of FO variables    
VarSetHandle insert_var (VarSetHandle s, VarHandle x) Insert an FO variable to set s    

Functions That Relate Substring Variables to Variable-Free Conditions

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle var_is_member (VarHandle x, TransducerHandle t) Asserts that substring x is recognized by t.    
OvertFstHandle var_partition (VarHandle v, VarHandle x, VarHandle y) Asserts that string is partitioned into substrings v, x and y.    
OvertFstHandle var_is_substring_with_context (VarHandle x, TransducerHandle rc, TransducerHandle cntr, TransducerHandle rc, bool padded) Asserts that x is substring of strings in cntr that is surrounded by (padded) contexts lc and rc.    
OvertFstHandle var_has_context (VarHandle x, TransducerHandle lc, TransducerHandle rc, bool padded) Asserts that substring x is surrounded by (padded) contexts lc and rc.    
OvertFstHandle var_precedes (VarHandle x, TransducerHandle rc, bool padded) Asserts that substring x precedes (padded) right context rc.    
OvertFstHandle var_follows (VarHandle x, TransducerHandle lc, bool padded) Asserts that substring x follows (padded) left context lc.    
(this table is subject to minor changes)

Functions That Compare or Rename Substring Variables

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle var_embeds (VarHandle x, VarHandle y) Asserts that substring x contains substring y.    
OvertFstHandle var_equals (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OvertFstHandle var_starts_simultaneously (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OverFstHandle var_ends_simultaneously (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OvertFstHandle var_starts_before (VarHandle x, VarHandle y) Asserts that substring x starts before substring y.    
OvertFstHandle var_ends_before (VarHandle x, VarHandle y) Asserts that substring x ends before substring y.    
OvertFstHandle var_comes_before (VarHandle x, VarHandle y) Asserts that substring x occurs ends before substring y starts.    
OvertFstHandle var_has_overlap (VarHandle x, VarHandle y) Asserts that substring x and substring y are not disjoint (Hulden 2008).    
OvertFstHandle var_is_prefix_of (VarHandle x, VarHandle y) Asserts that substring x is prefix of y.    
OvertFstHandle var_is_suffix_of (VarHandle x, VarHandle y) Asserts that substring x is suffix of y.    
OvertFstHandle rename_var (VarHandle x, VarHandle y, OvertFstHandle ψ) Makes α transformation for overt FST ψ by replacing variable x with variable y.    

Functions That Implement Logical Connectors and Quantifiers

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle connect_with_concatenation (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under contenation.    
OvertFstHandle connect_with_and (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under conjunction.    
OvertFstHandle connect_with_or (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under disjunction.    
OvertFstHandle connect_with_xor (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under exclusive disjunction.    
OvertFstHandle connect_with_not (OvertFstHande e) Negate an overt FST.    
OvertFstHandle connect_with_arrow (OvertFstHande e, OvertFstHande f) Combine two overt FSTs with material implication.    
OvertFstHandle quantify_existentially (VarHandle m, OvertFstHandle φ) Hides diamond x in φ,
or computes (∃ x)φ(x).
   
OvertFstHandle quantify_universally (VarHandle m, OvertFstHandle φ) Restrict x to all substrings in φ by evaluating
    Σ* <x> Σ* <x> Σ*  =x>  φ
or computes (∀ x)φ(x).
   

Functions That Construct, Combine and Compile GR Rules

Type Function Comment hfst-lexc hfst-twolc
OvertFstPair gr_construct (OvertFstHande e, OvertFstHande f) Combine two overt FSTs for generalized restriction operator.    
OvertFstPair gr_intersect (OvertFstPair gr1, OvertFstPair gr2) Combine two generalized restrictions under intersection.    
OvertFstPair gr_intersect_coherently (OvertFstPair gr1, OvertFstPair gr2) Combine two generalized restrictions under coherent intersection.    
OvertFstHandle gr_compile (OvertFstPair g, VarHandle x) Compiles
    generalized restriction φ =x> ψ or
    logical formula (∀ x)( φ(x) → ψ(x) )
where (φ,ψ)=g.
   
OvertFstHandle gr_compile_parallel (OvertFstPair g, VarSetHandle X) Compile
    generalized restriction φ =x1,...,xn> ψ or
    logical formula (∀ x1,...,xn)( φ(x1,...,xn) → ψ(x1,...,xn))
where (φ,ψ)=g and x1,...,xn ∈ X.
   
TransducerHandle gr_compile_final (OvertFstPair g) Compile
    generalized restriction φ =x1,...,xn> ψ or
    logical formula (∀ x1,...,xn)( φ(x1,...,xn) → ψ(x1,...,xn))
where (φ,ψ)=g.
   

Derived Functions

Functions That Compile Constraints using GR Rules

In the following, c is an overt FST for contexts.

Type Function Comment hfst-lexc hfst-twolc
OvertFstPair gr_construct_not (TransducerHandle t) not(t).    
OvertFstPair gr_construct_nowhere (TransducerHandle t) nowhere(t).    
OvertFstPair gr_construct_if_P_then_S (TransducerHandle p, TransducerHandle s) if-P-then-S(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_if_S_then_P (TransducerHandle p, TransducerHandle s) if-S-then-P(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_iff_P_S (TransducerHandle p, TransducerHandle s) iff-P-S(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_center_prohibition (TransducerHandle x, OvertFstHandle c) center prohibition x /<= c (in TWOLC).    
OvertFstPair gr_construct_context_restriction (TransducerHandle x, OvertFstHandle c) context restriction x => c (Koskenniemi 1983).    
OvertFstPair gr_construct_coercion (TransducerHandle x, OvertFstHandle c) coercion x <<= c (Yli-Jyrä & Koskenniemi 2004).    
OvertFstPair gr_construct_output_coercion (TransducerHandle x, OvertFstHandle c) surface coercion x <= c (Koskenniemi 1983).    
OvertFstPair gr_construct_input_coercion (TransducerHandle x, OvertFstHandle c) lexical coercion x <- c (less used).    
OvertFstPair gr_construct_center_presence_requirement (TransducerHandle x, OvertFstHandle c) center presence requirement x <== c (Yli-Jyrä & Koskenniemi 2006).    
OvertFstPair gr_construct_presence_requirement (OvertFstHandle c1, OvertFstHandle c2) presence requirement c1 ==> c2 (Yli-Jyrä & Koskenniemi 2006).    
OvertFstPair gr_construct_output_double_arrow (TransducerHandle x, OvertFstHandle c) double arrow x <=> c (Koskenniemi 1983).    
OvertFstPair gr_construct_input_double_arrow (TransducerHandle x, OvertFstHandle c) double arrow x <-> c that combines a lexical coercion and a context restriction (less used).    
OvertFstPair gr_construct_match_L_R (TransducerHandle l, TransducerHandle d, TransducerHandle r) match-L-R(l,d,r) that says that a left side l (right side r) must always be paired with a right side r (left side l), and separated from that with a center that belongs to d (Yli-Jyrä 2007).    
OvertFstPair gr_construct_bracketing_restriction (TransducerHandle l, TransducerHandle r, TransducerHandle d, TransducerHandle c) bracketing restriction constraint # l __ R # => c according to which (∀ vxy)( v∈l ∧ x∈d ∧ y∈r → x∈d ) (Yli-Jyrä 2003).    

GR-Based Engines for Grammar Systems

The GR Calculus makes is easy to define some complete rule systems with rule application modes and conflict resolutions. The interfaces to the following systems will be added here:

  • Two-Level Grammars
  • Generalized Two-Level Grammars
  • Bracketed Generalized Two-Level Grammars
  • Parallel Replacement and Markup Rules

References

  • C. Elgot: Decision problems of finite automata and related arithmetics. Trans. Amer. Math. Soc. 98: 21-51 (1961)
  • R. Cohen and J. Brzozowski: Dot-depth of star-free events. J. Comput. Syst. Sci. 5:1-16 (1971)
  • W. Thomas: Classifying Regular Events in Symbolic Logic. J. C. S. Sci. 25(3): 360-376 (1982)
  • N. Vaillette: Logical specification of regular relations for NLP. Natural Language Engineering 9(1): 65-85. (2003)
  • A. Yli-Jyrä: Describing syntax with star-free expressions. In EACL 2003. 379-386. (2003)
  • --: Contributions to the Theory of Finite-State Based Grammars. Dissertation. (2005a)
  • --: Linguistic Grammars with Very Low Complexity. In Arppe et al., Inquiries into Words, Constraints and Contexts. CSLI Publications. (2005b)
  • --: Applications of Diamonded Double Negation. Invited talk (original title: Hidden Jewels of Double Negation). FSMNLP 2007 postproceedings. (2007a)
  • --: Transducers from Parallel Replace Rules and Modes with Generalized Lenient Composition. FSMNLP 2007 postproceedings. (2007b)
  • -- and K. Koskenniemi: Compiling contextual restrictions on strings into finite-state automata. FASTAR Days. Eindhoven. (2004)
  • -- and --: Compiling Generalized Two-Level Rules and Grammars. FinTAL 2006: 174-185. (2006)
  • -- and --: A New Method for Compiling Parallel Replacement Rules. CIAA 2007: 320-321 (2007c)
  • M. Hulden. Regular expressions and predicate logic in finite-state language processing. FSMNLP 2008. (2008)


-- AnssiYliJyra - 06 Oct 2008
Edit | Attach | Print version | History: r11 | r9 < r8 < r7 < r6 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r7 - 2008-10-13 - AnssiYliJyra
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback