Difference: HfstAPIGeneralizedRestrictionLayerOLD (7 vs. 8)

Revision 82008-10-14 - AnssiYliJyra

Line: 1 to 1
 
META TOPICPARENT name="HfstHome"

HFST: API for Generalized Restriction (GR) and Predicate Logic in FST Calculus

Line: 17 to 17
 

Basic Notions

Auxiliary Marker Symbols

Changed:
<
<
  1. The basic and common notion in both streams is the notion of an auxiliary symbol that is used to indicate a boundary of a substring. This symbol has two names: a diamond and variable symbol (). More such symbols can be numbered or named according to letters using different notational conventions: 1, ◊2,..., <x>, <y>, ... (x), (y),..., x, y, .... These auxiliary symbols occur in strings of marker-augmented FSTs that we call below over FSTs (interfaced through OvertFstHandle). The data type of an uncompiled generalized restriction is OvertFstPair.
  2. In addition to those e.g. dozen auxiliary symbols that have been employed by an application programmer, we need to prepare for hundreds of such auxiliary symbols that are created by special algorithms that speed up the compilation of the GR operation. These symbols are added in certain paths in order to make FSTs look simpler. Their number can be large since we want to more auxiliary symbols when we have a weighted GR operation in the future. For example, nondeterministic FST can be transformed to a deterministic one by adding such auxiliary symbols. Regardless of the purpose, the auxiliary symbols have the same type and it is interfaced through a VarHandle.
  3. There are many ways to use auxiliary markers. All these should be supported: (i) it occurs only once in every path of over FSTs, (ii) it occurs twice, (iii) it occurs 4 times, (iv) it occurs freely in specified positions. They can be removed one by one or once for all. Accordingly, auxiliary symbols are quite similar to normal symbols (or keys).
  4. Overt FSTs keep track of the known auxiliary markers that vary from FST to FST even if they could share the same key table. Those symbols that an FST does not know need to be added by harmonization when the FST is combined under intersection with other FSTs. There, loops for unknown auxiliaries are simply added.
  5. The complementation of over FST is computed w.r.t. the universal language over all pairs of known and unknown normal symbols and all the identity pairs of known auxiliary symbols.
>
>
Overt FSTs keep track of the set of the known auxiliary markers (or their keys).
  1. The basic and common notion in both streams is the notion of an auxiliary symbol that is used to indicate a boundary of a substring. This symbol has two names: a diamond and variable symbol (). More such symbols can be numbered or named according to letters using different notational conventions: 1, ◊2,..., <x>, <y>, ... (x), (y),..., x, y, .... These auxiliary symbols occur in strings of marker-augmented FSTs that we call overt FSTs (interfaced through OvertFstHandle). The data type of an uncompiled generalized restriction is OvertFstPair.
  2. The size of th auxiliary symbol alphabet may vary dynamically.
    1. Most applications use only one or two kinds of auxiliary symbols, but they may need a few more (a few dozens of them).
    2. In addition to these auxiliary symbols that are employed by an application programmer, we need to prepare for hundreds of such auxiliary symbols that are created by special algorithms that speed up the compilation of the GR operation. These symbols are added in certain paths in order to make FSTs look simpler. It is not yet certain whether these symbols are internalized to an algorithm or whether they need to be stored to the marker alphabet of the overt FSTs.
    3. The number may be even larger in the future when we have a weighted GR operation. Each rule and non-zero weight in the representation of the rules may need a separate auxiliary symbol. For example, a nondeterministic FST can often be transformed to a deterministic one by adding such auxiliary symbols.
  3. Unknown auxiliary symbols have to added by harmonization in some situations:
    1. When an FST is combined under intersection with other FSTs: loops for unknown auxiliaries are simply added to all FSTs that are being combined under intersection.
    2. The complement or relative complement of an overt FST is computed w.r.t. the universal language over all pairs of known and unknown normal symbols and all the identity pairs of known auxiliary symbols.

All auxiliary symbol are neither used to mark things in similar ways and they are not used to mark similar things.

  1. The number of auxiliary symbols in the strings may vary since there are many ways ("usage modes") to use auxiliary markers. All these should be supported:
    1. It occurs in specified positions and is used as a joiner rather than a substring marker.
    2. It occurs a bounded number of times (typically 2) and mark substrings.
    3. It occurs freely in specified positions that may be finite of infinit.
  2. All auxiliary symbols used in the current layer have are interfaced through a VarHandle.
    1. Each VarHandle contains a Key. In transitions, these Keys are used. The auxiliary symbols can be removed from FSTs one by one or once for all. Accordingly, auxiliary symbols are quite similar to keys of normal symbols.
    2. The VarHandle that is stored to the marker alphabet of over FSTs containts additional information on usage mode of the symbol.
  3. The functions in the GR layer may test that the relevant VarHandles have a correct usage mode.
    1. The current layer supports only one kind of auxiliary symbols: those that are used in pairs to mark substrings.
    2. Yli-Jyrä (2007a, 2007b) and Yli-Jyrä and Koskenniemi (2004) already employ auxiliary symbols that do not occur in pairs but as a joiners or freely in specified positions.

The total set of possible pairs have to be defined somehow when a complement of an overt FST is computed.

  1. Access to related =other=-symbol (or key) conventions would make the complementation more compact and efficient. Therefore, the layer supporting unknown keys and pairs would be below the current layer.
  2. We do not need an =other=-symbol that would account for unknown auxiary markers, since unknown auxiliary symbols are treated as epsilons except in intersection and relative complementation.
 

Nondeterministic Letter Transducers

Often letter transducers are kept as deterministic and minimal. However, in the case of overt FSTs, we may want to avoid determinization of intermediate results.
Changed:
<
<

Operations, Functions and Related Layers

  1. VarHandle can contain a KeyHandle rather than a SymbolHandle and it probably more feasible to implement that way.
  2. Excepting the harmonization and carrying the auxiliary symbol alphabet, most of the functionalities of this layer is based just normal FST operations provided by the TransducerLayer. Accordingly, the large number of operations in the current layer does not imply much programming.
  3. The functions in the current layer are all derived and their are mostly of similar complexity. We see all operations that use only overt FSTs as primitive while those build on over FST pairs are derived.
  4. However, the total set of possible pairs have to be defined somehow. Therefore, optional access to related =other=-symbol (or key) conventions would make the complementation more compact and efficient.
>
>

Overview of this Layer

 
Changed:
<
<

Data Types

>
>
Excepting the harmonization and carrying the auxiliary symbol alphabet, most of the functionalities of this layer is based on normal FST operations provided by the TransducerLayer. Accordingly, the large number of operations in the current layer does not imply much programming.
 
Type Comment hfst-lexc hfst-twolc
Line: 42 to 59
 
OvertFstPair a pair of FSTs using variables    
TransducerHandle an HFST transducer without variable symbols    
Changed:
<
<

Function Primitives

Definition of Substring Variables and other Diamond Symbols

>
>
The current layer contains a few sets of functions:
  1. Those that need to assume a particular usage of auxiliary symbols that are given as VarHandle parameters:
    • functions for joiner variables (minimun level of support)
    • functions for substring variables
    • functions for auxiliary symbols that occur freely in specified positions
  2. Those that operate overt FSTs and their pairs without paying attention to the usage modes of the known auxiliary symbols.

Functions Supporting Joiner Variables

Creating New Joiner Variables

 
Type Function Comment hfst-lexc hfst-twolc
Changed:
<
<
VarHandle create_var_type2 () Create a FO variable used for boundaries of a substring    
>
>
VarHandle create_var_type1 () Create a FO variable used for a boundary between two substrings    
 
VarSetHandle create_var_set () Create an empty set of FO variables    
VarSetHandle insert_var (VarSetHandle s, VarHandle x) Insert an FO variable to set s    
Changed:
<
<

Functions That Relate Substring Variables to Variable-Free Conditions

Type Function Comment hfst-lexc hfst-twolc
OvertFstHandle var_is_member (VarHandle x, TransducerHandle t) Asserts that substring x is recognized by t.    
OvertFstHandle var_partition (VarHandle v, VarHandle x, VarHandle y) Asserts that string is partitioned into substrings v, x and y.    
OvertFstHandle var_is_substring_with_context (VarHandle x, TransducerHandle rc, TransducerHandle cntr, TransducerHandle rc, bool padded) Asserts that x is substring of strings in cntr that is surrounded by (padded) contexts lc and rc.    
OvertFstHandle var_has_context (VarHandle x, TransducerHandle lc, TransducerHandle rc, bool padded) Asserts that substring x is surrounded by (padded) contexts lc and rc.    
OvertFstHandle var_precedes (VarHandle x, TransducerHandle rc, bool padded) Asserts that substring x precedes (padded) right context rc.    
OvertFstHandle var_follows (VarHandle x, TransducerHandle lc, bool padded) Asserts that substring x follows (padded) left context lc.    
(this table is subject to minor changes)

Functions That Compare or Rename Substring Variables

>
>

Functions That Relate Joiner Variables to FSTs

 
Type Function Comment hfst-lexc hfst-twolc
Changed:
<
<
OvertFstHandle var_embeds (VarHandle x, VarHandle y) Asserts that substring x contains substring y.    
OvertFstHandle var_equals (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OvertFstHandle var_starts_simultaneously (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OverFstHandle var_ends_simultaneously (VarHandle x, VarHandle y) Asserts that substrings x and y are equal.    
OvertFstHandle var_starts_before (VarHandle x, VarHandle y) Asserts that substring x starts before substring y.    
OvertFstHandle var_ends_before (VarHandle x, VarHandle y) Asserts that substring x ends before substring y.    
OvertFstHandle var_comes_before (VarHandle x, VarHandle y) Asserts that substring x occurs ends before substring y starts.    
OvertFstHandle var_has_overlap (VarHandle x, VarHandle y) Asserts that substring x and substring y are not disjoint (Hulden 2008).    
OvertFstHandle var_is_prefix_of (VarHandle x, VarHandle y) Asserts that substring x is prefix of y.    
OvertFstHandle var_is_suffix_of (VarHandle x, VarHandle y) Asserts that substring x is suffix of y.    
OvertFstHandle rename_var (VarHandle x, VarHandle y, OvertFstHandle ψ) Makes α transformation for overt FST ψ by replacing variable x with variable y.    
>
>
OvertFstHandle var_create_joiner_fst (VarHandle x, TransducerHandle t) Asserts that position x is a boundary between empty prefix and empty suffix.    
 

Functions That Implement Logical Connectors and Quantifiers

Type Function Comment hfst-lexc hfst-twolc
Changed:
<
<
OvertFstHandle connect_with_concatenation (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under contenation.    
OvertFstHandle connect_with_and (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under conjunction.    
OvertFstHandle connect_with_or (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under disjunction.    
OvertFstHandle connect_with_xor (OvertFstHande e, OvertFstHande f) Combine two overt FSTs under exclusive disjunction.    
OvertFstHandle connect_with_not (OvertFstHande e) Negate an overt FST.    
OvertFstHandle connect_with_arrow (OvertFstHande e, OvertFstHande f) Combine two overt FSTs with material implication.    
OvertFstHandle quantify_existentially (VarHandle m, OvertFstHandle φ) Hides diamond x in φ,
or computes (∃ x)φ(x).
   
OvertFstHandle quantify_universally (VarHandle m, OvertFstHandle φ) Restrict x to all substrings in φ by evaluating
    Σ* <x> Σ* <x> Σ*  =x>  φ
or computes (∀ x)φ(x).
   
>
>
OvertFstHandle convert_to_overt_fst (TransducerHande e) Rises the type of a transducer to an overt FST.    
OvertFstHandle connect_with_concatenation (OvertFstHande e, OvertFstHande f) Combines two overt FSTs under concatenation.    
OvertFstHandle connect_with_and (OvertFstHande e, OvertFstHande f) Combines two overt FSTs under conjunction.    
OvertFstHandle connect_with_relative_complement (OvertFstHande e, OvertFstHande f) Combine two overt FSTs as e ∪ ¬f.    
OvertFstHandle quantify_existentially (VarHandle m, OvertFstHandle φ) Restricts φ to those strings that contain an m and hides then m in φ,
or computes (∃ x)φ(x).
   
OvertFstHandle quantify_universally (VarHandle m, OvertFstHandle φ) Restricts x to all substrings in φ by evaluating
    Σ* <x> Σ* <x> Σ*  =x>  φ
i.e. computes ¬ ((∃ x) ¬φ(x)) that is the same as (∀ x)φ(m).
   
OvertFstHandle quantify_universally_all (OvertFstHandle φ) The essential part of the ordinary GR operation that removes all variables at once like repeatly applying quantify_universally for all of them.    
 
Changed:
<
<

Functions That Construct, Combine and Compile GR Rules

>
>

Function Supporting Substring Variables

 
Changed:
<
<
Type Function Comment hfst-lexc hfst-twolc
OvertFstPair gr_construct (OvertFstHande e, OvertFstHande f) Combine two overt FSTs for generalized restriction operator.    
OvertFstPair gr_intersect (OvertFstPair gr1, OvertFstPair gr2) Combine two generalized restrictions under intersection.    
OvertFstPair gr_intersect_coherently (OvertFstPair gr1, OvertFstPair gr2) Combine two generalized restrictions under coherent intersection.    
OvertFstHandle gr_compile (OvertFstPair g, VarHandle x) Compiles
    generalized restriction φ =x> ψ or
    logical formula (∀ x)( φ(x) → ψ(x) )
where (φ,ψ)=g.
   
OvertFstHandle gr_compile_parallel (OvertFstPair g, VarSetHandle X) Compile
    generalized restriction φ =x1,...,xn> ψ or
    logical formula (∀ x1,...,xn)( φ(x1,...,xn) → ψ(x1,...,xn))
where (φ,ψ)=g and x1,...,xn ∈ X.
   
TransducerHandle gr_compile_final (OvertFstPair g) Compile
    generalized restriction φ =x1,...,xn> ψ or
    logical formula (∀ x1,...,xn)( φ(x1,...,xn) → ψ(x1,...,xn))
where (φ,ψ)=g.
   

Derived Functions

Functions That Compile Constraints using GR Rules

In the following, c is an overt FST for contexts.

Type Function Comment hfst-lexc hfst-twolc
OvertFstPair gr_construct_not (TransducerHandle t) not(t).    
OvertFstPair gr_construct_nowhere (TransducerHandle t) nowhere(t).    
OvertFstPair gr_construct_if_P_then_S (TransducerHandle p, TransducerHandle s) if-P-then-S(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_if_S_then_P (TransducerHandle p, TransducerHandle s) if-S-then-P(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_iff_P_S (TransducerHandle p, TransducerHandle s) iff-P-S(p,s) (Kaplan & Kay 1994).    
OvertFstPair gr_construct_center_prohibition (TransducerHandle x, OvertFstHandle c) center prohibition x /<= c (in TWOLC).    
OvertFstPair gr_construct_context_restriction (TransducerHandle x, OvertFstHandle c) context restriction x => c (Koskenniemi 1983).    
OvertFstPair gr_construct_coercion (TransducerHandle x, OvertFstHandle c) coercion x <<= c (Yli-Jyrä & Koskenniemi 2004).    
OvertFstPair gr_construct_output_coercion (TransducerHandle x, OvertFstHandle c) surface coercion x <= c (Koskenniemi 1983).    
OvertFstPair gr_construct_input_coercion (TransducerHandle x, OvertFstHandle c) lexical coercion x <- c (less used).    
OvertFstPair gr_construct_center_presence_requirement (TransducerHandle x, OvertFstHandle c) center presence requirement x <== c (Yli-Jyrä & Koskenniemi 2006).    
OvertFstPair gr_construct_presence_requirement (OvertFstHandle c1, OvertFstHandle c2) presence requirement c1 ==> c2 (Yli-Jyrä & Koskenniemi 2006).    
OvertFstPair gr_construct_output_double_arrow (TransducerHandle x, OvertFstHandle c) double arrow x <=> c (Koskenniemi 1983).    
OvertFstPair gr_construct_input_double_arrow (TransducerHandle x, OvertFstHandle c) double arrow x <-> c that combines a lexical coercion and a context restriction (less used).    
OvertFstPair gr_construct_match_L_R (TransducerHandle l, TransducerHandle d, TransducerHandle r) match-L-R(l,d,r) that says that a left side l (right side r) must always be paired with a right side r (left side l), and separated from that with a center that belongs to d (Yli-Jyrä 2007).    
OvertFstPair gr_construct_bracketing_restriction (TransducerHandle l, TransducerHandle r, TransducerHandle d, TransducerHandle c) bracketing restriction constraint # l __ R # => c according to which (∀ vxy)( v∈l ∧ x∈d ∧ y∈r → x∈d ) (Yli-Jyrä 2003).    

GR-Based Engines for Grammar Systems

The GR Calculus makes is easy to define some complete rule systems with rule application modes and conflict resolutions. The interfaces to the following systems will be added here:

  • Two-Level Grammars
  • Generalized Two-Level Grammars
  • Bracketed Generalized Two-Level Grammars
  • Parallel Replacement and Markup Rules
>
>
These functions are implemented in a latter stage. These are currently moved to HfstAPIGeneralizedRestrictionLayer2 in order to prioritize the work.
 

References

 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback