(manual generated with FSA Utilities version 6.159)
FSA6 is a collection of utilities to
construct finite automata (from regular expressions)
manipulate finite automata
visualise finite automata
apply finite automata
Many basic regular expression operators are provided, both for acceptors and transducers. Moreover, it is easy to define new regular expression operators. The built-in regular expression operators include:
Concatenation; Kleene star; Kleene plus; Option; Union
Complement; Difference; Intersection
Reversal; Containment; Ignore
Composition; Cross-product; Domain; Range; Identity; Inversion;
Interval
`Any' meta-symbol.
Arbitrary predicates instead of symbols
Tools are provided to manipulate finite automata. Such manipulations include determinization and minimization (both the classical algorithms for recognizers and the recent algorithms for transducers are provided).
Determinization. Currently there are three different implementations of this algorithm, depending on how epsilon transitions (jumps) are treated. There is also an implementation of Mohri's determinization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers. The implementation is described in a paper in Computational Linguistics, available from http://www.let.rug.nl/~vannoord/papers/
Minimization. Three different minimization algorithms are supported. There is also an implementation of Mohri's minimization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers.
Random generation of finite automata, based on the algorithm in Leslie (1995).
Epsilon-removal.
Completion and Incompletion: extending a given automaton in order to make the transition table total (typically by adding a sink state and adding transitions to this sink state); and removing transitions leading to sink states.
Regular manipulations. The various regular expression operators can be applied to automata directly as well.
Acceptance. Tools to check a given string for acceptance by a recognizer.
Transduction. Tools to apply a transducer to a given input string.
Production. Tools to produce strings of a given recognizer, and pairs of strings for a given transducer.
Code Generation. Tools to compile finite automata into efficient Prolog or C programs which can be used to check whether a given string is in the language defined by the automaton, or to generate the transduction of a given string w.r.t. a given transducer.
Much attention has been paid to be able to visualize finite state recognizers and finite state transducers. Support includes built- in visualization and interfaces to third party software:
DOT. The program is able to produce a representation of a finite automaton compatible with the DOT graph visualisation program. DOT (part of AT&T's graphviz) is a tool that automatically figures out how a graph is best displayed (crossing-edges reduction, etc). It can produce e.g. Postscript output. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/dot.png.
VCG. The program is able to produce a representation of a finite state automaton compatible with the VCG graph visualisation program. VCG is a tool that automatically figures outhow a graph is best displayed (crossing-edges reduction, etc). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/vcg.png.
daVinci. The program is able to produce a representation of a finite state automaton compatible with the daVinci 1.4 graph visualisation program. This program automatically computes the most optimal way to view the finite-state automaton by minimizing the number of crossing edges. Postscript output can easily be generated from the result. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/daVinci.png.
TK Widget. The package contains an interface to a TK Widget to browse finite state automata, providing a graphical user-interface for the toolbox. The TK Widget is explained in much more detail below. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/dump.png. Note that the GUI is not an integral part of the toolbox; it makes perfect sense to use the program in batch mode. The graphical user interface is only available under SICStus.
LaTeX (if you want to be able to use the result you need the pstricks package). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/pstricks.png.
Postscript (thanks to Peter Kleiweg). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/pk.png.
There are a number of ways that the toolbox is meant to be used:
interactively using a command interpreter and/or a graphical user interface. For example, in order to use fsa interactively with the graphical user interface, use the command:
% fsa -tk
as a UNIX-like filter. In such cases you use the fsa command with a number of options. For instance:
% fsa write=postscript -r '[a,b+,c*,d]' | ghostview -
as a library in your own Prolog program. You can incorporate the FSA program in your own program, just as you can use other Prolog libraries. In order for this to work, you simply need to load the file fsa_library.pl in the installation directory. For example:
% sicstus
SICStus ...
Licensed to ...
| ?- use_module(fsa_library).
...
...
...
yes
| ?- fsa_regex_atom_compile('[a*,b^,{d,e}]',L).
L = fa(r(fsa_preds),3,[0],[1],[trans(0,a,0),trans(0,b,2),
trans(0,d,1),trans(0,e,1),trans(2,d,1),trans(2,e,1)],[]) ?
yes
| ?- fsa_regex_transduces('{a:b,? -a}*',"ababac",L), atom_codes(Atom,L).
L = [98,98,98,98,98,99],
Atom = bbbbbc ?
yes
| ?-
All predicates that are imported have names starting with *fsa*. All module names start with fsa as well.
The package comes with a number of larger examples These examples include both automata and extended regular expression definitions.
Examples/Automata Dale Gerdemann provided regular expression operators which allows to input an automaton directly.
Examples/Booleans Dale Gerdemann provided collection of regular expression operators including boolean operators and various predicates of automata.
Examples/Bouma Gosse Bouma's finite-state automaton of all possible Dutch monosyllabic words.
Examples/DutchWords Dutch words, taken somewhere from ftp site (see ftp_info.txt). This list of words can then be used to experiment with the option to create perfect hashes (-dict2ph).
Examples/GerdemannVannoord99 The replace operator as defined in our EACL 99 paper. Also some further examples with longest match and finite-state parsing.
Examples/Graph2Phon Grapheme to Phoneme conversion for Dutch. Uses the Celex format for phonemes. By Gosse Bouma.
Examples/Grimley-Evans Implementation of the Hopcroft minimization algorithm by Edmund Grimley-Evans (does not work under YAP).
Examples/HMM HMM's can be seen as a special type of weighted finite automata. This example implements the Baum-Welch training algorithm. Fairly simple-minded implementation.
Examples/KaplanKay94 These examples are taken from Kaplan and Kay, Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3), 1994. Simple examples of transducers, and composition.
Examples/Karttunen91 These examples are taken from Karttunen, Finite-state Constraints, Proceedings International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang, 1991. Simple examples of transducers, and composition.
Examples/Karttunen95 Lauri Karttunen, The Replace Operator, ACL 1995, MIT Boston. Fairly complex examples of regular expression operator definitions. Buggy?.
Examples/Karttunen96 Lauri Karttunen, Directed Replacement, ACL 1996. Includes soundex example from MLTT home page.
Examples/Karttunen97 Lauri Karttunen, The Replace Operator, 1997. In volume edited by Roche and Schabes. Buggy?.
Examples/Mohri97 Simple examples of weighted automata.
Examples/MohriSproat96 Mehryar Mohri and Richard Sproat, An Efficient Compiler for Weighted Rewrite Rules. 34th Annual Meeting of the ACL, Santa Cruz 1996, pages 230-238. This only treats the non-weighted case. Nice example of the power of the regular expression language: their algorithm only takes a few paragraphs in FSA6.
Examples/Nederhof These are examples used by Mark-Jan Nederhof while investigating finite-state approximations of context-free grammars. The larger examples were used in my Computational Linguistics paper, The Treatment of Epsilon-moves in Subset Construction, available from http://www.let.rug.nl/~vannoord/papers/
Examples/Nerbonne examples from http://www.let.rug.nl/~nerbonne/teach.html material for a course on computational morphology. Simple examples of transducers.
Examples/OptimalityTheory Implementation of Lauri Karttunen, The Proper Treatment of Optimality in Computational Phonology. FSMNLP 1998, Ankara. Includes definition of lenient composition operator and syllibification algorithm. Also includes Gerdemann/van Noord (even more proper?) alternative implementation.
Examples/PredModules examples of predicate modules; for example using bitvectors to represent sets of symbols, or using types. The bitvector stuff is only available under SICStus.
Examples/PereiraRiley96 Fernando C. N. Pereira and Michael D. Riley, Speech Recognition by Composition of Weighted Finite Automata, 1996 (on cmp-lg). Also appears as chapter 15 of the volume edited by Roche and Schabes. Simple examples of weighted composition. Definition of their version of the composition operator (filtering our spurious paths).
Examples/Queens Solving the N-queens problem with regular expressions, by Dale Gerdemann. Another solution by G. van Noord. Interesting examples of definitions of regular expression operators.
Examples/Random Random automata. Used for the experiments documented in my FSMNLP98 paper, on the treatment of epsilon-moves in subset construction.
Examples/Recognizers Small examples.
Examples/RocheSchabes95 Emmanuel Roche and Yves Schabes, Deterministic Part-of-speech Tagging with Finite-state Transducers, Computational Linguistics, 21(2), 1995. Small examples of transducers. Also includes a definition of the local extension operator.
Examples/RocheSchabes97 Roche and Schabes, Introduction. In: Roche and Schabes (eds), Finite State Language Processing. MIT Press 1997. Includes implementations of is_functional, unambiguous, is_subsequential, build_bimachine, bitransform. Also has simple utils to apply and visualize bimachines.
Examples/SemiringModules contains examples of others types of transducers (semi-rings), apart from the two built-in transducer types `fsa_strings' and `fsa_weights'.
Examples/Spell implements a simple spell-checker as the combination of a dictionary and strings within Levenshtein distance d of the words in the dictionary (for some fixed d). Interesting application of the priority union operator of Karttunen (1998).
Examples/Transducers small stuff, including my attempt to translate Dutch temporal expression into a numerical format (that one is quite large, in fact).
Examples/twolevel Definitions to implement twolevel rules in the style of Kimmo. Mostly by Rob Malouf.
Examples/Weights small stuff, weighted automata.
Examples/Wordgraphs some small acyclic weighted automata.
Alfred V. Aho and John E. Hopcroft and Jeffrey D. Ullman, The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.
Meera Blattner and Tom Head, Single-Valued a-Transducers. In: Journal of Computer and System Sciences, 15. Pp. 310--327. 1977.
J.A. Brzozowski. Canonical Regular Expressions and Minimal State Graphs for Definite Events. In: Mathematical Theory of Automata, Polytechnic Press Brooklyn, 1962.
Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.
Jan Daciuk, Incremental Construction of Finite-State Automata and Transducers and their use in the Natural Language Processing. Thesis, Politechnika Gdanska, 1998.
Jeffrey Friedl. Mastering Regular Expressions. O'Reilly 1997.
Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.
John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In: Z. Kohavi (editor), The Theory of Machines and Computations. Academic Press 1971.
John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison Wesley 1979.
C. Douglas Johnson. Formal Aspects of Phonological Descriptions. Mouton The Hague, 1972.
Ronald M. Kaplan and Martin Kay. Regular Models of Phonological Rule Systems. Computational Linguistics 20 (3) 1994.
Lauri Karttunen. Finite-state Constraints. In: Proceedings International Conference on Current Issues in Computational Linguistics. Unviersiti Sains Malaysia, Penang. 1991.
Lauri Karttunen. The Replace Operator. 33rd ACL, Boston, 1995.
Lauri Karttunen and Jean-Pierre Chanod and Gregory Grefenstette and Anne Schiller, Regular Expressions for Language Engineering. Natural Language Engineering 2 (4) 1996.
Lauri Karttunen, Directed Replacement. ACL 1996 Santa Cruz.
Lauri Karttunen, The Proper Treatment of Optimality Theory in Computational Phonology. In: FSMNLP 1998, Ankara.
George Anton Kiraz and Edmund Grimley-Evans, Multi-Tape Automata for Speech and Language Systems: A Prolog Implementation. In D. Wood and S. Yu (eds.), Automata Implementation, Lecture Notes in Computer Science 1436, Springer, 1998.
Ted Leslie, Efficient Approaches to Subset Construction. University of Waterloo 1995.
Mehryar Mohri, Compact Representations by Finite-state Transducers. ACL New Mexico 1994.
Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics 23 (2) 1997.
Mehryar Mohri and Fernando C.N. Pereira and Michael Riley. A rational design for a weighted finite-state transducer library. In: Automata Implementation. WIA '97. Lecture Notes in Computer Science 1436. Spring Verlag 1998.
Mehryar Mohri and Richard Sproat. An Efficient Compiler for Weighted Rewrite Rules. ACL 1996 Santa Cruz.
Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.
Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.
Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.
Emmanuel Roche and Yves Schabes. Deterministic Part-of-speech Tagging with Finite-state Transducers. Computational Linguistics 21 (2) 1995.
Emmanuel Roche and Yves Schabes. Finite-State Language Processing. MIT Press 1997.
Bruce Watson, Taxonomies and Toolkits of Regular Language Algorithms. Ph.D. thesis TU Eindhoven. 1996.
Bruce Watson, Implementing and Using Finite Automata Toolkits. In: Proceedings of the ECAI'96 Workshop Extended Finite State Models of Language. 1996.
Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.
Gosse Bouma, A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 303-310 Seattle 2000.
Gosse Bouma, A modern Computational Linguistic Course using Dutch. In: Frank van Eynde and Ineke Schuurman, editors, CLIN 1998, Papers from the ninth CLIN Meeting, Amsterdam, 1999. Rodopi Press.
Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.
George Anton Kiraz, Multi-Tiered Nonlinear Morphology Using Multi-Tape Finite Automata: A case study on Syriac and Arabic. In: Computational Linguistics, 26 (1) 2000.
Edwin Kuipers, Ill-formed Input and Finite-state Devices. MA Thesis, Rijksuniversiteit Groningen, 1997.
Gertjan van Noord, FSA Utilities: Manipulation of Finite-state Automata implemented in Prolog, in: Proceedings of the First International Workshop on Implementing Automata. London Ontario Canada, 1996.
Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.
Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.
Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.
Markus Walther, Finite-State Reduplication in One-Level Prosodic Morphology. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 296-302 Seattle 2000.
Markus Walther, One-Level Prosodic Morphology. Marburger Arbeiten zur Linguistic 1, University of Marburg. 64 pp.
Up-to-date information on the program can be obtained via http://www.let.rug.nl/~vannoord/Fsa/. The latest version of the program should be available there too.
For information on the daVinci program, we refer to its homepage http://www.informatik.uni-bremen.de/~inform/forschung/daVinci/.
For information on dot/GraphViz, we refer to http://www.research.att.com:80/sw/tools/graphviz/.
For information on the VCG program: http://www.cs.uni-sb.de/RW/users/sander/html/gsvcg1.html
There is lots of interesting material at MLTT Xerox Grenoble: http://www.rxrc.xerox.com/research/mltt/fst/. Be sure to read the documentation, including a number of nice examples.
AT&T's FSM toolset for weighted finite-state automata is available from http://www.research.att.com/sw/tools/fsm.
For a web interface to FSA, refer to http://www.let.rug.nl/~vannoord/fsademo/.
For a web interface to FSA3, c.f.: http://i12www.ira.uka.de/Visualisierung.endlicher.Automaten/.
A tutorial for FSA by Gosse Bouma, in Dutch: http://www.let.rug.nl/~gosse/tt/fsa.html
An even simpler tutorial for FSA (in Dutch as well) used for highschool kids (!) is available as http://www.let.rug.nl/~vannoord/fsademo/fsademo/klas.html
Electronic versions of some of the papers mentioned above are available through the cmp-lg archive at http://xxx.lanl.gov:80/cmp-lg/.
A list of related projects at University of Western Ontario by Darrell Raymond is http://www.csd.uwo.ca/staff/drraymon/.grail/links.html. You can also obtain a copy of Ted Leslie's thesis from that site, which includes the algorithm to generate random automata, and which discusses density of automata related to determinization.
Finite state Utilities by Jan Daciuk at http://www.pg.gda.pl/~jandac/fsa.html. Useful tools for dictionary construction and spell checking. Also read his dissertation.
More finite-state software at Ribbit Software, http://www.RibbitSoft.com/ist/variants.html, mostly by Bruce Watson.
SICStus Prolog home page: http://www.sics.se/isl/sicstus.html.
Collection of links on Prolog and Regular Expressions: http://www.let.rug.nl/~vannoord/prolog-rx/PrologAndRegex.html
FILFLA Malta: Relic: Regular Languages Interactive Classroom http://www.cs.um.edu.mt/~gpace/relic_www/
Wiese's Little Automata Builder at http://www-ti.informatik.uni-tuebingen.de/~wiese/Automaton/
Another Java applet for finite automata at http://www.cs.duke.edu/~rodger/tools/jflap/index.html
Interesting papers on Gene Myers' Home Page http://www.cs.arizona.edu/people/gene
Copyright c 1995 - 1999 by Gertjan van Noord. This program is distributed under Gnu General Public License (cf. the file COPYING in distribution).
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Helpful suggestions, feedback, etc., from a variety of people, too many to list here. Gosse Bouma and Dale Gerdemann were exceptionally influential.
Gertjan van Noord, mailto:vannoord@let.rug.nl