PC-KIMMO REFERENCE MANUAL
                      version 1.0, May 1990
                         Evan L. Antworth
          Copyright 1990 Summer Institute of Linguistics

1 Introduction and technical specifications
2 Installing PC-KIMMO
3 Starting PC-KIMMO
4 Entering commands and getting on-line help
5 Command reference by function
  5.1  Get help
  5.2  Load rules and lexicon
  5.3  Select new language
  5.4  Take commands from a file
  5.5  List rule names, feasible pairs, or sublexicon names
  5.6  Set system parameters
  5.7  Turn logging on or off
  5.8  Show system status
  5.9  Show rule or sublexicon
  5.10 Generate surface forms from a lexical form
  5.11 Recognize lexical forms from a surface form
  5.12 Compare data from a file
  5.13 Generate forms from a file
  5.14 Recognize forms from a file
  5.15 Execute an operating system command
  5.16 Edit a file
  5.17 Halt the program
6 Alphabetic list of commands
7 File formats
  7.1 Rules file
  7.2 Lexicon file
  7.3 Generation comparison file
  7.4 Recognition comparison file
  7.5 Pairs comparison file
  7.6 Generation file
  7.7 Recognition file
  7.8 Summary of default file names and extensions
8 Trace formats
  8.1 Generator trace
  8.2 Recognizer trace
9 Algorithms
  9.1 Generating surface forms
  9.2 Recognizing lexical forms
10 Error messages
  10.1 Errors related to reading and parsing commands
  10.2 Errors related to reading the rules file
  10.3 Errors related to reading the lexicon file
  10.4 Errors related to recognizing or generating a form
  10.5 Errors that abort program execution
References
Errata


1 Introduction and technical specifications

PC-KIMMO is a new implementation for microcomputers of a program
dubbed KIMMO after its inventor Kimmo Koskenniemi. Koskenniemi's
two-level model was designed to generate and recognize words (see
Koskenniemi 1983). Work on PC-KIMMO was begun in 1985, following
the specifications of the LISP implementation of Koskenniemi's
model described in Karttunen 1983. The aim was to develop a
version of the two-level processor that would run on an IBM PC
compatible computer and that would include an environment for
testing and debugging a linguistic description. The PC-KIMMO
program is actually a shell program that serves as an interactive
user interface to the primitive PC-KIMMO functions. These
functions are available as a source code library that can be
included in a program written by the user.

The coding has been done in Microsoft C by David Smith and
Stephen McConnel under the direction of Gary Simons and under the
auspices of the Summer Institute of Linguistics. Every effort has
been made to maintain portability. Both the PC-KIMMO shell and
the program modules will run on any hardware using MS-DOS or
PC-DOS version 2.0 or higher. It can be run with as little as
256KB of memory, but will use up to 640KB. PC-KIMMO has also been
compiled and tested for UNIX System V (SCO UNIX V/386 and A/UX)
and for 4.2 BSD UNIX.

We have also ported PC-KIMMO to the Macintosh, though it retains
its command-line interface rather than using the graphical user
interface one expects from Macintosh programs. Also, a few
commands are not available in the Macintosh version; see the
README file on the Macintosh version of the PC-KIMMO release
diskette for detailed information.

There are two versions of the PC-KIMMO release diskette, one for
IBM PC compatibles and one for the Macintosh. Each contains the
executable PC-KIMMO program, examples of language descriptions,
and the source code library for the primitive PC-KIMMO functions.
The PC-KIMMO executable program and the source code library are
copyrighted but are made freely available to the general public
under the condition that they not be resold or used for
commercial purposes.

For those who wish to compile PC-KIMMO for their UNIX system, the
complete source code for both the user shell and the primitive
functions is available for the cost of the media and shipping
from Academic Computing Department, Summer Institute of
Linguistics, 7500 W. Camp Wisdom Road, Dallas, TX 75236.

The English description referred to in this chapter is based on
Karttunen and Wittenburg 1983 as modified by Steve Echerd and
Evan Antworth; see appendix A for a detailed exposition of the
English description. The English files are found in the ENGLISH
subdirectory on the PC-KIMMO release diskette.


2 Installing PC-KIMMO

The following instructions apply to installing the IBM PC version
of PC-KIMMO. Most of the information also applies to the UNIX
version. For information on installing and running the
Macintosh version, see the README file on the Macintosh version
of the PC-KIMMO release diskette.

If your computer has floppy disks only, make a working copy of
the PC-KIMMO release diskette that came with this book. Store the
original in a safe place. Insert your working copy of the
PC-KIMMO diskette in drive A of your machine.


If your computer has a hard disk, use the INSTALL.BAT procedure
on the PC-KIMMO diskette to install the system on your hard disk.
To do this, insert the PC-KIMMO diskette in one of your disk
drives. Type A: (or the name of whichever drive you used) to make
that drive the current drive. Now type INSTALL followed by the
name of the hard disk on which you want to install PC-KIMMO (for
instance, INSTALL C:). This will create a subdirectory called
PCKIMMO on your hard disk and copy the contents of the release
diskette (with all its subdirectories) into it.

Whether you are using a floppy or hard disk system, the operating
system's PATH variable must be set to include the directory where
the PC-KIMMO program is found. The AUTOEXEC.BAT file on your boot
disk should contain a path statement that specifies all the disks
and directories that contain programs. On a floppy disk system,
the path statement should include as a minimum the root directory
of drive A, for instance, PATH=A:\. On a hard disk system, add
;C:\PCKIMMO to the end of the path statement. For the path
statement to become effective, you must reboot the computer. (If
you want to change the path variable without changing the
AUTOEXEC.BAT file and rebooting, enter a path command directly at
the operating system prompt.)

In order to use PC-KIMMO's EDIT command, you must set the
operating system environment variable EDITOR to the name of your
text editing program. This is done by including in the
AUTOEXEC.BAT file a line of this form:

        SET EDITOR=<filespec>

where <filespec> specifies the path and full file name of your
editing program. For example, if your editor's file name is
EMACS.EXE and is found in the UTIL subdirectory directly under
the root directory, include this line:

        SET EDITOR=\UTIL\EMACS.EXE


3 Starting PC-KIMMO

Be sure that DOS is logged onto the drive where PC-KIMMO is
located. To change to the subdirectory that contains the English
example, enter CD \ENGLISH on a floppy disk system, or CD
\PCKIMMO\ENGLISH on a hard disk system. Now type PCKIMMO (if your
PATH variable is not correctly set to include the PC-KIMMO
subdirectory, type ..\PCKIMMO). When PC-KIMMO has successfully
started up, you will see a version message and the PC-KIMMO
command line prompt.

PC-KIMMO can also be started with optional command line
arguments. The format of the command line is:

  pckimmo [-c <char>] [-r <rulefile>] [-l <lexfile>] [-t <cmdfile>]

The options are used as follows:

    o The -c option changes the character used to delimit
comments in files used by PC-KIMMO. The argument <char> is a
single character. If this option is not specified, the semicolon
(;) will be used as the comment delimiter. This option is
equivalent to issuing the SET COMMENT command from the program
prompt.

    o The -r option specifies a rules file to be loaded. It is
equivalent to issuing the LOAD RULES command from the program
prompt.

    o The -l option specifies a lexicon file to be loaded. It is
equivalent to issuing the LOAD LEXICON command from the program
prompt. It must be used with the -r option.

    o The -t option specifies a command file from which PC-KIMMO
reads and executes commands. It is equivalent to issuing the TAKE
command from the program prompt.
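
The option handling described above can be sketched in Python with
the standard argparse module. This is only an illustration and is
not PC-KIMMO's own argument parser (which is written in C); the
function name parse_pckimmo_args and the attribute names comment,
rulefile, lexfile, and cmdfile are assumptions made for the example.

```python
import argparse


def parse_pckimmo_args(argv):
    """Parse the documented pckimmo options (-c, -r, -l, -t)."""
    parser = argparse.ArgumentParser(prog="pckimmo")
    # The semicolon is the documented default comment delimiter.
    parser.add_argument("-c", dest="comment", default=";", metavar="<char>")
    parser.add_argument("-r", dest="rulefile", metavar="<rulefile>")
    parser.add_argument("-l", dest="lexfile", metavar="<lexfile>")
    parser.add_argument("-t", dest="cmdfile", metavar="<cmdfile>")
    args = parser.parse_args(argv)
    if len(args.comment) != 1:
        parser.error("-c takes a single character")
    # The manual states that -l must be used together with -r.
    if args.lexfile and not args.rulefile:
        parser.error("-l must be used with the -r option")
    return args
```

Calling parse_pckimmo_args(["-r", "english", "-l", "english"])
returns an object whose rulefile and lexfile attributes are both
"english"; giving -l without -r ends with a usage error.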


4 Entering commands and getting on-line help

The user interacts with PC-KIMMO by entering commands at the
command line prompt, in much the same way that one enters
commands at the operating system prompt. Case is ignored for all
command keywords. Keywords can be shortened to any unambiguous
form. For instance, LOAD RULES, LOAD RUL, LOAD R, and LOA R are
all acceptable. Typing just L is ambiguous for the commands LOAD,
LOG, and LIST. However, because LOAD is such a frequently used
command, it takes special precedence over the other commands
beginning with L, which means that typing just L will execute
only the LOAD command.
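
The abbreviation rule can be sketched in Python as follows. The
keyword list is taken from the commands documented in this manual,
and the function name resolve_keyword is hypothetical. The sketch
assumes that the special precedence of LOAD applies only to the
single letter L, so that LO is still treated as ambiguous between
LOAD and LOG; the manual does not say how PC-KIMMO handles that
case.

```python
COMMANDS = ["CLOSE", "COMPARE", "EDIT", "EXIT", "FILE", "GENERATE", "HELP",
            "LIST", "LOAD", "LOG", "NEW", "QUIT", "RECOGNIZE", "SET",
            "SHOW", "STATUS", "SYSTEM", "TAKE"]


def resolve_keyword(typed, keywords=COMMANDS):
    """Return the keyword that `typed` abbreviates, or raise ValueError."""
    word = typed.upper()                      # case is ignored
    if word == "L" and "LOAD" in keywords:    # documented precedence of LOAD
        return "LOAD"
    matches = [k for k in keywords if k.startswith(word)]
    if word in matches:                       # a full keyword always wins
        return word
    if len(matches) == 1:                     # unambiguous abbreviation
        return matches[0]
    if not matches:
        raise ValueError("unknown keyword: " + typed)
    raise ValueError("ambiguous keyword: " + typed)
```

The same function can resolve the second keyword of a command by
passing a different list, for example
resolve_keyword("rul", ["RULES", "LEXICON"]).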

PC-KIMMO can be used with a TSR (Terminate and Stay Resident)
command line editor such as CED or NDOSEDIT. This allows the user
to recall and edit several previous command lines. The list of
previous PC-KIMMO command lines is kept separate from the list of
previous operating system command lines. If you exit PC-KIMMO and
then run it again, the set of command lines from your previous
PC-KIMMO session is still available. Neither of the command line
editors remembers a command shorter than three characters. It
should be noted that CED uses the ^ character as a kind of
"virtual carriage return." This means that forms containing ^ as
an alphabetic character cannot be entered from the keyboard with
the GENERATE and RECOGNIZE commands, though of course such words
can be read from a file.

Screen scrolling can be halted by pressing Ctrl-S (that is, hold
down the Ctrl (Control) key and press S); any key will resume
scrolling.

Processing can be interrupted by pressing Ctrl-C. Note that this
action does not abort PC-KIMMO, but returns it to the program
prompt. It is useful for stopping a long screen display (such as
a trace) or a file processing command.

Pressing Ctrl-P causes screen output to be echoed to the printer.
Pressing Ctrl-P again stops printer echoing.

There are several ways to get on-line help:

    o To get a list of the available commands, type ?.

    o To get information on what these commands do, type HELP.

    o To get the specific syntax and use for a command, type HELP
plus a specific command name.

    o To get a list of the keywords that can go with a particular
command, type the command name followed by ?. Note however that
if the command does not take a keyword it will be executed; for
instance typing NEW ? will execute the NEW command.


5 Command reference by function

The following subsections document each command of the PC-KIMMO
system, arranged by function. Square brackets in the command line
summaries indicate optional elements. The notation {x | y} means
either x or y (but not both). Command keywords and arguments
shown in uppercase are typed literally; for instance, the command
summary SET TRACING {ON | OFF} means to type either SET TRACING
ON or SET TRACING OFF. Command arguments in angle brackets are
replaced by elements of the specified type; for instance, the
command summary SET COMMENT <char> means to replace <char> with a
single character, such as set comment ;.


5.1 Get help

?

Displays a list of command names.


HELP [<command-name>]

Issuing the HELP command with no argument displays a list of
commands with a brief description of their function. Issuing the
HELP command with the name of a specific command displays a usage
summary for the command.


<command-name> ?

Typing a command name followed by ?, instead of a keyword,
displays a message listing the keywords expected for that
command.


5.2 Load rules and lexicon

The LOAD command is used to load either rules or a lexicon from a
file.

LOAD RULES [<filespec>]

The LOAD RULES command loads a set of rules from the file
specified on the command line. The <filespec> can contain a path,
for example, B:\ENGLISH\ENGLISH.RUL. The default file name
extension is .RUL; thus, the command LOAD RULES ENGLISH will load
the file ENGLISH.RUL. If no file name is given, the default file
name RULES.RUL is used. The rules file must be in the format
described later in this chapter (see section 7.1).


An error in the format of the rules file will cause the program
to stop loading the file, erase the rules already loaded, and
report an error message with the line number where the error was
encountered. Refer to section 10 on error messages for more
details.

The rules file must be loaded before the lexicon and before
performing any generation or recognition operations.

The LOAD RULES command can also be invoked by using the -r
command line option when starting up PC-KIMMO (see section 3).


LOAD LEXICON [<filespec>]

The LOAD LEXICON command loads a lexicon from the file specified
on the command line. The <filespec> can contain a path, for
example, B:\ENGLISH\ENGLISH.LEX. The default file name extension
is .LEX; thus, the command LOAD LEXICON ENGLISH will load the
file ENGLISH.LEX. If no file name is given, the default file name
LEXICON.LEX is used. The lexicon file must be in the format
described later in this chapter (see section 7.2).
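
The default file name and extension rules for LOAD RULES and LOAD
LEXICON can be sketched in Python as follows. The function name
resolve_filespec and the DEFAULTS table are hypothetical, and the
sketch assumes that a <filespec> that already has an extension is
used unchanged. The standard ntpath module is used so that DOS-style
paths are handled the same way on any platform.

```python
import ntpath  # DOS-style path handling that works on any platform

# Default file names and extensions stated in this section.
DEFAULTS = {
    "LOAD RULES": ("RULES", ".RUL"),
    "LOAD LEXICON": ("LEXICON", ".LEX"),
}


def resolve_filespec(command, filespec=None):
    """Apply the default file name or extension for `command`."""
    name, ext = DEFAULTS[command]
    if not filespec:
        return name + ext                 # no argument: use the default name
    _, given_ext = ntpath.splitext(filespec)
    if given_ext:
        return filespec                   # assumption: keep explicit extension
    return filespec + ext                 # add the default extension
```

For example, resolve_filespec("LOAD RULES", "ENGLISH") gives
ENGLISH.RUL, and resolve_filespec("LOAD LEXICON") gives LEXICON.LEX.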

An error in the format of the lexicon file will cause the program
to stop loading the file, erase the parts of the lexicon already
loaded, and report an error message with the line number where
the error was encountered. Refer to section 10 on error messages
for more details.

The rules file must be loaded before the lexicon. The lexicon
file must be loaded before performing any recognition operations.
A generation operation can be performed without loading the
lexicon.
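
The loading order described above can be summarized with a small
Python sketch. The class name Session and its methods are
hypothetical and model only the ordering constraints stated in this
section; they do not read or check any files.

```python
class Session:
    """Tracks what is loaded, following the ordering rules in this section."""

    def __init__(self):
        self.rules = None
        self.lexicon = None

    def load_rules(self, filespec):
        self.rules = filespec

    def load_lexicon(self, filespec):
        # The rules file must be loaded before the lexicon.
        if self.rules is None:
            raise RuntimeError("load the rules before the lexicon")
        self.lexicon = filespec

    def can_generate(self):
        # Generation needs only the rules.
        return self.rules is not None

    def can_recognize(self):
        # Recognition needs both the rules and the lexicon.
        return self.rules is not None and self.lexicon is not None
```

With this model, a session that has loaded only ENGLISH.RUL can
generate but cannot recognize until ENGLISH.LEX is loaded as well.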

The LOAD LEXICON command can also be invoked by using the -l
command line option when starting up PC-KIMMO (see section 3).


5.3 Select new language

NEW

The NEW command clears the rules and lexicon currently loaded.
Strictly speaking it is not needed, since the LOAD RULES command
erases all existing rules and the LOAD LEXICON command erases any
existing lexicon.


5.4 Take commands from a file

TAKE [<filespec>]

The TAKE command causes PC-KIMMO to read and execute commands
from a file. The <filespec> can contain a path, for example,
B:\KIMMO\ENGLISH.TAK. The TAKE command recognizes the default
file name PCKIMMO.TAK and the default file extension .TAK. The
command file can itself issue the TAKE command to call another
command file down to a depth of three files. That is, the user
can specify a command file <file1> that contains the command TAKE
<file2>, which in turn contains the command TAKE <file3>. It would
be an error for <file3> to contain a TAKE command.

A command file can also be specified by using the -t command line
option when starting up PC-KIMMO (see section 3). Note that a
command file cannot submit forms to the special generator and
recognizer prompts (see sections 5.10 and 5.11).
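
The nesting limit on command files can be sketched in Python as
follows. The function run_take is hypothetical: it takes its
"files" from a dictionary instead of reading from disk, passes every
other command to a caller-supplied function, and for brevity does
not handle abbreviated keywords or default extensions.

```python
MAX_TAKE_DEPTH = 3  # the manual allows command files nested three deep


def run_take(filename, files, execute, depth=1):
    """Run the commands in `filename`, following nested TAKE commands.

    `files` maps file names to lists of command lines (a stand-in for
    reading from disk) and `execute` handles every non-TAKE command.
    """
    if depth > MAX_TAKE_DEPTH:
        raise RuntimeError("TAKE nested too deeply: " + filename)
    for line in files[filename]:
        words = line.split()
        if not words:
            continue                      # skip blank lines
        if words[0].upper() == "TAKE":
            # A bare TAKE falls back to the documented default file name.
            target = words[1] if len(words) > 1 else "PCKIMMO.TAK"
            run_take(target, files, execute, depth + 1)
        else:
            execute(line)
```

A chain of three files runs normally; a TAKE command in the third
file raises the error, matching the limit described above.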


5.5 List rule names, feasible pairs, or sublexicon names

The LIST command is used to display either rule names, feasible
pairs, or sublexicon names.

LIST PAIRS

The LIST PAIRS command displays on the screen the set of feasible
pairs specified by the set of rules currently turned on.


LIST RULES

The LIST RULES command displays on the screen the current state
of the rules that are loaded. The display consists of each rule
by number, an indication of whether the rule is on or off, and
the rule name from the header lines of its state table in the
rules file.


LIST LEXICON

The LIST LEXICON command displays on the screen the names of the
sublexicons of the lexicon currently in use.


5.6 Set system parameters

The SET command is used to turn tracing on or off, to turn on or
off certain rules, to turn on or off various processing flags,
and to change the comment delimiter character.

SET TRACING {ON | OFF | <level>}

The SET TRACING command allows you to turn the tracing mechanism
on or off. When tracing is on, details of the analysis of a form are
displayed on the screen during generation or recognition
operations. If logging (see section 5.7) is on, the trace will
also be written to the log file. Tracing is operative for these
commands:  GENERATE, RECOGNIZE, FILE COMPARE GENERATE, FILE
COMPARE RECOGNIZE, FILE COMPARE PAIRS, FILE GENERATE, and FILE
RECOGNIZE.

The amount of detail shown in the trace display is set by the
tracing level. The <level> argument to the SET TRACING command
can range from 0 to 3, where 0 is no tracing at all and 3 is the
most detailed level of tracing. Issuing the command SET TRACING
OFF sets tracing to level 0. Issuing the command SET TRACING ON
sets tracing to level 2. At level 1, no information is given as
to which feasible pair is being tried or the condition of the
rules (that is, what state each automaton is in). Both the
generator and recognizer report each RESULT line, with all NULL
symbols being explicitly printed. The recognizer also displays
lexicon information; that is, it reports which sublexicon is
being entered or backed out of. At level 2, the feasible pairs
being tried and the state of each rule (automaton) are displayed.
The recognizer displays lexicon information as it does at level
1. At level 3, more detailed information is given on which
feasible pairs are being tried and the state of each rule. For
more information on the format of the trace display, see section
8 on trace formats.
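
The mapping from the SET TRACING argument to a tracing level can be
sketched in Python as follows. The function name tracing_level is
hypothetical, and the sketch does not handle abbreviated keywords.

```python
def tracing_level(argument):
    """Map the SET TRACING argument to a tracing level from 0 to 3."""
    word = argument.strip().upper()
    if word == "OFF":
        return 0          # SET TRACING OFF sets level 0
    if word == "ON":
        return 2          # SET TRACING ON sets level 2
    if word in ("0", "1", "2", "3"):
        return int(word)
    raise ValueError("tracing level must be ON, OFF, or 0 to 3")
```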


SET RULES {ON | OFF} {<list-of-numbers> | ALL}

The SET RULES command allows you to turn selected rules on or off
for testing or debugging purposes. When a rule is turned off, it
is completely ignored in the recognition or generation of forms.
One effect of this is to cause the recalculation of feasible
pairs, considering only the rules which remain on. Use the LIST
PAIRS command to see the set of feasible pairs currently in use.

On the command line, you can specify the action ON or OFF
followed by a list of rule numbers or the keyword ALL (in which
case all rules are turned on or off). Specific rules are turned
on or off by listing their rule numbers (shown by the LIST RULES
command), each separated by a space.
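
The effect of SET RULES on the on/off state of the rules can be
sketched in Python as follows. The function name set_rules and the
dictionary representation are hypothetical; the recalculation of
feasible pairs mentioned above is not modeled.

```python
def set_rules(rule_states, action, selection):
    """Return a copy of `rule_states` with the selected rules turned on or off.

    `rule_states` maps rule numbers to True (on) or False (off);
    `selection` is either ALL or rule numbers separated by spaces.
    """
    action = action.upper()
    if action not in ("ON", "OFF"):
        raise ValueError("action must be ON or OFF")
    turn_on = action == "ON"
    if selection.strip().upper() == "ALL":
        numbers = list(rule_states)
    else:
        numbers = [int(word) for word in selection.split()]
    unknown = [n for n in numbers if n not in rule_states]
    if unknown:
        raise ValueError("no such rule: %s" % unknown)
    updated = dict(rule_states)
    for n in numbers:
        updated[n] = turn_on
    return updated
```

For example, set_rules({1: True, 2: True, 3: True}, "OFF", "1 3")
leaves only rule 2 turned on.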


SET COMMENT <char>

The SET COMMENT command changes the comment delimiter character
(see section 7). The default is semicolon (;). The comment
delimiter can also be set with the -c command line option when
starting up PC-KIMMO (see section 3).


SET LIMIT {ON | OFF}

The SET LIMIT command limits the result of a generation or
recognition function to one form. That is, if limit is set off,
then PC-KIMMO backtracks after finding a correct result so that
it can find every possible result. With limit set on, after
finding one correct result form PC-KIMMO does not backtrack to
try to find more results.
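
The effect of the limit flag can be sketched in Python by treating
the search as an iterator that yields one result each time it
succeeds. The function name collect_results is hypothetical, and the
iterator merely stands in for PC-KIMMO's backtracking search.

```python
from itertools import islice


def collect_results(search, limit):
    """Collect results from `search`, an iterator that yields one result
    each time the (hypothetical) backtracking search succeeds."""
    if limit:
        return list(islice(search, 1))   # stop after the first result
    return list(search)                  # keep backtracking for all results
```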


SET TIMING {ON | OFF}

The SET TIMING command uses the computer's system clock to time
the execution of generation and recognition operations. It
displays the result as the number of seconds the operation
lasted. It applies to these commands:  GENERATE, RECOGNIZE, FILE
COMPARE GENERATE, FILE COMPARE RECOGNIZE, FILE COMPARE PAIRS,
FILE GENERATE, and FILE RECOGNIZE.


SET VERBOSE {ON | OFF}

The SET VERBOSE command affects the amount of information
displayed on the screen during a file comparison operation
(either generate, recognize, or pairs; see section 5.12). If
verbose is set off, a file comparison operation displays only a
dot for each form correctly analyzed, though any exceptional
results will cause the complete form and warning messages to be
displayed. If verbose is set on, a file comparison operation
displays the complete contents of the file (minus comments) plus
confirmation and warning messages.


5.7 Turn logging on or off

The LOG and CLOSE commands are used to turn logging on and off.

LOG [<filespec>]

The LOG command turns the logging mechanism on. When logging is
on, the information displayed on the screen during execution of
generation or recognition operations is also written to a disk
file whose name is specified in the command line. The <filespec>
can contain a path, for example, B:\ENGLISH\ENGLISH.LOG. If no
file name is given, a log file named PCKIMMO.LOG is written to
the default directory. If a LOG command is given when a log file
is already open, then the open log file is closed before the new
log file is created. Logging records the processing of these
commands:  GENERATE, RECOGNIZE, FILE COMPARE GENERATE, FILE
COMPARE RECOGNIZE, FILE COMPARE PAIRS, FILE GENERATE, and FILE
RECOGNIZE. Tracing displays are also recorded in a log file.


CLOSE

The CLOSE command turns logging off and closes the log file.


5.8 Show system status

The STATUS command is used to display on the screen the status of
various system parameters.

STATUS

The STATUS command displays the names of the rules and lexicon
files currently loaded, the name of the log file (if logging is
on), the comment delimiter character, and the status of the
limit, timing, tracing and verbose flags. It can also be invoked
with the synonyms SHOW STATUS or SHOW.


5.9 Show rule or sublexicon

SHOW RULE <rule-number>

The SHOW RULE command first displays the number, on/off status,
and name of the rule (similar to the LIST RULES command). If the
rule is turned on, it then displays each column header of the
state table for that rule with the set of feasible pairs that it
specifies. This command is used primarily for debugging purposes.


SHOW LEXICON <sublexicon-name>

The SHOW LEXICON command displays the contents of a sublexicon.
It shows each lexical item, its gloss, and its continuation
class. If the continuation class of a lexical entry names an
alternation, the alternation is expanded into a list of
sublexicon names. Note that this command displays the parts of
the lexical entry in the following order (rather than the order
in which they appear in the lexicon file):  lexical item, gloss,
continuation class.


5.10 Generate surface forms from a lexical form

GENERATE [<lexical-form>]

The GENERATE command accepts as input a lexical form and returns
one or more surface forms. If no lexical form argument is given,
PC-KIMMO supplies a special generator prompt where forms can be
typed in directly without the GENERATE keyword. Entering a blank
line at the generator prompt returns the program to the main
command line prompt.


5.11 Recognize lexical forms from a surface form

RECOGNIZE [<surface-form>]

The RECOGNIZE command accepts as input a surface form and returns
one or more lexical forms. If no surface form argument is given,
PC-KIMMO supplies a special recognizer prompt where forms can be
typed in directly without the RECOGNIZE keyword. Entering a blank
line at the recognizer prompt returns the program to the main
command line prompt.


5.12 Compare data from a file

The COMPARE commands compare data prepared by the user to the
results of data processed by PC-KIMMO. The data are contained in
files whose formats are described in section 7.

[FILE] COMPARE GENERATE [<filespec>]

The COMPARE GENERATE command reads lexical forms from a file,
submits them to the generator for analysis, and compares the
resulting surface form(s) with the expected results listed in the
file. The <filespec> can contain a path, for example,
B:\ENGLISH\ENGLISH.GEN. A generation comparison file has the
default extension .GEN and the default file name DATA.GEN. The
format of the generation comparison file is described in section
7.3.

Results of the comparison are reported according to the setting
of the verbosity flag (see the SET VERBOSE command described in
section 5.6). If verbosity is set off, only exceptions (that is,
actual results from the generator that are different from the
expected results as specified in the file) are reported. A dot is
displayed on the screen as each input (lexical) form is
processed. If verbosity is set on, each group of lexical and
surface forms in the file is displayed, either with an error
message for wrong comparisons or the message OK if the actual and
expected results match exactly.


[FILE] COMPARE RECOGNIZE [<filespec>]

The COMPARE RECOGNIZE command reads surface forms from a file,
submits them to the recognizer for analysis, and compares the
resulting lexical form(s) with the expected results specified in
the file. The <filespec> can contain a path, for example,
B:\ENGLISH\ENGLISH.REC. A recognition comparison file has the
default extension .REC and the default file name DATA.REC. The
format of the recognition comparison file is described in section
7.4.

Results of the comparison are reported according to the setting
of the verbosity flag (see the SET VERBOSE command described in
section 5.6). If verbosity is set off, only exceptions (that is,
actual results from the recognizer that are different from the
expected results as specified in the file) are reported. A dot is
displayed on the screen as each input (surface) form is
processed. If verbosity is set on, each group of surface and
lexical forms in the file is displayed, either with an error
message for wrong comparisons or the message OK if the actual and
expected results match exactly.


[FILE] COMPARE PAIRS [<filespec>]

The COMPARE PAIRS command allows lexical:surface pairs of forms
listed in the file specified on the command line to be compared
in both directions. The <filespec> can contain a path, for
example, B:\ENGLISH\ENGLISH.PAI. A pairs comparison file has the
default extension .PAI and the default file name DATA.PAI. The
format of the pairs comparison file is described in section 7.5.

PC-KIMMO considers each pair of forms (a lexical form followed by
its surface form). The lexical form is input to the generator to
produce one or more surface forms. The surface form listed in the
file is compared with the generated surface forms to see if there
is a successful match. The surface form listed in the file is
then input to the recognizer to produce one or more lexical
forms. The lexical form listed in the file is compared with the
recognized lexical forms to see if there is a successful match.
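
The two-way check described above can be sketched in Python as
follows. The function name compare_pair is hypothetical, and the
generate and recognize arguments are caller-supplied stand-ins for
the PC-KIMMO generator and recognizer. The sketch assumes that each
of them returns a plain list of forms.

```python
def compare_pair(lexical, surface, generate, recognize):
    """Check one lexical:surface pair in both directions.

    `generate` and `recognize` are caller-supplied functions that return
    lists of forms; they stand in for the PC-KIMMO generator and recognizer.
    """
    # Lexical form in, expected surface form among the generated results?
    generated_ok = surface in generate(lexical)
    # Surface form in, expected lexical form among the recognized results?
    recognized_ok = lexical in recognize(surface)
    return generated_ok, recognized_ok
```

A pair counts as an exception if either of the two returned values
is False.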

Results of the comparison are reported according to the setting
of the verbosity flag (see the SET VERBOSE command described in
section 5.6). If verbosity is set off, only exceptions (that is,
one of the comparisons failed) are reported. A dot is displayed
on the screen as each pair of forms is processed. If verbosity is
set on, each pair of lexical and surface forms in the file is
displayed, either with an error message for wrong comparisons or
the message OK if the forms match exactly.


5.13 Generate forms from a file

FILE GENERATE <input-filespec> [<output-filespec>]

The FILE GENERATE command reads lexical forms from a file,
submits them to the generator for analysis, and returns each
lexical form followed by the resulting surface form(s). The
format of the generation input file is described in section 7.6.

If an <output-filespec> argument is specified, the results are
written to that file; otherwise, the results are displayed on the
screen. The format of the output file created by this command is
identical to that of a generation comparison file. The <filespec> of
either file can contain a path, for example,
B:\ENGLISH\ENGLISH.LST. The command does not recognize any
default file names or extensions.

The verbosity flag (see the SET VERBOSE command described in
section 5.6) has no effect on the FILE GENERATE command.


5.14 Recognize forms from a file

FILE RECOGNIZE <input-filespec> [<output-filespec>]

The FILE RECOGNIZE command reads surface forms from a file,
submits them to the recognizer for analysis, and returns each
surface form followed by the resulting lexical form(s). The
format of the recognition input file is described in section 7.7.

If an <output-filespec> argument is specified, the results are
written to that file; otherwise the results are displayed on the
screen. The format of the output file created by this command is
identical to that of a recognition comparison file. The <filespec> of
either file can contain a path, for example,
B:\ENGLISH\ENGLISH.LST. The command does not recognize any
default file names or extensions.

The verbosity flag (see the SET VERBOSE command described in
section 5.6) has no effect on the FILE RECOGNIZE command.


5.15 Execute an operating system command

SYSTEM [<system-command>]

The SYSTEM command allows you to execute an operating system
command from within PC-KIMMO. For example, on an IBM
PC-compatible computer, the command SYSTEM DIR will execute the
DOS directory command. If no command argument is given, then
PC-KIMMO is pushed into the background and a new system command
processor shell is started. While you are in the shell, you can
execute any commands or programs. To leave the shell and return
to PC-KIMMO, type EXIT. On an IBM PC-compatible computer, the
SYSTEM command will not work unless a copy of the DOS system file
COMMAND.COM is available. Note that if you are running PC-KIMMO
under MS-DOS version 2, issuing the SYSTEM command with no
argument will NOT invoke a new processor shell. To get a new
shell you must enter the command SYSTEM COMMAND. This will directly
execute COMMAND.COM. Type EXIT to return to PC-KIMMO.

The SYSTEM command has the alias ! (exclamation point), which
does not require a space between it and the following command.
For example, !DIR performs the DOS directory command.
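
The alias can be thought of as a simple rewriting of the command
line, sketched here in Python. The function name expand_bang is
hypothetical; this is not how PC-KIMMO necessarily implements the
alias.

```python
def expand_bang(line):
    """Rewrite the `!` alias as the SYSTEM command it stands for."""
    stripped = line.lstrip()
    if stripped.startswith("!"):
        # No space is required between ! and the command that follows.
        return ("SYSTEM " + stripped[1:].strip()).rstrip()
    return line
```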


5.16 Edit a file

EDIT <filespec>

The EDIT command attempts to edit a file using the editing
program specified by the operating system environment variable
EDITOR. If this environment variable is not defined, then the
command will try to use EDLIN (on a DOS machine) or vi (on a UNIX
machine) to edit the file. To set the environment variable,
include a line such as this in your AUTOEXEC.BAT file:

        SET EDITOR=<filespec>

where <filespec> specifies the path and full file name of your
editing program, for example, \UTIL\EMACS.EXE.

You can use the EDIT command, for example, to invoke your text
editor and modify the rules or lexicon files. After saving the
files and leaving the editor, you must LOAD the files again in
order for PC-KIMMO to utilize the changes.
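
The way the editor is chosen can be sketched in Python as follows.
This is only an illustration of the rule described above, not the
program's actual code; the function name choose_editor and its
parameters are hypothetical, and treating a Windows host as a DOS
machine is an assumption of the sketch.

```python
import os


def choose_editor(environ=None, on_dos=None):
    """Pick the editing program as the EDIT command is described to do."""
    env = os.environ if environ is None else environ
    if on_dos is None:
        on_dos = os.name == "nt"  # assumption: treat Windows as DOS
    editor = env.get("EDITOR")
    if editor:
        # The EDITOR environment variable takes priority when defined.
        return editor
    # Otherwise fall back to the stock editor of the operating system.
    return "EDLIN" if on_dos else "vi"
```

For example, with EDITOR set to \UTIL\EMACS.EXE the sketch returns
that path; with EDITOR undefined it returns EDLIN on a DOS machine
and vi on a UNIX machine.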


5.17 Halt the program

EXIT

The EXIT command causes PC-KIMMO to exit back to the operating
system.


QUIT

The command QUIT is the same as EXIT.


6 Alphabetic list of commands

This section documents each command of the PC-KIMMO system in
alphabetical order. Square brackets in the command line
summaries indicate optional elements. The notation {x | y} means
either x or y (but not both). Command keywords in uppercase are
typed literally; for instance, the command summary SET TRACING
{ON | OFF} means to type either SET TRACING ON or SET TRACING
OFF. Command arguments in angle brackets are replaced by elements
of the specified type; for instance, the command summary SET
COMMENT <char> means to replace <char> with a single character,
as in SET COMMENT ;.


! [<system-command>]

Executes an operating system command or invokes a new command
processor shell (same as SYSTEM).


?

Displays a list of command names.


CLOSE

Turns logging off and closes the log file.


EDIT <filespec>

Edits <filespec> using the editing program specified by the
operating system environment variable EDITOR.


EXIT

Exits PC-KIMMO and returns to the operating system.


[FILE] COMPARE GENERATE [<filespec>]

Reads lexical forms from <filespec>, submits them to the
generator, and compares the resulting surface form(s) with the
expected results listed in <filespec>.


[FILE] COMPARE RECOGNIZE [<filespec>]

Reads surface forms from <filespec>, submits them to the
recognizer, and compares the resulting lexical form(s) with the
expected results listed in <filespec>.


[FILE] COMPARE PAIRS [<filespec>]

Reads pairs of lexical and surface forms from <filespec> and
analyzes them to see if the surface form can be generated from the
lexical form and the lexical form can be recognized from the
surface form.


FILE GENERATE <input-filespec> [<output-filespec>]

Reads a list of lexical forms from <input-filespec>, submits them
to the generator, and returns each lexical form followed by the
resulting surface form(s).


FILE RECOGNIZE <input-filespec> [<output-filespec>]

Reads a list of surface forms from <input-filespec>, submits them
to the recognizer, and returns each surface form followed by the
resulting lexical form(s).


GENERATE [<lexical-form>]

Accepts as input a lexical form and returns one or more surface
forms.


HELP [<command-name>]

Without a command name argument, displays a list of commands with
a brief explanation of each. With a command name argument,
displays a usage summary for the command.


LIST LEXICON

Displays on the screen the names of the sublexicons of the
lexicon currently in use.


LIST PAIRS

Displays the set of feasible pairs specified by the set of rules
currently turned on.


LIST RULES

Displays the current state of the rules that are loaded.


LOAD LEXICON [<filespec>]

Loads the lexicon from <filespec>.


LOAD RULES [<filespec>]

Loads rules from <filespec>.


LOG [<filespec>]

Turns the logging mechanism on.


NEW

Clears the rules and lexicon currently loaded.


QUIT

Same as EXIT.


RECOGNIZE [<surface-form>]

Accepts as input a surface form and returns one or more lexical
forms.


SET COMMENT <char>

Changes the comment delimiter character. The default is semicolon
(;).


SET LIMIT {ON | OFF}

Limits the result of a generation or recognition function to one
form.


SET RULES {ON | OFF} {<list-of-numbers> | ALL}

Turns selected rules on or off.


SET TIMING {ON | OFF}

Times the execution of generation and recognition functions and
displays the result.


SET TRACING {ON | OFF | <level>}

Turns the tracing mechanism on or off.


SET VERBOSE {ON | OFF}

Determines the amount of information shown on the screen during a
file comparison operation.


SHOW [STATUS]

Same as STATUS.


SHOW LEXICON <sublexicon-name>

Displays the contents of the named sublexicon. For each lexical
entry it shows the lexical form, gloss, and continuation class.


SHOW RULE <rule-number>

Displays the number, on/off status, and name of the rule (similar
to the LIST RULES command). If the rule is turned on, it then
displays each column header of the state table for that rule with
the set of feasible pairs that it specifies.


STATUS

Displays the names of the rules and lexicon files currently
loaded, the name of the log file (if logging is on), the comment
delimiter character, and the status of the limit, timing,
tracing, and verbose flags. Obeys the synonyms SHOW STATUS and SHOW.


SYSTEM [<system-command>]

Executes an operating system command or invokes a new command
processor shell. See also !.


TAKE [<filespec>]

Reads and executes commands from <filespec>.


7 File formats

This section describes the formats for the files that are used as
input to PC-KIMMO. In any of the files, comments can be added to
any line by preceding the comment with the comment delimiter
character. This character is normally a semicolon (;), but can be
changed either on the PC-KIMMO command line with the -c option
(see section 3) or with the SET COMMENT command (see section
5.6). Anything following a comment delimiter (until the end of
the line) is considered part of the comment and is ignored by
PC-KIMMO.

In the descriptions below, reference to the use of a space
character implies any whitespace character (that is, any
character treated like a space character). The following control
characters when used in a file are whitespace characters: ^I
(ASCII 9, tab), ^J (ASCII 10, line feed), ^K (ASCII 11, vertical
tab), ^L (ASCII 12, form feed), and ^M (ASCII 13, carriage
return).

The control character ^Z (ASCII 26) cannot be used because
MS-DOS interprets it as marking the end of a file. Also the
control character ^@ (ASCII 0, null) cannot be used.
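
The comment and whitespace conventions above can be sketched in
Python as follows. This is an illustration only, not PC-KIMMO's
own reader; the name split_fields is hypothetical. The sketch
discards everything from the comment delimiter to the end of the
line, rejects ^Z and ^@, and splits what remains on the space
character and the five control characters listed above.

```python
import re

# Space plus the control characters ^I, ^J, ^K, ^L, and ^M.
_WHITESPACE = re.compile("[ \t\n\x0b\x0c\r]+")


def split_fields(line, comment=";"):
    """Strip a trailing comment and split the rest into fields."""
    for ch in ("\x1a", "\x00"):  # ^Z and ^@ cannot be used
        if ch in line:
            raise ValueError("control character %r is not allowed" % ch)
    # Anything after the comment delimiter is ignored.
    body = line.split(comment, 1)[0]
    return [field for field in _WHITESPACE.split(body) if field]
```

Passing a different value for comment corresponds to changing the
delimiter with the -c option or the SET COMMENT command.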

Examples of each of the following file types are found on the
release diskette as part of the English description.


7.1 Rules file

The general structure of the rules file is a list of declarations
composed of a keyword followed by data. The set of valid keywords
is ALPHABET, NULL, ANY, BOUNDARY, SUBSET, RULE, and END. Only the
SUBSET and RULE keywords can appear more than once. The ALPHABET
declaration must appear first in the file. The other declarations
can appear in any order. The NULL, ANY, BOUNDARY, and SUBSET
declarations can even be interspersed among the rules. However,
these declarations must appear before any rule that uses them or
an error will result.

Figure 1 shows the structure of a rules file. The order of the
keyword declarations is according to common style. Note that the
notation {x | y} means either x or y (but not both). The
following specifications apply to the rules file.


    Figure 1  Structure of the rules file

    ALPHABET <alphabet character list>
    NULL <null symbol>
    ANY <"wildcard" symbol>
    BOUNDARY <word boundary symbol>
    SUBSET <subset name> <subset character list>
    . (more subsets)
    .
    .
    RULE <rule name> <number of states> <number of columns>
         <lexical element list>
         <surface element list>
    <state number>{: | .} <state number list>
    . (more states)
    .
    .
    .  (more rules)
    .
    .
    END

    o Extra spaces, blank lines, and comment lines are ignored.

    o The first line of the file (excluding comment lines) must
contain the keyword ALPHABET.

    o <alphabet character list> is a list of single characters
that make up the combined alphabet of all the characters used in
both lexical and surface representations. Each character must be
separated from the others by at least one space. The list can
span multiple lines, but ends with the next valid keyword. All
alphanumeric characters (such as a, B, and 2), symbols (such as $
and +), and punctuation characters (such as . and ?) are
available as alphabet members. The characters in the IBM extended
character set (above ASCII 127) are also available. Control
characters (below ASCII 32) can also be used, with the exception
of whitespace characters (see above), ^Z (end of file), and ^@
(null). The alphabet can contain a maximum of 255 characters.

    o After the ALPHABET declaration, the NULL, ANY, BOUNDARY,
SUBSET, and RULE declarations can occur in any order.

    o The BOUNDARY declaration is obligatory, even if the rules
do not use a BOUNDARY symbol. This is because the lexicon file
requires a BOUNDARY symbol. The NULL, ANY, and SUBSET
declarations are not obligatory if the rules do not use a NULL
symbol, an ANY symbol, or subsets.

    o The keyword NULL is followed by a <null symbol>, a single
character that represents a null (empty, zero) element. The NULL
symbol is considered to be an alphabetic character, but cannot
also be listed in the ALPHABET declaration. The NULL symbol
declared in the rules file is also used in the lexicon file to
represent a null lexical entry.

    o The keyword ANY is followed by a <"wildcard" symbol>, a
single character that represents a match of any character in the
alphabet. The ANY symbol is not considered to be an alphabetic
character, though it is used in the column headers of state
tables. It cannot be listed in the ALPHABET declaration. It is
not used in the lexicon file.

    o The keyword BOUNDARY is followed by a <word boundary
symbol>, a single character that represents an initial or final
word boundary. The BOUNDARY symbol is considered to be an
alphabetic character, but cannot also be listed in the ALPHABET
declaration. When used in the column header of a state table, it
can only appear as the pair #:# (where, for instance, # has been
declared as the BOUNDARY symbol). The BOUNDARY symbol is also
used in the lexicon file in the continuation class field of a
lexical entry to indicate the end of a word (that is, no
continuation class).

    o The keyword SUBSET is followed by the <subset name> and
<subset character list>. <subset name> is a single word (one or
more characters) that names the list of characters that follows
it. The subset name must be unique (that is, if it is a single
character it cannot also be in the alphabet or be any other
declared symbol). It can be composed of any characters (except
space); that is, it is not limited to the characters declared in
the ALPHABET section. It must not be identical to any keyword
used in the rules file. The subset name is used in rules to
represent all members of the subset of the alphabet that it
defines. Note that SUBSET declarations can be interspersed among
the rules. This allows subsets to be placed near the rule that
uses them if such a style is desired. However, a subset must be
declared before a rule that uses it.

    o <subset character list> is a list of single characters,
each of which is separated by at least one space. The list can
span multiple lines. Each character in the list must be a member
of the previously defined ALPHABET with the exception of the NULL
symbol, which can appear in a subset list but is not included in
the ALPHABET declaration. Neither the ANY symbol nor the BOUNDARY
symbol can appear in a subset character list.

    o The keyword RULE signals that a state table immediately
follows.

    o <rule name> is the name or description of the rule which
the state table encodes. It functions as an annotation to the
state table and has no effect on the computational operation of
the table. It is displayed by the LIST RULES and SHOW RULE
commands and is also displayed in traces. The rule name must be
surrounded by a pair of identical delimiter characters. Any
material can be used between the delimiters of the rule name with
the exception of the current comment delimiter character and of
course the rule name delimiter character of the rule itself. Each
rule in the file can use a different pair of delimiters. The rule
name must be all on one line, but it does not have to be on the
same line as the RULE keyword.

    o <number of states> is the number of states (rows in the
table) that will be defined for this table. The states must begin
at 1 and go in sequence through the number defined here (that is,
gaps in state numbers are not allowed).

    o <number of columns> is the number of state transitions
(columns in the table) that will be defined for each state.

    o <lexical element list> is a list of elements separated by
one or more spaces. Each element represents the lexical half of a
lexical:surface correspondence which, when matched, defines a
state transition. Each element in the list must be either a
member of the alphabet, a subset name, the NULL symbol, the ANY
symbol, or the BOUNDARY symbol (in which case the corresponding
surface element must also be the BOUNDARY symbol). The list can
span multiple lines, but the number of elements in the list must
be equal to the number of columns defined for the rule.

    o <surface element list> is a list of elements separated by
one or more spaces. Each element represents the surface half of a
lexical:surface correspondence which, when matched, defines a
state transition. Each element in the list must be either a
member of the alphabet, a subset name, the NULL symbol, the ANY
symbol, or the BOUNDARY symbol (in which case the corresponding
lexical element must also be the BOUNDARY symbol). The list can
span multiple lines, but the number of elements in the list must
be equal to the number of columns defined for the rule.

    o <state number> is the number of the state or row of the
table. The first state number must be 1, and subsequent state
numbers must follow in numerical sequence without any gaps.

    o {: | .} is the final or nonfinal state indicator. This
should be a colon (:) if the state is a final state and a period
(.) if it is a nonfinal state. It must follow the <state number>
with no intervening space.

    o <state number list> is a list of state transition numbers
for a particular state. Each number must be between 0 and the
number of states (inclusive) declared for the table; 0 indicates
that the rule fails on that column's input. The list can span
multiple lines, but the number of elements in the list must be
equal to the number of columns declared for this rule.

    o The keyword END follows all rules and indicates the end of
the rules file. Any material in the file thereafter is ignored by
PC-KIMMO. The END keyword is optional; the physical end of the
file also terminates the rules file.
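
The constraints on the rows of a state table can be sketched as a
small Python check. This is an illustration, not the loader used
by PC-KIMMO; the name check_state_table is hypothetical. The
sketch assumes that each row has already been joined onto one
string, and that a transition of 0 is accepted as the failure
state, as the level 3 trace in section 8.1 suggests.

```python
import re

# A state label is a number immediately followed by : or .
_LABEL = re.compile(r"^(\d+)([:.])$")


def check_state_table(n_states, n_columns, rows):
    """Validate the rows of one rule; return (is_final, transitions) pairs."""
    table = []
    for expected, row in enumerate(rows, start=1):
        fields = row.split()
        label = _LABEL.match(fields[0]) if fields else None
        if label is None:
            raise ValueError("row %d has no valid state label" % expected)
        if int(label.group(1)) != expected:
            raise ValueError("state numbers must run 1, 2, 3, ... without gaps")
        transitions = [int(field) for field in fields[1:]]
        if len(transitions) != n_columns:
            raise ValueError("state %d needs %d transitions" % (expected, n_columns))
        for target in transitions:
            # Assumption: 0 (failure) is allowed besides 1..n_states.
            if not 0 <= target <= n_states:
                raise ValueError("transition %d is out of range" % target)
        # A colon marks a final state, a period a nonfinal state.
        table.append((label.group(2) == ":", transitions))
    if len(table) != n_states:
        raise ValueError("expected %d states, found %d" % (n_states, len(table)))
    return table
```

For a hypothetical rule declared with 2 states and 3 columns, the
rows "1: 2 1 1" and "2. 0 1 0" pass the check: state 1 is final
and state 2 is nonfinal.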


7.2 Lexicon file

The general structure of the lexicon file is a list of
declarations composed of a keyword followed by data. The set of
valid keywords is ALTERNATION, LEXICON, INCLUDE, and END. The
only required declaration is LEXICON INITIAL; that is, a lexicon
file must minimally be composed of one sublexicon named INITIAL.
The declarations can appear in any order with the exception that
an alternation name used in the continuation class field of a
lexical entry (including a lexical entry in an INCLUDE file) must
first be declared with the ALTERNATION keyword.

Figure 2 shows the structure of a lexicon file. The order of the
keyword declarations is according to common style. Note that the
notation {x | y} means either x or y (but not both). The
following specifications apply to the lexicon file.


    Figure 2  Structure of the lexicon file

    ALTERNATION <alternation name> <alternation list>
    . (more alternations)
    .
    .
    LEXICON INITIAL
    <lexical item>   {<alternation name> | <BOUNDARY symbol>}
    <gloss>
    . (more lexical entries)
    .
    .
    INCLUDE <filespec>
    . (more include files)
    .
    .
    LEXICON <sublexicon name>
    <lexical item>   {<alternation name> | <BOUNDARY symbol>}
    <gloss>
    . (more lexical entries)
    .
    .
    . (more sublexicons)
    .
    .
    END

    o Extra spaces, blank lines, and comment lines are ignored.

    o The keyword ALTERNATION is followed by an <alternation
name> and an <alternation list>.

    o <alternation name> is a name associated with the following
<alternation list>. It is a word composed of one or more
characters, not limited to the ALPHABET characters declared in
the rules file. An alternation name can be any word other than a
keyword used in the lexicon file. The program does not check to
see if an alternation name is actually used in the lexicon file.

    o <alternation list> is a list of sublexicon names. It can
span multiple lines until the next valid keyword is encountered.
Each sublexicon name in the list must be declared at some point
in the file with the LEXICON keyword. Although it is not enforced
at the time the lexicon file is loaded, an undeclared sublexicon
named in an alternation list will cause an error when a
recognition function tries to use it.

    o The keyword LEXICON is followed by a <sublexicon name> and
a list of lexical entries.

    o <sublexicon name> is the name associated with a sublexicon.
It is a word composed of one or more characters, not limited to
the alphabetic characters declared in the rules file. A
sublexicon name can be any word other than a keyword used in the
lexicon file.

    o In each sublexicon section are lexical entries, each of
which is composed of three parts or fields separated by one or
more spaces. Each lexical entry must all be on one line. The
three parts are the lexical item, the continuation class, and the
gloss.

    o <lexical item> is one or more characters that represent an
element (typically a morpheme or word) of the lexicon. Each
character must be in the alphabet defined for the language. The
lexical item uses only the lexical subset of the alphabet.

    o {<alternation name> | <BOUNDARY symbol>} fills the
continuation class field of a lexical entry. It must be either an
alternation name or the BOUNDARY symbol declared in the rules
file.

    o <gloss> is a string of text surrounded by a pair of
identical delimiter characters. Whenever the lexical item in the
lexical entry is matched, everything between the delimiters is
appended to the result. If there is no gloss associated with the
lexical item, the gloss field must contain a pair of delimiters
with nothing in between (for example, ""). Any material can be
used between the delimiters of the gloss with the exception of
the current comment delimiter character and of course the gloss
delimiter character of the entry itself. Each lexical entry in
the file can use a different pair of delimiters. The gloss must
be all on one line with the rest of the lexical entry.

    o The INCLUDE keyword is followed by a <filespec> that names
a file containing another lexicon file. This included lexicon
file has the same structure and specifications as the main
lexicon file with the exception that it cannot contain an INCLUDE
declaration; that is, INCLUDE files cannot be nested. Alternation
names and sublexicon names in INCLUDE files must be unique; that
is, not used anywhere else in the lexicon. The END keyword (or
the physical end of the file) will terminate reading of the
included file and return to reading the main lexicon file.

    o The keyword END follows all lexical information and
indicates the end of the lexicon file. Any material in the file
thereafter is ignored by PC-KIMMO. See also the use of the END
keyword in an included file. The END keyword is optional; the
physical end of the file also terminates the lexicon file.
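
The three fields of a lexical entry can be separated as in the
following Python sketch. It is an illustration only; the name
parse_lexical_entry and the sample entries used with it are
hypothetical, and the sketch does not check the lexical item
against the alphabet or the continuation class against the
declared alternation names.

```python
def parse_lexical_entry(line, comment=";"):
    """Split one entry into (lexical item, continuation class, gloss)."""
    # The gloss cannot contain the comment delimiter, so it is safe
    # to cut the line at the first occurrence of that character.
    body = line.split(comment, 1)[0].strip()
    parts = body.split(None, 2)
    if len(parts) < 3:
        raise ValueError("an entry needs an item, a continuation class, and a gloss")
    item, continuation, gloss_field = parts
    delimiter = gloss_field[0]
    if len(gloss_field) < 2 or gloss_field[-1] != delimiter:
        raise ValueError("gloss must be enclosed in identical delimiters")
    gloss = gloss_field[1:-1]
    if delimiter in gloss:
        raise ValueError("gloss cannot contain its own delimiter")
    return item, continuation, gloss
```

Under these assumptions, an entry such as

        fox  N  "Noun(fox)"

yields the item fox, the continuation class N, and the gloss
Noun(fox), while an entry whose gloss field is "" yields an empty
gloss.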


7.3 Generation comparison file

The generation comparison file serves as input to the COMPARE
GENERATE command (see section 5.12). It consists of groupings of
a lexical form followed by one or more surface forms that are
expected to be generated from the lexical form. The following
specifications apply to the generation comparison file.

    o Each form must be on a separate line.

    o Leading spaces are ignored.

    o A blank line (or end of file) indicates the end of a
grouping. Extra blank lines are ignored.

    o The first form in each grouping is the lexical form to be
input to the generator. Its gloss does not have to be included,
since the generator does not use the lexicon; however, including
a gloss with the lexical form does no harm--it is simply
ignored.

    o Succeeding forms in each grouping are surface forms that
are the expected output of the generator.
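
The grouping rules above can be sketched in Python as follows.
This is an illustration, not PC-KIMMO's own reader; the name
read_groupings is hypothetical. The sketch strips trailing as well
as leading spaces (an assumption) and does not handle comments.

```python
def read_groupings(text):
    """Return a list of (lexical form, expected surface forms) groupings."""
    groupings, current = [], []
    for raw in text.splitlines():
        line = raw.strip()  # leading spaces are ignored
        if line:
            current.append(line)
        elif current:
            # A blank line closes the grouping collected so far;
            # extra blank lines fall through and are ignored.
            groupings.append((current[0], current[1:]))
            current = []
    if current:
        # End of file also closes the last grouping.
        groupings.append((current[0], current[1:]))
    return groupings
```

For a file that contains the line `fox+s followed by the line
foxes, the sketch returns one grouping with `fox+s as the lexical
form and foxes as the only expected surface form.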


7.4 Recognition comparison file

The recognition comparison file serves as input to the COMPARE
RECOGNIZE command (see section 5.12). It consists of groupings of
a surface form followed by one or more lexical forms that are
expected to be recognized from the surface form. The following
specifications apply to the recognition comparison file.

    o Each form must be on a separate line.

    o Leading spaces are ignored.

    o A blank line (or end of file) indicates the end of a
grouping. Extra blank lines are ignored.

    o The first form in each grouping is the surface form to be
input to the recognizer.

    o Succeeding forms in each grouping are lexical forms that
are the expected output of the recognizer. The gloss of a form
follows it on the same line, separated by one or more spaces. The
gloss must match exactly (including spaces) the way it is output
from the recognizer.


7.5 Pairs comparison file

The pairs comparison file serves as input to the COMPARE PAIRS
command (see section 5.12). It consists of pairs of lexical and
surface forms; that is, a lexical form followed by exactly one
surface form. It is expected that the surface form will be
generated from the lexical form and that the lexical form will
be recognized from the surface form. Glosses do not have to be
included with lexical forms, since the generator does not use the
lexicon; however, including a gloss with the lexical form does no
harm--it is simply ignored. When recognizing a surface form, the
lexicon is used to identify the constituent morphemes and verify
that they occur in the correct order, but the gloss part of a
lexical entry is not used. The following specifications apply to
the pairs comparison file.

    o Each form must be on a separate line.

    o Leading spaces are ignored.

    o A blank line (or end of file) indicates the end of a
grouping. Extra blank lines are ignored.

    o The first form of a pair is the lexical form, which is
input to the generator. It is the expected output on inputting
the second (surface) form to the recognizer. A gloss need not be
included with the lexical form; if present, it is ignored.


    o The second form of a pair is the surface form, which is
input to the recognizer. It is the expected output on inputting
the first (lexical) form to the generator.


7.6 Generation file

The generation file consists of a list of lexical forms. It
serves as input to the FILE GENERATE command (see section 5.13),
which returns a file (or screen display) whose format is
identical to the generation comparison file. The following
specifications apply to the generation file.

    o Each form must be on a separate line.

    o Extra white space, blank lines, and comment lines are
ignored.

    o Each form is assumed to be a lexical form. If a gloss is
included, it is ignored.


7.7 Recognition file

The recognition file consists of a list of surface forms. It
serves as input to the FILE RECOGNIZE command (see section 5.14),
which returns a file (or screen display) whose format is
identical to the recognition comparison file. The following
specifications apply to the recognition file.

    o Each form must be on a separate line.

    o Extra spaces, blank lines, and comment lines are ignored.

    o Each form is assumed to be a surface form.


7.8 Summary of default file names and extensions

Figure 3 summarizes the default file names and extensions assumed
by PC-KIMMO. Two entries are given for the different kinds of
files. The first is the name PC-KIMMO will assume if no file name
at all is given to a command that expects that kind of file. The
second entry (with the *) shows what extension PC-KIMMO will add
if a file name without an extension is given.

        Figure 3  Default file names and extensions

        Rules file:                    RULES.RUL
                                           *.RUL
        Lexicon file:                LEXICON.LEX
                                           *.LEX
        Generation comparison file:     DATA.GEN
                                           *.GEN
        Recognition comparison file:    DATA.REC
                                           *.REC
        Pairs comparison file:          DATA.PAI
                                           *.PAI
        Take file:                   PCKIMMO.TAK
                                           *.TAK
        Log file:                    PCKIMMO.LOG
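
The defaults in Figure 3 can be applied as in the following Python
sketch. It is an illustration only; the name resolve_filespec and
the keys used for the kinds of files are hypothetical. Because
Figure 3 lists no default extension for the log file, the sketch
assumes that a log file name is used exactly as given.

```python
import ntpath  # DOS-style path handling, available on any platform

DEFAULT_NAMES = {
    "rules": "RULES.RUL",
    "lexicon": "LEXICON.LEX",
    "generate": "DATA.GEN",
    "recognize": "DATA.REC",
    "pairs": "DATA.PAI",
    "take": "PCKIMMO.TAK",
    "log": "PCKIMMO.LOG",
}

DEFAULT_EXTENSIONS = {
    "rules": ".RUL",
    "lexicon": ".LEX",
    "generate": ".GEN",
    "recognize": ".REC",
    "pairs": ".PAI",
    "take": ".TAK",
    # No entry for "log": Figure 3 gives no default extension for it.
}


def resolve_filespec(kind, filespec=None):
    """Apply the default file name or extension for the given kind of file."""
    if not filespec:
        return DEFAULT_NAMES[kind]
    extension = ntpath.splitext(filespec)[1]
    if extension or kind not in DEFAULT_EXTENSIONS:
        return filespec
    return filespec + DEFAULT_EXTENSIONS[kind]
```

Under these assumptions, a rules file given as ENGLISH resolves to
ENGLISH.RUL, and a missing rules file name resolves to RULES.RUL.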


8 Trace formats

This section explains how to read the output of the generator and
recognizer traces. Traces are produced by the SET TRACING command
described in section 5.6. The amount of detail shown in the trace
display is set by the tracing level. The <level> argument to the
SET TRACING command can range from 0 to 3, where 0 is no tracing
at all and 3 is the most detailed level of tracing.

8.1 Generator trace

The purpose of the generator trace is to allow the user to see
how a lexical form is processed through multiple recursive calls
to the generator. The generator algorithm used to process the
form is described in section 9.1.

        Figure 4  Level 1 generator trace

        `fox+s

           RESULT = 0fox0es

        foxes

There are three levels of tracing differing in the amount of
detail they display:  Level 1 gives the least amount of detail,
level 2 (the default) gives a moderate amount of detail, and
level 3 gives the most detail. Figure 4 is a level 1 generator
trace of the lexical form `fox+s (taken from the English
example). The only difference from no tracing at all is that the
RESULT line is displayed. This line differs from the normal
result that is returned because it prints all NULL symbols in the
output surface form.

Figure 5 is from a level 2 generator trace for the form `fox+s.
To limit the size of the trace, the Gemination rules (14 and 15)
were turned off. Line numbers and column numbers are printed here
for reference in the description that follows. Each description
refers to an element beginning at the line and column indicated.


Figure 5  Level 2 generator trace

    1  2    3  4  5  6  7  8  9  10 11 12 13 14 15 16 17   18
 1  `fox+s
 2  0  #:#  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 3  0  `:0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 4  1  f:f  1  1  1  1  1  1  1  2  2  1  1  1  1  1  1    0
 5  2  o:o  1  1  1  1  2  2  1  3  3  1  1  1  1  1  1    0f
 6  3  x:x  1  1  1  1  1  1  1  7  4  2  1  1  1  1  1    0fo
 7  4  +:0  1  1  3  3  2  2  1  4  4  1  1  1  1  1  1    0fox
 8  5  s:s  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
 9  6  #:#  1  1  6  2  2  2  3  3  4  1  1  1  1  1  1    0fox0s
10  6-     BLOCKED BY RULE 3: Epenthesis, 0:0 /<= [S|ch|sh|y:i] +:0___s[+:0|#]
11  5<      1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
12  5  s:0  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
13  5-     BLOCKED BY RULE 7: S-deletion, s:0 <=> +:0 (0:e) s +:0 '___
14  5  0:e  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
15  6  s:s  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    0fox0e
16  7  #:#  1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    0fox0es
17  7       1  1  1  1  1  1  1  3  4  1  1  1  1  1  1    0fox0es
18
19     RESULT = 0fox0es
20
21  6<      1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    0fox0e
22  6  s:0  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    0fox0e
23  6-     BLOCKED BY RULE 4: Epenthesis, 0:e => [S|ch|sh|y:i] +:0___s[+:0|#]
24  6  0:e  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    0fox0e
25  6-     BLOCKED BY RULE 4: Epenthesis, 0:e => [S|ch|sh|y:i] +:0___s[+:0|#]
26  5<      1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
27  4<      1  1  3  3  2  2  1  4  4  1  1  1  1  1  1    0fox
28  4  0:e  1  1  3  3  2  2  1  4  4  1  1  1  1  1  1    0fox
29  4-     BLOCKED BY RULE 4: Epenthesis, 0:e => [S|ch|sh|y:i] +:0___s[+:0|#]
   ...
39  0<      1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
40  0  0:e  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
41  0-     BLOCKED BY RULE 4: Epenthesis, 0:e => [S|ch|sh|y:i] +:0___s[+:0|#]
42  foxes

    o Line 1:  Input line. Lexical form input to the generator
function.

    o Line 19:  RESULT line. Surface form produced by the
generator function. At the point where the input lexical form is
empty and each automaton is in a final state, the trace shows
that the generator has recorded a result. The generator continues
looking for additional results (lines 21 and following).

    o Column 1:  Level number (all lines except 1, 19, and 42).
This represents the level of recursion. Level 0 represents the
initial invocation of the generator. Notice that the number
coincides with the number of characters in the result string so
far.

    o Column 1:  Backtracking indicator (lines 10, 11, 13). The
symbol - indicates that the generator is blocked at that level.
The symbol < indicates that the generator is backtracking (that
is, returning to a lower level to try another path).

    o Column 2:  Input pair (lines 2-9, 12, 14-16). This is the
lexical:surface pair (from the set of feasible pairs) that is
currently being considered by the generator (for example, f:f on
line 4). The rest of the line shows the results of stepping the
automata with the pair as input. The results are indicated by
either a new state configuration (for example, line 5) or a
BLOCKED BY RULE message (for example, line 10).

    o Lines 10, 13:  BLOCKED BY RULE message. Indicates that a
feasible pair input to the function that steps the automata
caused a rule to fail. Gives the number and name of the rule
(from the header line of the state table) that failed.

    o Columns 3-17:  State configuration (lines 2-9, 11-12,
14-17). These are the current states of each of the rules. The
leftmost number is the state of rule 1, the second is rule 2, and
so on.

    o Column 18:  Result (lines 4-9, 11-12, 14-17). This is the
current value of the result string.

    o Lines 21-41:  The generator continues to backtrack, looking
for other possible paths to a result. Finding no other path, it
returns to its initial state.

There is one other tracing message not exemplified in the above
display. This is the END OF INPUT message. It indicates that the
end of the input form has been reached but the generator function
has failed on the rule specified because it was not in a final
state. For example,

        END OF INPUT, FAILED RULE 4: Palatalization

would indicate that when the end of the input form was reached,
rule 4 was not left in a final state.

Figure 6 is part of a level 3 trace for the same form.

Figure 6  Level 3 generator trace

    1  2    3  4  5  6  7  8  9  10 11 12 13 14 15 16 17   18
 1  `fox+s
 2  0  #:#  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 3  0  `:0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 4  1  f:f  1  1  1  1  1  1  1  2  2  1  1  1  1  1  1    0
 5  2  o:o  1  1  1  1  2  2  1  3  3  1  1  1  1  1  1    0f
 6  3  x:x  1  1  1  1  1  1  1  7  4  2  1  1  1  1  1    0fo
 7  4  +:0  1  1  3  3  2  2  1  4  4  1  1  1  1  1  1    0fox
 8  5  s:s  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
 9  6  #:#  1  1  6  2  2  2  3  3  4  1  1  1  1  1  1    0fox0s
10  6-      1  1  0  ?  ?  ?  ?  ?  ?  ?  ?  ?  ?  ?  ?    0fox0s
11         BLOCKED BY RULE 3: Epenthesis, 0:0 /<= [S|ch|sh|y:i] +:0___s[+:0|#]
12  5<      1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
13  5  s:0  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
14  5-      1  1  1  1  1  1  0  ?  ?  ?  ?  ?  ?  ?  ?    0fox0
15         BLOCKED BY RULE 7: S-deletion, s:0 <=> +:0 (0:e) s +:0 '___
16  5  0:e  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    0fox0
17  6  s:s  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    0fox0e
18  7  #:#  1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    0fox0es
19  7       1  1  1  1  1  1  1  3  4  1  1  1  1  1  1    0fox0es
20
21     RESULT = 0fox0es
22
23  foxes


PC-KIMMO Reference Manual                                 Page 28

The level 3 trace differs from the level 2 trace in how it
displays rule failures that block the generator. Compare line 10
in the level 2 trace with lines 10 and 11 of the level 3 trace.
The level 3 trace explicitly shows what state the automata are in
after stepping them. In line 10 of the level 3 trace we can see
that the proposed input pair puts rule 3 in state 0, which means
that it fails. Notice that the rest of the state array is filled
with question marks. This is because if one rule fails the whole
configuration fails, so the rest of the rules are not even tried.
(This shows that even though the automata conceptually operate in
parallel, they must still be stepped one at a time.)
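
The behavior described here (stepping the rules in order and
abandoning the rest at the first failure) can be sketched in
Python. The sketch is hypothetical: the name step_automata and
the representation of a rule as a dictionary from (state, pair)
to next state are assumptions, and the toy rules are not the
English rules traced above.

```python
def step_automata(rules, config, pair):
    """Step the rules one at a time with the given lexical:surface pair.

    Returns (new_config, failed_rule). failed_rule is the 1-based
    number of the first rule that entered state 0, or None.
    """
    new_config = []
    for number, (rule, state) in enumerate(zip(rules, config), start=1):
        next_state = rule.get((state, pair), state)
        new_config.append(next_state)
        if next_state == 0:
            # One failure blocks the whole configuration, so the
            # remaining rules are never stepped.
            new_config.extend(["?"] * (len(config) - number))
            return new_config, number
    return new_config, None


# Toy rules: each maps (state, pair) to a next state; a missing key
# means "stay in the same state" (an assumption of this sketch).
rules = [
    {},
    {(1, ("s", "0")): 0},   # this toy rule rejects s:0 in state 1
    {},
    {},
]

print(step_automata(rules, [1, 1, 1, 1], ("s", "0")))
print(step_automata(rules, [1, 1, 1, 1], ("f", "f")))
```

In the first call the second rule enters state 0, so the returned
state array holds question marks for the rules after it, in the
same way as the level 3 trace.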


8.2 Recognizer trace

The purpose of the recognizer trace is to allow the user to see
how a surface form is processed through multiple recursive calls
to the recognizer. The recognizer algorithm used to process the
form is described in section 9.2.

There are three levels of tracing differing in the amount of
detail they display:  level 1 gives the least amount of detail,
level 2 (the default) gives a moderate amount of detail, and
level 3 gives the most detail.

Figure 7 is a level 1 recognizer trace of the surface form foxes
(taken from the English example).

    Figure 7  Level 1 recognizer trace

    foxes
        ENTERING LEXICON INITIAL
        ENTERING LEXICON N_ROOT
        ENTERING LEXICON NUMBER
        ENTERING LEXICON GENITIVE
        ENTERING LEXICON End

      RESULT = `fox+0s   [ N(fox)+ PL ]

        BACKING UP FROM LEXICON End TO LEXICON GENITIVE
        BACKING UP FROM LEXICON GENITIVE TO LEXICON NUMBER
        ENTERING LEXICON GENITIVE
        ENTERING LEXICON End
        BACKING UP FROM LEXICON End TO LEXICON GENITIVE
        BACKING UP FROM LEXICON GENITIVE TO LEXICON NUMBER
        BACKING UP FROM LEXICON NUMBER TO LEXICON N_ROOT
        BACKING UP FROM LEXICON N_ROOT TO LEXICON INITIAL
        ENTERING LEXICON ADJ_PREFIX
    ...
       BACKING UP FROM LEXICON V_ROOT_NEG TO LEXICON V_PREFIX
       BACKING UP FROM LEXICON V_PREFIX TO LEXICON INITIAL
    `fox+s     [ N(fox)+ PL ]

Like the level 1 generator trace, the level 1 recognizer trace
displays the RESULT line but does not show the feasible pairs as
they are tried or the states of the rules. However, it does
display a record of how the recognizer moves through the lexicon,
either with an ENTERING or a BACKING UP message.


PC-KIMMO Reference Manual                                 Page 29

Figure 8 is from a level 2 recognizer trace of the form foxes. To
limit the size of the trace, the Gemination rules (14 and 15)
were turned off. Line numbers and column numbers are printed here
for reference in the description that follows. Each description
refers to an element beginning at the line and column indicated.

Figure 8  Level 2 recognizer trace

    1  2    3  4  5  6  7  8  9  10 11 12 13 14 15 16 17   18
  1 foxes
  2 0  #:#  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  3        ENTERING LEXICON INITIAL
  4        ACCEPTING NULL ENTRY
  5        ENTERING LEXICON N_ROOT
  6 0  `:0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    [
  7 1  s:0  1  1  1  1  1  1  1  2  2  1  1  1  1  1  1    `   [
  8 1-     BLOCKED BY RULE 7: S-deletion, s:0 <=> +:0 (0:e) s +:0 '___
  9 1  f:f  1  1  1  1  1  1  1  2  2  1  1  1  1  1  1    `   [
 10 2  o:o  1  1  1  1  2  2  1  3  3  1  1  1  1  1  1    `f   [
 11 3  x:x  1  1  1  1  1  1  1  7  4  2  1  1  1  1  1    `fo   [
 12        ENTERING LEXICON NUMBER
 13 4  +:0  1  1  3  3  2  2  1  4  4  1  1  1  1  1  1    `fox   [ N(fox)
 14 5  s:0  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    `fox+   [ N(fox)
 15 5-     BLOCKED BY RULE 7: S-deletion, s:0 <=> +:0 (0:e) s +:0 '___
 16 5  0:e  1  1  5  5  1  1  2  4  4  1  1  1  1  1  1    `fox+   [ N(fox)
 17 6  s:0  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    `fox+0   [ N(fox)
 18 6-     BLOCKED BY RULE 4: Epenthesis, 0:e => [S|ch|sh|y:i] +:0___s[+:0|#]
 19 6  s:s  1  1  1  6  1  1  2  4  4  1  1  1  1  1  1    `fox+0   [ N(fox)
 20        ENTERING LEXICON GENITIVE
 21 7  +:0  1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 22 8-     BLOCKED IN LEXICON GENITIVE, INPUT =
 23 7<      1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 24        ACCEPTING NULL ENTRY
 25        ENTERING LEXICON End
 26        ACCEPTING NULL ENTRY
 27 7  #:#  1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 28 7       1  1  1  1  1  1  1  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 29
 30    RESULT = `fox+0s   [ N(fox)+PL ]
...
108        BACKING UP FROM LEXICON V_ROOT_NEG TO LEXICON V_PREFIX
109 0<      1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    [
110        BACKING UP FROM LEXICON V_PREFIX TO LEXICON INITIAL
111 0<      1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
112 `fox+s     [ N(fox)+PL ]

    o Line 1:  Input line. Surface form input to the recognizer
function.

    o Line 30:  RESULT line. At the point where there are no
lexicons in the continuation class of an entry, the input surface
form is empty, and each automaton is in a final state, the trace
shows that the recognizer has recorded a result. The recognizer
continues looking for additional results (lines 32 and
following).

    o Column 1:  Level number (lines 2, 6-11, 13-19, 21-23,
27-28). This represents the level of recursion. Level 0
represents the initial invocation of the recognizer. Notice that
the number coincides with the number of characters in the result
string so far.

PC-KIMMO Reference Manual                                 Page 30

    o Column 1:  Backtracking indicator (lines 8, 15, 18, 22-23).
The symbol - indicates that the recognizer is blocked at that
level. The symbol < indicates that the recognizer is backtracking
(that is, returning to a lower level to try another path).

    o Column 2:  Input pair (lines 2, 6-7, 9-11, and so on). This
is the lexical:surface pair (from the set of feasible pairs) that
is currently being considered by the recognizer (for example, f:f
on line 9). The results of stepping the automata with the pair as
input are indicated by either a new state configuration (for
example, line 10) or a BLOCKED BY RULE message (for example, line
15).

    o Lines 3, 5, 12, 20, 25:  ENTERING LEXICON message. This is
the name of the sublexicon that the recognizer is about to
search.

    o Lines 4, 24, 26:  ACCEPTING NULL ENTRY message. Indicates
that a null lexical entry (that is, an entry whose lexical item
is the NULL symbol) has been accepted.

    o Line 22:  BLOCKED IN LEXICON message. Indicates that no
lexical entry could be found in the current lexicon that
continues with the input pair under consideration. The remaining
part of the input form is displayed on the line (in line 22 it
happens that nothing is left of the input form).

    o Lines 108, 110:  BACKING UP message. Indicates that there
were no further sublexicons left in the continuation class, so
the recognizer must back up to the previous lexicon branch.

    o Lines 8, 15, 18:  BLOCKED BY RULE message. Indicates that a
feasible pair input to the function that steps the automata
caused a rule to fail. Gives the number and name of the rule
(from the header line of the state table) that failed.

    o Columns 3-17:  State configuration (lines 2, 6-7, 9-11, and
so on). These are the current states of each of the rules. The
leftmost number is the state of rule 1, the second is rule 2, and
so on.

    o Column 18:  Result (lines 6-7 and so on). This is the
current value of the result string.

    o Lines 108-111:  The recognizer continues to backtrack,
looking for other possible paths to a result. When it finds no
other path, it returns to its initial state.

The END OF INPUT message may also occur in a recognizer trace.
See section 8.1 on the generator trace for an explanation of it.


PC-KIMMO Reference Manual                                 Page 31

Figure 9  Level 3 recognizer trace

    1  2    3  4  5  6  7  8  9  10 11 12 13 14 15 16 17   18
  1 foxes
  2 0  #:#  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  3        ENTERING LEXICON INITIAL
  4 0- -:0 LEXICAL CHARACTER NOT MATCHED
  5 0- `:0 LEXICAL CHARACTER NOT MATCHED
  6 0- +:0 LEXICAL CHARACTER NOT MATCHED
  7 0- s:0 LEXICAL CHARACTER NOT MATCHED
  8 0- e:0 LEXICAL CHARACTER NOT MATCHED
  9 0- f:f LEXICAL CHARACTER NOT MATCHED
 10        ACCEPTING NULL ENTRY
 11        ENTERING LEXICON N_ROOT
 12 0- -:0 LEXICAL CHARACTER NOT MATCHED
 13 0  `:0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    [
 14 1- -:0 LEXICAL CHARACTER NOT MATCHED
 15 1- `:0 LEXICAL CHARACTER NOT MATCHED
 16 1- +:0 LEXICAL CHARACTER NOT MATCHED
 17 1  s:0  1  1  1  1  1  1  1  2  2  1  1  1  1  1  1    `   [
 18 1-      1  1  1  1  1  1  0  ?  ?  ?  ?  ?  ?  ?  ?    `   [
 19        BLOCKED BY RULE 7: S-deletion, s:0 <=> +:0 (0:e) s +:0 '___
 ...
 75        ACCEPTING NULL ENTRY
 76 7  #:#  1  1  4  7  2  2  3  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 77 7       1  1  1  1  1  1  1  3  4  1  1  1  1  1  1    `fox+0s   [ N(fox)+PL
 78
 79    RESULT = `fox+0s   [ N(fox)+PL ]
 80
 81 `fox+s     [ N(fox)+PL ]

Figure 9 is part of a level 3 trace for the same form. Like level
3 of the generator trace, level 3 of the recognizer trace
explicitly shows the state array when a rule fails. Compare line
8 of the level 2 trace with lines 18 and 19 of the level 3 trace.
In addition, the level 3 recognizer trace shows pairs that are
weeded out by the lexicon even before they are tried with the
rules. Compare lines 3-4 of the level 2 trace with lines 3-10 of
the level 3 trace. In lines 4-9 the level 3 trace shows
explicitly several pairs that are tried but immediately fail.
Since the recognizer is at the beginning of the input form, the
only possible feasible pairs to try are those whose surface
character is 0 (the NULL symbol) or f (the first character of the
input form foxes). Rather than trying each of these pairs with
the rules, the recognizer first looks to see if the lexical
character of each pair matches any lexical character available in
the sublexicon it is currently searching. In each case the match
fails, indicated by the message LEXICAL CHARACTER NOT MATCHED.
After trying all the pairs, the lexicon accepts the null entry
and enters a new sublexicon. This exhaustive process takes place
at each point in the recognition process where the recognizer is
trying a new pair.
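
The two-stage filtering described above can be sketched in
Python. This is a simplified illustration rather than the
program's actual code: the function names, the list of feasible
pairs, and the assumption that the only branch head in the
current sublexicon is the ` character are all hypothetical.

```python
NULL = "0"


def candidate_pairs(feasible_pairs, surface):
    """Pairs whose surface character is NULL or the next input character."""
    first = surface[:1]
    return [(lex, surf) for lex, surf in feasible_pairs
            if surf == NULL or surf == first]


def match_lexicon(candidates, branch_heads):
    """Split candidates by whether the lexical character heads a branch
    of the sublexicon currently being searched."""
    matched = [p for p in candidates if p[0] in branch_heads]
    rejected = [p for p in candidates if p[0] not in branch_heads]
    return matched, rejected


feasible_pairs = [("-", "0"), ("`", "0"), ("+", "0"), ("s", "0"),
                  ("e", "0"), ("f", "f"), ("o", "o"), ("x", "x")]
candidates = candidate_pairs(feasible_pairs, "foxes")
matched, rejected = match_lexicon(candidates, branch_heads={"`"})
for lex, surf in rejected:
    print(f"{lex}:{surf} LEXICAL CHARACTER NOT MATCHED")
print("tried with the rules:", matched)
```

Only the pairs that survive both filters are passed on to the
function that steps the automata.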


PC-KIMMO Reference Manual                                 Page 32

9 Algorithms

The algorithms used by PC-KIMMO to generate surface forms and
recognize lexical forms are based on descriptions in Karttunen
1983.

9.1 Generating surface forms

The generator function recursively computes surface forms from a
lexical form using a set of two-level rules expressed as finite
state automata. The generator function does not make use of the
lexicon. This means that it will accept input forms that are not
found in the lexicon or that even violate the lexicon's
constraints on morpheme order, and will still apply the
phonological rules to them. To produce a surface form from a
lexical form, the generator processes the input form one
character at a time, left to right. For each lexical character,
it tries every surface character that has been declared as
corresponding to it in a feasible pair sanctioned by the
description. The generator function has these inputs:

  Lexical form:  Initially the input form, this string contains
whatever is left to process. As the function is recursively
called, this string gets shorter as the result string gets
longer.

  Result:  Initially empty, this string contains the results of
the generator up to the point of the current function call.

  Rules:  This is the set of active finite state automata defined
for this language.

  Configuration:  This is an array representing the current state
of all rules (automata). Initially, all states are set to 1.

The generator function also uses a list of feasible pairs
sanctioned by the set of rules; these are all the lexical:surface
pairs of alphabetic characters that appear as column headers in
the state tables. The input pair is a feasible pair selected by
the generator as a possible next lexical:surface pair in the
process of computing a surface form that corresponds to the given
lexical form. Each time the generator is called it iteratively
goes through the list of feasible pairs, selecting one as the
input pair.

The generator algorithm works as follows:

  1. If the lexical form is empty (that is, there are no more
characters in it to process), do the following steps:

     (a) If any of the state tables contains a word boundary
column header, step the automata using an input pair consisting
of the BOUNDARY symbol as both the lexical and surface character.
If this fails, then the result is rejected and the function
returns to the previous level.

     (b) Check that the configuration array contains a valid
final state for each of the rules. If so, then the result is
accepted and added to the output list. Otherwise, it is rejected.
In either case, the function returns to the previous level.


PC-KIMMO Reference Manual                                 Page 33

Otherwise, if the lexical form is not empty (that is, it contains
more characters to process), do steps 2 and 3.

  2. For each input pair containing the first character in the
lexical form as the lexical character, do the following steps:

     (a) Step the automata using the input pair and the input
configuration array, producing a new configuration.

     (b) If this succeeds, recursively call the generator
function with these inputs:

         Lexical form:  This is the input lexical form with the
first character removed.

         Result:  This is the input result string with the
surface character from the current input pair appended.

         Configuration:  This is the state array produced by
stepping the automata.

     (c) If this fails, choose another input pair from the list
of feasible pairs and do either step 2 or step 3.

  3. For each input pair containing the NULL symbol as the
lexical character, do the following steps:

     (a) Step the automata using the input pair and the input
configuration array to produce a new configuration.

     (b) If this succeeds, recursively call the generator
function with these inputs:

         Lexical form:  This is the input lexical form with no
character removed (since the lexical character posited was NULL).

         Result:  This is the input result string with the
surface character from the current input pair appended.

         Configuration:  This is the state array produced by
stepping the automata.

     (c) If this fails, choose another input pair from the list
of feasible pairs and do either step 2 or step 3.
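
The steps above can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the PC-KIMMO
implementation. The names generate and step, the dictionary
layout of a rule (with an "other" column standing in for
unlisted pairs), the single toy epenthesis rule, and the list of
feasible pairs are all invented for the example. Step 1(a) is
left out because the toy rule has no word boundary column.

```python
NULL = "0"


def step(rules, config, pair):
    """Return the new configuration, or None if any rule enters state 0."""
    new_config = []
    for rule, state in zip(rules, config):
        row = rule["table"][state]
        next_state = row.get(pair, row["other"])
        if next_state == 0:
            return None
        new_config.append(next_state)
    return new_config


def generate(lexical, result, rules, config, pairs, output):
    if not lexical:
        # Step 1(b): accept only if every rule is in a final state.
        if all(s in r["finals"] for r, s in zip(rules, config)):
            output.append(result.replace(NULL, ""))
        return
    for lex, surf in pairs:
        if lex == lexical[0]:
            rest = lexical[1:]      # step 2: consume one lexical character
        elif lex == NULL:
            rest = lexical          # step 3: lexical NULL, consume nothing
        else:
            continue
        new_config = step(rules, config, (lex, surf))
        if new_config is not None:
            generate(rest, result + surf, rules, new_config, pairs, output)


# Toy rule (hypothetical): insert e between x +:0 and s.
EPENTHESIS = {
    "finals": {1, 2, 3},
    "table": {
        1: {("x", "x"): 2, ("0", "e"): 0, "other": 1},
        2: {("x", "x"): 2, ("+", "0"): 3, ("0", "e"): 0, "other": 1},
        3: {("0", "e"): 4, ("s", "s"): 0, ("x", "x"): 2, "other": 1},
        4: {("s", "s"): 1, "other": 0},
    },
}
PAIRS = [("f", "f"), ("o", "o"), ("x", "x"), ("s", "s"), ("d", "d"),
         ("g", "g"), ("+", "0"), ("0", "e")]

forms = []
generate("fox+s", "", [EPENTHESIS], [1], PAIRS, forms)
print(forms)
```

With this toy rule the call collects the single form foxes. The
NULL symbols are stripped from the result string only when a
result is accepted, as in the traces of section 8.1.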


9.2 Recognizing lexical forms

The recognizer function recursively computes lexical forms from a
surface form using a lexicon and a set of two-level rules
expressed as finite state automata. The recognizer function
operates much like the generator, but in the surface-to-lexical
direction. The recognizer processes the surface input
form one character at a time, left to right. For each surface
character, it tries every lexical character that has been
declared as corresponding to it in a feasible pair sanctioned by
the description.

The recognizer also consults the lexicon. The lexical items
recorded in the lexicon are structured as a letter tree. When the
recognizer tries a lexical character, it moves down the branch of
the letter tree that has that character as its head node. If
there is no branch starting with that letter, the lexicon blocks
further progress and forces the recognizer to backtrack and try a
different lexical character. For example, figure 10 is a letter
tree for the lexical items spiel, spit, spy, and sty.

        Figure 10  A lexical letter tree

                          +-----e-----l
                          |
                          |
                  +-----i-+
                  |       |
                  |       |
          +-----p-+       +-----t
          |       |
          |       |
        s-|       +-----y
          |
          |
          +-----t-----y
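
A letter tree like the one in figure 10 can be sketched in Python
with nested dictionaries. The representation is an assumption
made for illustration (PC-KIMMO's internal data structure is not
described here), and the names build_letter_tree, branch, and the
"$end" marker are hypothetical.

```python
END = "$end"   # hypothetical marker for the end of a lexical item


def build_letter_tree(items):
    """Build a letter tree in which each node maps a letter to a subtree."""
    root = {}
    for item in items:
        node = root
        for letter in item:
            node = node.setdefault(letter, {})
        node[END] = True
    return root


def branch(node, letter):
    """Return the branch headed by letter, or None if the lexicon blocks."""
    return node.get(letter)


tree = build_letter_tree(["spiel", "spit", "spy", "sty"])
print(sorted(tree["s"]))            # letters that can follow s
print(branch(tree["s"], "q"))       # no such branch: the lexicon blocks
```

After s the tree offers only the branches p and t, so a
recognizer that proposes any other lexical character at that
point is forced to backtrack.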


Besides applying the phonological rules and identifying
morphemes, the recognizer also must enforce morpheme order
constraints. The PC-KIMMO lexicon is divided into classes of
lexical items that behave alike with respect to order
constraints. These lexical classes are called sublexicons. The
entry for each lexical item specifies the name of the sublexicon
that can follow it. This following sublexicon is called a
continuation class. Lexical items that occur only at the end of a
word have no continuation class, indicated by the BOUNDARY
symbol.

The names of the sublexicons that make up the entire lexicon are
used as nodes at the head of branches of the letter tree. The
piece of a letter tree shown in figure 10 may actually be under a
branch node called Noun. When the recognizer successfully finds a
lexical item in the letter tree, it looks at its specified
continuation class and jumps to the branch of the lexicon it
names.

It is often the case that at a given point in a word, more than
one continuation is possible. Sets of alternative continuing
sublexicons are called alternations. Thus the continuation class
field of a lexical entry may contain the name of an alternation
that specifies a list of the sublexicons that can follow it.
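
The way a continuation class field resolves to a list of
sublexicons can be sketched in Python. The alternation name
Noun_suffix, the function name following_sublexicons, and the
use of # as the BOUNDARY symbol are assumptions made for this
example; the actual names come from the lexicon and rules files
of a particular description.

```python
BOUNDARY = "#"   # assumed BOUNDARY symbol

# Hypothetical alternation table: alternation name -> sublexicon names.
ALTERNATIONS = {
    "Noun_suffix": ["NUMBER", "GENITIVE"],
}


def following_sublexicons(continuation, alternations=ALTERNATIONS):
    """Expand a continuation class field into a list of sublexicon names."""
    if continuation == BOUNDARY:
        return []                   # the item can only end the word
    # An alternation names several sublexicons; otherwise the field
    # is taken to name a single sublexicon.
    return alternations.get(continuation, [continuation])


print(following_sublexicons("Noun_suffix"))
print(following_sublexicons("GENITIVE"))
print(following_sublexicons("#"))
```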

When the recognizer successfully recognizes a lexical item (word
or morpheme), it reads its gloss from its lexical entry and
appends it to the gloss string being built up for the entire
word.

The recognizer function has these inputs:

  Surface form:  Initially the input form, this string contains
whatever is left to process. As the function is recursively
called, this string gets shorter as the result string gets
longer.

  Result:  Initially empty, this string contains the results of
the recognizer up to the point of the current function call.

  Gloss:  Initially empty, this string contains glosses for the
lexical items contained in the result string.

  Rules:  This is the set of active finite state automata defined
for this language.

  Configuration:  This is an array representing the current state
of all rules (automata). Initially, all states are set to 1.

  Lexicon:  Initially, this is the entire lexicon defined for the
language. During the process of recognition it is restricted to a
branch of the lexicon.

Like the generator, the recognizer function uses a list of
feasible pairs sanctioned by the set of rules; these are all the
lexical:surface pairs of alphabetic characters that appear as
column headers in the state tables. The input pair is a feasible
pair selected by the recognizer as a possible next
lexical:surface pair in the process of computing a lexical form
that corresponds to the given surface form. Each time the
recognizer is called it iteratively goes through the list of
feasible pairs, selecting one as the input pair.

When a complete lexical item has been recognized, the lexicon is
at a terminal node of the letter tree. Terminal nodes have
glosses and continuation classes attached to them. The recognizer
algorithm is initialized as though it has successfully recognized
a lexical item and the lexicon is at a terminal node pointing to
a continuation class consisting of the INITIAL sublexicon. It
then proceeds as follows:

  1. If the input lexicon is at a terminal node, then for each
sublexicon in the continuation class of that item, recursively
call the recognizer function with these inputs:

     Surface form:  This string contains whatever is left to
process.

     Result:  This string contains the results of the recognizer
up to the point of the current function call.

     Gloss:  This is the input gloss string with the gloss of the
current lexical entry appended.

     Rules:  This is the input set of rules.

     Configuration:  This is the input configuration.

     Lexicon:  This is the current continuation sublexicon.

If the continuation class of the lexical entry is empty (that is,
the lexical item can only be followed by word boundary) and the
input surface form is empty, do the following steps:

     (a) If any of the state tables contains a word boundary
column header, step the automata using an input pair consisting
of the BOUNDARY symbol as both the lexical and surface character.
If this fails, then the result is rejected and the function
returns to the previous level.

     (b) Check that the configuration array contains a valid
final state for each of the rules. If so, then the result is
accepted, the gloss of the lexical entry is appended to the
gloss, and both the result and the gloss are added to the output
list. Otherwise, the result is rejected. In either case, the
function returns to the previous level.

If the continuation class of the lexical entry is empty but the
surface form is not empty, the result is rejected and the
function returns to the previous level.

  2. For each input pair that has the head of a branch in the
lexicon as the lexical character and the first character of the
surface form as the surface character, do the following steps:

     (a) Step the automata using the input pair and the input
configuration array to produce a new configuration.

     (b) If this succeeds, recursively call the recognizer
function with these inputs:

         Surface form:  This is the input surface form with the
first character removed.

         Result:  This is the input result string with the
lexical character from the current input pair appended.

         Gloss:  This is the input gloss string.

         Rules:  This is the input set of rules.

         Configuration:  This is the state array produced by
stepping the automata.

         Lexicon:  This is the branch of the lexicon
corresponding to the lexical character from the current input
pair.

  3. For each input pair that has the head of a branch in the
lexicon as the lexical character and has the NULL symbol as the
surface character, do the following steps:

     (a) Step the automata using the input pair and the input
configuration array to produce a new configuration.

     (b) If this succeeds, recursively call the recognizer
function with these inputs:

         Surface form:  This is the input surface form.

         Result:  This is the input result string with the
lexical character from the current input pair appended.

         Gloss:  This is the input gloss string.

         Rules:  This is the input set of rules.

         Configuration:  This is the state array produced by
stepping the automata.

         Lexicon:  This is the branch of the lexicon
corresponding to the lexical character from the current input
pair.

  4. If the NULL symbol is the head of a branch of the lexicon
(that is, a null lexical entry), recursively call the recognizer
function with these inputs:

     Surface form:  This is the input surface form.

     Result:  This is the input result string.

     Gloss:  This is the input gloss string.

     Rules:  This is the input set of rules.

     Configuration:  This is the input state array.

     Lexicon:  This is the branch of the lexicon which has the
NULL symbol as its head.
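
The four steps above can be sketched in Python as follows. This
is a simplified sketch, not the PC-KIMMO implementation, and it
rests on several assumptions: the names recognize, step, and
add_entry are hypothetical, each sublexicon is a nested
dictionary with entries stored under an "$entries" key, an empty
continuation list stands for the BOUNDARY symbol, the single toy
rule accepts every pair, and the toy lexicon is invented. Step
1(a) is omitted because the toy rule has no word boundary column,
and pairs whose lexical character is NULL are left out.

```python
NULL = "0"
ENTRIES = "$entries"    # hypothetical key marking a terminal node


def add_entry(tree, form, gloss, continuations):
    """Add a lexical item; a null entry uses NULL as its form."""
    node = tree
    for letter in form:
        node = node.setdefault(letter, {})
    node.setdefault(ENTRIES, []).append((gloss, continuations))


def step(rules, config, pair):
    """Return the new configuration, or None if any rule enters state 0."""
    new_config = []
    for rule, state in zip(rules, config):
        row = rule["table"][state]
        next_state = row.get(pair, row["other"])
        if next_state == 0:
            return None
        new_config.append(next_state)
    return new_config


def recognize(surface, result, gloss, rules, config, node, lexicon,
              pairs, output):
    # Step 1: terminal node, so follow each continuation sublexicon.
    for entry_gloss, continuations in node.get(ENTRIES, []):
        for name in continuations:
            recognize(surface, result, gloss + entry_gloss, rules, config,
                      lexicon[name], lexicon, pairs, output)
        if (not continuations and not surface
                and all(s in r["finals"] for r, s in zip(rules, config))):
            output.append((result, gloss + entry_gloss))
    # Steps 2 and 3: pairs whose lexical character heads a branch.
    for lex, surf in pairs:
        if lex == NULL or lex not in node:
            continue
        if surface and surf == surface[0]:
            rest = surface[1:]      # step 2: consume a surface character
        elif surf == NULL:
            rest = surface          # step 3: surface NULL, consume nothing
        else:
            continue
        new_config = step(rules, config, (lex, surf))
        if new_config is not None:
            recognize(rest, result + lex, gloss, rules, new_config,
                      node[lex], lexicon, pairs, output)
    # Step 4: null lexical entry.
    if NULL in node:
        recognize(surface, result, gloss, rules, config, node[NULL],
                  lexicon, pairs, output)


# Toy description (hypothetical).
ACCEPT_ALL = {"finals": {1}, "table": {1: {"other": 1}}}
PAIRS = [("c", "c"), ("a", "a"), ("t", "t"), ("s", "s"), ("+", "0")]
LEXICON = {"INITIAL": {}, "N_ROOT": {}, "NUMBER": {}}
add_entry(LEXICON["INITIAL"], NULL, "", ["N_ROOT"])
add_entry(LEXICON["N_ROOT"], "cat", "N(cat)", ["NUMBER"])
add_entry(LEXICON["NUMBER"], "+s", "+PL", [])
add_entry(LEXICON["NUMBER"], NULL, "+SG", [])

# Initialized as though an item continuing to INITIAL was just recognized.
START = {ENTRIES: [("", ["INITIAL"])]}
results = []
recognize("cats", "", "", [ACCEPT_ALL], [1], START, LEXICON, PAIRS, results)
print(results)
```

With this toy lexicon the surface form cats yields the lexical
form cat+s with the gloss N(cat)+PL. The null entry in NUMBER is
also tried, but it is rejected because the surface form is not
yet empty at that point.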


10 Error messages

This section lists the various error and warning messages you may
encounter. They are listed in numerical sequence and are
generally grouped according to the type of error or warning. A
warning means that the operation in progress has successfully
completed, but an anomalous condition may have resulted. An error
means that the operation in progress could not be successfully
completed and was therefore prematurely terminated. Only in the
case of a memory error is the PC-KIMMO program aborted and
control returned to the operating system. Note that in the
following error messages the words printed in angle brackets are
not literal but are cover terms for a set of items of the type
suggested by the term. For instance, when the error message
"Missing keyword in <command-name> command" actually appears on
the computer screen, the term <command-name> will be replaced by
a specific command name, such as LOAD or SET.

10.1 Errors related to reading and parsing commands


WARNING 100
    Input line too long -- ignoring after first <number>
characters

ERROR 101
    Ambiguous command: <command-name>

    <command-name> did not specify a unique command. Type more of
the command name to ensure that it is not ambiguous.

ERROR 102
    Invalid command: <command-name>

    <command-name> is not a valid command. Type ? or HELP for a
list of valid commands.


PC-KIMMO Reference Manual                                 Page 38

ERROR 103
    Missing keyword in <command-name> command

    Expected a keyword to be used with the command. Type the
command name followed by ? for a list of valid keywords.

ERROR 104
    Missing argument in <command-name> command

    Expected an argument to complete the command. Type HELP
followed by the command name for an explanation of what arguments
the command needs.

ERROR 105
    Ambiguous keyword in <command-name> command: <keyword>

    <keyword> did not specify a unique keyword. Type more of the
keyword to ensure that it is not ambiguous.

ERROR 106
    Invalid keyword in <command-name> command: <keyword>

    <keyword> is not a valid keyword. Type the command name
followed by ? for a list of valid keywords for that command.

ERROR 107
    Invalid argument in <command-name> command: <argument>

    <argument> was not valid for the command. Type HELP followed
by the command name for an explanation of what arguments the
command needs.

ERROR 108
    Missing input file argument in <command-name> command

    Expected a file name with the command.

ERROR 109
    Cannot open input file <filename> in <command-name> command

    Cannot find the file <filename>. Check to see if the file is
in the current directory or the path you specified in the
command. The command may also be expecting a different default
file name or extension.

ERROR 110
    Cannot open output file <filename> in <command-name> command

    Check to see if the file is in the current directory or in
the path you specified in the command. The command may also be
expecting a different default file name or extension.

ERROR 111
    Must load rules before loading lexicon

    The rules file must be loaded before the lexicon in order to
verify the lexical forms in the lexicon against the alphabet
defined in the rules file.


PC-KIMMO Reference Manual                                 Page 39

ERROR 112
    TAKE files nested too deeply

    TAKE files can only be nested three deep.

ERROR 113
    TAKE file aborted due to invalid command: <command-name>

    <command-name> is not a valid command. Type ? or HELP for a
list of valid commands.

ERROR 114
    No log file was open

    Occurs when the CLOSE command is issued but no log file has
been opened.

WARNING 115
    Closing the existing log file <filename>

    Occurs when the LOG command is issued while a log file is
already open.

ERROR 116
    Missing file name for  EDIT command

    EDIT command must specify a file to be edited.


10.2 Errors related to reading the rules file

ERROR 200
    Rules file could not be opened: <filename>

    Check to see if the file is in the current directory or in
the path you specified in the command. The command may also be
expecting a different default file name or extension.

ERROR 201
    Unexpected end of rules file: <filename>

    The rules file is incomplete. Check to see if the last table
in the file has fewer states than expected.

ERROR 202
    Expected ALPHABET keyword

    The first declaration in a rules file must be the ALPHABET
declaration.

ERROR 203
    Alphabet contains no members

    The ALPHABET keyword does not have any characters listed
after it.

WARNING 204
    Too many characters in the alphabet

    The alphabet can contain a maximum of 255 characters.


PC-KIMMO Reference Manual                                 Page 40

WARNING 205
    Character is already in the alphabet: <character>

    A character has been repeated in the ALPHABET declaration.

ERROR 206
    No value given for NULL keyword

    A single character must appear after the NULL keyword.

ERROR 207
    Value given for NULL symbol was already declared as
alphabetic: <character>

    The character specified for NULL may not also be declared in
the ALPHABET.

ERROR 208
    The NULL symbol has already been defined

    There is more than one NULL declaration.

ERROR 209
    Value given for NULL symbol was already declared for ANY

ERROR 210
    Value given for NULL symbol was already declared for BOUNDARY

ERROR 211
    No value given for ANY keyword

    A single character must appear after the ANY keyword.

ERROR 212
    Value given for ANY symbol was already declared as
alphabetic: <character>

    The character specified for ANY may not also be declared in
the ALPHABET.

ERROR 213
    The ANY symbol has already been defined

    There is more than one ANY declaration.

ERROR 214
    Value given for ANY symbol was already declared NULL

ERROR 215
    Value given for ANY symbol was already declared for BOUNDARY

ERROR 216
    No value given for BOUNDARY keyword

    A single character must appear after the BOUNDARY keyword.


PC-KIMMO Reference Manual                                 Page 41

ERROR 217
    Value given for BOUNDARY symbol was already declared as
alphabetic: <character>

    The character specified for BOUNDARY may not also be declared
in the ALPHABET.

ERROR 218
    The BOUNDARY symbol has already been defined

    There is more than one BOUNDARY declaration.

ERROR 219
    Value given for BOUNDARY symbol was already declared for NULL

ERROR 220
    Value given for BOUNDARY symbol was already declared for ANY

ERROR 221
    Subset name not given

    Occurs if there is a SUBSET keyword with nothing after it
until the next keyword.

ERROR 222
    Subset name <subset-name> is not unique

    A subset name, if it is a single character, cannot be the
same as one of the characters specified in the ALPHABET, NULL,
ANY, or BOUNDARY declarations. If the subset name is more than
one character, then it is a duplicate of another subset name
already declared.

ERROR 223
    Subset <subset-name> contains no members

ERROR 224
    Subset <subset-name> contains a nonalphabetic character:
<character>

    All characters used in subsets must be listed in the ALPHABET
declaration, with the exception of the NULL symbol, which can
appear in a subset but is not included in the ALPHABET list.

WARNING 225
    Subset <subset-name> already contains <character>

    A character has been repeated.

ERROR 226
    Invalid keyword: <keyword>

    The only valid keywords in a rules file are ALPHABET, NULL,
ANY, BOUNDARY, SUBSET, and RULE.

WARNING 227
    ANY symbol not defined

    Are you sure the rules do not use an ANY symbol?


WARNING 228
    NULL symbol not defined

    Are you sure the rules do not use a NULL symbol?

WARNING 229
    BOUNDARY symbol not defined

    The BOUNDARY declaration is obligatory. Even if the BOUNDARY
symbol is not used in the rules file, it must be used in the
lexicon file.

WARNING 230
    Missing closing delimiter for the name of a rule: <rule-name>

    The first nonspace character after the RULE keyword is the
opening delimiter of the rule name. A matching delimiter
(identical character) was not found in the same line; thus
PC-KIMMO will use everything up to the end of the line as the
rule name. This is because the rule name must be contained in one
line.
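
The delimiter behavior described for warning 230 can be sketched in
Python as follows. This is only an illustration of the stated
behavior, not the program's own code; the function name and its
return convention are assumptions.

```python
def parse_rule_name(rest_of_line):
    """Extract a rule name from the text that follows the RULE keyword.

    Returns (name, closed): `closed` is False when no matching closing
    delimiter was found on the line (the situation of warning 230).
    """
    text = rest_of_line.strip()
    if not text:
        return "", False
    delimiter = text[0]              # first nonspace character opens the name
    end = text.find(delimiter, 1)    # look for the identical character
    if end == -1:
        return text[1:], False       # fall back to the rest of the line
    return text[1:end], True
```

Any character can serve as the delimiter, so a name that contains
double quotes can be bracketed by some other character instead.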

ERROR 231
    Invalid number of rows: <number>

    Must be a number greater than zero.

ERROR 232
    Invalid number of columns: <number>

    Must be a number greater than zero.

ERROR 233
    Invalid state number: <number>

    State (row) numbers must start with 1 and ascend
consecutively.

ERROR 234
    Expected final (:) or nonfinal (.) state indicator:
<character>

    A state (row) number must be followed by colon (:) or period
(.) with no intervening space.

ERROR 235
    State table entry out of range: <number>

    <number> must not be greater than the specified number of
states for the table.

ERROR 236
    Lexical character not in alphabet: <character>

    A character in a table's lexical character list is not a
member of the alphabet declared earlier in the rules file.


ERROR 237
    Surface character not in alphabet: <character>

    A character in a table's surface character list is not a
member of the alphabet declared earlier in the rules file.

ERROR 238
    Nonnumeric character in state table: <character>

    Expected a numeric state table entry but found a nonnumeric
character.
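
The state table conditions behind errors 231 through 235 can be
sketched in Python as shown below. This is an illustration only,
not PC-KIMMO's own code. The function name and the representation
of a row as a (label, entries) pair are assumptions, as is the
treatment of 0 as a legal entry (the usual KIMMO convention for a
failing transition).

```python
def check_state_table(n_rows, n_cols, rows):
    """Return a list of problems found in a state table.

    Hypothetical representation: `rows` is a list of (label, entries),
    where label is e.g. "1:" (final) or "2." (nonfinal) and entries is
    a list of integers.
    """
    problems = []
    if n_rows <= 0:
        problems.append(f"invalid number of rows: {n_rows}")
    if n_cols <= 0:
        problems.append(f"invalid number of columns: {n_cols}")
    for expected, (label, entries) in enumerate(rows, start=1):
        number, marker = label[:-1], label[-1:]
        if marker not in (":", "."):
            problems.append(f"expected ':' or '.' in row label {label!r}")
        # State numbers must start with 1 and ascend consecutively.
        if not number.isdigit() or int(number) != expected:
            problems.append(f"invalid state number in row label {label!r}")
        for entry in entries:
            # Assumption: 0 is allowed; anything above n_rows is not.
            if entry < 0 or entry > n_rows:
                problems.append(f"state table entry out of range: {entry}")
    return problems
```

For example, a two-state table whose second row is labeled "3."
is reported as having an invalid state number, because the rows
must be numbered 1, 2.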

ERROR 239
    Rule number <number>, column <number> pairs a BOUNDARY symbol
with something else: <column-header>

    Occurs if a column header pairs a BOUNDARY symbol with
anything but another BOUNDARY symbol; only #:# is allowed.

WARNING 240
    No feasible pairs for this set of rules

    Either there are no rules in the file or the rules contain
only subset correspondences. In the latter case, simple rules
listing all the default correspondences are needed.

WARNING 241
    RULE <number> (<rule-name>) -- <char>:<char> specified by
both columns <number> (<char>:<char>) and <number>
(<char>:<char>)

    There is an overlap between two columns of the state table.
Issue a SHOW RULE command for the rule causing the warning and
examine the set of pairs specified by each column header.

WARNING 242
    RULE <number> (<rule name>) -- <char>:<char> not specified by
any column

    The entire set of feasible pairs must be specified by each
table. The table is probably missing an ANY:ANY column.
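
The coverage requirement behind warning 242 can be sketched in
Python as follows. This is a simplified illustration, not the
program's own matching logic. The function names are hypothetical,
the use of @ as the ANY symbol is an assumption, and the sketch
does not attempt the overlap check of warning 241.

```python
def matches(symbol, header_symbol, subsets, any_symbol):
    """True if one half of a column header admits the given character."""
    if header_symbol == any_symbol:
        return True
    if header_symbol in subsets:
        return symbol in subsets[header_symbol]
    return symbol == header_symbol


def uncovered_pairs(feasible_pairs, column_headers, subsets, any_symbol="@"):
    """Return the feasible pairs that no column header specifies."""
    return [
        (lex, surf)
        for (lex, surf) in feasible_pairs
        if not any(
            matches(lex, h_lex, subsets, any_symbol)
            and matches(surf, h_surf, subsets, any_symbol)
            for (h_lex, h_surf) in column_headers
        )
    ]
```

With feasible pairs a:a, t:d, and b:b, a subset V containing only
a, and columns t:d and V:V, the pair b:b is left uncovered. Adding
an @:@ column covers it.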

ERROR 243
    Rule number <number>, column <number> pairs two NULL symbols:
<column-header>

    NULL:NULL is not a legal column header, since it cannot be a
feasible pair.


10.3 Errors related to reading the lexicon file

ERROR 300
    Lexicon file could not be opened: <filename>

    Check to see if the file is in the current directory or in
the path you specified in the command. The command may also be
expecting a different default file name or extension.


ERROR 301
    No data in lexicon file <filename>

ERROR 302
    Missing alternation name

    The ALTERNATION keyword must be followed by an alternation
name.

WARNING 303
    Empty alternation definition: <alternation-name>

    An ALTERNATION keyword and alternation name were found with
no following list of lexicon names.

WARNING 304
    Adding to existing alternation: <alternation-name>

ERROR 305
    No lexicon sections in lexicon file <filename>

    A lexicon file must contain sublexicons.

ERROR 306
    Missing lexicon name

    The keyword LEXICON must be followed by a sublexicon name.

WARNING 307
    Lexicon section <sublexicon-name> is not listed as a member
of any alternations

    This will not necessarily result in a processing error if
this is what you intended to do.

ERROR 308
    Expected continuation class or BOUNDARY symbol for <entry>

    A lexical entry is missing its continuation class element.

ERROR 309
    Invalid continuation class <name> for <entry>

    A name appearing in the continuation class field of a lexical
entry must be the name of an ALTERNATION that has already been
declared.

ERROR 310
    Expected gloss element for <entry>

    Each lexical entry must have a gloss element.

ERROR 311
    Invalid gloss element <gloss> for <entry>

    The gloss element must be bracketed by matching delimiters
(identical characters).
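
The checks behind errors 308 through 311 can be sketched in Python
as shown below. This is an illustration under stated assumptions,
not PC-KIMMO's own parser. It assumes an entry is a single line
holding a form, then a continuation class (or the BOUNDARY symbol),
then a delimited gloss, separated by white space. The function
name, the # boundary symbol, and the use of exceptions are likewise
assumptions.

```python
def parse_entry(line, alternations, boundary="#"):
    """Split a lexical entry into (form, continuation, gloss).

    Hypothetical helper: `alternations` is the set of alternation
    names declared so far. Raises ValueError on a malformed entry.
    """
    parts = line.split(None, 2)  # the gloss may itself contain spaces
    if len(parts) < 2:
        raise ValueError("expected continuation class or BOUNDARY symbol")
    form, continuation = parts[0], parts[1]
    if continuation != boundary and continuation not in alternations:
        raise ValueError(f"invalid continuation class {continuation}")
    if len(parts) < 3:
        raise ValueError("expected gloss element")
    gloss = parts[2].strip()
    # The gloss must be bracketed by two identical delimiter characters.
    if len(gloss) < 2 or gloss[0] != gloss[-1]:
        raise ValueError(f"invalid gloss element {gloss}")
    return form, continuation, gloss[1:-1]
```

Under these assumptions, the line

    cat N "Noun(cat)"

parses into the form cat, the continuation class N, and the gloss
Noun(cat), provided N has been declared as an alternation.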


ERROR 312
    Form contains character not in alphabet: <character>

    Each character used in lexical items must be listed in the
ALPHABET declaration of the rules file.

ERROR 313
    INITIAL lexicon not found

    A lexicon file must contain at least a sublexicon named
INITIAL.

ERROR 314
    Cannot nest lexicon INCLUDE files

    An INCLUDE file cannot call another INCLUDE file.

ERROR 315
    Missing INCLUDE file name

    An INCLUDE keyword must be followed by a file name.

ERROR 316
    Lexicon INCLUDE file could not be opened: <filename>

ERROR 317
    Invalid lexicon file keyword: <word>

    The only valid keywords in a lexicon file are ALTERNATION,
LEXICON, INCLUDE, and END.


10.4 Errors related to recognizing or generating a form

WARNING 400
    Surface form not found in comparison pairs file

    A lexical:surface pair in a pairs comparison file is missing
the surface form.

ERROR 800
    Form [ <form> ] contains character not in alphabet:
<character>

    An input form contains a character that was not listed in the
ALPHABET declaration in the rules file.

ERROR 801
    RULE <number> is invalid--input <char>:<char> is not
specified by any column

    Could happen if a table does not have an ANY:ANY column.

ERROR 802
    Invalid lexicon for recognizer

    Probably will never occur!


ERROR 803
    Lexicon section <sublexicon-name> is empty

    There are no lexical entries in the named sublexicon.

ERROR 804
    Cannot recognize forms without a lexicon

    The lexicon is not loaded.


10.5 Errors that abort program execution

ERROR 900
    Out of memory

    The rules and lexicon are too large to fit in memory.

Runtime error--stack overflow

    Occurs when the generator or recognizer gets into an infinite
loop due to an incorrectly written rule or lexicon continuation.


References

Antworth, Evan L. 1990. PC-KIMMO: a two-level processor for
    morphological analysis. Occasional Publications in Academic
    Computing No. 16. Dallas, TX: Summer Institute of Linguistics.
    ISBN 0-88312-639-7, 273 pages, paperbound.

Karttunen, Lauri. 1983. KIMMO: a general morphological processor.
    Texas Linguistic Forum 22:163-186.

_____ and K. Wittenburg. 1983. A two-level morphological
    analysis of English. Texas Linguistic Forum 22:217-228.

Koskenniemi, Kimmo. 1983. Two-level morphology: a general
    computational model for word-form recognition and production.
    Publication No. 11. University of Helsinki: Department of
    General Linguistics.


Errata

The generator algorithm described in section 9.1 (pages 32-33) is
slightly misleading.  Step 3 (testing all feasible pairs
containing a NULL lexical character, and recursively invoking the
algorithm for each pair that successfully steps the automata)
should be carried out even when the lexical form is empty.  In
other words, Step 3 actually takes place before Step 1.

This reflects a bug in the implementation that was partially
fixed in version 1.0B, and fully fixed in version 1.0.3 of
PC-KIMMO.