SYNTAXSCAPE
A syntactic parser for the Internet
Juno R Suk (junowhoim@yahoo.com)
__________________________________________________________________
TUTORIAL CONTENTS
PURPOSE
STARTING THE PROGRAM
DESCRIPTION OF GRAPHICAL USER INTERFACE
LOADING AND SAVING A GRAMMAR
LOADING AND SAVING A CORPUS
GRAPHICAL VIEW
PARSE INTERFACE
GRAMMAR EDITOR
LEXICON EDITOR
CORPUS EDITOR
HISTORY CACHE
HYPERLINKING
PREFERENCE EDITOR
__________________________________________________________________
PURPOSE
This program, first and foremost, serves as a didactic tool
in visually demonstrating the concepts of syntactic parsing and
context-free grammars. Through an intuitive graphical user
interface, students can experiment with creating grammars and
lexicons, and inputting or retrieving sentences for syntactic
parsing based on the created grammar and lexicon.
The program provides functions for loading, editing, and saving
grammars and lexicons; retrieving corpora through a URL,
the local file system, or a history cache; editing the corpus
and automatically saving its current state to the
history cache; selectively parsing the contents of the corpus;
and viewing and printing the syntactic parse trees in one of two
available graphical views.
__________________________________________________________________
STARTING THE PROGRAM
You can start the program in one of two ways. The easiest way is
to just use the provided shell scripts to start the program.
The other way is to type the full java command line yourself.
The syntax for starting the program is
java -D_APP_HOME_DIR=
-D_DEBUG=
Synscape
An example of this syntax is found in the included shell/batch scripts:
(for UNIX)
sscape
testrun
(for MS-DOS)
sscape.bat
testrun.bat
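For readers curious how -D options of this kind reach a Java
program, the following is a minimal sketch and not the program's
actual startup code. The property names _APP_HOME_DIR and _DEBUG
come from the syntax above; the class StartupOptions, its fields,
and the fallback values are hypothetical. The GRAMMARS folder name
is the default grammar folder mentioned later in this tutorial.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the real startup code is not shown in this tutorial.
final class StartupOptions {
    final Path appHomeDir;
    final String debug; // raw value; its accepted format is not documented here

    StartupOptions(Path appHomeDir, String debug) {
        this.appHomeDir = appHomeDir;
        this.debug = debug;
    }

    // Reads the two -D properties named in the command-line syntax.
    static StartupOptions fromSystemProperties() {
        // Assumption: fall back to the current directory when unset.
        String home = System.getProperty("_APP_HOME_DIR", ".");
        // Assumption: an unset _DEBUG is treated as an empty string.
        String debug = System.getProperty("_DEBUG", "");
        return new StartupOptions(Paths.get(home), debug);
    }

    // Default grammar folder, APP_HOME_DIR/GRAMMARS.
    Path grammarDir() {
        return appHomeDir.resolve("GRAMMARS");
    }
}
```

Anything passed as -Dname=value on the java command line becomes
visible through System.getProperty("name"), which is why no
argument parsing is needed in the sketch.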
__________________________________________________________________
DESCRIPTION OF GRAPHICAL USER INTERFACE
The GUI main screen is divided up into 5 parts. From top to
bottom, they are:
1. Menubar
2. Main operations Panel
3. URL Input Panel
4. Graphical Canvas
5. Parse Interface
The menubar provides access to many functions already
available in the interface. Also included in the menubar
are items such as loading and saving a grammar and viewing
this tutorial.
The main operations panel provides access to some of the
more commonly used functions. These include loading a corpus
through either a local file or the history cache, opening
editors for modifying the grammar/lexicon/corpus, and opening the
preference window for configuring program options.
The URL input panel provides a textfield in which to directly
type new URLs from which to load corpora.
The graphical canvas provides the view to the syntactic parse.
The view is available in either box or tree format, which
can be specified through an option in the...
The parse interface includes the aforementioned option as well as
a print function for printing out the current graphical canvas,
and arrows for traversing the different parses and sentences.
A reparse of the current sentence is also an option. At the
bottom of this interface is a status bar which keeps you up
to date on some important information: the current sentence,
current parse, current view, maximum depth allowed on parse
trees, and a current action message box.
__________________________________________________________________
LOADING AND SAVING A GRAMMAR
A default grammar is usually loaded automatically from a default
file when the program starts. The user can instead specify an
alternate grammar at startup through command-line arguments to
the java command (see STARTING THE PROGRAM above), or can load
a new grammar later through the File menu in the menubar.
Note: The grammar file should end in an extension ".grammar".
The default grammar folder is APP_HOME_DIR/GRAMMARS
Saving the grammar is done automatically during two events:
1. User loads up a new grammar
2. User shuts down program
If you wish to save the grammar under a different name, this
can be done by selecting "Save Grammar" under the File menu.
This will save the file under the name and directory of your
choice. It will automatically append the extension ".grammar"
if you did not do so already in the file dialog.
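The extension rule can be sketched in a few lines of Java. This
is only an illustration; the class and method names are
hypothetical, and the case-sensitive comparison is an assumption.

```java
// Hypothetical helper illustrating the ".grammar" extension rule.
final class GrammarFileNames {
    static final String EXTENSION = ".grammar";

    // Returns the name unchanged if it already ends in ".grammar",
    // otherwise returns the name with the extension appended.
    static String withGrammarExtension(String fileName) {
        // Assumption: the check is case-sensitive.
        return fileName.endsWith(EXTENSION) ? fileName : fileName + EXTENSION;
    }
}
```

For example, a name typed as "english" would be saved as
"english.grammar", while "english.grammar" is left alone.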
__________________________________________________________________
LOADING AND SAVING A CORPUS
You have four ways of loading a corpus.
1. By specifying an initial corpus URL on the "java" command
line (see STARTING THE PROGRAM above).
2. Enter URL into the URL: textfield and press
3. Click on the "Local" icon and select a file from
your local file system.
4. Click on the "History" icon and select one of the
previously viewed pages.
Currently, corpora are assumed to be in plain text form and
are parsed as such. HTML files may be loaded as a corpus, but
HTML parsing is minimal and the current filter will probably
let many interesting nonsensical tags and phrases show up.
Saving a corpus is done automatically by the program upon the
following events:
1. User requests a new corpus
2. User shuts down program
__________________________________________________________________
GRAPHICAL VIEW
This is the area where the syntactically parsed sentence is
graphically shown. The parse tree can be displayed in three
ways.
1. Lexeme Only
This view is shown automatically when the current
sentence selected in the parse list is the top,
unparsed form.
2. Tree View
This is the default view. It shows each of
the tokens as a node in the tree and delineates
all the appropriate branches to its children
and parent.
3. Box View
This view, selected by clicking on the Tree/Box
view toggle button on the parse interface, will
switch the view from nodes and branches to an
overlapping boxes view. Each child token's box
is encompassed in its entirety by its parent's
box. And conversely, all a token's children are
encompassed by its box.
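The containment property of the Box View can be illustrated with
a small recursive size calculation. This is a sketch under stated
assumptions, not the program's rendering code: the class BoxNode,
the padding value, and the fixed leaf size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parse-tree node; names and sizes are illustrative only.
final class BoxNode {
    static final int PADDING = 4;      // assumed gap around each child box
    static final int LEAF_WIDTH = 40;  // assumed fixed size of a childless box
    static final int LEAF_HEIGHT = 20;

    final String label;
    final List<BoxNode> children = new ArrayList<>();

    BoxNode(String label, BoxNode... kids) {
        this.label = label;
        for (BoxNode k : kids) children.add(k);
    }

    // Width: children laid side by side, separated and framed by padding.
    int width() {
        if (children.isEmpty()) return LEAF_WIDTH;
        int w = PADDING;
        for (BoxNode c : children) w += c.width() + PADDING;
        return w;
    }

    // Height: the tallest child plus padding above and below.
    int height() {
        if (children.isEmpty()) return LEAF_HEIGHT;
        int h = 0;
        for (BoxNode c : children) h = Math.max(h, c.height());
        return h + 2 * PADDING;
    }
}
```

Because every parent's size is computed from its children plus
padding, a parent box is always strictly larger than any box it
contains, which is the nesting the Box View displays.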
__________________________________________________________________
PARSE INTERFACE
The parse interface helps you navigate through the individual
sentences and their parses.
The Next Sentence and Previous Sentence buttons will move
you through all the sentences currently loaded in the corpus.
The Next Parse and Previous Parse buttons will move you through
all the parses currently available for the selected sentence.
The desired parse can also be directly selected by clicking on
the appropriate list item right below this bar.
The Reparse button will parse the currently selected sentence.
By default, the sentences will not be parsed initially, so you
will have to parse them either by clicking this button or
by clicking on the Parse button in the Corpus Editor.
The Tree/Box View button will toggle your view between the
corresponding views, as described in the Graphical
View section. If the currently selected parse is the top item
on the parse list, then there is no Tree or Box view available
since the top list item is always the non-parsed, original
sentence.
The Print button will send a print job of the current parse tree
image. Note that the print image will probably differ from the
one shown in the program. This is because the print image is
recalculated to fit-to-size in the printer document dimensions.
Sometimes, the image resulting from this recalculation is
extremely different in appearance, though the structure is
retained.
__________________________________________________________________
GRAMMAR EDITOR
Gives a listing of all the grammar definitions in the currently
loaded grammar. There are options here for the user to modify
this list by adding, updating, or deleting entries.
The operations here are intuitive. Be careful that all inputs
here are valid and intended. Any change in the grammar will be
saved to disk.
__________________________________________________________________
LEXICON EDITOR
Gives a listing of all the lexemes existing in the current
context-free grammar lexicon section. To the right is also
a list of Parts-Of-Speech, which is updated to show the
parts of speech associated with the selected lexeme on the
left.
There are options here for adding, deleting, and updating lexemes
as well as adding and deleting parts-of-speech for the currently
selected lexeme. As with the Grammar Editor, be careful that
all inputs here are valid and intended. Any change in the grammar
will be saved to disk.
One more thing to note: if you add a word to the lexicon,
remember to also add at least one POS to it. The way
the CFGs are defined, if a word has no POS's associated with
it, it will be lost, since the definitions are based on POS's
and not on the words themselves.
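Why a word without a POS is lost can be seen in a small sketch.
The class Lexicon and the POS names used with it ("Noun", "Verb")
are hypothetical; the actual file format and data structures are
not shown in this tutorial. The sketch assumes the editor works
with a word-to-POS view, while the grammar stores POS-to-word
definitions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory lexicon; illustrative only.
final class Lexicon {
    // Editor-side view: word -> parts of speech.
    final Map<String, Set<String>> posByWord = new TreeMap<>();

    void addWord(String word) {
        posByWord.computeIfAbsent(word, w -> new TreeSet<>());
    }

    void addPos(String word, String pos) {
        posByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(pos);
    }

    // Grammar-side view: POS -> words, as in a rule such as "Noun -> dog".
    // A word with no POS contributes no entry and therefore disappears.
    Map<String, Set<String>> wordsByPos() {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : posByWord.entrySet()) {
            for (String pos : e.getValue()) {
                result.computeIfAbsent(pos, p -> new TreeSet<>()).add(e.getKey());
            }
        }
        return result;
    }
}
```

In this sketch, a word added with addWord alone is visible in
posByWord but appears nowhere in the result of wordsByPos, so it
would not survive a save that writes only the POS-based form.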
__________________________________________________________________
CORPUS EDITOR
Provides a listing of all the sentences in the currently loaded
corpus. Clicking on one of these sentences will update the
graphical view and parse interface to jump to that sentence.
The user also has options here to update or delete a sentence
in the corpus, and can request a parse of the sentence
(e.g. after modifying the grammar/lexicon, to see the effect of
the change).
__________________________________________________________________
HISTORY CACHE
The history cache is similar to the drop-down URL menu available
on Netscape and Microsoft IE browsers. The program saves retrieved
corpora to a cache directory and allows you to retrieve them
quickly through the history cache. Files in the history cache
will also be loaded automatically when the user chooses a local
file, or types into the URL field a URL, that matches
one already in the cache.
Previous parses are recorded in these cache-saved files so
that sentences in files retrieved through the cache need not be
parsed over again.
The number of files allowed in the cache can be set through the
Preference Editor. The number of files in cache can also be
set to indefinite through the Preference Editor if the user
does not want any of the files to be deleted in the cache.
The option to clear all items in the cache is available through
the History window.
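A size-limited cache of this kind can be sketched with the
standard LinkedHashMap class. This is an illustration only: the
class HistoryCache, the UNLIMITED marker, and the choice to evict
the oldest entry first are assumptions, since this tutorial does
not say which file is deleted when the limit is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache keyed by corpus URL; values stand in for cached file names.
final class HistoryCache extends LinkedHashMap<String, String> {
    static final int UNLIMITED = -1; // assumed marker for the "indefinite" setting

    private final int maxEntries;

    HistoryCache(int maxEntries) {
        super(16, 0.75f, false); // false = iterate in insertion order
        this.maxEntries = maxEntries;
    }

    // LinkedHashMap calls this after each insertion; returning true
    // removes the eldest (first inserted) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return maxEntries != UNLIMITED && size() > maxEntries;
    }
}
```

With a limit of 2, adding a third URL drops the first one added;
with UNLIMITED, nothing is ever dropped.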
__________________________________________________________________
HYPERLINKING
As an additional enhancement to the graphical view, clicking on
a node in the visual graph causes the corresponding list in
either the Grammar Editor or the Lexicon Editor to auto-select
the clicked-on element.
If the user clicks on either a terminal node (a lexeme) or
the parent of a terminal node (a part of speech), the Lexicon
Editor will automatically be brought to the front (if visible)
with these items selected.
If the user clicks on any other node, the Grammar Editor will
come to front with the appropriate grammar definition selected.
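The rule for choosing an editor can be written down compactly.
The following is a sketch, not the program's source: the class
ParseNode, the Editor enum, and the assumption that a
part-of-speech node has exactly one child (its lexeme) are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type and dispatch rule; names are illustrative only.
final class ParseNode {
    enum Editor { LEXICON, GRAMMAR }

    final String label;
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String label, ParseNode... kids) {
        this.label = label;
        for (ParseNode k : kids) children.add(k);
    }

    // A terminal node (a lexeme) has no children.
    boolean isTerminal() {
        return children.isEmpty();
    }

    // A part-of-speech node is the parent of a terminal node.
    // Assumption: it has exactly one child, the lexeme itself.
    boolean isPartOfSpeech() {
        return children.size() == 1 && children.get(0).isTerminal();
    }

    // Which editor should come to the front when this node is clicked.
    Editor editorForClick() {
        return (isTerminal() || isPartOfSpeech()) ? Editor.LEXICON : Editor.GRAMMAR;
    }
}
```

In a tree such as NP over Det/"the" and Noun/"dog", clicking "dog"
or Noun would select the Lexicon Editor, while clicking NP would
select the Grammar Editor.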
__________________________________________________________________
PREFERENCE EDITOR
This configuration window gives the user the means to modify
default values associated with different parts of the program.
The user can specify which grammar file or corpus URL to load
upon program start, as well as specify whether the graphical
view will fit its output to user-specified dimensions or will
automatically resize its dimensions to fit the tree rendered
at the default font size, node spacing, node padding, etc. The
size of the history cache, and whether the program will limit
the saved corpus files to this number or let them grow
indefinitely, can be set here. The maximum depth of the parse
tree can also be specified.
__________________________________________________________________