- [For syntax fans] Extend the rules with ‘slashes’ from Lab 3b to handle all of the roughly 100 distinct sentences in the original Generalized Phrase Structure Grammar paper, either with or without features. The reference: Gazdar, “Unbounded dependencies and coordinate structure,” Linguistic Inquiry, 12, 155-184 (1981); reprinted in Walter J. Savitch, Emmon Bach, William Marsh and Gila Safran-Naveh, eds., The Formal Complexity of Natural Language, Dordrecht: Reidel, 183-226 (1987). This is fun if you like syntax (trust me).
- Build a set of feature-based context-free rules that will handle Spanish, Italian, German, Hebrew, Gaelic, Russian, Esperanto, … sentences. Please include sentences with conjunctions (and discuss this with me in advance). (Yes, people have done Gaelic in the past.) (For Russian, see the last item – it is a ‘free’ word order language.)
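To make the agreement side of this concrete, here is a minimal sketch of feature checking by unification; the dict-based feature encoding and the Spanish example are made up for illustration, not a required formalism:

    # Toy feature bundles as dicts; a rule licenses a phrase only if the
    # daughters' agreement features unify (e.g., Spanish gender/number).
    def unify(f1, f2):
        """Merge two feature dicts; return None on a feature clash."""
        merged = dict(f1)
        for key, val in f2.items():
            if key in merged and merged[key] != val:
                return None                      # e.g., gender=fem vs. gender=masc
            merged[key] = val
        return merged

    # 'la casa blanca': Det, N, and A must agree in gender and number.
    det  = {"gender": "fem", "number": "sg"}
    noun = {"gender": "fem", "number": "sg"}
    adj  = {"gender": "fem", "number": "sg"}

    agr = unify(det, noun)
    if agr is not None and unify(agr, adj) is not None:
        print("NP licensed; agreement features:", agr)
    else:
        print("agreement clash: NP rule blocked")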
- Examine the Scheme parser at http://www.mit.edu/~niyogi/minimal.htm (see http://www.mit.edu/~niyogi/m0.scm) and (i) add a probabilistic component to the rules; or (ii) add a well-formed substring table to the method it uses for parsing. This method is based on very recent linguistic theories (‘minimalism’) – I can give you references on that, too.
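For option (ii), the idea behind a well-formed substring table is just memoization over spans, so that no substring is re-analyzed. Below is a minimal sketch of that idea in Python with a made-up toy grammar; it does not assume anything about the internals of the m0.scm parser:

    # The dict 'table' is the well-formed substring table: each
    # (category, start) pair is analyzed at most once.
    GRAMMAR = {                                  # toy rules, made up
        "S":  [["NP", "VP"]],
        "NP": [["Det", "N"], ["Det", "N", "PP"]],
        "VP": [["V", "NP"]],
        "PP": [["P", "NP"]],
    }
    LEXICON = {"the": "Det", "dog": "N", "park": "N", "saw": "V", "in": "P"}
    WORDS = "the dog saw the dog in the park".split()
    table = {}                                   # (category, start) -> {end positions}

    def spans(cat, start):
        """All end positions e such that cat derives WORDS[start:e] (memoized)."""
        if (cat, start) in table:
            return table[(cat, start)]
        ends = set()
        if cat not in GRAMMAR:                   # preterminal: look the word up
            if start < len(WORDS) and LEXICON.get(WORDS[start]) == cat:
                ends.add(start + 1)
        else:
            for rhs in GRAMMAR[cat]:
                frontier = {start}
                for sym in rhs:                  # thread end positions through the RHS
                    frontier = {e for p in frontier for e in spans(sym, p)}
                ends |= frontier
        table[(cat, start)] = ends
        return ends

    print(len(WORDS) in spans("S", 0))           # True: the sentence parses as S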
- Link grammars: Look at another method for parsing distinct from forming ‘tree structures’ – e.g., ‘link grammar’, which builds syntactic relations between words, rather than trees. A working parser for windoze/linux/etc. is available, along with many details; see http://www.link.cs.cmu.edu/link/. Compare this to ‘traditional’ context-free parsing for natural languages – which one works better? How could you tell?
- Investigate optimizations/efficiency hacks for Earley parsing: precomputing ‘chains’ of rules; lookahead; using an ‘ill-formed’ substring table; matrix multiplication. I can give you a reference for this from Graham, Harrison, and Ruzzo.
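As one example of what the chain precomputation buys you: the set of categories reachable from each nonterminal through unit rules can be closed off once, ahead of time, so the parser never walks a chain item by item. A small sketch with made-up unit rules, not tied to any particular Earley implementation:

    # Precompute the reflexive-transitive closure of unit ('chain') rules A -> B,
    # so a predictor/completer can jump across a whole chain in one step.
    from collections import defaultdict

    unit_rules = [("S", "VP"), ("VP", "V"), ("NP", "N"), ("N", "Pronoun")]

    def chain_closure(units):
        """Return {A: all B such that A derives B using only unit rules}."""
        reach = defaultdict(set)
        for a, b in units:
            reach[a].add(b)
            reach[a].add(a)                      # reflexive: A derives A in 0 steps
        changed = True
        while changed:                           # naive closure; fine for small grammars
            changed = False
            for a in list(reach):
                for b in list(reach[a]):
                    for c in reach.get(b, ()):
                        if c not in reach[a]:
                            reach[a].add(c)
                            changed = True
        return dict(reach)

    print(sorted(chain_closure(unit_rules)["S"]))   # ['S', 'V', 'VP']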
- [For psychologically oriented folks] Push deeper into the distinctions between shift-reduce/shift-shift conflicts as psychologically valid models for human sentence processing – implement the 2-stage Chomsky-Miller parser, based on intonational structure/shallow phrase breaks, followed by tree ‘readjustment’.
- Extend a feature parser so that it can be used in a ‘relaxation mode’ for grammar checking (a la Microsoft Word).
- Compare ‘deep’ (ordinary context-free) vs. ‘shallow’ chunking parsers for a task like message understanding: see the parser in the course locker /mit/6.863/sco1h/ (and see me for assistance in using it); there is an associated paper by S. Abney, scol.ps, in this directory under the /doc subdir. This parser is also on the CD that was handed out at the beginning of the term.
- [For algorithm/programming fans] Build a parser that directly computes precedence and dominance relations, rather than using context-free rules, and see if you can modify chart parsing to handle this.
- Parallel context-free parsing: using a chart parser, or parallel stacks. Harder: fold probabilistic rules into the system.
- Implement a method to adjudicate between structurally different parses using semantic information on-line – e.g., use the semantic interpreter to rule out certain subtrees as the Earley algorithm tries to build them, much as token lookahead is used, to ‘fix’ examples like “I shot the elephant in my pajamas.”
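Schematically, the hook is a semantic check applied to each candidate constituent before it is allowed into the chart. The sketch below only illustrates that hook; the data structures and the tiny 'who can wear what' table are stand-ins, not the course parser's actual API or semantic interpreter:

    # Toy on-line semantic filter: veto a completed PP attachment before it
    # enters the chart.
    WEARABLE_BY = {"pajamas": {"I", "you", "we", "they", "hunter"}}

    def semantically_ok(attachment):
        """attachment = (preposition, object_head, attachee_head)."""
        prep, obj, attachee = attachment
        if prep == "in" and obj in WEARABLE_BY:
            return attachee in WEARABLE_BY[obj]   # elephants don't wear pajamas
        return True

    def complete(chart, constituent, candidate_attachments):
        """Add only those analyses whose attachment passes the semantic check."""
        for attachment in candidate_attachments:
            if semantically_ok(attachment):
                chart.append((constituent, attachment))
        return chart

    # 'in my pajamas' modifying 'elephant' is vetoed; modifying 'I' survives.
    print(complete([], "PP[in my pajamas]",
                   [("in", "pajamas", "elephant"), ("in", "pajamas", "I")]))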
- [For linguistically oriented folks] Use a feature-based grammar system to implement a part of a ‘real’ linguistic theory, like Chomsky’s ‘Government-Binding’ theory. I have a reference paper for this as well, to see which aspects should be implemented.
- Probabilistic context-free parsing: see http://www.cog.brown.edu/~mj/code/inside-outside.tgz for a version written in C. Project: apply this to a corpus of your choosing.
- Apple Pie probabilistic parser: see http://nlp.cs.nyu.edu/app/index.html for a downloadable probabilistic parser; a windoze executable is available. Project: apply this to a corpus of your choosing, and evaluate the results.
- Probabilistic context-free parsing to find rules from treebanks: see http://www.cog.brown.edu/~mj/code/cky.tbz along with the rule induction method in http://www.cog.brown.edu/~mj/code/johnson-97.pdf.
- Build an automatic phrase structure compiler: a system that (i) inputs tree structures that a linguist would build and outputs the associated context-free rules to build those trees; and then (ii) given a collection of tree structures, induces a metarule to generate that collection.
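For part (i), the core step is reading off one context-free rule per internal node of each tree. A minimal sketch; the simple bracketed-string input format here is an assumption, not a requirement:

    # Read a bracketed tree like "(NP (Det the) (N dog))" and emit the
    # context-free rules that would rebuild it.
    def parse_bracketed(tokens):
        """Turn a token stream into a nested (label, children) structure."""
        assert tokens.pop(0) == "("
        label = tokens.pop(0)
        children = []
        while tokens[0] != ")":
            if tokens[0] == "(":
                children.append(parse_bracketed(tokens))
            else:
                children.append(tokens.pop(0))   # a word
        tokens.pop(0)                            # drop ")"
        return (label, children)

    def rules(tree, out):
        """Collect one (lhs, rhs) rule per internal node."""
        label, children = tree
        rhs = [c[0] if isinstance(c, tuple) else c for c in children]
        out.add((label, tuple(rhs)))
        for c in children:
            if isinstance(c, tuple):
                rules(c, out)
        return out

    tree_str = "(S (NP (Det the) (N dog)) (VP (V barked)))"
    tree = parse_bracketed(tree_str.replace("(", " ( ").replace(")", " ) ").split())
    for lhs, rhs in sorted(rules(tree, set())):
        print(lhs, "->", " ".join(rhs))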
- Parsing and free word order languages. We can generalize context-free rules so that they have ‘free’ right-hand sides, in the form of sets: e.g., VP -> {V, NP} means either VP -> V NP or VP -> NP V. Generalize the Earley parser so that it can handle rules of this sort. Establish the time complexity of this extended algorithm. Apply it to a language with ‘free’ word order, like Russian, etc.
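One brute-force baseline, before touching the Earley machinery itself, is to expand each set-valued right-hand side into all of its orderings and hand the result to an ordinary parser; the factorial blowup per rule is exactly what a real solution should avoid. The rule format below is made up for the sketch:

    # Expand a rule with a set-valued RHS, e.g. VP -> {V, NP}, into one
    # ordinary context-free rule per ordering. A real solution would instead
    # let the Earley dot range over a multiset of remaining symbols.
    from itertools import permutations

    def expand_set_rules(rules):
        """rules: list of (lhs, rhs) where rhs is a frozenset (free order)
        or a tuple (fixed order). Returns ordinary (lhs, tuple) rules."""
        expanded = []
        for lhs, rhs in rules:
            if isinstance(rhs, frozenset):
                expanded.extend((lhs, perm) for perm in permutations(sorted(rhs)))
            else:
                expanded.append((lhs, tuple(rhs)))
        return expanded

    free_rules = [("VP", frozenset({"V", "NP"})),
                  ("S", ("NP", "VP"))]
    for lhs, rhs in expand_set_rules(free_rules):
        print(lhs, "->", " ".join(rhs))
    # VP -> NP V, VP -> V NP, S -> NP VP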