- Improve the Brill tagger
by adding simple grammatical information from a parser, e.g., that a
CASE marker like ``of'' tells us that a Noun
should follow. (This improves the Brill tagger a great deal, as
it turns out.)
- Improve the Brill tagger
by fixing its biggest bug, the lack of 'guaranteed pretagging'. As it
stands, the Brill tagger's rules will change the tag even of a word whose
tag one is absolutely sure about. Fix this so that it doesn't happen.
- Modify the Brill tagger
so that it can tag email. (You may need to add new tags to
handle the form of email.) In addition,
map the resulting output into classification ``bins'' so that the email
is classified by its semantic (or syntactic) ``type''.
- Investigate different
'smoothing' methods for languages that vary greatly from English and that
don't have as much hand-tagged data, and see how one could build
adaptive tagging systems, bootstrapping from a small core set of
tagged sentences.
- Author identification
(as in plagiarism detection): implement a real system for this. Can it be
adapted to, e.g., computer program code?
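As a starting point for the 'guaranteed pretagging' project, here is a minimal Python sketch of the idea. It is not the Brill tagger's actual code: the `Rule` class, the `apply_rules` function, and the `locked` set are hypothetical names invented for illustration, and only one rule template (condition on the previous tag) is modeled. Positions listed in `locked` carry tags we are sure about, so no rule is allowed to rewrite them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical Brill-style contextual rule:
    change from_tag to to_tag when the previous word has prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tags: List[str], rules: Iterable[Rule],
                locked: Set[int] = frozenset()) -> List[str]:
    """Apply each rule in order, left to right, skipping locked positions.

    Simplification (an assumption of this sketch): a change takes effect
    immediately, so later positions see the updated tags.
    """
    out = list(tags)  # work on a copy; the initial tagging is left intact
    for rule in rules:
        for i in range(1, len(out)):
            if i in locked:
                continue  # guaranteed pretag: never rewritten
            if out[i] == rule.from_tag and out[i - 1] == rule.prev_tag:
                out[i] = rule.to_tag
    return out


# Toy example: "went to school", with Penn Treebank-style tags.
tokens = ["went", "to", "school"]
initial = ["VBD", "TO", "NN"]

# A rule of the familiar "NN becomes VB after TO" kind, which misfires here.
rule = Rule(from_tag="NN", to_tag="VB", prev_tag="TO")

unlocked = apply_rules(initial, [rule])               # "school" is retagged VB
pretagged = apply_rules(initial, [rule], locked={2})  # "school" stays NN

print(list(zip(tokens, unlocked)))
print(list(zip(tokens, pretagged)))
```

Keeping the locked positions in a separate set, rather than marking the tags themselves, leaves the rule format unchanged; a real fix would need to apply the same check during rule learning as well as during tagging.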