6.863J/9.611J Tagging Project Ideas

Improve the Brill tagger by adding simple grammatical information from a parser, e.g., that a CASE marker like ``of'' tells us that a noun
should follow. (As it turns out, this improves the Brill tagger a great deal.)
Improve the Brill tagger by fixing its biggest bug: add 'guaranteed pretagging'. As it stands, the Brill tagger's rules will change the tag of a word even when that tag is known with certainty. Fix this so that such tags are never overwritten.
Modify the Brill tagger so that it can tag email. (You may need to add new tags to handle the form of email.) In addition,
map the resulting output into classification ``bins'' so that each email is classified by its semantic (or syntactic) ``type''.
Investigate different 'smoothing' methods for languages that differ greatly from English and that have less hand-tagged data, and see how one could build adaptive tagging systems that bootstrap from a small core set of tagged sentences.
Author identification (as in plagiarism detection): implement a real system for this. Can it be adapted to, e.g., computer program code?
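
As a starting point for the 'guaranteed pretagging' idea above, here is a minimal sketch in Python. It is not the Brill tagger's actual code: the rule format (change tag A to tag B when the previous tag is C), the function name apply_rules, and the "locked" set of word positions are all assumptions made for illustration. The sketch also applies each rule left to right with immediate effect, which is a simplification.

```python
from typing import List, Set, Tuple

# Hypothetical rule format: (from_tag, to_tag, prev_tag) means
# "change from_tag to to_tag when the previous word is tagged prev_tag".
Rule = Tuple[str, str, str]


def apply_rules(tags: List[str], rules: List[Rule], locked: Set[int]) -> List[str]:
    """Apply transformation rules in order, skipping pretagged positions.

    `locked` holds the indices of words whose tags are guaranteed
    (pretagged); no rule is allowed to overwrite them.
    """
    out = list(tags)  # work on a copy so the caller's list is untouched
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(out)):
            if i in locked:
                continue  # guaranteed pretag: never rewrite this position
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
    return out


# "went to school": suppose we are certain that "school" is a noun here.
initial = ["VBD", "TO", "NN"]
rules = [("NN", "VB", "TO")]  # NN -> VB after TO (illustrative rule only)

unprotected = apply_rules(initial, rules, locked=set())
protected = apply_rules(initial, rules, locked={2})
```

Without locking, the rule rewrites the tag of "school" to VB; with index 2 locked, the pretagged NN survives. A real fix would also have to decide whether a locked tag may still serve as context for rules that change neighboring words, as it does in this sketch.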