6.863J/9.611J Natural Language Processing


Tagging projects


  1. Improve the Brill tagger by adding simple grammatical information from a parser, e.g., that a CASE marker like "of" tells us that a noun
    should follow. (As it turns out, this improves the Brill tagger a great deal.)
  2. Improve the Brill tagger by fixing its biggest bug: add 'guaranteed pretagging'. As it stands, the Brill tagger's rules will change the tag of a word even when one is absolutely sure of that tag. Fix this so that it cannot happen.
  3. Modify the Brill tagger so that it can tag email. (You may need to add new tags to handle the forms peculiar to email.) In addition,
    map the resulting output into classification "bins" so that each email message is classified according to its semantic (or syntactic) "type".
  4. Investigate different 'smoothing' methods for languages that differ greatly from English and lack large amounts of hand-tagged data, and see how one could build adaptive tagging systems that bootstrap from a small core set of tagged sentences.
  5. Author identification (as in plagiarism detection): implement a real system for this. Can it be adapted to, e.g., computer program code?
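
For project 1, the kind of parser hint described can be sketched as a post-tagging constraint. This is a minimal illustration, not the Brill tagger's own mechanism: it assumes a toy lexicon of allowed tags per word (`LEXICON`, hypothetical) and Penn-style tags, and it retags a verb-tagged word directly after "of" as a noun when the lexicon allows a noun reading.

```python
# Hypothetical sketch: a post-tagging constraint suggested by a parser.
# A CASE marker such as "of" should be followed by a noun, so a verb tag
# directly after it is treated as an error when a noun reading exists.

CASE_MARKERS = {"of"}

# Toy lexicon of allowed tags per word (an illustrative assumption, not a
# real Brill lexicon file).
LEXICON = {
    "the": {"DT"},
    "of": {"IN"},
    "rest": {"NN", "VB"},
    "work": {"NN", "VB"},
}


def apply_case_constraint(tagged):
    """Return a copy of `tagged` (a list of (word, tag) pairs) in which a
    verb-tagged word right after a CASE marker is retagged NN, provided the
    lexicon lists NN as a possible tag for it."""
    result = list(tagged)
    for i in range(len(result) - 1):
        word = result[i][0]
        next_word, next_tag = result[i + 1]
        if word.lower() in CASE_MARKERS and next_tag.startswith("VB"):
            if "NN" in LEXICON.get(next_word.lower(), set()):
                result[i + 1] = (next_word, "NN")
    return result


print(apply_case_constraint([("rest", "NN"), ("of", "IN"), ("work", "VB")]))
# [('rest', 'NN'), ('of', 'IN'), ('work', 'NN')]
```

A fuller version would take the constraint from an actual parse rather than a fixed word list, and would allow determiners and adjectives to begin the noun phrase after "of".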
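
For project 2, one way to picture 'guaranteed pretagging' is a rule applier that simply refuses to touch locked positions. The sketch below assumes a single simplified Brill template ("change A to B when the previous tag is C"); the `Rule` type and the `locked` parameter are hypothetical names, not part of any existing Brill implementation.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Brill-style contextual transformation: change `from_tag` to `to_tag`
    when the preceding word is tagged `prev_tag`."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tags, rules, locked=frozenset()):
    """Apply each rule in order to a list of tags. Positions in `locked`
    (the guaranteed pretags) are never changed by any rule."""
    tags = list(tags)
    for rule in rules:
        # Match against the tags as they stood before this rule, so a single
        # rule cannot trigger on its own changes within one pass.
        before = list(tags)
        for i in range(1, len(tags)):
            if i in locked:
                continue
            if before[i] == rule.from_tag and before[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


rules = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]
print(apply_rules(["TO", "NN"], rules))              # ['TO', 'VB']
print(apply_rules(["TO", "NN"], rules, locked={1}))  # ['TO', 'NN']
```

Locked positions still serve as context for their neighbours, so a sure tag can help correct the tags around it while never being overwritten itself.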
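
For project 3, a rough starting point is a regular-expression pre-tagger for email-specific tokens plus a crude mapping from tagged messages to bins. The tag names (`EMAILADDR`, `URL`, `HEADER`, `SMILEY`) and the bins ("question", "request", "announcement", "other") are hypothetical choices for illustration; tokens left as `None` would go to the ordinary tagger.

```python
import re

# Hypothetical email-specific tags, added alongside the usual Penn tags.
EMAIL_PATTERNS = [
    ("EMAILADDR", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("URL", re.compile(r"(https?://|www\.)\S+")),
    ("HEADER", re.compile(r"(From|To|Cc|Subject|Date):")),
    ("SMILEY", re.compile(r"[:;]-?[()DP]")),
]


def pretag_email(tokens):
    """Return (token, tag) pairs; tag is None where the ordinary tagger
    should decide."""
    tagged = []
    for token in tokens:
        tag = next((name for name, pat in EMAIL_PATTERNS if pat.fullmatch(token)), None)
        tagged.append((token, tag))
    return tagged


def classify_message(tagged):
    """Map a fully tagged message to a coarse, illustrative bin."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    if "?" in words:
        return "question"
    if tags and tags[0] == "VB":  # imperative opening, e.g. "Send ..."
        return "request"
    if "URL" in tags:
        return "announcement"
    return "other"


print(pretag_email(["Subject:", "see", "www.mit.edu", "bob@mit.edu", ":-)"]))
print(classify_message([("Send", "VB"), ("the", "DT"), ("report", "NN")]))  # request
```

A real system would learn the bin assignment from tag and word features rather than hand-written checks.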
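
For project 4, the simplest smoothing baseline to compare against is add-k (Lidstone) smoothing of tag-transition probabilities, which keeps unseen tag bigrams from getting zero probability when the tagged corpus is small. This sketch assumes a bigram model, a `<s>` start marker, and that every tag in the corpus appears in `tagset`; the function name is illustrative.

```python
from collections import Counter


def smoothed_transitions(tagged_sents, tagset, k=1.0):
    """Estimate P(tag | previous tag) from a (possibly tiny) tagged corpus
    using add-k (Lidstone) smoothing; k=1 gives Laplace smoothing.
    '<s>' marks the start of a sentence. Returns a probability function."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sent in tagged_sents:
        prev = "<s>"
        for _word, tag in sent:
            bigram_counts[(prev, tag)] += 1
            context_counts[prev] += 1
            prev = tag
    size = len(tagset)

    def prob(prev, tag):
        # Unseen bigrams get k pseudo-counts instead of zero probability;
        # probabilities sum to 1 over the tags in `tagset`.
        return (bigram_counts[(prev, tag)] + k) / (context_counts[prev] + k * size)

    return prob


corpus = [[("the", "DT"), ("dog", "NN")], [("a", "DT"), ("cat", "NN")]]
p = smoothed_transitions(corpus, {"DT", "NN", "VB"})
print(p("DT", "NN"))  # 0.6
print(p("DT", "VB"))  # 0.2
```

Other smoothing methods (e.g., interpolation or backoff) can be swapped in behind the same `prob` interface and compared on held-out sentences.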
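
For project 5, one common baseline (not necessarily the approach the course intends) is to compare character n-gram profiles with cosine similarity and attribute a text to the most similar candidate. Because it works on raw characters, the same code applies unchanged to program source. The function names and the trigram default are assumptions for illustration.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Count character n-grams; this works for prose and program source alike."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(unknown, samples, n=3):
    """Return the candidate author whose writing sample is most similar."""
    target = ngram_profile(unknown, n)
    return max(samples, key=lambda author: cosine(target, ngram_profile(samples[author], n)))


samples = {
    "alice": "for i in range(10): total += i",
    "bob": "the quick brown fox jumps over the lazy dog",
}
print(attribute("for j in range(5): total += j", samples))  # alice
```

Real samples should be much longer; with tiny texts, topic and vocabulary overlap dominate over genuine stylistic signal.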


Massachusetts Institute of Technology