6.863J/9.611J Natural Language Processing




  1. Build a complete set of KIMMO rules to handle all (or almost all) of Spanish verbal morphology.
    See the Spanish verb conjugator compjugador by Daniel M. German at http://compjugador.sourceforge.net/.  The data files in the archive text-compjugador-0.1.tar.gz can conjugate all the verbs in ‘official’ Spanish (as in the Diccionario de la Real Academia), close to 10,000 verbs.  See also the Ispell site at http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html#Spanish-dicts
  2. Portuguese.  For references and data on this language, see http://www.linguateca.pt/
    This site is a collection of links to various resources on the Portuguese language.
    Its pages contain sections such as "Ajuda a redaccao" (which includes references to the ISPELL dictionaries for European Portuguese and for Brazilian Portuguese), "Componentes basicos de um sistema de Processamento de Linguagem Natural: analisadores ou geradores da lingua", and "Conjugadores verbais", as well as links to numerous "Dicionarios gerais" with more complete accounts of the Portuguese inflectional system.
  3. Italian.  For inflectional forms, see http://members.xoom.virgilio.it/trasforma/ispell/  I can provide other links on Italian morphology.
  4. Japanese: ftp://crl.nmsu.edu/CLR/lexica/jmorphdict/
  5. Greek:  http://www.csd.auth.gr/~setn02/poster_papers/053.pdf  This paper explores the limits of Kimmo, and you might want to do the same.
  6. Many other language examples are possible: Pig Latin, Portuguese, Esperanto.  See http://www.cis.upenn.edu/~cis639/home.html  (under ‘assignments’).
  7. Turkish:  We can furnish you with a previous year's laboratory on Turkish, which includes a set of examples, data, and complete instructions.
  8. [Harder, but more fun]  Rules to handle a non-concatenative language, such as Arabic or Hebrew.  (Reference for Arabic: J. McCarthy's Ph.D. thesis at MIT, in the MIT Humanities Library.)  Note: in the past, people have implemented their own system (not Kimmo) to do this, in Scheme.  See the Arabic demo at http://www.cis.upenn.edu/~cis639/home.html  for another approach.  I can provide you with many additional references on Semitic languages.
  9. Implement finite-state rule compilation (i.e., combining the FSTs from the rules into one large one, via composition and/or intersection).
  10. Implement a method to use rewrite ‘arrow rules’ as input to Kimmo (partly done in the current implementation, but incomplete), so that one can write rules without any reference to finite-state tables.  E.g.,  a:0 -> V a:s X, where V and X are the left and right contexts.
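
For project 9, the intersection step can be sketched with the standard product construction. The sketch below is only an illustration under stated assumptions: it treats each lexical:surface pair (including pairs with 0) as one atomic symbol, and it uses a made-up dictionary format for automata, not the state-table format of the course's Kimmo implementation. The two toy rules and the names intersect, accepts, rule_a, and rule_b are hypothetical.

```python
# Hypothetical automaton format (an assumption, not Kimmo's own tables):
#   {"start": state, "finals": set of states, "delta": {(state, pair): state}}
# A missing transition means the automaton rejects.

def intersect(a, b):
    """Product construction: accept exactly the pair sequences both accept."""
    start = (a["start"], b["start"])
    # Only symbols known to both automata can survive in the intersection.
    alphabet = {sym for (_, sym) in a["delta"]} & {sym for (_, sym) in b["delta"]}
    delta, finals = {}, set()
    seen, stack = {start}, [start]
    while stack:  # explore only the reachable product states
        state = stack.pop()
        p, q = state
        if p in a["finals"] and q in b["finals"]:
            finals.add(state)
        for sym in alphabet:
            if (p, sym) in a["delta"] and (q, sym) in b["delta"]:
                nxt = (a["delta"][(p, sym)], b["delta"][(q, sym)])
                delta[(state, sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "finals": finals, "delta": delta}

def accepts(m, pairs):
    """Run a deterministic automaton over a sequence of pairs."""
    state = m["start"]
    for sym in pairs:
        if (state, sym) not in m["delta"]:
            return False
        state = m["delta"][(state, sym)]
    return state in m["finals"]

# Three toy lexical:surface pairs.
X, Y, Z = ("a", "a"), ("b", "b"), ("a", "0")

# Toy rule A: a:0 may occur only immediately after b:b.
rule_a = {"start": 0, "finals": {0, 1},
          "delta": {(0, X): 0, (0, Y): 1,
                    (1, X): 0, (1, Y): 1, (1, Z): 0}}

# Toy rule B: a:0 may not be the last pair of the word.
rule_b = {"start": 0, "finals": {0},
          "delta": {(0, X): 0, (0, Y): 0, (0, Z): 1,
                    (1, X): 0, (1, Y): 0, (1, Z): 1}}

both = intersect(rule_a, rule_b)
print(accepts(both, [Y, Z, X]))  # satisfies both toy rules
print(accepts(both, [Y, Z]))     # violates toy rule B
```

Running one combined automaton replaces stepping every rule automaton in parallel, at the cost of a possibly much larger state set; a project write-up might measure that trade-off on a real rule set.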

