6.863J: Natural Language & the Computer Representation of Knowledge

Spring 2002

Where: 4-149
When: MW, 9:30AM-11:00AM
Laboratory time to be arranged

Instructor: Robert C. Berwick
Email: berwick@ai.mit.edu
Office: 35-423
Phone: (617) 253-8918
Office hours: Weds 12-2 and by appointment

TA: Sourabh Niyogi
Email: niyogi@mit.edu; http://www.mit.edu/~niyogi/
Office: 35-419
Phone: (617) 253-7255
Office hours: T 12-2 (35-419)

New on this course's web pages:

Prerequisites & Relation to Other Courses:

Students should have some programming experience in a language such as Scheme, Lisp, C, C++, Java, or Perl. 6.034 is listed as a prerequisite, but it can be waived with the instructor's permission.

The material in this course is selected so that, by the end, you should be able to understand current papers in the field of Natural Language Processing (NLP). No background in NLP is necessary. All lectures will be published on this page in PowerPoint (ppt), Adobe PDF (pdf), and PostScript (ps) form; the latter two are more useful for downloading and printing. If you do not have Adobe Acrobat Reader for pdf files on your computer, you can download it from www.adobe.com.


Assignments & Due Dates:


This course is lab-oriented; that is, the work of the course is done through a series of laboratory exercises, handed out approximately once every two weeks. There are no exams; in particular, there will be no final exam. The final project will involve an element of non-determinism, i.e., so-called 'free will', in that you will be able to choose your own project, either combining elements from the previous laboratories or doing something completely new. For the final project, people will work in teams of 2 or 3 (but not more).

The laboratory exercises are designed to be carried out on Athena. If you are clever and adventuresome, you are certainly free to download the software used and get it running on your own PC/laptop, but this must be on 'your own nickel'; that is, we cannot guarantee that you will succeed, nor can we offer technical support for doing so.

Turning in the Assignments


The Assignments

No.   Due date   Task                              Resources
      Feb 20     Word Formation 1a: Introduction   PC-Kimmo; www.sil.org/computing/catalog/pc-kimmo; Users' guide: http://www.sil.org/pckimmo/v2/doc/guide.html; see also: http://www.sil.org/pckimmo/
#1b   Feb 27     Word Formation 1b                 spanish.rec; Spanish conjugator here; see overview & sample here
      Mar 18     Word Tagging                      Brill part-of-speech tagger; web version of Brill tagger here; paper on Brill tagger here (ps); HMM tagger paper here (ps)
      Apr 1      Phrase parsing, Part 1            deMarcken parser; documentation here [pdf]; sentences here
      Apr 18     Semantic Interpretation           Syntax-directed semantic interpretation system
#5    Apr 22     Probabilistic parsing
#6    May 15     Final project                     Your own design

Additional resources:

Tentative Course Schedule:

Week 1
INTRODUCTION: the NLP enterprise, from words to meaning
Reading in Textbook or Notes
W 02/06 Introduction, Organization, Homeworks. Course Overview: Intro to NLP. Main Issues; fsa's [ppt] [pdf] [ps] For fun, try this link: postmodern
pp.1-57  Notes 1
Week 2
WORD MODELING: automata and linguistics
pp. 58-90 
M 02/11 Linguistics: Phonology and Morphology I; 2-level morphology, Kimmo  Notes 2 here: [pdf] [ps] Lab 1a here: [pdf] [ps]
 Notes 2
W 02/13 Linguistics: Phonology and Morphology II. [ppt] [pdf]
pp. 287-321
Week 3
WORD MODELING: statistical approaches & part of speech tagging
T 02/19 President's Day Class (MIT turns Tuesday into Monday). Kimmo in detail: lab 1b  [ppt] [pdf] [ps] Lab 1b here: [pdf] [ps]
pp. 235-284
W 02/20 HMM Tagging; Statistical Transformation Rule-Based Tagging; Precision, Recall, Accuracy. [ppt] [pdf][ps]
For fun (and learning) try this link to an online Brill tagger here. Note: Default tagging is for Swedish. Click 'English' and click tracing 'on' if you want to see how it works.
Week 4
 Notes 3 [pdf] [ps]
M 02/25 Tagging: the Brill Tagger [pdf][ps]
pp. 357-394
W 02/27 Introduction to Parsing; Linguistics: Syntax & Parsing. Lab 2 here: [pdf] [ps]
Week 5
 Notes 4 [pdf] [ps]
M 03/04 Shift-Reduce Parsers in Detail. Earley's Algorithm and Chart Parsing  [ppt] [pdf]
W 03/06 Context-free Parsing and Beyond: Efficiency Issues; Feature-based parsing; NL system design [ppt][pdf]
pp. 477-498
F 03/08 Add Date
Week 6
M 03/11 Shift-Reduce Parsers in Detail. Earley's Algorithm and Chart Parsing  [ppt] [pdf]  
W 03/13  
Week 7
pp. 395-446;pp. 447-476
M 03/18 Writing Grammars  [ppt] [pdf]
W 03/20 Feature Grammars  [ppt] [pdf]
3/25-3/29 No classes - Spring Break. Go to someplace warm if you can.
Week 8
Notes 5
M 04/01 PCFG learning: inside-outside algorithm   [ppt] [pdf]  
W 04/03 Semantic Interpretation I: compositionality  [ppt] [pdf]
pp. 501-544
Week 9
M 04/08 Semantic Interpretation II: compositionality and quantifiers   [ppt] [pdf]
 pp. 545-588
W 04/10 Semantic Interpretation III  [ppt] [pdf]
Week 10
pp. 589-630
M 04/15 No class - Patriots' Day.  Run the Boston Marathon if you can.
W 04/17 Lexical Semantics I  [ppt] [pdf]  
Week 11
M 04/22 Determiners and Quantifiers  [ppt] [pdf]
W 04/24 Determiners & Quantifiers, II    [ppt] [pdf]
pp. 631-666
Th 04/25 Drop Date
Week 12
Notes 6
M 04/29 Principle-based Parsing   [ppt] [pdf]
W 05/01 Classical Models of Machine Translation: an Overview   [ppt] [pdf]
pp. 799-830
Week 13
M 05/06 Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT, I   [ppt] [pdf]
W 05/08 Language learning I   [ppt] [pdf]
Week 14
Notes 7
M  05/13 Computational Models of Language Learning   [ppt] [pdf]
W  05/15 Final project due.  Computational Models of Language Change and the Origins of Language

Grading Weights:

Assignments (1-6) 70%
Final Project 20%
Class Participation 10%