6.863J: Natural Language & the Computer Representation of Knowledge

Spring 2001

Where: 4-145
When: MW, 9:30AM-11:00AM
Laboratory time to be arranged

Instructor: Robert C. Berwick
Email: berwick@ai.mit.edu
Office: 35-423
Phone: (617) 253-8918
Office hours: Weds 12-2 and by appointment

TA: Sourabh Niyogi
Email: niyogi@mit.edu
Office: 35-317
Phone: (617) 253-1467
Office hours: TBA shortly


Prerequisites & Relation to Other Courses:

Students should have some programming experience in a language such as Scheme, Lisp, C, C++, Java, or Perl. 6.034 is listed as a prerequisite, but it can be waived by permission of the instructor.

The material covered in this course is selected so that, on completing it, you should be able to understand current papers in the field of Natural Language Processing (NLP). No background in NLP is necessary. All lectures will be published on this page in PowerPoint (ppt), Adobe PDF (pdf), and PostScript (ps) form; the latter two are more useful for downloading and printing. If you do not have Adobe Acrobat Reader for pdf files on your computer, you can download it from www.adobe.com.


Assignments & Due Dates:


This course is lab-oriented; that is, the work of the course is done through a series of laboratory exercises, handed out roughly once every two weeks. There are no exams; in particular, there will be no final exam. The final project will involve an element of non-determinism, i.e., so-called 'free will': you will be able to choose your own project, either combining elements from the previous laboratories or doing something completely new. For the final project, we will have people work in teams of 2 or 3 (but not more).

The laboratory exercises are designed to be carried out on Athena. If you are clever and adventuresome, you are certainly free to download the software used and get it running on your own PC/laptop, but this is on 'your own nickel': we cannot guarantee that you will succeed, nor can we offer technical support for doing so.

Turning in the Assignments


The Assignments

No.  Due date  Task                            Resources
#1   Feb 21    Word Formation 1: Introduction  PC-Kimmo; www.sil.org/pckimmo/
#1b  Mar 05    Word Formation 1b               spanish.rec
#2   Mar 09    Word Tagging                    Brill part-of-speech tagger; web version of the Brill tagger here.
#3   Mar 19    Phrase Parsing, Part 1          deMarcken parser; documentation here [pdf]; sentences here
#4   Apr 18    Probabilistic Phrase Parsing    Inside-outside parser; Apple Pie parser
#5   Apr 25    Semantic Interpretation         Syntax-directed semantic interpretation system
#6   May 16    Final Project                   Your own design

Additional resources:

Tentative Course Schedule:

Week 1
INTRODUCTION: the NLP enterprise, from words to meaning
Reading in textbook or notes
W 02/06 Introduction, Organization, Homeworks. Course Overview: Intro to NLP. Main Issues; FSAs [ppt] [pdf] [ps]. For fun, try this link: postmodern
Notes 1
Week 2
WORD MODELING: automata and linguistics
M 02/12 Linguistics: Phonology and Morphology I; two-level morphology, Kimmo. Notes 2 here: [pdf] [ps]. Lab 1a here: [pdf] [ps]
Notes 2
W 02/14 Linguistics: Phonology and Morphology II.
Week 3
WORD MODELING: statistical approaches & part of speech tagging
Ch.2, 10
T 02/20 Presidents' Day class (MIT turns Tuesday into Monday). Elements of Probability & Information Theory; HMMs [ppt] [pdf] [ps]. Lab 1b here: [pdf] [ps]
W 02/21 HMM Tagging; Statistical Transformation Rule-Based Tagging; Precision, Recall, Accuracy. [ppt] [pdf] [ps]
For fun (and learning), try this link to an online Brill tagger here. Note: the default tagging is for Swedish. Click 'English', and turn tracing on if you want to see how it works.
Week 4
Ch.3, Notes 3[pdf] [ps]
M 02/26 Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking. [pdf][ps]
W 02/28 Linguistics: Syntax & Parsing. Lab 2 here: [pdf] [ps]
Week 5
 Notes 4[pdf][ps]
M 03/05 Shift-Reduce Parsers in Detail. Earley's Algorithm and Chart Parsing  [ppt] [pdf]
W 03/07 Context-free Parsing and Beyond: Efficiency Issues; Feature-based parsing; NL system design [ppt][pdf]
F 03/09 Add Date
Week 6
M 03/12 Feature based Parsing I  [ppt] [pdf]
W 03/14 Feature based Parsing II   [ppt] [pdf]  
Week 7
Chs. 9, 11, 12
M 03/19 Probabilistic Parsing: Introduction. Probabilistic CFG: Best parse. Probability of a string. [ppt] [pdf]
W 03/21 PCFG Parameter Estimation and Learning. [ppt] [pdf]
3/26-3/30 No classes - Spring Break. Go to someplace warm if you can.
Week 8
Notes 5
M 04/02 PCFG learning: inside-outside algorithm   [ppt] [pdf]  
W 04/04 Semantic Interpretation I: compositionality  [ppt] [pdf]
Week 9
Chs. 5, 7, 8
M 04/09 Semantic Interpretation II: compositionality and quantifiers   [ppt] [pdf]  
W 04/11 Semantic Interpretation III  [ppt] [pdf]
Week 10
M 04/16 No class - Patriots' Day. Run the Boston Marathon if you can.
W 04/18 Lexical Semantics I  [ppt] [pdf]
Week 11
M 04/23 Lexical Semantics II: the internal structure of words  [ppt] [pdf]
W 04/25 Word sense disambiguation and information retrieval [ppt] [pdf]
F 04/27 Drop Date
Week 12
Notes 6
M 04/30 Principle-based Parsing   [ppt] [pdf]
W 05/02 Classical Models of Machine Translation: an Overview   [ppt] [pdf]
Week 13
M 05/07 Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT, I [ppt] [pdf]
W 05/09 Language learning I   [ppt] [pdf]
Week 14
Notes 7
M 05/14 Computational Models of Language Learning [ppt] [pdf]
W  05/16 Final project due.  Computational Models of Language Change and the Origins of Language

Grading Weights:

Assignments (1-6) 70%
Final Project 20%
Class Participation 10%