6.863J: Natural Language & the Computer Representation of Knowledge

Spring 2002

Where: 4-149
When: MW, 9:30AM-11:00AM
Laboratory time to be arranged


Instructor: Robert C. Berwick
Email: berwick@ai.mit.edu
Office: 35-423
Phone: (617) 253-8918
Office hours: Weds 12-2 and by appointment

TA: Sourabh Niyogi
Email: niyogi@mit.edu
Web: http://www.mit.edu/~niyogi/
Office: 35-419
Phone: (617) 253-7255
Office hours: T 12-2 (35-419)



New on this course's web pages:

Prerequisites & Relation to Other Courses:

Students should have some programming experience in a programming language such as Scheme, Lisp, C, C++, Java, and/or Perl. 6.034 is listed as a prerequisite but can be waived by permission of the instructor.

The material covered in this course is selected so that, by the end, you should be able to understand current papers in the field of Natural Language Processing (NLP). No background in NLP is necessary. All lectures will be published on this page in PowerPoint (ppt), Adobe PDF (pdf), and PostScript (ps) form; the latter two are more useful for downloading and printing. If you do not have Adobe Acrobat Reader for pdf files on your computer, you can download it from www.adobe.com.

Readings:


Assignments & Due Dates:

Assignments

This course is lab-oriented; that is, the work of the course is done via a series of laboratory exercises. These will be handed out approximately every two weeks. There are no exams; in particular, there will be no final exam. The final project will involve an element of non-determinism, i.e., so-called 'free will', in that you will be able to choose your own project and combine elements from the previous laboratories, or do something completely new. For the final project, we will have people work in teams of 2 or 3 (but not more).

The laboratory exercises are designed to be carried out on Athena. If you are clever and adventuresome, you are certainly free to download the software used and get it running on your own PC/laptop, but this must be on 'your own nickel' - i.e., we cannot guarantee that you will succeed, nor can we offer technical support for doing so.

Turning in the Assignments

Policies

The Assignments

No. | Due date | Task | Resources
#1a | Feb 20 | Word Formation 1a: Introduction | PC-Kimmo; www.sil.org/computing/catalog/pc-kimmo; Users' guide: http://www.sil.org/pckimmo/v2/doc/guide.html; See also: http://www.sil.org/pckimmo/
#1b | Feb 27 | Word Formation 1b | spanish.rec; Spanish conjugator here; see overview & sample here
#2 | Mar 18 | Word Tagging | Brill part-of-speech tagger; web version of Brill tagger here; paper on Brill tagger here (ps); HMM tagger paper here (ps)
#3 | Apr 1 | Phrase parsing, Part 1 | deMarcken parser; documentation here [pdf]; sentences here
#4 | Apr 18 | Semantic Interpretation | Syntax-directed semantic interpretation system
#5 | Apr 22 | Probabilistic parsing |
#6 | May 15 | Final project | Your own design

Additional resources:


Tentative Course Schedule:

Week 1
INTRODUCTION: the NLP enterprise, from words to meaning
Reading in Textbook or Notes
W 02/06 Introduction, Organization, Homeworks. Course Overview: Intro to NLP. Main Issues; FSAs [ppt] [pdf] [ps] For fun, try this link: postmodern
pp. 1-57; Notes 1
Week 2
WORD MODELING: automata and linguistics
pp. 58-90 
M 02/11 Linguistics: Phonology and Morphology I; 2-level morphology, Kimmo. Notes 2 here: [pdf] [ps] Lab 1a here: [pdf] [ps]
 Notes 2
W 02/13 Linguistics: Phonology and Morphology II. [ppt] [pdf]
pp. 287-321
Week 3
WORD MODELING: statistical approaches & part of speech tagging
 
T 02/19 President's Day Class (MIT turns Tuesday into Monday). Kimmo in detail: lab 1b [ppt] [pdf] [ps] Lab 1b here: [pdf] [ps]
pp. 235-284
W 02/20 HMM Tagging; Statistical Transformation Rule-Based Tagging; Precision, Recall, Accuracy. [ppt] [pdf][ps]
For fun (and learning) try this link to an online Brill tagger here. Note: Default tagging is for Swedish. Click 'English' and click tracing 'on' if you want to see how it works.
Week 4
LINGUISTICS & GRAMMARS; PARSING ALGORITHMS I
 Notes 3[pdf] [ps]
M 02/25 Tagging: the Brill Tagger [pdf][ps]
pp. 357-394
W 02/27 Introduction to Parsing; Linguistics: Syntax & Parsing. Lab 2 here: [pdf] [ps]
Week 5
PARSING ALGORITHMS II
 Notes 4[pdf][ps]
M 03/04 Shift-Reduce Parsers in Detail. Earley's Algorithm and Chart Parsing  [ppt] [pdf]
W 03/06 Context-free Parsing and Beyond: Efficiency Issues; Feature-based parsing; NL system design [ppt][pdf]
pp. 477-498
F 03/08 Add Date
Week 6
PARSING ALGORITHMS, CONTD
 
M 03/11 Shift-Reduce Parsers in Detail. Earley's Algorithm and Chart Parsing  [ppt] [pdf]  
W 03/13  
Week 7
FEATURE Parsing; TREE BANKS & PROBABILISTIC PARSING
pp. 395-446; pp. 447-476
M 03/18 Writing Grammars  [ppt] [pdf]
W 03/20 Feature Grammars  [ppt] [pdf]
3/25-3/29 No classes - Spring Break. Go to someplace warm if you can.
Week 8
SEMANTIC INTERPRETATION
Notes 5
M 04/01 PCFG learning: inside-outside algorithm   [ppt] [pdf]  
W 04/03 Semantic Interpretation I: compositionality  [ppt] [pdf]
pp. 501-544
Week 9
 SEMANTICS II
 
M 04/08 Semantic Interpretation II: compositionality and quantifiers   [ppt] [pdf]
 pp. 545-588
W 04/10 Semantic Interpretation III  [ppt] [pdf]
Week 10
WORDS & LEXICAL SEMANTICS, I
pp. 589-630
M 04/15 No class - Patriots' Day. Run the Boston Marathon if you can.
W 04/17 Lexical Semantics I  [ppt] [pdf]  
Week 11
WORDS & LEXICAL SEMANTICS, II
M 04/22 Determiners and Quantifiers  [ppt] [pdf]
W 04/24 Determiners & Quantifiers, II    [ppt] [pdf]
pp. 631-666
Th 04/25 Drop Date
Week 12
MACHINE TRANSLATION, I
Notes 6
M 04/29 Principle-based Parsing   [ppt] [pdf]
W 05/01 Classical Models of Machine Translation: an Overview   [ppt] [pdf]
pp. 799-830
Week 13
MACHINE TRANSLATION, II
M 05/06 Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT, I   [ppt] [pdf]
W 05/08 Language learning I   [ppt] [pdf]
Week 14
EVOLUTIONARY MODELS OF LANGUAGE LEARNING & ORIGINS
Notes 7
M  05/13 Computational Models of Language Learning   [ppt] [pdf]
W  05/15 Final project due.  Computational Models of Language Change and the Origins of Language


Grading Weights:


Assignments (1-6) 70%
Final Project 20%
Class Participation 10%