Multilingual Conversational System Research

9807-11

Progress Report: July 1, 1999–December 31, 1999

James Glass and Stephanie Seneff

Project Overview

The long-term goals of this research project are to foster collaboration between MIT and NTT speech and language researchers and to develop language-independent approaches to speech understanding and generation. We will initiate this effort by developing the human language technologies needed to port our conversational interfaces from English to Japanese. The Jupiter weather information system will be used as the basis of this porting process. This work will involve close collaboration with NTT researchers both in Japan and at MIT.

Progress Through December 1999

During this period, we have made major progress in the development of Mokusei, the Japanese version of the Jupiter system. A laboratory prototype now exists and is ready to be deployed on a wider scale.

An initial speech recognizer was created with a vocabulary of approximately 2200 words. The pronunciation dictionary was generated by rule, and a set of Japanese phonological rules was developed by examining recognition alignments on Japanese speech recorded for this domain. The acoustic models were initialized from a generic set of English models and then trained on a set of 2000 Japanese read sentences for the Mokusei domain. Currently, an interpolated set of English and Japanese models is used in the deployed system.
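To illustrate what interpolating two sets of acoustic models can mean, the following is a minimal sketch that assumes a simple linear interpolation of likelihoods. The report does not state how the interpolation is actually performed; the one-dimensional Gaussian models, the function names, and the weight parameter here are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian (a stand-in for a real acoustic model)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def interpolated_likelihood(x, english, japanese, weight_japanese):
    """Linearly interpolate the likelihoods of two models for the same phonetic unit.

    `english` and `japanese` are (mean, variance) pairs. `weight_japanese` is an
    assumed interpolation weight in [0, 1]; the English model gets the remainder.
    """
    if not 0.0 <= weight_japanese <= 1.0:
        raise ValueError("weight_japanese must lie in [0, 1]")
    p_en = gaussian_pdf(x, *english)
    p_ja = gaussian_pdf(x, *japanese)
    return weight_japanese * p_ja + (1.0 - weight_japanese) * p_en
```

With a weight of 1.0 the score comes from the Japanese model alone, and with 0.0 from the English model alone. Intermediate weights let the well-trained English models smooth Japanese models estimated from a small training set.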

We are exploring two different language models: a traditional class n-gram and a simple context-free grammar derived automatically from our natural language grammar.
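As an illustration of the class n-gram idea, the sketch below estimates a class bigram, in which the probability of a word given its predecessor is factored as P(class | previous class) × P(word | class). It is a minimal, unsmoothed example under stated assumptions: the toy corpus, the CITY class, and all function names are hypothetical and are not taken from the Mokusei recognizer.

```python
from collections import Counter


def train_class_bigram(sentences, word_to_class):
    """Return a function prob(word, prev_word) from maximum-likelihood counts.

    Words missing from `word_to_class` are treated as their own singleton class.
    """
    class_bigrams = Counter()   # counts of (previous class, class) pairs
    class_context = Counter()   # counts of each class used as a left context
    word_counts = Counter()     # counts of (word, class) pairs
    class_counts = Counter()    # counts of each class

    for words in sentences:
        classes = ["<s>"] + [word_to_class.get(w, w) for w in words] + ["</s>"]
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_context[prev] += 1
        for w in words:
            c = word_to_class.get(w, w)
            word_counts[(w, c)] += 1
            class_counts[c] += 1

    def prob(word, prev_word):
        c = word_to_class.get(word, word)
        prev_c = word_to_class.get(prev_word, prev_word)
        if class_context[prev_c] == 0 or class_counts[c] == 0:
            return 0.0  # unseen context or class; a real model would smooth here
        p_class = class_bigrams[(prev_c, c)] / class_context[prev_c]
        p_word = word_counts[(word, c)] / class_counts[c]
        return p_class * p_word

    return prob
```

Sharing statistics across a class means that a city seen only once in training still benefits from every context in which any other city appeared, which matters when the training set is small.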

We have completely redesigned the grammar for natural language parsing, making use of a "trace" mechanism to greatly improve the parsing speed. Major constituents are first parsed into a shallow parse tree and later dropped off after a look-ahead has revealed the appropriate structural role. This is far more efficient than pre-parsing into every possible role and later pruning the theories that are inappropriate. We have also expanded the coverage substantially, so that the grammar now parses over 95% of a set of nearly 2400 training sentences.

We have continued to refine our speech generation component, which translates English weather reports into Japanese. After identifying some limitations of this component, we decided to redesign our GENESIS system to be more effective for Japanese generation. An initial version of the new GENESIS system is now operational, and we are beginning to create a set of generation rules for Japanese weather reports.

We have expanded our content to include nearly 50 cities in Japan. We have also added support for a geographical hierarchy for Japan, so that the system can, for example, list the cities that it knows in Kansai.
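A geographical hierarchy of this kind can be sketched as a mapping from each region to its sub-regions or cities, with a recursive lookup to list the known cities. The data structure, the function name, and the small set of regions and cities below are illustrative assumptions only and do not reflect the actual Mokusei database.

```python
# Hypothetical hierarchy: each key is a region, each value lists its children,
# which are either sub-regions (also keys) or cities (leaves).
HIERARCHY = {
    "Japan": ["Kansai", "Kanto"],
    "Kansai": ["Osaka", "Kyoto", "Kobe"],
    "Kanto": ["Tokyo", "Yokohama"],
}


def cities_in(region, hierarchy=HIERARCHY):
    """Return all cities known under `region`, descending through sub-regions."""
    if region not in hierarchy:
        raise KeyError(f"unknown region: {region}")
    cities = []
    for child in hierarchy[region]:
        if child in hierarchy:
            cities.extend(cities_in(child, hierarchy))  # child is a sub-region
        else:
            cities.append(child)                        # child is a city
    return cities
```

A query such as "which cities do you know in Kansai" would then reduce to a call like `cities_in("Kansai")`, while the same function applied to "Japan" collects the cities of every sub-region.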

A complete end-to-end Mokusei system became functional in late November. Since then, we have collected over 1000 utterances from native Japanese speakers, and we are using those data to refine every aspect of the system.

We have developed several different configurations of the system that allow us to easily assess its performance. In one configuration, the user speaks to the system in Japanese and it answers in English. This configuration is useful for system developers who do not understand Japanese. We also have a batch-mode configuration that allows us to reprocess user queries through a later version of the system.

Research Plan for the Next Six Months

Now that a prototype system is available, we will devote major effort to acquiring and utilizing speech data from native Japanese speakers. We will expand the natural language grammar to accommodate novel sentence expressions, and we will retrain the speech recognizer as more data become available. We will continue to explore the idea of acquiring a language model for the recognizer automatically from the natural language grammar. We will devote significant effort to translation of the weather reports into Japanese, making use of the new version of GENESIS, which we will continue to develop.