The Communications Linker System: An Overview

John C. Mallery
Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Prepared for presentation to the panel on Networked Political Communication at The 1994 Meeting of the American Political Science Association, New York City, September 1, 1994.

URL: http://www.ai.mit.edu/projects/iiip/doc/comlink/overview.html


Introduction

During the 1992 presidential campaign, the author developed an email-based communications system that was used to distribute campaign information for five presidential campaigns, provide various interactive services to citizens, and run three automatic surveys over the Internet. The system was centered around a database of persistent objects that represented the information necessary to route documents to people, route their questions or suggestions to campaigns, or store their responses to surveys.

Starting in March 1993, the author ``decampaignified'' the original system, reengineering it into a general substrate for fielding interactive applications over email and the World-Wide Web (WWW). The new system was called the Communications Linker System (COMLINK) because it is a communications system that seeks to connect people with information or other people. The system incorporates a Common Lisp HTTP Server, which makes WWW interfaces for applications possible. To date, the main applications of COMLINK have centered on distribution and retrieval of documents as well as survey research. This paper gives an overview of some of the technology involved and discusses two applications: the White House Publications System and automatic survey research.

White House Publications System

The White House Publications System is charged with routing documents to people according to their interests. Whereas the campaign system allowed people to add themselves to five mailing lists (news, speeches, economy, foreign, social), the COMLINK system replaces the concept of a mailing list with a distribution taxonomy.
  1. Taxonomic Document Routing: Mailing lists are static and offer no way to mix and match based on the content of the document stream. The taxonomic document routing in the new system allows people to subscribe to what are essentially boolean combinations of categories. This lets people not only combine different streams but also suppress, within those streams, certain types of documents that are not of interest to them. Although mailing lists themselves could be organized as a taxonomy (and this was implemented in the campaign system), the full cross product of categories with negation is exponential in the number of categories, and so undesirable on technical grounds. Routing documents on the basis of a boolean match of the document's categories to the user's subjects does not have this technical drawback and is highly efficient.

  2. Automatic Subscription Maintenance: The COMLINK system maintains a persistent representation of objects involved in document distribution, including users, documents, categories, and user subscriptions (document selectors). Interfaces via subject line commands and automatic form processing allow users to edit their subscriptions. Because boolean queries pose a barrier to access by non-technical people, subscription cliches for popular document streams are provided as a simpler subscription interface.

  3. Publications Interface: A graphical user interface to the taxonomy is used to distribute documents. At first, machine learning techniques will be used to guess default categories, but an operator will correct any mistakes, thereby ensuring more accurate coding and providing feedback to the learning algorithm.

  4. Failed Mail Processing: Whenever documents are distributed over email to a large population, there is a steady stream of failed mail that bounces back to the distribution hub. Howard Shrobe and Mark Nahabedian developed a rule-based system that helps an operator remove subscriptions for email addresses that no longer work and track down problems with specific mail drops.

  5. Document Retrieval: If documents can be distributed, they can also be retrieved on demand. The same infrastructure used to route documents to subscribers is reused to support document retrieval. Documents can be retrieved via forms over email or via a fill-out form interface on the World-Wide Web. An email interface also supports document retrieval based on the full text of documents. At present, email forms on the publications server rely on WAIS document retrieval from the University of North Carolina.

  6. Standing Usage Survey: The publications server provides access to a standing survey designed to determine how the documents are being used.
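
The boolean matching described under Taxonomic Document Routing can be illustrated with a short sketch. It is written in Python for brevity and is not COMLINK's code, which is Common Lisp. The names cat, all_of, any_of, none_of, Subscription, and route are hypothetical, and the category names are borrowed from the five campaign mailing lists purely as sample data. Each subscription holds one selector, and routing a document evaluates every selector once against the document's category set, so no list needs to exist for each combination of categories.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List

# A selector is a predicate over the set of categories attached to a document.
# (Hypothetical representation; the paper does not specify COMLINK's own.)
Selector = Callable[[FrozenSet[str]], bool]


def cat(name: str) -> Selector:
    """Match documents filed under the category `name`."""
    return lambda cats: name in cats


def all_of(*selectors: Selector) -> Selector:
    """Boolean AND of selectors."""
    return lambda cats: all(s(cats) for s in selectors)


def any_of(*selectors: Selector) -> Selector:
    """Boolean OR of selectors."""
    return lambda cats: any(s(cats) for s in selectors)


def none_of(*selectors: Selector) -> Selector:
    """Negation: suppress documents matching any of the selectors."""
    return lambda cats: not any(s(cats) for s in selectors)


@dataclass
class Subscription:
    address: str
    selector: Selector


def route(doc_categories: Iterable[str], subscriptions: List[Subscription]) -> List[str]:
    """Return the addresses whose selector matches the document's categories."""
    cats = frozenset(doc_categories)
    return [s.address for s in subscriptions if s.selector(cats)]


# Sample data (assumed for illustration only).
subs = [
    # Economy documents, but not speeches.
    Subscription("a@example.org", all_of(cat("economy"), none_of(cat("speeches")))),
    # Anything on foreign or social policy.
    Subscription("b@example.org", any_of(cat("foreign"), cat("social"))),
]

print(route({"economy", "news"}, subs))      # ['a@example.org']
print(route({"economy", "speeches"}, subs))  # []
```

The cost of routing one document in this sketch grows with the number of subscriptions and the size of their selectors, not with the number of possible category combinations.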

Automatic Survey Research

The automatic survey facility in COMLINK supports the full survey cycle from design to administration to analysis. It allows researchers to survey populations accessible via email (and soon via WWW). For now, email is preferred because it carries a minimal level of user authentication, the email address. At present, there are no reliable ways to authenticate respondents over WWW.
  1. Survey Research Based on Automatic Form Processing: An automatic survey facility was implemented as an application of automatic form processing. Users receive questionnaires as email forms, and when they return the forms, the server automatically records their answers in the database. Each question can use specialized presentation types that restrict answers to either closed-form choices or typed input (e.g., dates, ZIP codes, ...).

  2. Hierarchical and Adaptive Questions: The survey system incorporates an if-then rule system that is used to branch between survey instruments. This facility allows the survey designer to write if-then rules that will administer any number of subsequent instruments based on the respondent's answers to any previous questions. Hierarchical branching is a way to ask follow-up questions that are tailored for the specific respondent without forcing everyone else to look at irrelevant or inappropriate questions.

  3. User-Friendly Follow-Up for Failing or Omitted Queries: The form processing system checks each answer to make sure that it conforms to the legal answers or class of answers for the question. Whenever a mistake is made answering a question or a required question is omitted, the system automatically retries only these incomplete questions. To help the user, the retry email message is prefaced with an explanation of what was wrong with each question that is being retried.

  4. Interfaces for Rapid Survey Design, Release, and Analysis: Benjamin Renaud has developed a window interface for survey design that speeds up the process and nearly eliminates the possibility of technical errors in survey designs. Another interface, developed primarily by Mallery, is available for a survey operator to release surveys, monitor progress, perform rudimentary analyses, and convert returns into different data formats.

  5. Machine Learning Algorithms to Explore Returns: The Feature Vector Editor is available for learning if-then rules from survey returns.

  6. Acquiring Relational Models of People's Beliefs and Norms: Future research will examine the possibility of acquiring relational information, much like the example for international conflict management.
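
The if-then branching described under Hierarchical and Adaptive Questions might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the rule language COMLINK actually uses. Rule, next_instruments, the question names, and the instrument names are all hypothetical. Each rule pairs a condition over the answers collected so far with the name of a follow-up instrument to administer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Answers recorded so far, keyed by question name (assumed representation).
Answers = Dict[str, object]


@dataclass
class Rule:
    condition: Callable[[Answers], bool]  # the "if" part
    instrument: str                       # the "then" part: instrument to send


def next_instruments(rules: List[Rule], answers: Answers) -> List[str]:
    """Return follow-up instruments whose conditions hold, in rule order,
    listing each instrument at most once."""
    selected: List[str] = []
    for rule in rules:
        if rule.condition(answers) and rule.instrument not in selected:
            selected.append(rule.instrument)
    return selected


# Hypothetical rules for illustration only.
rules = [
    Rule(lambda a: a.get("uses_email_daily") == "yes", "email-habits"),
    Rule(lambda a: a.get("voted_1992") == "no", "non-voter-followup"),
    Rule(lambda a: a.get("age", 0) >= 65, "senior-followup"),
]

print(next_instruments(rules, {"uses_email_daily": "yes", "age": 70}))
# ['email-habits', 'senior-followup']
```

Respondents whose answers trigger no rule receive no follow-up instruments, which is how branching spares them irrelevant questions.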

Automatic Form Processing

The COMLINK system interacts with users via textual forms exchanged in email, as well as conventional ``subject line'' commands, much like those found in standard listservers. Email servers are associated with a command interface that provides access to all the forms and subject line commands.

Forms are composed of a series of queries. In addition to a question or instructions, a query has an associated CLIM presentation type that presents any default value and parses any new value supplied by the user. Forms are written to a stream by calling the WRITE-FORM function with a set of value bindings for each query of the form. As WRITE-FORM calls the generic operation to present each query, the queries present themselves by calling the PRESENT method for their presentation type.

Forms are parsed by finding queries and converting the textual input associated with each query into its internal representation. The basic procedure for parsing a query is:

When query values are successfully parsed, the form's response function is applied to the parsed query values to perform the computation associated with the form. If there are query parsing errors, the system returns to the user a form with all the correct values defaulted and an explanation about how to correct the failing queries for successful resubmission.
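
The write, parse, and respond cycle described in this section can be sketched in a few lines of Python. This is a simplified illustration, not the COMLINK implementation (which is built on CLIM presentation types in Common Lisp) and not the paper's own parsing procedure. The "name: value" text layout, the Query class, and the functions write_form, parse_form, and process_form are all assumptions made for the example. On a parsing error the sketch returns the form with the correct values defaulted and an explanation of each failing query, as the text describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Query:
    name: str
    prompt: str
    parse: Callable[[str], object]  # raises ValueError on illegal input
    required: bool = True


def parse_choice(*choices: str) -> Callable[[str], str]:
    """Build a parser that accepts only one of the given closed choices."""
    def parse(text: str) -> str:
        value = text.strip().lower()
        if value not in choices:
            raise ValueError("expected one of: " + ", ".join(choices))
        return value
    return parse


def parse_zip(text: str) -> str:
    """Parse a 5-digit ZIP code."""
    value = text.strip()
    if not (len(value) == 5 and value.isascii() and value.isdigit()):
        raise ValueError("expected a 5-digit ZIP code")
    return value


def write_form(queries: List[Query], values: Dict[str, object]) -> str:
    """Write each query as a comment line plus a 'name: default' line."""
    lines = []
    for q in queries:
        lines.append(f"# {q.prompt}")
        lines.append(f"{q.name}: {values.get(q.name, '')}")
    return "\n".join(lines)


def parse_form(queries: List[Query], text: str) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Find each query's text in the reply and convert it to a parsed value."""
    raw: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        raw[name.strip()] = value.strip()
    values: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    for q in queries:
        answer = raw.get(q.name, "")
        if not answer:
            if q.required:
                errors[q.name] = "an answer is required"
            continue
        try:
            values[q.name] = q.parse(answer)
        except ValueError as exc:
            errors[q.name] = str(exc)
    return values, errors


def process_form(queries: List[Query], text: str, respond: Callable[[Dict[str, object]], object]):
    """Apply the response function on success; otherwise return a retry form."""
    values, errors = parse_form(queries, text)
    if not errors:
        return respond(values)
    explanation = "\n".join(f"# Problem with {n}: {msg}" for n, msg in errors.items())
    return explanation + "\n" + write_form(queries, values)


# Hypothetical two-query form for illustration.
queries = [
    Query("party", "Party affiliation?", parse_choice("democrat", "republican", "independent")),
    Query("zip", "Your ZIP code?", parse_zip),
]
print(process_form(queries, "party: Independent\nzip: 0213", lambda v: "recorded"))
```

In this sketch the retry message keeps the successfully parsed party answer as a default and leaves only the ZIP code blank for resubmission.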

Graphical Form Authoring Tool

A graphical form authoring tool was written by Renaud (1994) for the COMLINK system. Coded in CLIM, the interface defines meta-level abstractions for forms, queries, and presentation types that allow users to define automatic surveys and forms without having to write LISP code. At the same time, the data structures are abstracted in such a way that forms can be defined dynamically under program control. For the set of presentation types previously defined for COMLINK, Renaud defined new presentation and accept multimethods that dispatch on the HTML presentation view. This means that the same presentation type, which already worked for the email view, could now operate for the HTML view, displaying itself in HTML and accepting its input with the HTML form-processing facilities.
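
The idea of one presentation type rendering itself differently per view can be approximated outside CLIM with a small dispatch table keyed on both the presentation type and the view. The Python sketch below is only an analogy to the multimethods described above. The classes EmailView, HtmlView, and Choice, the present_method decorator, and the present function are hypothetical names and do not reproduce the CLIM or COMLINK interfaces.

```python
import html
from typing import Callable, Dict, List, Tuple


class View: ...
class EmailView(View): ...
class HtmlView(View): ...


class PresentationType: ...


class Choice(PresentationType):
    """A closed-choice query (hypothetical stand-in for a presentation type)."""
    def __init__(self, name: str, options: List[str]):
        self.name = name
        self.options = options


# Dispatch table keyed on (presentation type class, view class).
_PRESENTERS: Dict[Tuple[type, type], Callable[..., str]] = {}


def present_method(ptype: type, view: type):
    """Register a presenter for one (presentation type, view) pair."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        _PRESENTERS[(ptype, view)] = fn
        return fn
    return register


def present(obj: PresentationType, default: str, view: View) -> str:
    """Dispatch on both arguments, walking each class's ancestors in order."""
    for p in type(obj).__mro__:
        for v in type(view).__mro__:
            fn = _PRESENTERS.get((p, v))
            if fn is not None:
                return fn(obj, default, view)
    raise TypeError("no presenter for this presentation type and view")


@present_method(Choice, EmailView)
def present_choice_email(c: Choice, default: str, view: EmailView) -> str:
    return f"{c.name} ({' | '.join(c.options)}): {default}"


@present_method(Choice, HtmlView)
def present_choice_html(c: Choice, default: str, view: HtmlView) -> str:
    parts = []
    for option in c.options:
        selected = " selected" if option == default else ""
        parts.append(f"<option{selected}>{html.escape(option)}</option>")
    return f'<select name="{html.escape(c.name)}">' + "".join(parts) + "</select>"


party = Choice("party", ["yes", "no"])
print(present(party, "no", EmailView()))  # party (yes | no): no
print(present(party, "no", HtmlView()))
```

Adding a new view in this sketch means registering new presenters without touching the Choice class, which mirrors how the existing presentation types could gain an HTML rendering alongside their email one.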

Conclusions

The COMLINK system provides a flexible and general environment for fielding applications that rely on automatic form processing over email and the World-Wide Web. The White House Publications System routes documents to people via a taxonomy of categories and thus allows people to tune their subscriptions more finely than would be possible with conventional listserv technology. The automatic survey system makes it possible to run hierarchical adaptive surveys over large populations very quickly and very inexpensively. The automatic form processing in COMLINK provides a general framework for interactivity.

Acknowledgments

This paper was improved by comments from ?. This paper describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the M.I.T. Artificial Intelligence Laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under contract number MDA972-93-1-003N7.