The Communications Linker System: An Overview
John C. Mallery
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Prepared for presentation to the panel on Networked
Political Communication at the 1994 Meeting of
the American Political Science
Association, New York City, September 1, 1994.
URL: http://www.ai.mit.edu/projects/iiip/doc/comlink/overview.html
Introduction
During the 1992 presidential campaign, the author developed an email-based
communications system that was used to distribute campaign information for
five presidential campaigns, provide various interactive services to citizens,
and run three automatic surveys over the Internet. The system was centered
around a database of persistent objects that represented the information
necessary to route documents to people, route their questions or suggestions
to campaigns, or store their responses to surveys.
Starting in March 1993, the author "decampaignified" the original
system, reengineering it into a general substrate for fielding interactive
applications over email and the World-Wide Web (WWW). The new system was
called the Communications Linker System (COMLINK) because it is a
communications system that seeks to connect people with information or with
other people. The system incorporates a Common Lisp HTTP Server, which makes
WWW interfaces possible for applications. To date, the main applications of
COMLINK have centered on the distribution and retrieval of documents and on
survey research.
This paper gives an overview of some of the technology involved and discusses
two applications:
- White House Publications System: This application was developed to
serve as a main hub for electronically publishing the daily releases of the
Office of Media Affairs. See Publications@Research.AI.MIT.EDU.
- Automatic Surveying over the Internet: This application was
developed to run automatic surveys over the Internet, in particular to
determine how many people receive the White House documents on a daily basis.
See Surveys@Research.AI.MIT.EDU. It is currently running surveys on working
women for the Department of Labor and on federal workers for the National
Performance Review. See Surveys@Town-Hall.AI.MIT.EDU.
White House Publications System
The White House Publications System is charged with routing documents to people
according to their interests. Whereas the campaign system allowed people to
add themselves to five mailing lists (news, speeches, economy, foreign,
social), the COMLINK system replaces the concept of a mailing list with a distribution
taxonomy.
- Taxonomic Document Routing: Mailing lists are static and cannot mix and
match based on the content of the document stream. The taxonomic document
routing in the new system allows people to subscribe to what are essentially
boolean combinations of categories. This lets people not only combine
different streams but also suppress, within those streams, certain types of
documents that do not interest them.
Although mailing lists themselves could be organized as a taxonomy (and this
was implemented in the campaign system), the full cross product of categories
with negation grows exponentially with the number of categories, which makes
it undesirable on technical grounds. Routing documents on the basis of a
boolean match of the document's categories against the user's subjects does
not have these technical drawbacks and is highly efficient.
- Top-down Distribution: In the current distribution model,
categories are assigned to documents as they are released, and users subscribe
on the basis of these categories. There are about 160 categories, which are
designed to be relatively invariant over time.
- Bottom-up by Words and Phrases: A predefined taxonomy cannot
anticipate all the precise interests of users. Consequently, research is
underway to allow subscribers to receive documents containing specific words
or phrases. Initial approaches will employ statistical and machine learning
methods to match documents, whereas later approaches will apply natural
language understanding technologies.
- Automatic Subscription Maintenance: The COMLINK system maintains a
persistent representation of the objects involved in document distribution,
including users, documents, categories, and user subscriptions (document
selectors). Interfaces based on subject line commands and automatic form
processing allow users to edit their subscriptions. Because boolean queries
are a barrier for non-technical people, subscription cliches for popular
document streams are provided as a simpler subscription interface.
- Publications Interface: A graphical user interface to the
taxonomy is used to categorize documents for distribution. At first, machine
learning techniques will be used to guess default categories, but an operator
will correct any mistakes, thereby ensuring more accurate coding and providing
feedback to the learning algorithm.
- Failed Mail Processing: Whenever documents are distributed
over email to a large population, there is a steady stream of failed mail that
bounces back to the distribution hub. Howard Shrobe and Mark Nahabedian
developed a rule-based system that helps an operator remove subscriptions for
email addresses that no longer work and track down problems with specific
mail drops.
- Document Retrieval: If documents can be distributed, they can
also be retrieved on demand. The same infrastructure used to route documents
to subscribers is reused to support document retrieval. Documents can be
retrieved via forms over email or via a fill-out form interface on the
World-Wide Web. An email interface also supports full-text retrieval of
documents. At present, email forms on the publications server rely on a WAIS
document retrieval service at the University of North Carolina.
- Standing Usage Survey: The publications server provides access
to a standing survey designed to determine how the documents are being used.
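The boolean routing and retrieval described above can be illustrated with a minimal Python sketch (COMLINK itself is written in Common Lisp). It assumes a selector is a nested tuple of `and`, `or`, and `not` over category names; the selector representation, the functions `matches`, `route`, and `retrieve`, the sample category names, and the cliche in `CLICHES` are all hypothetical. The same matcher handles routing (one document against every subscription) and retrieval (one selector against every stored document).

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical selector representation (not COMLINK's Lisp objects):
# a category name, or ("and", s1, s2, ...), ("or", s1, s2, ...), ("not", s).
Selector = Union[str, Tuple]


def matches(selector: Selector, categories: FrozenSet[str]) -> bool:
    """Return True if a document tagged with `categories` satisfies `selector`."""
    if isinstance(selector, str):
        return selector in categories
    op, *args = selector
    if op == "and":
        return all(matches(arg, categories) for arg in args)
    if op == "or":
        return any(matches(arg, categories) for arg in args)
    if op == "not":
        (arg,) = args
        return not matches(arg, categories)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical subscription cliche: a ready-made selector for a popular stream
# that takes economic documents but suppresses press briefings.
CLICHES: Dict[str, Selector] = {
    "economy-without-briefings": ("and", "economy", ("not", "press-briefing")),
}


def route(categories: FrozenSet[str], subscriptions: Dict[str, Selector]) -> List[str]:
    """Routing: every subscriber whose selector matches one document."""
    return sorted(addr for addr, sel in subscriptions.items() if matches(sel, categories))


def retrieve(selector: Selector, documents: Dict[str, FrozenSet[str]]) -> List[str]:
    """Retrieval: reuse the same matcher to find stored documents for one selector."""
    return sorted(doc for doc, cats in documents.items() if matches(selector, cats))


if __name__ == "__main__":
    subs = {
        "alice@example.org": CLICHES["economy-without-briefings"],
        "bob@example.org": ("or", "foreign-policy", "speech"),
    }
    doc = frozenset({"economy", "press-briefing", "speech"})
    print(route(doc, subs))  # prints ['bob@example.org']
```

Matching costs time proportional to the size of each selector, so nothing has to be precomputed per combination of categories; by contrast, one mailing list for every conjunction of categories and their negations would need 3^n lists for n categories.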
Automatic Survey Research
The automatic survey facility in COMLINK supports the full survey cycle from
design to administration to analysis. It allows researchers to survey
populations accessible via email (and soon via the WWW). For now, email is
preferred because the email address provides a minimal level of user
authentication; at present, there is no reliable way to authenticate
respondents over the WWW.
- Survey Research Based on Automatic Form Processing: The automatic
survey facility is implemented as an application of automatic form processing.
Users receive questionnaires as email forms, and when they return the forms,
the server automatically records their answers in the database. Each question
can use specialized presentation types that restrict answers to either
closed-form choices or typed input (e.g., dates, zip codes, ...).
- Hierarchical and Adaptive Questions: The survey system
incorporates an if-then rule system that branches between survey
instruments. This facility allows the survey designer to write if-then rules
that administer any number of subsequent instruments based on the
respondent's answers to any previous questions. Hierarchical branching is a
way to ask follow-up questions tailored to the specific respondent
without forcing everyone else to look at irrelevant or inappropriate
questions.
- User-Friendly Follow-Up for Failing or Omitted Queries: The
form processing system checks each answer to make sure that it conforms to the
legal answers or class of answers for the question. Whenever a question is
answered incorrectly or a required question is omitted, the system
automatically retries only those questions. To help the user, the retry email
message begins with an explanation of what was wrong with each question being
retried.
- Interfaces for Rapid Survey Design, Release, and Analysis:
Benjamin Renaud has developed a windowed interface for survey design that
speeds up the process and nearly eliminates the possibility of technical
errors in survey designs. Another interface, developed primarily by Mallery,
allows a survey operator to release surveys, monitor progress, perform
rudimentary analyses, and convert returns into different data formats.
- Machine Learning Algorithms to Explore Returns: The Feature
Vector Editor is available for learning if-then rules from survey returns.
- Acquiring Relational Models of People's Beliefs and Norms:
Future research will examine the possibility of acquiring relational
information, much like the example for international
conflict management.
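The if-then branching between instruments can be sketched in Python as follows, assuming each rule pairs a predicate over all answers collected so far with the name of an instrument to administer. `Rule`, `next_instruments`, and the question and instrument names are hypothetical; this is not COMLINK's actual rule language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Answers collected so far, keyed by (hypothetical) question name.
Answers = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical if-then rule: if `condition` holds, administer `instrument`."""
    condition: Callable[[Answers], bool]
    instrument: str


def next_instruments(rules: List[Rule], answers: Answers, already_sent: Set[str]) -> List[str]:
    """Return the instruments triggered by the answers, skipping any already sent."""
    triggered: List[str] = []
    for rule in rules:
        if rule.instrument in already_sent or rule.instrument in triggered:
            continue
        # Conditions may look at the answer to any previous question.
        if rule.condition(answers):
            triggered.append(rule.instrument)
    return triggered


# Hypothetical rules for a workforce survey.
RULES = [
    Rule(lambda a: a.get("employed") == "yes", "job-details"),
    Rule(lambda a: a.get("employed") == "no", "job-search"),
    Rule(lambda a: a.get("employed") == "yes" and a.get("federal") == "yes", "agency-questions"),
]
```

In this sketch a server would call `next_instruments` each time a returned instrument is parsed and mail out whatever it returns, so each respondent sees only the follow-up instruments that their own answers trigger.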
Automatic Form Processing
The COMLINK system interacts with users via textual forms exchanged in
email, as well as conventional "subject line" commands much like those
found in standard listservers. Each email server is associated with a command
interface that provides access to all of its forms and subject line commands.
Forms are composed of a series of queries. In addition to a
question or instructions, each query has an associated CLIM presentation type
that presents any default value and parses any new value supplied by the user.
Forms are written to a stream by calling the WRITE-FORM function with a set of
value bindings, one for each query of the form. As WRITE-FORM invokes the
generic operation that presents each query, each query presents itself by
calling the PRESENT method for its presentation type.
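To make the division of labor between forms, queries, and presentation types concrete, here is a minimal Python sketch of writing an email form. It approximates CLIM presentation types with ordinary classes; `PresentationType`, `MemberOf`, `Date`, `Query`, `write_form`, and the `[name] prompt: value` line layout are assumptions for illustration, not COMLINK's Lisp interfaces or its actual delimiter syntax.

```python
import datetime
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO


class PresentationType:
    """Hypothetical stand-in for a CLIM presentation type (text/email view only)."""

    def describe(self) -> str:
        """Optional hint about legal answers, shown after the prompt."""
        return ""

    def present(self, value: object) -> str:
        """Render a default value as text; None leaves the answer blank."""
        return "" if value is None else str(value)


class MemberOf(PresentationType):
    """Closed-form choice among a fixed set of strings."""

    def __init__(self, *choices: str) -> None:
        self.choices = choices

    def describe(self) -> str:
        return f" ({'/'.join(self.choices)})"


class Date(PresentationType):
    """Typed input; this sketch assumes dates are written in ISO form."""

    def describe(self) -> str:
        return " (YYYY-MM-DD)"

    def present(self, value: object) -> str:
        if isinstance(value, datetime.date):
            return value.isoformat()
        return super().present(value)


@dataclass
class Query:
    name: str
    prompt: str
    ptype: PresentationType

    def present(self, value: object, stream: TextIO) -> None:
        # The query presents itself by delegating its value to its presentation type.
        stream.write(f"[{self.name}] {self.prompt}{self.ptype.describe()}: "
                     f"{self.ptype.present(value)}\n")


def write_form(queries: List[Query], bindings: Dict[str, object], stream: TextIO) -> None:
    """Write every query of the form, using `bindings` for default values."""
    for query in queries:
        query.present(bindings.get(query.name), stream)


# Hypothetical two-question form.
SURVEY = [
    Query("employed", "Are you currently employed?", MemberOf("yes", "no")),
    Query("start", "When did you start?", Date()),
]

if __name__ == "__main__":
    out = io.StringIO()
    write_form(SURVEY, {"employed": "yes"}, out)
    print(out.getvalue(), end="")
```

Here `write_form` never formats a value itself: each query delegates to its presentation type, so supporting a new kind of answer only requires a new presentation type class.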
Forms are parsed by finding queries and converting the textual input
associated with each query into its internal representation. The basic
procedure for parsing a query is:
- Scan: Locate the query name between special delimiters.
- Intern: Convert the query name into the query object associated with
the form.
- Accept: Parse the textual representation that follows the
delimited query name by calling the interned query's accept method, which in
turn calls the accept method for the associated presentation type.
- Handle Errors: If the query value fails to conform to the
requirements of the presentation type, signal a condition so that the user
can be asked to respecify the failing queries. Typed error objects carry the
information needed to explain to the user what was wrong and how to correct
it.
When all query values are parsed successfully, the form's response function is
applied to the parsed values to perform the computation associated with
the form. If any query fails to parse, the system returns to the user a form
in which all correctly parsed values are filled in as defaults, along with an
explanation of how to correct the failing queries so that the form can be
resubmitted.
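The four parsing steps and the resubmission behavior can be sketched in Python as below. The sketch assumes a hypothetical `[name] prompt: value` layout for each answer line and stands in for presentation-type accept methods with plain functions that raise `ValueError`; `QueryError`, `Query`, `parse_form`, `process_submission`, `member_of`, and `iso_date` are illustrative names, not COMLINK functions.

```python
import datetime
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical textual layout: each answer line is "[query-name] prompt: value".
QUERY_LINE = re.compile(r"^\[([^\]]+)\][^:\n]*:[ \t]*(.*)$", re.MULTILINE)


@dataclass
class QueryError:
    """Typed error carrying a user-facing explanation for one failing query."""
    query: str
    explanation: str


@dataclass
class Query:
    name: str
    accept: Callable[[str], object]  # stands in for the presentation type's accept method
    required: bool = True


def member_of(*choices: str) -> Callable[[str], object]:
    """Build an accept function for a closed-form choice."""
    def accept(text: str) -> object:
        if text.lower() not in choices:
            raise ValueError(f"answer must be one of: {', '.join(choices)}")
        return text.lower()
    return accept


def iso_date(text: str) -> object:
    """Accept function for typed date input."""
    try:
        return datetime.date.fromisoformat(text)
    except ValueError:
        raise ValueError("answer must be a date written as YYYY-MM-DD") from None


def parse_form(text: str, queries: Dict[str, Query]) -> Tuple[Dict[str, object], List[QueryError]]:
    values: Dict[str, object] = {}
    errors: List[QueryError] = []
    for match in QUERY_LINE.finditer(text):      # Scan: find delimited query names
        name, raw = match.group(1), match.group(2).strip()
        query = queries.get(name)                 # Intern: name -> query object
        if query is None or not raw:
            continue                              # unknown names ignored; blanks checked below
        try:
            values[name] = query.accept(raw)      # Accept: parse the textual value
        except ValueError as exc:                 # Handle errors: record a typed error
            errors.append(QueryError(name, str(exc)))
    failed = {e.query for e in errors}
    for name, query in queries.items():
        if query.required and name not in values and name not in failed:
            errors.append(QueryError(name, "this question is required"))
    return values, errors


def process_submission(text: str, queries: Dict[str, Query],
                       respond: Callable[[Dict[str, object]], str]) -> str:
    """Apply the response function, or build a retry form with correct values as defaults."""
    values, errors = parse_form(text, queries)
    if not errors:
        return respond(values)
    lines = ["Please correct the following and resubmit:"]
    lines += [f"  {e.query}: {e.explanation}" for e in errors]
    lines += [f"[{name}] : {values.get(name, '')}" for name in queries]
    return "\n".join(lines)
```

For example, a submission that answers `maybe` to a yes/no question comes back with that question's explanation and every correctly parsed answer already filled in, ready to be corrected and returned.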
Graphical Form Authoring Tool
A graphical form authoring tool was written by Renaud (1994) for
the COMLINK system. Coded in CLIM, the
interface defines meta-level abstractions for forms, queries, and presentation
types that allow users to define automatic surveys and forms without having to
write Lisp code. At the same time, the data structures are abstracted so that
forms can also be defined dynamically under program control. For the set
of presentation types previously defined for COMLINK, Renaud defined new
present and accept multimethods that dispatch on the HTML presentation view.
As a result, the same presentation type, which already worked in the email
view, could now operate in the HTML view, displaying itself in HTML and
accepting its input through the HTML form-processing facilities.
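Renaud's extension relies on CLIM methods that dispatch on both the presentation type and the view. Python has no built-in multimethods, so the sketch below emulates two-argument dispatch with a registry keyed by (presentation type class, view class), searching both class hierarchies under an assumed precedence of presentation type first. `defpresent`, `present`, `TextView`, `EmailView`, `HTMLView`, and `MemberOf` are hypothetical names; accept methods could be registered the same way.

```python
import html
from typing import Callable, Dict, Tuple

# Hypothetical registry of present methods keyed by (presentation type class, view class).
_PRESENT_METHODS: Dict[Tuple[type, type], Callable[..., str]] = {}


def defpresent(ptype_cls: type, view_cls: type) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a present method for one (presentation type, view) pair."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        _PRESENT_METHODS[(ptype_cls, view_cls)] = fn
        return fn
    return register


def present(ptype: object, value: object, name: str, view: object) -> str:
    """Dispatch on both argument classes, trying presentation-type classes first."""
    for p in type(ptype).__mro__:
        for v in type(view).__mro__:
            method = _PRESENT_METHODS.get((p, v))
            if method is not None:
                return method(ptype, value, name, view)
    raise TypeError(f"no present method for {type(ptype).__name__} in {type(view).__name__}")


class TextView:
    """Plain-text view used for email forms."""


class EmailView(TextView):
    """Email view; inherits the text view's methods."""


class HTMLView:
    """View for HTML fill-out forms."""


class MemberOf:
    """Closed-form choice among a fixed set of strings."""

    def __init__(self, *choices: str) -> None:
        self.choices = choices


@defpresent(MemberOf, TextView)
def _present_member_text(ptype: MemberOf, value: object, name: str, view: TextView) -> str:
    return f"[{name}] ({'/'.join(ptype.choices)}): {value or ''}"


@defpresent(MemberOf, HTMLView)
def _present_member_html(ptype: MemberOf, value: object, name: str, view: HTMLView) -> str:
    options = "".join(
        f'<option{" selected" if choice == value else ""}>{html.escape(choice)}</option>'
        for choice in ptype.choices
    )
    return f'<select name="{html.escape(name)}">{options}</select>'
```

Because `EmailView` inherits from `TextView`, email forms reuse the text method unchanged, while the HTML view gets its own method without any change to `MemberOf` itself.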
Conclusions
The COMLINK system provides a flexible and general environment for fielding
applications that rely on automatic form processing over email and the
World-Wide Web. The White House Publications System routes documents to
people via a taxonomy of categories and thus allows people to tune their
subscriptions more finely than would be possible with conventional listserver
technology. The automatic survey system makes it possible to run hierarchical
adaptive surveys over large populations very quickly and very inexpensively.
The automatic form processing in COMLINK provides a general framework for
interactivity.
This paper was improved by comments from ?. This paper describes research done
at the Artificial Intelligence Laboratory
of the Massachusetts Institute of
Technology. Support for the M.I.T. Artificial Intelligence Laboratory's
artificial intelligence
research is provided in part by the Advanced Research Projects Agency of the Department of Defense under
contract number MDA972-93-1-003N7.