INFORMATION SPACE DESIGN RATIONALE

Mark A. Foltz


The Purpose of this Document

This document is a design rationale: an explanation of the reasoning behind the choices made in the design and construction of the information access environment it accompanies. Its purposes are:

The Design Problem

The JAIR Information Space is an information access environment for the Journal of Artificial Intelligence Research. Such an environment is designed to assist the user in obtaining information which fulfills an information need [1]. In this case, the space assists the user in determining if articles in JAIR can fulfill an underlying information need, and if so, provides a means for downloading and displaying them.

The design of any information access environment must take into account a typical user's knowledge of the domain, his familiarity with search tools and search strategies, and the nature of the information accessible from the environment [2]. For the JAIR information space, the typical user is assumed to be familiar with the main topics of research within the field of AI and the terminology used to describe them. Also assumed is familiarity with navigating hypertext, using common user interface widgets (buttons, scrollbars), direct-manipulation interfaces, and full-text search interfaces.

The nature of the information accessed through the space is described in the journal's charter, excerpted here:

"JAIR's editorial board is dedicated to the rapid dissemination of important research results to the global AI community. The journal's scope encompasses all areas of Artificial Intelligence, including automated reasoning, cognitive modeling, knowledge representation, learning, natural language, neural networks, perception, and robotics."
Specifically, this space encompasses all of the eighty-odd articles published in the Journal. All of them are available in electronic form, and can be downloaded and displayed by the user on demand.

Designing the Information Space

An information space is an information design in which representations of information objects are situated in a principled space. In a principled space direction and location have meaning, so that mapping and navigation become possible.

Dimensionality and Presentation

The first consideration in information space design is how to map information into the space. One way to do so is to choose dimensions for the space. Dimensions in this sense extend beyond the physical locations of pixels on the screen; rather, a dimension is an independent means of representing information in the context of a particular medium. For typical computing environments, examples of dimensions other than pixel position are pixel color, sound, animation, and icon choice.

For this space, the primary space chosen to represent the documents was two-dimensional. Two considerations motivated this choice. The first was avoiding occlusion, a recurring problem in a number of information visualizations [3] [4] [5] that restricts focus at undesirable times. The second was keeping a third spatial dimension free, through projection, to represent additional information.

To make that dimension available, the two-dimensional layout is displayed in orthographic projection with a manipulable viewpoint. The projection permits information to be represented in the orthogonal dimension, and the orthographic projection preserves distance relationships throughout (as opposed to a perspective projection). The manipulable viewpoint allows the user to adjust the visualization to shift focus and prevent occlusion, and gives the impression of exploring a concrete artifact instead of viewing an image.
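
As a rough illustration of the projection described above, the following Python sketch maps a point on the ground plane (x, y) with a height (z) to screen coordinates. The function name, the azimuth/elevation parameterization of the viewpoint, and the axis conventions are assumptions made for this example; they are not taken from the actual implementation.

```python
import math


def orthographic_project(point, azimuth_deg, elevation_deg):
    """Project a 3-D point to 2-D screen coordinates without perspective.

    point: (x, y, z), where x and y lie on the ground plane and z is height.
    azimuth_deg, elevation_deg: hypothetical viewpoint angles; an elevation
    of 90 degrees looks straight down on the layout.
    """
    x, y, z = point
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)

    # Rotate the ground plane about the vertical axis by the azimuth.
    xr = x * math.cos(az) - y * math.sin(az)
    yr = x * math.sin(az) + y * math.cos(az)

    # Tilt the view: the vertical screen axis mixes ground depth and height.
    # There is no division by depth, so the mapping is orthographic.
    screen_x = xr
    screen_y = yr * math.sin(el) + z * math.cos(el)
    return (screen_x, screen_y)
```

Because the mapping is linear, a line of a given height drawn above an icon has the same on-screen length wherever the icon sits in the layout, which a perspective projection would not guarantee.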

Article Layout

Previous designs feature spatial layout of document icons in relation to pre-stored queries [6] or based on term-vector similarity [7]. Instead, we chose to first classify the articles into categories based on the major subtopic of AI they addressed and use that classification as the basis for spatial layout. In this way, a two-level hierarchy is used for positioning: primary positioning among categories, and secondary positioning within categories. This structure resembles a projected cone tree [8] but without multilevel hierarchical structure. Since the classification was a simple partition on the articles without weighting, the articles are positioned equidistantly about category centers. The radius of each category is proportional to the square root of the size of the category, so that the area is proportional to size. In this way, easy judgments can be made about the relative distribution of articles among topics. This arrangement also facilitates browsing by maintaining regular spacing.
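
The sizing rule can be sketched in a few lines of Python. This is only an illustration: the function names and the `scale` parameter are hypothetical, and placing the articles at equal angles on a ring at the category radius is one assumed reading of "equidistantly about category centers."

```python
import math


def category_radius(n_articles, scale=1.0):
    """Radius grows with the square root of the article count,
    so the enclosed area grows linearly with the count."""
    return scale * math.sqrt(n_articles)


def place_articles(center, n_articles, scale=1.0):
    """Return (x, y) positions spaced at equal angles around the center.

    Assumption: articles sit on a ring at the category radius.
    """
    cx, cy = center
    r = category_radius(n_articles, scale)
    positions = []
    for i in range(n_articles):
        angle = 2 * math.pi * i / n_articles
        positions.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return positions
```

Under this rule a category of 16 articles has twice the radius, and four times the area, of a category of 4 articles.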

The categorization was performed manually. The topics were chosen from the 1991 ACM Classification System. Seventeen of the articles were judged relevant to more than one topic and were given both a primary and a secondary category. These secondary category assignments were used as a metric for category similarity: the more articles assigned to a pair of categories, the higher the similarity value assigned to that pair. Kruskal's multidimensional scaling algorithm [9] [10] was used to arrange the category centers, finding a configuration in which the rank-ordering of dissimilarities is most closely preserved by inter-category distances. Kruskal's algorithm was modified to prevent category overlap. The desired interpretation of the category layout is that, on average, categories with more shared articles are closer together than categories with fewer shared articles.
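
The similarity metric described above amounts to counting shared assignments per category pair. The Python sketch below shows one way this could be computed; the function names, the data layout, and the particular conversion to dissimilarities are assumptions for illustration. Since nonmetric scaling depends only on the rank order of the dissimilarities, any strictly decreasing conversion from counts would lead to the same ordering.

```python
from collections import Counter


def category_similarity(assignments):
    """Count, for each unordered pair of categories, the articles assigned to both.

    assignments: iterable of (primary, secondary) pairs, where secondary is
    None for articles with a single category (hypothetical data layout).
    """
    counts = Counter()
    for primary, secondary in assignments:
        if secondary is not None and secondary != primary:
            counts[frozenset((primary, secondary))] += 1
    return counts


def dissimilarity(counts, a, b):
    """Assumed conversion: more shared articles gives a smaller dissimilarity."""
    return 1.0 / (1 + counts[frozenset((a, b))])
```

The resulting pairwise dissimilarities would then serve as the input to the scaling step that positions the category centers.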

Colors were chosen to emphasize the information-bearing elements of the visualization, and to render structural elements more subtly. This avoids the visual distractions that Tufte [11] calls "chartjunk," which hinder perception and interpretation without adding information content to the visualization. For example, the most important parts of the visualization for determining topic relevance and article access are bright green and yellow. Structural features, such as the category perimeters and the ground plane, are in dim gray.

Additional Features

Full-text searching has proven to be an effective information access tool [12] [2]. The JAIR information space permits multi-term searches, with results displayed as lines rising above article-icons in a dimension orthogonal to the ground plane. Several features are worth noting:
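
As a rough illustration of how search results might be turned into line heights, the Python sketch below counts occurrences of each query term in an article's text. The scoring rule (total term occurrences), the function name, and the `unit_height` parameter are assumptions for this example; the text above does not specify how line height is determined.

```python
import re


def search_heights(articles, query, unit_height=1.0):
    """Map each article id to the height of the line drawn above its icon.

    articles: dict of article id -> full text (hypothetical data layout).
    query: whitespace-separated search terms.
    Assumption: height is proportional to the total number of term occurrences.
    """
    terms = query.lower().split()
    heights = {}
    for article_id, text in articles.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        hits = sum(words.count(term) for term in terms)
        heights[article_id] = unit_height * hits
    return heights
```

Articles with no matching terms receive a height of zero and so show no line.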

Since downloading and viewing articles are expensive operations, any means of conveying more information about an article's contents beforehand is useful. The information space is augmented with a details-on-demand [13] feature that displays an article's full bibliographic entry and an excerpt of the abstract when the pointer is moved over the corresponding article-icon. This facilitates rapid browsing of the space's contents, an important feature in information access environments [14].
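
A details-on-demand feature of this kind needs two pieces: deciding which icon the pointer is over, and formatting the text to display. The Python sketch below illustrates both under stated assumptions. The function names, the hit radius, the excerpt length, and the dictionary fields are all hypothetical and not drawn from the actual implementation.

```python
import math


def article_under_pointer(pointer, icons, hit_radius=5.0):
    """Return the id of the nearest icon within hit_radius of the pointer, or None.

    icons: dict of article id -> (x, y) screen position (hypothetical layout).
    """
    px, py = pointer
    best_id, best_dist = None, hit_radius
    for article_id, (x, y) in icons.items():
        dist = math.hypot(x - px, y - py)
        if dist <= best_dist:
            best_id, best_dist = article_id, dist
    return best_id


def details_text(entry, excerpt_chars=200):
    """Format a bibliographic entry followed by an excerpt of the abstract."""
    abstract = entry["abstract"]
    if len(abstract) > excerpt_chars:
        abstract = abstract[:excerpt_chars].rstrip() + "..."
    return f'{entry["authors"]}. {entry["title"]}. {entry["citation"]}\n{abstract}'
```

Truncating the abstract keeps the display compact enough to update as the pointer moves from icon to icon.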

Two other browsing methods are supported: by author and by title. Full-list browsing, although inefficient from the standpoint of navigation [15], directly enumerates the complete set of articles and authors, which is not possible with fielded search.

Finally, when accessing an article, all of the documents associated with the article are presented as choices for downloading. In this way, the user can choose to view the article as HTML in the browser window, or download compressed or uncompressed PostScript versions. Each of these options has tradeoffs in download time and convenience for viewing, saving, or printing. In addition, many articles have additional files as appendices, which are accessible from the same list.

Platform Choice

The ideal implementation environment would be integrated with the existing online information infrastructure in which JAIR is published and require minimal user maintenance or client resources. The Java implementation and associated libraries integrated with popular Web browsers satisfy the first two criteria, but resource requirements are problematic for older systems. Performance degradation occurs primarily in viewpoint manipulation, so that the application remains functionally usable.

Future Directions

It is straightforward to evolve the design by adding other layout algorithms to the space for categories and documents. Possibilities for similarity metrics include term-vector similarity, co-citation analysis, or visitation patterns. Entirely different information space designs are possible while retaining the usability features added to this initial design.
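
Of the similarity metrics mentioned above, term-vector similarity is the simplest to sketch. The Python example below computes the cosine of the angle between two term-count vectors. The function names and the whitespace tokenization are assumptions for illustration; a practical system would likely also remove stop words and weight terms.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-count vector from whitespace-separated, lowercased words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Pairwise values from such a metric could replace the shared-category counts as input to the same layout procedure.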

References

[1] Belkin, N. J. Information concepts for information science. Journal of Documentation, 34(10):55--85, 1978.
[2] Marchionini, Gary. Information Seeking in Electronic Environments. Cambridge Series on Human-Computer Interaction. Cambridge University Press, 1995.
[3] Carrière, Jeromy and Kazman, Richard. Interacting with huge hierarchies: Beyond cone trees. In Gershon and Eick [16], pages 74--81. Atlanta, Georgia.
URL: ftp://ftp.cgl.uwaterloo.ca/pub/users/rnkazman/fsviz.ps.Z
[4] Chuah, Mei C., Roth, Steven F., Mattis, Joe, and Kolojejchick, John. SDM: malleable information graphics. In Gershon and Eick [16], pages 36--42. Atlanta, Georgia.
[5] Chalmers, Matthew, Ingram, Robert, and Pfranger, Christoph. Adding imageability features to information displays. In ACM Symposium on User Interface Software and Technology, pages 33--39. Association for Computing Machinery, 1996. Seattle, Washington.
[6] Olsen, K. A. et al. Visualization of a document collection: The VIBE system. Information Processing and Management, 29(1):69--81, 1993.
[7] Chalmers, Matthew. Using a landscape metaphor to represent a corpus of documents. In Spatial Information Theory, Andrew U. Frank and Irene Campari, editors, Lecture Notes in Computer Science 716, pages 377--390. Springer-Verlag, 1993. Proceedings of COSIT '93.
[8] Robertson, George, Card, Stuart, and Mackinlay, Jock. Cone trees: Animated 3D visualizations of hierarchical information. In Proceedings of CHI '91: Human Factors in Computing Systems. Association for Computing Machinery, 1991. New Orleans, Louisiana.
[9] Kruskal, J. B. Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika, 29(1):1--27, March 1964.
[10] Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29(2):115--129, June 1964.
[11] Tufte, Edward R. Envisioning Information. Graphics Press, 1990.
[12] Salton, Gerald. Automatic information organization and retrieval. McGraw-Hill, 1968.
[13] Shneiderman, Ben. The eyes have it: A task by data type taxonomy for information visualization. Endnote address, 1996. Boulder, Colorado.
[14] Marchionini, G. and Shneiderman, B. Finding facts versus browsing knowledge in hypertext systems. Computer, 21(1):70--80, 1988.
[15] Furnas, George W. Effective view navigation. In Proceedings of CHI '97: Human Factors in Computing Systems. Association for Computing Machinery, 1997. Atlanta, Georgia.
URL: http://www.si.umich.edu/~furnas/EPapers/CHI97-EVN/gwf.html
[16] Gershon, N. and Eick, S. G., editors. Proceedings of the IEEE Symposium on Information Visualization (InfoVis '95). Institute for Electrical and Electronics Engineers, October 1995. Atlanta, Georgia.



Information Architecture
MIT Artificial Intelligence Laboratory
jairspace@ai.mit.edu
Last modified: Fri Oct 24 17:49:06 1997