This page describes our approach for converting a handwritten mathematical expression into an equivalent expression in a typesetting command language such as TeX or MathML. We have used these ideas to build a demonstration system in Java which recognizes simple expressions written with a pen tablet. A screen image of the system is provided below. The white area is for pen input, and output is provided below in three forms, from left to right: a typeset image, TeX, and MathML (Typesetting provided by WebEQ).

We divide the recognition problem into three modular subproblems
called *isolated symbol classification*, *stroke set
partitioning*, and *parsing*. This division has the
advantage that each subproblem can be solved and its performance
evaluated essentially independent of the others, so that improvements
can be made in each area while still maintaining the integrity of the
entire system.

Recognition is performed in two discrete stages, indicated in the
diagram below. First, the strokes are partitioned into
*symbols* by the stroke partitioner. Then, the identity of the
symbols and the structure of the expression is determined by the
parser. Symbol models, estimated from examples of handwriting, are
used by the partitioner in order to decide how likely it that a
particular set of strokes constitutes a distinct symbol. After
recognition is complete, the user interface displays the resulting
expression.

One important aspect of this division is that hard decisions are made at each stage, so that new information cannot change the results of earlier processing. This has the advantage that it makes the algorithm fast, since it reduces the number of possibilities that it considers in its interpretations. However, it does have the disadvantage that higher-level information, such as the structure of the interpreted expression, cannot change lower-level decisions, such as the stroke set partitioning.

One of the basic problems in handwriting recognition is to take a set of strokes and determine, out of context, which symbol it best represents. To solve this problem we use a statistical approach, creating a single Gaussian model for each symbol class based on examples of a single user's handwriting. The example symbols were preprocessed and then used to estimate the model parameters for each class of symbol in a training phase. Then, during the processing of an unrecognized expression, potential symbols are similarly processed and compared to the models for classification.

Even if one is provided with a perfect symbol classification
algorithm, there is still no way of knowing, *a priori*, which
strokes in an expression should be combined together into symbols or
even how many symbols are present. Consider the example of the
ambiguous expression in the diagram on the right. In this example, it
is unclear whether there are three symbols present, *1 < x* or
whether there are just two, *kx*. The best interpretation of
the overall expression depends on how the strokes are partitioned.

Therefore, the capabilities of the symbol classifier need to be
used within a larger framework to determine the quantity and locations
of the symbols present in an expression. In printed text, such as
mathematics, each pen stroke belongs to a distinct symbol; i.e. no two
two symbols share a stroke. Thus, solving the partitioning problem for
printed text is equivalent to finding the best grouping of a set of
strokes into a set of symbols, called a *partition* of an
expression.

In general, the number of possible partitions of a set of strokes into symbols is exponential in the number of strokes. Therefore, some constraint on the search is necessary if the algorithm is to take a reasonable amount of time to finish. One typical solution is to use the time ordering of the strokes as a constraint; that is, only consider symbols which consist of strokes which were drawn one after the other. While this is efficient, it prevents the user from changing their expression as they write, say by turning a minus sign into a plus sign.

Instead of a stroke ordering constraint, we choose to use a
geometric constraint by only exploring possibilities which are covered
by a *minimum spanning tree* over the strokes. This tree is
defined to be the set of edges connecting the centers of each stroke,
such that there are no cycles in the tree, there is a path between
every pair of strokes, and the sum of the lengths of the edges is
minimized.

Once a minimum spanning tree is found for a set of strokes, the
search for an optimal partition can be constrained by considering only
partitions that form connected subtrees in the MST. In the example
shown below, there are three two-stroke symbols, the *x*, the
*y*, and the *+*. To correctly partition the
expression, it is necessary to consider each of these stroke-pairs as
a possible two stroke symbol. This does in fact occur, because these
pairs are connected in the MST. The *2* and the
*alpha*, on the other hand, are not considered as possibly
belonging to the same symbol since they are not connected.

Once the expression has been correctly partitioned into symbols, there still remains the problem of determining which characters the symbols represent and deciding on the structure of the resulting typeset expression. This can be done using a geometric grammar whose elements are inspired by the atomic elements of TeX's typesetting engine. In this grammar, each character belongs to a single grammar type and combines with other characters in well defined ways based on simple relationships between their bounding boxes. In addition, simple characters also have a baseline which aids in determining when a character is superscripted.

Last Modified Saturday, 20-Sep-2003 18:59:11 EDT

Please send comments or questions to matsakis@mit.edu