2.1.4 Character Syntax Types

The Lisp reader constructs an object from the input text by interpreting each character according to its syntax type. The Lisp reader cannot accept as input everything that the Lisp printer produces, and the Lisp reader has features that are not used by the Lisp printer. The Lisp reader can be used as a lexical analyzer for a more general user-written parser.
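One concrete case of printer output that the reader rejects is the #< notation, which printers conventionally use for objects with no readable representation; in standard syntax the reader signals an error of type reader-error when it encounters #<. The sketch below probes for this. The helper name readable-text-p is hypothetical, and the strings are illustrative only.

```lisp
;; Hypothetical helper: true if TEXT can be read back under standard syntax.
(defun readable-text-p (text)
  (handler-case
      (with-standard-io-syntax
        (let ((*read-eval* nil))        ; refuse #. while probing
          (read-from-string text)
          t))
    (reader-error () nil)))

(readable-text-p "(a b c)")         ; true
(readable-text-p "#<SOME-OBJECT>")  ; false: #< signals READER-ERROR
```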

When the Lisp reader is invoked, it reads a single character from the input stream and dispatches according to the syntax type of that character. Every character that can appear in the input stream is of one of the syntax types shown in Figure 2-6.

constituent  macro character  single escape  
invalid      multiple escape  whitespace[2]  

Figure 2-6. Possible Character Syntax Types

The syntax type of a character in a readtable determines how that character is interpreted by the Lisp reader while that readtable is the current readtable. At any given time, every character has exactly one syntax type.
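As a minimal sketch of this per-readtable behavior, the code below gives $ the syntax type of Space in a private copy of the standard readtable and reads the same text under both readtables. The function names are hypothetical; copy-readtable, set-syntax-from-char, and with-standard-io-syntax are the standard operators.

```lisp
;; Read TEXT with `$` given the syntax type of Space in a private readtable copy.
(defun read-with-dollar-as-whitespace (text)
  (with-standard-io-syntax
    (let ((*readtable* (copy-readtable nil)))  ; fresh copy of the standard readtable
      (set-syntax-from-char #\$ #\Space)       ; modifies only the copy
      (read-from-string text))))

;; Read TEXT with the unmodified standard readtable.
(defun read-with-standard-syntax (text)
  (with-standard-io-syntax
    (read-from-string text)))

(symbol-name (read-with-standard-syntax "a$b"))       ; "A$B"
(symbol-name (read-with-dollar-as-whitespace "a$b"))  ; "A"
```

Because the change is made to a copy that is current only inside the let, the standard readtable itself is untouched.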

Figure 2-7 lists the syntax type of each character in standard syntax.

character  syntax type                 character  syntax type             
Backspace  constituent                 0--9       constituent             
Tab        whitespace[2]               :          constituent             
Newline    whitespace[2]               ;          terminating macro char  
Linefeed   whitespace[2]               <          constituent             
Page       whitespace[2]               =          constituent             
Return     whitespace[2]               >          constituent             
Space      whitespace[2]               ?          constituent*            
!          constituent*                @          constituent             
"          terminating macro char      A--Z       constituent             
#          non-terminating macro char  [          constituent*            
$          constituent                 \          single escape           
%          constituent                 ]          constituent*            
&          constituent                 ^          constituent             
'          terminating macro char      _          constituent             
(          terminating macro char      `          terminating macro char  
)          terminating macro char      a--z       constituent             
*          constituent                 {          constituent*            
+          constituent                 |          multiple escape         
,          terminating macro char      }          constituent*            
-          constituent                 ~          constituent             
.          constituent                 Rubout     whitespace[2]           
/          constituent                 

Figure 2-7. Character Syntax Types in Standard Syntax

The characters marked with an asterisk (*) are initially constituents, but they are not used in any standard Common Lisp notations. These characters are explicitly reserved to the programmer. ~ is not used in Common Lisp and is reserved to implementors. $ and % are alphabetic[2] characters, but they are not used in any standard Common Lisp defined names.
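To illustrate what "reserved to the programmer" allows, the following sketch turns [ and ] into list delimiters in a copy of the standard readtable, following the usual read-delimited-list idiom. The names read-bracketed-list, make-bracket-readtable, and read-with-brackets are hypothetical and exist only for this example.

```lisp
;; Hypothetical reader macro function: read forms up to the closing `]` as a list.
(defun read-bracketed-list (stream char)
  (declare (ignore char))
  (read-delimited-list #\] stream t))

;; Hypothetical helper: a copy of the standard readtable with [ ... ] installed.
(defun make-bracket-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\[ #'read-bracketed-list nil rt)
    ;; Give `]` the same terminating behavior as `)` so it ends a token like "3]".
    (set-macro-character #\] (get-macro-character #\) rt) nil rt)
    rt))

(defun read-with-brackets (text)
  (with-standard-io-syntax
    (let ((*readtable* (make-bracket-readtable)))
      (read-from-string text))))

;; In standard syntax the brackets are ordinary constituents:
(symbol-name (with-standard-io-syntax (read-from-string "[foo]")))  ; "[FOO]"
;; With the modified copy they delimit a list:
(read-with-brackets "[1 2 3]")                                      ; (1 2 3)
```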

Whitespace[2] characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions. Macro characters are divided into two kinds, terminating and non-terminating, depending on whether or not they terminate a token. The following are descriptions of each kind of syntax type.
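The behaviors summarized above can be observed directly with read-from-string. This is a small sketch under standard syntax; the helper name read-standard is hypothetical, and the input strings are illustrative.

```lisp
;; Hypothetical helper: read one object from TEXT under standard syntax.
(defun read-standard (text)
  (with-standard-io-syntax
    (read-from-string text)))

;; Whitespace separates tokens and is otherwise ignored.
(read-standard "   42   ")                   ; the integer 42
;; A single escape preserves the case of the next character.
(symbol-name (read-standard "a\\bc"))        ; "AbC"
;; Multiple escapes quote everything between them, including spaces.
(symbol-name (read-standard "|two words|"))  ; "two words"
;; `'` is a terminating macro character, so it ends the token.
(symbol-name (read-standard "abc'def"))      ; "ABC"
;; `#` is non-terminating, so inside a token it is just another character.
(symbol-name (read-standard "abc#def"))      ; "ABC#DEF"

;; GET-MACRO-CHARACTER reports the distinction as its second value.
(with-standard-io-syntax
  (list (nth-value 1 (get-macro-character #\#))   ; true: non-terminating
        (nth-value 1 (get-macro-character #\'))   ; false: terminating
        (get-macro-character #\a)))               ; NIL: not a macro character
```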

2.1.4.1 Constituent Characters

2.1.4.2 Constituent Traits

2.1.4.3 Invalid Characters

2.1.4.4 Macro Characters

2.1.4.5 Multiple Escape Characters

2.1.4.6 Single Escape Character

2.1.4.7 Whitespace Characters


Copyright 1996, The Harlequin Group Limited. All Rights Reserved.