This document describes a library of data structures and functions developed over the years for programs in the Occasional Publications in Academic Computing series published by SIL International. (For SIL International, "academic" refers to linguistics, literacy, anthropology, translation, and related fields.) It is hoped that this documentation will make future maintenance of these programs easier.
The basic goal behind choosing names in the OPAC function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name.
Preprocessor macro names are written entirely in capital letters. If
the name requires more than one word for an adequate description, the
words are joined together with intervening underscore (_)
characters.
Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, relying on the capitalization pattern to make them readable as separate words.
Variable names in the OPAC function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book Code Complete on pages 202-206.
Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix.
The type prefix has the following basic possibilities:
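b
a Boolean (true or false) value, stored as a char, short, or int
c
a character, usually stored as a char but sometimes as a short or
int
d
a double
e
an enumerated value, stored either as an enum or as a char,
short, or int
i
an integer value, stored as an int, short, long, or
(rarely) char
s
a data structure defined by a struct statement
sz
a NUL-terminated (zero-terminated) character string
pf
a pointer to a function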
In addition, the basic types may be prefixed by these qualifiers:
u
unsigned
a
array of
p
pointer to
The descriptive name portion of a variable name consists of one or more
capitalized words concatenated together. There are no underscores
(_) separating these words from each other or from the type
prefix. For the OPAC function library, the descriptive
name for global variables
may begin with the name of the most relevant data structure, if any.
The scope suffix has these possibilities:
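_g
global variable accessible throughout the program
_m
module variable accessible throughout the file (declared
static)
_in
function argument used for input
_out
function argument used for output
_io
function argument used for both input and output
_s
function variable that retains its value between calls (declared
static)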
The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call.
Global function names in the OPAC function library have
two parts: a verb that is all lowercase followed by a noun phrase
containing one or more capitalized words. These pieces are
concatenated without any intervening underscores (_). For the
OPAC library functions, the noun phrase section
includes
the name of the most relevant data structure, if any.
Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.
SAMPLE_NAME
preprocessor macro
SampleName
data structure
pSampleName
variable (pointer to a SampleName)
writeSampleName
function (writes a SampleName).
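As a hypothetical illustration (every name below is invented for this sketch and does not come from the library), the following fragment applies each of the naming rules above: a macro, a data structure, global, module, and static local variables, and functions whose arguments carry scope suffixes.

```c
#include <stdio.h>

/* Preprocessor macro: all capitals, words joined by underscores */
#define MAX_LABEL_LENGTH 40

/* Data structure: capitalized words run together */
typedef struct sample_name {
    char *               pszLabel;  /* p + sz: pointer to NUL-terminated string */
    unsigned             uiCount;   /* u + i: unsigned integer */
    struct sample_name * pNext;     /* p: pointer to the next structure */
} SampleName;

/* Global variable: _g suffix */
int iCharsWritten_g = 0;

/* Module variable (file scope, declared static): _m suffix */
static SampleName sDefaultSample_m = { "default", 0, NULL };

/* Function: lowercase verb followed by a capitalized noun phrase */
const SampleName * getDefaultSampleName(void)
{
    return &sDefaultSample_m;
}

/* Arguments carry _in, _out, or _io suffixes */
void writeSampleName(const SampleName * pSample_in, FILE * pOutputFP_in)
{
    static unsigned uiLineNumber_s = 0;  /* static local variable: _s suffix */
    int             iLength;             /* stack variable: no suffix */

    iLength = fprintf(pOutputFP_in, "%u: %.*s\n", ++uiLineNumber_s,
                      MAX_LABEL_LENGTH, pSample_in->pszLabel);
    if (iLength > 0)
        iCharsWritten_g += iLength;
}
```

The first call to writeSampleName(getDefaultSampleName(), stdout) prints the line "1: default".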
This chapter describes the data structures defined for the OPAC function library. These include both general purpose data collection structures and specialized linguistic processing data structures. For each data structure that the library provides, this information includes which header files to include in your source to obtain its definition.
#include "textctl.h" /* or template.h or opaclib.h */
typedef struct caseless_letter {
unsigned char * pszLetter;
struct caseless_letter * pNext;
} CaselessLetter;
The CaselessLetter data structure is normally used only inside a
TextControl data structure. It stores a multibyte character
string that represents a single caseless letter.
The fields of the CaselessLetter data structure are as follows:
pszLetter
pNext
`textctl.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
typedef struct change_list {
char * pszMatch;
char * pszReplace;
ChangeEnvironment * pEnvironment;
char * pszDescription;
struct change_list * pNext;
} Change;
A Change data structure stores a single "consistent change" to
apply to character strings. Such consistent changes are usually used
as ordered lists of changes rather than being applied in isolation here
and there.
The fields of the Change data structure are as follows:
pszMatch
pszReplace
pEnvironment
pszDescription
pNext
`change.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
typedef struct chg_envir {
short bNot;
ChgEnvItem * pLeftEnv;
ChgEnvItem * pRightEnv;
struct chg_envir * pNext;
} ChangeEnvironment;
The ChangeEnvironment data structure is normally used only
inside a Change data structure.
The fields of the ChangeEnvironment data structure are as follows:
bNot
pLeftEnv
pRightEnv
pNext
`change.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
typedef struct chg_env_item {
char iFlags;
union { char * pszString;
StringClass * pClass; } u;
struct chg_env_item * pNext;
} ChgEnvItem;
The ChgEnvItem data structure is normally used only inside a
ChangeEnvironment data structure, which is normally used only
inside a Change data structure.
The fields of the ChgEnvItem data structure are as follows:
iFlags & E_NOT
iFlags & E_CLASS
iFlags & E_ELLIPSIS
iFlags & E_OPTIONAL
u.pszString
points to a character string if iFlags & E_CLASS is 0.
u.pClass
points to a StringClass data structure if iFlags & E_CLASS
is not 0.
See section 3.8 StringClass.
pNext
`change.h'
#include "record.h" /* or opaclib.h */
typedef struct {
char * pCodeTable;
unsigned uiCodeCount;
char * pszFirstCode;
} CodeTable;
The CodeTable data structure is used to map between the field
codes used in a standard format file and single characters used in
case labels inside switch statements in C code.
The fields of the CodeTable data structure are as follows:
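pCodeTable
points to a sequence of NUL-terminated strings that pairs each field code with its replacement, for example
"match1\0A\0match2\0B\0". Note that the replacement strings
are assumed to be single characters.
uiCodeCount
the number of code/replacement pairs stored in
pCodeTable.
pszFirstCode
the field code that marks the start of each record, normally the first code in
pCodeTable.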
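The packed layout of pCodeTable can be searched with a simple loop. The sketch below mirrors the CodeTable declaration shown above; lookupFieldCode is a hypothetical helper, not a library function, and it assumes that uiCodeCount matches the number of pairs actually stored.

```c
#include <string.h>

typedef struct {
    char *   pCodeTable;    /* "code\0X\0code\0Y\0..." */
    unsigned uiCodeCount;   /* number of code/replacement pairs */
    char *   pszFirstCode;
} CodeTable;

/* Hypothetical helper: return the single-character replacement for
 * pszCode_in, or '\0' if the code is not in the table. */
char lookupFieldCode(const CodeTable * pTable_in, const char * pszCode_in)
{
    const char * p = pTable_in->pCodeTable;
    unsigned     i;

    for (i = 0; i < pTable_in->uiCodeCount; ++i)
    {
        const char * pszReplace = p + strlen(p) + 1;  /* skip the code and its NUL */

        if (strcmp(p, pszCode_in) == 0)
            return pszReplace[0];
        p = pszReplace + strlen(pszReplace) + 1;      /* advance to the next pair */
    }
    return '\0';
}
```

The returned character is what a program would then use as a case label in its switch statement.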
`record.h'
#include "textctl.h" /* or template.h or opaclib.h */
typedef struct lower_letter {
unsigned char * pszLower;
StringList * pUpperList;
struct lower_letter * pNext;
} LowerLetter;
The LowerLetter data structure is normally used only inside a
TextControl data structure. It stores a multibyte character
string that represents a single lowercase letter. It also stores a list
of the corresponding uppercase multigraph character strings.
The fields of the LowerLetter data structure are as follows:
pszLower
pUpperList
pNext
`textctl.h'
#include "rpterror.h" /* or opaclib.h */
typedef struct {
int eType;
unsigned uiNumber;
char * pszMessage;
} NumberedMessage;
The NumberedMessage data structure stores the information for a
single numbered error or warning message. This is the style of error
reporting used by the PC-Kimmo and PC-PATR programs.
The fields of the NumberedMessage data structure are as follows:
eType
the type of message, one of the following values:
ERROR_MSG
WARNING_MSG
DEBUG_MSG
uiNumber
the message number.
pszMessage
points to a printf style format string for the message.
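The sketch below shows one way a NumberedMessage could be expanded into text. The enumeration values, the type names printed, and formatNumberedMessage itself are assumptions made for this example. The real constants come from rpterror.h, and the library's own reporting functions are not shown here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed values for illustration; the real ones are defined in rpterror.h */
enum { ERROR_MSG, WARNING_MSG, DEBUG_MSG };

typedef struct {
    int      eType;
    unsigned uiNumber;
    char *   pszMessage;   /* printf style format string */
} NumberedMessage;

/* Hypothetical formatter: writes "TYPE NUMBER: message" into a buffer and
 * returns the number of characters written (excluding the NUL), or the
 * negative/oversized value reported by snprintf on failure. */
int formatNumberedMessage(char * pszBuffer_out, size_t uiSize_in,
                          const NumberedMessage * pMessage_in, ...)
{
    static const char * aszTypeName_s[] = { "ERROR", "WARNING", "DEBUG" };
    const char *        pszType;
    va_list             args;
    int                 iPrefix;
    int                 iBody;

    pszType = (pMessage_in->eType >= ERROR_MSG && pMessage_in->eType <= DEBUG_MSG)
              ? aszTypeName_s[pMessage_in->eType] : "MESSAGE";
    iPrefix = snprintf(pszBuffer_out, uiSize_in, "%s %u: ",
                       pszType, pMessage_in->uiNumber);
    if (iPrefix < 0 || (size_t)iPrefix >= uiSize_in)
        return iPrefix;
    va_start(args, pMessage_in);
    iBody = vsnprintf(pszBuffer_out + iPrefix, uiSize_in - (size_t)iPrefix,
                      pMessage_in->pszMessage, args);
    va_end(args);
    return iBody < 0 ? iBody : iPrefix + iBody;
}
```

Keeping the format string in the structure lets a program store its message catalog as a static array of NumberedMessage entries and supply the variable parts at report time.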
`rpterror.h'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
typedef struct string_class {
char * pszName;
StringList * pMembers;
struct string_class * pNext;
} StringClass;
The StringClass data structure stores a labeled set of strings.
The intention is that any one of the set of strings may be used in a
matching operation.
The fields of the StringClass data structure are as follows:
pszName
pMembers
pNext
`strclass.h'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
typedef struct strlist
{
char * pszString;
struct strlist * pNext;
} StringList;
The StringList data structure is used to store a collection of
character strings. This collection may be a set (no duplicate
strings), an ordered list, or an unordered list, depending on how the
programmer adds strings to the list.
The fields of the StringList data structure are as follows:
pszString
pNext
This is one of the most commonly used data structures in the OPAC function library.
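Whether a StringList acts as a set depends only on how strings are added. The sketch below mirrors the StringList declaration and keeps the list duplicate-free by checking membership before prepending. isStringInList, addUniqueString, countStrings, and freeStringListSketch are hypothetical helpers written for this example, and plain malloc is used instead of the library's allocation functions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct strlist
{
    char *           pszString;
    struct strlist * pNext;
} StringList;

/* Nonzero if pszString_in is already in the list */
int isStringInList(const StringList * pList_in, const char * pszString_in)
{
    for ( ; pList_in != NULL ; pList_in = pList_in->pNext)
        if (strcmp(pList_in->pszString, pszString_in) == 0)
            return 1;
    return 0;
}

/* Prepend a copy of the string only if it is not already present */
StringList * addUniqueString(StringList * pList_in, const char * pszString_in)
{
    StringList * pNew;
    size_t       uiSize = strlen(pszString_in) + 1;

    if (isStringInList(pList_in, pszString_in))
        return pList_in;                    /* keep set semantics */
    pNew = malloc(sizeof *pNew);
    if (pNew == NULL)
        return pList_in;                    /* allocation failed: list unchanged */
    pNew->pszString = malloc(uiSize);
    if (pNew->pszString == NULL)
    {
        free(pNew);
        return pList_in;
    }
    memcpy(pNew->pszString, pszString_in, uiSize);
    pNew->pNext = pList_in;                 /* prepend, as addToStringList does */
    return pNew;
}

size_t countStrings(const StringList * pList_in)
{
    size_t uiCount = 0;

    for ( ; pList_in != NULL ; pList_in = pList_in->pNext)
        ++uiCount;
    return uiCount;
}

void freeStringListSketch(StringList * pList_io)
{
    while (pList_io != NULL)
    {
        StringList * pNext = pList_io->pNext;

        free(pList_io->pszString);
        free(pList_io);
        pList_io = pNext;
    }
}
```

The membership check makes each insertion linear in the list length, which is acceptable for the short lists this structure is typically used for.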
3.9.3 Source File `strlist.h'
#include "textctl.h" /* or template.h or opaclib.h */
typedef struct text_control {
char * pszTextControlFile;
LowerLetter * pLowercaseLetters;
UpperLetter * pUppercaseLetters;
CaselessLetter * pCaselessLetters;
Change * pOrthoChanges;
Change * pOutputChanges;
StringList * pIncludeFields;
StringList * pExcludeFields;
unsigned char cFormatMark;
unsigned char cAmbig;
unsigned char cDecomp;
unsigned char cBarMark;
unsigned char * pszBarCodes;
char bIndividualCapitalize;
char bCapitalize;
unsigned uiMaxAmbigDecap;
} TextControl;
The TextControl data structure is used to control reading a text
file into a (sequence of) WordTemplate data structure(s), or
writing a (sequence of) WordTemplate data structure(s) to a text
file.
The fields of the TextControl data structure are as follows:
pszTextControlFile
pLowercaseLetters
pUppercaseLetters
pCaselessLetters
pOrthoChanges
pOutputChanges
pIncludeFields
pExcludeFields
cFormatMark
cAmbig
cDecomp
cBarMark
pszBarCodes
bIndividualCapitalize
bCapitalize
uiMaxAmbigDecap
`textctl.h'
#include "trie.h" /* or opaclib.h */
typedef struct s__trienode
{
unsigned char cLetter;
struct s__trienode * pChildren;
struct s__trienode * pSiblings;
void * pTrieInfo;
} Trie;
A trie is a data structure designed for relatively fast insertion and relatively fast retrieval of information referenced by a "key" string. See Knuth 1973, pages 481-505, for an extended treatment of tries.
The fields of the Trie data structure are as follows:
cLetter
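pChildren
points to the nodes for the next character of keys that have cLetter in
their key at this point.
pSiblings
points to the next node at this level, for keys that have a character other than
cLetter in their key at this point.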
pTrieInfo
`trie.h'
#include "textctl.h" /* or template.h or opaclib.h */
typedef struct upper_letter {
unsigned char * pszUpper;
StringList * pLowerList;
struct upper_letter * pNext;
} UpperLetter;
The UpperLetter data structure is normally used only inside a
TextControl data structure. It stores a multibyte character
string that represents a single uppercase letter. It also stores a list
of the corresponding lowercase multigraph character strings.
The fields of the UpperLetter data structure are as follows:
pszUpper
pLowerList
pNext
Application programmers should not need to use this data structure
directly, as its only use is for a list embedded in the
TextControl data structure.
3.12.3 Source File `textctl.h'
#include "template.h" /* or opaclib.h */
typedef struct word_analysis {
char * pszAnalysis;
char * pszDecomposition;
char * pszCategory;
char * pszProperties;
char * pszFeatures;
char * pszUnderlyingForm;
char * pszSurfaceForm;
struct word_analysis * pNext;
} WordAnalysis;
The WordAnalysis data structure is normally used as part of a
WordTemplate data structure to record the result of
morphological analysis.
The fields of the WordAnalysis data structure are as follows:
pszAnalysis
pszDecomposition
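points to the morpheme decomposition of the word, with morphemes separated by the character stored in the
cDecomp field of a TextControl data structure.
pszCategory
points to the category information, with the entries for individual morphemes separated by equal signs (=).
pszProperties
points to the property information, with the entries for individual morphemes separated by equal signs (=).
pszFeatures
points to the feature information, with the entries for individual morphemes separated by equal signs (=).
pszUnderlyingForm
points to the underlying form of the word, with morphemes separated by the character stored in the
cDecomp field of a TextControl data
structure.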
pszSurfaceForm
pNext
`template.h'
#include "template.h" /* or opaclib.h */
typedef struct {
char * pszFormat;
char * pszOrigWord;
char ** paWord;
char * pszNonAlpha;
short iCapital;
short iOutputFlags;
WordAnalysis * pAnalyses;
StringList * pNewWords;
} WordTemplate;
The WordTemplate data structure is used to hold a single word
for processing, with the original capitalization and punctuation
preserved for restoration on output.
The fields of the WordTemplate data structure are as follows:
pszFormat
pszOrigWord
paWord
points to a NULL-terminated array of alternative surface forms
after decapitalization and orthography changes.
pszNonAlpha
iCapital
the capitalization of the original word, coded as one of the following:
NOCAP
no capital letters
INITCAP
the first letter is capitalized
ALLCAP
all letters are capitalized
4-65535
individual letters are capitalized, coded as a bit mask in which
4 is the first letter being capitalized, 8
is the second letter being capitalized, and so on. This scheme handles
only the first 14 characters of the word.
iOutputFlags & WANT_DECOMPOSITION
causes the morpheme decomposition (pAnalyses->pszDecomposition) to
be written to an output file if set (nonzero).
iOutputFlags & WANT_CATEGORY
causes the category information (pAnalyses->pszCategory) to be
written to an output file if set.
iOutputFlags & WANT_PROPERTIES
causes the property information (pAnalyses->pszProperties) to be
written to an output file if set.
iOutputFlags & WANT_FEATURES
causes the feature information (pAnalyses->pszFeatures) to
be written to an output file if set.
iOutputFlags & WANT_UNDERLYING
causes the underlying form (pAnalyses->pszUnderlyingForm)
to be written to an output file if set.
iOutputFlags & WANT_ORIGINAL
causes the original word (pszOrigWord) to be written to an
output file if set.
pAnalyses
pNewWords
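The per-letter capitalization code for iCapital can be computed and tested with shifts, because letter n (counting from 0) corresponds to the value 4 << n. In the sketch below, the numeric values for NOCAP, INITCAP, and ALLCAP are assumptions, since this section does not list them. The helpers encodeLetterCaps and isLetterCapitalized are hypothetical, and the encoder only computes the per-letter code. It does not attempt to decide when INITCAP or ALLCAP would apply instead.

```c
#include <ctype.h>

/* Assumed values for illustration; the real definitions are in template.h */
#define NOCAP   0
#define INITCAP 1
#define ALLCAP  2

#define MAX_CAP_LETTERS 14   /* bit values 4 through 32768 */

/* Build the per-letter code for the ASCII capitals in pszWord_in */
unsigned encodeLetterCaps(const char * pszWord_in)
{
    unsigned uiCode = NOCAP;
    int      i;

    for (i = 0; i < MAX_CAP_LETTERS && pszWord_in[i] != '\0'; ++i)
        if (isupper((unsigned char)pszWord_in[i]))
            uiCode |= 4u << i;
    return uiCode;
}

/* Nonzero if letter iIndex_in is marked capitalized in a per-letter code */
int isLetterCapitalized(unsigned uiCode_in, int iIndex_in)
{
    if (uiCode_in < 4 || iIndex_in < 0 || iIndex_in >= MAX_CAP_LETTERS)
        return 0;   /* NOCAP/INITCAP/ALLCAP, or a letter that is not tracked */
    return (uiCode_in & (4u << iIndex_in)) != 0;
}
```

For "McDonald" the first and third letters are capitals, giving 4 + 16 = 20. The 14-letter limit follows from using bit values 4 through 32768 of a 16-bit code.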
`template.h'
This chapter gives the proper usage information about each of the global variables found in the OPAC function library. For each global variable that the library provides, this information includes which header files to include in your source to obtain the extern declaration for that variable.
#include "allocmem.h" /* or opaclib.h */
extern void (* pfOutOfMemory_g)(size_t uiSize_in);
pfOutOfMemory_g points to a function used by allocMemory
and related functions whenever malloc or realloc return a
NULL. This function has one argument, the size of the
allocation request that failed. It is assumed that this function does
not return normally, so that programs that use allocMemory do
not need to check for a successful memory allocation. This can be
satisfied either by aborting the program or by judicious use of
setjmp and longjmp.
The default value for pfOutOfMemory_g is NULL.
This causes a function to be used which simply displays an error
message (using szOutOfMemoryMarker_g) and aborts the program.
4.1.3 Example
#include <stdio.h>
#include <setjmp.h>
#include "allocmem.h"
...
static jmp_buf jmpNoMemory_m;
static void out_of_memory(size_t uiRequest_in)
{
fprintf(stderr,
"Out of memory requesting %lu bytes---trying to recover\n",
(unsigned long)uiRequest_in);
longjmp( jmpNoMemory_m, 1 );
}
char * processData(void)
{
char * p;
if (setjmp( jmpNoMemory_m ))
{
pfOutOfMemory_g = NULL; /* restore default behavior */
/* free any memory left hanging in mid air */
...
return NULL;
}
pfOutOfMemory_g = out_of_memory;
p = processSafely();
pfOutOfMemory_g = NULL; /* restore default behavior */
return p;
}
`allocmem.c'
#include "record.h" /* or opaclib.h */ extern char * pRecordBuffer_g;
pRecordBuffer_g points to the dynamically allocated buffer used
by readStdFormatRecord for its return value. Allocating this
buffer is handled automatically (but perhaps not optimally) if the
programmer does not allocate it explicitly.
4.2.3 Example
#include "record.h"
#include "allocmem.h"
#define BIG_RECSIZE 16000
#define SMALL_RECSIZE 500
...
/*
 * allocate space for records
 */
pRecordBuffer_g = (char *)allocMemory( BIG_RECSIZE );
uiRecordBufferSize_g = BIG_RECSIZE;
...
/*
 * reduce amount of memory allocated for records
 */
freeMemory( pRecordBuffer_g );
pRecordBuffer_g = (char *)allocMemory( SMALL_RECSIZE );
uiRecordBufferSize_g = SMALL_RECSIZE;
...
/*
 * release memory allocated for records
 */
cleanupAfterStdFormatRecord();
`record.c'
#include "allocmem.h" /* or opaclib.h */
extern char szOutOfMemoryMarker_g[/*101*/];
szOutOfMemoryMarker_g is a character array used by
allocMemory and friends whenever malloc or realloc
return a NULL and pfOutOfMemory_g is NULL. The
contents of the character array are used as part of the error message
notifying the user that a request for more memory has failed.
The default value for szOutOfMemoryMarker_g is to be empty (all
NUL bytes). This means that no context sensitive information is
provided in the error message displayed just before the program aborts.
4.3.3 Example
#include <string.h>
#include "allocmem.h"
...
int * piArray;
...
strncpy(szOutOfMemoryMarker_g, "creating huge array", 100);
piArray = allocMemory( 100000 * sizeof(int) );
`allocmem.c'
#include "record.h" /* or opaclib.h */
/*#define MAX_RECKEY_SIZE 64*/
extern char szRecordKey_g[MAX_RECKEY_SIZE];
readStdFormatRecord stores the first MAX_RECKEY_SIZE-1
characters following the record marker in szRecordKey_g. This
may or may not be useful information.
4.4.3 Example
#include <stdio.h>
#include <string.h>
#include "record.h"
#include "rpterror.h"
...
void load_dictionary(
char * pszInputFile_in,
CodeTable * pCodeTable_in,
int cComment_in)
{
FILE * pInputFP;
char * pRecord;
char * pszField;
char * pszNextField;
unsigned uiRecordCount = 0;
pInputFP = fopen(pszInputFile_in, "r");
if (pInputFP == NULL)
{
reportError(WARNING_MSG, "Cannot open dictionary file %s\n",
pszInputFile_in);
return;
}
while ((pRecord = readStdFormatRecord(pInputFP,
pCodeTable_in,
cComment_in,
&uiRecordCount)) != NULL)
{
pszField = pRecord;
while (*pszField)
{
pszNextField = pszField + strlen(pszField) + 1;
switch (*pszField)
{
case 'A':
...
break;
case 'B':
...
break;
...
default:
reportError(WARNING_MSG,
"Warning: unrecognized field in record %u (%s)\n%s\n",
uiRecordCount, szRecordKey_g, pszField);
break;
}
...
pszField = pszNextField;
}
...
}
cleanupAfterStdFormatRecord();
fclose(pInputFP);
...
}
`record.c'
#include "record.h" /* or opaclib.h */
extern unsigned uiRecordBufferSize_g;
uiRecordBufferSize_g stores the number of bytes allocated for
pRecordBuffer_g.
4.5.3 Example
See section 4.2 pRecordBuffer_g.
4.5.4 Source File `record.c'
#include "trie.h" /* or opaclib.h */
extern size_t uiTrieArrayBlockSize_g;
Trie nodes are allocated uiTrieArrayBlockSize_g nodes at
a time for efficiency.
The default value for uiTrieArrayBlockSize_g is 2000, which
minimizes the number of calls to allocMemory, but potentially
wastes several thousand bytes of memory.
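The idea of allocating nodes in blocks can be sketched as follows. This is an illustration under stated assumptions, not the library's trie.c: uiNodeBlockSize_g stands in for uiTrieArrayBlockSize_g, allocTrieNode, countNodeBlocks, and freeAllTrieNodes are hypothetical, and the blocks are obtained with plain malloc. Each block costs one allocation call and is filled one node at a time, so unused nodes at the end of the newest block are the memory that may be wasted.

```c
#include <stdlib.h>

typedef struct s__trienode
{
    unsigned char        cLetter;
    struct s__trienode * pChildren;
    struct s__trienode * pSiblings;
    void *               pTrieInfo;
} Trie;

/* Hypothetical stand-in for uiTrieArrayBlockSize_g */
size_t uiNodeBlockSize_g = 2000;

typedef struct node_block
{
    struct node_block * pNext;      /* previously allocated block */
    Trie                aNodes[];   /* C99 flexible array member */
} NodeBlock;

static NodeBlock * pBlocks_m         = NULL;  /* newest block first */
static size_t      uiNodesUsed_m     = 0;     /* nodes handed out from newest block */
static size_t      uiBlockCapacity_m = 0;

/* Return a node with all fields cleared, calling malloc once per block */
Trie * allocTrieNode(void)
{
    Trie * pNode;

    if (pBlocks_m == NULL || uiNodesUsed_m == uiBlockCapacity_m)
    {
        size_t      uiCount = uiNodeBlockSize_g > 0 ? uiNodeBlockSize_g : 1;
        NodeBlock * pBlock  = malloc(sizeof(NodeBlock) + uiCount * sizeof(Trie));

        if (pBlock == NULL)
            return NULL;
        pBlock->pNext     = pBlocks_m;
        pBlocks_m         = pBlock;
        uiBlockCapacity_m = uiCount;
        uiNodesUsed_m     = 0;
    }
    pNode = &pBlocks_m->aNodes[uiNodesUsed_m++];
    pNode->cLetter   = 0;
    pNode->pChildren = NULL;
    pNode->pSiblings = NULL;
    pNode->pTrieInfo = NULL;
    return pNode;
}

/* Number of blocks currently allocated (one malloc call each) */
size_t countNodeBlocks(void)
{
    size_t      uiCount = 0;
    NodeBlock * pBlock;

    for (pBlock = pBlocks_m; pBlock != NULL; pBlock = pBlock->pNext)
        ++uiCount;
    return uiCount;
}

/* Release every node at once */
void freeAllTrieNodes(void)
{
    while (pBlocks_m != NULL)
    {
        NodeBlock * pNext = pBlocks_m->pNext;

        free(pBlocks_m);
        pBlocks_m = pNext;
    }
    uiNodesUsed_m = uiBlockCapacity_m = 0;
}
```

With a block size of 3, allocating 7 nodes takes three malloc calls. A smaller block size, such as the value 63 used in the example for this variable, trades more allocation calls for less unused space.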
4.6.3 Example
#include "strlist.h"
#include "trie.h"
...
Trie * pLexicon = NULL;
StringList * pNewString;
...
VOIDP addStringToList(VOIDP pNew_in, VOIDP pList_in)
{
StringList * pList = pList_in;
StringList * pNew = pNew_in;
pNew->pNext = pList;
return pNew;
}
...
uiTrieArrayBlockSize_g = 63; /* less time efficient, but
more space efficient */
...
pNewString = mergeIntoStringList(NULL, "Test value");
pLexicon = addDataToTrie(pLexicon, pNewString->pszString, pNewString,
addStringToList, 3);
`trie.c'
This chapter gives the proper usage information about each of the functions found in the OPAC function library. For each function that the library provides, this information includes which header files to include in your source to obtain prototypes and type definitions relevant to the use of that function.
#include "trie.h" /* or opaclib.h */
Trie * addDataToTrie(Trie * pTrieHead_io,
const char * pszKey_in,
void * pInfo_in,
void * (* pfLinkInfo_in)(void * pNew_in,
void * pList_io),
int iMaxTrieDepth_in);
addDataToTrie adds information to a trie, using the given
insertion key.
The arguments to addDataToTrie are as follows:
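pTrieHead_io
points to the head of the trie. This is NULL the first time
addDataToTrie is called. Each subsequent call should use the
value returned by the preceding call.
pszKey_in
points to the NUL-terminated insertion key.
pInfo_in
points to the information to be stored in the
Trie for data storage and retrieval.
pfLinkInfo_in
points to a function that links the new information into the pTrieInfo
field of the leaf Trie data structure found or created for this
key. The function has two arguments:
pNew_in
points to the new information (pInfo_in).
pList_io
points to the information already stored in the Trie node
(pTrieInfo), or is NULL.
The function returns the new value to store in pTrieInfo.
iMaxTrieDepth_in
the maximum depth of the trie, that is, the maximum number of key characters used to index the information.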
a pointer to the head of the modified trie
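The contract described above can be sketched with the documented Trie layout, where children continue a key and siblings hold alternative characters at the same position. addDataToTrieSketch, findDataInTrieSketch, InfoItem, and linkInfoItem are hypothetical names for this illustration. The sketch uses plain malloc and assumes that an empty key stores nothing. It is not the library's implementation.

```c
#include <stdlib.h>

typedef struct s__trienode
{
    unsigned char        cLetter;
    struct s__trienode * pChildren;
    struct s__trienode * pSiblings;
    void *               pTrieInfo;
} Trie;

static Trie * newTrieNode(unsigned char cLetter_in)
{
    Trie * pNode = malloc(sizeof *pNode);

    if (pNode != NULL)
    {
        pNode->cLetter   = cLetter_in;
        pNode->pChildren = NULL;
        pNode->pSiblings = NULL;
        pNode->pTrieInfo = NULL;
    }
    return pNode;
}

/* Find or create one node per key character, up to iMaxTrieDepth_in
 * characters, then let pfLinkInfo_in merge pInfo_in into pTrieInfo. */
Trie * addDataToTrieSketch(Trie * pTrieHead_io, const char * pszKey_in,
                           void * pInfo_in,
                           void * (* pfLinkInfo_in)(void * pNew_in, void * pList_io),
                           int iMaxTrieDepth_in)
{
    Trie **               ppLevel = &pTrieHead_io;  /* sibling list to search */
    Trie *                pNode   = NULL;
    const unsigned char * p       = (const unsigned char *)pszKey_in;
    int                   iDepth;

    for (iDepth = 0; *p != '\0' && iDepth < iMaxTrieDepth_in; ++iDepth, ++p)
    {
        Trie ** ppNode = ppLevel;

        while (*ppNode != NULL && (*ppNode)->cLetter != *p)
            ppNode = &(*ppNode)->pSiblings;
        if (*ppNode == NULL && (*ppNode = newTrieNode(*p)) == NULL)
            return pTrieHead_io;        /* out of memory: pInfo_in not stored */
        pNode   = *ppNode;
        ppLevel = &pNode->pChildren;
    }
    if (pNode != NULL)
        pNode->pTrieInfo = pfLinkInfo_in(pInfo_in, pNode->pTrieInfo);
    return pTrieHead_io;
}

/* Return the information stored for the first iMaxTrieDepth_in
 * characters of pszKey_in, or NULL if there is none. */
void * findDataInTrieSketch(const Trie * pTrieHead_in, const char * pszKey_in,
                            int iMaxTrieDepth_in)
{
    const Trie *          pLevel = pTrieHead_in;
    const Trie *          pNode  = NULL;
    const unsigned char * p      = (const unsigned char *)pszKey_in;
    int                   iDepth;

    for (iDepth = 0; *p != '\0' && iDepth < iMaxTrieDepth_in; ++iDepth, ++p)
    {
        for (pNode = pLevel; pNode != NULL && pNode->cLetter != *p;
             pNode = pNode->pSiblings)
            ;
        if (pNode == NULL)
            return NULL;
        pLevel = pNode->pChildren;
    }
    return pNode != NULL ? pNode->pTrieInfo : NULL;
}

/* Example link function: prepend the new item to the stored list */
typedef struct info_item
{
    struct info_item * pNext;
    const char *       pszWord;
} InfoItem;

void * linkInfoItem(void * pNew_in, void * pList_io)
{
    ((InfoItem *)pNew_in)->pNext = (InfoItem *)pList_io;
    return pNew_in;
}
```

With a maximum depth of 3, "test" and "testing" end at the same node. Their items are therefore chained together by the link function, which is why a caller must still compare full keys, as the example below does when it links homographs.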
5.1.4 Example
#include <stdio.h>
#include <string.h>
#include "trie.h"
#include "rpterror.h"
#include "allocmem.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next item */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
unsigned long uiLexiconCount_g;
static char szWhitespace_m[7] = " \t\r\n\f\v";
...
static void * add_lex_item(void * pNew_in, void * pList_in)
{
LexItem * pLex;
/*
* be a little paranoid
*/
if (pNew_in == NULL)
return pList_in;
/*
* link the list of items that start out the same
*/
((LexItem *)pNew_in)->pLink = (LexItem *)pList_in;
/*
* link the list of homographs
*/
for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
{
if (strcmp((char *)((LexItem *)pNew_in)->pszForm, (char *)pLex->pszForm) == 0)
{
((LexItem *)pNew_in)->pNext = pLex;
break;
}
}
return pNew_in;
}
void load_lexicon(char * pszLexiconFile_in)
{
FILE * pLexiconFP;
char szBuffer[512];
char * pszForm;
char * pszGloss;
char * pszCategory;
LexItem * pLexItem;
if (pszLexiconFile_in == NULL)
{
reportError(WARNING_MSG, "Missing input lexicon filename\n");
return;
}
pLexiconFP = fopen(pszLexiconFile_in, "r");
if (pLexiconFP == NULL)
{
reportError(WARNING_MSG, "Cannot open lexicon file %s for input\n",
pszLexiconFile_in);
return;
}
while (fgets(szBuffer, 512, pLexiconFP) != NULL)
{
pszForm = strtok(szBuffer, szWhitespace_m);
pszGloss = strtok(NULL, szWhitespace_m);
pszCategory = strtok(NULL, szWhitespace_m);
if ( (pszForm == NULL) ||
(pszGloss == NULL) ||
(pszCategory == NULL) )
continue;
pLexItem = (LexItem *)allocMemory(sizeof(LexItem));
pLexItem->pLink = NULL;
pLexItem->pNext = NULL;
pLexItem->pszForm = (unsigned char *)duplicateString(pszForm);
pLexItem->pszGloss = (unsigned char *)duplicateString(pszGloss);
pLexItem->uiCategory = index_lexical_category(pszCategory);
pLexicon_g = addDataToTrie(pLexicon_g, pszForm, pLexItem,
add_lex_item, 3);
++uiLexiconCount_g;
}
fclose(pLexiconFP);
}
`trie.c'
#include "textctl.h" /* or template.h or opaclib.h */
void addLowerUpperWFChars(char * pszLUPairs_in,
TextControl * pTextCtl_io);
addLowerUpperWFChars scans the input string for character pairs.
The first member of each pair is added to the set of (multibyte)
lowercase alphabetic characters, and the second member is added to the
set of (multibyte) uppercase alphabetic characters. Note that there may
be a many-to-many mapping between lowercase and uppercase characters.
The arguments to addLowerUpperWFChars are as follows:
pszLUPairs_in
pTextCtl_io
none
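A single-byte simplification of this pair scanning might look like the following. It assumes that the pairs are written next to each other and separated by whitespace (for example "aA bB"), which this section does not spell out. It also uses two 256-entry tables instead of the library's multibyte lists. CaseMap and addLowerUpperPairsSketch are hypothetical. Because a table holds only one mapping per character, the sketch keeps the first mapping it sees and cannot represent the many-to-many case.

```c
#include <ctype.h>

typedef struct {
    unsigned char aToUpper[256];   /* 0 means "no mapping" */
    unsigned char aToLower[256];
} CaseMap;

void addLowerUpperPairsSketch(const char * pszPairs_in, CaseMap * pMap_io)
{
    const unsigned char * p = (const unsigned char *)pszPairs_in;

    while (*p != '\0')
    {
        if (isspace(*p))
        {
            ++p;                              /* skip separators */
            continue;
        }
        if (p[1] == '\0' || isspace(p[1]))
        {
            ++p;                              /* unpaired character: ignore it */
            continue;
        }
        if (pMap_io->aToUpper[p[0]] == 0)
            pMap_io->aToUpper[p[0]] = p[1];   /* first mapping wins */
        if (pMap_io->aToLower[p[1]] == 0)
            pMap_io->aToLower[p[1]] = p[0];
        p += 2;
    }
}
```

Keeping the first mapping mirrors convLowerToUpper, which returns the (first) corresponding uppercase character.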
5.2.4 Example
#include "textctl.h"
...
TextControl sTextInputCtl_m;
...
void set_alphabetic(pszField_in)
char * pszField_in;
{
int code;
char * psz;
psz = pszField_in;
code = *psz++;
switch (code)
{
case 'A': /* alphabetic (word formation) characters */
addWordFormationChars(psz, &sTextInputCtl_m);
break;
case 'L': /* lower-upper word formation characters */
addLowerUpperWFChars(psz, &sTextInputCtl_m);
break;
case 'a': /* multibyte alphabetic (word formation) characters */
addWordFormationCharStrings(psz, &sTextInputCtl_m);
break;
case 'l': /* multibyte lower-upper word formation characters */
addLowerUpperWFCharStrings(psz, &sTextInputCtl_m);
break;
default:
break;
}
}
void reset_alphabetic()
{
resetWordFormationChars(&sTextInputCtl_m);
}
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */
void addLowerUpperWFCharStrings(char * pszLUPairs_in,
TextControl * pTextCtl_io);
addLowerUpperWFCharStrings scans the input string for pairs of
multibyte characters. The first member of each pair is added to the set
of multibyte lowercase alphabetic characters, and the second member is
added to the set of multibyte uppercase alphabetic characters. Note that
there may be a many-to-many mapping between lowercase and uppercase
multibyte characters.
The arguments to addLowerUpperWFCharStrings are as follows:
pszLUPairs_in
pTextCtl_io
none
5.3.4 Example
See section 5.2 addLowerUpperWFChars.
5.3.5 Source File `myctype.c'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
StringClass * addStringClass(char * pszField_in,
StringClass * pClasses_io);
addStringClass adds a string class to the list of string
classes. String classes are used in string environments such as those
in the consistent change notation supported by the OPAC function
library.
The arguments to addStringClass are as follows:
pszField_in
pClasses_io
points to the head of the list of string classes. This is NULL the
first time addStringClass is called. Each subsequent call
should use the value returned by the preceding call.
a pointer to the head of the updated list of string classes
5.4.4 Example
#include "change.h" /* includes strclass.h */
...
static Change * pChanges_m = NULL;
static StringClass * pClasses_m = NULL;
...
void store_change_info(pszField_in)
char * pszField_in;
{
Change * pChg;
char * psz;
int code;
if (pszField_in == NULL)
return;
psz = pszField_in;
code = *psz++; /* grab the table code */
switch (code)
{
case 'C': /* change */
pChg = parseChangeString( psz, pClasses_m );
if (pChg != (Change *)NULL)
{
pChg->pNext = pChanges_m;
pChanges_m = pChg;
}
break;
case 'S': /* string class */
pClasses_m = addStringClass( psz, pClasses_m );
break;
default:
break;
}
}
`strcla.c'
#include "strlist.h"
StringList * addToStringList(StringList * pList_in,
const char * pszString_in);
addToStringList adds a string to the beginning of a list of
strings. It does not check whether the string is already in the list.
The arguments to addToStringList are as follows:
pList_in
points to the list of strings, or is NULL to signal an empty list.
pszString_in
points to a NUL-terminated character string.
a pointer to the revised list
5.5.4 Example
#include "strlist.h"
...
StringList * pStrings = NULL;
...
/* pStrings-->NULL */
pStrings = addToStringList(pStrings, "this");
/* pStrings-->"this"-->NULL */
pStrings = addToStringList(pStrings, "test");
/* pStrings-->"test"-->"this"-->NULL */
pStrings = addToStringList(pStrings, "is");
/* pStrings-->"is"-->"test"-->"this"-->NULL */
pStrings = addToStringList(pStrings, "a");
/* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
pStrings = addToStringList(pStrings, "test");
/* pStrings-->"test"-->"a"-->"is"-->"test"-->"this"-->NULL */
`add_sl.c'
#include "textctl.h" /* or template.h or opaclib.h */
void addWordFormationChars(char * pszLetters_in,
TextControl * pTextCtl_io);
addWordFormationChars scans the input string for non-whitespace
characters. Each such character is added to the set of alphabetic
characters that do not have a lowercase/UPPERCASE distinction. (An
English example would be the apostrophe character.)
The arguments to addWordFormationChars are as follows:
pszLetters_in
pTextCtl_io
none
5.6.4 Example
See section 5.2 addLowerUpperWFChars.
5.6.5 Source File `myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */
void addWordFormationCharStrings(char * pszLetters_in,
TextControl * pTextCtl_io);
addWordFormationCharStrings scans the input string for multibyte
characters. Each such multibyte character sequence is added to the set
of multibyte caseless alphabetic characters.
The arguments to addWordFormationCharStrings are as follows:
pszLetters_in
pTextCtl_io
none
5.7.4 Example
See section 5.2 addLowerUpperWFChars.
5.7.5 Source File `myctype.c'
#include "allocmem.h" /* or opaclib.h */
void * allocMemory(size_t uiSize_in);
allocMemory provides a "safe" interface to malloc. If
the requested memory cannot be allocated, the function pointed to by
pfOutOfMemory_g is called. If pfOutOfMemory_g is
NULL, then the default behavior is to display an error message
incorporating the string stored in szOutOfMemoryMarker_g and
abort the program.
It is assumed that allocMemory always returns a good value.
This implies that any function pointed to by pfOutOfMemory_g
either aborts the program or uses longjmp to escape to a safe
place in the program.
allocMemory has a single argument:
uiSize_in
the number of bytes to allocate
a pointer to the beginning of the memory area allocated
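The "never returns NULL" contract can be sketched as a thin wrapper around malloc. safeAllocSketch, pfNoMemoryHandler_g, and szNoMemoryMarker_g are hypothetical stand-ins for allocMemory, pfOutOfMemory_g, and szOutOfMemoryMarker_g, and the message text is invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for pfOutOfMemory_g and szOutOfMemoryMarker_g */
void (* pfNoMemoryHandler_g)(size_t uiSize_in) = NULL;
char szNoMemoryMarker_g[101];   /* static storage: starts out all NUL bytes */

void * safeAllocSketch(size_t uiSize_in)
{
    /* request at least one byte so a successful call never yields NULL */
    void * p = malloc(uiSize_in > 0 ? uiSize_in : 1);

    if (p != NULL)
        return p;
    if (pfNoMemoryHandler_g != NULL)
        pfNoMemoryHandler_g(uiSize_in);   /* expected to exit or longjmp */
    /* default behavior, also reached if a handler returns anyway */
    fprintf(stderr, "Out of memory requesting %lu bytes %s\n",
            (unsigned long)uiSize_in, szNoMemoryMarker_g);
    abort();
}
```

Falling through to the default message when a handler returns keeps the wrapper from ever handing a NULL pointer back to callers that do not check for one.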
5.8.4 Example
#include "allocmem.h"
...
char * p;
...
p = allocMemory(75);
`allocmem.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
char * applyChanges(const char * pszString_in,
const Change * pChangeList_in);
applyChanges applies a list of consistent changes to a string.
The function steps through the list of changes, applying each change as
often as necessary before trying the next change in the list. The
input string is not changed; rather, a copy is created, modified, and
returned.
The arguments to applyChanges are as follows:
pszString_in
pChangeList_in
a pointer to a dynamically allocated and (possibly) changed string
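The ordering rule can be illustrated with a deliberately simplified model. The sketch below drops change environments and replaces every non-overlapping match of each change, scanning left to right without rescanning replacement text, before trying the next change. SimpleChange and applyChangesSketch are hypothetical and do not implement the full consistent change notation handled by applyChanges.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Change: no environments or descriptions */
typedef struct simple_change
{
    const char *           pszMatch;
    const char *           pszReplace;
    struct simple_change * pNext;
} SimpleChange;

static char * copyString(const char * pszString_in)
{
    size_t uiSize = strlen(pszString_in) + 1;
    char * psz    = malloc(uiSize);

    if (psz != NULL)
        memcpy(psz, pszString_in, uiSize);
    return psz;
}

/* Replace every non-overlapping match; returns a new string or NULL */
static char * replaceAll(const char * pszString_in, const char * pszMatch_in,
                         const char * pszReplace_in)
{
    size_t       uiMatch   = strlen(pszMatch_in);
    size_t       uiReplace = strlen(pszReplace_in);
    size_t       uiCount   = 0;
    const char * p;
    const char * pHit;
    char *       pszResult;
    char *       q;

    if (uiMatch == 0)
        return copyString(pszString_in);
    for (p = strstr(pszString_in, pszMatch_in); p != NULL;
         p = strstr(p + uiMatch, pszMatch_in))
        ++uiCount;
    pszResult = malloc(strlen(pszString_in) - uiCount * uiMatch
                       + uiCount * uiReplace + 1);
    if (pszResult == NULL)
        return NULL;
    for (p = pszString_in, q = pszResult;
         (pHit = strstr(p, pszMatch_in)) != NULL; p = pHit + uiMatch)
    {
        memcpy(q, p, (size_t)(pHit - p));     /* text before the match */
        q += pHit - p;
        memcpy(q, pszReplace_in, uiReplace);  /* replacement text */
        q += uiReplace;
    }
    strcpy(q, p);                             /* remainder after the last match */
    return pszResult;
}

/* Apply each change everywhere before the next; the input is not modified */
char * applyChangesSketch(const char * pszString_in,
                          const SimpleChange * pChangeList_in)
{
    char *               pszResult = copyString(pszString_in);
    const SimpleChange * pChange;

    for (pChange = pChangeList_in; pChange != NULL && pszResult != NULL;
         pChange = pChange->pNext)
    {
        char * pszNext = replaceAll(pszResult, pChange->pszMatch,
                                    pChange->pszReplace);
        free(pszResult);
        pszResult = pszNext;
    }
    return pszResult;
}
```

Applying the list "th" to "t", then "ph" to "f", turns "the photograph" into "te fotograf". Because later changes see the output of earlier ones, the order of the list matters.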
5.9.4 Example
#include "change.h"
...
Change * pChanges_m;
...
char * pszChanged;
...
pszChanged = applyChanges("this is a test", pChanges_m);
...
freeMemory( pszChanged );
`change.c'
#include "opaclib.h"
char * buildAdjustedFilename(const char * pszFilename_in,
const char * pszBasePathname_in,
const char * pszExtension_in);
buildAdjustedFilename builds a filename from the pieces given.
If the base pathname contains directory information, and the input
filename is not an absolute pathname, the leading directory information
is added to the output filename. If the extension is given, and the
input filename does not have an extension, the extension is added to
the output filename if the file cannot be opened for input without it.
The arguments to buildAdjustedFilename are as follows:
pszFilename_in
pszBasePathname_in
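points to a base pathname whose directory information may be used, or is
NULL.
pszExtension_in
points to a default filename extension, or is
NULL.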
a pointer to a dynamically allocated filename string
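The two adjustments can be sketched for POSIX-style paths. adjustFilenameSketch is hypothetical. It treats '/' as the only directory separator and a leading '/' as the only sign of an absolute pathname, whereas the library may handle other conventions, and it uses plain malloc, so the result is released with free.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * adjustFilenameSketch(const char * pszFilename_in,
                            const char * pszBasePathname_in,
                            const char * pszExtension_in)
{
    size_t       uiDirLength = 0;
    size_t       uiExtLength = pszExtension_in ? strlen(pszExtension_in) : 0;
    const char * pszSlash;
    const char * pszDot;
    char *       pszResult;
    FILE *       pFP;

    /* Borrow the base pathname's directory unless the filename is absolute */
    if (pszBasePathname_in != NULL && pszFilename_in[0] != '/')
    {
        pszSlash = strrchr(pszBasePathname_in, '/');
        if (pszSlash != NULL)
            uiDirLength = (size_t)(pszSlash - pszBasePathname_in) + 1;  /* keep '/' */
    }
    pszResult = malloc(uiDirLength + strlen(pszFilename_in) + uiExtLength + 1);
    if (pszResult == NULL)
        return NULL;
    if (uiDirLength > 0)
        memcpy(pszResult, pszBasePathname_in, uiDirLength);
    strcpy(pszResult + uiDirLength, pszFilename_in);

    /* Add the extension only if the last path component has none
     * and the file cannot be opened without it */
    pszSlash = strrchr(pszResult, '/');
    pszDot   = strrchr(pszResult, '.');
    if (uiExtLength > 0 && (pszDot == NULL || (pszSlash != NULL && pszDot < pszSlash)))
    {
        pFP = fopen(pszResult, "r");
        if (pFP != NULL)
            fclose(pFP);
        else
            strcat(pszResult, pszExtension_in);
    }
    return pszResult;
}
```

Checking for a dot only after the last slash keeps directory names such as "../data" from being mistaken for an extension.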
5.10.4 Example
#include <stdio.h>
#include <string.h>
#include "opaclib.h"
...
int readControlFile(char * pszControlFile_in)
{
char * pszIncludeFile;
char szBuffer[512];
FILE * pControlFP;
char * p;
pControlFP = fopen(pszControlFile_in, "r");
if (pControlFP == NULL)
return 0;
while (fgets(szBuffer, 512, pControlFP) != NULL)
{
p = strchr(szBuffer, '\n');
if (p != NULL)
*p = '\0';
if (strncmp(szBuffer, "\\include", 8) == 0)
{
pszIncludeFile = szBuffer + 8;
pszIncludeFile += strspn(pszIncludeFile, " \t\r\n\f");
if (*pszIncludeFile == '\0')
continue;
pszIncludeFile = buildAdjustedFilename(pszIncludeFile,
pszControlFile_in,
".ctl");
readControlFile(pszIncludeFile);
freeMemory(pszIncludeFile);
}
...
}
fclose(pControlFP);
return 1;
}
`adjfname.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
char * buildChangeString(const Change * pChange_in);
buildChangeString builds a textual representation of the given
consistent change data structure.
buildChangeString has one argument:
pChange_in
points to the consistent change to represent. (The pNext
field of the Change data structure is ignored.)
a pointer to a dynamically allocated string representing the change, or
NULL if an error occurs
5.11.4 Example
#include <stdio.h>
#include "change.h"
...
void displayChangeList(Change * pChanges_in)
{
Change * pChange;
char * pszChange;
for ( pChange = pChanges_in ; pChange ; pChange = pChange->pNext )
{
pszChange = buildChangeString( pChange );
if (pszChange == NULL)
continue; /* skip changes that cannot be represented */
fprintf(stderr, "%s\n", pszChange);
freeMemory( pszChange );
}
}
`change.c'
#include <stdio.h>
#include "opaclib.h"
void checkFileError(FILE * pOutputFP_in,
const char * pszProcessName_in,
const char * pszFilename_in);
checkFileError checks for an error in the output file
pOutputFP_in whose name is given by pszFilename_in. If
an error occurred, the output file is deleted and the program exits
with an error message.
The arguments to checkFileError are as follows:
pOutputFP_in
pszProcessName_in
pszFilename_in
none
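The check can be sketched with standard C stream functions. checkOutputOrExit is a hypothetical name, and flushing before testing the error indicator is an assumption about how buffered write failures (such as a full disk) would be detected. The error message text is invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

void checkOutputOrExit(FILE * pOutputFP_in, const char * pszProcessName_in,
                       const char * pszFilename_in)
{
    /* Flush first so that buffered write failures are reported
     * through the stream's error indicator. */
    if (fflush(pOutputFP_in) == 0 && !ferror(pOutputFP_in))
        return;
    fprintf(stderr, "%s: error writing %s (disk full?)\n",
            pszProcessName_in, pszFilename_in);
    fclose(pOutputFP_in);
    remove(pszFilename_in);    /* do not leave a truncated file behind */
    exit(EXIT_FAILURE);
}
```

As in the example that follows, the check belongs just before the file is closed, after all output has been written.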
5.12.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * fp;
char filename[100];
...
checkFileError(fp, "Program Name", filename);
fclose(fp);
`fulldisk.c'
#include "record.h" /* or opaclib.h */
void cleanupAfterStdFormatRecord(void);
cleanupAfterStdFormatRecord frees any memory allocated for
readStdFormatRecord.
cleanupAfterStdFormatRecord does not have any arguments.
5.13.3 Return Value
none
5.13.4 Example
#include <stdio.h>
#include "record.h"
static CodeTable sLexTable_m = { "\\w\0W\0\\c\0C\0\\f\0F\0\\g\0G\0",
4, "\\w" };
...
int load_lexicon(pszLexiconFile_in, cComment_in)
char * pszLexiconFile_in;
int cComment_in;
{
FILE * fp;
unsigned uiRecordCount = 0;
char * pRecord;
/*
* open the lexicon file
*/
if (pszLexiconFile_in == NULL)
return( 0 );
fp = fopen(pszLexiconFile_in, "r");
if (fp == (FILE *)NULL)
return( 0 );
/*
* load all the records from the lexicon file
*/
uiRecordCount = 0;
while ((pRecord = readStdFormatRecord(fp,
&sLexTable_m,
cComment_in,
&uiRecordCount)) != NULL)
{
...
}
/*
* close the lexicon file and erase the temporary data structures
*/
fclose(fp);
cleanupAfterStdFormatRecord();
return( 1 );
}
`record.c'
#include "textctl.h" /* or template.h or opaclib.h */
const unsigned char * convLowerToUpper(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
convLowerToUpper checks whether the input string begins with a
multibyte lowercase character. If so, it returns the (first)
corresponding multibyte uppercase character.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars and addLowerUpperWFCharStrings are
implicitly called by loadIntxCtlFile and loadOutxCtlFile.)
The arguments to convLowerToUpper are as follows:
pszString_in
points to a NUL-terminated character string.
pTextCtl_in
a pointer to a NUL-terminated string containing the (primary)
corresponding multibyte uppercase character, or NULL if the input
string does not begin with a multibyte lowercase character. This may
point to a static buffer that may be overwritten by the next call to
convLowerToUpper.
5.14.4 Example
#include <string.h>
#include "textctl.h"
...
static TextControl sTextCtl_m;
static StringClass * pStringClasses_m;
static char szOutxFilename_m[100];
...
loadOutxCtlFile(szOutxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
...
unsigned char * upcaseString(unsigned char * pszString_in)
{
size_t iCharSize;
size_t iUCSize;
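size_t iUpperLength = 0; /* must start at zero: it accumulates below */
unsigned char * p;
const unsigned char * pUC; /* matches the return type of convLowerToUpper */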
unsigned char * pszUpper;
unsigned char * q;
if (pszString_in == NULL)
return NULL;
for ( p = pszString_in ; *p ; p += iCharSize )
{
if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
iCharSize = 1;
pUC = convLowerToUpper(p, &sTextCtl_m);
if (pUC != NULL)
iUpperLength += strlen((char *)pUC);
else
iUpperLength += iCharSize;
}
pszUpper = allocMemory(iUpperLength + 1);
for ( p = pszString_in, q = pszUpper ; *p ; p += iCharSize )
{
if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
iCharSize = 1;
pUC = convLowerToUpper(p, &sTextCtl_m);
if (pUC != NULL)
{
iUCSize = strlen((char *)pUC);
memcpy(q, pUC, iUCSize);
q += iUCSize;
}
else
{
memcpy(q, p, iCharSize);
q += iCharSize;
}
}
pszUpper[iUpperLength] = NUL;
return pszUpper;
}
5.14.5 Source File `myctype.c'
5.15 convLowerToUpperSet
5.15.1 Syntax
#include "textctl.h" /* or template.h or opaclib.h */
const StringList * convLowerToUpperSet(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
convLowerToUpperSet checks whether the input string begins with a
multibyte lowercase character. If so, it returns the complete set of
corresponding multibyte uppercase characters.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars and addLowerUpperWFCharStrings are
implicitly called by loadIntxCtlFile and loadOutxCtlFile.)
The arguments to convLowerToUpperSet are as follows:
pszString_in
pointer to a NUL-terminated character string.
pTextCtl_in
pointer to a TextControl data structure that contains the lowercase and
uppercase multibyte character mappings.
5.15.3 Return Value
a pointer to a list of NUL-terminated strings containing the
corresponding multibyte uppercase characters, or NULL if the input
string does not begin with a multibyte lowercase character. This may
point to static data that may be overwritten by the next call to
convLowerToUpperSet.
5.15.4 Example
#include <string.h>
#include "textctl.h"
#include "rpterror.h"
...
StringList * upcaseWord(pszWord_in, pTextCtl_in)
char * pszWord_in;
const TextControl * pTextCtl_in;
{
size_t uiCharCount;
size_t uiLowerCount;
size_t uiNumberAlternatives;
size_t uiSpan;
size_t uiWordLength;
size_t k;
int iLength;
unsigned char * p;
StringList * pUpcaseList = NULL;
const StringList * pUpperSet;
const StringList * ps;
/*
* count the number of multibyte characters in the string
* count the lowercase letters
* calculate the number of (ambiguous) upcase conversions
* calculate the maximum length of the upcased word
*/
uiCharCount = 0;
uiLowerCount = 0;
uiNumberAlternatives = 1;
uiWordLength = 1; /* count the terminating NUL byte */
for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength )
{
iLength = matchAlphaChar(p, pTextCtl_in);
if (iLength == 0)
iLength = 1;
++uiCharCount;
if (matchLowercaseChar(p, pTextCtl_in) != 0)
{
++uiLowerCount;
pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
uiNumberAlternatives *= getStringListSize( pUpperSet );
uiSpan = 0;
for ( ps = pUpperSet ; ps ; ps = ps->pNext )
{
k = strlen( ps->pszString );
if (k > uiSpan)
uiSpan = k;
}
}
else
uiSpan = iLength;
uiWordLength += uiSpan;
}
if (uiLowerCount == 0)
{
/*
* the word is already all uppercase
*/
return addToStringList(NULL, pszWord_in);
}
else
{
/*
* convert word to all uppercase (possibly ambiguously)
*/
char * pszCapWord;
char * pszUpper;
size_t uiNum;
int iUpperLength;
size_t i;
size_t j;
if (uiNumberAlternatives < 1)
{
reportError(ERROR_MSG,
"error getting uppercase equivalents for \"%s\"\n",
pszWord_in);
return NULL;
}
if (uiNumberAlternatives > 500)
{
reportError(WARNING_MSG,
"%lu uppercase equivalents is too many: storing only 500\n",
(unsigned long)uiNumberAlternatives);
uiNumberAlternatives = 500;
}
pszCapWord = allocMemory(uiWordLength);
for ( i = 0 ; i < uiNumberAlternatives ; ++i )
{
strcpy(pszCapWord, pszWord_in);
uiSpan = 1;
j = 0;
for ( p = (unsigned char *)pszCapWord ; *p ; p += iLength )
{
iLength = matchLowercaseChar(p, pTextCtl_in);
if (iLength != 0)
{
pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
uiNum = getStringListSize(pUpperSet);
pszUpper = pUpperSet->pszString;
if (uiNum > 1)
{
k = (i / uiSpan) % uiNum;
uiSpan *= uiNum;
for ( ps = pUpperSet ; ps ; ps = ps->pNext )
{
if (k == 0)
{
pszUpper = ps->pszString;
break;
}
--k;
}
}
/*
* replace the lowercase multibyte character with an
* equivalent uppercase multibyte character
*/
iUpperLength = strlen(pszUpper);
if (iUpperLength != iLength)
memmove(p + iUpperLength,
p + iLength,
strlen((char *)p + iLength) + 1);
memcpy(p, pszUpper, iUpperLength);
iLength = iUpperLength;
}
else
{
iLength = matchAlphaChar(p, pTextCtl_in);
if (iLength == 0)
iLength = 1;
}
++j;
}
pUpcaseList = addToStringList(pUpcaseList, pszCapWord);
}
freeMemory( pszCapWord );
}
return pUpcaseList;
}
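upcaseWord numbers its alternatives from 0 to uiNumberAlternatives - 1 and decodes each number as a mixed-radix value: the choice for each ambiguous character is (i / uiSpan) % uiNum, where uiSpan is the product of the choice counts of the ambiguous characters before it. The following standalone sketch isolates that arithmetic; decodeAlternative and its array parameters are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Decode alternative number uiAlt_in into one choice per ambiguous position.
 * auiCount_in[n] is the number of choices at position n.  The choice for
 * position n is (uiAlt_in / uiSpan) % auiCount_in[n], where uiSpan is the
 * product of the counts of all earlier positions (a mixed-radix number). */
void decodeAlternative(size_t uiAlt_in, const size_t * auiCount_in,
                       size_t uiPositions_in, size_t * auiChoice_out)
{
    size_t uiSpan = 1;
    size_t n;

    for ( n = 0 ; n < uiPositions_in ; ++n )
    {
        auiChoice_out[n] = (uiAlt_in / uiSpan) % auiCount_in[n];
        uiSpan *= auiCount_in[n];   /* weight of the next position */
    }
}
```

With counts { 2, 3 } there are six alternatives, and alternative 5 decodes to choices 1 and 2. Positions with only one choice always decode to 0, which is why upcaseWord can skip them when uiNum is 1. downcaseWord in the convUpperToLowerSet example uses the same scheme.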
5.15.5 Source File `myctype.c'
5.16 convUpperToLower
5.16.1 Syntax
#include "textctl.h" /* or template.h or opaclib.h */
const unsigned char * convUpperToLower(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
convUpperToLower checks whether the input string begins with a
multibyte uppercase character. If so, it returns the (first)
corresponding multibyte lowercase character.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars and addLowerUpperWFCharStrings are
implicitly called by loadIntxCtlFile and loadOutxCtlFile.)
The arguments to convUpperToLower are as follows:
pszString_in
pointer to a NUL-terminated character string.
pTextCtl_in
pointer to a TextControl data structure that contains the lowercase and
uppercase multibyte character mappings.
5.16.3 Return Value
a pointer to a NUL-terminated string containing the (primary)
corresponding multibyte lowercase character, or NULL if the input
string does not begin with a multibyte uppercase character. This may
point to a static buffer that may be overwritten by the next call to
convUpperToLower.
5.16.4 Example
#include <string.h>
#include "textctl.h"
...
static TextControl sTextCtl_m;
static StringClass * pStringClasses_m;
static char szIntxFilename_m[100];
...
loadIntxCtlFile(szIntxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
...
unsigned char * downcaseString(unsigned char * pszString_in)
{
size_t iCharSize;
size_t iLCSize;
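size_t iLowerLength = 0; /* must start at zero: it accumulates below */
unsigned char * p;
const unsigned char * pLC; /* matches the return type of convUpperToLower */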
unsigned char * pszLower;
unsigned char * q;
if (pszString_in == NULL)
return NULL;
for ( p = pszString_in ; *p ; p += iCharSize )
{
if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
iCharSize = 1;
pLC = convUpperToLower(p, &sTextCtl_m);
if (pLC != NULL)
iLowerLength += strlen((char *)pLC);
else
iLowerLength += iCharSize;
}
pszLower = allocMemory(iLowerLength + 1);
for ( p = pszString_in, q = pszLower ; *p ; p += iCharSize )
{
if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
iCharSize = 1;
pLC = convUpperToLower(p, &sTextCtl_m);
if (pLC != NULL)
{
iLCSize = strlen((char *)pLC);
memcpy(q, pLC, iLCSize);
q += iLCSize;
}
else
{
memcpy(q, p, iCharSize);
q += iCharSize;
}
}
pszLower[iLowerLength] = NUL;
return pszLower;
}
5.16.5 Source File `myctype.c'
5.17 convUpperToLowerSet
5.17.1 Syntax
#include "textctl.h" /* or template.h or opaclib.h */
const StringList * convUpperToLowerSet(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
convUpperToLowerSet checks whether the input string begins with a
multibyte uppercase character. If so, it returns the complete set of
corresponding multibyte lowercase characters.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars and addLowerUpperWFCharStrings are
implicitly called by loadIntxCtlFile and loadOutxCtlFile.)
The arguments to convUpperToLowerSet are as follows:
pszString_in
pointer to a NUL-terminated character string.
pTextCtl_in
pointer to a TextControl data structure that contains the lowercase and
uppercase multibyte character mappings.
5.17.3 Return Value
a pointer to a list of NUL-terminated strings containing the
corresponding multibyte lowercase characters, or NULL if the input
string does not begin with a multibyte uppercase character. This may
point to static data that may be overwritten by the next call to
convUpperToLowerSet.
5.17.4 Example
#include <string.h>
#include "textctl.h"
#include "rpterror.h"
...
StringList * downcaseWord(pszWord_in, pTextCtl_in)
char * pszWord_in;
const TextControl * pTextCtl_in;
{
size_t uiCharCount;
size_t uiUpperCount;
size_t uiNumberAlternatives;
size_t uiSpan;
size_t uiWordLength;
size_t k;
int iLength;
unsigned char * p;
StringList * pDowncaseList = NULL;
const StringList * pLowerSet;
const StringList * ps;
/*
* count the number of multibyte characters in the string
* count the uppercase letters
* calculate the number of (ambiguous) downcase conversions
* calculate the maximum length of the downcased word
*/
uiCharCount = 0;
uiUpperCount = 0;
uiNumberAlternatives = 1;
uiWordLength = 1; /* count the terminating NUL byte */
for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength )
{
iLength = matchAlphaChar(p, pTextCtl_in);
if (iLength == 0)
iLength = 1;
++uiCharCount;
if (matchUppercaseChar(p, pTextCtl_in) != 0)
{
++uiUpperCount;
pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
uiNumberAlternatives *= getStringListSize( pLowerSet );
uiSpan = 0;
for ( ps = pLowerSet ; ps ; ps = ps->pNext )
{
k = strlen( ps->pszString );
if (k > uiSpan)
uiSpan = k;
}
}
else
uiSpan = iLength;
uiWordLength += uiSpan;
}
if (uiUpperCount == 0)
{
/*
* the word is already all lowercase
*/
return addToStringList(NULL, pszWord_in);
}
else
{
/*
* convert word to all lowercase (possibly ambiguously)
*/
char * pszDecapWord;
char * pszLower;
size_t uiNum;
int iLowerLength;
size_t i;
size_t j;
if (uiNumberAlternatives < 1)
{
reportError(ERROR_MSG,
"error getting lowercase equivalents for \"%s\"\n",
pszWord_in);
return NULL;
}
if (uiNumberAlternatives > 500)
{
reportError(WARNING_MSG,
"%lu lowercase equivalents is too many: storing only 500\n",
(unsigned long)uiNumberAlternatives);
uiNumberAlternatives = 500;
}
pszDecapWord = allocMemory(uiWordLength);
for ( i = 0 ; i < uiNumberAlternatives ; ++i )
{
strcpy(pszDecapWord, pszWord_in);
uiSpan = 1;
j = 0;
for ( p = (unsigned char *)pszDecapWord ; *p ; p += iLength )
{
iLength = matchUppercaseChar(p, pTextCtl_in);
if (iLength != 0)
{
pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
uiNum = getStringListSize(pLowerSet);
pszLower = pLowerSet->pszString;
if (uiNum > 1)
{
k = (i / uiSpan) % uiNum;
uiSpan *= uiNum;
for ( ps = pLowerSet ; ps ; ps = ps->pNext )
{
if (k == 0)
{
pszLower = ps->pszString;
break;
}
--k;
}
}
/*
* replace the uppercase multibyte character with an
* equivalent lowercase multibyte character
*/
iLowerLength = strlen(pszLower);
if (iLowerLength != iLength)
memmove(p + iLowerLength,
p + iLength,
strlen((char *)p + iLength) + 1);
memcpy(p, pszLower, iLowerLength);
iLength = iLowerLength;
}
else
{
iLength = matchAlphaChar(p, pTextCtl_in);
if (iLength == 0)
iLength = 1;
}
++j;
}
pDowncaseList = addToStringList(pDowncaseList, pszDecapWord);
}
freeMemory( pszDecapWord );
}
return pDowncaseList;
}
5.17.5 Source File `myctype.c'
5.18 decapitalizeWord
5.18.1 Syntax
#include "template.h" /* or opaclib.h */
int decapitalizeWord(WordTemplate * pWord_io,
const TextControl * pTextCtl_in);
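decapitalizeWord converts the input word to all lowercase (possibly
ambiguously) and returns a capitalization flag:
0 (NOCAP)
the word has no uppercase letters.
1 (INITCAP)
only the first letter of the word is uppercase.
2 (ALLCAP)
all the letters of the word are uppercase.
>4
the word has some other, mixed pattern of capitalization.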
After the conversion to all lowercase, any orthography changes stored
in pTextCtl_in are applied.
The arguments to decapitalizeWord are as follows:
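pWord_io
pointer to a WordTemplate data structure whose pszOrigWord field holds
the word to convert.
pTextCtl_in
pointer to a TextControl data structure that contains the case mappings
and orthography changes to apply.
5.18.3 Return Value
the capitalization flag for the word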
5.18.4 Example
#include "template.h" /* includes textctl.h */
...
WordTemplate * buildTemplate(
char * pszWord_in,
TextControl * pTextCtl_in)
{
WordTemplate * pTemplate;
if (pszWord_in == NULL)
return NULL;
pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
pTemplate->pszOrigWord = duplicateString( pszWord_in );
pTemplate->iCapital = decapitalizeWord( pTemplate, pTextCtl_in);
return pTemplate;
}
5.18.5 Source File `textin.c'
5.19 displayNumberedMessage
5.19.1 Syntax
#include "rpterror.h" /* or opaclib.h */
void displayNumberedMessage(const NumberedMessage * pMessage_in,
int bSilent_in,
int bShowWarnings_in,
FILE * pLogFP_in,
const char * pszFilename_in,
unsigned uiLineNumber_in,
...);
displayNumberedMessage writes a numbered error or warning
message to the standard error output (screen), optionally writing it to
a log file as well. For GUI programs, the programmer must write a
different version of displayNumberedMessage to satisfy the link
requirements of other functions in the OPAC library. This would
typically display a message box or write to a message window.
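Since a program may need to supply its own displayNumberedMessage, a minimal console version might look like the following sketch. The NumberedMessageSketch layout is inferred from the initializers in the example in this section (type, number, format string); the message header format, and the choices that bSilent_in suppresses only the screen output while bShowWarnings_in suppresses warnings everywhere, are assumptions rather than documented behavior.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed layout, inferred from the example's initializers. */
enum { ERROR_MSG_SKETCH, WARNING_MSG_SKETCH };
typedef struct {
    int          eType;      /* ERROR_MSG_SKETCH or WARNING_MSG_SKETCH */
    unsigned     uiNumber;   /* message number */
    const char * pszFormat;  /* printf style format string */
} NumberedMessageSketch;

/* write a header, the formatted message, and a newline to one stream */
static void emitMessage(FILE * pFP_in, const NumberedMessageSketch * pMessage_in,
                        const char * pszFilename_in, unsigned uiLineNumber_in,
                        va_list args_in)
{
    fprintf(pFP_in, "%s %u: ",
            pMessage_in->eType == ERROR_MSG_SKETCH ? "ERROR" : "WARNING",
            pMessage_in->uiNumber);
    if (pszFilename_in != NULL)
        fprintf(pFP_in, "%s line %u: ", pszFilename_in, uiLineNumber_in);
    vfprintf(pFP_in, pMessage_in->pszFormat, args_in);
    fputc('\n', pFP_in);
}

void displayNumberedMessageSketch(const NumberedMessageSketch * pMessage_in,
                                  int bSilent_in, int bShowWarnings_in,
                                  FILE * pLogFP_in, const char * pszFilename_in,
                                  unsigned uiLineNumber_in, ...)
{
    va_list args;

    if (pMessage_in->eType == WARNING_MSG_SKETCH && !bShowWarnings_in)
        return;                              /* warnings are suppressed */
    if (!bSilent_in)
    {
        va_start(args, uiLineNumber_in);
        emitMessage(stderr, pMessage_in, pszFilename_in, uiLineNumber_in, args);
        va_end(args);
    }
    if (pLogFP_in != NULL)
    {
        /* a va_list can be consumed only once, so start it again for the log */
        va_start(args, uiLineNumber_in);
        emitMessage(pLogFP_in, pMessage_in, pszFilename_in, uiLineNumber_in, args);
        va_end(args);
    }
}
```

A GUI replacement would keep the same parameter list but send the formatted text to a message box or message window instead of stderr.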
The arguments to displayNumberedMessage are as follows:
pMessage_in
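pointer to a NumberedMessage data structure that contains the
message type, the message number, and the format string for the
message.
bSilent_in
if TRUE (nonzero), the message is not displayed on the screen.
bShowWarnings_in
if TRUE (nonzero), warning messages are displayed; otherwise only
error messages are displayed.
pLogFP_in
FILE pointer to an open log file, or NULL.
pszFilename_in
name of the input file in which the problem was found, or NULL.
uiLineNumber_in
line number within pszFilename_in where the problem was found (or
0).
...
any additional arguments required by the printf style format string
given by pMessage_in.
5.19.3 Return Value
none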
5.19.4 Example
#include <stdio.h>
#include <stdlib.h> /* exit */
#include <string.h> /* strtok */
#include "opaclib.h" /* includes rpterror.h */
...
int bSilent_g = 0;
int bShowWarnings_g = 1;
FILE * pLogFP_g = NULL;
...
static NumberedMessage sCannotOpen_m = { ERROR_MSG, 100,
"Cannot open %s file %s" };
static NumberedMessage sIgnoreRedundant_m = { WARNING_MSG, 101,
"Ignoring all but first \\%s line" };
static char * aszCodes_m[] = {
"\\lexicon",
"\\grammar",
...
NULL
};
...
FILE * pControlFP;
char * pszControlFile;
unsigned uiLineNumber;
char * pszLexFile = NULL; /* set by the first lexicon field */
char ** ppszField;
char * p;
unsigned i;
...
pControlFP = fopen(pszControlFile, "r");
if (pControlFP == (FILE *)NULL)
{
displayNumberedMessage(&sCannotOpen_m,
bSilent_g, bShowWarnings_g, pLogFP_g,
NULL, 0,
"control", pszControlFile);
exit(1);
}
uiLineNumber = 1;
while ((ppszField = readStdFormatField(pControlFP,
aszCodes_m, NUL)) != NULL)
{
switch (**ppszField)
{
case 1: /* "\\lexicon" */
if (pszLexFile != (char *)NULL)
displayNumberedMessage(&sIgnoreRedundant_m,
bSilent_g, bShowWarnings_g,
pLogFP_g,
pszControlFile, uiLineNumber,
"lexicon");
else
{
p = strtok(ppszField[0]+1, " \t\r\n\f\v");
pszLexFile = buildAdjustedFilename(p,
pszControlFile,
".lex");
}
break;
...
}
...
for ( i = 0 ; ppszField[i] ; ++i )
++uiLineNumber;
}
...
5.19.5 Source File `textin.c'
5.20 duplicateString
5.20.1 Syntax
#include "allocmem.h" /* or opaclib.h */
char * duplicateString(const char * pszString_in);
duplicateString creates a copy of an existing NUL-terminated
character string. It calls allocMemory to get the memory to
store the copy of the string. If pszString_in is NULL,
then duplicateString returns NULL.
This is the same as the POSIX function strdup, except that it
calls allocMemory instead of malloc.
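A minimal sketch of the behavior just described, with a hypothetical allocMemorySketch standing in for allocMemory. The stand-in assumes, as the description of fitAllocStringExactly suggests, that the library allocator aborts the program with an error message rather than returning NULL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for allocMemory: it never returns NULL,
 * it aborts the program instead. */
static void * allocMemorySketch(size_t uiSize_in)
{
    void * pBlock = malloc(uiSize_in ? uiSize_in : 1);
    if (pBlock == NULL)
    {
        fputs("out of memory\n", stderr);
        exit(2);
    }
    return pBlock;
}

/* Sketch of duplicateString: NULL in, NULL out; otherwise an exact copy. */
char * duplicateStringSketch(const char * pszString_in)
{
    size_t uiLength;
    char * pszCopy;

    if (pszString_in == NULL)
        return NULL;
    uiLength = strlen(pszString_in) + 1;   /* include the terminating NUL */
    pszCopy = allocMemorySketch(uiLength);
    memcpy(pszCopy, pszString_in, uiLength);
    return pszCopy;
}
```

Because the allocator never returns NULL, callers only need to test for NULL when they pass a NULL string in.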
duplicateString has one argument:
pszString_in
pointer to a NUL-terminated character string.
5.20.3 Return Value
a pointer to the newly allocated and copied duplicate string
5.20.4 Example
#include "template.h" /* includes textctl.h */
...
WordTemplate * buildTemplate(
char * pszWord_in,
TextControl * pTextCtl_in)
{
WordTemplate * pTemplate;
if (pszWord_in == NULL)
return NULL;
pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
pTemplate->pszOrigWord = duplicateString( pszWord_in );
pTemplate->iCapital = decapitalizeWord( pTemplate, pTextCtl_in);
return pTemplate;
}
5.20.5 Source File `allocmem.c'
5.21 duplicateStringList
5.21.1 Syntax
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * duplicateStringList(const StringList * pList_in);
duplicateStringList copies a list of strings to create another,
identical list of strings. If pList_in is NULL, then
duplicateStringList returns NULL.
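A minimal sketch of a deep copy that preserves the order of the list. It assumes a StringList node with only the pszString and pNext fields used in this chapter's examples (the real declaration is in strlist.h and may contain more), and uses a hypothetical xmalloc in place of allocMemory.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed node layout (only the fields used in this chapter). */
typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
} StringList;

/* malloc wrapper that aborts instead of returning NULL (stand-in for allocMemory) */
static void * xmalloc(size_t uiSize_in)
{
    void * p = malloc(uiSize_in);
    if (p == NULL)
        abort();
    return p;
}

/* Sketch: copy every node and every string, keeping the original order. */
StringList * duplicateStringListSketch(const StringList * pList_in)
{
    StringList *       pHead = NULL;
    StringList **      ppTail = &pHead;   /* where the next copied node is linked */
    const StringList * ps;

    for ( ps = pList_in ; ps != NULL ; ps = ps->pNext )
    {
        StringList * pNode = xmalloc(sizeof(StringList));
        if (ps->pszString != NULL)
        {
            size_t uiLength = strlen(ps->pszString) + 1;
            pNode->pszString = xmalloc(uiLength);
            memcpy(pNode->pszString, ps->pszString, uiLength);
        }
        else
            pNode->pszString = NULL;
        pNode->pNext = NULL;
        *ppTail = pNode;
        ppTail = &pNode->pNext;
    }
    return pHead;
}
```

Linking through a pointer to the tail pointer avoids rescanning the new list for every node, so the copy runs in time proportional to the list length.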
duplicateStringList has one argument:
pList_in
pointer to the list of strings to copy.
5.21.3 Return Value
a pointer to the new list of dynamically allocated strings
5.21.4 Example
#include "strlist.h"
...
StringList * pList1;
StringList * pList2;
...
pList2 = duplicateStringList(pList1);
...
freeStringList( pList2 );
pList2 = NULL;
5.21.5 Source File `copy_sl.c'
5.22 equivalentStringLists
5.22.1 Syntax
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
int equivalentStringLists(const StringList * pFirst_in,
const StringList * pSecond_in);
equivalentStringLists tests whether or not two string lists
contain the same strings. The strings do not have to be in the same
order in the two lists. Duplicate strings in either list are
immaterial.
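The following minimal sketch shows one way to implement this order- and duplicate-insensitive comparison: each list must be covered by the other. It assumes a StringList node with only the pszString and pNext fields (the real type is declared in strlist.h), non-NULL strings, and that two empty lists count as equivalent. The nested loops make it quadratic, which is reasonable for short lists.

```c
#include <string.h>

/* Assumed node layout (only the fields used in this chapter). */
typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
} StringList;

/* nonzero if pszString_in occurs anywhere in pList_in */
static int containsString(const StringList * pList_in, const char * pszString_in)
{
    for ( ; pList_in != NULL ; pList_in = pList_in->pNext )
        if (strcmp(pList_in->pszString, pszString_in) == 0)
            return 1;
    return 0;
}

/* nonzero if every string in pA_in also occurs in pB_in */
static int coveredBy(const StringList * pA_in, const StringList * pB_in)
{
    for ( ; pA_in != NULL ; pA_in = pA_in->pNext )
        if (!containsString(pB_in, pA_in->pszString))
            return 0;
    return 1;
}

int equivalentStringListsSketch(const StringList * pFirst_in,
                                const StringList * pSecond_in)
{
    return coveredBy(pFirst_in, pSecond_in) && coveredBy(pSecond_in, pFirst_in);
}
```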
The arguments to equivalentStringLists are as follows:
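pFirst_in
pointer to the first list of strings.
pSecond_in
pointer to the second list of strings.
5.22.3 Return Value
nonzero (TRUE) if the lists are equivalent, otherwise zero (FALSE)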
5.22.4 Example
#include "strlist.h"
...
StringList * pList1;
StringList * pList2;
...
if (equivalentStringLists(pList1, pList2))
{
...
}
5.22.5 Source File `equiv_sl.c'
5.23 eraseCharsInString
5.23.1 Syntax
#include "opaclib.h"
char * eraseCharsInString(char * pszString_io,
const char * pszEraseChars_in);
eraseCharsInString removes from pszString_io every character that also
occurs in pszEraseChars_in. The string is modified in place, so it
becomes shorter whenever any characters are erased.
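A minimal sketch of the in-place erasure, using a read pointer and a write pointer into the same buffer. The name eraseCharsInStringSketch is hypothetical; returning the first argument mirrors how the example in this section uses the result (it is later passed to freeMemory), and ignoring NULL arguments is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: compact the string in place, keeping only the characters that
 * do not occur in pszEraseChars_in.  Returns its first argument. */
char * eraseCharsInStringSketch(char * pszString_io,
                                const char * pszEraseChars_in)
{
    char * pRead;
    char * pWrite;

    if (pszString_io == NULL || pszEraseChars_in == NULL)
        return pszString_io;
    for ( pRead = pWrite = pszString_io ; *pRead != '\0' ; ++pRead )
    {
        /* strchr also matches the terminating NUL, but *pRead is never NUL here */
        if (strchr(pszEraseChars_in, *pRead) == NULL)
            *pWrite++ = *pRead;
    }
    *pWrite = '\0';
    return pszString_io;
}
```

For example, erasing "-=#" from "un-=happy#" leaves "unhappy" in the same buffer.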
The arguments to eraseCharsInString are as follows:
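pszString_io
pointer to the NUL-terminated character string to modify.
pszEraseChars_in
pointer to a NUL-terminated string containing the characters to erase.
5.23.3 Return Value
a pointer to the possibly modified string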
5.23.4 Example
#include "opaclib.h" /* includes allocmem.h */
...
static char szMarkers_m[] = "-=#";
...
static int get_score(pszMarkedWord_in)
const char * pszMarkedWord_in;
{
char * pszWord;
int iScore = 0;
if (pszMarkedWord_in != NULL)
{
pszWord = eraseCharsInString(duplicateString(pszMarkedWord_in),
szMarkers_m);
...
freeMemory(pszWord);
}
return iScore;
}
5.23.5 Source File `erasecha.c'
5.24 eraseTrie
5.24.1 Syntax
#include "trie.h" /* or opaclib.h */
void eraseTrie(Trie * pTrieHead_io,
void (* pfEraseInfo_in)(void * pList_io));
eraseTrie walks through a trie, freeing all the memory allocated
for the trie and for the information it stores.
The arguments to eraseTrie are as follows:
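pTrieHead_io
pointer to the head of the trie to erase.
pfEraseInfo_in
pointer to a function that frees the information stored at a node of
the trie. It takes one argument:
pList_io
pointer to the information stored at one node of the trie (in the
example, a linked list of LexItem structures).
5.24.3 Return Value
none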
5.24.4 Example
#include "trie.h"
#include "allocmem.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next element */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
unsigned long uiLexiconCount_g;
...
static void erase_lex_item(void * pList)
{
LexItem * pItem;
LexItem * pNextItem;
for ( pItem = (LexItem *)pList ; pItem ; pItem = pNextItem )
{
pNextItem = pItem->pLink;
if (pItem->pszForm != NULL)
freeMemory(pItem->pszForm);
if (pItem->pszGloss != NULL)
freeMemory(pItem->pszGloss);
freeMemory(pItem);
}
}
void free_lexicon()
{
if (pLexicon_g != NULL)
{
eraseTrie(pLexicon_g, erase_lex_item);
pLexicon_g = NULL;
}
uiLexiconCount_g = 0L;
}
5.24.5 Source File `trie.c'
5.25 exitSafely
5.25.1 Syntax
#include "opaclib.h"
int exitSafely(int iCode_in);
exitSafely is used in place of exit. When a program is compiled for
Microsoft Windows, it should define its own exitSafely that does not
call exit, because calling exit can cause problems in a Windows
program.
exitSafely has one argument:
iCode_in
exit status code for the program.
5.25.3 Return Value
none, although exitSafely must still be declared as returning int
5.25.4 Example
#include <stdlib.h>
#include <string.h> /* strdup (POSIX) */
#include "opaclib.h"
...
char * pszCopy;
...
pszCopy = strdup("This is a test!");
if (pszCopy == NULL)
{
...
exitSafely(2);
}
5.25.5 Source File `safeexit.c'
5.26 fcloseWithErrorCheck
5.26.1 Syntax
#include "opaclib.h"
void fcloseWithErrorCheck(FILE * pOutputFP_in,
const char * pszFilename_in);
fcloseWithErrorCheck checks the output file for write errors and then
closes it. If an error is detected, it is reported using
reportError.
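A minimal sketch of the check-then-close sequence in standard C. Unlike the library function, this hypothetical version returns 0 or -1 so that it can be tested, and it prints to stderr instead of calling reportError.

```c
#include <stdio.h>

/* Sketch: flush, check the error indicator, close, and report any failure.
 * Returns 0 on success, -1 if an error was detected. */
int fcloseWithErrorCheckSketch(FILE * pOutputFP_in, const char * pszFilename_in)
{
    int bFailed;

    if (pOutputFP_in == NULL)
        return 0;
    /* flush first, so errors in still-buffered output are detected too */
    bFailed = (fflush(pOutputFP_in) != 0) || ferror(pOutputFP_in);
    if (fclose(pOutputFP_in) != 0)
        bFailed = 1;
    if (bFailed)
        fprintf(stderr, "Error writing %s\n",
                pszFilename_in != NULL ? pszFilename_in : "output file");
    return bFailed ? -1 : 0;
}
```

Checking only the return value of fclose would also catch a failed final flush, but testing ferror first reports errors from earlier writes that the program did not check at the time.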
The arguments to fcloseWithErrorCheck are as follows:
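pOutputFP_in
FILE pointer to the output file to check and close.
pszFilename_in
name of the output file, for use in any error message.
5.26.3 Return Value
none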
5.26.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * pOutput;
char * pszFilename;
...
pOutput = fopen(pszFilename, "w");
if (pOutput != NULL)
{
...
fcloseWithErrorCheck(pOutput, pszFilename);
pOutput = NULL;
}
5.26.5 Source File `errcheck.c'
5.27 findDataInTrie
5.27.1 Syntax
#include "trie.h" /* or opaclib.h */
void * findDataInTrie(const Trie * pTrieHead_in,
const char * pszKey_in);
findDataInTrie searches the trie for information stored under the given
key. The information returned is guaranteed to belong only to that key
if the key is shorter than the maximum depth of the trie; otherwise it
may also include information stored under other keys that begin with
the same characters, so you may need to scan the returned list (or
array) to get exactly what you want.
The arguments to findDataInTrie are as follows:
pTrieHead_in
pointer to the head of the trie.
pszKey_in
pointer to the NUL-terminated key string.
5.27.3 Return Value
a pointer to the generic information found in the trie, or NULL if
the search fails
5.27.4 Example
#include <string.h>
#include "trie.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next element */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
...
LexItem * find_entries(unsigned char * pszWord_in)
{
LexItem * pLex;
for ( pLex = findDataInTrie(pLexicon_g, (const char *)pszWord_in) ;
pLex ;
pLex = pLex->pLink )
{
if (strcmp((const char *)pLex->pszForm, (const char *)pszWord_in) == 0)
{
/*
* since add_lex_item() links the homographs together,
* this points to a list containing only the homographs
*/
return pLex;
}
}
return NULL;
}
5.27.5 Source File `trie.c'
5.28 findStringClass
5.28.1 Syntax
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
StringClass * findStringClass(const char * pszName_in,
const StringClass * pClasses_in);
findStringClass searches a list of string classes for a specific
string class by name.
The arguments to findStringClass are as follows:
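pszName_in
pointer to the NUL-terminated name of the string class.
pClasses_in
pointer to the list of string classes to search.
5.28.3 Return Value
a pointer to the string class found, or NULL if not found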
5.28.4 Example
#include "strclass.h"
#include "rpterror.h"
...
static StringClass * pClasses_m = NULL;
...
StringClass * pClass;
char * pszClassName;
...
pClass = findStringClass( pszClassName, pClasses_m);
if (pClass == NULL)
reportError(WARNING_MSG, "Undefined class %s\n", pszClassName);
...
5.28.5 Source File `strcla.c'
5.29 fitAllocStringExactly
5.29.1 Syntax
#include "allocmem.h" /* or opaclib.h */
char * fitAllocStringExactly(char * pszString_in);
fitAllocStringExactly shrinks the allocated buffer to exactly
fit the string. The program is aborted with an error message if it
somehow runs out of memory.
(See section 5.8 allocMemory for details about this error message.)
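A minimal sketch using realloc from the C standard library, assuming the string was allocated with malloc (the library function works on memory obtained from allocMemory). fitAllocStringExactlySketch is a hypothetical name, and returning NULL for a NULL argument is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: shrink a malloc'd string's buffer to strlen + 1 bytes.
 * Like the library function, it aborts on allocation failure instead of
 * returning NULL. */
char * fitAllocStringExactlySketch(char * pszString_in)
{
    char * pszFitted;

    if (pszString_in == NULL)
        return NULL;
    pszFitted = realloc(pszString_in, strlen(pszString_in) + 1);
    if (pszFitted == NULL)
    {
        fputs("out of memory\n", stderr);
        exit(2);
    }
    return pszFitted;   /* may or may not be the same address as the input */
}
```

Since realloc may move the block, the caller must always use the returned pointer, as read_line in the example does.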
fitAllocStringExactly has one argument:
pszString_in
pointer to a dynamically allocated NUL-terminated character string.
5.29.3 Return Value
a pointer to the (possibly) reallocated block
5.29.4 Example
#include <stdio.h>
#include <string.h> /* strchr, strlen */
#include "allocmem.h"
...
char * read_line(FILE * pInputFP_in)
{
char * pszBuffer;
size_t uiBufferSize = 500;
size_t uiLineLength;
if ((pInputFP_in == NULL) || feof(pInputFP_in))
return NULL;
pszBuffer = allocMemory(uiBufferSize);
if (fgets(pszBuffer, uiBufferSize, pInputFP_in) == NULL)
{
freeMemory(pszBuffer);
return NULL;
}
while (strchr(pszBuffer, '\n') == NULL)
{
uiBufferSize += 500;
pszBuffer = reallocMemory(pszBuffer, uiBufferSize);
uiLineLength = strlen(pszBuffer);
if (fgets(pszBuffer + uiLineLength,
uiBufferSize - uiLineLength, pInputFP_in) == NULL)
break;
}
return fitAllocStringExactly( pszBuffer );
}
5.29.5 Source File `allocmem.c'
5.30 fixSynthesizedWord
5.30.1 Syntax
#include "opaclib.h"
void fixSynthesizedWord(WordTemplate * pTemplate_io,
const TextControl * pTextCtl_in);
fixSynthesizedWord applies the output orthography changes and
recapitalization to the list of synthesized wordforms. The list is
updated to reflect these changes, and to minimize any ensuing
ambiguity.
The arguments to fixSynthesizedWord are as follows:
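pTemplate_io
pointer to a WordTemplate data structure containing the list of
synthesized wordforms (pNewWords).
pTextCtl_in
pointer to a TextControl data structure containing the output
orthography changes and capitalization information.
5.30.3 Return Value
none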
5.30.4 Example
#include "template.h"
...
TextControl sTextControl_g;
...
FILE * pInputFP;
FILE * pOutputFP;
WordTemplate * pWord;
...
for (;;)
{
pWord = readTemplateFromAnalysis(pInputFP, &sTextControl_g);
if (pWord == NULL)
break;
pWord->pNewWords = synthesize_word(pWord->pAnalyses,
&sTextControl_g);
fixSynthesizedWord(pWord, &sTextControl_g);
writeTextFromTemplate( pOutputFP, pWord, &sTextControl_g);
freeWordTemplate( pWord );
}
5.30.5 Source File `textout.c'
5.31 fopenAlways
5.31.1 Syntax
#include "opaclib.h"
FILE * fopenAlways(char * pszFilename_io,
const char * pszMode_in);
fopenAlways opens a file, prompting the user if necessary and
retrying until successful. If pszFilename_io is not NULL, it is
updated to contain the name of the file actually opened. fopenAlways
uses fopen to open the
file, and repeatedly prompts the user for a filename if fopen
fails.
The buffer pointed to by pszFilename_io must be (at least)
FILENAME_MAX bytes long. If FILENAME_MAX is not defined
by `stdio.h', then it is assumed to be 128.
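The arguments to fopenAlways are as follows:
pszFilename_io
pointer to a buffer containing the name of the file to open, or
NULL.
pszMode_in
fopen mode string (usually "r" or
"w").
5.31.3 Return Value
a valid FILE pointer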
5.31.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * pInputFP;
char szFilename[FILENAME_MAX];
...
pInputFP = fopenAlways(szFilename, "r");
...
fclose(pInputFP);
pInputFP = NULL;
5.31.5 Source File `ufopen.c'
5.32 freeChangeList
5.32.1 Syntax
#include "change.h" /* or textctl.h or template.h or opaclib.h */
void freeChangeList(Change * pList_io);
freeChangeList frees the memory allocated for a list of
consistent change structures.
freeChangeList has one argument:
pList_io
pointer to a list of Change data structures.
5.32.3 Return Value
none
5.32.4 Example
#include "change.h"
...
Change * pChangeList_g;
...
void add_change(char * pszChange_in)
{
Change * pTail;
if (pChangeList_g == NULL)
pChangeList_g = parseChangeString( pszChange_in );
else
{
for (pTail = pChangeList_g ; pTail->pNext ; pTail = pTail->pNext)
;
pTail->pNext = parseChangeString( pszChange_in );
}
}
...
freeChangeList( pChangeList_g );
pChangeList_g = NULL;
5.32.5 Source File `change.c'
5.33 freeCodeTable
5.33.1 Syntax
#include "record.h"
void freeCodeTable(CodeTable * pCodeTable_io);
freeCodeTable frees the memory allocated for a CodeTable
data structure.
freeCodeTable has one argument:
pCodeTable_io
pointer to a CodeTable data structure that contains information
that is no longer needed.
5.33.3 Return Value
none
5.33.4 Example
#include "record.h"
#include "ample.h"
AmpleData sAmpleData_g;
char szCodesFilename_g[100];
char szDictFilename_g[100];
...
loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE);
...
loadAmpleDictionary(szDictFilename_g, PFX, &sAmpleData_g);
freeCodeTable( sAmpleData_g.pPrefixTable );
sAmpleData_g.pPrefixTable = NULL;
5.33.5 Source File `free_ct.c'
5.34 freeMemory
5.34.1 Syntax
#include "allocmem.h" /* or opaclib.h */
void freeMemory(void * pBlock_io);
freeMemory provides a "safe" interface to free. It
ignores NULL as an argument. (But passing NULL is still
poor practice!) This is the only protection added to free:
passing random memory addresses to freeMemory, or passing the
same address twice, will result in memory corruption and program
crashes!
freeMemory has one argument:
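pBlock_io
pointer to a block of memory previously obtained from allocMemory,
reallocMemory, or a function such as duplicateString that calls them.
5.34.3 Return Value
none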
5.34.4 Example
#include <stdio.h>
#include "allocmem.h"
...
char * read_line(FILE * pInputFP_in)
{
char * pszBuffer;
size_t uiBufferSize = 500;
size_t uiLineLength;
if ((pInputFP_in == NULL) || feof(pInputFP_in))
return NULL;
pszBuffer = allocMemory(uiBufferSize);
if (fgets(pszBuffer, uiBufferSize, pInputFP_in) == NULL)
{
freeMemory(pszBuffer);
return NULL;
}
return pszBuffer;
}
5.34.5 Source File `allocmem.c'
5.35 freeStringClasses
5.35.1 Syntax
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
void freeStringClasses(StringClass * pClasses_io);
freeStringClasses frees the memory allocated for the list of
string classes.
freeStringClasses has one argument:
pClasses_io
pointer to a list of StringClass data structures.
5.35.3 Return Value
none
5.35.4 Example
#include "change.h" /* includes strclass.h */
...
static Change * pChanges_m;
static StringClass * pClasses_m;
...
void free_change_info()
{
freeChangeList( pChanges_m );
freeStringClasses( pClasses_m );
pChanges_m = NULL;
pClasses_m = NULL;
}
5.35.5 Source File `strcla.c'
5.36 freeStringList
5.36.1 Syntax
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
void freeStringList(StringList * pList_io);
freeStringList deletes a list of strings, freeing all the memory
used by the list of strings.
freeStringList has one argument:
pList_io
pointer to a list of strings.
5.36.3 Return Value
none
5.36.4 Example
#include "strlist.h"
...
StringList * pNames_g;
...
freeStringList(pNames_g);
pNames_g = NULL;
...
5.36.5 Source File `free_sl.c'
5.37 freeWordAnalysisList
5.37.1 Syntax
#include "template.h"
void freeWordAnalysisList(WordAnalysis * pAnalyses_io);
freeWordAnalysisList frees the memory allocated for a list of
WordAnalysis data structures.
freeWordAnalysisList has one argument:
pAnalyses_io
pointer to a list of WordAnalysis data structures.
5.37.3 Return Value
none
5.37.4 Example
#include "template.h"
...
WordTemplate * pWord;
...
if (pWord->pAnalyses != NULL)
{
freeWordAnalysisList(pWord->pAnalyses);
pWord->pAnalyses = NULL; /* avoid a dangling pointer and a later double free */
}
...
5.37.5 Source File `wordanal.c'
5.38 freeWordTemplate
5.38.1 Syntax
#include "template.h" /* or opaclib.h */
void freeWordTemplate(WordTemplate * pWord_io);
freeWordTemplate frees everything in a WordTemplate data
structure, including the structure itself.
freeWordTemplate has one argument:
pWord_io
pointer to the WordTemplate data structure to free.
5.38.3 Return Value
none
5.38.4 Example
#include "template.h"
...
TextControl sTextCtl_g;
...
WordAnalysis * merge_analyses(
WordAnalysis * pList_in,
WordAnalysis * pAnal_in)
{
...
}
...
void process(
FILE * pInputFP_in,
FILE * pOutputFP_in)
{
WordTemplate * pWord;
WordAnalysis * pAnal;
unsigned uiAmbiguityCount;
unsigned long uiWordCount;
int i;
for ( uiWordCount = 0L ;; )
{
pWord = readTemplateFromText(pInputFP_in, &sTextCtl_g);
if (pWord == NULL)
break;
uiAmbiguityCount = 0;
if (pWord->paWord != NULL)
{
for ( i = 0 ; pWord->paWord[i] ; ++i )
{
pAnal = analyze(pWord->paWord[i]);
pWord->pAnalyses = merge_analyses(pWord->pAnalyses,
pAnal);
}
for (pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext)
++uiAmbiguityCount;
}
uiWordCount = showAmbiguousProgress(uiAmbiguityCount,
uiWordCount);
writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
freeWordTemplate(pWord);
}
}
5.38.5 Source File `free_wt.c'
5.39 getAndClearAllocMemorySum
5.39.1 Syntax
#include "allocmem.h" /* or opaclib.h */
unsigned long getAndClearAllocMemorySum(void);
getAndClearAllocMemorySum returns the total number of bytes requested
from allocMemory since the last call to getAndClearAllocMemorySum, and
resets that total to zero. Memory released by freeMemory is not
subtracted, so the result measures how much memory was requested, not
how much is currently in use.
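The counter described above can be sketched with a static running total that the allocator adds to and the query function reads and clears. countingAlloc, countingFree, and getAndClearAllocSumSketch are hypothetical names; the sketch deliberately leaves the total unchanged when memory is freed, which is exactly the limitation noted in the description.

```c
#include <stdlib.h>

/* running total of bytes requested since the last reset */
static unsigned long uiAllocSum_m = 0;

/* hypothetical allocator that records each request before calling malloc */
void * countingAlloc(size_t uiSize_in)
{
    uiAllocSum_m += (unsigned long)uiSize_in;
    return malloc(uiSize_in);
}

/* hypothetical counterpart: frees the block but does not touch the total */
void countingFree(void * pBlock_io)
{
    free(pBlock_io);
}

/* report the total requested since the previous call, then reset it */
unsigned long getAndClearAllocSumSketch(void)
{
    unsigned long uiSum = uiAllocSum_m;
    uiAllocSum_m = 0;
    return uiSum;
}
```

Allocating 500 bytes, freeing them, and then allocating 15 more reports 515, even though only 15 bytes are still in use.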
getAndClearAllocMemorySum does not have any arguments.
5.39.3 Return Value
the number of bytes of memory requested by allocMemory calls
since the last call to getAndClearAllocMemorySum
5.39.4 Example
#include <stdio.h>
#include "allocmem.h"
...
char * p;
...
getAndClearAllocMemorySum(); /* reset the counter */
...
p = allocMemory(500);
...
p = duplicateString("this is a test");
...
printf("%lu bytes allocated recently\n", getAndClearAllocMemorySum());
5.39.5 Source File `allocmem.c'
5.40 getChangeQuote
5.40.1 Syntax
#include "change.h" /* or textctl.h or template.h or opaclib.h */
int getChangeQuote(const char * pszMatch_in,
const char * pszReplace_in);
getChangeQuote finds a suitable "quote" character that is not
used in either input string.
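A minimal sketch of the search: try a fixed list of punctuation characters and return the first one that occurs in neither string. The candidate list, the treatment of NULL as an empty string, and the fallback to a double quote when every candidate is used are assumptions; the library's own choices are not documented here.

```c
#include <string.h>

/* Sketch: return the first candidate character found in neither string. */
int getChangeQuoteSketch(const char * pszMatch_in, const char * pszReplace_in)
{
    static const char szCandidates[] = "\"'/|!#%&*+:;=?@^~";
    const char * pc;

    if (pszMatch_in == NULL)
        pszMatch_in = "";
    if (pszReplace_in == NULL)
        pszReplace_in = "";
    for ( pc = szCandidates ; *pc != '\0' ; ++pc )
    {
        if (strchr(pszMatch_in, *pc) == NULL &&
            strchr(pszReplace_in, *pc) == NULL)
            return *pc;
    }
    return '"';   /* every candidate is in use: fall back to a double quote */
}
```

For example, if the match string contains a double quote and the replacement contains a single quote, the sketch returns '/'.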
The arguments to getChangeQuote are as follows:
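pszMatch_in
pointer to the NUL-terminated match string.
pszReplace_in
pointer to the NUL-terminated replacement string.
5.40.3 Return Value
a character suitable for quoting the match and replace strings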
5.40.4 Example
#include <stdio.h> /* sprintf */
#include <string.h>
#include "change.h"
#include "allocmem.h"
char * composeChangeString(pszMatch_in, pszReplace_in, pszEnvir_in)
const char * pszMatch_in;
const char * pszReplace_in;
const char * pszEnvir_in;
{
char * pszChange;
size_t uiLength;
char cQuote;
if ((pszMatch_in == NULL) && (pszReplace_in == NULL))
return NULL;
if (pszMatch_in == NULL)
pszMatch_in = "";
if (pszReplace_in == NULL)
pszReplace_in = "";
uiLength = strlen( pszMatch_in ) + strlen( pszReplace_in ) + 6;
if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
uiLength += strlen( pszEnvir_in ) + 1;
pszChange = allocMemory(uiLength);
cQuote = getChangeQuote(pszMatch_in, pszReplace_in);
sprintf(pszChange, "%c%s%c %c%s%c",
cQuote, pszMatch_in, cQuote, cQuote, pszReplace_in, cQuote);
if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
strcat(strcat(pszChange, " "), pszEnvir_in);
return pszChange;
}
5.40.5 Source File `change.c'
5.41 getStringListSize
5.41.1 Syntax
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
unsigned getStringListSize(const StringList * pList_in);
getStringListSize counts the number of strings stored in the
list. It does not check for duplicate strings or for NULL
string pointers, just for the total number of data structures linked
together.
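The count is a simple walk along the pNext links, as in this minimal sketch. It assumes a StringList node with pszString and pNext fields (the real type is declared in strlist.h); getStringListSizeSketch is a hypothetical name. Duplicate and NULL strings are counted like any other node.

```c
#include <stddef.h>

/* Assumed node layout (only the fields used in this chapter). */
typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
} StringList;

/* Sketch: count the linked nodes, whatever strings they hold. */
unsigned getStringListSizeSketch(const StringList * pList_in)
{
    unsigned uiCount = 0;

    for ( ; pList_in != NULL ; pList_in = pList_in->pNext )
        ++uiCount;
    return uiCount;
}
```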
getStringListSize has one argument:
pList_in
pointer to a list of strings.
5.41.3 Return Value
the number of strings in the list
5.41.4 Example
#include <stdio.h>
#include "strlist.h"
...
void writeAmbigWords(pList_in, cAmbig_in, pOutputFP_in)
const StringList * pList_in;
int cAmbig_in;
FILE * pOutputFP_in;
{
char szAmbig[2];
if (pList_in == NULL)
fprintf(pOutputFP_in, "%c0%c%c", cAmbig_in, cAmbig_in, cAmbig_in);
else if (pList_in->pNext)
{
fprintf(pOutputFP_in, "%c%u%c",
cAmbig_in, getStringListSize(pList_in), cAmbig_in );
szAmbig[0] = cAmbig_in;
szAmbig[1] = '\0';
writeStringList( pList_in, szAmbig, pOutputFP_in );
fprintf(pOutputFP_in, "%c", cAmbig_in);
}
else
fputs(pList_in->pszString, pOutputFP_in);
}
`size_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
int identicalStringLists(const StringList * pFirstList_in,
const StringList * pSecondList_in);
identicalStringLists checks whether or not two lists of strings
are identical, that is, whether they have the same strings in the same
order.
The arguments to identicalStringLists are as follows:
pFirstList_in
pSecondList_in
nonzero (TRUE) if the lists are identical, otherwise zero (FALSE)
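A sketch of an element-by-element comparison, again using a hypothetical SketchStringList type. Treating two NULL string pointers as equal is an assumption; the library's handling of NULL entries is not documented here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for StringList (members assumed). */
typedef struct SketchStringList {
    char * pszString;
    struct SketchStringList * pNext;
} SketchStringList;

int sameStringLists(const SketchStringList * pFirst_in,
                    const SketchStringList * pSecond_in)
{
    while ((pFirst_in != NULL) && (pSecond_in != NULL))
    {
        /* identical pointers (including two NULLs) count as equal */
        if (pFirst_in->pszString != pSecond_in->pszString)
        {
            if ((pFirst_in->pszString == NULL) ||
                (pSecond_in->pszString == NULL) ||
                (strcmp(pFirst_in->pszString, pSecond_in->pszString) != 0))
                return 0;
        }
        pFirst_in = pFirst_in->pNext;
        pSecond_in = pSecond_in->pNext;
    }
    /* identical only if both lists ended together */
    return (pFirst_in == NULL) && (pSecond_in == NULL);
}
```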
5.42.4 Example
#include "strlist.h"
...
StringList * pList1;
StringList * pList2;
...
if (identicalStringLists(pList1, pList2))
{
...
}
`equal_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
int isMemberOfStringList(const StringList * pList_in,
const char * pszString_in);
isMemberOfStringList checks whether a string is stored in a list
of strings.
The arguments to isMemberOfStringList are as follows:
pList_in
pszString_in
nonzero (TRUE) if the string is found in the list, otherwise zero (FALSE)
5.43.4 Example
#include "strlist.h"
...
static StringList * pFiles_m = NULL;
...
void processFileOnce(const char * pszFile_in)
{
if ((pszFile_in != NULL) && !isMemberOfStringList(pFiles_m, pszFile_in))
{
pFiles_m = mergeIntoStringList(pFiles_m, pszFile_in);
...
}
}
`membr_sl.c'
#include "opaclib.h"
char * isolateWord(char * pszLine_io);
isolateWord isolates the "word" pointed to by its argument by
replacing the first whitespace character following the word with a NUL
character. It then steps the pointer to the beginning of the next
"word" in the input string.
isolateWord skips over any leading whitespace in the input string
before trying to isolate a "word".
isolateWord has one argument:
pszLine_io
NUL-terminated character string.
a pointer to the first character of the next following word, which may be
the NUL character at the end of the input string
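The behaviour described above can be sketched as follows. isolateWordSketch is a hypothetical name, and "whitespace" is taken to be whatever isspace() accepts, which may differ from the library's definition.

```c
#include <ctype.h>

/* Hypothetical sketch: skip leading whitespace, NUL-terminate the word,
 * and return a pointer to the start of the next word (or to the final
 * NUL if there is none). */
char * isolateWordSketch(char * pszLine_io)
{
    char * p = pszLine_io;

    while (*p && isspace((unsigned char)*p))    /* leading whitespace */
        ++p;
    while (*p && !isspace((unsigned char)*p))   /* the word itself */
        ++p;
    if (*p)
        *p++ = '\0';                             /* terminate the word */
    while (*p && isspace((unsigned char)*p))    /* find the next word */
        ++p;
    return p;
}
```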
5.44.4 Example
#include <string.h>
#include "opaclib.h" /* includes strlist.h */
...
StringList * pTraceMorphs_m = NULL;
...
void addTraceMorphs(char * pszLine_in)
{
char * pszMorph;
char * pszEnd;
if (pszLine_in == NULL)
return;
for ( pszMorph = pszLine_in + strspn(pszLine_in, " \r\n\t\f\v");
*pszMorph ;
pszMorph = pszEnd )
{
pszEnd = isolateWord( pszMorph ); /* isolate the morpheme */
if (strcmp(pszMorph, "0") == 0) /* If 0, put in NUL */
*pszMorph = NUL;
pTraceMorphs_m = mergeIntoStringList(pTraceMorphs_m, pszMorph);
}
}
`isolatew.c'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
int isStringClassMember(const char * pszString_in,
const StringClass * pClass_in);
isStringClassMember searches a string class for a specific
string.
The arguments to isStringClassMember are as follows:
pszString_in
pClass_in
nonzero (TRUE) if the string is found in the class, otherwise zero (FALSE)
5.45.4 Example
#include "strclass.h"
...
static StringClass * pClasses_m;
...
int isClassMember(const char * pszString_in,
const char * pszClassName_in)
{
StringClass * pClass;
pClass = findStringClass(pszClassName_in, pClasses_m);
if (pClass == NULL)
return 0;
return isStringClassMember(pszString_in, pClass);
}
`strcla.c'
#include "textctl.h" /* or template.h or opaclib.h */
int loadIntxCtlFile(const char * pszFilename_in,
int cComment_in,
TextControl * pTextCtl_out,
StringClass ** ppStringClasses_io);
loadIntxCtlFile loads a text input control file into memory.
This is a standard format file containing one data record with the
following fields (not necessarily in this order):
\ambig
The \ambig field is optional, and may occur only once.
\barchar
|) in
the S.I.L. Manuscripter program in the early 1980's. The
\barchar field is optional, and may occur only once. An empty
field disables this feature.
\barcodes
\barchar character
to form formatting commands. Whitespace (spaces, tabs, or newlines) in
this field is optional. The \barcodes field is optional, and
may occur any number of times. Its effect is cumulative.
\ch
The \ch field is optional, and may
occur any number of times. An ordered list of consistent changes is
built by the function. Each change is applied to each input word as
many times as necessary before the next change is applied.
\dsc
The \dsc field is optional, and may occur only once.
\excl
\format field in the text input
control file. The \excl field lists one or more field codes
(formatting commands) complete with the leading \format
character. Field codes are separated by whitespace (spaces, tabs, or
newlines). The \excl field is optional, and may occur any
number of times. Its effect is cumulative. If any \excl fields
occur, then no \incl fields are allowed, and all fields in the
input file that are not explicitly listed in a \excl field will
be processed.
\format
The \format field is optional, and may occur only once.
\incl
\format field in the text input control
file. The \incl field lists one or more field codes (formatting
commands) complete with the leading \format character. Field
codes are separated by whitespace (spaces, tabs, or newlines). The
\incl field is optional, and may occur any number of times. Its
effect is cumulative. If any \incl fields occur, then no
\excl fields are allowed, and only those fields in the input
file that are explicitly listed in a \incl field will be
processed.
\luwfc
The \luwfc field is optional, and may occur
any number of times. Its effect is cumulative. For lowercase and
uppercase forms that are represented by two or more adjacent characters
(bytes), use the \luwfcs field described below.
\luwfcs
The \luwfcs field is optional, and may occur any
number of times. Its effect is cumulative. Note that \luwfcs
fields may be used to replace \luwfc fields, or the two types of
fields may be mixed together in the control file.
The implementation underlying the \luwfcs field does not require
that the lowercase and uppercase forms occupy the same number of
characters (bytes).
\maxdecap
The \maxdecap field is
optional, and may occur only once.
\nocap
\luwfc and \luwfcs fields
should not be used. The \nocap field is optional, and may occur
only once.
\noincap
The \noincap field is optional, and may
occur only once.
\scl
The \scl field is
optional, and any number of string classes may be defined. A string
class definition must occur before any \ch field that uses that
string class.
\wfc
The \wfc field is
optional, and may occur any number of times. Its effect is cumulative.
For caseless forms that are represented by two or more adjacent characters
(bytes), use the \wfcs field described below.
\wfcs
The \wfcs field is optional, and may occur
any number of times. Its effect is cumulative. Note that \wfcs
fields may be used to replace \wfc fields, or the two types of
fields may be mixed together in the control file.
For more details about this file, see section `Text Input Control File' in the AMPLE Reference Manual.
The arguments to loadIntxCtlFile are as follows:
pszFilename_in
cComment_in
pTextCtl_out
ppStringClasses_io
\ch fields or added to by \scl fields.
zero if successful, nonzero if an error occurs
5.46.4 Example
#include <stdio.h>
#include <string.h>
#include "textctl.h" /* includes strclass.h */
#include "allocmem.h"
#include "rpterror.h"
...
char szIntxFilename_g[200];
TextControl sTextControl_g;
StringClass * pStringClasses_g = NULL;
static TextControl sDefaultTextControl_m = {
NULL, /* filename */
NULL, /* ordered array of lowercase letters */
NULL, /* ordered array of matching uppercase letters */
NULL, /* array of caseless letters */
NULL, /* list of input orthography changes */
NULL, /* list of output (orthography) changes */
NULL, /* list of format markers (fields) to include */
NULL, /* list of format markers (fields) to exclude */
'\\', /* initial character of format markers (field codes) */
'%', /* character for marking ambiguities and failures */
'-', /* character for marking decomposition */
'|', /* initial character of secondary format markers */
NULL, /* (Manuscripter) bar codes */
TRUE, /* flag whether to capitalize individual letters */
TRUE, /* flag whether to decapitalize/recapitalize */
100 /* maximum number of decapitalization alternatives */
};
...
memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
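if (fgets(szIntxFilename_g, sizeof(szIntxFilename_g), stdin) == NULL)
szIntxFilename_g[0] = '\0';
szIntxFilename_g[strcspn(szIntxFilename_g, "\r\n")] = '\0'; /* drop the newline */
if (szIntxFilename_g[0])
{
if (loadIntxCtlFile(szIntxFilename_g, ';',
&sTextControl_g, &pStringClasses_g) != 0)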
{
reportError(ERROR_MSG, "Error reading text control file %s\n",
szIntxFilename_g);
}
}
if ( (sTextControl_g.cBarMark == NUL) &&
(sTextControl_g.pszBarCodes != NULL) )
{
freeMemory(sTextControl_g.pszBarCodes);
sTextControl_g.pszBarCodes = NULL;
}
if ( (sTextControl_g.cBarMark != NUL) &&
(sTextControl_g.pszBarCodes == NULL) )
{
sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
"bdefhijmrsuvyz");
}
`loadintx.c'
#include "textctl.h" /* or template.h or opaclib.h */
int loadOutxCtlFile(const char * pszFilename_in,
int cComment_in,
TextControl * pTextCtl_out,
StringClass ** ppStringClasses_io);
loadOutxCtlFile loads a text output control file into memory.
This is a standard format file containing one data record with the
following fields (not necessarily in this order):
\ambig
The \ambig field is optional, and may occur only
once.
\ch
The \ch field is optional, and may occur any number of times. An
ordered list of consistent changes is built by the function. Each
change is applied to each output word as many times as necessary before
the next change is applied.
\dsc
The \dsc field is optional, and may occur only once.
\format
The \format field is optional, and may occur only once.
\luwfc
The \luwfc field is optional, and may occur
any number of times. Its effect is cumulative. For lowercase and
uppercase forms that are represented by two or more adjacent characters
(bytes), use the \luwfcs field described below.
\luwfcs
The \luwfcs field is optional, and may occur any
number of times. Its effect is cumulative. Note that \luwfcs
fields may be used to replace \luwfc fields, or the two types of
fields may be mixed together in the control file.
The implementation underlying the \luwfcs field does not require
that the lowercase and uppercase forms occupy the same number of
characters (bytes).
\scl
The \scl field is
optional, and any number of string classes may be defined. A string
class definition must occur before any \ch field that uses that
string class.
\wfc
The \wfc field is
optional, and may occur any number of times. Its effect is cumulative.
For caseless forms that are represented by two or more adjacent characters
(bytes), use the \wfcs field described below.
\wfcs
The \wfcs field is optional, and may occur
any number of times. Its effect is cumulative. Note that \wfcs
fields may be used to replace \wfc fields, or the two types of
fields may be mixed together in the control file.
Note that these are only a subset of the fields allowed in a text input control file. For more details about this file, see section `The text output control file' in the KTEXT Reference Manual.
The arguments to loadOutxCtlFile are as follows:
pszFilename_in
cComment_in
pTextCtl_out
ppStringClasses_io
\ch fields or added to by \scl fields.
zero if successful, nonzero if an error occurs
5.47.4 Example
#include <stdio.h>
#include <string.h>
#include "textctl.h" /* includes strclass.h */
#include "rpterror.h"
...
char szOutxFilename_g[200];
TextControl sOutputControl_g;
StringClass * pStringClasses_g = NULL;
...
memset(&sOutputControl_g, 0, sizeof(TextControl));
fprintf(stderr, "Text Output Control File (xxOUTX.CTL) [none]: ");
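if (fgets(szOutxFilename_g, sizeof(szOutxFilename_g), stdin) == NULL)
szOutxFilename_g[0] = '\0';
szOutxFilename_g[strcspn(szOutxFilename_g, "\r\n")] = '\0'; /* drop the newline */
if (szOutxFilename_g[0])
{
if (loadOutxCtlFile(szOutxFilename_g, ';',
&sOutputControl_g, &pStringClasses_g) != 0)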
{
reportError(ERROR_MSG,
"Error reading text output control file %s\n",
szOutxFilename_g);
}
}
`loadoutx.c'
#include "textctl.h" /* or template.h or opaclib.h */
int matchAlphaChar(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
matchAlphaChar checks whether the input string begins with a
multibyte alphabetic (word formation) character. If so, it returns the
number of bytes in the matched multibyte alphabetic character.
This function depends on previous calls to addWordFormationChars,
addWordFormationCharStrings, addLowerUpperWFChars, and
addLowerUpperWFCharStrings to establish the multibyte alphabetic
characters. (These functions are implicitly called by
loadIntxCtlFile and loadOutxCtlFile.)
The arguments to matchAlphaChar are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte alphabetic character at the beginning of the input string, or zero if the string does not begin with a multibyte alphabetic character
5.48.4 Example
See section 5.14 convLowerToUpper.
5.48.5 Source File `myctype.c'
#include "opaclib.h"
int matchBeginning(const char * pszString_in,
const char * pszBegin_in);
matchBeginning compares two strings, using the end of the second
string as the cutoff point for the comparison. It is functionally
equivalent to
(strncmp(pszString_in, pszBegin_in, strlen(pszBegin_in)) == 0)
The arguments to matchBeginning are as follows:
pszString_in
pszBegin_in
nonzero (TRUE) if the two strings are equal up to the end of the second string, otherwise zero (FALSE)
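The strncmp expression above computes strlen(pszBegin_in) before comparing. A hypothetical equivalent that makes a single pass and stops at the first mismatch might look like this (beginsWith is an illustrative name, not a library function):

```c
/* Hypothetical sketch: compare until the end of pszBegin_in. A shorter
 * pszString_in fails at its NUL, so it is never read past its end. */
int beginsWith(const char * pszString_in, const char * pszBegin_in)
{
    while (*pszBegin_in != '\0')
    {
        if (*pszString_in++ != *pszBegin_in++)
            return 0;
    }
    return 1;
}
```

An empty second string matches any first string, just as the strncmp form does with a length of zero.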
5.49.4 Example
#include "opaclib.h"
...
char string[100], match[50];
...
if (matchBeginning(string, match))
{
...
}
`matchbeg.c'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
size_t matchBeginWithStringClass(const char * pszString_in,
const StringClass * pClass_in);
matchBeginWithStringClass searches a string class to find a
class member that matches the beginning of a string. It stops at the
first successful match.
The arguments to matchBeginWithStringClass are as follows:
pszString_in
pClass_in
the length of the first successful match if found (effectively TRUE), otherwise zero (FALSE)
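The internal layout of StringClass is not shown in this section, so the sketch below models a class as a NULL-terminated array of member strings (an assumption). Like the function described above, it returns the length of the first member that matches, which is not necessarily the longest one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: ppszMembers_in stands in for a string class.
 * Empty members are skipped (an assumption), since a zero length would
 * be indistinguishable from "no match". */
size_t matchClassAtBeginning(const char * pszString_in,
                             const char * const * ppszMembers_in)
{
    size_t uiLength;

    for (; *ppszMembers_in != NULL; ++ppszMembers_in)
    {
        uiLength = strlen(*ppszMembers_in);
        if ((uiLength != 0) &&
            (strncmp(pszString_in, *ppszMembers_in, uiLength) == 0))
            return uiLength;    /* stop at the first successful match */
    }
    return 0;
}
```

With members "a" and "ai" in that order, "aiko" matches "a" and the result is 1, because the search stops at the first match.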
5.50.4 Example
#include "strclass.h"
...
static StringClass * pClasses_m;
...
int matchesClassMemberAtBeginning(const char * pszString_in,
const char * pszClassName_in)
{
StringClass * pClass;
pClass = findStringClass(pszClassName_in, pClasses_m);
if (pClass == NULL)
return 0;
return matchBeginWithStringClass(pszString_in, pClass);
}
`strcla.c'
#include "textctl.h"
int matchCaselessChar(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
matchCaselessChar checks whether the input string begins with a
multibyte caseless character. If so, it returns the number of bytes in
the matched multibyte caseless character.
This function depends on previous calls to addWordFormationChars
or addWordFormationCharStrings to establish the multibyte caseless
characters. (addWordFormationChars and
addWordFormationCharStrings are implicitly called by
loadIntxCtlFile and loadOutxCtlFile.)
The arguments to matchCaselessChar are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte caseless character at the beginning of the input string, or zero if the string does not begin with a multibyte caseless character
5.51.4 Example
See section 5.54 matchLowercaseChar.
5.51.5 Source File `myctype.c'
#include "opaclib.h"
int matchEnd(const char * pszString_in,
const char * pszTail_in);
matchEnd compares the second string against the end of the
first string. It is functionally equivalent to
((strlen(pszString_in) < strlen(pszTail_in)) ? 0 :
(strcmp(pszString_in + strlen(pszString_in) - strlen(pszTail_in),
pszTail_in) == 0))
The arguments to matchEnd are as follows:
pszString_in
pszTail_in
nonzero (TRUE) if the second string matches the end of the first string, otherwise zero (FALSE)
5.52.4 Example
#include "opaclib.h"
...
char string[100], match[50];
...
if (matchEnd(string, match))
{
...
}
`matchend.c'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
size_t matchEndWithStringClass(const char * pszString_in,
const StringClass * pClass_in);
matchEndWithStringClass searches a string class to find a class
member that matches the end of a string. It stops at the first
successful match.
The arguments to matchEndWithStringClass are as follows:
pszString_in
pClass_in
the length of the first successful match if found (effectively TRUE), otherwise zero (FALSE)
5.53.4 Example
#include "strclass.h"
...
static StringClass * pClasses_m;
...
int matchesClassMemberAtEnd(const char * pszString_in,
const char * pszClassName_in)
{
StringClass * pClass;
pClass = findStringClass(pszClassName_in, pClasses_m);
if (pClass == NULL)
return 0;
return matchEndWithStringClass(pszString_in, pClass);
}
`strcla.c'
#include "textctl.h" /* or template.h or opaclib.h */
int matchLowercaseChar(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
matchLowercaseChar checks whether the input string begins with a
multibyte lowercase character. If so, it returns the number of bytes in
the matched multibyte lowercase character.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the multibyte lowercase
characters. (addLowerUpperWFChars and
addLowerUpperWFCharStrings are implicitly called by
loadIntxCtlFile and loadOutxCtlFile.)
The arguments to matchLowercaseChar are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte lowercase character at the beginning of the input string, or zero if the string does not begin with a multibyte lowercase character
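A sketch of multibyte matching, assuming the lowercase characters are available as a NULL-terminated array of byte strings. The library's own tables inside TextControl are not shown here. Preferring the longest matching entry, so that a digraph such as "ch" wins over "c", is an assumption about how overlapping entries are resolved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the byte length of the longest entry in
 * ppszChars_in that matches the start of pszString_in, or zero. */
int matchMultibyteChar(const unsigned char * pszString_in,
                       const char * const * ppszChars_in)
{
    size_t uiBest = 0;
    size_t uiLength;

    if (pszString_in == NULL)
        return 0;
    for (; *ppszChars_in != NULL; ++ppszChars_in)
    {
        uiLength = strlen(*ppszChars_in);
        /* strncmp stops at the input's NUL, so short inputs are safe */
        if ((uiLength > uiBest) &&
            (strncmp((const char *)pszString_in, *ppszChars_in,
                     uiLength) == 0))
            uiBest = uiLength;
    }
    return (int)uiBest;
}
```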
5.54.4 Example
#include "textctl.h"
#define CASELESS -1
#define NOCAP 0
#define INITCAP 1
#define ALLCAP 2
#define MIXCAP 3
int getWordCase(const unsigned char * pszWord_in,
const TextControl * pTextCtl_in)
{
unsigned uiUpperCount = 0;
unsigned uiLowerCount = 0;
int bFirstCap = 0;
int iLength;
const unsigned char * p;
for ( p = pszWord_in ; p && *p ; p += iLength )
{
iLength = matchLowercaseChar(p, pTextCtl_in);
if (iLength != 0)
++uiLowerCount;
else
{
iLength = matchUppercaseChar(p, pTextCtl_in);
if (iLength != 0)
{
++uiUpperCount;
if (uiLowerCount == 0)
bFirstCap = 1;
}
else
{
iLength = matchCaselessChar(p, pTextCtl_in);
if (iLength == 0)
iLength = 1;
}
}
}
if ((uiUpperCount == 0) && (uiLowerCount == 0))
return CASELESS;
else if (uiUpperCount == 0)
return NOCAP;
else if (bFirstCap && (uiUpperCount == 1))
return INITCAP;
else if (uiLowerCount == 0)
return ALLCAP;
else
return MIXCAP;
}
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */
int matchUppercaseChar(const unsigned char * pszString_in,
const TextControl * pTextCtl_in);
matchUppercaseChar checks whether the input string begins with a
multibyte uppercase character. If so, it returns the number of bytes in
the matched multibyte uppercase character.
This function depends on previous calls to addLowerUpperWFChars or
addLowerUpperWFCharStrings to establish the multibyte uppercase
characters. (addLowerUpperWFChars and
addLowerUpperWFCharStrings are implicitly called by
loadIntxCtlFile and loadOutxCtlFile.)
The arguments to matchUppercaseChar are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte uppercase character at the beginning of the input string, or zero if the string does not begin with a multibyte uppercase character
5.55.4 Example
See section 5.54 matchLowercaseChar.
5.55.5 Source File `myctype.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * mergeIntoStringList(StringList * pList_io,
const char * pszString_in);
mergeIntoStringList adds a string to the beginning of a list of
strings if it is not already present in the list.
The arguments to mergeIntoStringList are as follows:
pList_io
pszString_in
pointer to the string to add; the copy made by duplicateString is stored in the list, not the original string itself.
a pointer to the possibly modified list of strings
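A minimal sketch of the add-if-absent logic. It uses a hypothetical SketchStringList type and plain malloc in place of the library's duplicateString. On allocation failure it simply returns the list unchanged, which may differ from what the library does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringList (members assumed). */
typedef struct SketchStringList {
    char * pszString;
    struct SketchStringList * pNext;
} SketchStringList;

/* Add a copy of pszString_in to the front unless it is already present. */
SketchStringList * mergeAtFront(SketchStringList * pList_io,
                                const char * pszString_in)
{
    SketchStringList * pNode;
    size_t uiSize;

    if (pszString_in == NULL)
        return pList_io;
    for (pNode = pList_io; pNode != NULL; pNode = pNode->pNext)
    {
        if ((pNode->pszString != NULL) &&
            (strcmp(pNode->pszString, pszString_in) == 0))
            return pList_io;                  /* already present */
    }
    uiSize = strlen(pszString_in) + 1;
    pNode = malloc(sizeof(*pNode));
    if ((pNode == NULL) || ((pNode->pszString = malloc(uiSize)) == NULL))
    {
        free(pNode);
        return pList_io;                      /* out of memory: unchanged */
    }
    memcpy(pNode->pszString, pszString_in, uiSize);   /* store a copy */
    pNode->pNext = pList_io;
    return pNode;
}
```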
5.56.4 Example
#include "strlist.h"
...
StringList * pStrings = NULL;
...
pStrings = mergeIntoStringList(pStrings, "this");
/* pStrings-->"this"-->NULL */
pStrings = mergeIntoStringList(pStrings, "test");
/* pStrings-->"test"-->"this"-->NULL */
pStrings = mergeIntoStringList(pStrings, "is");
/* pStrings-->"is"-->"test"-->"this"-->NULL */
pStrings = mergeIntoStringList(pStrings, "a");
/* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
pStrings = mergeIntoStringList(pStrings, "test");
/* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
`add_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * mergeIntoStringListAtEnd(StringList * pList_io,
const char * pszString_in);
mergeIntoStringListAtEnd adds a string to the end of a list of
strings if it is not already present in the list.
The arguments to mergeIntoStringListAtEnd are as follows:
pList_io
pszString_in
pointer to the string to add; the copy made by duplicateString is stored in the list, not the original string itself.
a pointer to the possibly modified list of strings
5.57.4 Example
#include "strlist.h"
...
StringList * pStrings = NULL;
...
pStrings = mergeIntoStringListAtEnd(pStrings, "this");
/* pStrings-->"this"-->NULL */
pStrings = mergeIntoStringListAtEnd(pStrings, "test");
/* pStrings-->"this"-->"test"-->NULL */
pStrings = mergeIntoStringListAtEnd(pStrings, "is");
/* pStrings-->"this"-->"test"-->"is"-->NULL */
pStrings = mergeIntoStringListAtEnd(pStrings, "a");
/* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */
pStrings = mergeIntoStringListAtEnd(pStrings, "test");
/* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */
`appnd_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * mergeTwoStringLists(StringList * pFirstList_io,
StringList * pSecondList_io);
mergeTwoStringLists merges two lists of strings together to form
a single list. Any strings in the second list that exist in the first
list are freed. Neither of the original lists survives this operation.
The arguments to mergeTwoStringLists are as follows:
pFirstList_io
pSecondList_io
a pointer to the merged list
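The following sketch reproduces the ordering shown in the example below. Strings from the second list that are not in the first list keep their order and are placed in front of the first list, and the duplicates are freed. The SketchStringList type and the newNode helper are hypothetical, and duplicates within the second list itself are not removed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringList (members assumed). */
typedef struct SketchStringList {
    char * pszString;
    struct SketchStringList * pNext;
} SketchStringList;

/* Hypothetical helper: build a node holding a malloc'd copy of a string. */
SketchStringList * newNode(const char * pszString_in,
                           SketchStringList * pNext_in)
{
    SketchStringList * pNode = malloc(sizeof(*pNode));

    if (pNode != NULL)
    {
        pNode->pszString = malloc(strlen(pszString_in) + 1);
        if (pNode->pszString != NULL)
            strcpy(pNode->pszString, pszString_in);
        pNode->pNext = pNext_in;
    }
    return pNode;
}

static int listContains(const SketchStringList * pList_in,
                        const char * pszString_in)
{
    for (; pList_in != NULL; pList_in = pList_in->pNext)
    {
        if ((pList_in->pszString != NULL) && (pszString_in != NULL) &&
            (strcmp(pList_in->pszString, pszString_in) == 0))
            return 1;
    }
    return 0;
}

/* Keep second-list nodes not found in the first list (in order), free
 * the others, and attach the first list after the kept nodes. */
SketchStringList * mergeLists(SketchStringList * pFirst_io,
                              SketchStringList * pSecond_io)
{
    SketchStringList * pHead = NULL;
    SketchStringList ** ppTail = &pHead;
    SketchStringList * pNext;

    for (; pSecond_io != NULL; pSecond_io = pNext)
    {
        pNext = pSecond_io->pNext;
        if (listContains(pFirst_io, pSecond_io->pszString))
        {
            free(pSecond_io->pszString);      /* duplicate: release it */
            free(pSecond_io);
        }
        else
        {
            *ppTail = pSecond_io;             /* keep, preserving order */
            ppTail = &pSecond_io->pNext;
        }
    }
    *ppTail = pFirst_io;
    return pHead;
}
```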
5.58.4 Example
#include "strlist.h"
...
StringList * pStrings = NULL;
StringList * pStrings1 = NULL;
StringList * pStrings2 = NULL;
...
pStrings1 = mergeIntoStringListAtEnd(pStrings1, "this");
pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test");
pStrings1 = mergeIntoStringListAtEnd(pStrings1, "is");
pStrings1 = mergeIntoStringListAtEnd(pStrings1, "a");
pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test");
pStrings2 = mergeIntoStringList(pStrings2, "that");
pStrings2 = mergeIntoStringList(pStrings2, "test");
pStrings2 = mergeIntoStringList(pStrings2, "is");
pStrings2 = mergeIntoStringList(pStrings2, "good");
/* pStrings1-->"this"-->"test"-->"is"-->"a"-->NULL */
/* pStrings2-->"good"-->"is"-->"test"-->"that"-->NULL */
pStrings = mergeTwoStringLists(pStrings1, pStrings2);
/* pStrings-->"good"-->"that"-->"this"-->"test"-->"is"-->"a"-->NULL */
/* pStrings1-->-----------------^ */
/* pStrings2-->??? */
`cat_sl.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */
Change * parseChangeString(const char * pszString_in,
const StringClass * pClassList_in);
parseChangeString parses a string to build a Change
structure.
The arguments to parseChangeString are as follows:
pszString_in
pClassList_in
a pointer to a newly allocated Change structure, or NULL if an
error occurred while parsing the change definition
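The change strings built by composeChangeString (see the example in section 5.40.4) consist of a quoted match string, a quoted replacement string, and an optional environment. The sketch below is a simplified, hypothetical parser for that layout. It ignores string classes and keeps the environment as raw text, so it is not a substitute for parseChangeString; SimpleChange and parseSimpleChange are illustrative names only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified change record (no string classes). */
typedef struct {
    char * pszMatch;
    char * pszReplace;
    char * pszEnvir;     /* text after the replacement, possibly "" */
} SimpleChange;

static char * copyRange(const char * pBegin_in, size_t uiLength_in)
{
    char * psz = malloc(uiLength_in + 1);

    if (psz != NULL)
    {
        memcpy(psz, pBegin_in, uiLength_in);
        psz[uiLength_in] = '\0';
    }
    return psz;
}

/* Read one quoted string: its first character is the quote, and the
 * string ends at the next occurrence of that same character. */
static char * readQuoted(const char ** ppsz_io)
{
    const char * p = *ppsz_io + strspn(*ppsz_io, " \t");
    const char * pEnd;

    if (*p == '\0')
        return NULL;
    pEnd = strchr(p + 1, *p);
    if (pEnd == NULL)
        return NULL;                          /* unterminated string */
    *ppsz_io = pEnd + 1;
    return copyRange(p + 1, (size_t)(pEnd - p - 1));
}

/* Returns 1 on success, 0 if the match or replacement is malformed. */
int parseSimpleChange(const char * pszString_in, SimpleChange * pChange_out)
{
    const char * p = pszString_in;

    pChange_out->pszMatch = NULL;
    pChange_out->pszReplace = NULL;
    pChange_out->pszEnvir = NULL;
    if (p == NULL)
        return 0;
    pChange_out->pszMatch = readQuoted(&p);
    if (pChange_out->pszMatch != NULL)
        pChange_out->pszReplace = readQuoted(&p);
    if (pChange_out->pszReplace == NULL)
    {
        free(pChange_out->pszMatch);          /* incomplete: clean up */
        pChange_out->pszMatch = NULL;
        return 0;
    }
    p += strspn(p, " \t");
    pChange_out->pszEnvir = copyRange(p, strlen(p));
    return 1;
}
```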
5.59.4 Example
#include "change.h" /* includes strclass.h */
...
Change * addChange(const char * pszChange_in,
Change * pChanges_io,
const StringClass * pClasses_in)
{
Change * pChange;
Change * pTail;
pChange = parseChangeString(pszChange_in, pClasses_in);
if (pChange != NULL)
{
if (pChanges_io == NULL)
return pChange;
/*
* keep the list of changes in the original order
*/
for (pTail = pChanges_io ; pTail->pNext ; pTail = pTail->pNext)
;
pTail->pNext = pChange;
}
return pChanges_io;
}
`change.c'
#include "opaclib.h"
void promptUser(const char * pszPrompt_in,
char * pszBuffer_out,
unsigned uiBufferSize_in);
promptUser prompts the user, then reads a line of input from the
keyboard (normally the standard input). If an EOF occurs,
promptUser tries to reopen the keyboard.
The arguments to promptUser are as follows:
pszPrompt_in
pszBuffer_out
uiBufferSize_in
size of the buffer (not counting the terminating NUL).
none
5.60.4 Example
#include <stdio.h>
#include <stdlib.h>
#include "opaclib.h"
...
char szFilename_g[BUFSIZ+1];
FILE * pInputFP_g;
char szBuffer_g[17];
long iRepeatCount_g;
...
promptUser("Data file: ", szFilename_g, BUFSIZ);
pInputFP_g = fopen(szFilename_g, "r");
...
promptUser("Number of iterations to perform: ", szBuffer_g, 16);
iRepeatCount_g = strtol(szBuffer_g, NULL, 10);
`promptus.c'
#include "opaclib.h"
char * readLineFromFile(FILE * pInputFP_in,
unsigned * puiLineNumber_io,
int cComment_in);
readLineFromFile reads an arbitrarily long line of input text,
erasing the trailing newline character. The string returned is
overwritten or freed at the next call to readLineFromFile.
The arguments to readLineFromFile are as follows:
pInputFP_in
FILE pointer.
puiLineNumber_io
NULL.
cComment_in
the address of the buffer containing the NUL-terminated line, or
NULL if already at the end of the file
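A sketch of reading an arbitrarily long line into a static buffer that grows as needed and is reused on the next call, which is why each result is overwritten by the following one. The name readLongLine is hypothetical. Comment handling through cComment_in is omitted, and incrementing the line counter once per line read is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch. Returns NULL at end of file (or if memory runs
 * out); otherwise the line without its trailing newline. */
char * readLongLine(FILE * pInputFP_in, unsigned * puiLineNumber_io)
{
    static char * pszBuffer_s = NULL;
    static size_t uiSize_s = 0;
    size_t uiLength = 0;
    int c = 0;

    for (;;)
    {
        if (uiLength + 1 >= uiSize_s)        /* keep room for the NUL */
        {
            size_t uiNewSize = (uiSize_s != 0) ? 2 * uiSize_s : 128;
            char * pszNew = realloc(pszBuffer_s, uiNewSize);

            if (pszNew == NULL)
                return NULL;
            pszBuffer_s = pszNew;
            uiSize_s = uiNewSize;
        }
        c = getc(pInputFP_in);
        if ((c == EOF) || (c == '\n'))
            break;
        pszBuffer_s[uiLength++] = (char)c;
    }
    if ((c == EOF) && (uiLength == 0))
        return NULL;                         /* already at end of file */
    pszBuffer_s[uiLength] = '\0';            /* newline is dropped */
    if (puiLineNumber_io != NULL)
        ++*puiLineNumber_io;
    return pszBuffer_s;
}
```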
5.61.4 Example
#include <stdio.h>
#include <string.h>
#include "opaclib.h"
void processFile(const char * pszFilename_in)
{
FILE * pInputFP;
unsigned uiLineNumber;
char * pszLine;
if (pszFilename_in == NULL)
return;
pInputFP = fopen(pszFilename_in, "r");
if (pInputFP == NULL)
return;
uiLineNumber = 1;
while ((pszLine = readLineFromFile(pInputFP,
&uiLineNumber, ';')) != NULL)
{
...
}
printf("%u lines read from %s\n", uiLineNumber, pszFilename_in);
}
`readline.c'
#include "template.h" /* or opaclib.h */
WordTemplate ** readSentenceOfTemplates(FILE * pInputFP_in,
const char * pszAnaFile_in,
const char * pszFinalPunct_in,
TextControl * pTextCtl_in,
FILE * pLogFP_in);
readSentenceOfTemplates reads an arbitrarily long sentence
(sequence of words) from an input analysis file, building an array of
WordTemplate data structures. The sentence is terminated by a
sentence-final punctuation character from pszFinalPunct_in.
The arguments to readSentenceOfTemplates are as follows:
pInputFP_in
FILE pointer.
pszAnaFile_in
pszFinalPunct_in
NUL-terminated string of punctuation characters that
mark the end of a sentence.
pTextCtl_in
pLogFP_in
FILE pointer, used to log error messages, or
NULL.
a pointer to a dynamically allocated NULL-terminated array of
pointers to dynamically allocated WordTemplate structures
5.62.4 Example
#include <stdio.h>
#include "template.h"
#include "allocmem.h"
#include "rpterror.h"
...
TextControl sTextControl_g;
static const char szSentenceFinalPunc_m[] = ".!?";
static const char szCannotOpen_m[] =
"Warning: cannot open analysis input file %s\n";
...
unsigned processSentences(char * pszAnaFile_in, FILE * pLogFP_in)
{
FILE * pInputFP;
WordTemplate ** pSentence;
unsigned uiSentenceCount;
unsigned i;
...
pInputFP = fopen(pszAnaFile_in, "r");
if (pInputFP == NULL)
{
reportError(ERROR_MSG, szCannotOpen_m, pszAnaFile_in);
if (pLogFP_in != NULL)
fprintf(pLogFP_in, szCannotOpen_m, pszAnaFile_in);
return 0;
}
for ( uiSentenceCount = 0 ;; ++uiSentenceCount )
{
pSentence = readSentenceOfTemplates(pInputFP,
pszAnaFile_in,
szSentenceFinalPunc_m,
&sTextControl_g,
pLogFP_in);
if (pSentence == NULL)
break;
...
for ( i = 0 ; pSentence[i] ; ++i )
freeWordTemplate( pSentence[i] );
freeMemory( pSentence );
}
fclose(pInputFP);
return uiSentenceCount;
}
`senttemp.c'
#include "opaclib.h"
char ** readStdFormatField(FILE * pInputFP_in,
const char ** ppszFieldCodes_in,
int cComment_in);
readStdFormatField reads an arbitrarily large text field that
starts with a backslash marker at the beginning of a line. Each line
of the input field is stored separately in a NULL-terminated
array of strings. If the field code at the beginning matches one of
those in the input array of field codes, it is replaced by a single
byte containing the 1-based index of the matching field code.
Otherwise, the field code is left intact except that the backslash
character is replaced by the character code 255 ('\377').
This function is an alternative to readStdFormatRecord, which
potentially reads several fields at a time.
The arguments to readStdFormatField are as follows:
pInputFP_in
FILE pointer.
ppszFieldCodes_in
NULL-terminated array of field code strings.
cComment_in
a pointer to a dynamically allocated NULL-terminated array of
pointers to dynamically allocated lines of text
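The field-code transformation applied to the first line of a field can be sketched on its own. encodeFieldCode is a hypothetical helper that edits one line in place; reading the field's lines and building the returned array are not shown. The text following the code (including its leading whitespace) is kept, which is why the example below steps past the first byte before calling strtok.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: a known field code becomes one byte holding its
 * 1-based index; an unknown code keeps its text, but its backslash is
 * replaced by '\377'. */
void encodeFieldCode(char * pszLine_io, const char * const * ppszCodes_in)
{
    size_t uiCodeLen = strcspn(pszLine_io, " \t\r\n\f\v");
    size_t i;

    if (pszLine_io[0] != '\\')
        return;                               /* not a field marker */
    for (i = 0; ppszCodes_in[i] != NULL; ++i)
    {
        if ((strlen(ppszCodes_in[i]) == uiCodeLen) &&
            (strncmp(pszLine_io, ppszCodes_in[i], uiCodeLen) == 0))
        {
            pszLine_io[0] = (char)(i + 1);    /* 1-based index */
            memmove(pszLine_io + 1, pszLine_io + uiCodeLen,
                    strlen(pszLine_io + uiCodeLen) + 1);
            return;
        }
    }
    pszLine_io[0] = '\377';                   /* unrecognized code */
}
```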
5.63.4 Example
#include <stdio.h>
#include <string.h>
#include "opaclib.h"
...
static char szWhitespace_m[7] = " \t\r\n\f\v";
...
int read_control_file(char * pszControlFile_in)
{
int i;
char * pszRuleFile = NULL;
char * pszLexiconFile = NULL;
char * pszGrammarFile = NULL;
StringList * pTraceList = NULL;
char * pszMorph;
FILE * pControlFP;
char ** ppszField;
char * pszLine;
static const char * aszCodes_s[] = {
"\\rules", "\\lexicon", "\\grammar", "\\trace", ..., NULL
};
if (pszControlFile_in == NULL)
return FALSE;
pControlFP = fopen(pszControlFile_in, "r");
if (pControlFP == (FILE *)NULL)
{
reportError(WARNING_MSG, "Cannot open control file %s\n",
pszControlFile_in);
return FALSE;
}
for (;;)
{
ppszField = readStdFormatField(pControlFP, aszCodes_s, NUL);
if (ppszField == NULL)
break;
switch (**ppszField)
{
case 1: /* "\\rules" */
if (pszRuleFile != NULL)
reportError(WARNING_MSG,
"Rule file already specified: %s\n",
pszRuleFile);
else
{
for ( i = 0 ; ppszField[i] ; ++i )
{
pszLine = ppszField[i];
if (i == 0)
++pszLine;
pszRuleFile = strtok(pszLine, szWhitespace_m);
if (pszRuleFile != NULL)
{
/* copy the name: ppszField is freed below */
pszRuleFile = duplicateString(pszRuleFile);
break;
}
}
}
break;
case 2: /* "\\lexicon" */
if (pszLexiconFile != NULL)
reportError(WARNING_MSG,
"Lexicon file already specified: %s\n",
pszLexiconFile);
else
{
for ( i = 0 ; ppszField[i] ; ++i )
{
pszLine = ppszField[i];
if (i == 0)
++pszLine;
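pszLexiconFile = strtok(pszLine, szWhitespace_m);
if (pszLexiconFile != NULL)
{
/* copy the name: ppszField is freed below */
pszLexiconFile = duplicateString(pszLexiconFile);
break;
}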
}
}
break;
case 3: /* "\\grammar" */
if (pszGrammarFile != NULL)
reportError(WARNING_MSG,
"Grammar file already specified: %s\n",
pszGrammarFile);
else
{
for ( i = 0 ; ppszField[i] ; ++i )
{
pszLine = ppszField[i];
if (i == 0)
++pszLine;
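pszGrammarFile = strtok(pszLine, szWhitespace_m);
if (pszGrammarFile != NULL)
{
/* copy the name: ppszField is freed below */
pszGrammarFile = duplicateString(pszGrammarFile);
break;
}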
}
}
break;
case 4: /* "\\trace" */
for ( i = 0 ; ppszField[i] ; ++i )
{
pszLine = ppszField[i];
if (i == 0)
++pszLine;
for ( pszMorph = strtok(pszLine, szWhitespace_m) ;
pszMorph ;
pszMorph = strtok(NULL, szWhitespace_m) )
{
pTraceList = mergeIntoStringList(pTraceList,
pszMorph);
}
}
break;
...
default:
reportError(WARNING_MSG, "Unknown field: \\%s\n",
ppszField[0] + 1);
break;
}
for ( i = 0 ; ppszField[i] ; ++i )
freeMemory(ppszField[i]);
freeMemory(ppszField);
}
fclose(pControlFP);
...
return TRUE;
}
`readfiel.c'
#include "record.h" /* or opaclib.h */
char * readStdFormatRecord(FILE * pInputFP_in,
const CodeTable * pCodeTable_in,
int cComment_in,
unsigned * puiRecordCount_io);
readStdFormatRecord reads the next record from a standard format
file. The record is stored in memory as a series of
NUL-terminated strings stored consecutively in a single buffer,
with the record terminated by two consecutive NUL bytes. The
first character of each string is either a character representing the
field code (if found in the code table), or a backslash indicating that
the field code was not recognized.
This function is an alternative to readStdFormatField, which
always reads only one field at a time.
The arguments to readStdFormatRecord are as follows:
pInputFP_in
FILE pointer.
pCodeTable_in
cComment_in
puiRecordCount_io
NULL.
a pointer to the buffer containing the record, or NULL for
EOF.
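The record layout described above (NUL-terminated strings stored back to back, with an extra NUL after the last one) can be walked with a simple loop. countRecordFields is a hypothetical helper, shown only to illustrate the double-NUL termination that the example below also relies on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: count the fields in a record buffer. Each field
 * starts with its code character; an empty string ends the record. */
unsigned countRecordFields(const char * pRecord_in)
{
    unsigned uiCount = 0;
    const char * pszField;

    for (pszField = pRecord_in; *pszField != '\0';
         pszField += strlen(pszField) + 1)
        ++uiCount;
    return uiCount;
}
```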
5.64.4 Example
#include <stdio.h>
#include <string.h>
#include "record.h"
...
void loadStdFmtFile(pszFilename_in)
char * pszFilename_in;
{
FILE * pInputFP;
char * pRecord;
char * pszField;
char * pszNextField;
unsigned uiRecordCount = 0;
int c;
static CodeTable sCodeTable_s = { "\
\\a\0A\0\
\\d\0D\0\
\\w\0W\0\
\\f\0F\0\
\\c\0C\0\
\\n\0N\0",
6, "\\a"
};
if (pszFilename_in == NULL)
return;
pInputFP = fopen(pszFilename_in, "r");
if (pInputFP == NULL)
return;
while ((pRecord = readStdFormatRecord(pInputFP,
&sCodeTable_s,
';',
&uiRecordCount)) != NULL)
{
pszField = pRecord;
while ((c = *pszField++) != '\0')
{
pszNextField = pszField + strlen(pszField) + 1;
switch (c)
{
case 'A':
...
break;
case 'C':
...
break;
case 'D':
...
break;
case 'F':
...
break;
case 'N':
...
break;
case 'W':
...
break;
default:
...
break;
}
pszField = pszNextField;
}
...
}
cleanupAfterStdFormatRecord();
fclose(pInputFP);
return;
}
`record.c'
#include "template.h" /* or opaclib.h */
WordTemplate * readTemplateFromAnalysis(
FILE * pInputFP_in,
const TextControl * pTextCtl_in);
readTemplateFromAnalysis fills in a WordTemplate data
structure from an AMPLE style analysis file.
The arguments to readTemplateFromAnalysis are as follows:
pInputFP_in
FILE pointer.
pTextCtl_in
a pointer to a dynamically allocated WordTemplate data
structure, or NULL if either EOF or an error occurs
5.65.4 Example
#include "template.h"
#include "rpterror.h"
...
void synthesizeFile(
char * pszInputFile_in,
char * pszOutputFile_in,
TextControl * pTextCtl_in)
{
FILE * pInputFP;
FILE * pOutputFP;
WordTemplate * pWord;
WordAnalysis * pAnal;
...
/*
* open the files
*/
if ((pszInputFile_in == NULL) || (pszOutputFile_in == NULL))
return;
pInputFP = fopen(pszInputFile_in, "r");
if (pInputFP == NULL)
{
reportError(WARNING_MSG, "Cannot open input file %s\n",
pszInputFile_in);
return;
}
pOutputFP = fopen(pszOutputFile_in, "w");
if (pOutputFP == NULL)
{
reportError(WARNING_MSG, "Cannot open output file %s\n",
pszOutputFile_in);
fclose(pInputFP);
return;
}
/*
* process the data
*/
for (;;)
{
pWord = readTemplateFromAnalysis(pInputFP, pTextCtl_in);
if (pWord == NULL)
break;
...
for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
{
...
}
...
writeTextFromTemplate( pOutputFP, pWord, pTextCtl_in);
freeWordTemplate( pWord );
}
...
fclose(pInputFP);
fclose(pOutputFP);
}
`dtbin.c'
#include "template.h" /* or opaclib.h */
WordTemplate * readTemplateFromText(FILE * pInputFP_in,
const TextControl * pTextCtl_in);
readTemplateFromText reads a word from a text file into a
WordTemplate structure.
The arguments to readTemplateFromText are as follows:
pInputFP_in
FILE pointer.
pTextCtl_in
a pointer to a dynamically allocated WordTemplate data
structure, or NULL if either EOF or an error occurs
5.66.4 Example See section 5.38 freeWordTemplate.
5.66.5 Source File `textin.c'
#include "template.h" /* or opaclib.h */
WordTemplate * readTemplateFromTextString(unsigned char ** ppszString_io,
const TextControl * pTextCtl_in);
readTemplateFromTextString reads a word from a text string into a
WordTemplate structure.
The arguments to readTemplateFromTextString are as follows:
ppszString_io
pTextCtl_in
a pointer to a dynamically allocated WordTemplate data
structure, or NULL if either the string is empty (contains only the
terminating NUL) or an error occurs
5.67.4 Example
#include "template.h"
...
TextControl sTextCtl_g;
...
WordAnalysis * merge_analyses(
WordAnalysis * pList_in,
WordAnalysis * pAnal_in)
{
...
}
...
void process(
unsigned char *pszInputText_in,
FILE * pOutputFP_in)
{
unsigned char * pszInputText;
unsigned char * pszWord;
WordTemplate * pWord;
WordAnalysis * pAnal;
unsigned uiAmbiguityCount;
unsigned long uiWordCount;
int i;
pszInputText = (unsigned char *)duplicateString((char *)pszInputText_in);
pszWord = pszInputText;
for ( uiWordCount = 0L ;; )
{
pWord = readTemplateFromTextString(&pszWord, &sTextCtl_g);
if (pWord == NULL)
break;
uiAmbiguityCount = 0;
if (pWord->paWord != NULL)
{
for ( i = 0 ; pWord->paWord[i] ; ++i )
{
pAnal = analyze(pWord->paWord[i]);
pWord->pAnalyses = merge_analyses(pWord->pAnalyses,
pAnal);
}
for (pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext)
++uiAmbiguityCount;
}
writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
freeWordTemplate(pWord);
}
freeMemory(pszInputText);
}
`textin.c'
#include "allocmem.h" /* or opaclib.h */
void * reallocMemory(void * pBuffer_in,
size_t uiSize_in);
reallocMemory adjusts an allocated buffer to a new size.
It provides a "safe" interface to either realloc or
malloc, depending on whether or not pBuffer_in is
NULL. Running out of memory is handled the same as for
allocMemory; see
section 5.8 allocMemory.
The arguments to reallocMemory are as follows:
pBuffer_in
pointer to a block of memory previously allocated by
allocMemory, reallocMemory, or duplicateString.
It may also be NULL to allocate a new block of memory.
uiSize_in
a pointer to a possibly reallocated block
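The "safe" behavior described above can be sketched as a thin wrapper around the C library. This is not the library's source. The name safe_realloc is hypothetical. Exiting on memory exhaustion is an assumption that stands in for whatever allocMemory actually does. The explicit NULL check matters for some older C libraries whose realloc did not accept a NULL pointer. In ANSI C, realloc(NULL, n) already behaves like malloc(n).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for reallocMemory: resize a buffer, or allocate
 * a new one when pBuffer_in is NULL.  Never returns NULL. */
void * safe_realloc(void * pBuffer_in, size_t uiSize_in)
{
    void * pNew;

    if (uiSize_in == 0)
        uiSize_in = 1;      /* avoid implementation-defined zero-size requests */
    if (pBuffer_in == NULL)
        pNew = malloc(uiSize_in);
    else
        pNew = realloc(pBuffer_in, uiSize_in);
    if (pNew == NULL)
    {
        /* assumption: running out of memory is treated as fatal */
        fputs("Out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return pNew;
}
```

Because the wrapper never returns NULL, callers can assign the result straight back to their buffer pointer without losing the old block on failure.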
5.68.4 Example See section 5.29 fitAllocStringExactly.
5.68.5 Source File `allocmem.c'
#include "template.h" /* or opaclib.h */
void recapitalizeWord(char * pszWord_io,
int iRecap_in,
const TextControl * pTextCtl_in);
recapitalizeWord tries to reimpose capitalization as it was in
the original input text.
The arguments to recapitalizeWord are as follows:
pszWord_io
iRecap_in
0 (NOCAP)
1 (INITCAP)
2 (ALLCAP)
4-65535
bit flags for mixed capitalization, with 4 encoding the capitalization
of the first character, 8 encoding the second character, and so on.
pTextCtl_in
none
5.69.4 Example
#include "template.h"
void fix_new_words(pTemplate_io, pTextCtl_in)
WordTemplate * pTemplate_io;
const TextControl * pTextCtl_in;
{
StringList * pWord;
char * p;
if ((pTemplate_io == NULL) || (pTemplate_io->pNewWords == NULL))
return;
if (pTextCtl_in == NULL)
return;
/*
* apply orthography changes to the word and recapitalize it
*/
for ( pWord = pTemplate_io->pNewWords ; pWord ; pWord = pWord->pNext )
{
/*
* apply output orthography changes and recapitalize
*/
p = applyChanges(pWord->pszString, pTextCtl_in->pOutputChanges );
recapitalizeWord( p, pTemplate_io->iCapital, pTextCtl_in);
/*
* store the modified wordform
*/
freeMemory(pWord->pszString);
pWord->pszString = p;
}
}
`textout.c'
#include "trie.h" /* or opaclib.h */
int removeDataFromTrie(Trie * pTrieHead_in,
char * pszKey_in,
void * pInfo_in,
void * (* pfRemoveInfo_in)(void * pOld_in,
void * pList_io));
removeDataFromTrie removes a stored piece of information from a
trie.
The arguments to removeDataFromTrie are as follows:
pTrieHead_in
pszKey_in
pInfo_in
pfRemoveInfo_in
pOld_in
pInfo_in).
pList_io
Trie node
(Trieinfo).
pTrieInfo.
zero if successful, nonzero if an error occurs
5.70.4 Example
#include <string.h>
#include "trie.h"
#include "rpterror.h"
#include "allocmem.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next item */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
unsigned long uiLexiconCount_g;
static char szWhitespace_m[7] = " \t\r\n\f\v";
...
static void * remove_lex_item(void * pDefunct_in, void * pList_in)
{
LexItem * pDefunct = (LexItem *)pDefunct_in;
LexItem * pLex;
/*
* be a little paranoid
*/
if (pDefunct == NULL)
return pList_in;
/*
* handle removing the head of the list
*/
if (pDefunct_in == pList_in)
return pDefunct->pLink;
/*
* unlink from both the general list and the list of homographs
*/
for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
{
if (pLex->pNext == pDefunct)
pLex->pNext = pDefunct->pNext;
if (pLex->pLink == pDefunct)
{
pLex->pLink = pDefunct->pLink;
break; /* no need to check further */
}
}
return pList_in;
}
void remove_from_lexicon(char * pszForm_in,
char * pszGloss_in,
char * pszCategory_in)
{
LexItem * pLex;
unsigned short uiCategory;
if ( (pszForm_in == NULL) ||
(pszGloss_in == NULL) ||
(pszCategory_in == NULL) )
return;
uiCategory = index_lexical_category(pszCategory_in);
for ( pLex = findDataInTrie(pLexicon_g, pszForm_in) ;
pLex ;
pLex = pLex->pLink )
{
if ( (strcmp((char *)pLex->pszForm, pszForm_in) == 0) &&
(strcmp((char *)pLex->pszGloss, pszGloss_in) == 0) &&
(pLex->uiCategory == uiCategory) )
{
removeDataFromTrie(pLexicon_g, pszForm_in, pLex,
remove_lex_item);
break;
}
}
}
`trie.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * removeFromStringList(StringList * pList_io,
const char * pszString_in);
removeFromStringList removes the first occurrence of a string
from a list of strings.
The arguments to removeFromStringList are as follows:
pList_io
pszString_in
a pointer to the (possibly shorter) list, or NULL if the only
item in the list was removed
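The following sketch shows how to remove the first matching node from a singly linked list of strings. The node layout mirrors the pszString and pNext fields used in this chapter's examples. However, the type StringNode and the functions remove_first_string and push_string are hypothetical. The sketch also assumes every node and string was allocated with malloc, whereas the library itself uses allocMemory and freeMemory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout modelled on StringList's pszString/pNext. */
typedef struct string_node {
    char *               pszString;
    struct string_node * pNext;
} StringNode;

/* Hypothetical helper: prepend a malloc'd copy of pszString_in. */
StringNode * push_string(StringNode * pList_in, const char * pszString_in)
{
    StringNode * pNode = malloc(sizeof *pNode);

    if (pNode == NULL)
        return pList_in;
    pNode->pszString = malloc(strlen(pszString_in) + 1);
    if (pNode->pszString == NULL)
    {
        free(pNode);
        return pList_in;
    }
    strcpy(pNode->pszString, pszString_in);
    pNode->pNext = pList_in;
    return pNode;
}

/* Remove the first node whose string equals pszString_in and return the
 * (possibly new) head, which is NULL if the only node was removed. */
StringNode * remove_first_string(StringNode * pList_io,
                                 const char * pszString_in)
{
    StringNode ** ppLink;
    StringNode *  pNode;

    if (pszString_in == NULL)
        return pList_io;
    /* walk the chain of link pointers so the head needs no special case */
    for ( ppLink = &pList_io ; (pNode = *ppLink) != NULL ;
          ppLink = &pNode->pNext )
    {
        if (strcmp(pNode->pszString, pszString_in) == 0)
        {
            *ppLink = pNode->pNext;     /* unlink the node */
            free(pNode->pszString);
            free(pNode);
            break;                      /* only the first occurrence */
        }
    }
    return pList_io;
}
```

Walking a pointer to the link field, rather than the nodes themselves, lets the same loop handle removal of the head node.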
5.71.4 Example
#include "strlist.h"
...
static StringList * pNameList_m;
...
char * pszName;
...
pNameList_m = removeFromStringList(pNameList_m, pszName);
...
`rmstr_sl.c'
#include "rpterror.h" /* or opaclib.h */
void reportError(int eMessageType_in,
const char * pszFormat_in,
...);
reportError reports an error message to the user. For MS-DOS
and Unix, reportError writes to the standard error output. The
message is also written to the standard output if it has been
redirected. For GUI programs, the programmer must write a different
version of reportError to satisfy the link requirements of other
functions in the OPAC library. This would typically display a message
box.
The arguments to reportError are as follows:
eMessageType_in
ERROR_MSG
WARNING_MSG
DEBUG_MSG
pszFormat_in
printf style format string for the (error) message.
...
pszFormat_in).
none
5.72.4 Example See section 5.1 addDataToTrie.
5.72.5 Source File `rpterror.c'
#include "rpterror.h" /* or opaclib.h */
void reportMessage(int bNotSilent_in,
const char * pszFormat_in,
...);
reportMessage displays a message with zero or more arguments.
For MS-DOS and Unix, reportMessage writes to the standard error
output. The message is also written to the standard output if it has
been redirected. For GUI programs, the programmer must write a different
version of reportMessage to satisfy the link requirements of other
functions in the OPAC library. This would typically write to a message
window.
The arguments to reportMessage are as follows:
bNotSilent_in
If TRUE (nonzero), the message is written to the standard error output,
and also to the standard output if it has been redirected. If FALSE
(zero), the message is written only to the standard output (stdout),
and then only if it has been redirected. This allows programs to have a
"quiet" mode of operation without requiring a global variable.
pszFormat_in
printf style format string for the message.
...
pszFormat_in).
none
5.73.4 Example
#include "rpterror.h"
...
static int iDebugLevel_m;
...
static int read_token(pszBuffer_in, uiBufferSize_in)
char * pszBuffer_in;
unsigned uiBufferSize_in;
{
int iTokenType;
...
if (iDebugLevel_m >= 8)
{
reportMessage(TRUE, "DEBUG read_token(\"%s\",%u) => ",
pszBuffer_in, uiBufferSize_in);
switch (iTokenType)
{
case BECOMES:
reportMessage(TRUE, "BECOMES_TOKEN");
break;
case KEYWORD:
reportMessage(TRUE, "KEYWORD_TOKEN");
break;
case SYMBOL:
reportMessage(TRUE, "SYMBOL_TOKEN");
break;
default:
reportMessage(TRUE, "'%c'\t", iTokenType);
break;
}
reportMessage(TRUE, "\n");
}
return( iTokenType );
}
`rptmessg.c'
#include "opaclib.h"
void reportProgress(unsigned long uiCount_in);
reportProgress displays a progress report based on a progress
counter.
The standard version of reportProgress actually does nothing.
For GUI programs, the programmer may write a version of
reportProgress to display some sort of progress message using
the progress counter.
reportProgress has one argument:
uiCount_in
none
5.74.4 Example
#include "opaclib.h"
...
static unsigned long uiTokenCount_m;
...
static int read_token(pszBuffer_in, uiBufferSize_in)
char * pszBuffer_in;
unsigned uiBufferSize_in;
{
int iTokenType;
...
++uiTokenCount_m;
reportProgress( uiTokenCount_m );
return( iTokenType );
}
`rptprgrs.c'
#include "textctl.h" /* or template.h or opaclib.h */
void resetTextControl(TextControl * pTextCtl_io);
resetTextControl frees any memory allocated by either
loadIntxCtlFile or
loadOutxCtlFile. It does not free the
TextControl data structure itself.
resetTextControl has one argument:
pTextCtl_io
none
5.75.4 Example
#include <stdio.h>
#include <string.h>
#include "textctl.h" /* includes strclass.h */
#include "rpterror.h"
...
char szIntxFilename_g[200];
TextControl sTextControl_g;
StringClass * pStringClasses_g = NULL;
static TextControl sDefaultTextControl_m = {
NULL, /* filename */
NULL, /* ordered array of lowercase letters */
NULL, /* ordered array of matching uppercase letters */
NULL, /* array of caseless letters */
NULL, /* list of input orthography changes */
NULL, /* list of output (orthography) changes */
NULL, /* list of format markers (fields) to include */
NULL, /* list of format markers (fields) to exclude */
'\\', /* initial character of format markers (field codes) */
'%', /* character for marking ambiguities and failures */
'-', /* character for marking decomposition */
'|', /* initial character of secondary format markers */
NULL, /* (Manuscripter) bar codes */
TRUE, /* flag whether to capitalize individual letters */
TRUE, /* flag whether to decapitalize/recapitalize */
100 /* maximum number of decapitalization alternatives */
};
...
memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
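if (fgets( szIntxFilename_g, 200, stdin ) == NULL)
szIntxFilename_g[0] = NUL;
szIntxFilename_g[strcspn(szIntxFilename_g, "\r\n")] = NUL; /* drop the newline kept by fgets */
if (szIntxFilename_g[0])
{
if (loadIntxCtlFile(szIntxFilename_g, ';',
&sTextControl_g, pStringClasses_g) != 0)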
{
reportError(ERROR_MSG, "Error reading text control file %s\n",
szIntxFilename_g);
}
}
if ( (sTextControl_g.cBarMark == NUL) &&
(sTextControl_g.pszBarCodes != NULL) )
{
freeMemory(sTextControl_g.pszBarCodes);
sTextControl_g.pszBarCodes = NULL;
}
if ( (sTextControl_g.cBarMark != NUL) &&
(sTextControl_g.pszBarCodes == NULL) )
{
sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
"bdefhijmrsuvyz");
}
...
resetTextControl(&sTextControl_g);
`resetxtc.c'
#include "textctl.h" /* or template.h or opaclib.h */
void resetWordFormationChars(TextControl * pTextCtl_io);
resetWordFormationChars erases the information about word
formation characters saved by previous calls to either
addWordFormationChars or addLowerUpperWFChars.
This frees any allocated memory and sets the relevant pointers to
NULL.
resetWordFormationChars has one argument:
pTextCtl_io
none
5.76.4 Example See section 5.2 addLowerUpperWFChars.
5.76.5 Source File `myctype.c'
#include "allocmem.h" /* or opaclib.h */
void setAllocMemoryTracing(const char * pszFilename_in);
setAllocMemoryTracing turns debugging on (if a filename is
given) or off (if pszFilename_in is NULL). If debugging
is on, every call to allocMemory, reallocMemory, and
freeMemory is logged to the given file for postmortem analysis.
Calls to duplicateString are logged as calls to
allocMemory, which duplicateString calls internally.
setAllocMemoryTracing has one argument:
pszFilename_in
NULL.
none
5.77.4 Example
#include <stdio.h>
#include <stdlib.h>
#include "allocmem.h"
...
extern int getopt(int argc, char * const argv[],
const char *opts);
extern char * optarg;
...
int main(int argc, char ** argv)
{
void * pTrapAddress = NULL;
unsigned iTrapCount = 0;
int k;
char * p;
...
while ((k = getopt(argc, argv, "ai:o:x:z:Z:")) != EOF)
{
switch (k)
{
...
case 'z': /* memory allocation trace filename */
setAllocMemoryTracing(optarg);
break;
case 'Z': /* memory allocation trap address,count */
pTrapAddress = (void *)strtoul(optarg, &p, 10);
if (*p == ',')
iTrapCount = (unsigned)strtoul(p+1, NULL, 10);
if (iTrapCount == 0)
iTrapCount = 1;
setAllocMemoryTrap(pTrapAddress, iTrapCount);
break;
...
}
}
...
}
`allocmem.c'
#include "allocmem.h" /* or opaclib.h */
void setAllocMemoryTrap(const void * pAddress_in,
int iCount_in);
setAllocMemoryTrap sets a trap for the iCount_in'th
reference to the address pAddress_in by either
allocMemory or freeMemory. This can be useful for
tracking down memory allocation bugs.
The arguments to setAllocMemoryTrap are as follows:
pAddress_in
iCount_in
none
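One plausible way to implement such a trap is to count each time the allocator sees the watched address. When the count is reached, the allocator calls an empty function, and a debugger breakpoint on that function stops the program at the interesting moment. This is only a sketch of the idea. The names set_alloc_trap, check_alloc_trap, alloc_trap_hit, and alloc_trap_fired are hypothetical, and the library's actual trap action is not described here.

```c
#include <stddef.h>

/* Hypothetical trap state. */
static const void * pTrapAddress_m = NULL;
static int          iTrapCount_m   = 0;
static int          iTrapSeen_m    = 0;
static int          bTrapFired_m   = 0;

/* Deliberately trivial: set a debugger breakpoint here. */
void alloc_trap_hit(const void * pAddress_in)
{
    (void)pAddress_in;
    bTrapFired_m = 1;
}

/* Arm the trap for the iCount_in'th reference to pAddress_in. */
void set_alloc_trap(const void * pAddress_in, int iCount_in)
{
    pTrapAddress_m = pAddress_in;
    iTrapCount_m   = iCount_in;
    iTrapSeen_m    = 0;
    bTrapFired_m   = 0;
}

/* Call from the allocate and free wrappers with each address they
 * return or release. */
void check_alloc_trap(const void * pAddress_in)
{
    if (pTrapAddress_m == NULL || pAddress_in != pTrapAddress_m)
        return;
    if (++iTrapSeen_m == iTrapCount_m)
        alloc_trap_hit(pAddress_in);
}

int alloc_trap_fired(void)
{
    return bTrapFired_m;
}
```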
5.78.4 Example See section 5.77 setAllocMemoryTracing.
5.78.5 Source File `allocmem.c'
#include "opaclib.h"
unsigned long showAmbiguousProgress(unsigned uiAmbiguityCount_in,
unsigned long uiItemCount_in);
showAmbiguousProgress displays the progress of the program in a
rudimentary fashion. If uiAmbiguityCount_in is 0, then a star
(`*') is written to the screen, and if uiAmbiguityCount_in
is 1, then a dot (`.') is written to the screen. Otherwise, if
uiAmbiguityCount_in is less than 10, the count digit is written,
and if it is greater than or equal to 10, a greater than sign
(`>') is written. These progress characters are grouped in
bunches of 10, with 5 bunches on a line and space between each bunch.
Every other line ends with the total count of items thus far
(uiItemCount_in).
The arguments to showAmbiguousProgress are as follows:
uiAmbiguityCount_in
uiItemCount_in
the updated value for uiItemCount_in
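The character selection rule above can be expressed directly. In this sketch, progress_char follows the rules given in the description. The function show_progress is a hypothetical variant that writes to a caller-supplied FILE pointer. Its layout details (a space after every 10 items, a newline after every 50, and the running count after every 100) and its return value of uiItemCount_in + 1 are assumptions based on the description.

```c
#include <stdio.h>

/* Map an ambiguity count to its progress character. */
char progress_char(unsigned uiAmbiguityCount_in)
{
    if (uiAmbiguityCount_in == 0)
        return '*';                     /* failure */
    if (uiAmbiguityCount_in == 1)
        return '.';                     /* unique result */
    if (uiAmbiguityCount_in < 10)
        return (char)('0' + uiAmbiguityCount_in);
    return '>';                         /* 10 or more */
}

/* Hypothetical variant writing to pOutputFP_in; returns the new count. */
unsigned long show_progress(FILE * pOutputFP_in,
                            unsigned uiAmbiguityCount_in,
                            unsigned long uiItemCount_in)
{
    ++uiItemCount_in;
    fputc(progress_char(uiAmbiguityCount_in), pOutputFP_in);
    if (uiItemCount_in % 100 == 0)          /* every other line: total */
        fprintf(pOutputFP_in, " %lu\n", uiItemCount_in);
    else if (uiItemCount_in % 50 == 0)      /* 5 bunches per line */
        fputc('\n', pOutputFP_in);
    else if (uiItemCount_in % 10 == 0)      /* space between bunches */
        fputc(' ', pOutputFP_in);
    fflush(pOutputFP_in);
    return uiItemCount_in;
}
```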
5.79.4 Example See section 5.38 freeWordTemplate.
5.79.5 Source File `ambprog.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
StringList * squeezeStringList(StringList * pList_io);
squeezeStringList removes any redundant strings from a list of
strings.
squeezeStringList has one argument:
pList_io
a pointer to the (possibly smaller) list of strings
5.80.4 Example
#include "template.h" /* includes strlist.h */
...
static WordTemplate * pTemplate_m = NULL;
...
/*
* eliminate identical results
*/
pTemplate_m->pNewWords = squeezeStringList( pTemplate_m->pNewWords );
`sqz_sl.c'
#include "opaclib.h"
unsigned char * tokenizeString(unsigned char * pszString_in,
const unsigned char * pszSeparate_in);
tokenizeString splits the string pszString_in into a
sequence of zero or more text tokens separated by spans of one or more
characters from pszSeparate_in. Only the initial call provides
a value for pszString_in; successive calls must use a
NULL pointer for the first argument. The first separator
character following the token in pszString_in is replaced by a
NUL character. Subsequent calls to tokenizeString work
through pszString_in sequentially. Note that
pszSeparate_in may change from one call to the next.
tokenizeString is like strtok except that it
operates on strings of unsigned char rather than strings of
char.
The arguments to tokenizeString are as follows:
pszString_in
NUL-terminated character string, or NULL.
pszSeparate_in
NUL-terminated set of separator characters, or
NULL. If it is NULL, then the rest of the string is
returned as the token.
a pointer to the next token extracted from the input string, or
NULL if no more tokens exist
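A minimal sketch of a strtok-style tokenizer for unsigned char strings follows. It is not the library's code. The name tokenize_uchar is hypothetical. Returning NULL for an empty remainder when pszSeparate_in is NULL is an assumption. Like strtok, the function keeps its position in static storage, so it must not be used on two strings at once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strtok-like tokenizer for unsigned char strings. */
unsigned char * tokenize_uchar(unsigned char * pszString_in,
                               const unsigned char * pszSeparate_in)
{
    static unsigned char * pszNext_s = NULL;   /* position between calls */
    unsigned char * pszToken;

    if (pszString_in != NULL)
        pszNext_s = pszString_in;
    if (pszNext_s == NULL)
        return NULL;
    if (pszSeparate_in == NULL)
    {
        /* no separators: the rest of the string is the token */
        pszToken = pszNext_s;
        pszNext_s = NULL;
        return (*pszToken != '\0') ? pszToken : NULL;
    }
    /* skip any leading separators */
    pszNext_s += strspn((const char *)pszNext_s, (const char *)pszSeparate_in);
    if (*pszNext_s == '\0')
    {
        pszNext_s = NULL;
        return NULL;
    }
    pszToken = pszNext_s;
    /* find the end of the token and replace the separator with NUL */
    pszNext_s += strcspn((const char *)pszNext_s, (const char *)pszSeparate_in);
    if (*pszNext_s != '\0')
        *pszNext_s++ = '\0';
    else
        pszNext_s = NULL;
    return pszToken;
}
```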
5.81.4 Example
#include "opaclib.h"
...
unsigned char szWhitespace_m[7] = " \n\r\t\f\v";
unsigned char szInputBuffer_m[1024];
unsigned char * pszToken;
...
for ( pszToken = tokenizeString(szInputBuffer_m, szWhitespace_m) ;
pszToken != NULL ;
pszToken = tokenizeString(NULL, szWhitespace_m) )
{
...
}
...
`tokenize.c'
#include "opaclib.h"
char * trimTrailingWhitespace(char * pszString_io);
trimTrailingWhitespace removes any trailing white space
characters from the input string.
trimTrailingWhitespace has one argument:
pszString_io
a pointer to the beginning of the input string
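Trailing white space can be trimmed by scanning backward from the end of the string. The sketch below uses a hypothetical name, trim_trailing_ws. It relies on isspace, which in the C locale matches the same six characters as the szWhitespace_m arrays used in this chapter's examples.

```c
#include <ctype.h>
#include <string.h>

/* Sketch: strip trailing white space in place; return the string. */
char * trim_trailing_ws(char * pszString_io)
{
    size_t uiLen;

    if (pszString_io == NULL)
        return NULL;
    uiLen = strlen(pszString_io);
    while (uiLen > 0 && isspace((unsigned char)pszString_io[uiLen - 1]))
        --uiLen;
    pszString_io[uiLen] = '\0';
    return pszString_io;
}
```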
5.82.4 Example
#include <string.h>
#include "opaclib.h"
...
static char szWhitespace_m[7] = " \t\r\n\f\v";
...
FILE * pRulesFP;
unsigned uiLineNumber;
char * pszToken;
...
for ( uiLineNumber = 1 ;;)
{
pszToken = readLineFromFile(pRulesFP, &uiLineNumber, ';');
if (pszToken == NULL)
break;
/*
* skip leading spaces and remove trailing spaces
*/
pszToken += strspn(pszToken, szWhitespace_m);
if (*pszToken == NUL)
continue;
trimTrailingWhitespace(pszToken);
...
}
`trimspac.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
void unlinkStringList(StringList ** ppList_io);
unlinkStringList frees the StringList data structures in
a list of strings, while leaving intact the strings they point to.
The arguments to unlinkStringList are as follows:
ppList_io
none
5.83.4 Example
#include "strlist.h"
...
StringList * pList;
...
unlinkStringList(&pList);
pList = NULL;
`unlst_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
char * updateStringList(StringList ** ppList_io,
const char * pszString_in);
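updateStringList adds the string to the list if it is not
already in the list. This function is similar to
mergeIntoStringList, except that it takes the address of the list
pointer (updating the list in place) and returns a pointer to the stored
copy of the string rather than a pointer to the list.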
The arguments to updateStringList are as follows:
ppList_io
pszString_in
a pointer to the copy of pszString_in stored in the list of
strings
5.84.4 Example
#include "strlist.h"
...
static StringList * pCategories_m;
static char szBuffer_m[100];
...
char * pszCategory;
...
pszCategory = updateStringList( &pCategories_m, szBuffer_m );
...
`updat_sl.c'
#include "trie.h" /* or opaclib.h */
void walkTrie(Trie * pTrieHead_in,
void (* pfWalk_in)(void * pList_in));
walkTrie walks through a trie, processing the information stored
at each node.
The arguments to walkTrie are as follows:
pTrieHead_in
pfWalk_in
pList_in
Trie node
(Trieinfo).
none
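The internal layout of a Trie is not documented here. The following sketch therefore assumes a hypothetical first-child/next-sibling node structure (TrieNode). It shows the depth-first traversal that walkTrie describes, in which each non-NULL list stored at a node is passed to the callback. The names walk_trie_sketch, count_lists, and walk_and_count are illustrative only.

```c
#include <stddef.h>

/* Hypothetical trie node: first child plus next sibling, with the list
 * of stored information (if any) in pInfo. */
typedef struct trie_node {
    char               cLetter;
    void *             pInfo;
    struct trie_node * pChildren;
    struct trie_node * pSiblings;
} TrieNode;

/* Depth-first walk, handing each stored list to pfWalk_in. */
void walk_trie_sketch(const TrieNode * pNode_in,
                      void (* pfWalk_in)(void * pList_in))
{
    for ( ; pNode_in != NULL ; pNode_in = pNode_in->pSiblings )
    {
        if (pNode_in->pInfo != NULL && pfWalk_in != NULL)
            (*pfWalk_in)(pNode_in->pInfo);
        walk_trie_sketch(pNode_in->pChildren, pfWalk_in);
    }
}

/* Example callback: count the stored lists it is handed. */
static unsigned uiListsSeen_m = 0;

static void count_lists(void * pList_in)
{
    if (pList_in != NULL)
        ++uiListsSeen_m;
}

unsigned walk_and_count(const TrieNode * pRoot_in)
{
    uiListsSeen_m = 0;
    walk_trie_sketch(pRoot_in, count_lists);
    return uiListsSeen_m;
}
```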
5.85.4 Example
#include <stdio.h>
#include "trie.h"
#include "rpterror.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next item */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
FILE * pLexiconFP_m;
...
static void write_lex_items(void * pList_in)
{
LexItem * pLex;
if (pLexiconFP_m == NULL)
return;
for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
{
fprintf(pLexiconFP_m, "%-20s %-20s %s\n",
pLex->pszForm, pLex->pszGloss,
get_lexical_category_name(pLex->uiCategory));
}
}
void write_lexicon(const char * pszLexiconFile_in)
{
if (pszLexiconFile_in == NULL)
{
reportError(WARNING_MSG, "Missing output lexicon filename\n");
return;
}
pLexiconFP_m = fopen(pszLexiconFile_in, "w");
if (pLexiconFP_m == NULL)
{
reportError(WARNING_MSG,
"Cannot open lexicon file %s for output\n",
pszLexiconFile_in);
return;
}
walkTrie(pLexicon_g, write_lex_items);
fclose(pLexiconFP_m);
}
`trie.c'
#include "allocmem.h" /* or opaclib.h */
void writeAllocMemoryDebugMsg(const char * pszFormat_in,
...);
writeAllocMemoryDebugMsg writes a message to the memory
allocation tracing file if it is open, and does nothing if that file is
not open. The memory allocation tracing file is opened and closed by
setAllocMemoryTracing. writeAllocMemoryDebugMsg is
similar to printf except that it writes to a specific (optional)
file rather than to the standard output.
The arguments to writeAllocMemoryDebugMsg are as follows:
pszFormat_in
printf style format string for the message.
...
pszFormat_in).
none
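The behavior described above follows the usual pattern of forwarding a variable argument list to vfprintf when an optional stream is open. In this sketch, the stream variable and the functions set_trace_stream and write_trace_msg are hypothetical. Unlike writeAllocMemoryDebugMsg, write_trace_msg returns the number of characters written (0 when tracing is off), which makes its behavior easy to check.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical tracing stream; NULL means tracing is off. */
static FILE * pTraceFP_m = NULL;

void set_trace_stream(FILE * pFP_in)
{
    pTraceFP_m = pFP_in;
}

/* printf-style write to the trace stream; does nothing when no stream
 * is open.  Returns the number of characters written. */
int write_trace_msg(const char * pszFormat_in, ...)
{
    va_list args;
    int iWritten;

    if (pTraceFP_m == NULL || pszFormat_in == NULL)
        return 0;
    va_start(args, pszFormat_in);
    iWritten = vfprintf(pTraceFP_m, pszFormat_in, args);
    va_end(args);
    return iWritten;
}
```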
5.86.4 Example
#include "allocmem.h"
#include "strlist.h"
...
StringList * pStrings;
...
writeAllocMemoryDebugMsg("deleting %u strings\n",
getStringListSize(pStrings));
freeStringList(pStrings);
pStrings = NULL;
`allocmem.c'
#include "change.h"
void writeChange(const Change * pChange_in,
FILE * pOutputFP_in);
writeChange writes the given Change data structure to the
output file as a human readable string consisting of a pair of quoted
strings followed by the environment constraint (if any).
The arguments to writeChange are as follows:
pChange_in
(The pNext field of the Change data structure is ignored.)
pOutputFP_in
none
5.87.4 Example
#include <stdio.h>
#include "change.h"
...
void writeChangeList(FILE * pOutputFP_in, Change * pChanges_in)
{
Change * cp;
if (pOutputFP_in == NULL)
return;
for ( cp = pChanges_in ; cp ; cp = cp->pNext )
writeChange(cp, pOutputFP_in);
}
`change.c'
5.88.1 Syntax
#include "record.h"
void writeCodeTable(FILE * pOutputFP_in,
const CodeTable * pTable_in);
writeCodeTable writes the contents of a CodeTable data
structure to a file. The output is useful only for debugging.
The arguments to writeCodeTable are as follows:
pOutputFP_in
pTable_in
CodeTable data structure.
none
5.88.4 Example
#include "record.h"
#include "ample.h"
AmpleData sAmpleData_g;
char szCodesFilename_g[100];
...
loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE);
writeCodeTable( sAmpleData_g.pLogFP,
sAmpleData_g.pPrefixTable );
`loadtb.c'
#include "strclass.h" /* or change.h or textctl.h or template.h
or opaclib.h */
void writeStringClasses(FILE * pOutputFP_in,
const StringClass * pClasses_in);
writeStringClasses writes the contents of all the string classes
in the list to a file.
The arguments to writeStringClasses are as follows:
pOutputFP_in
pClasses_in
none
5.89.4 Example
#include <stdio.h>
#include "strclass.h"
...
static StringClass * pClasses_m;
...
writeStringClasses(stdout, pClasses_m);
...
`strcla.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h
or template.h or opaclib.h */
void writeStringList(const StringList * pList_in,
const char * pszSep_in,
FILE * pOutputFP_in);
writeStringList writes a list of strings to an output file,
separating the individual strings in the list by the indicated string.
The arguments to writeStringList are as follows:
pList_in
pszSep_in
pOutputFP_in
none
5.90.4 Example
#include <stdio.h>
#include "strlist.h"
...
static StringList * pCategories_m;
...
void showCategories()
{
printf("Categories: ");
writeStringList(pCategories_m, " ", stdout);
printf("\n");
}
`write_sl.c'
#include "template.h" /* or opaclib.h */
void writeTemplate(FILE * pOutputFP_in,
const char * pszFilename_in,
const WordTemplate * pTemplate_in,
const TextControl * pTextCtl_in);
writeTemplate writes the results of a morphological analysis as
a database. Each word is a record with these fields:
\a
\d
\cat
\p
\fd
\u
\w
\f
\c
\n
Ambiguities are marked as %n%Anal1%Anal2%...%Analn%.
Failures are marked as %0%OriginalWord% or %0%%.
(The separation character can be set to something other than %.)
The arguments to writeTemplate are as follows:
pOutputFP_in
pszFilename_in
pTemplate_in
pTextCtl_in
none
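The ambiguity and failure markings above can be produced with a short formatter. This is a hypothetical sketch, and format_ambiguity is not a library function. The cSep_in argument stands in for the configurable separation character. The function formats whatever list it is given, so a caller would presumably use it only for ambiguities (two or more items) and failures (zero items).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: "%n%W1%W2%...%Wn%" for n >= 1, or
 * "%0%Original%" (or "%0%%" with no original) when n == 0.
 * Returns 0 on success, -1 if pszOut_out is too small. */
int format_ambiguity(char * pszOut_out, size_t uiOutSize_in, char cSep_in,
                     const char * const * ppszItems_in, unsigned uiCount_in,
                     const char * pszOriginal_in)
{
    size_t uiUsed;
    unsigned k;
    int n;

    /* leading separator, count, separator */
    n = snprintf(pszOut_out, uiOutSize_in, "%c%u%c",
                 cSep_in, uiCount_in, cSep_in);
    if (n < 0 || (size_t)n >= uiOutSize_in)
        return -1;
    uiUsed = (size_t)n;
    if (uiCount_in == 0)
    {
        /* failure: the original word (possibly empty) */
        n = snprintf(pszOut_out + uiUsed, uiOutSize_in - uiUsed, "%s%c",
                     pszOriginal_in ? pszOriginal_in : "", cSep_in);
        return (n < 0 || (size_t)n >= uiOutSize_in - uiUsed) ? -1 : 0;
    }
    for ( k = 0 ; k < uiCount_in ; ++k )
    {
        /* each alternative followed by a separator */
        n = snprintf(pszOut_out + uiUsed, uiOutSize_in - uiUsed, "%s%c",
                     ppszItems_in[k], cSep_in);
        if (n < 0 || (size_t)n >= uiOutSize_in - uiUsed)
            return -1;
        uiUsed += (size_t)n;
    }
    return 0;
}
```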
5.91.4 Example See section 5.38 freeWordTemplate.
5.91.5 Source File `dtbout.c'
#include "template.h" /* or opaclib.h */
void writeTextFromTemplate(FILE * pOutputFP_in,
const WordTemplate * pTemplate_in,
const TextControl * pTextCtl_in);
writeTextFromTemplate writes the results of a morphological
synthesis to an output file, restoring all the formatting information
associated with the word in the original input to analysis.
Ambiguities are marked as %n%Word1%Word2%...%Wordn%.
Failures are marked as %0%OriginalWord%.
(The separation character can be set to something other than %.)
The arguments to writeTextFromTemplate are as follows:
pOutputFP_in
pTemplate_in
pTextCtl_in
none
5.92.4 Example See section 5.65 readTemplateFromAnalysis.
5.92.5 Source File `textout.c'
#include "trie.h" /* or opaclib.h */
void writeTrieData(Trie * pTrieHead_in,
void (* pfWriteInfo_in)(void * pList_in,
int iIndent_in,
FILE * pOutputFP_in),
FILE * pOutputFP_in);
writeTrieData walks through a trie, writing the information
stored at each node to a file. This is intended primarily for
debugging, as the trie structure is explicitly written to the output
file in indented form, together with the information stored in the
trie.
The arguments to writeTrieData are as follows:
pTrieHead_in
pfWriteInfo_in
pList_in
Trie node
(Trieinfo).
iIndent_in
pOutputFP_in
pOutputFP_in
none
5.93.4 Example
#include <stdio.h>
#include "trie.h"
#include "rpterror.h"
...
typedef struct lex_item {
struct lex_item * pLink; /* link to next item */
struct lex_item * pNext; /* link to next homograph */
unsigned char * pszForm; /* lexical form (word) */
unsigned char * pszGloss; /* lexical gloss */
unsigned short uiCategory; /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
...
static void debug_lex_items(void * pList_in,
int iIndent_in,
FILE * pOutputFP_in)
{
LexItem * pLex;
int i;
if (pOutputFP_in == NULL)
return;
for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
{
for ( i = 0 ; i < iIndent_in ; ++i )
fputc(' ', pOutputFP_in);
fprintf(pOutputFP_in, "%-20s %-20s %u [%p -> %p]\n",
pLex->pszForm, pLex->pszGloss, pLex->uiCategory,
(void *)pLex, (void *)pLex->pNext);
}
}
void debug_lexicon()
{
printf("BEGIN LEXICON TRIE DATA\n");
writeTrieData(pLexicon_g, debug_lex_items, stdout);
printf("END LEXICON TRIE DATA\n");
}
`trie.c'
#include "template.h"
void writeWordAnalysisList(const WordAnalysis * pAnalyses_in,
FILE * pOutputFP_in);
writeWordAnalysisList writes a list of WordAnalysis data
structures to an output file for debugging purposes.
The arguments to writeWordAnalysisList are as follows:
pAnalyses_in
WordAnalysis data structures.
pOutputFP_in
none
5.94.4 Example
#include <stdio.h>
#include "template.h"
...
void dumpWordTemplate(pTemplate_in, pOutputFP_in)
WordTemplate * pTemplate_in;
FILE * pOutputFP_in;
{
if (pOutputFP_in == NULL)
return;
if (pTemplate_in == NULL)
{
fprintf(pOutputFP_in, "WordTemplate ptr is NULL\n");
return;
}
putc('\n', pOutputFP_in);
fprintf(pOutputFP_in, " orig_word = \"%s\"\n",
pTemplate_in->pszOrigWord ? pTemplate_in->pszOrigWord : "{NULL}" );
fprintf(pOutputFP_in, " word = \"%s\"\n",
pTemplate_in->paWord && pTemplate_in->paWord[0] ?
pTemplate_in->paWord[0] : "{NULL}" );
fprintf(pOutputFP_in, " format = \"%s\"\n",
pTemplate_in->pszFormat ? pTemplate_in->pszFormat : "{NULL}" );
fprintf(pOutputFP_in, " non_alpha = \"%s\"\n",
pTemplate_in->pszNonAlpha ? pTemplate_in->pszNonAlpha : "{NULL}" );
fprintf(pOutputFP_in, " capital = %d\n", pTemplate_in->iCapital );
writeWordAnalysisList(pTemplate_in->pAnalyses, pOutputFP_in);
fprintf(pOutputFP_in, " new_words = ");
if (pTemplate_in->pNewWords)
{
fprintf(pOutputFP_in, "\"");
writeStringList( pTemplate_in->pNewWords, "\" \"", pOutputFP_in);
fprintf(pOutputFP_in, "\"\n");
}
else
fprintf(pOutputFP_in, "{NULL}\n");
}
`wordanal.c'
#include "textctl.h" /* or template.h or opaclib.h */
void writeWordFormationChars(FILE * pOutputFP_in,
const TextControl * pTextCtl_in);
writeWordFormationChars writes the set of word formation
characters to an output file. This function depends on previous calls
to addWordFormationChars and addLowerUpperWFChars.
The arguments to writeWordFormationChars are as follows:
pOutputFP_in
pTextCtl_in
none
5.95.4 Example
#include <stdio.h>
#include "textctl.h"
...
static TextControl sTextCtl_m;
...
printf("The word formation characters are:\n");
writeWordFormationChars(stdout, &sTextCtl_m);
...
`myctype.c'
This document was generated on 11 May 2000 using texi2html 1.55k.