Here we explain how two-level rules work, how they can be implemented as finite-state machines, and how all the types of rule constraints can be translated into finite-state tables. We then summarize the rule semantics. This is followed by a detailed discussion of rule conflicts; specificity and conflicts amongst SUBSETS; and finally, an explanation of the rule file format and the rules in the pc-kimmo file english.rul. It's a lot to read through... but, I hope, complete, and it will guide you through Spanish.

1. How two-level rules work.

Consider Rule 2 (R2) below.

    R2   t:c ==> ____i

The operator ==> means that lexical t is realized as a surface c only (but not always) in the environment preceding i:i. The correspondence t:c declared in R2 is a special correspondence. All two-level descriptions must also contain a set of *default* correspondences, such as t:t, i:i, etc. (This is the so-called "BOGUS RULE" - it isn't really bogus, it is a default.) The sum of the special and default correspondences is the total set of valid correspondences, or feasible pairs, that can be used in the description.

If a two-level description containing R2 (and all default correspondences) is applied to the lexical (underlying) form "tati" (without the quote marks), PCKIMMO proceeds as follows to produce the corresponding surface form(s). (NOTE this is why you can use GENERATE without a dictionary and JUST the .rul file.) Beginning with the first character of the input form, it looks to see if there is a correspondence declared for it. Due to R2, it will find that lexical t can correspond to surface c, so it will begin by positing that correspondence.

    Lexical:   t  a  t  i
               |
    Rule:      R2
               |
    Surface:   c

At this point the generator has entered R2. For the posited t:c correspondence to succeed, the generator MUST find an i:i correspondence next - that is what R2 says.
When the generator moves on to the second character of the input word, it finds that it is a lexical a, and thus R2 FAILS, so the generator must back up, undo what it has done so far, and try to find a different path. Backing up to the first character t, it now tries the DEFAULT correspondence t:t (which is guaranteed to succeed, since it has NO conditions):

    Lexical:   t  a  t  i
               |
    Rule:      |
               |
    Surface:   t

The generator now moves on to the second character. No correspondence for lexical a has been declared other than the default, so the generator posits a surface a:

    Lexical:   t  a  t  i
               |  |
    Rule:      |  |
               |  |
    Surface:   t  a

Moving on to the third character, the generator again finds a lexical t, so it posits a surface c and enters R2 again:

    Lexical:   t  a  t  i
               |  |  |
    Rule:      |  |  R2
               |  |  |
    Surface:   t  a  c

Now the generator looks at the fourth character, a lexical i. This SATISFIES the environment of R2, so it keeps the i (NOTE that the i written in the environment of R2 is shorthand for the pair i:i, lexical i with surface i):

    R2   t:c ==> ____i

Since the context of R2 requires an i, the generator must also posit a surface i, so it does, and exits R2. NOTE that by the time R2 is finished, TWO characters will have been posited.

    Lexical:   t  a  t  i
               |  |  |  |
    Rule:      |  |  R2 |
               |  |  |  |
    Surface:   t  a  c  i

Since there are no more characters in the lexical form, the generator outputs the surface form "taci". However, the generator is not yet done. It will continue backtracking, trying to find alternative realizations of the lexical form. First, it will undo the i:i correspondence of the last character of the input word, then it will consider the third character, lexical t. Having already tried the correspondence t:c, it will try the default correspondence t:t:

    Lexical:   t  a  t  i
               |  |  |  |
    Rule:      |  |  |  |
               |  |  |  |
    Surface:   t  a  t  i

Now the generator will try the final correspondence and succeed, since R2 does NOT prohibit t:t before an i (rather, it prohibits t:c in any environment EXCEPT BEFORE i).
It will then output "tati". The reader may confirm that no other outputs will be generated.

2. The ==> rule as a finite-state machine.

A key insight of PCKIMMO is that if phonological rules are written as two-level rules, they can be implemented as FSTs running in parallel. In the next sections we briefly show how the rule types (==>, <==, <==>, and /<==) translate to FSTs. We then go on to describe conflicts in SUBSETS, and RULES.

2.1 A ==> rule.

Consider rule R2 again. A possible paraphrase is: if ever the correspondence t:c occurs, it must be followed by i:i. In other words, if anything OTHER THAN t:c occurs, this rule ignores it. This must be incorporated into our two-level FST; call this T2 (for table 2):

    T2
           t  i  @
           c  i  @
          ---------
       1:  2  1  1
       2.  0  1  0

The @:@ column means ANY pair OTHER than t:c or i:i. State 1 is a kind of 'default' state that ignores everything except the substring crucial to the rule. It is also the only final, accepting state (marked with a colon; state 2, marked with a period, is nonfinal). Importantly, the state table is constructed such that the entire set of feasible pairs in the rule description is partitioned among the column headers WITH NO OVERLAP (this is the source of MANY bugs in Kimmo rule systems).

T2 specifies the special correspondence t:c and the environment in which it is allowed. (On t:c the machine goes to state 2 to anticipate that an i:i comes next - if it does, success, and it goes back to state 1; if not, it goes to state 0, the rejecting state.) The column header @:@ in T2 matches ALL the feasible pairs that are defined by ALL THE OTHER FSTs of the system - thus saying that R2 'takes a pass' and doesn't care about any other feasible pairs. So, with respect to T2, @:@ does not stand for all feasible pairs; rather, it stands for all feasible pairs except t:c and i:i.

The default correspondences of the system must be declared in a trivial FST like T3 (also see below where we cover the .rul file format).
If we assume p, t, k, a, i, u in our alphabet, then we need:

    T3
           p  t  k  a  i  u  @
           p  t  k  a  i  u  @
          ---------------------
       1:  1  1  1  1  1  1  1

Even this table of correspondences must include @:@ as a column. Otherwise, it would fail when it encountered a special correspondence such as t:c, because all the rules in a two-level description apply in parallel, and for each character in an input string ALL the rules must succeed, even if vacuously. Now, given the lexical form tatik, T2 and T3 together will generate the surface forms tatik and tacik.

IMPORTANT. To understand how to represent two-level rules as state tables, we must understand what the rules really mean. It is a common tendency to think of them positively, that is, as statements of where the correspondence succeeds. IN FACT STATE TABLES ARE FAILURE DRIVEN, THEY SPECIFY WHERE THE CORRESPONDENCES MUST FAIL. This point is perhaps THE biggest source of difficulty in building the FSTs. In our case above, it is natural to think of R2 as saying that t:c succeeds when it occurs preceding i:i. But T2 actually works because it FAILS when ANYTHING BUT i:i follows t:c.

2.2 A <== rule.

Now consider R4.

    R4   t:c <== ____i

This rule says that lexical t is always realized as surface c when it occurs before i:i, but NOT ONLY BEFORE i:i. Thus, the lexical form tati will successfully match the surface form taci, but not tati. Note, however, it would also match "caci" since it does not disallow t:c in any environment. Rather, its function is to disallow t:t in the environment preceding i:i. Remember that state tables are failure-driven, so the strategy of writing the state table for R4 is to force it to fail if it recognizes the sequence t:t i:i. So the state table for R4, viz., T4, looks like this:

    T4
           t  t  i  @
           c  t  i  @
          ------------
       1:  1  2  1  1
       2:  1  2  0  1

In state 1, any occurrences of the pairs t:c, i:i, or any other feasible pairs are allowed without leaving state 1.
It is only the correspondence t:t that forces a transition to state 2, where all feasible pairs succeed except i:i. Note that state 2 must be a final state - this allows all the correspondences to succeed and return to state 1. Also note that in state 2 the cell under the t:t column contains a 2. This is necessary to allow for the possibility of a tt sequence in the input. For example, tatti will surface as the form tatci. This phenomenon is called "backlooping" - more on this below.

Actually T4 is potentially over-specified. It is not really the pair t:t that is disallowed before i, but rather the pair t:not-c (lexical t and surface anything but c). Given that the more specific correspondence t:c is already in the table, the more general correspondence t:@ will take care of all the rest of the characters, including t:t. (I'll leave the details of this to you..)

In summary, the rule type L:S <== E positively says that L is ALWAYS realized as S in the environment E. Thus, it is a kind of OBLIGATORY rule. Negatively, it says that L realized as any character but S is not allowed in E. The state table must be written so that it forces all correspondences of L with anything BUT S to fail.

2.3 A <==> rule.

    R5   t:c <==> ____i

The state table for a <==> rule is simply the combination of the tables for ==> and <==. You build it by ANDing the two FSTs together. So here, t:c MUST occur before i, and NOWHERE ELSE.

We next turn to the problem of what can happen when you have more than one rule - rule conflicts, the use of SUBSETS, and overlapping character descriptions.

3.0 Writing rules: conflicts, SUBSETS, and character descriptions.

Writing Rule Automata - Part 2

In this part we cover common issues that arise from using Subsets, rule conflicts, and character descriptions in subsets, as well as use of the word boundary (#), affix marker (+) and 0 symbols. We follow this with a detailed look at the rules for English.
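Before moving on, it may help to see the tables of section 2 actually run. What follows is a minimal Python sketch, NOT PC-KIMMO's own code: the names FEASIBLE, column, step and generate are made up for illustration, the alphabet is the p, t, k, a, i, u one assumed for T3, and column lookup is simplified to "exact header, otherwise @:@" (no subsets). T4 is entered with t:c leading from state 2 back to state 1, which is what the tatti/tatci example requires. Running T2 and T4 side by side is the "ANDing" that gives the <==> behaviour.

```python
# Feasible pairs (lexical, surface): the special pair t:c first, then the defaults.
FEASIBLE = [("t", "c")] + [(ch, ch) for ch in "ptkaiu"]

# A table is: column headers, rows (state -> next state per column), final states.
T2 = {"cols": [("t", "c"), ("i", "i"), ("@", "@")],
      "rows": {1: [2, 1, 1], 2: [0, 1, 0]},
      "final": {1}}
T4 = {"cols": [("t", "c"), ("t", "t"), ("i", "i"), ("@", "@")],
      "rows": {1: [1, 2, 1, 1], 2: [1, 2, 0, 1]},
      "final": {1, 2}}

def column(table, pair):
    """Simplified header lookup: an exact header if there is one, else @:@."""
    cols = table["cols"]
    return cols.index(pair) if pair in cols else cols.index(("@", "@"))

def step(table, state, pair):
    """Next state of one table on one feasible pair (0 means failure)."""
    return table["rows"][state][column(table, pair)]

def generate(lexical, tables):
    """Backtracking generator: every table must accept every posited pair."""
    results = []

    def walk(pos, states, surface):
        if pos == len(lexical):
            # All tables must end in a final state for the form to be output.
            if all(s in t["final"] for s, t in zip(states, tables)):
                results.append(surface)
            return
        for pair in FEASIBLE:
            if pair[0] != lexical[pos]:
                continue
            nxt = [step(t, s, pair) for t, s in zip(tables, states)]
            if all(nxt):  # a 0 in any table kills this path, so we back up
                walk(pos + 1, nxt, surface + pair[1])

    walk(0, [1] * len(tables), "")
    return results

print(generate("tati", [T2]))      # the ==> rule alone
print(generate("tati", [T4]))      # the <== rule alone
print(generate("tati", [T2, T4]))  # both in parallel, i.e. <==>
```

With T2 alone the sketch outputs taci and then tati, in the same order as the walkthrough in section 1; with T4 alone it outputs caci and taci; with both tables in parallel only taci survives.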
Don't worry - after the abstract bit that follows below, we immediately do an English example to illustrate it.

As a summary for writing rules based on ==>, <==, etc.: given some Lexical (L) and Surface (S) correspondence, and the environment (E) in which it occurs, you can ask yourself:

    (a) Is E the ONLY environment in which L:S is allowed?
    (b) Must L ALWAYS be realized as S in the environment E?

(forgetting for the moment whether E means left or right context, or both). There are 4 possible outcomes. Depending on the outcome:

    (1) If (a) is Yes and (b) is No, the rule is L:S ==> E
    (2) If (a) is No and (b) is Yes, the rule is L:S <== E
    (3) If both (a) and (b) are YES, the rule is L:S <==> E
        (this means "if and only if" - this is the ONLY place you see this
        correspondence, and it MUST be like this)
    (4) If neither is Yes, find the other environments in which L:S is
        allowed, combine these into a single disjunctive environment, and
        go through the exercise again.

IF YOU LOOK AT THE .rul file for English, you will note that there are a series of ==> and <== rules. For instance, there are 2 rules for Epenthesis: one that is written ==>, and one that is written /<==. These are rules 3 and 4 in the .rul file. Indeed, all the English rules are written in this paired form. The reason for this is simple: the basic lexical:surface correspondences may best be stated as <==> ("if and only if" form). But to write <==> rules as state tables, as we explain for epenthesis and in general below, we SPLIT them into two parts: one is ==> and the other is either <== or /<==.

Let's see how this works for y-i spelling in English (spy+ed/spied), to make the rule tables, and state the general principles. Here are paraphrases for the ==>, <==, <==> and /<== forms. Remember that 'rc' denotes 'right context' and 'lc' denotes 'left context' - this can be a string of symbols.
==> ONLY BUT NOT ALWAYS:
(1) The rule L:S ==> lc___rc means "the expression lc L:S rc is allowed, but L:S in any other context is NOT allowed."

<== ALWAYS BUT NOT ONLY:
(2) The rule L:S <== lc___rc can be paraphrased as "the expression lc L:notS rc is not allowed".

<==> ALWAYS AND ONLY:
(3) The rule L:S <==> lc___rc can be paraphrased as "the expression lc L:S rc is allowed, L:S is allowed in this lc___rc environment and only this environment, and no other realization of L (L:notS) is allowed in this lc___rc environment".

/<== NEVER:
(4) The rule L:S /<== lc___rc can be paraphrased as "the expression lc L:S rc is not allowed".

OK, let's try this for y-i spelling. We can observe the following for English, that a lexical y is realized as a surface i after a consonant and before a suffix:

    Lexical forms:   boy+s   spy+0s   spy+ed   happy+ly   spot0+y+ness
    Surface forms:   boy0s   spi0es   spi0ed   happi0ly   spott0i0ness

Please note how we use the zeroes to make sure each lexical/surface form is the same length, a mathematical requirement for composition as you may recall. As soon as we have padded out boys into boy0s, that forces the pairing of +:0. Once we have done that, it must appear as such in all the other forms. Note that in the word 'spottiness' y-i spelling occurs after a consonant that is inserted by another rule, gemination (consonant doubling, big-bigger). This means that y-i spelling applies after a surface consonant, expressed as @:C.

OK, on to automaton building. The rule for y-i spelling therefore looks like this.

    y-i Spelling
    R5   y:i <==> :C___+:0

NOW, to build the automata, we will take each <==> rule and break it down into TWO automata, one to handle the ==> part, and one to handle the <== part. In general, this is true of ALL the rules in the english.rul file - they are rules of the form <==> (if and only if), and are then broken down into two.

Here is the ==> form.

    RULE "6 y:ispelling y:i ==> :C___+:0" 3 4

And here is the <== form.
(This is not exactly what's in the english.rul file, as we shall see - this rule will need to be revised because it interacts with another rule, Epenthesis (the rule that inserts an e, as in fox+s/foxes). Also consider try+ing/trying - here the y does not get spelled as i.)

    RULE "6 y:ispelling y:i <== :C___+:0" 3 5

We shall start with the table for the ==> portion. Here is how to map each kind of correspondence into an automaton table.

First, ==> rules.

The strategy for writing a state table for this is to construct a table that recognizes the sequence lc L:S rc, FORBIDS any other occurrence of L:S, and permits ANY OTHER correspondences to occur anywhere. The steps are as follows. Note that they are close to an algorithmic spec, so one *could* write a compiler from rules to tables.... (a project for one or more of you?)

1. To convert each ==> rule:

a. [Make column headers] Make a list of column headers for the fsa by writing down all the correspondences used in the expression lc L:S rc (including correspondences with @ and subset names). Add @:@ to the end of the list.

b. [Recognize context and L:S string] Beginning with state 1, add states (rows) and fill in the state transitions in the appropriate cells in the fsa table to recognize precisely the expression lc L:S rc. The final symbol in the expression normally should result in a transition back to state 1 (except see step e below - there is one exception called 'backlooping').

c. [Mark final, nonfinal states] Mark state 1 with a colon to indicate that it is a final state (e.g., 1:). Mark every state that is traversed before L:S is reached as a final state. Mark all states traversed while L:S is being recognized (i.e., when it is more than 1 character long) as final. Use a period to mark all states traversed *after* that point as nonfinal states. That is, once L:S is encountered, it is not in the correct environment unless the full right context is found, thus these states cannot be final.

d.
[Forbid L:S in any other context] Since L:S in any other environment is not allowed, fill in the rest of the COLUMN for L:S with zeroes. Further, in any state traversed during the recognition of right context, any correspondence other than those provided for in rc means that L:S is in the WRONG context. Thus, the rest of the cells for the states traversed in rc should be filled with zeroes.

e. [Permit any other correspondence] All remaining cells in the transition table denote successful transitions as far as THIS rule is concerned. In most cases, these cells are filled with transitions back to the initial state (state 1), except in the case where backlooping applies - that is, in the columns for the character, or sequence of characters, that begins the expression lc L:S rc. There, the transition must go to the state that represents the successful recognition of that character or sequence of characters, rather than to state 1. (Why do we have to do this? Because if there are TWO left-context starting symbols in a row, we want to make sure that we don't fail to recognize the second one. We want to 'loop' until we get to the very last one, which would be the real start of the left context.)

OK, let's apply this to our y-i case.

    RULE "6 y:ispelling y:i ==> :C___+:0" 3 4

Step a. Form character columns. If we collect all the symbols we need that are mentioned in the context and the L:S itself, we get: y:i, @:C, and +:0. To this we add as usual @:@. So here is our table so far:

           @  y  +  @
           C  i  0  @
          ------------

Step b. Recognize the context and L:S string. We want to write a 'straight line' fsa that will recognize exactly the string @:C y:i +:0. So this takes three states:

    1-->@:C-->2-->y:i-->3-->+:0-->1

           @  y  +  @
           C  i  0  @
          ------------
       1   2
       2      3
       3         1

Step c. Mark final and nonfinal states. State 1 is marked as final (as usual). State 2 is on the way to recognizing the context and L:S string, so it is marked final. What about state 3?
State 3 is reached only after y:i has been seen, so the rule is not satisfied until the last transition, on the right context +:0, takes us back to state 1, a final state. So state 3 is NOT marked as final (it gets a period):

           @  y  +  @
           C  i  0  @
          ------------
       1:  2
       2:     3
       3.        1

Step d. Forbid L:S in any other context. Put 0's in all of the other places in the column for the L:S pair, i.e., y:i:

           @  y  +  @
           C  i  0  @
          ------------
       1:  2  0
       2:     3
       3.     0  1

Put 0's for any other correspondence pair in the states after this recognition point, that is, for state 3 and beyond (that means we fill in the remaining part of the row for state 3 with 0's):

           @  y  +  @
           C  i  0  @
          ------------
       1:  2  0
       2:     3
       3.  0  0  1  0

Step e. Permit any other correspondence. Fill in transitions to state 1 for all other cells - EXCEPT for those in the column that represents the beginning of the left context, @:C. These are the 'backlooping' cells, where we must 'idle' on a string of @:C's until we hit the last one:

           @  y  +  @
           C  i  0  @
          ------------
       1:  2  0  1  1
       2:     3  1  1
       3.  0  0  1  0

Now, finally, add in backloops from any other states back to the state that represents the (string of) symbols beginning the left context, i.e., @:C, state 2. All other transitions in this column not otherwise filled in must be 2. Recall that the effect of this is to 'loop' on the left context starting symbol @:C. That gives us:

           @  y  +  @
           C  i  0  @
          ------------
       1:  2  0  1  1
       2:  2  3  1  1
       3.  0  0  1  0

And, whew, we are done with the ==> half.

The steps in building a L:S /<== lc___rc table are as follows. Please NOTE the difference brought in by the negation - in Step b below, we arrive at failure (state 0) rather than success (state 1).

Step a. [Form column headers] Write down all the correspondences used in the expression lc L:S rc (including those with @ and subset names). Add @:@ to the end of the list.

Step b. [Recognize expression as failure] Beginning with state 1, add states and fill in the transitions to recognize lc L:S rc. The final symbol in the expression should result in failure (i.e., the cell representing recognition should contain 0).

Step c. [Mark states] Use a colon to mark EVERY state as a final state.

Step d. All remaining cells denote successful transitions as far as this rule is concerned. In most cases, the transitions are back to state 1, except if backlooping occurs, as before.

Now, what about writing out <== rules? We need this method. The method for writing a <== rule as a fsa table is to construct a table that recognizes the sequence lc L:NOT-S rc and forces it to fail, while permitting any other correspondence to occur anywhere. The steps in building the table are:

Step a. [Form column headers] Put down L:S. Then, put down L:@, which now represents L:NOT-S. Next, write down all correspondences used in lc and rc (including the correspondences with @ and subset names). Add @:@ to the end of the list.

Step b. [Form recognition sequence] Beginning with state 1, add states (rows) and fill in state transitions to recognize the expression lc L:@ rc. (This is a straight-line automaton as we did above - BUT NOTE the difference from ==>.) The final symbol in the expression should result in FAILURE (the cell should be a 0) - rather than, as before, success (state 1).

Step c. Use a colon to mark EVERY state as a final state.

Step d. All remaining cells in the transition table denote successful transitions as far as this rule is concerned. In most cases, these are filled with transitions back to the initial state (state 1), except in the case of backlooping, as before.

OK, now let's apply this to our <== y:i spelling rule.

    RULE "6 y:ispelling y:i <== :C___+:0" 3 5

Step a. Form column headers. We first add y:i. Then, we add y:@. Now we add the context pairs, @:C and +:0. Finally, we add @:@.

           @  y  y  +  @
           C  i  @  0  @
          ---------------

Step b.
Add the straight-line fsa to recognize the string @:C y:@ +:0:

    1-->@:C-->2-->y:@-->3-->+:0-->0

           @  y  y  +  @
           C  i  @  0  @
          ---------------
       1   2
       2         3
       3            0

Step c. Mark final states - all states are final.

           @  y  y  +  @
           C  i  @  0  @
          ---------------
       1:  2
       2:        3
       3:           0

Step d. Add success transitions for all other cells, except for the column headed by @:C, which is the beginning of the left context.

           @  y  y  +  @
           C  i  @  0  @
          ---------------
       1:  2  1  1  1  1
       2:     1  3  1  1
       3:     1  1  0  1

Now add the backloops to state 2, on character @:C:

           @  y  y  +  @
           C  i  @  0  @
          ---------------
       1:  2  1  1  1  1
       2:  2  1  3  1  1
       3:  2  1  1  0  1

HANDLING RULE INTERACTIONS.

That's enough for one rule. What happens when we have another, though? Consider the EPENTHESIS RULE - the one in English that inserts an 'e' into fox+s --> foxes. Note that epenthesis INTERACTS with the y:i rule, because it applies AFTER y:i. (WHY? Because we have the forms spy+0s/spi0es - the 'e' has been inserted by epenthesis.) We must therefore take this into account in the Epenthesis rule. In fact, if you look at the Epenthesis rule in the english.rul file, you will see that there is a column headed by y:i. NOTE that if you just wrote down the epenthesis rule all by itself, without considering y:i, you need not include this column. (Try it, for practice - the resulting automaton table is something like this, in one direction:

    RULE "3 Epenthesis 0:e ==> Csib +:0___s#" 5 6
           s  Csib  +  0  #  @
           s  Csib  0  e  #  @
          ----------------------
       1:  2   2    1  0  1  1
       2:  2   2    3  0  1  1
       3:  2   2    1  4  1  1
       4.  5   0    0  0  0  0
       5.  0   0    0  0  1  0
)

So THIS is where you have to think about rule interactions - and you can do this by thinking about possible lexical, surface pairs - as in the spies example.

3. Writing rules: SUBSETS, rule conflicts; conflicting character descriptions in subsets.

3.1 Using SUBSETS in Automata tables

Assume that your description contains these subsets.

    SUBSET D    t d s
    SUBSET P    c j ^
    SUBSET Vhf  i e

Here is a rule using these subsets.
It states that 'alveolar' consonants in subset D may be realized as 'palatalized' consonants (i.e. those made with the tongue at the roof of the mouth), when they occur preceding the high, front vowels in the subset Vhf.

    R5   Palatalization
         D:P ==> ____Vhf

Specifically, we want the subset correspondence D:P to stand for the feasible pairs t:c, d:j, and s:^. Translating this into a state table is straightforward:

    T5   Palatalization
           D  Vhf  @
           P  Vhf  @
          ------------
       1:  2   1   1
       2.  0   1   0

However, this automaton will produce NO correct results unless the feasible pairs t:c, d:j, and s:^ are declared explicitly. The pairs must appear as column headers in a table somewhere in the description. This is typically done by constructing a table specifically for the purpose of declaring special correspondences. For example, the following table declares the feasible pairs we want for the column header D:P.

    Palatalization correspondences
           t  d  s  @
           c  j  ^  @
          ------------
       1:  1  1  1  1

Similarly, the feasible pairs that Vhf:Vhf stands for (i:i, e:e, etc.) must be declared. Since in this case the feasible pairs are also just default correspondences, they would typically be included in the table with all the other default correspondences.

3.2 Overlapping Column Headers and Specificity

Using subsets in rules often leads to a situation where a state table has column headers that potentially overlap. THIS IS A COMMON SOURCE OF BUGS in KIMMO rule tables. In such a case unexpected results may occur. For example, consider this rule, which states that t:c occurs between any vowel and i:

    t:c ==> V__i

A first attempt at writing a state table for this rule might look like this:

           V  t  i  @
           V  c  i  @
          ------------
       1:  2  0  1  1
       2:  2  3  1  1
       3.  0  0  1  0

Given the lexical form "mati" this table will correctly produce the surface form "maci". BUT given the form "miti" it will fail to produce the expected result "mici". This is because of the interaction between the column headers V:V and i:i.
Because i:i is also an instance of V:V, we might expect that the first i in the input would match V:V and cause a successful transition to state 2. BUT this is not the case. For each table in a Kimmo description, the ENTIRE set of feasible pairs must be partitioned among the column headers WITH NO OVERLAP. When PC-KIMMO interprets the column headers of a table, it scans the list of all the feasible pairs and assigns each one to a column header. IF a feasible pair matches more than one column header, it assigns it to the MOST SPECIFIC one, where specificity is measured by the number of feasible pairs that match the header: the FEWER pairs a header matches, the more specific it is. In order to see exactly how the feasible pairs are assigned to the column headers of a rule, use the SHOW RULE command.

Thus in the table above, the feasible pair i:i matches both the column headers i:i and V:V, but because i:i is more specific than V:V, the pair i:i is assigned to the column header i:i. This means that the column header V:V stands for all the feasible pairs of vowels EXCEPT i:i. And i:i matches ONLY i:i. To work correctly, the automaton must allow i:i to be an instance of V:V in the left context, by placing a 2 in the cells for states 1 and 2 under the i:i header. Note also that the order of the columns has NO effect on which column header an input pair is matched to. The revised table is:

           V  t  i  @
           V  c  i  @
          ------------
       1:  2  0  2  1
       2:  2  3  2  1
       3.  0  0  1  0

Now consider a description that contains a subset Vrd for rounded vowels and a subset Vhi for high vowels:

    SUBSET Vrd  o u
    SUBSET Vhi  i e o u

Note that Vhi properly includes Vrd. Assume that we have a rule:

    t:c ==> Vrd___Vhi

Our first attempt at a state table for this rule might look like this:

           Vrd  t  Vhi  @
           Vrd  c  Vhi  @
          ----------------
       1:   2   0   1   1
       2:   2   3   1   1
       3.   0   0   1   0

But the feasible pairs o:o and u:u, which match both the Vrd:Vrd and Vhi:Vhi column headers, must belong to the Vrd:Vrd column since it is more specific.
Thus, the Vhi column represents only the pairs i:i and e:e. This means that a lexical form such as "utu" will NOT produce the expected surface form "ucu" because the second u will always match Vrd, not Vhi. This problem is fixed by including u:u and o:o as column headers:

           Vrd  t  Vhi  u  o  @
           Vrd  c  Vhi  u  o  @
          ----------------------
       1:   2   0   1   2  2  1
       2:   2   3   1   2  2  1
       3.   0   0   1   1  1  0

The solution, then, in cases of overlapping column headers is to explicitly include as headers in the table the feasible pairs that belong to both headers.

3.3 Expressing Word Boundary environments.

Consider a rule that states that stop consonants (like b, d, p that 'stop' the air flow) are devoiced (vocal cords stop vibrating) when they occur in word final position:

    example:   lexical form   mabab
               surface form   mabap

Here, the voiced b changes to an unvoiced p at the end of a word. Assume the subsets for voiced stops (B) and voiceless stops (P):

    SUBSET B  b d g
    SUBSET P  p t k

We might write the rule as follows. Remember that <==> is used when a correspondence occurs in a given environment and in NO OTHER environment - i.e., only at the end of words. It means the correspondence is allowed if and only if (iff) it is found in the specified context.

    Devoicing
    B:P <==> ____#

The corresponding state table is written with #:# as the column header representing the word boundary. Note that a boundary symbol used in a column header can ONLY correspond to another boundary symbol; that is, correspondences such as #:0 are ILLEGAL.

           B  B  #  @
           P  @  #  @
          ------------
       1:  3  2  1  1
       2:  3  2  0  1
       3.  0  0  1  0

Rules that refer to an initial word boundary are written in a similar way.

3.4 Rule Conflicts

This is all fine when one is writing just one rule, but of course you will need more than one. Then the rules might conflict - let us see how. The two main types of rule conflicts are the ==> (or environment) conflict, and the <== (or realization) conflict.
3.4.1 The ==> conflict arises when two conditions are met:

    (1) Two ==> rules have the same correspondence on the left side of the rule, but
    (2) They have DIFFERENT environments on the right side.

For example:

    p:b ==> V____V    (here, V is vowel - this is 'intervocalic voicing')
    p:b ==> m___      (voicing after a nasal sound)

Since the rule operator ==> means that the correspondence can occur only in the specified environment, these two rules contradict each other. The simplest resolution of the conflict is to combine the two rules into one, with a disjunctive environment:

    Voicing
    p:b ==> [V__V | m__]    where | is the usual BNF 'or'

The state table for this rule looks like this:

           V  m  p  @
           V  m  b  @
          ------------
       1:  2  4  0  1
       2:  2  4  3  1
       3.  2  0  0  0
       4:  2  4  1  1

where states 1, 2, and 3 correspond to the V___V part of the rule, and states 1 and 4 correspond to the m___ part.

3.4.2 A <==> conflict.

Suppose the rules above had been written as <==> (i.e., if and only if) instead of ==>:

    p:b <==> V____V   (here, V is vowel - this is 'intervocalic voicing')
    p:b <==> m___     (voicing after a nasal sound)

Their two state tables will again have problems - they won't work, because each <==> rule contains a ==> part, and the two ==> parts conflict exactly as before. The solution is to write a disjunctive automaton table to represent

    p:b ==> [V__V | m__]    where | is the usual BNF 'or'

exactly as before, and then write two automaton tables to represent

    p:b <== V____V
    p:b <== m___

We leave these last two as exercises. They are easy to do.
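If you want to check a table like the disjunctive Voicing table by machine before typing it into a .rul file, here is a minimal Python sketch. It is an illustration only, not how PC-KIMMO is implemented: the alphabet a, i, u, p, m, the subset V = a i u, and the names matches, assign_columns and generate are all assumptions made for this example. It reads the third column of the Voicing table as the pair p:b, and it assigns each feasible pair to the MOST SPECIFIC matching header (the one matching the fewest feasible pairs), as described in section 3.2.

```python
SUBSETS = {"V": set("aiu")}                                # assumed subset
FEASIBLE = [(ch, ch) for ch in "aiupm"] + [("p", "b")]     # defaults + p:b

HEADERS = [("V", "V"), ("m", "m"), ("p", "b"), ("@", "@")]
ROWS = {1: [2, 4, 0, 1],
        2: [2, 4, 3, 1],
        3: [2, 0, 0, 0],
        4: [2, 4, 1, 1]}
FINAL = {1, 2, 4}                                          # state 3 is nonfinal

def matches(header, pair):
    """True if a column header (chars, subset names or @) covers a pair."""
    def side(h, ch):
        return h == "@" or h == ch or ch in SUBSETS.get(h, ())
    return side(header[0], pair[0]) and side(header[1], pair[1])

def assign_columns(headers):
    """Give each feasible pair to the header that matches the fewest pairs."""
    size = {h: sum(matches(h, p) for p in FEASIBLE) for h in headers}
    return {p: min((h for h in headers if matches(h, p)), key=size.get)
            for p in FEASIBLE}

def generate(lexical):
    """All surface forms the Voicing table allows for a lexical form."""
    col = assign_columns(HEADERS)
    out = []

    def walk(pos, state, surface):
        if pos == len(lexical):
            if state in FINAL:
                out.append(surface)
            return
        for pair in FEASIBLE:
            if pair[0] == lexical[pos]:
                nxt = ROWS[state][HEADERS.index(col[pair])]
                if nxt:  # 0 means the table rejects this pair here
                    walk(pos + 1, nxt, surface + pair[1])

    walk(0, 1, "")
    return out

for form in ("apa", "ampa", "pa", "ap"):
    print(form, generate(form))
```

Under these assumptions, apa and ampa each come out with both the unvoiced and the voiced realization (the rule is ==>, so voicing is allowed but not forced), while pa and ap come out unchanged, because p:b is blocked at the start of the word and is stuck in nonfinal state 3 at the end of it.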