

Arbitrary Phrase Identification

Our work with chunks of arbitrary types is similar to that with noun phrase chunks, apart from two differences. First, we refrained from using feature selection methods. These methods gained us little for noun phrase chunking while requiring a lot of extra computational work, so we returned to a fixed set of features for these experiments. In the first pass over the data, the context consisted of four words and four POS tags to the left and to the right of the focus; in the second pass, it consisted of three words and three POS tags on either side, plus two chunk tags on either side without the focus. Both passes therefore use 18 features. The second pass has only been used for the four IO data representations, since Table 2 shows that for the two bracket representations O and C the second pass improved the performance of the first pass only by a small margin.
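
To make the window layout concrete, here is a minimal Python sketch of the two feature extractors; the function names and the padding symbol are our own illustration, not code from the paper.

    def first_pass_features(words, tags, i, pad="_"):
        """Focus word/tag plus four words and four POS tags on either side:
        2 * (4 + 1 + 4) = 18 features."""
        def window(seq, k):
            return [seq[j] if 0 <= j < len(seq) else pad
                    for j in range(i - k, i + k + 1)]
        return window(words, 4) + window(tags, 4)

    def second_pass_features(words, tags, chunks, i, pad="_"):
        """Three words and three POS tags on either side of the focus (7 + 7)
        plus two chunk tags on either side without the focus (4): 18 features."""
        def window(seq, k, focus=True):
            positions = [j for j in range(i - k, i + k + 1) if focus or j != i]
            return [seq[j] if 0 <= j < len(seq) else pad for j in positions]
        return window(words, 3) + window(tags, 3) + window(chunks, 2, focus=False)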

The second difference between this study and the one for noun phrase chunks originates from the fact that apart from chunk boundaries, we need to find chunk types as well. We can approach this task in three ways. First, we could train the learner to identify both chunk boundaries and chunk types at the same time. We have called this approach the Single-Phase Approach. Second, we could split the task and train a learner to identify all chunk boundaries and feed its output to another classifier which identifies the types of the chunks (Double-Phase Approach). Third, a more computationally-intensive approach would be to develop a separate learner for each chunk type. These learners could identify chunks independently of each other, and words assigned to more than one chunk could be disambiguated by choosing the chunk type that occurs most frequently in the training data (N-Phase Approach). Since we did not know in advance which of these three processing strategies would generate the best results, we have evaluated all three.
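
The following sketch shows the control flow of the Double-Phase Approach under simplifying assumptions: the two classifiers are plain callables and boundaries are encoded as untyped IOB2-style tags. None of these interfaces come from the paper.

    from typing import Callable, List, Tuple

    def iob_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
        """Turn untyped IOB2 boundary tags into (start, end) spans, end exclusive."""
        spans, start = [], None
        for i, tag in enumerate(tags + ["O"]):   # sentinel flushes a final chunk
            if tag in ("B", "O") and start is not None:
                spans.append((start, i))
                start = None
            if tag == "B" or (tag == "I" and start is None):
                start = i
        return spans

    def double_phase(tokens: List[str],
                     boundary_clf: Callable[[List[str]], List[str]],
                     type_clf: Callable[[List[str], int, int], str]):
        tags = boundary_clf(tokens)              # phase 1: untyped chunk boundaries
        return [(s, e, type_clf(tokens, s, e))   # phase 2: one type per chunk
                for (s, e) in iob_to_spans(tags)]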

In order to find the best processing strategy and the best combination technique, we have performed several 10-fold cross-validation experiments on the training data. We have processed this data with each processing strategy and in each of the six data representations used earlier for noun phrase chunking. After this, we have used the seven combination techniques presented in Section 2.3 for combining the results. The results can be found in Table 4. Of the three processing strategies, the N-Phase Approach generally performed best, with the Double-Phase Approach second and the Single-Phase Approach worst. Again, system combination improved all individual results. Within each processing approach there were only small differences between the seven combination techniques. The only exception was the two stacked MBL classifiers applied to the Single-Phase Approach results, which scored about 0.3 points of F$_{\beta =1}$ better than most of the other combination techniques.
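
As an illustration of the simplest of these combination techniques, Majority Voting can be sketched as a per-token vote over system outputs that have already been mapped to one common representation. The tie-breaking rule shown here is an assumption; the text does not specify one.

    from collections import Counter
    from typing import List

    def majority_vote(system_outputs: List[List[str]]) -> List[str]:
        """system_outputs[k][i] is the tag that system k assigns to token i."""
        # Counter.most_common breaks ties by first occurrence; the paper's
        # actual tie-breaking rule is not given here, so this is an assumption.
        return [Counter(tags).most_common(1)[0][0]
                for tags in zip(*system_outputs)]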


Table 4: F$_{\beta =1}$ rates obtained for the three processing strategies, the Single-Phase Approach (SP), the Double-Phase Approach (DP) and the N-Phase Approach (NP), when applied to the training data of the CoNLL-2000 shared task (arbitrary chunking), using five different data representations and seven system combination techniques. In all cases, system combination led to performances that were better than the individual system results. The computationally-intensive N-Phase Approach does better than the other two.
training data       SP     DP     NP
IOB1              90.68  91.59  92.02
IOB2              90.77  91.65  91.94
IOE1              90.94  91.60  91.90
IOE2              91.21  91.97  91.99
O+C               91.57  91.97  91.51
Majority          91.96  92.34  92.62
TotPrecision      91.97  92.34  92.62
TagPrecision      91.98  92.34  92.62
Precision-Recall  91.96  92.34  92.62
TagPair           92.08  92.34  92.65
MBL               92.32  92.35  92.75
MBL+              92.40  92.32  92.72


The best result was generated with the N-Phase Approach in combination with a stacked memory-based classifier (MBL, 92.76). A bootstrap resampling test with 8000 random populations generated the 90% significance interval 92.60-92.90, which means that this result is significantly better than any Single-Phase or Double-Phase result. However, the N-Phase Approach has a large computational overhead: the number of passes over the data is at least N times the number of representations. Therefore, we have chosen the Double-Phase Approach combined with Majority Voting for our further work. This approach combines reasonable performance with computational efficiency. The Single-Phase Approach is potentially faster, but its performance is worse unless we use a stacked classifier, which requires extra combinator training data.
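
The significance test can be reproduced along the following lines: resample the test sentences with replacement 8000 times, recompute F$_{\beta =1}$ on each pseudo-sample, and read the 90% interval off the sorted scores. The per-sentence count representation below is our own assumption, not the paper's.

    import random
    from typing import List, Tuple

    def fscore(correct: int, guessed: int, gold: int) -> float:
        """F with beta=1: harmonic mean of precision and recall."""
        p = correct / guessed if guessed else 0.0
        r = correct / gold if gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    def bootstrap_interval(sentence_counts: List[Tuple[int, int, int]],
                           populations: int = 8000, level: float = 0.90):
        """sentence_counts holds (correct, guessed, gold) chunk counts per sentence."""
        scores = []
        for _ in range(populations):
            sample = random.choices(sentence_counts, k=len(sentence_counts))
            c, g, t = map(sum, zip(*sample))
            scores.append(fscore(c, g, t))
        scores.sort()
        lo = scores[int((1 - level) / 2 * populations)]
        hi = scores[int((1 + level) / 2 * populations) - 1]
        return lo, hi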

When we applied the Double-Phase Approach combined with Majority Voting to the CoNLL-2000 data sets, we obtained an F$_{\beta =1}$ rate of 92.50 (precision 94.04% and recall 91.00%). An overview of the performance rates for the different chunk types can be found in Table 5. Our system does well on the three most frequently occurring chunk types, noun phrases, prepositional phrases and verb phrases, and less well on the other seven. The chunk type UCP, which occurred in the training data, was not present in the test data. With this result, our memory-based arbitrary chunker finished third of eleven participants in the CoNLL-2000 shared task. The two systems that performed better used Support Vector Machines (F$_{\beta =1}$=93.48, [Kudoh and Matsumoto(2000)]) and Weighted Probability Distribution Voting (F$_{\beta =1}$=93.32, [Van Halteren(2000)]).
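
Table 5 follows the CoNLL-2000 evaluation, in which a guessed chunk only counts as correct when both its boundaries and its type match a gold chunk. A minimal sketch of the per-type computation, with chunks as (start, end, type) triples (our representation, not the shared task's file format):

    from collections import defaultdict
    from typing import Dict, List, Tuple

    Chunk = Tuple[int, int, str]   # (start, end, chunk type), end exclusive

    def per_type_scores(guessed: List[Chunk],
                        gold: List[Chunk]) -> Dict[str, Tuple[float, float, float]]:
        counts = defaultdict(lambda: [0, 0, 0])    # type -> [correct, guessed, gold]
        gold_set = set(gold)
        for chunk in guessed:
            counts[chunk[2]][1] += 1
            if chunk in gold_set:                  # exact boundary and type match
                counts[chunk[2]][0] += 1
        for chunk in gold:
            counts[chunk[2]][2] += 1
        scores = {}
        for ctype, (c, g, t) in counts.items():
            p = c / g if g else 0.0
            r = c / t if t else 0.0
            f = 2 * p * r / (p + r) if p + r else 0.0
            scores[ctype] = (p, r, f)
        return scores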


Table 5: The results per chunk type of processing the test data with the Double-Phase Approach and Majority Voting. Although the data is formatted differently from the noun phrase chunking data, the NP F$_{\beta =1}$ rate here (93.23) is close to our NP chunking F$_{\beta =1}$ rate (93.34).
test data  precision   recall  F$_{\beta =1}$
ADJP          85.25%   59.36%    69.99
ADVP          85.03%   71.48%    77.67
CONJP         42.86%   33.33%    37.50
INTJ         100.00%   50.00%    66.67
LST            0.00%    0.00%     0.00
NP            94.14%   92.34%    93.23
PP            96.45%   96.59%    96.52
PRT           79.49%   58.49%    67.39
SBAR          89.81%   72.52%    80.25
VP            93.97%   91.35%    92.64
all           94.04%   91.00%    92.50


