Next: Bibliography Up: Memory-Based Shallow Parsing Previous: Parsing

Concluding Remarks

We have presented memory-based approaches to shallow parsing and applied them to five tasks: noun phrase chunking, arbitrary chunking, clause identification, noun phrase parsing and full parsing. We used two additional techniques to improve the performance of our shallow parsers: feature selection and system combination. Feature selection was used to compensate for a weakness of the memory-based learner: it has difficulty ignoring features that are not immediately relevant. While feature selection worked well in one study (clause identification with large feature sets), it made little difference to the overall performance of our noun phrase chunker. We believe that the other techniques incorporated in the chunker (cascading and system combination) had already pushed the performance of the system close to its limits, so there might not have been much left to gain from feature selection. System combination proved quite useful for generating base phrases. Unfortunately, we could not apply it to higher-level chunks: our method for producing different system results, which relies on different data representations, failed to produce results for higher-level phrases that could be improved with the Majority Voting technique we used for chunking.
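To make the combination step concrete, the following is a minimal sketch of majority voting over per-token chunk tags. It is an illustration, not the implementation used in this work. It assumes that the outputs of the individual systems have already been converted to one common tag representation. The function name `majority_vote`, the example tags and the tie-breaking rule (prefer the tag proposed by the earliest system in the list) are assumptions introduced here.

```python
from collections import Counter


def majority_vote(system_outputs):
    """Combine tag sequences from several systems, one vote per system per token.

    system_outputs: list of tag sequences, one per system, all of equal length.
    Ties are broken in favour of the earliest system in the list (an assumption).
    """
    if not system_outputs:
        return []
    length = len(system_outputs[0])
    if any(len(tags) != length for tags in system_outputs):
        raise ValueError("all systems must tag the same number of tokens")

    combined = []
    # zip(*...) groups the tags that the systems proposed for the same token.
    for token_tags in zip(*system_outputs):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Pick the first tag, in system order, that reaches the top count.
        winner = next(tag for tag in token_tags if counts[tag] == best)
        combined.append(winner)
    return combined


# Hypothetical output of three chunkers for a four-token sentence.
outputs = [
    ["B-NP", "I-NP", "O", "B-NP"],
    ["B-NP", "I-NP", "O", "O"],
    ["B-NP", "O", "O", "B-NP"],
]
combined = majority_vote(outputs)
```

Voting of this kind can only help when the systems make different errors on the same tokens, which is consistent with the difficulty reported above for higher-level phrases.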

A comparison of our work with other studies revealed that our approach works well for base phrase identification, but not for finding embedded structures. We have made several suggestions for improving the performance on tasks that require generating embedded structures: providing different features to the learners, finding a method that allows different systems to be combined when working on higher-level phrases, and replacing the greedy phrase selection approach currently used with one that allows backtracking from earlier choices. However, while further improvement is interesting from a scientific point of view, it might not be useful from a practical point of view. Our present method is already slower than state-of-the-art full parsers and requires more memory. Further improvements to this approach will probably slow it down even more without guaranteeing state-of-the-art performance.

We would like to thank our colleagues of CNTS - Language Technology Group, University of Antwerp, Belgium and ILK, University of Tilburg, The Netherlands, the members of the TMR-LCG network, in particular James Hammerton, and two anonymous reviewers for valuable discussions and comments. We are grateful to Xavier Carreras for his cooperation in the comparison study of his clause identification system with ours. This study was funded by the European Training and Mobility of Researchers (TMR) network Learning Computational Grammars.¹⁴

Erik Tjong Kim Sang 2002-03-13