

Noun Phrase Parsing

Noun phrase parsing is similar to noun phrase chunking, but this time the goal is to find noun phrases at all levels. This means that, just like in the clause identification task, we need to be able to recognize embedded phrases. The following example sentence illustrates this:

In ( early trading ) in ( Hong Kong ) ( Monday ) , ( gold ) was quoted
at ( ( $ 366.50 ) ( an ounce ) ) .

This sentence contains seven noun phrases, of which the one containing the final four words of the sentence consists of two embedded noun phrases. If we use the same approach as for clause identification, retrieving brackets of all phrase levels in one step and balancing these, we will probably not detect this noun phrase because its start and end positions coincide with those of other noun phrases. Therefore we will use a different approach here.

We will recover noun phrases at different levels by performing repeated chunking [Tjong Kim Sang(2000a)]. We will start with data containing words and part-of-speech tags and identify the base noun phrases in this data with the techniques used in our noun phrase chunking work. After this we will replace each phrase that was found by its head word and tag. This will create a compressed version of the sentences: words accompanied by a mixed stream of POS tags and chunk tags. We can apply our noun phrase chunking techniques to this data one more time and find noun phrases one level above the base level. The compression and chunking steps will be repeated in order to retrieve phrases at higher levels. The process will stop when no new phrases are found.
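The chunk-and-compress loop described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the chunk and find_head interfaces and the fixed "NP" chunk tag are assumptions.

```python
def compress(words, tags, phrases, find_head):
    """Replace each found phrase (start, end) by its head word and a
    fixed chunk tag, keeping the remaining tokens unchanged."""
    new_words, new_tags, i = [], [], 0
    for start, end in sorted(phrases):
        new_words.extend(words[i:start])
        new_tags.extend(tags[i:start])
        new_words.append(find_head(words[start:end], tags[start:end]))
        new_tags.append("NP")  # chunk tag joins the POS tag stream
        i = end
    new_words.extend(words[i:])
    new_tags.extend(tags[i:])
    return new_words, new_tags

def repeated_chunking(words, tags, chunk, find_head, max_levels=6):
    """Chunk, compress, and repeat until no new phrases are found."""
    levels = []
    for _ in range(max_levels):
        phrases = chunk(words, tags)  # [(start, end), ...] on current tokens
        if not phrases:
            break
        levels.append(phrases)
        words, tags = compress(words, tags, phrases, find_head)
    return levels
```

On the example above, the base chunker finds ( $ 366.50 ) and ( an ounce ); after compression these become two chunk-tagged tokens, and the second pass can bracket them together as the embedding noun phrase.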

The approach described here may seem a trivial extension of our noun phrase chunking work. However, there are some details left to discuss. First, there is the selection of the head word during the phrase summarization process. At the time we performed these experiments, we did not have access to the Magerman/Collins set of rules for determining head words, and therefore we used a rule of our own: the head word of a noun phrase is the final word of the first noun cluster in the phrase, or the final word of the phrase if it does not contain a noun cluster.
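The head word rule can be written down directly. In this sketch a noun cluster is taken to be a maximal run of noun-tagged words, and the Penn Treebank noun-tag test (tags starting with "NN") is an assumption:

```python
def head_word(words, tags, is_noun=lambda t: t.startswith("NN")):
    """Head of a noun phrase under our rule: the final word of the
    first noun cluster (maximal run of noun-tagged words), or the
    final word of the phrase if it contains no noun cluster."""
    head = None
    for word, tag in zip(words, tags):
        if is_noun(tag):
            head = word          # last word of the first cluster so far
        elif head is not None:
            break                # first noun cluster has ended
    return head if head is not None else words[-1]
```

For example, the head of ( early trading ) is "trading", and the head of ( $ 366.50 ), which contains no noun, is its final word "366.50".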

The second fact we should mention is that the data we used encodes noun phrase chunks in a different format than the data we have worked with previously. In this task we use the data set which was developed for the noun phrase bracketing shared task of CoNLL-99 [Osborne(1999)]. It was extracted from the Wall Street Journal part of the Penn Treebank [Marcus et al.(1993)] without extra modifications, which means, for example, that possessives between two noun phrases have been attached to the first one, unlike in the noun phrase chunking data. This and other differences mean that we cannot be sure that the techniques we developed for the other base noun phrase format will work well here. Indeed, there is a performance drop in the chunking part of our shallow parser when compared with the chunking work (F$_{\beta =1}$ of 92.77 compared with 93.34). However, we decided not to invest extra work in searching for a better configuration for our noun phrase chunker and have trained an existing chunker with the data available for this task.

An unforeseen problem occurred when we attempted to use the chunker for identifying noun phrases above the base level. Our chunker output is a majority vote of five systems using different data representations. In our evaluation work with tuning data (WSJ section 21), we observed that the overall output of the chunker at nonbase levels was worse than the performance of the best individual system [Tjong Kim Sang(2000a)]. The reason for this is that the system using the O+C data representation outperformed the other four systems by a large margin. Because the other four systems probably made similar errors, their errors cancelled some of the correct analyses of the best system and caused the majority vote to perform worse than the best individual system. For this reason we have decided to use only the bracket representations when processing noun phrases above base levels.
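The effect can be illustrated with a simple majority vote over predicted bracket positions (a sketch; the bracket encoding and the example systems are hypothetical):

```python
from collections import Counter

def majority_vote(bracket_sets, threshold=3):
    """Keep a bracket position only if at least `threshold` of the
    systems (a majority of five) propose it."""
    votes = Counter()
    for predicted in bracket_sets:
        votes.update(predicted)
    return {bracket for bracket, count in votes.items()
            if count >= threshold}
```

When the best system alone predicts a correct bracket and the four weaker systems agree on missing it, one correct vote against four identical omissions means the majority vote discards the correct analysis.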

The main open question in this study is what training data to use when processing the nonbase noun phrases. In order to answer this question we have tested several configurations on tuning data, WSJ section 21, using the training data of the CoNLL-99 shared task. We have tested six training data configurations for predicting open and close bracket positions: using all bracket positions, those of base phrases only, those of all phrases except base phrases, those of phrases of the current level only, those of the current level and the previous level, and those of the current level and the next level. At all levels, using the brackets of the current level only proved to work best or close to best. At the sixth level no new noun phrases were detected. Therefore we decided to use only brackets of one phrase level in the training data for nonbase phrases and to stop phrase identification after six levels.
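The six configurations amount to different selections over bracket levels in the training data. A sketch, with scheme names of our own choosing:

```python
def training_brackets(brackets_by_level, level, scheme):
    """Select the training brackets for a chunker at `level` (1-based).
    `brackets_by_level[k-1]` holds the brackets of phrase level k."""
    n = len(brackets_by_level)
    selected = {
        "all":          range(1, n + 1),          # all bracket positions
        "base":         [1],                      # base phrases only
        "nonbase":      range(2, n + 1),          # all except base phrases
        "current":      [level],                  # current level only
        "current+prev": [l for l in (level - 1, level) if l >= 1],
        "current+next": [l for l in (level, level + 1) if l <= n],
    }[scheme]
    return [b for l in selected for b in brackets_by_level[l - 1]]
```

The "current" scheme, which proved to work best or close to best at all levels, is the one adopted above.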

We have applied a noun phrase chunker with fixed symmetrical context sizes to the noun phrase data of the CoNLL-99 shared task [Tjong Kim Sang(2000a)]. The chunker generated a majority vote over the open and close brackets put forward by five systems, each of which used a different representation of the base noun phrases (IOB1, IOB2, IOE1, IOE2 and O or C). All systems used a window of four left and four right for words and POS tags (18 features), and the four systems using IO representations additionally performed an extra pass with a window of three left and three right for words and POS tags, and a window of two left and two right without the focus tag for chunk tags (also 18 features). The output of the chunker was presented to a cascade of six chunkers, each of which consisted of a pair of open and close bracket predictors trained with brackets from one of the levels 1 to 6. After each chunking phase, the phrases found were replaced by the head word of the phrase and a fixed chunk tag.

The system obtained an overall F$_{\beta =1}$ rate of 83.79 (precision 90.00% and recall 78.38%) for identifying arbitrary noun phrases. This is slightly better than our performance at CoNLL-99 (82.98, obtained without system combination), which was the best of the two entries submitted for the shared task at that workshop. The performance of our noun phrase chunker can be regarded as a baseline score for this data set. This baseline is already quite high (F$_{\beta =1}$ = 79.70), and it seems that the nonbase level chunkers have not contributed much to the performance of this shallow parser. Out of curiosity we have also examined how well a full parser does on the task of identifying arbitrary noun phrases. For this purpose we looked at output data of a parser described by [Collins(1999)] which was provided with the parser code (WSJ section 23, model 2). The parser obtained F$_{\beta =1}$ = 89.8 (precision 89.3% and recall 90.4%) for this task. This is considerably better than our shallow parser, but we should note that, compared with our application, the Collins parser has access to better part-of-speech tags and more training data with more sophisticated annotation rather than only noun phrase boundaries.
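The reported scores follow from the standard F measure; as a quick arithmetic check of the figures above:

```python
def f_beta(precision, recall, beta=1.0):
    """F measure: (1 + b^2) * P * R / (b^2 * P + R).
    With beta = 1 this is the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(90.00, 78.38), 2))  # our shallow parser: 83.79
print(round(f_beta(89.3, 90.4), 1))    # the Collins parser: 89.8
```

Note how the harmonic mean pulls the shallow parser's score toward its weaker recall: despite 90% precision, the overall F rate is well below 90.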


Erik Tjong Kim Sang 2002-03-13