This page provides more information to help you compare the tagger output. We have provided a Perl program that computes the confusion matrices and kappa for the taggers, comparing them against the ‘truth’ in the .pos files. For a definition of kappa, please see the class notes or the Jurafsky text. Kappa ranges from 0 (no agreement between the tagger and the gold standard) to 1 (complete agreement); values above 0.8 are typically considered excellent. The program will also print the P(A) and P(E) values used to compute kappa (recall that P(A) is the observed agreement between ‘truth’ and the tagger, while P(E) is the agreement expected by chance alone).
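The computation behind these numbers can be sketched in a few lines. The following Python function (a minimal illustration, not the actual code of the provided Perl program) computes P(A) as the fraction of positions where the two tag sequences agree, P(E) from the marginal frequency of each tag in the two sequences, and kappa as (P(A) - P(E)) / (1 - P(E)):

```python
from collections import Counter

def cohens_kappa(gold_tags, tagger_tags):
    """Compute kappa between a gold-standard tag sequence and a tagger's output."""
    n = len(gold_tags)
    # P(A): observed agreement, the fraction of positions with identical tags
    p_a = sum(g == t for g, t in zip(gold_tags, tagger_tags)) / n
    # P(E): agreement expected by chance, from each tag's marginal probability
    gold_counts, tagger_counts = Counter(gold_tags), Counter(tagger_tags)
    p_e = sum(gold_counts[tag] * tagger_counts[tag] for tag in gold_counts) / (n * n)
    return (p_a - p_e) / (1 - p_e)

print(cohens_kappa(["NN", "VB", "NN", "DT"], ["NN", "VB", "JJ", "DT"]))
```

Here the tagger agrees with the gold standard on 3 of 4 tokens (P(A) = 0.75) and P(E) = 0.25, so kappa is about 0.67, i.e. well below the 0.8 threshold for excellent agreement.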
The basic syntax for compare-taggers.pl is as follows. The program resides in the course directory, and you can run it from there.
athena> compare-taggers.pl (-b|-h) (-m) (-k) tagger-output gold-standard
The parameters are as follows:

tagger-output: the file containing the output from the tagger
gold-standard: the file containing the "gold standard" tags
-b: specify that the output is from the Brill tagger
-h: specify that the output is from the HMM tagger
-m: print out the confusion matrix
-k: print out kappa
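The confusion matrix printed by -m simply counts, for each (gold tag, tagger tag) pair, how often that pair occurs; off-diagonal cells reveal which tags the tagger confuses. A minimal sketch (illustrative only, the names are hypothetical and this is not the Perl program's actual implementation):

```python
from collections import defaultdict

def confusion_matrix(gold_tags, tagger_tags):
    """Count (gold, predicted) tag pairs; diagonal entries are correct taggings."""
    counts = defaultdict(int)
    for g, t in zip(gold_tags, tagger_tags):
        counts[(g, t)] += 1
    return counts

matrix = confusion_matrix(["NN", "VB", "NN", "DT"], ["NN", "VB", "JJ", "DT"])
for (gold, predicted), count in sorted(matrix.items()):
    print(f"gold={gold}  tagged={predicted}  count={count}")
```

In this toy example the cell (NN, JJ) has count 1, showing the one token where the tagger mistook a noun for an adjective.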
To print the confusion matrix and kappa for the file wsj_1975.brill, tagged by the Brill tagger, and save the matrix and kappa in the file ~/wsj_1975.results:
athena> compare-taggers.pl -b -m -k wsj_1975.brill wsj_1975.pos > ~/wsj_1975.results
To print kappa only for the file sw2019.hmm, tagged by the HMM tagger:
athena> compare-taggers.pl -h -k sw2019.hmm sw2019.pos
A note about compare-taggers.pl: it relies on the formatting being the same in the tagged and gold-standard files. You will know something has gone wrong if you get a lot of “line mismatch” errors and a low kappa score (near 0).
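If you do see “line mismatch” errors, it helps to find the first line where the two files diverge. A simple sanity check (a sketch with hypothetical names, not part of compare-taggers.pl) is to compare the token count on each line of the two files:

```python
def find_mismatched_lines(tagged_lines, gold_lines):
    """Return 1-based line numbers where the two files have different token counts."""
    mismatches = []
    for lineno, (tagged, gold) in enumerate(zip(tagged_lines, gold_lines), start=1):
        if len(tagged.split()) != len(gold.split()):
            mismatches.append(lineno)
    return mismatches

tagged = ["The/DT dog/NN barks/VBZ", "It/PRP runs/VBZ"]
gold = ["The/DT dog/NN barks/VBZ", "It/PRP runs/VBZ fast/RB"]
print(find_mismatched_lines(tagged, gold))
```

Running this on your tagger output and the corresponding .pos file (e.g. with `open(path).readlines()`) points you at the first formatting discrepancy to fix.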
We have also provided the file all.pos, which contains all of the texts in one big file.