Evaluating tagger output

This page provides more information to help you compare the tagger output. We have provided a Perl program that computes confusion matrices and kappa for the taggers, comparing them against the 'truth' in the .pos files. For a definition of kappa, see the class notes or the Jurafsky text. Kappa ranges from 0 (no agreement between the tagger and the gold standard) to 1 (complete agreement); values above 0.8 are typically considered excellent. The program also prints the P(A) and P(E) values used to compute kappa (recall that P(A) is the observed agreement between the 'truth' and the tagger, while P(E) is the agreement expected by chance alone).

Running the program

The basic syntax for compare-taggers.pl is as follows.  This program resides in the /mit/6.863/tagging/ directory, so you can run it from there.

athena> compare-taggers.pl (-b|-h) (-m) (-k) tagger-output gold-standard

The parameters are as follows:

tagger-output

The file containing the output from the tagger

gold-standard

The file containing the "gold standard" tags

-b

Specify that the output is from the Brill tagger

-h

Specify that the output is from the HMM tagger

-m

Print out the confusion matrix

-k

Print out kappa

-x

Print out the results in a format suitable for input into a spreadsheet

Examples

To print the confusion matrix and kappa for the file wsj_1975.brill (tagged by the Brill tagger) and save the results in the file wsj_1975.results in your home directory:

athena> compare-taggers.pl -b -m -k wsj_1975.brill wsj_1975.pos > ~/wsj_1975.results

To print only kappa for the file sw2019.hmm, tagged by the HMM tagger:

athena> compare-taggers.pl -h -k sw2019.hmm sw2019.pos

Notes