Universal Part of Speech (UPOS) tagging

The universal part of speech tagger annotates text with a set of 17 core part of speech categories. The annotation scheme is based on the integration of the Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), the Google universal part of speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

A full description of the tag set is available from the Universal Dependencies website.
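
For quick reference, the 17 categories can be written out as a small lookup set. This is only an illustrative sketch: the tag names are those listed in version 2 of the Universal Dependencies guidelines (version 1 used CONJ where version 2 uses CCONJ), and the helper name `is_valid_upos` is hypothetical and not part of the tagger itself.

```python
# The 17 universal part of speech tags, as named in Universal Dependencies v2.
# Assumption: the tagger follows the v2 names (v1 used CONJ instead of CCONJ).
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def is_valid_upos(tag: str) -> bool:
    """Hypothetical helper: True if `tag` is one of the 17 UPOS categories."""
    return tag in UPOS_TAGS
```

A check like this can be useful for catching language-specific tags (for example `NN`) that slip into output expected to contain only universal tags.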

 

Evaluation results for UPOS tagging:

Language            Accuracy
Afrikaans           97,12%
isiNdebele          89,89%
isiXhosa            92,67%
isiZulu             89,36%
Sesotho             94,55%
Sesotho sa Leboa    97,08%
Setswana            95,53%
Siswati             92,04%
Tshivenḓa           91,63%
Xitsonga            94,42%
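
The page does not state how these accuracy figures were computed. A common convention, assumed here, is token-level accuracy: the share of tokens whose predicted tag matches the gold-standard tag. The sketch below illustrates that calculation; the function name and the sample tag sequences are hypothetical.

```python
def token_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    Assumption: accuracy is measured per token, with gold and predicted
    sequences aligned one-to-one.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical four-token example in which one tag is wrong.
gold = ["DET", "NOUN", "VERB", "PUNCT"]
predicted = ["DET", "NOUN", "AUX", "PUNCT"]
print(f"{token_accuracy(gold, predicted):.2%}")  # prints 75.00%
```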