Named Entity Recognition (NER)


Named Entity Recognition (NER) is the process of automatically classifying distinct, unique identifiers, known as named entities (NEs), according to a predefined set of categories. Automatic NER systems were developed for each of the languages listed below. They are intended to serve as baseline systems for other development projects, or as starting points for improving NER for South African languages. Although several techniques have proven accurate for NER, linear-chain conditional random fields (CRFs) with L2 regularisation were chosen, because this method has been shown to be both effective and scalable for sequence labelling problems in the NER domain.
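For reference, the general form of a linear-chain CRF and its L2-regularised training objective can be written as below. This is the standard textbook formulation, not a description of the specific features or settings used for these systems. The symbols f_k (feature functions), λ_k (feature weights), N (number of training sentences) and C (regularisation strength) are notation assumed here for illustration.

```latex
\[
p_\lambda(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z_\lambda(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z_\lambda(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
\]

\[
\mathcal{L}(\lambda)
  = \sum_{i=1}^{N} \log p_\lambda\big(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}\big)
    \; - \; C \sum_{k} \lambda_k^{2}
\]
```

Here x is the token sequence of a sentence and y is the corresponding sequence of entity labels. Training maximises the regularised log-likelihood; the L2 penalty term discourages large weights and so reduces overfitting on sparse features.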

See Annotation Tag Sets for tag details.

Evaluation results for NER:

(As reported in Eiselen, R., 2016, Government Domain Named Entity Recognition for South African Languages, In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, European Language Resources Association (ELRA), pp. 3344–3348.)

Language            F-score
Afrikaans           0.7586
isiNdebele          0.7510
isiXhosa            0.7708
isiZulu             0.6993
Sesotho sa Leboa    0.7446
Sesotho             0.7309
Setswana            0.7806
Siswati             0.6429
Tshivenḓa           0.7343
Xitsonga            0.7093
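
As a reading aid for the scores above, the following minimal sketch shows how an F-score can be computed from raw counts. It assumes the reported figure is the balanced F1 measure (the harmonic mean of precision and recall), which this page does not state explicitly. The function name `f_score` and its parameters are hypothetical and are not taken from the evaluation code used in the cited paper.

```python
def f_score(true_positives: int, false_positives: int,
            false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (beta=1.0 gives balanced F1).

    Assumption: counts refer to predicted versus gold-standard named
    entities; whether they are counted per token or per entity span is
    up to the caller.
    """
    predicted = true_positives + false_positives   # everything the system tagged
    actual = true_positives + false_negatives      # everything in the gold standard
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0                                 # avoid division by zero
    precision = true_positives / predicted
    recall = true_positives / actual
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not data from the evaluation above.
    print(round(f_score(true_positives=3, false_positives=1, false_negatives=1), 4))
```

With three correct entities, one spurious entity and one missed entity, precision and recall are both 0.75, so the balanced F-score is also 0.75.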