NCHLT: Sesotho POS tag set
For the purposes of annotators, this tag set is largely adopted from Taljard et al. (2008) and from various documents compiled by G Faasz and U Heid (IMS, Stuttgart) and by D J Prinsloo and E Taljard (University of Pretoria). The information below reflects the current state of the tagset; further development will probably necessitate a number of changes.
The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). As described above, the tagset is logically structured in two layers of linguistic description (annotation levels):
The first annotation level contains all mandatory (in EAGLES terms, obligatory) information, namely up to three elements: one indicating the word class, a second specifying functional or syntactic properties, and a third giving morphological specifics, cf. PRO(noun)EMP(hatic)PERS(on).
The second level of annotation contains recommended and optional information. In most cases, this level is used for a detailed description of the closed-class items listed in the tagger lexicon. Compare the following excerpt:
Figure 1: Annotation levels
Description | Tag, 1st level (mandatory information) | Tag, 2nd level (optional/recommended information)
Pronouns: emphatic personal | PROEMPPERS | 1sg, 2sg, 1pl, 2pl
Verbals | V | tr
Morphemes: deficient | MORPH | def
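As a rough illustration of how the two levels combine in a full tag, the following Python sketch splits a tag into its first-level part and its second-level features. It assumes, based on the examples in this document (PROEMPPERS_1sg, ADV_nil, CCOP14_nil), that `_` separates the levels and that `nil` marks an empty second level; the function name `split_tag` is hypothetical and not part of any existing tool.

```python
from typing import List, Tuple


def split_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a full tag into its first-level part and its second-level features.

    Assumption: "_" separates the levels, and "nil" means that no
    second-level information is given.
    """
    first_level, *second_level = tag.split("_")
    # Drop the "nil" placeholder so an empty second level becomes an empty list.
    features = [feature for feature in second_level if feature != "nil"]
    return first_level, features


print(split_tag("PROEMPPERS_1sg"))  # ('PROEMPPERS', ['1sg'])
print(split_tag("ADV_nil"))         # ('ADV', [])
print(split_tag("$"))               # ('$', [])
```

Keeping the second level as a list rather than a single string leaves room for tags that carry more than one second-level feature.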
As for the actual tagging, an additional first level of tagging is envisaged, on which linguistic (rather than orthographic) words will be tagged. For Northern Sotho, this implies that the four orthographic units ke + a + mo + rata are tagged together as V, since they jointly constitute one linguistic verb. <Sesotho adaptation required here>
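A minimal sketch of this additional level: given the orthographic tokens and the spans that make up each linguistic word, the tokens in a span are joined and receive a single tag. How the spans are identified is not specified here; the `spans` input and the function name `merge_linguistic_words` are assumptions for illustration only.

```python
from typing import List, Tuple


def merge_linguistic_words(
    tokens: List[str], spans: List[Tuple[int, int, str]]
) -> List[Tuple[str, str]]:
    """Join the orthographic tokens in each (start, end, tag) span into one linguistic word.

    Assumption: spans are half-open [start, end), given in order, and cover
    every token exactly once.
    """
    words = []
    position = 0
    for start, end, tag in spans:
        if start != position or end <= start:
            raise ValueError(f"span ({start}, {end}) does not start at token {position}")
        words.append((" ".join(tokens[start:end]), tag))
        position = end
    if position != len(tokens):
        raise ValueError("spans do not cover all tokens")
    return words


# Northern Sotho example from the text: four orthographic units, one verb.
print(merge_linguistic_words(["ke", "a", "mo", "rata"], [(0, 4, "V")]))
# [('ke a mo rata', 'V')]
```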
The tagset currently distinguishes 29 categories and several levels of annotation. The first part of each tag gives a general indication of the nature of the unit in question. The categories are as follows:
1. $ = punctuation
2. ABBR = abbreviation
3. ADJ = adjective
4. ADV = adverb
5. ASP = aspectual marker
6. AUX = auxiliary verb
7. CCOP = class-indicating copulative subject concord
8. CDEM = class-indicating demonstrative
9. CDEMCOP = class-indicating demonstrative copulative
10. CN = class-indicating nominal prefix
11. CO = class-indicating object concord
12. CPOSS = class-indicating possessive concord
13. CS = class-indicating subject concord
14. ENUM = enumerative
15. IDEO = ideophone
16. INT = interjection
17. JUNC = conjunction
18. MNEG = negative morpheme
19. N = noun
20. NPP = place and brand name
21. NUM = numerative
22. PART = particle
23. PROEMP = emphatic pronoun
24. PROPOSS = possessive pronoun
25. PROQUANT = quantitative pronoun
26. QUE = question word
27. TENSE = tense marker
28. V = verbal
29. VCOP = copulative verb
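Because the first part of every tag names one of these 29 categories, a full tag can be mapped back to its category by prefix matching. The sketch below assumes that the longest matching code is the intended one (so CDEMCOP_08 resolves to CDEMCOP rather than CDEM, and NUM to NUM rather than N); `CATEGORIES` and `category_of` are hypothetical names.

```python
# Hypothetical lookup table built from the 29 categories listed above.
CATEGORIES = {
    "$": "punctuation",
    "ABBR": "abbreviation",
    "ADJ": "adjective",
    "ADV": "adverb",
    "ASP": "aspectual marker",
    "AUX": "auxiliary verb",
    "CCOP": "class-indicating copulative subject concord",
    "CDEM": "class-indicating demonstrative",
    "CDEMCOP": "class-indicating demonstrative copulative",
    "CN": "class-indicating nominal prefix",
    "CO": "class-indicating object concord",
    "CPOSS": "class-indicating possessive concord",
    "CS": "class-indicating subject concord",
    "ENUM": "enumerative",
    "IDEO": "ideophone",
    "INT": "interjection",
    "JUNC": "conjunction",
    "MNEG": "negative morpheme",
    "N": "noun",
    "NPP": "place and brand name",
    "NUM": "numerative",
    "PART": "particle",
    "PROEMP": "emphatic pronoun",
    "PROPOSS": "possessive pronoun",
    "PROQUANT": "quantitative pronoun",
    "QUE": "question word",
    "TENSE": "tense marker",
    "V": "verbal",
    "VCOP": "copulative verb",
}


def category_of(tag: str) -> str:
    """Return the category code that a full tag starts with.

    Some codes are prefixes of others (N/NUM/NPP, V/VCOP, CDEM/CDEMCOP),
    so the longest matching code is taken.
    """
    matches = [code for code in CATEGORIES if tag.startswith(code)]
    if not matches:
        raise KeyError(f"no category code is a prefix of {tag!r}")
    return max(matches, key=len)


for tag in ["CDEMCOP_08", "NUM", "N09_dim", "VCOP_neg", "COPERS_1pl"]:
    print(tag, "->", category_of(tag))
```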
As we intend to go deeper into morphological analysis, we also plan to implement the following tags:
AS = adjectival stem
CA = class-indicating adjectival prefix
NS = noun stem
NSuf = nominal suffix
VEnd = verbal ending
VExt = verbal extension
VR = verb root
Punctuation ($)
The tag $ is used for all punctuation marks, including full stops, commas, colons, semicolons, quotation marks, hyphens, exclamation marks and brackets.
Abbreviations (ABBR)
All abbreviations are tagged as ABBR.
Adjectives (ADJ)
The following tags are used:
Level 1: ADJ01-14, ADJLOC
Notes:
Examples:
se seholo | ADJ07
Adverbs (ADV)
The following tags are used:
Level 1: ADV
Level 2: ADV_loc
Notes:
Examples:
ruri | ADV_nil
haModjadji | ADV_loc
Aspectual markers (ASP)
The following tags are used:
Level 1: ASP
Level 2: ASP_pot, ASP_prog
Note:
The deficient verb forms (also called deficient auxiliary verb forms) -mo, -no, -yo and -tšo are tagged as ASP. <Sesotho examples required here>
Examples:
ba sa bua | ASP_prog
ba ka bua | ASP_pot
Auxiliary verbs (AUX)
The following tag is used:
Level 1: AUX
Notes:
Examples:
ba se ba fihlile | AUX
o ile bua jwalo | AUX
Class-indicating copulative subject concords (CCOP)
The following tags are used:
Level 1: CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS
Level 2: CCOPPERS_1sg, CCOPPERS_1pl, CCOPPERS_2sg, CCOPPERS_2pl
Notes:
Examples:
le nna ke hona | CCOPPERS_1sg
borotho bo teng | CCOP14_nil
re toropong | CCOPPERS_1pl
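The tag lists use a compact range notation such as CCOP01-10, CCOP14-15. A sketch of how such a list might be expanded into individual tags, assuming that a range is always written as a prefix followed by a two-digit start and end class joined by a hyphen (the names `RANGE` and `expand` are hypothetical):

```python
import re
from typing import List

# Assumed notation: a prefix (optionally ending in "_"), then "NN-NN".
RANGE = re.compile(r"^(?P<prefix>[A-Z]+_?)(?P<first>\d{2})-(?P<last>\d{2})$")


def expand(tag_list: str) -> List[str]:
    """Expand a comma-separated tag list such as "CCOP01-10, CCOPLOC" into single tags."""
    tags = []
    for item in tag_list.split(","):
        item = item.strip()
        match = RANGE.match(item)
        if match:
            first, last = int(match["first"]), int(match["last"])
            tags.extend(f"{match['prefix']}{n:02d}" for n in range(first, last + 1))
        elif item:
            tags.append(item)  # already a single tag, e.g. CCOPLOC
    return tags


level1 = expand("CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS")
print(len(level1), level1[:3], level1[-4:])
# 14 ['CCOP01', 'CCOP02', 'CCOP03'] ['CCOP14', 'CCOP15', 'CCOPLOC', 'CCOPPERS']
```

The expanded list can then serve as the set of valid first-level tags for a category, for instance when checking annotator input.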
Class-indicating demonstratives (CDEM)
The following tags are used:
Level 1: CDEM01-10, CDEM14-15, CDEMLOC
Notes:
Examples:
batho bao | CDEM02
sefate seo | CDEM07
hona moo | CDEMLOC
Class-indicating demonstrative copulatives (CDEMCOP)
The following tags are used:
Level 1: CDEMCOP
Level 2: CDEMCOP_01-10, CDEMCOP_14-15, CDEMCOP_loc
Notes:
Examples:
sedi | CDEMCOP_08
ke sela | CDEMCOP_loc
Class-indicating object concords (CO)
The following tags are used:
Level 1: CO01-10, CO14-15, COLOC, COPERS
Level 2: COPERS_1pl, COPERS_2pl, COPERS_2sg
Notes:
Examples:
Ba re thusitse | COPERS_1pl
Re a ho batla | COPERS_2sg
Ke a a rata | CO06
Ba tlo se reka | CO07
Class-indicating possessive concords (CPOSS)
The following tags are used:
Level 1: CPOSS01-10, CPOSS14-15, CPOSSLOC
Notes:
Examples:
bana ba hae | CPOSS02
diaparo tsa bana | CPOSS08
tlasa tafole <insert possessive concord> | CPOSSLOC
Class-indicating subject concords (CS)
The following tags are used:
Level 1: CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS
Level 2: CSPERS_1sg, CSPERS_1pl, CSPERS_2sg, CSPERS_2pl
Notes:
Examples:
se fihlile | CS07
fatse ho a bata | CSLOC
ho a tjhesa | CSINDEF
e ne e le mariha | CSNEUT
o a tshwenya | CSPERS_2sg
ra qala mosebetsi | CSPERS_1pl
Enumeratives (ENUM)
The following tag is used:
Level 1: ENUM
Note:
Examples:
mokgwa o sele | ENUM
Ideophones (IDEO)
The following tag is used:
Level 1: IDEO
Examples:
thwa | IDEO
Pha | IDEO
Interjections (INT)
The following tag is used:
Level 1: INT
Level 2: INT_neg
Notes:
Examples:
A e! | INT_neg
Conjunctions (JUNC)
The following tag is used:
Level 1: JUNC
Notes:
Examples:
hore | JUNC
Negative morphemes (MNEG)
The following tag is used:
Level 1: MNEG
Notes:
Examples:
ha ba bue | MNEG
ba sa bue | MNEG
hore ba se bue | MNEG
Nouns (N)
The following tags are used:
Level 1: N01-10, N01a, N02b, N14, NLOC
Level 2: _aug, _dim, _loc, _name
Notes:
Examples:
Mpho | N09_nil
Mpho | N01a_name
Mphonyana | N09_dim
Mphong | N09_loc
Tauhadi | N09_aug
sefatenyaneng | N07_dim_loc
fatse | NLOC
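Noun tags combine a class number (or LOC) on the first level with zero or more second-level features. The sketch below assumes that class numbers are two digits optionally followed by a or b (as in N01a, N02b) and that multiple features are joined with underscores; `parse_noun_tag` is a hypothetical helper, and it deliberately rejects NPP tags, which belong to a separate category.

```python
import re
from typing import List, Tuple

# Assumed shape: "N", a class (two digits plus optional a/b, or LOC), then optional "_"-joined features.
NOUN_TAG = re.compile(r"^N(?P<noun_class>\d{2}[ab]?|LOC)(?:_(?P<features>.+))?$")


def parse_noun_tag(tag: str) -> Tuple[str, List[str]]:
    """Return (noun class, second-level features) for a noun tag; "nil" yields no features."""
    match = NOUN_TAG.match(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not a noun tag")
    features = match["features"] or "nil"
    return match["noun_class"], [f for f in features.split("_") if f != "nil"]


for tag in ["N09_nil", "N01a_name", "N09_dim", "NLOC"]:
    print(tag, parse_noun_tag(tag))
```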
Place and brand names (NPP)
The following tag is used:
Level 1: NPP
Level 2: NPP_place, NPP_brand
Notes:
Examples:
polokwane | NPP_place
coke | NPP_brand
Numeratives (NUM)
The following tag is used:
Level 1: NUM
Note:
Particles (PART)
The following tags are used:
Level 1: PART
Level 2: PART_cop, PART_agen, PART_hort, PART_loc, PART_que, PART_temp, PART_ins, PART_con
Notes:
Examples:
ke mariha | PART_cop
e bonwa ke dintja | PART_agen
a re bale | PART_hort
ka kua toropong | PART_loc
na ba tlile? | PART_que
ka Moqebelo | PART_temp
ka thipa | PART_ins
ho na le kotsi | PART_con
Emphatic pronouns (PROEMP)
The following tags are used:
Level 1: PROEMP01-10, PROEMP14-15, PROEMPLOC, PROEMPPERS
Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl
Notes:
Examples:
yena | PROEMP01
Rona | PROEMPPERS_1pl
hona | PROEMPLOC
dibuka tsona | PROEMP10
ka yona | PROEMP09
Possessive pronouns (PROPOSS)
The following tags are used:
Level 1: PROPOSS01-10, PROPOSS14-15, PROPOSSLOC, PROPOSSPERS
Level 2: PROPOSSPERS_1sg, PROPOSSPERS_1pl, PROPOSSPERS_2sg, PROPOSSPERS_2pl
Notes:
Examples:
bana ba gagwe | PROPOSS01
bana ba geso | PROPOSSPERS_1pl
bana ba rena | PROPOSSPERS_1pl
maoto a tsona | PROPOSS10
dikolo tsa gona | PROPOSSLOC
Quantitative pronouns (PROQUANT)
The following tags are used:
Level 1: PROQUANT01-10, PROQUANT14-15, PROQUANTLOC
Notes:
Examples:
bana bohle | PROQUANT02
tsohle di fedile | PROQUANT10
rena bohle | PROQUANT02
Question words (QUE)
The following tags are used:
Level 1: QUE
Level 2: QUE_N01a, QUE_N02b, QUE_loc, QUE_time, QUE_man, QUE_01-10, QUE_14-15
Notes:
Examples:
ba fihlile neng? | QUE_time
ba dula kae? | QUE_loc
batho bafe | QUE_02
o batla mang? | QUE_N01a
o rekile eng? | QUE_nil
Tense markers (TENSE)
The following tags are used:
Level 1: TENSE
Level 2: TENSE_fut, TENSE_pres, TENSE_past
Notes:
Examples: <check for correctness>
ba tlo bua | TENSE_fut
ba a bua | TENSE_pres
ba ka se bua | TENSE_fut
ha ba a bua | TENSE_neg
Verbals (V)
The following tag is used:
Level 1: V
Notes:
Examples:
mmotsa | V_tr
Ithuta | V_tr
ntshwenya | V_tr
Etsetsa | V_dtr
Eja | V_tr
Copulative verbs (VCOP)
The following tag is used:
Level 1: VCOP
Level 2: VCOP_neg
Notes:
Examples:
ke na le | VCOP_nil
h e le mariha <check for correctness> | VCOP_nil
ha a le siko | VCOP_neg
ya ba selemo | VCOP_nil
Working Tagset
ADJ01 | Adjective
ADJ02 | Adjective
ADJ03 | Adjective
ADJ04 | Adjective
ADJ05 | Adjective
ADJ06 | Adjective
ADJ07 | Adjective
ADJ08 | Adjective
ADJ09 | Adjective
ADJ10 | Adjective
ADJ14 | Adjective
ADJC01 | Adjective concord
ADJC02 | Adjective concord
ADJC03 | Adjective concord
ADJC04 | Adjective concord
ADJC05 | Adjective concord
ADJC06 | Adjective concord
ADJC07 | Adjective concord
ADJC08 | Adjective concord
ADJC09 | Adjective concord
ADJC10 | Adjective concord
ADJC15 | Adjective concord
ADJLOC | Adjective
ADJ15 | Adjective
ADV | Adverb
CCOP07 | Copulative concord
CCOP09 | Copulative concord
CCOP10 | Copulative concord
CCOPPERS | Copulative concord
CD01 | Demonstrative
CD02 | Demonstrative
CD03 | Demonstrative
CD04 | Demonstrative
CD05 | Demonstrative
CD06 | Demonstrative
CD07 | Demonstrative
CD08 | Demonstrative
CD09 | Demonstrative
CD10 | Demonstrative
CD14 | Demonstrative
CD15 | Demonstrative
CD17 | Demonstrative
CD18 | Demonstrative
CDLOC | Demonstrative
CN | Infinitive class prefix
CO01 | Object concord
CO02 | Object concord
CO03 | Object concord
CO04 | Object concord
CO05 | Object concord
CO06 | Object concord
CO07 | Object concord
CO08 | Object concord
CO09 | Object concord
CO10 | Object concord
CO14 | Object concord
CO15 | Object concord
COLOC | Object concord
CONJ | Conjunctive
COPERS | Object concord
CPOSS01 | Possessive concord
CPOSS02 | Possessive concord
CPOSS03 | Possessive concord
CPOSS04 | Possessive concord
CPOSS05 | Possessive concord
CPOSS06 | Possessive concord
CPOSS07 | Possessive concord
CPOSS08 | Possessive concord
CPOSS09 | Possessive concord
CPOSS10 | Possessive concord
CPOSS14 | Possessive concord
CPOSS15 | Possessive concord
CPOSSLOC | Possessive concord
CS01 | Subject concord
CS02 | Subject concord
CS03 | Subject concord
CS04 | Subject concord
CS05 | Subject concord
CS06 | Subject concord
CS07 | Subject concord
CS08 | Subject concord
CS09 | Subject concord
CS10 | Subject concord
CS14 | Subject concord
CS15 | Subject concord
CSINDEF | Subject concord
CSLOC | Subject concord
CSNEUT | Subject concord
CSPERS | Subject concord
ENUM | Enumerative
IDEO | Ideophone
INF | Infinitive class prefix
INT | Interjection
MORPHFUT | Future
MNEG | Negative morpheme
MORPHPER | Progressive
MORPHPOT | Potential
MORPHPRES | Present tense marker
N01 | Noun
N01a | Noun
N02 | Noun
N02b | Noun
N03 | Noun
N04 | Noun
N05 | Noun
N06 | Noun
N07 | Noun
N08 | Noun
N09 | Noun
N10 | Noun
N14 | Noun
N16 | Noun
N17 | Noun
N18 | Noun
NLOC | Noun
PART | Particle
PROEMP01 | Emphatic pronoun
PROEMP02 | Emphatic pronoun
PROEMP03 | Emphatic pronoun
PROEMP04 | Emphatic pronoun
PROEMP05 | Emphatic pronoun
PROEMP06 | Emphatic pronoun
PROEMP07 | Emphatic pronoun
PROEMP08 | Emphatic pronoun
PROEMP09 | Emphatic pronoun
PROEMP10 | Emphatic pronoun
PROEMP14 | Emphatic pronoun
PROEMP15 | Emphatic pronoun
PROEMPLOC | Emphatic pronoun
PROEMPPERS | Emphatic pronoun
PROPOSS02 | Possessive pronoun
PROPOSS03 | Possessive pronoun
PROPOSS04 | Possessive pronoun
PROPOSS05 | Possessive pronoun
PROPOSS06 | Possessive pronoun
PROPOSS07 | Possessive pronoun
PROPOSS08 | Possessive pronoun
PROPOSS09 | Possessive pronoun
PROPOSS10 | Possessive pronoun
PROPOSS14 | Possessive pronoun
PROPOSSPERS | Possessive pronoun
PROQUANT01 | Quantitative pronoun
PROQUANT02 | Quantitative pronoun
PROQUANT03 | Quantitative pronoun
PROQUANT04 | Quantitative pronoun
PROQUANT05 | Quantitative pronoun
PROQUANT06 | Quantitative pronoun
PROQUANT07 | Quantitative pronoun
PROQUANT08 | Quantitative pronoun
PROQUANT09 | Quantitative pronoun
PROQUANT10 | Quantitative pronoun
PROQUANT14 | Quantitative pronoun
PROQUANT15 | Quantitative pronoun
PROQUANT17 | Quantitative pronoun
PROQUANTLOC | Quantitative pronoun
QUE | Question word
RO |
RS |
RV |
V | Verb
VAUX | Auxiliary verb
VCOP | Copulative verb
ZE |
ZM |
ZPL |
ZPR |