NCHLT Setswana POS tag set

Tag set

 

The following discussion covers broader aspects of the tagging process and tag selection strategies, as well as questions of granularity. A table of the actual working tags is provided separately.

 

For the purposes of annotators, this tag set is largely taken over from Taljard et al. (2008) and from various documents compiled by G Faasz and U Heid of the IMS, Stuttgart, and by D J Prinsloo and E Taljard of the University of Pretoria. The information below reflects the current state of the tagset; further development will probably necessitate changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level includes all mandatory, or, according to EAGLES, obligatory information, namely up to three elements: an element hinting at the word class, a second one specifying functional or syntactic properties, and a third one giving morphological specifics, cf. e.g. PRO(noun)EMP(hatic)PERS(on).

 

The second level of annotation includes recommended and optional information. This level is in most cases used for a detailed description of closed class items described in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description            Tag 1st level                Tag 2nd level
                       (mandatory information)      (optional/recommended information)
Pronouns:
  emphatic personal    PROEMPPERS                   1sg, 2sg, 1pl, 2pl
Verbals:               V                            tr
Morphemes:
  deficient            MORPH                        def
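The split between the two annotation levels can be illustrated with a small parsing sketch. It assumes, based on the examples later in this document (e.g. PROEMPPERS_1sg, N03_dim_loc), that the first-level tag and the second-level labels are joined by underscores, and that the label nil marks the absence of second-level information. The names ParsedTag and parse_tag are hypothetical and not part of the tagset itself.

```python
from typing import NamedTuple, Tuple


class ParsedTag(NamedTuple):
    level1: str              # mandatory part, e.g. "N03" or "PROEMPPERS"
    level2: Tuple[str, ...]  # optional/recommended labels, e.g. ("dim", "loc")


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag at underscores into its first- and second-level parts.

    Assumption: "_" separates the levels, and "nil" means that no
    second-level information is given (as in ADV_nil).
    """
    level1, *rest = tag.split("_")
    if not level1:
        raise ValueError(f"tag has no first-level part: {tag!r}")
    level2 = tuple(label for label in rest if label != "nil")
    return ParsedTag(level1, level2)


print(parse_tag("PROEMPPERS_1sg"))
print(parse_tag("N03_dim_loc"))
print(parse_tag("ADV_nil"))
```

Keeping the second level as a tuple allows for tags that carry more than one optional label.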

 

As for the actual tagging, an additional first level of tagging is envisaged. On this level, linguistic words will be tagged. For Northern Sotho, this implies that the four orthographic units ke + a + mo + rata will be tagged as V, since together they constitute a linguistic verb.
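One possible way to represent such a linguistic word is sketched below: the orthographic units are kept in text order and a single word-level tag is attached to the group. The class name LinguisticWord and its fields are illustrative assumptions, not an existing data format of the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinguisticWord:
    units: List[str]  # orthographic units in text order
    tag: str          # tag assigned to the linguistic word as a whole

    def surface(self) -> str:
        """Return the units as they appear in running text."""
        return " ".join(self.units)


# The example from the text: four orthographic units, one linguistic verb.
word = LinguisticWord(units=["ke", "a", "mo", "rata"], tag="V")
print(word.surface(), word.tag)
```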

 

The tagset currently distinguishes 29 categories and different levels of annotation. The first part of the tag gives a general indication of the nature of the unit in question. These are as follows:

1.         $ = Punctuation

2.         ABBR = abbreviation

3.         ADJ = adjective

4.         ADV = adverb

5.         ASP = aspectual marker

6.         AUX = auxiliary verb

7.         CCOP = class-indicating copulative subject concord

8.         CD = class-indicating demonstrative

9.         CDCOP = class-indicating demonstrative copulative  {{Not for Tswana}}

10.      CN = class-indicating nominal prefix

11.      CO = class-indicating object concord

12.      CPOSS = class-indicating possessive concord

13.      CS = class-indicating subject concord

14.      ENUM = enumerative

15.      IDEO = ideophone

16.      INT = interjection

17.      JUNC = conjunction

18.      MNEG =  negative morpheme

19.      N = noun

20.      NPP = place and brand name

21.      NUM = numerative

22.      PART = particle

23.      PROEMP = emphatic pronoun

24.      PROPOSS = possessive pronoun

25.      PROQUANT = quantitative pronoun

26.      QUE = question word

27.      TENSE = tense marker

28.      V = verbal

29.      VCOP = copulative verb
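Because some category labels are prefixes of others (CD and CDCOP, N and NPP or NUM, V and VCOP), a full tag is best mapped to its category by longest-prefix match. The following is a minimal sketch of such a lookup, using only the 29 labels listed above; the function name category_of is hypothetical. Note that the sketch does not validate the remainder of the tag, so any string beginning with N would be reported as a noun.

```python
# The 29 first-part labels from the list above.
CATEGORIES = {
    "$": "punctuation",
    "ABBR": "abbreviation",
    "ADJ": "adjective",
    "ADV": "adverb",
    "ASP": "aspectual marker",
    "AUX": "auxiliary verb",
    "CCOP": "class-indicating copulative subject concord",
    "CD": "class-indicating demonstrative",
    "CDCOP": "class-indicating demonstrative copulative",
    "CN": "class-indicating nominal prefix",
    "CO": "class-indicating object concord",
    "CPOSS": "class-indicating possessive concord",
    "CS": "class-indicating subject concord",
    "ENUM": "enumerative",
    "IDEO": "ideophone",
    "INT": "interjection",
    "JUNC": "conjunction",
    "MNEG": "negative morpheme",
    "N": "noun",
    "NPP": "place and brand name",
    "NUM": "numerative",
    "PART": "particle",
    "PROEMP": "emphatic pronoun",
    "PROPOSS": "possessive pronoun",
    "PROQUANT": "quantitative pronoun",
    "QUE": "question word",
    "TENSE": "tense marker",
    "V": "verbal",
    "VCOP": "copulative verb",
}


def category_of(tag: str) -> str:
    """Return the longest category label that the tag starts with."""
    matches = [label for label in CATEGORIES if tag.startswith(label)]
    if not matches:
        raise KeyError(f"no category matches tag {tag!r}")
    return max(matches, key=len)


for example in ["CDCOP_01", "CDLOC", "NPP_brand", "N09_dim", "VCOP_neg"]:
    label = category_of(example)
    print(example, "->", label, "=", CATEGORIES[label])
```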

As we envisage going deeper into morphological analysis, we also plan for the implementation of the following tags:

AS = adjectival stem

CA = class-indicating adjectival prefix

NS = noun stem

NSuf = nominal suffix

VEnd = verbal ending

VExt = verbal extension

VR = verb root

 

1.         PUNCTUATION

The tag $ is used for all punctuation marks. These include full stops, commas, colons, semi-colons, quotation marks, hyphens, exclamation marks, brackets, etc.

2.         ABBREVIATION

All abbreviations are tagged as ABBR.

 

3.         ADJECTIVE

The following tags are used:

Level 1: ADJ01-14, ADJLOC

Notes:

Examples:

            se segolo      ADJ07

            mo go maswe         ADJLOC

4.         ADVERB

The following tags are used:

Level 1:          ADV

Level 2:         ADV_loc

 

 

 

Notes:

Examples:   

ruri                  ADV_nil

gaMosiane    ADV_loc

5.         ASPECTUAL MARKER

The following tags are used:

Level 1: ASP

Level 2: ASP_pot, ASP_prog

Note:

Examples:

ba sa bolela      ASP_prog

ba ka bolela      ASP_pot

 

6.         AUXILIARY

The following tag is used:

Level 1: AUX

Notes:

Examples:

ba setse ba jele      AUX

o ile a kaela jalo    AUX

 

7.         [CLASS-INDICATING] COPULATIVE SUBJECT CONCORD

The following tags are used:

Level 1: CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS

Level 2: CCOPPERS_1sg, CCOPPERS_1pl, CCOPPERS_2sg, CCOPPERS_2pl

Notes:

Examples:

le nna ke gona     CCOPPERS_1sg

borotho bo gona    CCOP14_nil

re mo toropong     CCOPPERS_1pl

 

8.         [CLASS-INDICATING] DEMONSTRATIVES

The following tags are used:

CD01-10, CD14-15, CDLOC

Notes:

Examples:   

            batho ba        CD02

            selo seo        CD07

            felo fale          CDLOC

 

9.         [CLASS-INDICATING] COPULATIVE DEMONSTRATIVES

{{This category is not applicable in Tswana}}

The following tags are used:

Level 1: CDCOP

Level 2: CDCOP_01-10, CDCOP_14-15, CDCOP_loc

Notes:

Examples:

            šokhwi                       CDCOP_01

            šedi                CDCOP_08

            šefale             CDCOP_loc

 

10.      [CLASS-INDICATING] NOMINAL PREFIX

11.      [CLASS-INDICATING] OBJECT CONCORD

The following tags are used:

Level 1: CO01-10, CO14-15, COLOC, COPERS

Level 2: COPERS_1pl, COPERS_2pl, COPERS_2sg

Notes:

Examples:

Ba re thusitse    COPERS_1pl

Re a go batla     COPERS_2sg

Ke a a rata       CO06

Ba tla se reka    CO07

12.      [CLASS-INDICATING] POSSESSIVE CONCORD

The following tags are used:

Level 1: CPOSS01-10, CPOSS14-15, CPOSSLOC

Notes:

 

 

Examples:

bana ba gagwe         CPOSS02

diaparo tsa bana      CPOSS08

Fa tlase ga tafole    CPOSSLOC

 

13.      [CLASS-INDICATING] SUBJECT CONCORD

The following tags are used:

Level 1: CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS

Level 2: CSPERS_1sg, CSPERS_1pl, CSPERS_2sg, CSPERS_2pl

Notes:

Examples:

se robegile                CS07

di ne tsa boa              CS10

Fa tlase go a tsidifala    CSLOC

go a fisa                  CSINDEF

e ne e le mariga           CSNEUT

o a tshwenya               CSPERS_2sg

ra simolola ka tiro        CSPERS_1pl
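The compact range notation used in the tag lists (e.g. CS01-10, CS14-15) can be expanded into individual tags mechanically. The sketch below assumes that ranges are inclusive and that class numbers are always written with two digits; the function name expand_tags is hypothetical.

```python
import re
from typing import List


def expand_tags(spec: str) -> List[str]:
    """Expand a comma-separated tag list with ranges into individual tags.

    Assumption: a range such as "CS01-10" is inclusive and uses
    two-digit class numbers; items without a range are kept unchanged.
    """
    tags: List[str] = []
    for item in (part.strip() for part in spec.split(",")):
        match = re.fullmatch(r"([A-Z]+)(\d{2})-(\d{2})", item)
        if match:
            prefix = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            tags.extend(f"{prefix}{number:02d}" for number in range(start, end + 1))
        elif item:
            tags.append(item)
    return tags


level1 = expand_tags("CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS")
print(level1)
```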

 

14.      ENUMERATIVE

The following tag is used:

Level 1:          ENUM

Note:

Examples:

polao e šoro      ENUM

mokgwa o šele     ENUM

 

15.      IDEOPHONE

The following tag is used:

Level 1:          IDEO

Examples:   

Gwaa       IDEO

setlhee    IDEO

 

16.      INTERJECTION

The following tag is used:

Level 1: INT

Level 2: INT_neg

Notes:          

Examples:

Dumela    INT_nil

nyaa      INT_neg

 

17.      CONJUNCTION

The following tag is used:

Level 1:          JUNC

Notes:

Examples:

mme     JUNC

gore    JUNC

 

 

18.      NEGATIVE MORPHEME

The following tag is used:

Level 1: MNEG

Notes:

Examples:

ga ba re thuse         MNEG

ba sa re thuse         MNEG

gore ba se re thuse    MNEG

 

19.      NOUN

The following tags are used:

Level 1: N01-10, N01a, N02b, N14, NLOC

Level 2: _aug, _dim, _loc, _name

Notes:

 

Examples:

Mpho          N09_nil

Mpho          N01a_name

Mphonyana     N09_dim

Mphong        N09_loc

Taugadi       N09_aug

ngwakaneng    N03_dim_loc

fatshe        NLOC

Bomosiane     N02b_name
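The noun tags above follow a regular pattern that can be checked with a regular expression. The sketch below encodes the first-level tags and second-level labels listed for nouns, plus the nil label seen in the examples. It assumes that nil occurs only on its own and that the other labels may be combined in any order, which the examples suggest but do not state; the names NOUN_TAG and is_valid_noun_tag are hypothetical.

```python
import re

# Level 1: N01-10, N01a, N02b, N14, NLOC
# Level 2: any sequence of _aug, _dim, _loc, _name, or the single label _nil
NOUN_TAG = re.compile(
    r"N(?:01a|02b|0[1-9]|10|14|LOC)"       # first-level tag
    r"(?:_nil|(?:_(?:aug|dim|loc|name))*)"  # optional second-level labels
)


def is_valid_noun_tag(tag: str) -> bool:
    """Return True if the whole tag matches the noun tag pattern."""
    return NOUN_TAG.fullmatch(tag) is not None


examples = ["N09_nil", "N01a_name", "N09_dim", "N09_loc",
            "N09_aug", "N03_dim_loc", "NLOC", "N02b_name"]
for tag in examples:
    print(tag, is_valid_noun_tag(tag))
```

A check of this kind could help catch typing errors in manually assigned tags before they reach the tagger lexicon.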

20.      PLACE AND BRAND NAMES

The following tag is used:

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

Tlokwe    NPP_place

Coke      NPP_brand

 

21.      NUMERATIVE

The following tag is used:

NUM

Note:

22.      PARTICLE

The following tags are used:

Level 1:          PART

Level 2:         PART_cop, PART_agen, PART_hort, PART_loc, PART_que, PART_temp, PART_ins, PART_con

Notes:

Examples:

ke mariga             PART_cop

a kwadilwe ke rona    PART_agen

a re bale             PART_hort

ka kwa morago         PART_loc

A ba tlile?           PART_que

A ba tlile naa?

ka Matlhatso          PART_temp

ka thipa              PART_ins

go na le kotsi        PART_con

 

23.      EMPHATIC PRONOUN

The following tags are used:

Level 1: PROEMP01-10, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

 

Examples:

Ene             PROEMP01

rona            PROEMPPERS_1pl

Gone/gona       PROEMPLOC

Dibuka tsona    PROEMP10

ka yone         PROEMP09

24.      POSSESSIVE PRONOUNS

The following tags are used:

Level 1: PROPOSS01-10, PROPOSS14-15, PROPOSSLOC, PROPOSSPERS

Level 2: PROPOSSPERS_1sg, PROPOSSPERS_1pl, PROPOSSPERS_2sg, PROPOSSPERS_2pl

Notes:

Examples:

bana ba gagwe           PROPOSS01

bana ba gaetsho         PROPOSSPERS_1pl

bana ba rona            PROPOSSPERS_1pl

maoto a tsone           PROPOSS10

dikolo tsa gone/gona    PROPOSSLOC

 

25.      QUANTITATIVE PRONOUNS

The following tags are used:

PROQUANT01-10, PROQUANT14-15, PROQUANTLOC

Notes:

 

 

Examples:

bana botlhe          PROQUANT02

tsotlhe di fedile    PROQUANT10

rona rotlhe          PROQUANT02

 

26.      QUESTION WORDS

The following tags are used:

Level 1: QUE

Level 2: QUE_N01a, QUE_N02b, QUE_loc, QUE_time, QUE_man, QUE_01-10, QUE_14-15

Notes:

Examples:

                 

ba tlile leng?    QUE_time

ba dula kae?      QUE_loc

Batho bafe        QUE_02

o batla mang?     QUE_N01a

o rekile eng?     QUE_nil

 

27.      TENSE MARKER

The following tags are used:

Level 1: TENSE

Level 2: TENSE_fut, TENSE_pres, TENSE_past

 

Notes:

Examples:

ba tla re thusa      TENSE_fut

ba a re thusa        TENSE_pres

ba ka se re thuse    TENSE_fut

ga ba a re thusa     TENSE_neg

 

28.      VERBAL

The following tag is used:

Level 1: V

Notes:

Examples:

mmotsa       V_tr

ithuta       V_tr

ntshwenya    V_tr

direla       V_dtr

eja          V_tr

 

29.      COPULATIVE VERB

The following tag is used:

Level 1: VCOP

Level 2: VCOP_neg

Notes:

Examples:

ke na le          VCOP_nil

fa e le mariga    VCOP_nil

fa a se teng      VCOP_neg

ya nna selemo     VCOP_nil

 

Working tags

ADJ01           Adjective
ADJ02           Adjective
ADJ03           Adjective
ADJ04           Adjective
ADJ05           Adjective
ADJ06           Adjective
ADJ07           Adjective
ADJ08           Adjective
ADJ09           Adjective
ADJ10           Adjective
ADJ14           Adjective
ADJLOC          Adjective
ADV             Adverb

CD01            Demonstrative
CD02            Demonstrative
CD03            Demonstrative
CD04            Demonstrative
CD05            Demonstrative
CD06            Demonstrative
CD07            Demonstrative
CD08            Demonstrative
CD09            Demonstrative
CD10            Demonstrative
CD11            Demonstrative
CD14            Demonstrative
CD15            Demonstrative
CD16            Demonstrative
CDLOC           Demonstrative
CN15            Infinitive class prefix

CO01            Object concord
CO02            Object concord
CO03            Object concord
CO04            Object concord
CO05            Object concord
CO06            Object concord
CO07            Object concord
CO08            Object concord
CO09            Object concord
CO10            Object concord
CO14            Object concord
CO15            Object concord
CO17            Object concord
CONJ            Conjunctive
COPERS          Object concord

CPOSS01         Possessive concord
CPOSS02         Possessive concord
CPOSS03         Possessive concord
CPOSS04         Possessive concord
CPOSS05         Possessive concord
CPOSS06         Possessive concord
CPOSS07         Possessive concord
CPOSS08         Possessive concord
CPOSS09         Possessive concord
CPOSS10         Possessive concord
CPOSS14         Possessive concord
CPOSS15         Possessive concord
CPOSS17         Possessive concord

CS01            Subject concord
CS02            Subject concord
CS03            Subject concord
CS04            Subject concord
CS05            Subject concord
CS06            Subject concord
CS07            Subject concord
CS08            Subject concord
CS09            Subject concord
CS10            Subject concord
CS11            Subject concord
CS14            Subject concord
CS15            Subject concord
CSINDEF         Subject concord
CSLOC           Subject concord
CSNEUT          Subject concord
CSPERS          Subject concord

ENUM            Enumerative
INT             Interjection
MNEG            Negative morpheme

N01             Noun
N01a            Noun
N02             Noun
N02b            Noun
N03             Noun
N04             Noun
N05             Noun
N06             Noun
N07             Noun
N08             Noun
N09             Noun
N10             Noun
N14             Noun
N17             Noun
N18             Noun
NPP             Place names
NLOC            Noun

PART            Particle
PARTQUE         Question particle

PROEMP01        Emphatic pronoun
PROEMP02        Emphatic pronoun
PROEMP03        Emphatic pronoun
PROEMP04        Emphatic pronoun
PROEMP05        Emphatic pronoun
PROEMP06        Emphatic pronoun
PROEMP07        Emphatic pronoun
PROEMP08        Emphatic pronoun
PROEMP09        Emphatic pronoun
PROEMP10        Emphatic pronoun
PROEMP14        Emphatic pronoun
PROEMPLOC       Emphatic pronoun
PROEMPPERS      Emphatic pronoun

PROPOSS02       Possessive pronoun
PROPOSS03       Possessive pronoun
PROPOSS04       Possessive pronoun
PROPOSS05       Possessive pronoun
PROPOSS06       Possessive pronoun
PROPOSS07       Possessive pronoun
PROPOSS08       Possessive pronoun
PROPOSS09       Possessive pronoun
PROPOSS10       Possessive pronoun
PROPOSS14       Possessive pronoun
PROPOSSPERS     Possessive pronoun

PROQUANT01      Quantitative pronoun
PROQUANT02      Quantitative pronoun
PROQUANT03      Quantitative pronoun
PROQUANT04      Quantitative pronoun
PROQUANT05      Quantitative pronoun
PROQUANT06      Quantitative pronoun
PROQUANT07      Quantitative pronoun
PROQUANT08      Quantitative pronoun
PROQUANT09      Quantitative pronoun
PROQUANT10      Quantitative pronoun
PROQUANT14      Quantitative pronoun
PROQUANT15      Quantitative pronoun
PROQUANT17      Quantitative pronoun
PROQUANTLOC     Quantitative pronoun

QUE             Question word
RO
RS
RV
TENSE           Present tense marker, future,
V               Verb
VAUX            Auxiliary verb
VCOP            Copulative verb
ZE
ZM
ZPL
ZPR