Chinese Morphological Database
The CJK Dictionary Institute maintains a comprehensive lexical database of about three million Simplified and Traditional Chinese entries covering a broad spectrum of fields, including proper nouns, technical terminology and general vocabulary. This is a sample of our Chinese Morphological Database, a database of Chinese derivational affixes annotated with adjacency attributes.
A derivational affix (DA) is a bound morpheme (though some also function as free forms) prefixed or suffixed to a base to create new words. In traditional morphology, DAs have no lexical meaning of their own and add only grammatical meaning. Here, we also include "lexical affixes" -- compound-forming word elements that have a substantial lexical meaning of their own. Identifying DAs is very useful in NLP, IME and information retrieval applications because it makes it much easier for an algorithm to recognize the countless lexemes that are not registered in the lexicon.
An important principle in our criteria for selecting an affix is its ability to combine with a base consisting of two or more characters, as 迷 'fan' combining with 独轮车 'unicycle' to produce 独轮车迷 'unicycle fan'. If an affix combines with only single-character bases, it is excluded because of the danger of confusing it with two-character compounds in which it does not function as an affix, as in 入迷, or with a coincidental juxtaposition of a free form.
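The length criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `qualifies_as_affix` and the word list are hypothetical, and the check covers only the "base of two or more characters" test, not the editorial judgment of whether the element really functions as an affix in a given word.

```python
def qualifies_as_affix(candidate, attested_words, position="suffix"):
    """Return True if `candidate` attaches to at least one attested base
    of two or more characters (hypothetical helper, length test only)."""
    for word in attested_words:
        if position == "suffix" and word.endswith(candidate):
            base = word[: -len(candidate)]
        elif position == "prefix" and word.startswith(candidate):
            base = word[len(candidate):]
        else:
            continue
        # Single-character bases (as in 入迷) do not count as evidence.
        if len(base) >= 2:
            return True
    return False


# 独轮车迷 has the three-character base 独轮车, so 迷 passes the test.
print(qualifies_as_affix("迷", ["独轮车迷", "入迷"]))  # True
# With only 入迷 as evidence, the base 入 is a single character.
print(qualifies_as_affix("迷", ["入迷"]))              # False
```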
An adjacency attribute is a part of speech (POS) code that indicates the morphological restrictions that apply to adjacent words or DAs when these are actually used in the formation of compound words or affixed lexemes. Adjacency attributes help programs identify DAs with greater reliability, especially in systems that fully support POS-tagging. For more details, see japaffix.htm.
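As a rough illustration of how a program might use adjacency attributes, the sketch below splits an unregistered word into base and affix and accepts the split only if the POS of the base is permitted by the affix. The mini-lexicon, the dictionary layout and the function `analyze` are assumptions made for this example; the attribute values for 迷 and 半 are transcribed from the data sample, reading the last filled value of each row as the result POS.

```python
# Hypothetical mini-lexicon: registered base word -> POS code.
LEXICON = {"独轮车": "NC", "文盲": "NC"}

# Adjacency attributes for two affixes from the sample (layout is an assumption).
# 迷 (suffix): the preceding base may be NC or V; the result is NC.
SUFFIXES = {"迷": {"before": {"NC", "V"}, "result": {"NC"}}}
# 半 (prefix): the following base may be NC, V or A; the result is NC, V or A.
PREFIXES = {"半": {"after": {"NC", "V", "A"}, "result": {"NC", "V", "A"}}}


def analyze(word):
    """Return (base, affix, candidate result POS set), or None if no
    affix in the tables licenses a split of `word`."""
    for suffix, attrs in SUFFIXES.items():
        base = word[: -len(suffix)]
        if (word.endswith(suffix) and len(base) >= 2
                and LEXICON.get(base) in attrs["before"]):
            return base, suffix, attrs["result"]
    for prefix, attrs in PREFIXES.items():
        base = word[len(prefix):]
        if (word.startswith(prefix) and len(base) >= 2
                and LEXICON.get(base) in attrs["after"]):
            return base, prefix, attrs["result"]
    return None


print(analyze("独轮车迷"))  # ('独轮车', '迷', {'NC'})
print(analyze("入迷"))      # None
```

When an affix allows several result codes, as 半 does, this sketch returns all of them; a real system would need further context to choose one.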
Field | Code | Description |
---|---|---|
Type | [A1] | Productive derivational suffix -- always bound |
Type | [A2] | Productive derivational suffix -- almost always bound |
Type | [A3] | Productive derivational suffix -- sometimes bound |
Type | [B1] | Historically productive derivational prefix -- always bound |
Type | [B2] | Historically productive derivational prefix -- sometimes bound |
POS | | Part of speech code. For details see chinpos.htm |
Before | | POS of the lexeme or base preceding a suffix, e.g. "NC" for the suffix 迷 'fan' means that 迷 can be preceded by a common noun, as 独轮车 'unicycle', to produce 独轮车迷 'unicycle fan'. |
After | | POS of the lexeme or base following a prefix, e.g. "NC" for the prefix 半 'semi-' means that 半 can be followed by a common noun, as 文盲 'illiterate', to produce 半文盲 'semi-illiterate'. |
Result | | The POS of the lexeme resulting from affixing a prefix or suffix. For example, "NC" for 独轮车迷 'unicycle fan' means that 独轮车迷 is a common noun. |
Data Sample
SC ID | SC Affix | TC Affix | POS Code | TYPE Code | Pinyin | Before | After | Result |
---|---|---|---|---|---|---|---|---|
S0007529A | 县 | 縣 | WS | A3 | xian4 | NP | | NP |
S0009543A | 团 | 團 | WS | A3 | tuan2 | NC V | | NC |
S0010532B | 处 | 處 | WS | A3 | chu4 | NC V | | NC |
S0010875Aa | 头 | 頭 | WS | A3 | tou0 | NC | | NC |
S0015201Ad | 总 | 總 | WP | A2 | zong3 | | NC V | NC V |
S0034279A | 节 | 節 | WS | A3 | jie2 | NC V NP | | NC |
S0047893A | 镇 | 鎮 | WS | A3 | zhen4 | NP | | NP |
S0061252Aa | 炎 | 炎 | WS | A2 | yan2 | NC | | NC |
S0064269A | 化 | 化 | WS | A3 | hua4 | NC V A D | | NC V A |
S0070103Ad | 机 | 机 | WS | A1 | ji1 | V NC A | | NC |
S0072424Ab | 鬼 | 鬼 | WS | A3 | gui3 | A NC V | | NC |
S0078485Aa | 型 | 型 | WS | A1 | xing2 | NC V A NP | | NC |
S0084233Ah | 好 | 好 | WP | A3 | hao3 | | NC A | NC A |
S0084666A | 工 | 工 | WS | A3 | gong1 | V NC | | NC |
S0098752Aa | 者 | 者 | WS | A2 | zhe3 | NC V A | | NC |
S0096010Ad | 手 | 手 | WS | A3 | shou3 | NC V | | NC |
S0101751Aa | 所 | 所 | WS | A2 | suo3 | NC V NA | | NC |
S0106449Ab | 心 | 心 | WS | A3 | xin1 | NC V A | | NC |
S0112789Ab | 性 | 性 | WS | A3 | xing4 | A NC V | | NC |
S0112870Ag | 生 | 生 | WS | A3 | sheng1 | NC V A | | NC |
S0123643A | 族 | 族 | WS | A3 | zu2 | NP NC V A | | NC |
S0121387Ah | 多 | 多 | WP | A3 | duo1 | | NC | NC |
S0119011A | 大 | 大 | WP | A3 | da4 | | NC V | NC V A D |
S0120518Ag | 第 | 第 | WP | A1 | di4 | | NN | NC |
S0128279Ad | 超 | 超 | WP | A3 | chao1 | | NC A | NC A D |
S0138060A | 派 | 派 | WS | A3 | pai4 | NP NC A V | | A NC |
S0142229Af | 半 | 半 | WP | A2 | ban4 | | NC V A | NC V A |
S0142513Ad | 反 | 反 | WP | A3 | fan3 | | NC A V | NC V A |
S0143043A | 犯 | 犯 | WS | A2 | fan4 | V NC A | | NC |
S0141475Ad | 微 | 微 | WP | A1 | wei1 | | NC NM V | NC |
S0144731Ae | 品 | 品 | WS | A3 | pin3 | V NC A | | NC |
S0148106A | 部 | 部 | WS | A3 | bu4 | NC V | | NC |
S0148384Ae | 副 | 副 | WP | A2 | fu4 | | NC | NC |
S0157840A | 迷 | 迷 | WS | A3 | mi2 | NC V | | NC |
S0164882Aa | 率 | 率 | WS | A2 | lv4 | V NC A | | NC |
S0165711Af | 老 | 老 | WP | A3 | lao3 | | NC A V | NC |
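For readers who want to load rows like these into a program, here is a minimal parsing sketch. It rests on assumptions not stated in the sample itself: that the POS codes WS and WP mark suffixes and prefixes respectively (inferred from 迷 and 半), and that each row carries exactly two adjacency values after the Pinyin field, the adjacent-base POS list followed by the result POS list. The class `AffixEntry` and the function `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AffixEntry:
    sc_id: str
    sc_affix: str
    tc_affix: str
    pos: str
    type_code: str
    pinyin: str
    before: List[str]  # POS codes allowed before a suffix
    after: List[str]   # POS codes allowed after a prefix
    result: List[str]  # POS codes of the resulting lexeme


def parse_row(line):
    """Parse one pipe-delimited sample row into an AffixEntry."""
    cells = [c.strip() for c in line.split("|")]
    sc_id, sc_affix, tc_affix, pos, type_code, pinyin = cells[:6]
    # Keep only the filled adjacency cells; empty cells are ignored.
    values = [c.split() for c in cells[6:] if c]
    if len(values) != 2:
        raise ValueError(f"expected 2 adjacency values, got {len(values)}")
    context, result = values
    if pos == "WS":    # assumed: suffix, so the context is the Before field
        before, after = context, []
    elif pos == "WP":  # assumed: prefix, so the context is the After field
        before, after = [], context
    else:
        raise ValueError(f"unexpected POS code: {pos}")
    return AffixEntry(sc_id, sc_affix, tc_affix, pos, type_code,
                      pinyin, before, after, result)


entry = parse_row("S0157840A | 迷 | 迷 | WS | A3 | mi2 | NC V | NC | |")
print(entry.before, entry.after, entry.result)  # ['NC', 'V'] [] ['NC']
```

Because empty cells are skipped, the sketch does not depend on which column the empty cell occupies, only on the POS code of the affix.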