The CJK Dictionary Institute (CJKI), which specializes in CJK computational lexicography, is engaged in the continuous expansion of comprehensive CJK lexical databases. Currently, our databases contain nearly eight million entries, including a variety of grammatical and semantic attributes required for developing information retrieval applications, input method editors, named entity extraction, and electronic dictionaries.
This document describes some of the morphological attributes in our Japanese lexical databases, such as derivational affixes and binding valency, which are particularly useful for disambiguating and identifying Japanese lexemes in such applications as input method editors (IME) and search engine query processing. Information on our rich set of grammatical attributes, such as parts of speech (POS) and conjugation pattern codes, can be found at jappos.htm, and our extensive frequency statistics database is described at japfreq.htm.
We also maintain a comprehensive database of Simplified Chinese and Traditional Chinese morphological attributes and other data described at chinsam.htm.
A derivational affix (DA) is a word element (bound morpheme) that is prefixed or suffixed to a base or stem to create new words, not merely different forms of the same word. Strictly speaking, especially in traditional morphology, derivational affixes do not have lexical meaning of their own, and only add grammatical meanings such as negation and the formation of new parts of speech. Here, we use the term to include compound-forming word elements that have a substantial lexical meaning of their own, such as 人 jin in アメリカ人 amerikajin 'American'. See "A Brief Introduction to Japanese Morphology" for a fuller discussion.
Jack Halpern's New Japanese English Character Dictionary (NJECD) provides an in-depth treatment of both on (Chinese-derived) and kun (native Japanese) derivational affixes, as explained in detail on pages 72a to 78a of the front matter to that dictionary. During a period of 16 years, we have made a systematic effort to collect these and provide accurate and comprehensive coverage.
Currently our database of derivational and inflectional suffixes contains about 2700 entries. Samples of these appear in Section 4 below. To get a fuller understanding, please look up these affixes in NJECD by referring to the entry numbers, where you will find many compounds and see how these affixes actually work in forming compounds.
The most important benefit is that derivational affixes can contribute significantly to the accuracy of identifying the countless lexemes that an author can coin freely. That is, they allow an application to algorithmically construct lexemes not present in the lexicon, such as 食べ始める tabehajimeru 'start to eat' from 食べる taberu and the derivational suffix -始める -hajimeru 'start to do'.
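As a rough illustration of how such a lexeme might be constructed, the following Python sketch attaches a verbal suffix to the continuative form of a regular verb. The function names, the verb class labels, and the kana table are assumptions made for this example and are not part of the CJKI databases; irregular verbs and sound changes such as rendaku are not handled.

```python
# Hypothetical sketch: names and verb-class labels are illustrative only.

# Final kana of a godan dictionary form -> ending of the continuative form.
GODAN_CONTINUATIVE = {
    "う": "い", "く": "き", "ぐ": "ぎ", "す": "し", "つ": "ち",
    "ぬ": "に", "ぶ": "び", "む": "み", "る": "り",
}


def continuative(verb: str, verb_class: str) -> str:
    """Return the continuative form of a regular verb in dictionary form."""
    if verb_class == "ichidan":
        if not verb.endswith("る"):
            raise ValueError(f"not an ichidan dictionary form: {verb}")
        return verb[:-1]  # drop the final る
    if verb_class == "godan":
        ending = verb[-1]
        if ending not in GODAN_CONTINUATIVE:
            raise ValueError(f"not a godan dictionary form: {verb}")
        return verb[:-1] + GODAN_CONTINUATIVE[ending]
    raise ValueError(f"unsupported verb class: {verb_class}")


def attach_verbal_suffix(verb: str, verb_class: str, suffix: str) -> str:
    """Build a compound verb from a base verb and a verbal suffix."""
    # The leading hyphen only marks the element as a suffix; strip it.
    return continuative(verb, verb_class) + suffix.lstrip("-")


print(attach_verbal_suffix("食べる", "ichidan", "-始める"))  # 食べ始める
```

A real analyzer would work in the opposite direction as well, splitting an unknown string into a known base and a known suffix.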
It is important to note that even though derivational affixes are often highly productive, ordinary word dictionaries, which focus on free lexemes, normally do not include them because they are bound morphemes.
Affix | NJECD Entry Number | Reading | Kind of Affix | English |
---|---|---|---|---|
川 | 0006 | kawa | suffix | names of rivers |
小- | 0007 | ko- | also prefix | little, small |
小- | 0007 | sa- | also prefix | |
水- | 0010 | mizu- | prefix | water |
心 | 0011 | kokoro | also suffix | heart, mind, spirit, soul; thoughts, ideas |
-切る | 0027 | -kiru | verbal suffix | |
-切り | 0027 | -giri | also suffix | way of cutting |
-切れる | 0027 | -kireru | verbal suffix | be able to do, be able to finish |
-切れ | 0027 | -gire | also suffix | |
-代わり | 0030 | -gawari | suffix | substitute, replacement |
-付ける | 0031 | -zukeru | verbal suffix | give, impart, provide with |
Affix | Entry Number | Reading | Kind Of Affix | English |
---|---|---|---|---|
小 | 0007 | shoo- | also prefix | small, little, minor, tiny |
小 | 0007 | -shoo | suffix | names of elementary schools |
水 | 0010 | -sui | also suffix | water, liquid, fluid; soda |
心 | 0011 | -shin | also suffix | sense, motive |
心 | 0011 | -shin | also suffix | heart (the organ) |
旧 | 0014 | kyuu- | also prefix | former, ex-, old-time, old |
順 | 0018 | -jun | also suffix | order, sequence, turn |
仏 | 0019 | -butsu | also suffix | Buddhist image |
化 | 0021 | -ka | also suffix | -ize, -ify |
比 | 0026 | -hi | also suffix | ratio |
比 | 0026 | -hi | also prefix | specific |
代 | 0030 | -dai | suffix | range of a person's age in ten-year periods |
代 | 0030 | -dai | suffix | years spanning a specific period |
代 | 0030 | -dai | also suffix | charge, fare, fee, price |
Affix | Rank | POS | Sub-POS | Reading |
---|---|---|---|---|
人 | 000020 | WS | N | にん |
年 | 000026 | WS | | ねん |
日 | 000028 | WS | | にち |
円 | 000037 | WP | | えん |
円 | 000037 | WS | | えん |
前 | 000044 | WS | | まえ |
第 | 000067 | WP | | だい |
始める | 000068 | WS | | はじめる |
うち | 000086 | WP | | うち |
社 | 000096 | WS | | しゃ |
今 | 000107 | WP | | いま |
目 | 000111 | WS | | め |
続ける | 000155 | WS | | つづける |
出す | 000162 | WS | | だす |
話 | 000179 | WS | | はなし |
度 | 000181 | WS | | ど |
上がる | 000185 | WS | | あがる |
台 | 000192 | WS | N | だい |
回 | 000195 | WS | N | かい |
声 | 000196 | WS | | こえ |
党 | 000204 | WS | | とう |
上げる | 000239 | WS | | あげる |
本 | 000247 | WS | | ほん |
例 | 000248 | WS | | れい |
キロ | 000250 | WP | | きろ |
The table below shows some examples of Japanese inflectional affixes. Inflection is explained in "A Brief Introduction to Japanese Morphology" and the POS codes are defined on the Japanese Part of Speech Codes page.
Affix | Rank | POS | Sub-POS | Reading |
---|---|---|---|---|
に | 000003 | FS | M | に |
に | 000003 | FS | V | に |
が | 000004 | FS | V | が |
て | 000005 | FS | V | て |
で | 000006 | FS | M | で |
と | 000007 | FS | M | と |
と | 000007 | FS | V | と |
から | 000009 | FS | A | から |
の | 000010 | FS | A | の |
の | 000010 | FS | M | の |
の | 000010 | FS | V | の |
する | 000014 | FS | V | する |
する | 000014 | FS | X | する |
か | 000016 | FS | V1 | か |
前 | 000044 | FS | S | まえ |
約 | 000046 | FP | P | やく |
くる | 000049 | FS | V | くる |
くる | 000049 | FS | X | くる |
An adjacency attribute is a part of speech (POS) code that indicates the morphological restrictions that apply to adjacent words or word elements when these are actually used in context in the formation of compound words or affixed lexemes. There are three types of adjacency attributes, as shown in the table below:
Attribute | Description |
---|---|
BEFORE | An adjacency attribute that indicates the part of speech (POS) of the lexeme, stem or base preceding a suffix or suffix-like element. For example, "NX" for the compounding suffix 員 means that 員 can be preceded by a common noun or verbal noun, as in 研究員. Given only for suffixes. |
AFTER | An adjacency attribute that indicates the part of speech (POS) of the lexeme following a prefix or prefix-like element. For example, "NC" for the adnominal prefix 元 means that 元 can be followed by a common noun, as in 元総理大臣. Given only for prefixes. |
RESULT | The part of speech (POS) of the lexeme resulting from affixing a prefix or suffix. For example, "NC" for the adnominal prefix 元 means that prefixing 元 (to a common noun) results in a common noun (元総理大臣). Given only for affixes. |
The table below describes the POS (part of speech) codes used exclusively in the BEFORE and AFTER adjacency attributes. For other part of speech codes, see jappos.htm
POS | SubPOS | English Description | Japanese Description | Notes | Example | Binding Valency |
---|---|---|---|---|---|---|
NX | | Noun class | 名詞及び「する」名詞連節 | Same as NC and VN. | | 0 |
VC | | Continuative | 連用形 | | | 1 |
SV | | Verb stem | | | | |
SA | | Noun adjective stem | | | | |
SJ | | Adjective stem | | | | |
SN | | Noun stem | | | | |
The table below shows the adjacency attribute codes for some Japanese derivational affixes. The POS codes are explained in jappos.htm
Affix | Reading | POS | Sub-POS | Valency | Rank | Before | After | Result |
---|---|---|---|---|---|---|---|---|
がましげ | がましげ | FS | M | 1 | 061089 | VC | | AN |
がましさ | がましさ | WS | | 1 | 061089 | VC | | NC |
がらみ | がらみ | WS | | 1 | 061089 | NC | | NC |
がわり | がわり | WS | | 1 | 061089 | NC | | VN |
慣れる | なれる | WS | | 1 | 002465 | VC | | V1 |
慣わす | ならわす | WS | | 1 | 061089 | VC | | V5 |
生 | う | WS | | 1 | 061089 | NC | | NC |
生 | うまれ | WS | | 1 | 061089 | NC NP | | NC |
生 | き | WP | | 1 | 061089 | | NC | NC |
生 | せい | WS | | 1 | 003721 | NC | | NC |
生 | なま | WP | | 1 | 010656 | | NC | NC |
生まれ | うまれ | WS | | 1 | 002465 | NC NP | | NC |
切 | ぎれ | WS | | 1 | 061089 | NC | | NC |
切っての | きっての | WS | | 1 | 061089 | NC NP | | AA |
切る | きる | WS | | 1 | 001494 | VC | | V5 |
切れ | ぎれ | WS | | 1 | 061089 | NC | | NC |
切れる | きれる | WS | | 1 | 002247 | VC | | V1 |
先 | さき | WS | | 1 | 000491 | VC VN | | NC |
先々 | せんせん | FP | P | 0 | 061089 | | NC | NC |
先先 | せんせん | FP | P | 0 | 061089 | | NC | NC |
撰 | せん | WS | | 1 | 061089 | NC | | NC |
栓 | せん | WS | | 1 | 019135 | NX | | NC |
泉 | せん | WS | | 1 | 061089 | NC | | NC |
染 | ぞめ | WS | | 1 | 061089 | NC NP | | NC |
染みる | じみる | WS | | 1 | 061089 | NC | | V1 |
染め | ぞめ | WS | | 1 | 061089 | NC | | VN |
染める | しめる | WS | | 1 | 061089 | VC | | V1 |
選 | せん | WS | | 1 | 002652 | NC | | NC |
選び | えらび | WS | | 1 | 041445 | NC | | VN |
前 | ぜん | FP | P | 0 | 005835 | | NC | NC |
前 | ぜん | FS | S | 0 | 005835 | NX | | NC |
前 | まえ | FS | S | 0 | 000044 | NX | | NC |
前々 | ぜんぜん | FP | P | 0 | 061089 | | NC | NC |
前前 | ぜんぜん | FP | P | 0 | 061089 | | NC | NC |
文応 | ぶんおう | NE | | 0 | 061089 | | NN | NC |
文化 | ぶんか | NE | | 0 | 000536 | | NN | NC |
兵 | へい | WS | | 1 | 002150 | NX | | NC |
平 | だいら | WS | | 1 | 061089 | NP | | NP |
平 | ひら | WP | | 1 | 061089 | | NC | NC |
平成 | へいせい | NE | | 0 | 061089 | | NN | NC |
別 | べつ | FS | S | 0 | 000331 | NC | | NC |
別 | べつ | WP | | 1 | 000331 | | NC VC | NC |
片 | かた | WP | | 1 | 028538 | | NC V | NC |
片 | へん | WS | | 1 | 025149 | NC | | NC |
片 | ぺん | WS | N | 1 | 061089 | NN | | NC |
編 | へん | WS | | 1 | 008970 | NC NP | | NC |
編み | あみ | WS | | 1 | 061089 | NX | | NC |
辺 | へん | NC | | 0 | 008112 | NC NP | | NC |
辺 | べ | WS | | 1 | 061089 | NC | | NC |
返す | かえす | WS | | 1 | 002476 | VC | | V5 |
返る | かえる | WS | | 1 | 061089 | VC | | V5 |
便 | びん | WS | | 1 | 003030 | NC NN | | NC |
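The following Python sketch suggests one way an application might use the Before, After and Result attributes to decide whether an affix can attach to a base and what part of speech the result has. The `Affix` class and the `combine` function are hypothetical, the rule that a POS code ending in "P" marks a prefix is an assumption drawn from the codes in the table, and the POS codes given to the example bases (警備 as VN, ビール as NC, 食べ as VC) are likewise assumed for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# NX is described above as "Same as NC and VN".
POS_CLASSES = {"NX": {"NC", "VN"}}


def expand(codes: str) -> Set[str]:
    """Expand a space-separated code list such as "NC NP" into a set."""
    result: Set[str] = set()
    for code in codes.split():
        result |= POS_CLASSES.get(code, {code})
    return result


@dataclass(frozen=True)
class Affix:
    surface: str
    pos: str     # e.g. "WS" (suffix) or "WP" (prefix)
    before: str  # POS allowed before a suffix ("" if not given)
    after: str   # POS allowed after a prefix ("" if not given)
    result: str  # POS of the resulting lexeme


def combine(base: str, base_pos: str, affix: Affix) -> Optional[Tuple[str, str]]:
    """Return (surface, POS) if the affix may attach to the base, else None."""
    is_prefix = affix.pos.endswith("P")  # assumption: second letter P = prefix
    allowed = expand(affix.after if is_prefix else affix.before)
    if base_pos not in allowed:
        return None
    surface = affix.surface + base if is_prefix else base + affix.surface
    return surface, affix.result


# Attribute values as read from the rows for 兵 (へい), 生 (なま) and 切る.
HEI = Affix("兵", "WS", "NX", "", "NC")
NAMA = Affix("生", "WP", "", "NC", "NC")
KIRU = Affix("切る", "WS", "VC", "", "V5")

print(combine("警備", "VN", HEI))    # ('警備兵', 'NC')
print(combine("ビール", "NC", NAMA))  # ('生ビール', 'NC')
print(combine("食べ", "VC", KIRU))   # ('食べ切る', 'V5')
print(combine("食べ", "VC", HEI))    # None
```

The last call is rejected because 兵 requires a noun-class element before it, which is the kind of restriction that helps an analyzer discard implausible segmentations.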
The binding valency code indicates the degree of binding between a stem/lexeme and an affix. It enables an application such as a morphological analyzer or IME system to determine whether a given element is bound or free, aiding in the accurate identification of lexemes not registered in the lexicon.
Binding Valency | English Description | Japanese Description | Notes |
---|---|---|---|
0 | Free form | ||
1 | Always bound | ||
2 | Sometimes bound, sometimes free? | ||
D | Binding valency default | ||
U | Binding valency unknown |
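As a minimal sketch of how these codes might be applied, the Python example below rejects a candidate segmentation in which an always-bound element stands alone as a word. The data layout and function names are assumptions made for this example, as is the choice to treat the codes 2, D and U permissively. The valency of がらみ (1) and 前 (0) is taken from the adjacency table, while 選挙 is assumed to be a free noun.

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (surface form, binding valency code)


def can_stand_alone(valency: str) -> bool:
    """Only code "1" (always bound) forbids use as an independent word."""
    # Assumption: "2", "D" and "U" are treated as possibly free.
    return valency != "1"


def plausible(segmentation: List[List[Element]]) -> bool:
    """Check a segmentation given as a list of words, each a list of elements."""
    return all(
        len(word) != 1 or can_stand_alone(word[0][1])
        for word in segmentation
    )


print(plausible([[("選挙", "0"), ("がらみ", "1")]]))  # True
print(plausible([[("がらみ", "1")]]))                # False
print(plausible([[("前", "0")]]))                    # True
```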
The morphological status prefix is a broad category for classifying Japanese affixes, and is used as the first letter in the POS codes.
Morph | English Description | Japanese Description | Binding Valency |
---|---|---|---|
E | Japanese phrase | 句 | 0 |
F | Non-derivational affix | 非派生接辞 | 1 |
H | Honorific affix | 待遇接辞 | 1 |
N | Non-affix (free word) | 非接辞 | 0 |
W | Derivational or compounding affix | 派生接辞 | 1 |
X | Generic affix | 接辞 | 1 |
M | Word element (bound morpheme) | 造語成分 | 1 |
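Because the status prefix is the first letter of the POS code, a default binding valency can be looked up directly from the code, as in the Python sketch below. The function names are hypothetical. Two points are assumptions rather than documented behavior: that the valency code D means "use the default for the morphological status", and that codes whose first letter is not listed in the table should be reported as U (unknown).

```python
from typing import Optional

# First letter of the POS code -> (English description, default valency).
MORPH_STATUS = {
    "E": ("Japanese phrase", "0"),
    "F": ("Non-derivational affix", "1"),
    "H": ("Honorific affix", "1"),
    "N": ("Non-affix (free word)", "0"),
    "W": ("Derivational or compounding affix", "1"),
    "X": ("Generic affix", "1"),
    "M": ("Word element (bound morpheme)", "1"),
}


def default_valency(pos_code: str) -> str:
    """Return the default binding valency for a POS code such as "WS"."""
    status = MORPH_STATUS.get(pos_code[:1])
    return status[1] if status else "U"  # assumption: unlisted -> unknown


def effective_valency(pos_code: str, entry_valency: Optional[str] = None) -> str:
    """Prefer the valency recorded for an entry over the status default."""
    if entry_valency is None or entry_valency == "D":
        return default_valency(pos_code)
    return entry_valency


print(default_valency("WS"))         # 1
print(effective_valency("FP", "0"))  # 0
```

The per-entry override matters in practice: in the adjacency table, entries such as 先々 carry the code FP but a binding valency of 0, which differs from the default of 1 for the F status.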