Contextual Clues for Named Entity Recognition in Japanese


©2002-2011 The CJK Dictionary Institute, Inc.


1. Named Entity Recognition (NER)

Identifying names and their variants is a difficult task in computational linguistics. Named Entity Recognition (NER) is an active research area and plays a major role in NLP applications such as question answering, machine translation, and information extraction. Much work has been done on developing NER tools based on statistical methods, but such methods alone cannot perform NER accurately; to be truly effective they must be supplemented by large-scale name databases. The small-scale lexicons currently in use are inadequate to the task.

2. Lexical Resources for NER

To meet this need, CJKI maintains comprehensive databases of several million proper nouns (personal names and place names), especially Japanese and Chinese names, as well as company and organization names. More details can be found at:

These resources are of major importance to NER tool developers, especially because they include millions of variants in all the major and most minor romanization systems, including those for the major Chinese dialects such as Cantonese, Hakka and Hokkien.

3. Contextual Clues for Japanese NER

Even large-scale proper noun databases cannot be kept fully up to date, since new names are created daily. Moreover, many kinds of named entities are created arbitrarily and exist only briefly. Various techniques have been used to detect named entities automatically. Some of these issues are discussed in The Role of Lexical Resources in CJK NLP Applications.

A major technique is the use of keywords or syntactic structures that precede or follow named entities, which we refer to as named entity contextual clues (NECC). We have developed a comprehensive database of contextual clues for Japanese named entities that can play a critical role in enhancing the precision of NER applications.
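As a rough illustration of how such clues can be applied, the following Python sketch looks for a run of kanji or katakana immediately followed by a known clue. The clue list, the `NAME_RUN` pattern, and the function `find_candidates` are assumptions made for this example and are not part of the CJKI database or any CJKI tool; a real system would rely on morphological analysis rather than character classes.

```python
import re

# Hypothetical clue list: (clue, type) pairs borrowed from the examples in this article.
CLUES = [("銀行", "C"), ("協会", "C"), ("先生", "T"), ("さん", "H")]

# A run of kanji or katakana is treated as a candidate name.
# This is a crude stand-in for proper word segmentation.
NAME_RUN = r"[\u4E00-\u9FFF\u30A0-\u30FF]+"


def find_candidates(text):
    """Return (name, clue, type) for each name run directly followed by a clue."""
    hits = []
    for clue, clue_type in CLUES:
        pattern = re.compile(f"({NAME_RUN}){re.escape(clue)}")
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(1), clue, clue_type))
    # Report candidates in the order they appear in the text.
    hits.sort(key=lambda hit: hit[0])
    return [(name, clue, clue_type) for _, name, clue, clue_type in hits]


if __name__ == "__main__":
    for candidate in find_candidates("昨日、三井住友銀行の清水先生に会った。"):
        print(candidate)
```

This sketch only handles clues that follow a name. Clues that precede a name, such as ホテル in ホテルシオノ, would need a mirrored pattern.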

The table below shows examples of Japanese NECCs classified by type. Other attributes are available to help fine-tune the data to specific NER requirements. In the Example column, the red portion indicates the NECC.

C: Company or organization name
T: Personal title
H: Honorific term or title


Contextual Clues for Japanese Named Entities
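| ID | Type | Contextual Clue | Reading | Example |
|---|---|---|---|---|
| NC0098 | C | アソシエイツ | あそしえいつ | 日本ネットワークアソシエイツ |
| NC0335 | C | センター | せんたー | 国民生活センター |
| NC0500 | C | ホテル | ほてる | ホテルシオノ |
| NC0597 | C | | えき | 朝霞 |
| NC0700 | C | 協会 | きょうかい | 日本ユニセフ協会 |
| NC0722 | C | 銀行 | ぎんこう | 三井住友銀行 |
| NC0754 | C | 研究所 | けんきゅうじょ | 日中韓辞典研究所 |
| NC0795 | C | 興業 | こうぎょう | 山口興業 |
| NC0822 | C | 公団 | こうだん | 住宅都市整備公団 |
| NC0824 | C | 高等学校 | こうとうがっこう | 細田学園高等学校 |
| NC0848 | C | | | |
| NC0910 | C | 書店 | しょてん | 旭屋書店 |
| NC0915 | C | 振興会 | しんこうかい | 日本貿易振興会 |
| NC0918 | C | 新聞 | しんぶん | 信濃毎日新聞 |
| NC0933 | C | 自動車グループ | じどうしゃぐるーぷ | 三菱自動車グループ |
| NC1033 | C | | そう | 東風 |
| NC1181 | C | 百貨店 | ひゃっかてん | 東武百貨店 |
| NC1258 | C | | らーめん | 田舎 |
| NC1308 | C | 連盟 | れんめい | 日本観光旅館連盟 |
| NC1309 | H | さん | さん | 春遍雀来さん |
| NC1314 | H | | さま | 小泉純一郎 |
| NC1317 | H | | じょう | 佐伯日菜子 |
| NC1324 | T | インストラクター | いんすとらくたー | パソコンインストラクター河野 |
| NC1327 | T | コーディネーター | こーでぃねーたー | 移植コーディネーター加藤 |
| NC1336 | T | マネージャー | まねーじゃー | 金子マネージャー |
| NC1340 | T | 委員 | いいん | 猪谷千春委員 |
| NC1342 | T | 家元 | いえもと | 千宗室家元 |
| NC1352 | T | 係長 | かかりちょう | 二宮係長 |
| NC1360 | T | 鑑定士 | かんていし | 不動産鑑定士川端一郎 |
| NC1382 | T | 建築士 | けんちくし | 桜井一級建築士 |
| NC1400 | T | 主任 | しゅにん | 田中主任 |
| NC1407 | T | 鍼灸師 | しんきゅうし | 塩沢鍼灸師 |
| NC1417 | T | 助役 | じょやく | 深沢助役 |
| NC1423 | T | 先生 | せんせい | 清水先生 |
| NC1431 | T | 大使 | たいし | アマコスト駐日アメリカ大使 |
| NC1435 | T | 代表 | だいひょう | 高井代表 |
| NC1446 | T | 通関士 | つうかんし | 佐藤通関士 |
| NC1448 | T | 取扱者 | とりあつかいしゃ | 甲種危険物取扱者藤井 |
| NC1463 | T | 保護司 | ほごし | 小島節子保護司 |
| NC1474 | C | コーポレーション | こーぽれーしょん | ベネッセコーポレーション |
| NC1482 | C | 医療法人 | いりょうほうじん | 医療法人菅野愛生会 |
| NC1486 | C | (株) | かぶしきがいしゃ | (株)東芝 |
| NC1493 | C | 合資会社 | ごうしがいしゃ | 合資会社大和川酒造店 |
| NC1501 | C | SS | さーびすすてーしょん | 志村SS |
| NC1507 | C | 社団法人 | しゃだんほうじん | 社団法人著作権情報センター |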
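The examples above show that a clue may precede the name (ホテルシオノ), follow it (三井住友銀行), or sit between a qualifier and the name (パソコンインストラクター河野). The Python sketch below shows one way such records could be represented and how the clue's position could be derived from the example. The `Clue` class, its field names, and the `clue_position` function are hypothetical and chosen for illustration; they do not describe the actual schema or attributes of the CJKI database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    """One row of the table above (field names are assumptions)."""
    clue_id: str
    clue_type: str  # "C", "T" or "H"
    clue: str
    reading: str
    example: str


# A few rows copied from the table above.
ROWS = [
    Clue("NC0500", "C", "ホテル", "ほてる", "ホテルシオノ"),
    Clue("NC0722", "C", "銀行", "ぎんこう", "三井住友銀行"),
    Clue("NC1324", "T", "インストラクター", "いんすとらくたー", "パソコンインストラクター河野"),
    Clue("NC1423", "T", "先生", "せんせい", "清水先生"),
]


def clue_position(row):
    """Classify where the clue sits in its example string."""
    if row.example.startswith(row.clue):
        return "prefix"
    if row.example.endswith(row.clue):
        return "suffix"
    if row.clue in row.example:
        return "infix"
    return "absent"


if __name__ == "__main__":
    for row in ROWS:
        print(row.clue_id, row.clue_type, row.clue, clue_position(row))
```

A position label of this kind could be used to decide whether a matcher should look for the name before or after the clue.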