Jack Halpern


   Data Licensing

Comprehensive Database of Chinese Names


The CJK Dictionary Institute maintains comprehensive databases of several million proper nouns, particularly Japanese and Chinese names. This document describes our Comprehensive Database of Chinese Personal Names. For background, see also "The Role of Lexical Resources in CJK NLP Applications" and "Named Entity Contextual Clues".

Our databases currently contain more than 1,600,000 Chinese names, each accompanied by an accurate pinyin reading and a gender code. If romanized variants are included, the total can be expanded to 4 to 5 million entries.
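To illustrate how a single entry can fan out into several romanized variants, the Python sketch below derives toneless, run-together, hyphenated and upper-case spellings from one tone-marked pinyin reading. The function names and the particular variant patterns are hypothetical, chosen for illustration; they are not the rules used to build the database.

```python
import unicodedata


def strip_tones(pinyin: str) -> str:
    """Remove tone diacritics: 'Zhuān Xiáng' -> 'Zhuan Xiang'.

    Simplification: ü also loses its diaeresis and becomes plain u.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def romanized_variants(pinyin: str) -> set[str]:
    """Return hypothetical romanized spellings of one tone-marked reading."""
    syllables = unicodedata.normalize("NFC", pinyin).split()
    toneless = [strip_tones(s) for s in syllables]
    variants = set()
    for parts in (syllables, toneless):
        variants.add(" ".join(parts))              # Yè Jūn / Ye Jun
        variants.add("".join(parts).capitalize())  # Yèjūn / Yejun
        variants.add("-".join(parts))              # Yè-Jūn / Ye-Jun
    variants.add(" ".join(toneless).upper())       # YE JUN
    return variants


if __name__ == "__main__":
    for spelling in sorted(romanized_variants("Yè Jūn")):
        print(spelling)
```

Variants from other romanization systems or from dialect readings (for example Cantonese) cannot be produced by this kind of tone stripping and would need separate data.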

Practical Applications

Identifying, processing, and normalizing names and their many variants is useful in a variety of applications, including:

  • Anti-money-laundering (AML) screening by financial institutions.
  • Security applications such as identifying suspected name variants of terrorists and criminals.
  • Query processing by search engines.
  • Immigration control systems.
  • Improving the accuracy of machine translation.
  • Entity and information extraction.
  • Segmentation and morphological analysis of CJK languages.

Data Sample

The table below shows sample Chinese names, each with its entry type (G = given name, S = surname), gender (M = male, F = female, "-" where not applicable), and pinyin reading. Larger samples are available upon request.

Name    Type   Gender   Pinyin
专祥    G      M        Zhuān Xiáng
业伦    G      M        Yè Lún
业军    G      M        Yè Jūn
业农    G      M        Yè Nóng
业则    G      M        Yè Zé
业宁    G      F        Yè Níng
业彤    G      F        Yè Tóng
业权    G      M        Yè Quán
业浔    G      M        Yè Xún
业经    G      M        Yè Jīng
业达    G      M        Yè Dá
业进    G      M        Yè Jìn
业钰    G      F        Yè Yù
业英    G      F        Yè Yīng
业海    G      M        Yè Hǎi
业基    G      M        Yè Jī
业教    G      M        Yè Jiāo
业勤    G      M        Yè Qín
业江    G      M        Yè Jiāng
业志    G      M        Yè Zhì
业祝    G      F        Yè Zhù
业春    G      F        Yè Chūn
业松    G      M        Yè Sōng
业常    G      M        Yè Cháng
业新    G      M        Yè Xīn
业臣    G      M        Yè Chén
业成    G      M        Yè Chéng
业珍    G      F        Yè Zhēn
        S      -
        S      -        Lái
        S      -        Jiāng
        S      -        Xiàng
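Sample rows like those in the table can be loaded into a simple lookup structure for name matching. The Python sketch below assumes a hypothetical tab-separated layout (name, type, gender, pinyin) and indexes each record under a tone-free, case-insensitive key, so that a query typed as "YE JUN" or "Yè-Jūn" finds 业军. The record layout, field names and key function are illustrative assumptions, not the format in which the data is actually delivered.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical tab-separated export; columns: name, type, gender, pinyin.
SAMPLE = """\
业军\tG\tM\tYè Jūn
业宁\tG\tF\tYè Níng
业彤\tG\tF\tYè Tóng
业新\tG\tM\tYè Xīn
"""


@dataclass(frozen=True)
class NameRecord:
    name: str       # Chinese characters
    name_type: str  # assumed: 'G' = given name, 'S' = surname
    gender: str     # 'M', 'F', or '-' when not applicable
    pinyin: str     # tone-marked reading


def match_key(romanized: str) -> str:
    """Tone-free, case-free key with spaces and hyphens dropped: 'Yè-Jūn' -> 'yejun'."""
    decomposed = unicodedata.normalize("NFD", romanized)
    base = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in base if ch.isalpha()).lower()


def parse_records(text: str) -> list[NameRecord]:
    """Parse one record per non-blank line of tab-separated fields."""
    records = []
    for line in text.splitlines():
        if line.strip():
            name, name_type, gender, pinyin = line.split("\t")
            records.append(NameRecord(name, name_type, gender, pinyin))
    return records


def build_index(records: list[NameRecord]) -> dict[str, list[NameRecord]]:
    """Group records by match key; homophones share one bucket."""
    index: dict[str, list[NameRecord]] = defaultdict(list)
    for rec in records:
        index[match_key(rec.pinyin)].append(rec)
    return index


if __name__ == "__main__":
    records = parse_records(SAMPLE)
    index = build_index(records)
    for rec in index.get(match_key("YE JUN"), []):
        print(rec.name, rec.gender)  # prints: 业军 M
    print([r.name for r in records if r.gender == "F"])
```

Each key maps to a list because distinct names can share the same toneless reading; the tone-marked pinyin or the Chinese characters themselves are then needed to tell such entries apart.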