Comprehensive Database of Chinese Names
The CJK Dictionary Institute maintains comprehensive databases of several million proper nouns, with particular depth in Japanese and Chinese names. This document describes our Comprehensive Database of Chinese Personal Names. For reference, see also The Role of Lexical Resources in CJK NLP Applications and Named Entity Contextual Clues.
Our databases currently contain over 1,600,000 Chinese names, each accompanied by an accurate pinyin reading and a gender code, and can be expanded to 4 to 5 million entries if romanized variants are included.
Identifying, processing, and normalizing names and their numerous variants is useful in a variety of applications, including:
- Anti-money-laundering compliance at financial institutions.
- Security applications such as identifying suspected name variants of terrorists and criminals.
- Query processing by search engines.
- Immigration control systems.
- Improving the accuracy of machine translation.
- Entity and information extraction.
- Segmentation and morphological analysis of CJK languages.
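As a rough illustration of what normalizing romanized variants can involve, the sketch below reduces differently written pinyin forms of one name to a single comparison key. It is not the Institute's method or data format: the function name `match_key` and the sample variants are assumptions made for this example, and the approach only handles tone marks, tone digits, case, spacing, and hyphenation. Variants from other romanization systems (for example Wade-Giles) would need a mapping table that this sketch does not attempt.

```python
import unicodedata


def match_key(romanized: str) -> str:
    """Reduce a romanized name to a comparison key (hypothetical helper)."""
    # NFD splits accented letters such as "á" into a base letter plus a
    # combining mark, so pinyin tone marks can be dropped.
    # Note: this also turns "ü" into "u", which merges some syllables.
    decomposed = unicodedata.normalize("NFD", romanized)
    no_tones = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters only, so case, spaces, hyphens, and tone digits are ignored.
    return "".join(ch for ch in no_tones.casefold() if ch.isalpha())


# Illustrative spellings of one name; not taken from the database.
variants = ["Máo Zédōng", "Mao Zedong", "MAO Ze-dong", "Mao2 Ze2dong1"]
keys = {match_key(v) for v in variants}
```

Here all four spellings collapse to the same key, so a lookup keyed this way would treat them as one candidate name. Matching on such a key is deliberately loose and would normally be followed by a stricter check against the original characters.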
The table below shows examples of Chinese names, each with its gender code and pinyin reading. Larger samples are available upon request.