Jack Halpern


   Data Licensing

Multilingual Database of Proper Nouns

We maintain the world's largest databases of CJK proper nouns, with over ten million entries. They are used by some of the world's major IT companies in applications such as named entity recognition (NER), machine translation (MT), information retrieval (IR) and input method editors. This edition, the Multilingual Database of Proper Nouns (CJKE-DPN), currently contains about 150,000 entries (including variants) covering the most common CJK and Western personal names and surnames. It brings together five languages in a multidirectional format: Simplified Chinese (SC), Traditional Chinese (TC), Japanese, Korean and English. It has also been expanded to include Arabic (see Table 2) and Spanish (not shown here).

The database includes various data fields, only some of which are shown in the samples here: readings in pinyin, zhuyin fuhao (註音符號), hiragana and several romanization systems; semantic classification codes; frequency rankings and statistics; locale codes; and other useful information.

Editorial Policy

It is important to note that the TC names are not merely code-converted equivalents of the SC names, but are accurate on both the orthographic level (comparable to American 'color' vs. British 'colour') and the lexemic level (comparable to American 'gas' vs. British 'petrol'). For example, New Zealand in SC is 新西兰 Xīnxīlán but in TC it is 紐西蘭 Niǔxīlán.
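The difference between an orthographic and a lexemic pair can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical Simplified-to-Traditional character map (`S2T`) and a hypothetical helper `lo_flag`; neither is part of the database, and the source does not describe how the LO value is actually assigned. The sketch only shows the idea: a pair is treated as orthographic when plain character conversion of the SC form yields the TC form, and as lexemic otherwise.

```python
# Hypothetical, deliberately tiny Simplified -> Traditional character map.
# A real converter needs a complete table and must handle one-to-many mappings.
S2T = {"亚": "亞", "兰": "蘭", "开": "開", "罗": "羅", "鲁": "魯"}


def to_traditional(sc: str) -> str:
    """Convert character by character; unmapped characters pass through."""
    return "".join(S2T.get(ch, ch) for ch in sc)


def lo_flag(sc: str, tc: str) -> str:
    """Return "O" if TC is a plain code conversion of SC, else "L"."""
    return "O" if to_traditional(sc) == tc else "L"


# SC/TC pairs taken from the sample place-name table.
pairs = [
    ("巴西利亚", "巴西利亞"),  # Brasilia
    ("开罗", "開羅"),          # Cairo
    ("新西兰", "紐西蘭"),      # New Zealand
    ("阿鲁巴", "阿盧巴"),      # Aruba
]
for sc, tc in pairs:
    print(sc, tc, lo_flag(sc, tc))
```

With this toy map, Brasilia and Cairo come out as "O" while New Zealand and Aruba come out as "L", which agrees with the LO values in the sample table.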

This database is kept up to date and includes recent changes and additions to proper names, such as the late 2005 change of the Chinese name for Seoul (서울) from 汉城 (Hànchéng) to 首尔 (Shǒu'ěr).

A unique feature of this database is that we distinguish between SC and TC readings. Thus the pinyin for SC 期荣 is qīróng, but for TC 期榮 it is qíróng. For details, see Taiwan and PRC Pinyin differences.
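The sample tables store readings with tone numbers and hyphens (for example `qi1-rong2`), whereas the prose uses tone marks. The following Python sketch converts one form to the other and lists the syllables where the SC and TC readings differ. The function names (`mark_syllable`, `to_tone_marks`, `reading_differences`) are illustrative assumptions, not part of the database, and the placement rule is the common simplified one: mark "a" or "e" if present, "o" in "ou", otherwise the last vowel.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syl: str) -> str:
    """Turn one numbered syllable such as 'rong2' into 'róng'."""
    m = re.fullmatch(r"([a-zü]+)([1-5])", syl)
    if not m:
        return syl  # leave unexpected input untouched
    letters, tone = m.group(1), int(m.group(2))
    if tone == 5:  # neutral tone carries no mark
        return letters
    if "a" in letters:
        idx = letters.index("a")
    elif "e" in letters:
        idx = letters.index("e")
    elif "ou" in letters:
        idx = letters.index("o")
    else:
        vowels = [i for i, ch in enumerate(letters) if ch in TONE_MARKS]
        if not vowels:
            return letters
        idx = vowels[-1]
    return letters[:idx] + TONE_MARKS[letters[idx]][tone - 1] + letters[idx + 1:]


def to_tone_marks(reading: str) -> str:
    """Convert a hyphen-separated reading, adding an apostrophe before
    a non-initial syllable that starts with a, o or e."""
    out = []
    for i, syl in enumerate(reading.split("-")):
        marked = mark_syllable(syl)
        if i > 0 and syl.startswith(("a", "o", "e")):
            marked = "'" + marked
        out.append(marked)
    return "".join(out)


def reading_differences(sc_pin: str, tc_pin: str) -> list:
    """Return (SC, TC) syllable pairs that differ, compared position by position."""
    return [(s, t) for s, t in zip(sc_pin.split("-"), tc_pin.split("-")) if s != t]


print(to_tone_marks("qi1-rong2"), to_tone_marks("qi2-rong2"))
print(reading_differences("qi1-rong2", "qi2-rong2"))
```

The sketch ignores capitalization and input conventions such as "v" for "ü", which a production converter would need to handle.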

Data Fields

Field Name Field Description
ID Unique identifying number ("N" prefix for "Name")
TYPE Semantic classification code such as "S" for surname, "G" for given name and "P" for place name
ENG Name in English
JAP Name in Japanese
SC Name in Simplified Chinese (hanzi)
TC Name in Traditional Chinese (hanzi)
KOR Name in Korean (hangul)
LO Whether the difference between the SC and TC names is lexemic ("L") or orthographic ("O") (see "Editorial Policy" above)
YOMI Japanese reading in hiragana
SC_PIN SC reading in pinyin plus tone with hyphen separating syllables
TC_PIN TC reading in pinyin plus tone with hyphen separating syllables
MOE Korean transcription in former Ministry of Education romanization
ZHUYIN TC reading in zhuyin fuhao (註音符號) with tone (not shown here)
LOCALE Country code for origin of name (not shown here)
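As an illustration of how the twelve fields shown in the samples might be handled in code, here is a minimal Python sketch. It assumes the data arrives as one tab-separated line per entry, which is an assumption made for this example and not a statement about the actual delivery format; the class name `NameRecord` and the function `parse_record` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """One entry, limited to the twelve fields shown in the sample tables."""
    entry_id: str   # ID
    type_code: str  # TYPE
    eng: str        # ENG
    jap: str        # JAP
    sc: str         # SC
    tc: str         # TC
    kor: str        # KOR
    lo: str         # LO
    yomi: str       # YOMI
    sc_pin: str     # SC_PIN
    tc_pin: str     # TC_PIN
    moe: str        # MOE


def parse_record(line: str) -> NameRecord:
    """Parse one tab-separated line (assumed format) into a NameRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}")
    return NameRecord(*cols)


# The New Zealand entry from the sample table, joined with tabs.
sample = "\t".join([
    "N068134", "P", "New Zealand", "ニュージーランド", "新西兰", "紐西蘭",
    "뉴질랜드", "L", "にゅーじーらんど", "xin1-xi1-lan2", "niu3-xi1-lan2",
    "nyu-chil-raen-tu~",
])
rec = parse_record(sample)
print(rec.eng, rec.sc, rec.tc, rec.lo)
```

Splitting on tabs rather than spaces matters here because English names such as "New Zealand" and "Tel Aviv" contain spaces.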

CJKE Multilingual Database of Place Names

ID TYPE English Japanese SC TC Korean LO YOMI SC_PIN TC_PIN MOE
N002657 P Aruba アルーバ 阿鲁巴 阿盧巴 아루바섬 L あるーば a1-lu3-ba1 a1-lu2-ba1 a-ru-pa-so~m
N001635 P Azerbaijan アゼルバイジャン 阿塞拜疆 亞塞拜然 아제르바이잔 L あぜるばいじゃん a1-sai1-bai4-jiang1 ya4-se4-bai4-ran2 a-che-ru~-pa-i-chan
N081006 P Brasilia ブラジリア 巴西利亚 巴西利亞 브라질리아 O ぶらじりあ ba1-xi1-li4-ya4 ba1-xi1-li4-ya4 pu~-ra-chil-ri-a
N016658 P Caracas カラカス 加拉加斯 卡拉卡斯 카라카스 L からかす jia1-la1-jia1-si1 ka3-la1-ka3-si1 k'a-ra-k'a-su~
N014214 P Cairo カイロ 开罗 開羅 카이로 O かいろ kai1-luo2 kai1-luo2 k'a-i-ro
N017653 P Canton 広東 广东 廣東 광둥 O かんとん guang3-dong1 guang3-dong1 kwang-tung
N058842 SP Chad チャド 乍得 查德 차드 L ちゃど zha4-de2 cha2-de2 ch'a-tu~
N047517 GPu Georgia ジョージア 乔治亚 喬治亞 조지아 O じょーじあ qiao2-zhi4-ya4 qiao2-zhi4-ya4 cho-chi-a
N023778 P Guinea ギニア 几内亚 幾內亞 기니 O ぎにあ ji3-nei4-ya4 ji3-nei4-ya4 ki-ni
N078960 SP Fukuoka 福岡 福冈 福岡 후쿠오카 O ふくおか fu2-gang1 fu2-gang1 hu-k'u-o-k'a
N000617 P Ireland アイルランド 爱尔兰 愛爾蘭 아일랜드 O あいるらんど ai4-er3-lan2 ai4-er3-lan2 a-il-raen-tu~
N068134 P New Zealand ニュージーランド 新西兰 紐西蘭 뉴질랜드 L にゅーじーらんど xin1-xi1-lan2 niu3-xi1-lan2 nyu-chil-raen-tu~
N36301 P Seoul ソウル 首尔 首爾 서울 O そうる shou3-er3 shou3-er3 so~-ul
N054474 P Seoul ソウル 汉城 漢城 서울 O そうる han4-cheng2 han4-cheng2 so~-ul
N062125 P Tel Aviv テルアビブ 特拉维夫 特拉維夫 텔아비브 O てるあびぶ te4-la1-wei2-fu1 te4-la1-wei2-fu1 t'el-a-pi-pu~
N004005 P Yemen イエメン 也门 葉門 예멘 L いえめん ye3-men2 ye4-men2 ye-men
N100468 P Weishan 微山 微山 微山 웨이산 O びざん wei1-shan1 wei2-shan1 we-i-san
N080687 P Wuhan 武漢 武汉 武漢 우한 O ぶかん wu3-han4 wu3-han4 u-han
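A multidirectional lookup over rows like these can be sketched in Python as follows. The dictionary layout, the names `build_index` and `translate`, and the choice to index only the five name columns are assumptions made for this example; the values are copied from the sample rows above. Note that one form can map to several entries, as with the two Chinese names for Seoul.

```python
from collections import defaultdict

# A few rows from the sample table, reduced to the name columns.
ROWS = [
    {"id": "N36301", "eng": "Seoul", "jap": "ソウル", "sc": "首尔", "tc": "首爾", "kor": "서울"},
    {"id": "N054474", "eng": "Seoul", "jap": "ソウル", "sc": "汉城", "tc": "漢城", "kor": "서울"},
    {"id": "N014214", "eng": "Cairo", "jap": "カイロ", "sc": "开罗", "tc": "開羅", "kor": "카이로"},
]
LANG_FIELDS = ("eng", "jap", "sc", "tc", "kor")


def build_index(rows):
    """Map every surface form in any language to the rows containing it."""
    index = defaultdict(list)
    for row in rows:
        for field in LANG_FIELDS:
            form = row[field]
            # Skip empty cells and avoid listing a row twice when two
            # columns share the same form.
            if form and row not in index[form]:
                index[form].append(row)
    return index


def translate(index, form, target):
    """Return the sorted distinct forms in the target column for a given form."""
    return sorted({row[target] for row in index.get(form, [])})


index = build_index(ROWS)
print(translate(index, "서울", "sc"))   # both Chinese names for Seoul
print(translate(index, "开罗", "eng"))
```

Returning a list rather than a single value keeps variants visible, leaving it to the caller to decide which entry is preferred.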

CJKA Multilingual Database of Place Names

English Japanese SC LO TC Korean Arabic
Aruba アルーバ 阿鲁巴 L 阿盧巴 아루바섬 أروبا
Brasilia ブラジリア 巴西利亚 O 巴西利亞 브라질리아 برازيليا
Caracas カラカス 加拉加斯 L 卡拉卡斯 카라카스 كراكاس
Cairo カイロ 开罗 O 開羅 카이로 القاهرة
Chad チャド 乍得 L 查德 차드 تشاد
Georgia ジョージア 乔治亚 O 喬治亞 조지아 جورجيا
Ireland アイルランド 爱尔兰 O 愛爾蘭 아일랜드 آيرلندا
Seoul ソウル 首尔 O 首爾 서울 سيول
Seoul ソウル 汉城 O 漢城 서울 سيول
Tel Aviv テルアビブ 特拉维夫 O 特拉維夫 텔아비브 تل أبيب
Yemen イエメン 也门 L 葉門 예멘 اليمن

CJKE Multilingual Database of Personal Names

ID TYPE English Japanese SC TC Korean LO YOMI SC_PIN TC_PIN MOE
N000034 S Abba アッバ 阿巴 亞伯 아바 L あっば a1-ba1 ya4-bo2 a-pa
N000035 S Abbas アッバース 阿巴斯 阿巴斯 아바스 O あっばーす a1-ba1-si1 a1-ba1-si1 a-pa-su~
N002982 G Alberto アルベルト 阿尔韦托 阿爾韋托 알베르토 O あるべると a1-er3-wei2-tuo1 a1-er3-wei2-tuo1 al-pe-ru~-t'o
N0386171 G Qirong 期栄 期荣 期榮 치룽 O きえい qi1-rong2 qi2-rong2 ch'i-rung
N000871 F Akiko 暁子 晓子 曉子 아키코 O あきこ xiao3-zi3 xiao3-zi3 a-k'i-k'o
N000872 F Akiko 顕子 显子 顯子 아키코 O あきこ xian3-zi3 xian3-zi3 a-k'i-k'o
N000873 F Akiko 昭子 昭子 昭子 아키코 O あきこ zhao1-zi3 zhao1-zi3 a-k'i-k'o
N001161 FM Akira 아키라 O あきら ming2 ming2 a-k'i-ra
C139707 G Deng O とう deng1 deng1 to~ng
N000629 S Einstein アインスタイン 爱因斯坦 愛因斯坦 아인슈타인 O あいんすたいん ai4-yin1-si1-tan3 ai4-yin1-si1-tan3 a-in-syu-t'a-in
N000134 G Ernest アーネスト 欧内斯特 歐尼斯特 어니스트 L あーねすと ou1-nei4-si1-te4 ou1-ni2-si1-te4 o~-ni-su~-t'u~
N026074 S Gregg グレッグ 格雷格 葛瑞格 그레그 L ぐれっぐ ge2-lei2-ge2 ge3-rui4-ge2 ku~-re-ku~
N026075 G Greg グレッグ 格雷格 葛瑞格 그레그 L ぐれっぐ ge2-lei2-ge2 ge3-rui4-ge2 ku~-re-ku~
N014143 G Haiyang 海洋 海洋 海洋 하이양 O かいよう hai3-yang2 hai3-yang2 ha-i-yang
N014144 G Huaiyang 懐陽 怀阳 懷陽 화이양 O かいよう huai2-yang2 huai2-yang2 hwa-i-yang
N046125 G Jack ジャック 杰克 傑克 O じゃっく jie2-ke4 jie2-ke4 chaek
N046119 G Jackie ジャッキー 杰基 傑基 재키 O じゃっきー jie2-ji1 jie2-ji1 chae-k'i
N028385 S Kennedy ケネディ 肯尼迪 甘迺迪 케네디 L けねでぃ ken3-ni2-di2 gan1-nai3-di2 k'e-ne-ti
N014142 P Kaiyang 開陽 开阳 開陽 카이양 O かいよう kai1-yang2 kai1-yang2 k'a-i-yang
N067417 SP Nakajima 中島 中岛 中島 나카지마 O なかじま zhong1-dao3 zhong1-dao3 na-k'a-chi-ma
N006561 G William ウィリアム 威廉 威廉 빌리암 O うぃりあむ wei1-lian2 wei1-lian2 pil-ri-am
C110425 S Zhang O ちょう zhang1 zhang1 chang