Comprehensive Database of Japanese Name Variants

包括的な日本人名異表記データベース


1. The Problem of Name Variants

The number of personal names and their variants in the world is in the billions. The number of place names is also large, but they have fewer variants. Identifying names and their variants is a difficult computational linguistic task. Named Entity Recognition (NER) is a hot topic in computational linguistics and plays an important role in many IT applications.

To enhance this technology, CJKI maintains comprehensive databases of several million proper nouns, especially of Japanese names and Chinese names. This document describes some issues of Japanese name variation and provides samples of our extensive Japanese name variant resources. For reference, see also The Role of Lexical Resources in CJK NLP Applications and Named Entity Contextual Clues.

2. Practical Applications

Identifying, processing and normalizing names and their numerous variants are useful in a variety of applications, including:

  1. Anti-money-laundering efforts by financial institutions.
  2. Security applications such as identifying suspected name variants of terrorists and criminals.
  3. Query processing by search engines.
  4. Immigration control systems.
  5. Improving the accuracy of machine translation.
  6. Entity and information extraction.
  7. Segmentation and morphological analysis of CJK languages.

Large databases of name variants play a critical role in such applications. CJKI maintains databases of several million names and name variants in all major and most minor romanization systems for Chinese, Japanese and Korean, including the major Chinese dialects, as well as for Arabic and Spanish.


3. The Challenge of Japanese Name Variants

Japanese personal names are extremely numerous. Our Japanese-English Dictionary of Proper Nouns contains about 400,000 unique given names and some 150,000 surnames. If we add to this the numerous romanized variants, we get millions of names.

There are several well-established systems for romanizing Japanese, as well as various popular ones and even hybrids in which the same word is written in a mixture of different systems. The principal systems, and the other systems that CJKI databases support, are listed below. The examples are for the names 大津 (おおづ) and 山口 (やまぐち).


Japanese Romanization Systems

System | Example | Description
Hepburn | Ōzu | The most widely used system, in several variations as shown in the table below.
Kunrei | Ôzu | The official Japanese government system that has become an ISO standard (ISO 3602).
Nippon | Ôdu | The predecessor of the Kunrei system but still in use.
Waapuro | Ouzu | Based on popular input methods.
English | Ozu | The most common English spelling based on Hepburn with long vowels omitted.
Germanic | Jamagutschi | German based romanization.
Romance | Yamagutchi | Romance language based romanization.
Variants | Oozu, Ohzu, Oodu, Oudu, Ohdu, Odu | Miscellaneous variants of each system, such as the different flavors of Hepburn.

CJKI's name variant databases contain millions of entries that cover all the above systems, their variants, and hybrids. Below are samples of these variants and a brief description of why there is so much variation. There are other systems, like the JSL system devised by Eleanor Jorden, and the ALA-LC system, essentially identical to the Revised Hepburn system, which are not shown in the samples below.


4. Variation in Hepburn Romanization

The English-based Hepburn romanization system was devised by the Reverend James Curtis Hepburn and introduced in his Japanese–English dictionary published in 1867. It is the most widely used system and serves as the de facto standard. Even the Japanese government commonly uses it in place of Kunrei romanization, the official standard.

Contrary to popular belief, the Hepburn system comes in many flavors. The standard, official Hepburn system is called Revised Hepburn, but some of the other variants shown below are just as popular, if not more so. Note that Revised Hepburn is sometimes confusingly referred to as Modified Hepburn, which is also the name of a separate, less popular system used by some dictionaries and linguists.


Variation in Hepburn Romanization

Kanji | Yomi | English | Revised Hepburn | Modified Hepburn | Traditional Hepburn | Passport Hepburn | Waapuro Hepburn | Hepburn Variants
佐藤 | さとう | Sato | Satō | Satoo | Satō | Satoh, Sato | Satou | Satô
大津 | おおづ | Ozu | Ōzu | Oozu | Ōzu | Ohzu, Ozu | Oozu | Ôzu
井生 | いおう | Io | | Ioo | | Ioh, Io | Iou |
伊大地 | いおおじ | Ioji | Iōji | Iōji | Iōji | Iohji, Ioji | Iooji | Iôji
天満屋 | てんまんや | Tenman'ya, Tenmanya | Tenman'ya, Tenmanya, Tenman-ya | Tenman'ya, Ten̄man̄ya | Tenman'ya | Tenman'ya, Tenmanya, Tenman-ya | Tenmanya |
山陰房 | さんいんぼう | San'inbo, Saninbo | San'inbō, Saninbō, San-inbō | San'inboo, Saninboo, San̄in̄boo | San'imbō, Sanimbō | San'imboh, Sanimboh, San-imboh, San'imbo, Sanimbo, San-imbo | Saninbou | San'inbô, Saninbô, San-inbô, San'imbô, Sanimbô, San-imbô
本間 | ほんま | Honma | Honma | Honma, Hon̄ma | Homma | Homma | Honma |
淳一郎 | じゅんいちろう | Jun'ichiro, Junichiro | Jun'ichirō, Junichirō, Jun-ichirō | Jun'ichiroo, Junichiroo, Jun̄ichiroo | Jun'ichirō, Junichirō | Jun'ichiroh, Junichiroh, Jun-ichiroh, Jun'ichiro, Junichiro, Jun-ichiro | Junichirou | Jun'ichirô, Junichirô, Jun-ichirô
山口 | やまぐち | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi | Yamaguchi |
愛子 | あいこ | Aiko | Aiko | Aiko | Aiko | Aiko | Aiko |

5. A Plethora of Romanization Systems

The table below shows examples of romanized names in various official and unofficial systems. Only the standard, official version is shown under the column for each of the three principal systems: Hepburn, Kunrei and Nippon. Variants of each of these systems, such as the different flavors of Hepburn, are all collected in the Variants column. All hybrids are shown in the Hybrids column. The Waapuro system, which has many variants, is not shown in a separate column but is included in the Variants column.

As can be seen from Table 2 and Table 3, variation occurs for various reasons:

  1. The representation of long vowels, especially /o:/ written as ō, o, ô, ou, or oh.
  2. Moraic /N/ (ん) sometimes written as m, rather than n, before /b/, /p/, and /m/.
  3. Apostrophes omitted or replaced by hyphens when /N/ is followed by a vowel or /y/.
  4. Multiple representations for certain consonants: e.g., じゃ is written as ja, zya or jya.

In the real world, each of the various systems has variants, and names are often written by mixing multiple systems. For example, Juniti consists of Jun, the Modified Hepburn for じゅん, and iti, the Kunrei version of いち. We refer to such combinations as hybrids.
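
A hybrid can be modeled as a spelling in which different kana are romanized in different systems. The sketch below is only an illustration under that assumption: the `MORAE` table and the function `mixed_spellings` are hypothetical names, only Hepburn and Kunrei are considered, and the apostrophe is omitted for brevity.

```python
from itertools import product

# Per-kana spellings of じゅんいち in two systems (illustrative data).
MORAE = [
    ("じゅ", {"hepburn": "ju", "kunrei": "zyu"}),
    ("ん", {"hepburn": "n", "kunrei": "n"}),
    ("い", {"hepburn": "i", "kunrei": "i"}),
    ("ち", {"hepburn": "chi", "kunrei": "ti"}),
]

def mixed_spellings(morae):
    """Map every spelling to 'hepburn', 'kunrei', or 'hybrid'."""
    systems = ("hepburn", "kunrei")
    # Spellings that use a single system throughout.
    pure = {"".join(table[s] for _, table in morae): s for s in systems}
    result = {}
    for choice in product(systems, repeat=len(morae)):
        spelling = "".join(table[s] for (_, table), s in zip(morae, choice))
        result[spelling.capitalize()] = pure.get(spelling, "hybrid")
    return result

for spelling, label in mixed_spellings(MORAE).items():
    print(spelling, label)
# Junichi hepburn
# Juniti hybrid
# Zyunichi hybrid
# Zyuniti kunrei
```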


Japanese Romanization Systems

Kanji | Yomi | English | Hepburn | Kunrei | Nippon | Variants | Hybrids | Germanic | Latin
佐藤 | さとう | Sato | Satō | Satô | Satô | Satoo, Satou, Satoh | | |
青塚 | あおづか | Aozuka | Aozuka | Aozuka | Aoduka | Aozuca | Aoduca | |
愛子 | あいこ | Aiko | Aiko | Aiko | Aiko | Aico | | |
生越 | いくごし | Ikugoshi | Ikugoshi | Ikugosi | Ikugosi | Icugosi | Icugoshi | Ikugoschi | Ikugochi
大津 | おおづ | Ozu | Ōzu | Ôzu | Ôdu | Oozu, Ouzu, Ohzu, Oodu, Oudu, Ohdu, Odu | Ōdu | |
井生 | いおう | Io | | | | Ioo, Iou, Ioh | | |
伊大地 | いおおじ | Ioji | Iōji | Iôzi | Iôzi | Iōzi, Ioozi, Iouzi, Iohzi, Iozi, Iooji, Iouji, Iohji, Iôji | | |
橋本 | はしもと | Hashimoto | Hashimoto | Hasimoto | Hasimoto | | | Haschimoto | Hachimoto
青柳塘 | あおやぎとう | Aoyagito | Aoyagitō | Aoyagitô | Aoyagitô | Aoyagitoo, Aoyagitou, Aoyagitoh | | Aojagito |
天満屋 | てんまんや | Tenman'ya | Tenman'ya | Tenman'ya | Tenman'ya | Temman'ya, Temmanya, Temman-ya, Tenmanya, Tenman-ya | | Tenman'ja, Tenmanja, Tenman-ja |
靑山 | あおやま | Aoyama | Aoyama | Aoyama | Aoyama | | | Aojama |
赤口 | あかぐち | Akaguchi | Akaguchi | Akaguti | Akaguti | Acaguci | Akaguci, Acaguchi, Acaguti | Akagutschi | Akagutchi
山口 | やまぐち | Yamaguchi | Yamaguchi | Yamaguti | Yamaguti | Yamaguci | | Jamagutschi | Yamagutchi
裕子 | ゆうこ | Yuko | Yūko | Yûko | Yûko | Yûco, Yūco, Yuuco, Yuco, Yuuko | | Juko |
相越 | あいこし | Aikoshi | Aikoshi | Aikosi | Aikosi | Aicosi | Aicoshi | Aikoschi | Aikochi
吉田 | よしだ | Yoshida | Yoshida | Yosida | Yosida | | | Joschida | Yochida
正月 | しょうげつ | Shogetsu | Shōgetsu | Syôgetu | Syôgetu | Syōgetu, Syoogetu, Syougetu, Syohgetu, Syogetu, Shoogetsu, Shougetsu, Shohgetsu, Shôgetsu | Shōgetu, Shoogetu, Shougetu, Shohgetu, Shogetu, Shôgetu, Syôgetsu, Syōgetsu, Syoogetsu, Syougetsu, Syohgetsu, Syogetsu | Schogetsu | Chogetsu
山陰房 | さんいんぼう | San'inbo | San'inbō | San'inbô | San'inbô | Saninbô, San-inbô, Saninbō, San-inbō, San'inboo, Saninboo, San-inboo, San'inbou, Saninbou, San-inbou, San'inboh, Saninboh, San-inboh, Saninbo, San-inbo, San'imbō, Sanimbō, San-imbō, San'imboo, Sanimboo, San-imboo, San'imbou, Sanimbou, San-imbou, San'imboh, Sanimboh, San-imboh, San'imbo, Sanimbo, San-imbo, San'imbô, Sanimbô, San-imbô | | |
四本松 | しほんまつ | Shihonmatsu | Shihonmatsu | Sihonmatu | Sihonmatu | Shihommatsu | Shihonmatu, Shihommatu, Sihonmatsu, Sihommatsu, Sihommatu | Schihonmatsu | Chihonmatsu
佳子 | よしこ | Yoshiko | Yoshiko | Yosiko | Yosiko | Yosico | Yoshico | Joschiko | Yochiko

6. The Variant Explosion

As mentioned above, Japanese names have so many variants because of such phenomena as the presence or absence of apostrophes and the multiple ways of expressing long vowels and certain consonants. If these factors combine in the same name, the number of permutations explodes. Combined with the many hybrids, the number of variants for a single name can run into the hundreds.

An example of this is the first name of Japan's former prime minister Jun'ichirō Koizumi (in standard Revised Hepburn). The table below shows the 169 variants of Jun'ichirō, classified roughly by rank, many of which are high frequency and in widespread use. Although they are all legitimate in the sense that they follow the rules of spelling variation for each system, or are hybrids of such variants, some may be rare or absent at a particular time or in a particular corpus. But since such variants can potentially occur at different times in different corpora, they are included in our databases, which aim to provide a full solution to identifying name variants.
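
The arithmetic behind this explosion can be sketched as a Cartesian product over independent spelling choices. The four slots and their values below are our reading of the sample table for Jun'ichirō, not a documented CJKI procedure, and the Germanic and Latin spellings are left out.

```python
from itertools import product

# Assumed slot inventories, read off the Jun'ichirō sample table.
ONSET = ["Ju", "Jyu", "Zyu"]                      # じゅ
JOINT = ["n'", "n", "n-"]                         # ん before a vowel
MIDDLE = ["ichi", "iti", "ici"]                   # いち
ENDING = ["rō", "rô", "roo", "rou", "roh", "ro"]  # ろう

# Every combination of one value per slot is one candidate spelling.
variants = {"".join(parts) for parts in product(ONSET, JOINT, MIDDLE, ENDING)}

print(len(variants))  # 3 * 3 * 3 * 6 = 162
```

Under these assumptions, four small sets of alternatives already yield 162 distinct spellings. In the table, the six Germanic and Latin forms and the separate Kunrei and Nippon rows, which share one spelling, make up the remaining entries.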

(Please also see this .pdf document about the kanji string 淳子 ("Junko"), which shows the complex many-to-many relations between Japanese personal names.)


Variants of Jun'ichirō (純一郎)

LS_ID | Type | Romanization | Rank
LS038 | VARIANT | Junichiro | A
LS001 | ENG | Jun'ichiro | A
LS039 | VARIANT | Jun-ichiro | A
LS041 | VARIANT | Junichirô | A
LS093 | HYBRID | Juniciro | A
LS002 | HEPBURN | Jun'ichirō | A
LS059 | VARIANT | Jun-ichirō | A
LS033 | VARIANT | Junichirou | B
LS032 | VARIANT | Jun'ichirou | B
LS034 | VARIANT | Jun-ichirou | B
LS058 | VARIANT | Junichirō | B
LS147 | HYBRID | Jyunichiro | B
LS069 | HYBRID | Junitirou | B
LS075 | HYBRID | Junitiro | C
LS055 | VARIANT | Zyun'itiro | C
LS057 | VARIANT | Zyun-itiro | C
LS030 | VARIANT | Junichiroo | C
LS036 | VARIANT | Junichiroh | C
LS141 | HYBRID | Jyunichirou | C
LS035 | VARIANT | Jun'ichiroh | C
LS037 | VARIANT | Jun-ichiroh | C
LS046 | VARIANT | Zyun'itiroo | C
LS048 | VARIANT | Zyun-itiroo | C
LS146 | HYBRID | Jyun'ichiro | C
LS148 | HYBRID | Jyun-ichiro | C
LS144 | HYBRID | Jyunichiroh | C
LS029 | VARIANT | Jun'ichiroo | C
LS031 | VARIANT | Jun-ichiroo | C
LS159 | HYBRID | Jyunitirou | C
LS050 | VARIANT | Zyunitirou | C
LS165 | HYBRID | Jyunitiro | C
LS072 | HYBRID | Junitiroh | C
LS047 | VARIANT | Zyunitiroo | D
LS049 | VARIANT | Zyun'itirou | D
LS051 | VARIANT | Zyun-itirou | D
LS056 | VARIANT | Zyunitiro | D
LS111 | HYBRID | Zyunichiro | D
LS009 | LATIN | Junitchiro | D
LS092 | HYBRID | Jun'iciro | D
LS094 | HYBRID | Jun-iciro | D
LS043 | VARIANT | Zyun'itirō | D
LS045 | VARIANT | Zyun-itirō | D
LS110 | HYBRID | Zyun'ichiro | D
LS112 | HYBRID | Zyun-ichiro | D
LS143 | HYBRID | Jyun'ichiroh | D
LS145 | HYBRID | Jyun-ichiroh | D
LS162 | HYBRID | Jyunitiroh | D
LS104 | HYBRID | Zyun'ichirou | D
LS105 | HYBRID | Zyunichirou | D
LS106 | HYBRID | Zyun-ichirou | D
LS140 | HYBRID | Jyun'ichirou | D
LS142 | HYBRID | Jyun-ichirou | D
LS053 | VARIANT | Zyunitiroh | D
LS074 | HYBRID | Jun'itiro | D
LS076 | HYBRID | Jun-itiro | D
LS003 | KUNREI | Zyun'itirô | E
LS004 | NIPPON | Zyun'itirô | E
LS005 | GERMANIC | Jun'itschiro | E
LS006 | GERMANIC | Junitschiro | E
LS007 | GERMANIC | Jun-itschiro | E
LS008 | LATIN | Jun'itchiro | E
LS010 | LATIN | Jun-itchiro | E
LS011 | VARIANT | Jyun'icirô | E
LS012 | VARIANT | Jyunicirô | E
LS013 | VARIANT | Jyun-icirô | E
LS014 | VARIANT | Jyun'icirō | E
LS015 | VARIANT | Jyunicirō | E
LS016 | VARIANT | Jyun-icirō | E
LS017 | VARIANT | Jyun'iciroo | E
LS018 | VARIANT | Jyuniciroo | E
LS019 | VARIANT | Jyun-iciroo | E
LS020 | VARIANT | Jyun'icirou | E
LS021 | VARIANT | Jyunicirou | E
LS022 | VARIANT | Jyun-icirou | E
LS023 | VARIANT | Jyun'iciroh | E
LS024 | VARIANT | Jyuniciroh | E
LS025 | VARIANT | Jyun-iciroh | E
LS026 | VARIANT | Jyun'iciro | E
LS027 | VARIANT | Jyuniciro | E
LS028 | VARIANT | Jyun-iciro | E
LS040 | VARIANT | Jun'ichirô | E
LS042 | VARIANT | Jun-ichirô | E
LS044 | VARIANT | Zyunitirō | E
LS052 | VARIANT | Zyun'itiroh | E
LS054 | VARIANT | Zyun-itiroh | E
LS060 | VARIANT | Zyunitirô | E
LS061 | VARIANT | Zyun-itirô | E
LS062 | HYBRID | Jun'itirō | E
LS063 | HYBRID | Junitirō | E
LS064 | HYBRID | Jun-itirō | E
LS065 | HYBRID | Jun'itiroo | E
LS066 | HYBRID | Junitiroo | E
LS067 | HYBRID | Jun-itiroo | E
LS068 | HYBRID | Jun'itirou | E
LS070 | HYBRID | Jun-itirou | E
LS071 | HYBRID | Jun'itiroh | E
LS073 | HYBRID | Jun-itiroh | E
LS077 | HYBRID | Jun'itirô | E
LS078 | HYBRID | Junitirô | E
LS079 | HYBRID | Jun-itirô | E
LS080 | HYBRID | Jun'icirō | E
LS081 | HYBRID | Junicirō | E
LS082 | HYBRID | Jun-icirō | E
LS083 | HYBRID | Jun'iciroo | E
LS084 | HYBRID | Juniciroo | E
LS085 | HYBRID | Jun-iciroo | E
LS086 | HYBRID | Jun'icirou | E
LS087 | HYBRID | Junicirou | E
LS088 | HYBRID | Jun-icirou | E
LS089 | HYBRID | Jun'iciroh | E
LS090 | HYBRID | Juniciroh | E
LS091 | HYBRID | Jun-iciroh | E
LS095 | HYBRID | Jun'icirô | E
LS096 | HYBRID | Junicirô | E
LS097 | HYBRID | Jun-icirô | E
LS098 | HYBRID | Zyun'ichirō | E
LS099 | HYBRID | Zyunichirō | E
LS100 | HYBRID | Zyun-ichirō | E
LS101 | HYBRID | Zyun'ichiroo | E
LS102 | HYBRID | Zyunichiroo | E
LS103 | HYBRID | Zyun-ichiroo | E
LS107 | HYBRID | Zyun'ichiroh | E
LS108 | HYBRID | Zyunichiroh | E
LS109 | HYBRID | Zyun-ichiroh | E
LS113 | HYBRID | Zyun'ichirô | E
LS114 | HYBRID | Zyunichirô | E
LS115 | HYBRID | Zyun-ichirô | E
LS116 | HYBRID | Zyun'icirō | E
LS117 | HYBRID | Zyunicirō | E
LS118 | HYBRID | Zyun-icirō | E
LS119 | HYBRID | Zyun'iciroo | E
LS120 | HYBRID | Zyuniciroo | E
LS121 | HYBRID | Zyun-iciroo | E
LS122 | HYBRID | Zyun'icirou | E
LS123 | HYBRID | Zyunicirou | E
LS124 | HYBRID | Zyun-icirou | E
LS125 | HYBRID | Zyun'iciroh | E
LS126 | HYBRID | Zyuniciroh | E
LS127 | HYBRID | Zyun-iciroh | E
LS128 | HYBRID | Zyun'iciro | E
LS129 | HYBRID | Zyuniciro | E
LS130 | HYBRID | Zyun-iciro | E
LS131 | HYBRID | Zyun'icirô | E
LS132 | HYBRID | Zyunicirô | E
LS133 | HYBRID | Zyun-icirô | E
LS134 | HYBRID | Jyun'ichirō | E
LS135 | HYBRID | Jyunichirō | E
LS136 | HYBRID | Jyun-ichirō | E
LS137 | HYBRID | Jyun'ichiroo | E
LS138 | HYBRID | Jyunichiroo | E
LS139 | HYBRID | Jyun-ichiroo | E
LS149 | HYBRID | Jyun'ichirô | E
LS150 | HYBRID | Jyunichirô | E
LS151 | HYBRID | Jyun-ichirô | E
LS152 | HYBRID | Jyun'itirō | E
LS153 | HYBRID | Jyunitirō | E
LS154 | HYBRID | Jyun-itirō | E
LS155 | HYBRID | Jyun'itiroo | E
LS156 | HYBRID | Jyunitiroo | E
LS157 | HYBRID | Jyun-itiroo | E
LS158 | HYBRID | Jyun'itirou | E
LS160 | HYBRID | Jyun-itirou | E
LS161 | HYBRID | Jyun'itiroh | E
LS163 | HYBRID | Jyun-itiroh | E
LS164 | HYBRID | Jyun'itiro | E
LS166 | HYBRID | Jyun-itiro | E
LS167 | HYBRID | Jyun'itirô | E
LS168 | HYBRID | Jyunitirô | E
LS169 | HYBRID | Jyun-itirô | E
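
A table like this can serve as a lookup index that maps any attested spelling back to the standard form. The sketch below is a hypothetical illustration using five rows copied from the table. The names `Variant`, `INDEX`, and `normalize` are invented for the example, and it assumes that rank A marks the most common spellings and E the rarest, which the text implies but does not define.

```python
from typing import NamedTuple, Optional

class Variant(NamedTuple):
    ls_id: str
    type: str
    romanization: str
    rank: str

# A few sample rows from the table above.
ROWS = [
    Variant("LS038", "VARIANT", "Junichiro", "A"),
    Variant("LS002", "HEPBURN", "Jun'ichirō", "A"),
    Variant("LS033", "VARIANT", "Junichirou", "B"),
    Variant("LS075", "HYBRID", "Junitiro", "C"),
    Variant("LS003", "KUNREI", "Zyun'itirô", "E"),
]
CANONICAL = "Jun'ichirō"  # the Revised Hepburn form

# Case-insensitive index from spelling to table row.
INDEX = {row.romanization.casefold(): row for row in ROWS}

def normalize(query: str, max_rank: str = "E") -> Optional[str]:
    """Return the canonical form if query is a known variant within max_rank."""
    row = INDEX.get(query.casefold())
    if row is None or row.rank > max_rank:
        return None
    return CANONICAL

print(normalize("junitiro"))                # Jun'ichirō
print(normalize("Junitiro", max_rank="B"))  # None
```

Restricting `max_rank` trades recall for precision: a search engine might accept every rank, while a stricter matcher could limit itself to the higher ranks.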