The CJK Dictionary Institute, Inc. - Variation in Traditional Chinese Orthography

Variation in Traditional Chinese Orthography

by Jack Halpern

Introduction

As a result of the large-scale language reforms undertaken in the PRC in the postwar period, thousands of character forms underwent drastic simplifications. Chinese written in these simplified forms is called Simplified Chinese (SC). Taiwan and Hong Kong, and most overseas Chinese, did not follow the path of simplification. Taiwan, in particular, has adhered fairly strictly to the traditional forms, known as Traditional Chinese (TC).

Traditional Chinese Variants

Traditional Chinese as used in Taiwan and Hong Kong does not have a stable orthography. Dozens of characters have variant forms, some of which are shown below:

Variant Forms in Big Five
Pinyin Normal Variant English
li3 inside
tai2 platform
hui3 destroy
yan4 gloss
chun2 lip


Variant Forms Not in Big Five
Pinyin Big5 Non-Big5 English
wei4 for
jiao1 teach
zhe0 particle
hui2 turn
chan3 produce

Considerable confusion prevails, and comprehensive Chinese dictionaries often disagree on their choice of the standard form. There are several reasons for the existence of such variants:

  1. Some TC forms, like 綫, are not available in the Big Five character set.
  2. Some forms have coexisted historically, and usage depends on personal preference or editorial policy.
  3. Some glyphs are unavailable in certain fonts.
  4. Simplified character forms, which are often identical to the corresponding SC forms, are used, especially in handwriting. For example, 体 is often used instead of the standard 體.
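Reason 1 can be checked mechanically. The sketch below assumes that Python's built-in `big5` codec is an adequate stand-in for the Big Five character set and reports which characters of a string cannot be encoded in it. The helper names `in_big5` and `non_big5_chars` are hypothetical. Results may differ under the related `cp950` and `big5hkscs` codecs, which extend Big Five.

```python
def in_big5(ch: str) -> bool:
    """Return True if the character can be encoded with Python's 'big5' codec."""
    try:
        ch.encode("big5")
        return True
    except UnicodeEncodeError:
        return False


def non_big5_chars(text: str) -> list[str]:
    """List the distinct characters of text outside Big Five, in order of first appearance."""
    found: list[str] = []
    for ch in text:
        if not in_big5(ch) and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # Mixed input: the SC form 线 alongside the TC form 線.
    print(non_big5_chars("毛线 毛線"))
```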

Processing Chinese text for applications such as machine translation (MT), information retrieval (IR), and TC-to-SC conversion requires a normalization algorithm, backed by hard-coded tables that map TC variants to their canonical forms, for operations such as dictionary lookup and indexing.
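A minimal sketch of such table-driven normalization, assuming a simple one-to-one character mapping: every variant in the query is replaced by its canonical form before lookup, so variant spellings reach the same dictionary entry. The table `VARIANT_TO_CANONICAL`, the lexicon `LEXICON`, and the functions `normalize_tc` and `lookup` are hypothetical and illustrative, not a real dataset or API.

```python
from typing import Optional

# Hypothetical variant table: maps each TC variant to the canonical form used
# for lookup and indexing. Entries are illustrative only.
VARIANT_TO_CANONICAL = {
    "体": "體",  # simplified form common in handwriting -> standard TC form
    "綫": "線",  # variant outside Big Five -> Big Five form
}
# str.maketrans builds a code-point mapping that str.translate applies in one pass.
_NORMALIZE = str.maketrans(VARIANT_TO_CANONICAL)


def normalize_tc(text: str) -> str:
    """Replace each known TC variant with its canonical form; other characters pass through."""
    return text.translate(_NORMALIZE)


# Hypothetical lexicon keyed only by canonical forms.
LEXICON = {"身體": "body", "毛線": "knitting wool"}


def lookup(word: str) -> Optional[str]:
    """Normalize the query first so that variant spellings find the canonical entry."""
    return LEXICON.get(normalize_tc(word))


if __name__ == "__main__":
    print(lookup("身体"), lookup("毛綫"))
```

Keying the lexicon only by canonical forms keeps one entry per word; the same `normalize_tc` call can be applied when building an index so that documents and queries meet on the same form.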

Varieties of Traditional Chinese

The traditional character forms were not entirely abandoned in the PRC. They are still used in some classical literature, in newspapers for overseas Chinese, on name cards, and elsewhere. The Chinese government has published a standard that defines Traditional Chinese as used in the PRC by mapping the SC forms to their corresponding TC forms. However, these mappings do not necessarily agree with those widely used in Taiwan and Hong Kong, as can be seen in the table below. We will refer to the variety of Traditional Chinese used in the PRC as "Simplified Traditional Chinese" (STC), and to that used in Taiwan and Hong Kong as "Traditional Traditional Chinese" (TTC).

In SC-to-TC conversion, both the TC variants and the STC/TTC differences must be taken into account, either by preprocessing the input or by postprocessing the output into the desired flavor of TC. Below are some examples of STC-to-TTC mappings:

STC to TTC Mappings
Pinyin SC STC TTC
xian4 线
beng1
bo1
ce4
ma4
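The postprocessing approach can be sketched as a two-stage pipeline: convert SC to STC first, then rewrite STC forms into TTC only when that flavor is requested. The tables `SC_TO_STC` and `STC_TO_TTC` and the function `sc_to_tc` are hypothetical. As an illustrative assumption, they encode the xian4 row as STC 綫 versus TTC 線. Real SC-to-TC conversion is often one-to-many and needs word-level context, which this character-by-character sketch ignores.

```python
# Hypothetical stage-1 table: SC -> STC (TC forms as defined by the PRC standard).
# Character-by-character only; one-to-many SC forms are not handled.
SC_TO_STC = str.maketrans({"线": "綫", "体": "體"})

# Hypothetical stage-2 table: STC -> TTC postprocessing.
# Assumption for illustration: STC writes 綫 where TTC writes 線.
STC_TO_TTC = str.maketrans({"綫": "線"})


def sc_to_tc(text: str, flavor: str = "TTC") -> str:
    """Convert SC text to STC, then postprocess into TTC if that flavor is requested."""
    stc = text.translate(SC_TO_STC)
    if flavor == "STC":
        return stc
    if flavor == "TTC":
        return stc.translate(STC_TO_TTC)
    raise ValueError(f"unknown flavor: {flavor!r}")


if __name__ == "__main__":
    print(sc_to_tc("线", "STC"), sc_to_tc("线", "TTC"))
```

Keeping the STC-to-TTC step as a separate table means the same SC-to-STC data can serve both target flavors.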