The CJK Dictionary Institute

Principal Chinese Lexical Resources

Our comprehensive Chinese lexical resources, covering both Simplified Chinese (SC) and Traditional Chinese (TC), currently contain about three million entries covering general vocabulary, technical terms, proper nouns, company names, etc., used in such applications as machine translation (MT), information retrieval (IR), and input method editors (IME). Statistics on some of our Chinese resources are available here.



  1. News Flash: Announcing our Chinese-English Database of Technical Terminology, covering approximately five million entries in some 70 domains of science, technology, and finance.

  2. News Flash: CJKI announces the release of its one-of-a-kind Simplified Chinese-English Dictionary, covering over 700,000 entries of general vocabulary, technical terms, and important proper nouns.

  3. News Flash: Chinese-Japanese Database of Technical Terminology undergoes major expansion and now covers about 750,000 entries in 20 domains of science and technology.

  4. News Flash: Hanzi-Pinyin Transcription System, supported by large-scale databases containing millions of entries, enables accurate transcription from Hanzi to Pinyin, Zhuyin or Cantonese in some fifteen major and many minor romanization systems.

  5. Major Expansion: Chinese Name Variants Database underwent major expansion. CJKI's proper noun databases contain millions of names and name variants. For Chinese this covers eight romanization systems with their numerous variants.

  6. Comprehensive Chinese Lexical Database: Contains nearly 500,000 SC and 500,000 TC entries covering general vocabulary and personal names, with special focus on NLP applications such as segmentation, information retrieval and entity extraction. This represents a major contribution to the field of Chinese information processing.

  7. Chinese Proper Nouns: This comprehensive, bilingual database covers some two million SC and TC personal names, place names, company names, and others. Combining it with our Chinese Lexical Database offers a package of unsurpassed effectiveness for Chinese language technology applications.

  8. News Flash: Multilingual Database of Proper Nouns, covering Simplified Chinese, Traditional Chinese, Japanese, English and Korean, has been expanded to include worldwide coverage for Arabic.

  9. Newly Updated: English-to-Simplified Chinese Dictionary. An English-Chinese dictionary of over 80,000 entries (expandable to 100,000) covering general vocabulary and important proper names.

  10. Expanded: English-Chinese Bidirectional Dictionary of Computer Terms. A dictionary of Simplified (SC) and Traditional (TC) Chinese computer terms. Constantly updated to include very recent terms, this dictionary contains over 100,000 entries.

  11. Expanded: A Japanese-Chinese-English Multilingual Database of Computer Terms covering both SC and TC.

  12. Expanded: Dictionary of Chinese-English Neologisms. A comprehensive, up-to-date database of Chinese neologisms maintained by our team of Chinese editors.

  13. Chinese Morphological Database: Our database of Chinese derivational affixes is ideal for MT, NLP and information retrieval applications, as it enables recognition of unknown compounds and entity extraction.

  14. Chinese Frequency Statistics: A comprehensive database of Chinese lexical statistics, such as frequency of occurrence of words and characters, based on large-scale corpora, suited for NLP applications and mobile phone input methods.

  15. Taiwan and PRC Pinyin differences: A large-scale database of Chinese pinyin readings (about 2.6 million items). Especially noteworthy are the differences in pronunciation between Taiwan and the People's Republic of China.

  16. English-to-Traditional Chinese Dictionary: An English-Chinese dictionary of some 80,000 entries covering general vocabulary and important proper names.

  17. SC-to/from-TC mapping tables: Contains orthographic and lexemic mapping tables to support sophisticated Simplified-to-Traditional Chinese script conversion, including hundreds of thousands of proper nouns.

  18. Lexemic mapping tables: Contains a subset of all the lexemic mappings in the main lexeme mapping table. The mappings given here are bidirectional: both SC-to-TC and TC-to-SC are equally valid.

  19. Japanese Names in Chinese: Covers about 106,000 Japanese proper names written in Simplified Chinese, including common personal and place names, as well as many rare ones.
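
Items 17 and 18 distinguish orthographic (character-level) mappings from lexemic (word-level) mappings for SC-to-TC conversion. The sketch below illustrates one plausible way such tables could be combined: try the longest lexemic match first, then fall back to character-by-character conversion. It is only an illustration under stated assumptions. The tiny tables, the function name `sc_to_tc`, and the longest-match strategy are all hypothetical and are not taken from CJKI's actual data or software.

```python
# Toy tables for illustration only (hypothetical, not CJKI data).
# Lexemic: whole words whose Traditional Chinese equivalent is a different word.
LEXEMIC_SC_TO_TC = {
    "软件": "軟體",  # "software"
    "信息": "資訊",  # "information"
}

# Orthographic: one-to-one character form conversions.
ORTHOGRAPHIC_SC_TO_TC = {
    "软": "軟",
    "体": "體",
    "处": "處",
}


def sc_to_tc(text, lexemic=LEXEMIC_SC_TO_TC, orthographic=ORTHOGRAPHIC_SC_TO_TC):
    """Convert SC text to TC: longest lexemic match first, then per character."""
    max_len = max((len(key) for key in lexemic), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate word starting at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in lexemic:
                out.append(lexemic[chunk])
                i += n
                break
        else:
            # No word-level match: convert the single character,
            # leaving it unchanged if the table has no entry for it.
            ch = text[i]
            out.append(orthographic.get(ch, ch))
            i += 1
    return "".join(out)


print(sc_to_tc("信息处理软件"))
```

With these toy tables, a purely orthographic conversion of 软件 would give 軟件, whereas the lexemic table maps it to the different word 軟體. That difference is why a word-level table is consulted before the character-level one in this sketch.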

 

 
