Jack Halpern


   Data Licensing

CJK Lexical Resources

The CJK Dictionary Institute, specializing in CJK computational lexicography, is engaged in the continuous expansion of comprehensive CJK lexical databases. Currently, these databases cover about 50 million entries in Japanese, Chinese (both simplified and traditional), Korean and Arabic, and include a rich set of grammatical and semantic attributes. These resources are used by major portals and software developers in a wide variety of applications, including named entity recognition, machine translation, anti-money laundering and risk management, and morphological analysis.

  • Arabic lexical resources
    Proper nouns -- especially personal names -- and their variants.

  • Chinese lexical resources
    Both Simplified Chinese (SC) and Traditional Chinese (TC) data, covering general vocabulary, technical terms, and proper nouns.

  • Japanese lexical resources
    Over seven million entries, including technical terminology, proper nouns and variants, a phonetic database, and lexical and morphological databases.

  • Korean lexical resources
    Extensive proper noun data (both personal and place names), as well as a Korean Lexical Database.

  • Other lexical resources
    CJKI also maintains resources for Persian, Spanish and Vietnamese.