Database of Arab Names
قاعدة بيانات الأسماء العربية
DAN at a Glance
- 6.5 million validated Arabic name variants.
- Based on over 25 million source names.
- Constantly updated and expanded.
- Proofread by native Arab editors.
- Validated against web and corpora.
- Fully vocalized Arabic.
- Web-based frequency statistics.
- Support for various romanization systems.
- Attributes such as type and gender.
- Support for OFAC SDNs and aliases.
- Supports various non-English language systems.
Applications
- Information retrieval and query processing.
- Entity recognition and extraction.
- Machine translation.
- Compliance and risk management.
- Anti-money laundering and fraud detection.
- Anti-terror and immigration control.
Now covers over 6.5 million names and variants
CJKI's Database of Arab Names (DAN) v3.0 covers over 6.5 million entries: Arabic personal names and name variants mapped to the original Arabic script, along with a wide range of supplementary information. Based on authoritative linguistic resources, DAN was compiled by a team of native Arabic-speaking editors and includes numerous orthographic variants as well as attributes such as web frequency, name type codes, and normalized forms.
DAN plays an important role in helping software developers, especially of security applications and natural language processing tools, enhance their technology by enabling named entity recognition and extraction, machine translation, variant normalization, and information retrieval of Arabic names.
Constantly expanding coverage
DAN contains a large and constantly growing collection of romanized Arabic name variants mapped to the original Arabic script. It continues to undergo extensive expansion and proofreading.
Several years have passed since March 2008, when CJKI released DAN v2.0 with coverage of approximately 1.5 million entries. Since then, our team of editors and programmers has been working vigorously on further expansion and validation, and version 3.0 now covers over 6.5 million validated entries.
In addition to comprehensive coverage, DAN offers unique features such as manual vocalization of every Arabic name, support for various romanization systems, and validation of all romanized variants based on their frequency of occurrence. The database contains a web frequency statistic for each of the millions of variants. Augmenting DAN with frequency data from relevant lexical resources makes it more effective at distinguishing names from non-names: with this data, DAN can be used to estimate the likelihood that an arbitrary romanized Arabic string is actually a name.
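As a rough illustration of how per-variant frequency data might feed a name-likelihood estimate, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the `SAMPLE_VARIANTS` table, the invented counts, and the `name_score` function are hypothetical and do not reflect DAN's actual schema, data, or scoring method.

```python
import math

# Hypothetical sample records: romanized variant -> (Arabic form, web frequency).
# The layout and the counts are invented for illustration; they are NOT DAN data.
SAMPLE_VARIANTS = {
    "muhammad": ("\u0645\u062d\u0645\u062f", 1_000_000),
    "mohammed": ("\u0645\u062d\u0645\u062f", 900_000),
    "mohamed": ("\u0645\u062d\u0645\u062f", 800_000),
    "muhamad": ("\u0645\u062d\u0645\u062f", 50_000),
}


def name_score(candidate, variants=SAMPLE_VARIANTS):
    """Return a log-scaled frequency score in [0, 1].

    0.0 means the candidate is not in the table; 1.0 means it is the
    most frequent variant in the table.
    """
    entry = variants.get(candidate.strip().lower())
    if entry is None:
        return 0.0
    max_freq = max(freq for _, freq in variants.values())
    # Log scaling keeps rare but attested variants well above zero.
    return math.log1p(entry[1]) / math.log1p(max_freq)


if __name__ == "__main__":
    for s in ("Muhammad", "muhamad", "xyzzy"):
        print(s, round(name_score(s), 3))
```

A real system would likely combine such a score with context and with a threshold tuned to the application; the log scaling here is only one plausible design choice.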
DAN also has both vocalized and unvocalized versions of the Arabic names, and sometimes multiple vocalizations for the same name. Full and accurate diacritics are provided, even such relatively rare ones as alif-wasla and dagger alif. This is not only of academic interest, but is also a practical means to ensure that romanized versions of great accuracy and variety can be provided.
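The relationship between vocalized and unvocalized forms can be sketched in a few lines of Python. This is a generic illustration using Unicode code points, not DAN's own procedure; the function name `unvocalize` and the choice to map alif-wasla to a plain alif are assumptions.

```python
import re

# Arabic combining marks: tanwin, short vowels, shadda, sukun (U+064B..U+0652)
# plus the dagger (superscript) alif (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

ALIF_WASLA = "\u0671"  # alif with wasla sign
PLAIN_ALIF = "\u0627"  # bare alif


def unvocalize(text):
    """Strip vowel marks and replace alif-wasla with a plain alif."""
    return DIACRITICS.sub("", text).replace(ALIF_WASLA, PLAIN_ALIF)


if __name__ == "__main__":
    # Fully vocalized "Muhammad": mim, damma, ha, fatha, mim, shadda, fatha, dal
    vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
    print(unvocalize(vocalized))
```

Going in the other direction, from unvocalized to vocalized, cannot be done mechanically in this way, which is why manually vocalized entries are valuable for producing accurate romanizations.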
DAN is available as a standalone database, or it can be paired with our Database of Arab Names in Arabic (DANA), which contains orthographic variants of the canonical, fully sanitized Arabic name.