oai:arXiv.org:2411.01523
Computer Science
2024
11/6/2024
We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding.
SinaTools is a unified package that can be integrated into existing system workflows, offering solutions for various tasks such as flat and nested Named Entity Recognition (NER), fully-fledged Word Sense Disambiguation (WSD), Semantic Relatedness, Synonymy Extraction and Evaluation, Lemmatization, Part-of-speech Tagging, and Root Tagging, as well as helper utilities such as corpus processing, text stripping methods, and diacritic-aware word matching.
This paper presents SinaTools and its benchmarking results, demonstrating that SinaTools outperforms all similar tools on the aforementioned tasks, including Flat NER (87.33%), Nested NER (89.42%), WSD (82.63%), Semantic Relatedness (0.49 Spearman rank correlation), Lemmatization (90.5%), and POS tagging (97.5%).
SinaTools can be downloaded from https://sina.birzeit.edu/sinatools.
Comment: 10 pages, 3 figures
Hammouda, Tymaa; Jarrar, Mustafa; Khalilia, Mohammed, 2024, SinaTools: Open Source Toolkit for Arabic Natural Language Processing