Transformer-based automatic Arabic text diacritization

Authors

  • Ali Assad, Wasit University, Iraq
  • Abdul Hadi M. Alaidi, Wasit University, Iraq
  • Amjad Yousif Sahib, Wasit University, Iraq
  • Haider TH. Salim ALRikabi, Wasit University, Iraq
  • Ahmed Magdy, Suez Canal University, Egypt
  • Ahmed Magdy Suez Canal University, Egypt

DOI:

https://doi.org/10.37868/sei.v6i2.id305

Abstract

In Arabic natural language processing (NLP), automatic text diacritization remains a major challenge, and progress has been slow compared to other language processing tasks. This work proposes automatic diacritical marking of Arabic text using the first transformer-based model designed solely for this task. By exploiting the attention mechanism, the system captures more of the innate patterns of Arabic, surpassing both rule-based alternatives and neural network techniques. The model trained on the Clean-50 dataset achieved a diacritic error rate (DER) of 2.03%, while the model trained on the Clean-400 dataset achieved a DER of 1.37%. Compared to state-of-the-art results, the improvement on Clean-50 is minimal; on the larger Clean-400 dataset, however, it is notable, indicating that this approach can deliver more accurate solutions for applications requiring precise diacritical marks when larger training sets are available. Furthermore, when given extended input text with overlapping windows, this method achieves a DER of 1.21% on the Clean-400 dataset.
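The overlapping-window idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the window and overlap sizes are hypothetical, and the merge rule (drop the first `overlap` predictions of every window after the first) is one common choice for stitching windowed sequence predictions back together.

```python
def overlapping_windows(chars, window=400, overlap=50):
    """Split a long character sequence into overlapping windows.

    Hypothetical sizes; the paper's actual window/overlap values are not
    given in the abstract. Each window shares `overlap` characters with
    its predecessor so predictions near a window edge still see context.
    """
    step = window - overlap
    windows = []
    start = 0
    while start < len(chars):
        windows.append(chars[start:start + window])
        if start + window >= len(chars):
            break
        start += step
    return windows


def merge_predictions(window_preds, overlap=50):
    """Stitch per-window diacritic predictions into one sequence.

    Keeps the first window whole, then discards the first `overlap`
    (context-only) predictions of every subsequent window.
    """
    merged = list(window_preds[0])
    for preds in window_preds[1:]:
        merged.extend(preds[overlap:])
    return merged
```

For example, with `window=4` and `overlap=2`, a 10-character input yields windows covering positions 0-3, 2-5, 4-7, and 6-9, and merging the (hypothetical) per-window predictions reconstructs one label per input character.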

Published

2024-11-29

How to Cite

[1]
A. Assad, A. H. M. Alaidi, A. Y. Sahib, H. T. S. ALRikabi, and A. Magdy, “Transformer-based automatic Arabic text diacritization”, Sustainable Engineering and Innovation, vol. 6, no. 2, pp. 285-296, Nov. 2024.

Section

Articles