Analysis and Retraining of the BERT Model for the Uzbek Language: Methods and Results
Abstract
This paper discusses the use of the BERT model for processing texts in the Uzbek language. BERT (Bidirectional Encoder Representations from Transformers), one of the most widely used models in natural language processing (NLP), has shown strong performance across many languages. The study analyzes the main aspects of adapting BERT to the Uzbek language, including data collection and preparation, model training, and evaluation of its performance.
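For illustration only, the adaptation pipeline summarized above (data preparation, training, evaluation) might look roughly like the following sketch. It assumes the Hugging Face transformers and datasets libraries, the bert-base-multilingual-cased checkpoint as a starting point, and a hypothetical local corpus file uzbek_corpus.txt; none of these specifics, nor the hyperparameters shown, are taken from the paper itself.

# Minimal sketch: continued masked-language-model training of multilingual BERT
# on Uzbek text (assumed setup, not the authors' exact configuration).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from multilingual BERT, which already covers some Uzbek subwords.
checkpoint = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Data collection and preparation: load a raw text corpus (one sentence per line)
# and tokenize it. "uzbek_corpus.txt" is a placeholder file name.
raw = load_dataset("text", data_files={"train": "uzbek_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Masked-language-modeling objective, as in the original BERT pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-uzbek",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)

trainer.train()                      # continued pre-training on Uzbek text
trainer.save_model("bert-uzbek")     # save the adapted checkpoint

Evaluation would then be a separate step, for example measuring masked-token prediction quality on a held-out Uzbek set or fine-tuning the adapted checkpoint on a labeled downstream task; the paper's own evaluation protocol is not reproduced here.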
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.