Formation of text corpus and frequency definition for the words in the Arabic language: problems and solutions
Abstract
Although corpus building is better developed for the Indo-European languages, including Russian, than for other languages, Arabic in particular, the problem is still far from a final solution. The article examines problems and solutions in building an Arabic corpus from Internet material and other available sources, and identifies the principles of data selection. It also considers the results of compiling a frequency dictionary of Arabic, as well as peculiarities of Arabic phonology, morphology and script, and of stress in Arabic. The article includes a list of the most common Arabic words with their frequency indices. Refs 6. Tables 1.
Keywords:
Arabic, corpus, computer, data, processing, frequency, dictionary
License
Articles in "Vestnik of Saint Petersburg University. Asian and African Studies" are open access and are distributed under the terms of the License Agreement with Saint Petersburg State University, which permits authors unrestricted distribution and self-archiving free of charge.