From Digitized Literature to Language Corpuses: Significance, Prospects and Challenges
DOI: https://doi.org/10.52027/18294685-hga2023.sp
Keywords: linguistic database, digital heritage
Abstract
Digitizing natural-language texts implies, first of all, the systematic storage of extra-linguistic data and knowledge in a form suited to natural language processing. Depending on how language is documented, texts are either oral (auditory) or written (visual). Because a natural language and its oral and written texts carry, in addition to facts, the historical-cultural, value-oriented, and linguistic realities of a people, transforming digitized texts into linguistic databases, or corpora, is invaluable for effective management in support of state development, international integration, and security. Armenian is a medium- or low-resource language, and in that sense its development requires substantial personnel and software resources. The report discusses several dimensions of solving this problem, including database design, continuous corpus construction, parallelization with other languages, and descriptive markup.
