Humanities and law research

On the corpus of speech samples with errors in the use of Russian as a foreign language: methods of data representation and deep markup parameters

https://doi.org/10.37493/2409-1030.2022.4.17

Abstract

The study reported in this article aims to determine the optimal composition and presentation of data in a corpus, currently under development, of Russian speech samples containing errors made by foreign students. Such a corpus is needed for two reasons: first, erroneous linguistic expressions call for scientific description, since all significant facts of language use are now being described; second, linguodidactics requires a unified database of systematized data on errors in the speech of learners of Russian. Creating such a corpus requires an in-depth description of speech errors. The article therefore proposes describing an erroneous linguistic expression as a violation of a particular language norm, that is, of the semantic, morphological, syntactic or lexical language model underlying the normatively correct expression, and recording the type of speech activity, the speech situation, and the student's native language and specialty. For the purposes of corpus creation, an error is understood as a failure at a particular level of speech generation, so the error description model builds on the model for describing linguistic expressions that Russian researchers developed when creating the explanatory-combinatorial dictionary. The proposed model of deep annotation of erroneous expressions includes schematized models of the semantic representation and of the syntactic and lexical compatibility of a linguistic expression (depending on the nature of the error). It is intended both to localize the error in language use precisely and to serve as educational material in linguodidactics.
It is concluded that once a statistically significant number of annotated samples of erroneous Russian speech by foreign students has been collected, such corpora can serve as a source of empirical data for a comprehensive scientific description of the facts of linguistic reality. It is also concluded that, to be viable, the proposed corpus must be an open system that allows new description parameters to be added to the deep annotation.
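
The annotation parameters listed in the abstract can be pictured as a simple record. The following is only a minimal sketch, not the authors' actual schema: the class and field names (`NormLevel`, `ErrorAnnotation`, `extra`, `count_by_norm`), the sample sentence and all metadata values are invented for illustration. The `extra` dictionary is one assumed way to keep the record open to new description parameters.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class NormLevel(Enum):
    """Language model whose norm the erroneous expression violates."""
    SEMANTIC = "semantic"
    MORPHOLOGICAL = "morphological"
    SYNTACTIC = "syntactic"
    LEXICAL = "lexical"


@dataclass
class ErrorAnnotation:
    """One annotated sample (hypothetical layout, not the authors' schema)."""
    erroneous_text: str      # expression as produced by the learner
    corrected_text: str      # normatively correct counterpart
    violated_norm: NormLevel
    speech_activity: str     # e.g. "writing" or "speaking"
    speech_situation: str
    native_language: str
    specialty: str
    # Open-ended slot so new description parameters can be added later.
    extra: Dict[str, str] = field(default_factory=dict)


def count_by_norm(samples: List[ErrorAnnotation]) -> Dict[NormLevel, int]:
    """Count samples per violated norm level."""
    counts: Dict[NormLevel, int] = {}
    for s in samples:
        counts[s.violated_norm] = counts.get(s.violated_norm, 0) + 1
    return counts


# Invented example: locative case used where the accusative is required.
sample = ErrorAnnotation(
    erroneous_text="Я иду в школе",
    corrected_text="Я иду в школу",
    violated_norm=NormLevel.MORPHOLOGICAL,
    speech_activity="writing",
    speech_situation="classroom essay",
    native_language="Chinese",
    specialty="engineering",
)
sample.extra["year_of_study"] = "1"  # a later-added, hypothetical parameter
```

Keeping the fixed parameters as typed fields and everything else in `extra` is only one possible design; the schematized semantic, syntactic and lexical models the article describes would need a richer structure than plain strings.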

About the Authors

S. V. Gusarenko
North-Caucasus Federal University
Russian Federation

Sergey V. Gusarenko – Doctor of Philology, Professor, Chair of Russian Language as a Foreign Language

Address: 1, Pushkin st., 355017, Stavropol, Russian Federation



M. K. Gusarenko
North-Caucasus Federal University
Russian Federation

Marina K. Gusarenko – PhD in Philology, Associate Professor, Chair of Romance & Germanic Languages and Linguodidactics

Address: 1, Pushkin st., 355017, Stavropol, Russian Federation



References

1. Apresian Iu. D. Tipy informacii dlja poverhnostno-semanticheskogo komponenta modeli «Smysl↔Tekst» (Types of information for the surface-semantic component of the model «Meaning ↔ Text»). Vienna: Wiener Slavistischer Almanach Publ., 1980. 119 p. (In Russian)

2. Apresian Iu. D., Boguslavskii I. M., Iomdin B. L., Iomdin L. L., Sannikov A. V., Sannikov V. Z., Sizov V. G., Tsinman L. L. Sintaksicheski i semanticheski annotirovannyj korpus russkogo jazyka: sovremennoe sostojanie i perspektivy (Syntactically and semantically annotated corpus of the Russian language: current state and prospects). In: Natsional’nyi korpus russkogo iazyka: 2003-2005 (rezul’taty i perspektivy). Moscow: Indrik Publ., 2005. P. 193–214. (In Russian)

3. Baranchikova A. D., Speranskaia A. N. Dinamika glagol’noj sochetaemosti substantiva kljatva po dannym Nacional’nogo korpusa russkogo jazyka (Dynamics of verbal compatibility of the substantive oath according to the National Corpus of the Russian language) // Mir russkogo slova. 2019. No. 2. P. 19–23. (In Russian)

4. Grudeva E. V., Buchilova I. A., Volkova N. A. Korpusy oshibok: celevaja auditorija, vozmozhnaja arhitektura korpusa (Error corpora: target audience, possible corpus architecture) // Vestnik Cherepovetskogo gosudarstvennogo universiteta. 2018. No. 5 (86). P. 63–72. (In Russian)

5. Zolotov P. Iu. Lingvodidakticheskie svojstva korpusnyh tehnologij (Linguodidactic properties of corpus technologies) // Vestnik Tambovskogo universiteta. Seriia: Gumanitarnye nauki. 2020. Vol. 25. No. 185. P. 75–82. (In Russian)

6. Korpus russkih uchebnyh tekstov. URL: https://ling.hse.ru/krut (Accessed: 25.10.2021)

7. Liashevskaia O. N., Kashkin E. V. Tipy informacii o leksicheskih konstrukcijah v sisteme FrejmBank (Types of information about lexical constructions in the FrameBank system) // Trudy instituta russkogo iazyka im. V. V. Vinogradova. Moscow: Inst. RIa im. V. V. Vinogradova RAN Publ. 2015. No. 6. P. 464–555. (In Russian)

8. Liashevskaia O. N. Korpusnye instrumenty v grammaticheskih issledovanijah russkogo jazyka (Corpus tools in grammatical studies of the Russian language). Moscow: Izdatel’skii Dom IaSK Publ., 2016. 520 p. (In Russian)

9. Mel’chuk I. A. Opyt teorii lingvisticheskih modelej «smysl↔tekst». Semantika. Sintaksis (The experience of the theory of linguistic models «meaning ↔ text». Semantics. Syntax). Moscow: Nauka Publ., 1974. 314 p. (In Russian)

10. Mel’chuk I. A., Zholkovskii A. K. Tolkovo-kombinatornyj slovar’ russkogo jazyka. Opyty semantiko-sintaksicheskogo opisanija russkoj leksiki (Explanatory-combinatorial dictionary of the Russian language. Experiments of semantic and syntactic description of Russian vocabulary). Vienna: Wiener Slavistischer Almanach Publ., 1984. 992 p. (In Russian)

11. Mel’chuk I. A. Russkij jazyk v modeli «smysl↔tekst» (The Russian language in the «meaning ↔ text» model). Moscow – Vienna: Shkola «Iazyki russkoi kul’tury», Venskii slavisticheskii al’manakh Publ., 1995. 682 p. (In Russian)

12. Rezanova Z. I., Vesnina G. Iu. Podkorpus russkoj rechi bilingvov lingvisticheskogo korpusa «Tomskij regional’nyj tekst»: principy razmetki i metarazmetki korpusa (Subcorpus of Russian speech of bilinguals of the linguistic corpus «Tomsk regional text»: principles of markup and meta-markup of the corpus) // Voprosy leksikografii. 2016. No. 1. P. 29–39. (In Russian)

13. Russkij uchebnyj korpus RLC. URL: https://web-corpora.net/RLS/ (Accessed: 25.10.2021)

14. FrameNet Project. URL: https://framenet.icsi.berkeley.edu/fndrupal/ (Accessed: 25.10.2021)


For citations:


Gusarenko S.V., Gusarenko M.K. On the corpus of speech samples with errors in the use of Russian as a foreign language: methods of data representation and deep markup parameters. Humanities and law research. 2022;9(4):650-658. (In Russ.) https://doi.org/10.37493/2409-1030.2022.4.17


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2409-1030 (Print)