Date: 2024-07-23 03:00 PM – 03:10 PM
Last modified: 2024-07-05
Abstract
Code-switching (CS) is the alternating use of two or more language varieties within a single conversational episode. While it can occur unintentionally, driven by ease of production or environmental influence, bilinguals also code-switch intentionally to achieve specific purposes, such as rhetorical effect, audience design, and the expression of emotion. This study categorizes the motivations for code-switching and summarizes its uses in two Chinese-English conversational datasets (ASCEND and SEAME). A key aim is to differentiate unintentional from intentional CS in spontaneous speech corpora and to examine how the linguistic structure of each type differs. The study will employ large language models to scale up annotation of the utterances efficiently. The models will be instructed to follow a comprehensive taxonomy of CS motivations and to perform hierarchical classification of a large number of examples under that taxonomy. Annotation accuracy will be assessed by the agreement between AI and human annotators. We anticipate two main findings: 1) CS is used both unintentionally, prompted by ease of production, and intentionally, to serve specific purposes; and 2) CS utterances that differ in intentionality exhibit significant structural differences, including in the length of the switch, part of speech, and syntactic complexity. We hypothesize that intentional code-switching may be more fluent and tend to occur at specific syntactic boundaries (intersentential or tag switches), whereas unintentional code-switching may be more abrupt and tend to occur mid-sentence (intrasentential switches).
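As an illustration of how agreement between AI and human annotators could be quantified, the sketch below computes Cohen's kappa, a chance-corrected agreement score, over two parallel label lists. The abstract does not name an agreement measure, so the choice of kappa, the function name `cohens_kappa`, and the toy labels are all assumptions made for this example, not the study's actual procedure or data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels invented for illustration only (not drawn from ASCEND or SEAME).
human = ["intentional", "unintentional", "intentional", "intentional"]
model = ["intentional", "unintentional", "unintentional", "intentional"]
kappa = cohens_kappa(human, model)
```

With these toy labels the raw agreement is 0.75 but kappa is 0.5, which shows why a chance-corrected score can be more informative than simple accuracy when one label dominates.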