Objective: Use computational language technology to better understand code-switching in spoken and written language. Use cognitive linguistic insights on code-switching to improve multilingual language generation systems.
Researchers: Safaa Shehadi, Yuli Zeira, Doreen Osmelak, Dean Geckt, Shuly Wintner. In collaboration with Yuval Nov (University of Haifa), Melinda Fricke (University of Pittsburgh) and Yulia Tsvetkov (University of Washington).
Status: Ongoing
Funding: United States-Israel Binational Science Foundation grant no. 2019785; United States National Science Foundation grants no. 2007960, 2007656, 2125201, and 2040926.
When bilingual speakers interact with each other, they almost inevitably engage in code-switching (CS): moving from one language to another, between and within utterances. Much research shows that this behavior is cognitively taxing; why, then, is it so ubiquitous? The overarching goal of this research is to expand our understanding of CS and the factors that contribute to it. We will pose fundamental research questions regarding CS that have not previously benefited from large-scale, data-driven research; base the investigation on large, multimodal (spoken and written), naturally occurring (as opposed to lab-induced) linguistic data in multiple language pairs; and use state-of-the-art and novel computational methodologies to explore this phenomenon.
The implementation and analysis code for the MapTask experiments are available on GitHub. Datasets and code for Arabizi and Denglisch are also available. Spanish-English datasets and code for the cognatehood experiments are on GitHub.
Dean Geckt, Melinda Fricke and Shuly Wintner. Strategies of code-switching in human–machine dialogs. Bilingualism: Language and Cognition, to appear. 📖
Shuly Wintner, Safaa Shehadi, Yuli Zeira, Doreen Osmelak and Yuval Nov. Shared Lexical Items as Triggers of Code Switching. Transactions of the Association for Computational Linguistics 11, pages 1471–1484, December 2023. 📖
Doreen Osmelak and Shuly Wintner. The Denglisch corpus of German-English code-switching. Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 42–51, May 2023. 📖
Safaa Shehadi and Shuly Wintner. Identifying Code-switching in Arabizi. Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 194–204, December 2022. 📖
Alissa Ostapenko, Shuly Wintner, Melinda Fricke and Yulia Tsvetkov. Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 3853–3867, May 2022. 📖