Objective: To develop a morphologically annotated CHILDES corpus of Hebrew and enrich it with dependency-based grammatical relations.
Researchers: In Haifa, Sheli Kol, Bracha Nir, Anat Prior and Shuly Wintner. This project is a collaboration with a team at Carnegie Mellon University headed by Alon Lavie and Brian MacWhinney, and with Shai Gretz and Alon Itai at the Technion.
Status: Complete
Funding: BSF (grant 2007241).
We propose to develop an accurate, high-quality, syntactically annotated corpus of spontaneous conversational Hebrew in parent-child interactions, based on the existing Hebrew section of CHILDES. We focus specifically on accurately annotating the Hebrew corpora in the CHILDES database with the morphological and syntactic information that is of particular interest and utility to researchers in child language acquisition. We will define a representative, diverse corpus of transcribed Hebrew speech that reflects interactions with children of various ages, and standardize the transcription across the corpus to facilitate computational processing. We will refine an existing morphological analyzer so that it adequately analyzes all tokens in the corpus, and develop techniques for morphological disambiguation so that each token is assigned a unique analysis. We will develop a syntactic annotation scheme for Hebrew and manually annotate a subset of the corpus with syntactic relations. These annotated data will be used to train a state-of-the-art parser, which will then automatically annotate the remainder of the corpus with grammatical relations. Finally, we will use the annotated Hebrew corpus, together with the available English section of CHILDES, to compare the syntactic behavior of children acquiring Hebrew with that of children acquiring English, especially where the linguistic structures of the two languages differ. All the resources developed in this project, including the annotated corpus and the parser, will be distributed through the CHILDES website for the benefit of the entire scientific community.
The annotated corpus, along with the morphological grammar, is available from the main CHILDES repository.
Shai Gretz, Alon Itai, Brian MacWhinney, Bracha Nir, and Shuly Wintner. Parsing Hebrew CHILDES Transcripts. Language Resources and Evaluation, 49(1):107-145, March 2015.
Aviad Albert, Brian MacWhinney, Bracha Nir, and Shuly Wintner. The Hebrew CHILDES Corpus: Transcription and Morphological Analysis. Language Resources and Evaluation, 47(4):973-1005, December 2013.
Sheli Kol, Bracha Nir, and Shuly Wintner. Computational Evaluation of the Traceback Method. Journal of Child Language, 41(1):174-197, January 2014.
Shai Gretz. Syntactic Annotation of the Hebrew CHILDES Corpora. M.Sc. thesis, Technion, May 2013.
Aviad Albert, Brian MacWhinney, Bracha Nir, and Shuly Wintner. A Morphologically Annotated Hebrew CHILDES Corpus. In Proceedings of the Workshop on Computational Models of Language Acquisition and Loss, pages 20-22, Avignon, France, April 2012.
Anat Prior, Shuly Wintner, Brian MacWhinney, and Alon Lavie. Translation ambiguity in and out of context. Applied Psycholinguistics, 32(1):93-111, January 2011.
Bracha Nir, Brian MacWhinney, and Shuly Wintner. A Morphologically-Analyzed CHILDES Corpus of Hebrew. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), pages 1487-1490, Malta, May 2010.
Shuly Wintner. Computational Models of Language Acquisition. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 6008 of Lecture Notes in Computer Science, pages 86-99. Springer, Berlin and Heidelberg, 2010.
Kenji Sagae, Eric Davis, Alon Lavie, Brian MacWhinney, and Shuly Wintner. Morphosyntactic Annotation of CHILDES Transcripts. Journal of Child Language, 37(3):705-729, June 2010.