Computational Investigation of Translationese

Project description

Objective: To use insights from Translation Studies for improving the quality of machine translation; and to use computational methodology for corroborating hypotheses of Translation Studies.

Researchers: In Haifa, Noam Ordan, Gennadi Lembersky, Vered Volansky, Naama Twitto, Ehud Alexander Avner, Ella Rabinovich and Shuly Wintner. This project is joint with a team at Bar Ilan University, headed by Moshe Koppel. We also collaborate with Sergiu Nisioi in Bucharest.

Status: Complete

Funding: ISF (grant 137/06); Israel Ministry of Science and Technology.

Abstract

We propose to develop methodologies for improving the quality of (statistical) machine translation (SMT), using novel machine-learning-based text categorization approaches. Our main motivation is research in Translation Studies, which establishes the ontological difference between translated and original texts. We propose to use computational linguistic methods to further explore such differences. We will use machine-learning-based text categorization techniques, informed by features that are motivated by Translation Studies theories, to determine with high accuracy whether a given text is original or a translation. The resulting insights will drive two additional key research directions, improving both the Language Models and the Translation Models used in SMT. The potential contribution of this work is dramatic: we already have preliminary results that show significant improvement in the quality of SMT. This work also carries huge commercial potential.
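
To make the text categorization idea concrete, here is a minimal sketch of an original-vs-translated classifier. It is not the project's actual system: the function-word list, the type-token ratio feature, the nearest-centroid classifier, and the two toy sentences are all assumptions chosen for illustration, and the names FUNCTION_WORDS, TOY_DATA, features, train and classify are hypothetical.

```python
from collections import Counter
import math

# Hypothetical, very short function-word list (an assumption for this sketch).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that",
                  "which", "however", "therefore", "moreover"]

# Invented toy sentences, NOT real corpus data; the labels are illustrative only.
TOY_DATA = [
    ("we met at the station and walked home in the rain", "original"),
    ("however it is the case that the decision of the committee "
     "was therefore accepted", "translated"),
]

def features(text):
    """Map a text chunk to function-word rates plus its type-token ratio."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec.append(len(counts) / n)  # type-token ratio: distinct tokens / all tokens
    return vec

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labelled):
    """labelled: iterable of (text, label) pairs. Returns one centroid per label."""
    by_label = {}
    for text, label in labelled:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(model, text):
    """Return the label whose centroid is closest (Euclidean) to the text's features."""
    vec = features(text)
    return min(model, key=lambda label: math.dist(vec, model[label]))

model = train(TOY_DATA)
```

A call such as classify(model, some_text) then returns "original" or "translated". A realistic experiment would use large corpora with a known direction of translation, a much richer feature set, and a stronger learner; the sketch only shows the overall shape of the pipeline (text, then features, then classifier).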

Resources

A corpus of native, non-native and translated texts; see the detailed description. Please cite Nisioi et al. (2016) if you use this corpus.
Corpora for translationese research, with a reliable indication of the direction of translation. Detailed documentation is provided in Rabinovich et al. (2016); please cite it if you use the corpora.
The UN Parallel Corpus Annotated for Translation Direction; see the detailed description in Tolochinsky et al. (2018).

Publications

Ilia Sominsky and Shuly Wintner. Automatic Detection of Translation Direction. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1131-1140, Varna, Bulgaria, September 2019. 📖
Elad Tolochinsky, Ohad Mosafi, Ella Rabinovich and Shuly Wintner. The UN Parallel Corpus Annotated for Translation Direction. Unpublished manuscript, arXiv:1805.07697 [cs.CL], 2018. 📖
Ella Rabinovich, Noam Ordan and Shuly Wintner. Found in Translation: Reconstructing Phylogenetic Language Trees from Translations. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL-2017), pages 530-540, Vancouver, Canada, July 2017. 📖
Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia and Shuly Wintner. Personalized Machine Translation: Preserving Original Author Traits. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), pages 1074-1084, Valencia, Spain, April 2017. 📖
Shuly Wintner. Computational Approaches to Translation Studies. In Patrick Marcel and Esteban Zimanyi, editors, Business Intelligence, chapter 2, pages 38-58, Berlin and Heidelberg: Springer, 2017. 📖
Ella Rabinovich, Sergiu Nisioi, Noam Ordan and Shuly Wintner. On the Similarities Between Native, Non-native and Translated Texts. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016), pages 1870-1881, Berlin, Germany, August 2016. 📖
Sergiu Nisioi, Ella Rabinovich, Liviu P. Dinu and Shuly Wintner. A Corpus of Native, Non-native and Translated Texts. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2016), pages 4197-4201, Portoroz, Slovenia, May 2016. 📖
Ella Rabinovich, Shuly Wintner and Ofek Luis Lewinsohn. A Parallel Corpus of Translationese. Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-2016), pages 140-155, Konya, Turkey, April 2016. 📖
Ehud Alexander Avner, Noam Ordan and Shuly Wintner. Identifying translationese at the word and sub-word level. Digital Scholarship in the Humanities 31(1):30-54, April 2016. 📖
Ella Rabinovich and Shuly Wintner. Unsupervised Identification of Translationese. Transactions of the Association for Computational Linguistics 3:419-432, 2015. 📖
Naama Twitto, Noam Ordan and Shuly Wintner. Statistical Machine Translation with Automatic Identification of Translationese. Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT-2015), pages 47-57, Lisbon, Portugal, September 2015. 📖
Vered Volansky, Noam Ordan and Shuly Wintner. On the features of translationese. Digital Scholarship in the Humanities 30(1):98-118, April 2015. 📖
Gennadi Lembersky, Noam Ordan and Shuly Wintner. Improving Statistical Machine Translation by Adapting Translation Models to Translationese. Computational Linguistics 39(4):999-1023, December 2013. 📖
Yulia Tsvetkov, Naama Twitto, Nathan Schneider, Noam Ordan, Manaal Faruqui, Victor Chahuneau, Shuly Wintner, and Chris Dyer. Identifying the L1 of non-native writers: the CMU-Haifa system. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pages 279-287, Atlanta, Georgia, June 2013. 📖
Gennadi Lembersky, Noam Ordan and Shuly Wintner. Language Models for Machine Translation: Original vs. Translated Texts. Computational Linguistics 38(4):799-825, December 2012. 📖
Gennadi Lembersky, Noam Ordan and Shuly Wintner. Adapting Translation Models to Translationese Improves SMT. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 255-265, Avignon, France, April 2012. 📖
Gennadi Lembersky, Noam Ordan and Shuly Wintner. Language Models for Machine Translation: Original vs. Translated Texts. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 363-374, Edinburgh, Scotland, July 2011. 📖
