23.05.2019

Should Wikipedia consider banning machine translations on its platform if there is no human supervision?

by Pisana Ferrari – cApStAn Ambassador to the Global Village

Content translation tools on Wikipedia allow editors to generate a preview of a new article based on an automated translation from another edition. Used correctly, these tools can save valuable time for editors, in particular those building out understaffed language editions, but when they go wrong, the results can be disastrous, says the author of a recent article in “The Verge” (2). One global administrator pointed to a particularly bad translation from English to Portuguese: the phrase “village pump” in the English version became “bomb the village” when put through machine translation into Portuguese.

Shoddy machine translations have become such a problem that some language editions of Wikipedia have created admin rules to stamp them out, the Verge article reports. João Alexandre Peschanski, a professor of journalism at Faculdade Cásper Líbero in Brazil, who teaches a course on Wikiversity, says a community-wide strategy to improve machine learning should be discussed, as “we might be losing efficiency in our rather arduous translation endeavour”. A broader possible solution, he says, would be to enact some sort of project-wide policy banning machine translations without human supervision.

Our CEO, Steve Dept, comments that he is not a “staunch opponent” of machine translation but rather promotes research to identify good models of “augmented translation”, where machine output is used selectively to boost a human translator’s output (3). There are good MMT workflows (man-machine translation, his own acronym) where source versions are optimized before being fed into a well-maintained bespoke MT engine, and where validated translation memories take precedence over machine output. These are followed by various stages of post-editing and coherence/consistency checks, and the checks can also be partly automated. Pure MT, however, can at best help readers get a sense of the general meaning of a text written in a language in which they are not conversant.

1) https://www.capstan.be/the-language-imbalance-and-lopsided-geography-of-wikipedia/

2) https://www.theverge.com/2019/5/8/18526739/wikipedia-translation-tool-machine-learning-ai-english

3) Steve Dept’s blog: https://www.linkedin.com/feed/update/urn:li:activity:6533660763846950912