Working at the intersection of linguistics and artificial intelligence to advance machine translation performance
by Pisana Ferrari – cApStAn Ambassador to the Global Village
Chris Callison-Burch, associate professor in Computer and Information Science at the University of Pennsylvania, has in past years developed novel cost- and time-saving approaches to machine translation, including the use of crowdsourcing and images. In a recent interview for “Medium” he describes a new translation method that looks very promising for some of the world’s most difficult-to-translate languages. His research group used images (for instance, of a cat), plus vast quantities of crowdsourced data identifying the words linked to each image, to create “reverse-engineered dictionaries” for 10,000 words in 100 languages. Images “are somehow interlingual”, he says: an image of a cat is the same whether the language is English or Indonesian. Simplified representations of the images were used to train the model. “This language-independent way of thinking about words through their visual representations allows us to use a new type of data to learn translations.” In a recent post we mentioned research along the same lines by Chinese e-commerce giant Alibaba, which has trained an NMT system with image descriptions in multiple languages.
Medium article: https://medium.com/penn-engineering/translating-the-worlds-languages-e100f98c4c1d
cApStAn blog post on NMT research: https://www.capstan.be/promising-research-on-machine-translation-for-low-resource-languages/