NEURAL MACHINE TRANSLATION (NMT): ZERO SHOT TRANSLATION

Sudarshan S
3 min read · Aug 11, 2022

We all have used Google Translate at least once, haven't we? If not, try it once — you will be amazed! How can a machine translate between languages within a second? In machine learning, this capability is called "machine translation".

Machine translation

Machine translation is the automatic process of translating text from one language (the source language) to another (the target language) without any human involvement. Pre-trained models are widely used as building blocks — for example BERT for text (Conformer and Jasper, often mentioned alongside it, play a similar role in speech recognition rather than translation). Modern machine translation systems use the Transformer architecture to translate from one language to another.

Transformer

So, what is a transformer in machine learning?

A Transformer is an encoder-decoder model built around "attention". Attention lets each position in a sequence weigh every other position when computing its representation, and it is what distinguishes the Transformer from earlier encoder-decoder models.
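To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python (no ML libraries; the function names and toy matrices are illustrative, not from any particular framework):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on plain lists:
    softmax(Q K^T / sqrt(d)) V, one output row per query."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how much each value row contributes
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A single query attending over two key/value rows:
print(attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 2.0], [3.0, 4.0]]))
```

Each output row is a weighted average of the value rows, with weights determined by how well the query matches each key — that is all "attention" means here.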

The main approaches to machine translation, in rough historical order, are:
1. Rule-based Machine Translation (RBMT)
2. Statistical Machine Translation (SMT)
3. And recently, Neural Machine Translation (NMT), which most modern systems use.

NMT uses neural networks to get the job done. It is the most complex approach to machine translation, and also the most effective. NMT models map directly from the source language to the target language, but they depend heavily on the availability of extensive parallel data. Unfortunately, such data is available for only a limited number of languages, so we need some way to bring the other languages into the NMT framework as well.

Hence, for low-resource languages, we can use multilingual NMT: a single NMT model trained to translate between many languages.

But have you wondered how low-resource language pairs can be translated with little or no data? That's where the zero-shot learning concept comes in, applied to translation as zero-shot translation. We use a cross-lingual model that shares its parameters across multiple languages.

In multilingual NMT, the standard NMT architecture is unchanged; the only difference is an artificial token prepended to the input, indicating the target language into which the source sentence should be translated.
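The preprocessing step above can be sketched in a few lines. This is a hypothetical illustration — the `<2es>`-style token format and the `add_target_token` helper are illustrative conventions, not a specific system's API:

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling a single multilingual NMT model
    which target language to translate the source sentence into."""
    return f"<2{target_lang}> {source_sentence}"

# The same English sentence, routed to two different target languages:
print(add_target_token("How are you?", "es"))  # <2es> How are you?
print(add_target_token("How are you?", "de"))  # <2de> How are you?
```

The model itself never changes; only this token tells it which language to produce.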

Zero shot translation

Before looking at zero-shot translation, let's first understand what "zero-shot learning" is.

Zero-shot learning allows a model to recognize things it hasn't seen before. In other words, the model can handle data that wasn't in the training dataset, classifying classes it was never trained on.

Why is zero-shot learning necessary?
Collecting and annotating a large number of samples for every class is impractical, and even if we did, new classes keep emerging. So it is better to use zero-shot learning, which lets the model cope with unseen data.

Now, coming to zero-shot translation: the same principle applies. Zero-shot translation allows the model to translate between language pairs it hasn't seen before. For example, suppose the model is trained on the following language pairs: English to German and English to Spanish.

Then the model is also capable of translating from German to Spanish, even though it was never trained on a German–Spanish dataset. This is the major promise of zero-shot translation in multilingual NMT.
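The example above can be made concrete by enumerating which directions become available for free. This is a conceptual sketch — the training-pair set is the hypothetical one from the example, not a real model's configuration:

```python
from itertools import permutations

# Hypothetical supervised training directions for one multilingual model.
trained = {("en", "de"), ("en", "es")}

# Every language the model has seen on either side of a training pair.
languages = {lang for pair in trained for lang in pair}

# Every ordered pair over those languages, minus the supervised ones,
# is a potential zero-shot direction.
zero_shot = set(permutations(languages, 2)) - trained
print(sorted(zero_shot))
```

With just two supervised pairs covering three languages, four extra directions (including German to Spanish) become reachable without any parallel data for them.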

Performance metrics of Machine translation

BLEU Score

BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text that has been machine-translated from one language to another. Quality is assessed by comparing the machine output with a human reference translation. Scores are calculated for individual translated segments — typically individual sentences — and then averaged to estimate the overall quality of the translation.

The BLEU score always ranges from 0 to 1. A score closer to 1 indicates that the machine-translated text is similar to the human translation, while a score closer to 0 indicates dissimilarity between the two.
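Here is a minimal sketch of sentence-level BLEU in pure Python, assuming a single reference translation (real implementations also support multiple references and smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference: geometric mean of
    clipped n-gram precisions, times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each n-gram's count by how often it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

reference = "the cat sat on the mat".split()
print(bleu(reference, reference))                      # identical → 1.0
print(bleu("the cat sat on the rug".split(), reference))
```

An exact match scores 1.0, while a candidate differing in one word scores somewhere between 0 and 1 — exactly the behavior described above.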

Stay updated!! Happy learning!
