Machine Translation and Game Localization: Common Pitfalls Illustrated with DeepL

These days, everyone is talking about DeepL as being revolutionary for the world of machine translation (MT) with an output of far higher quality than what you get with Google Translate. I put DeepL to the test on various kinds of texts and I have to admit that it is indeed much better than Google Translate. However, MT is still only useful to get the gist of a text in a language that you don’t speak at all and it is never acceptable for any kind of publication. Yes, even for video games.

MT has no sensibility or judgement, no professional experience, no humour, no common sense that allows it to second guess what the original writers had in mind, especially if they are not native speakers of the language they are writing in. Professional human localization translators have learned to recognize the various obstacles and pitfalls in video game localization, while machine translation succumbs to them every single time.

Machine translation can make at least all the mistakes that a human translator can potentially make, except that a human will draw on their experience to avoid the majority of these issues and will ask the client for clarification if guesses are not enough. Read on to see why, even though there are many glowing reviews claiming that DeepL is better than Google Translate, you should not use it to translate your game for free.

User interface

The UI is the backbone of a game, the structure that holds everything together. It’s made up of the buttons that appear on screen, the various menus that allow you to change the settings, the error messages, etc. It is the container as opposed to the game content. UI text often appears in text files as a list of very short strings of 1 or 2 words, usually with very little or no context. It’s technical stuff, right? So does it make sense to cut costs by using MT for the UI and to keep the expensive human translators for the dialogs and creative texts?

Consider these few strings that I have taken from different games, which came without context or even string IDs. Of course, I have kept only the problematic strings but, as you can see, they are essential to the game and will be seen by all players. A mistake in the UI can ruin the game experience.

[Screenshot: UI strings and their machine translations into French]

For those of you who don’t speak French, here is a breakdown of the issues. Every single string is badly translated.

- Random words were added: “Options” became “Possible Options”, “Quality” became “Quality of the quality” and “Text” became “Text Text”.

- The translation doesn’t fit the context: “Back” has been translated as the body part, “Graphics” as “Charts”, “Characters” as “Graphic symbol” or “personality traits”, “Resume” as “Curriculum vitae” and “Single” as “Not in a relationship”.

- Word-for-word translation and nonsense: “Master volume” and “Effects volume” have been translated word for word and make no sense. “Load Game” has been translated as “Game of loading” and “Overwrite Game” as “Game of overwriting”, which seems to imply that “loading” and “overwriting” are game genres.

- Wrong grammatical form: “Windowed” became “Window”, “Quit” became the conjugated form “[he] quits” and “Restart” became “Restarting”.

In smaller games, the menu is made up of only a dozen words. The number of ways MT can mess them up is far too high for it to be worth it. For a human translator used to translating games, translating UI is a piece of cake. The same strings are used again and again (Help, Music volume, Pause, etc.), which makes their translation straightforward, with the exception of a few complex and specific settings. For more unusual strings, the translator can deduce the meaning, and check the available space, from the game build, the screenshots or the string IDs. If all else fails, a quick message to the client should clarify things.

Using MT for UI translation is very risky, possibly even more so than for creative texts, dialogs and stories, since the issues are more insidious and can be very hard to spot before it’s too late, i.e., when the text has been implemented.

Placeholders

Another technical part of game translation, and software localization in general, is the use of variables in placeholders. These are typically something that MT can’t work with because they don’t have an inherent meaning. When I read “You get {0} boxes”, I know immediately that {0} is a number. When I read “Press %@ to attack”, I know that %@ is a button name or possibly a button icon. When I read “Well done {0}! You’re a %@ warrior now!”, I know that {0} is not a number anymore in this context, but probably the name of the player and %@ is not a button name or icon, but something that indicates the rank or quality of the warrior.

Moreover, MT is often confused by the extra punctuation marks in {1}, [Noun] or %@ and tries to place them in the correct spot, often without great success, and sometimes even breaks their integrity. Of course, the main issue is that some MT engines will translate placeholders like [ATTACK] to [ATTAQUE], even though these should never be modified.

[Screenshot: machine-translated strings containing placeholders]

With more placeholders, DeepL simply gives up:

[Screenshot: DeepL output for a string with several placeholders]

Google Translate does slightly better, but leaves words in English and breaks the code by translating the placeholders (‘Icons’ and ‘Currency’ become ‘Icônes’ and ‘Monnaie’):

[Screenshot: Google Translate output, with translated placeholders]

So far, it seems that, because of the technical specificities of video game localization, MT could be useful with some serious post-editing. However, checking that the code hasn’t been altered in any way is very painful, since the changes can be very subtle. I recently reviewed a translation in which the translator had automatically replaced all the straight single quotes with curly quotes. This made the typography nicer in the actual text but, unfortunately, broke the code completely.
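Part of this check can be automated. Here is a minimal sketch in Python, not a production tool: it compares the placeholders found in a source string with those in its translation and flags curly quotes that were not in the source. The pattern only covers the placeholder styles mentioned in this article ({0}, %@ and bracketed tags like [ATTACK]), and the names `PLACEHOLDER_RE` and `check_string` are my own, so treat both as assumptions to adapt to your game's actual string format.

```python
import re
from collections import Counter

# Hypothetical pattern: {0}-style, %@ and [TAG]-style placeholders only.
# A real project would need a pattern matching its own string format.
PLACEHOLDER_RE = re.compile(r"\{\d+\}|%@|\[[A-Za-z_]+\]")

# Curly single and double quotes that may break code expecting straight ones.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"


def placeholders(text):
    """Return a multiset (Counter) of the placeholders found in a string."""
    return Counter(PLACEHOLDER_RE.findall(text))


def check_string(source, target):
    """List the placeholder and quote problems found in one translated string."""
    problems = []
    src, tgt = placeholders(source), placeholders(target)
    # Counter subtraction keeps only the tokens with a positive difference.
    for token in sorted(src - tgt):
        problems.append(f"missing or altered placeholder: {token}")
    for token in sorted(tgt - src):
        problems.append(f"unexpected placeholder: {token}")
    for ch in CURLY_QUOTES:
        if ch in target and ch not in source:
            problems.append(f"curly quote introduced: U+{ord(ch):04X}")
    return problems


print(check_string("Press [ATTACK] to attack",
                   "Appuyez sur [ATTAQUE] pour attaquer"))
# ['missing or altered placeholder: [ATTACK]', 'unexpected placeholder: [ATTAQUE]']
```

A script like this only tells you that something changed, not whether the sentence still makes sense around the placeholder. It will also raise false alarms, for instance on ordinary bracketed text or on a legitimate typographic apostrophe, so a human still has to read every flagged string.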

Punctuation

A friend recently told me about a colleague who claimed, “I don’t put the space in percentages in French [compulsory hard space like so: 30 %]. After all, it’s only a video game, why should we follow the rules?” Why indeed? While we’re at it, why bother with any rules at all? Farewell grammar, spelling and syntax. It’s only a game, after all. I’m being sarcastic, of course. The point is, why, as a developer, would you consider your own work to be subpar compared to other forms of publication like newspapers, movies or books? By using MT, this is exactly what you are doing, and it will be obvious through some things that are essential but often overlooked, like punctuation.

A non-breaking space is compulsory before colons, semicolons, question marks and exclamation marks in French, and French quotation marks (« ») are compulsory too. In the example above, DeepL simply followed the source punctuation and didn’t adapt it to French punctuation and typography rules at all, which makes the output unpublishable. Interestingly, Google Translate uses the correct punctuation marks but misses all the compulsory spaces before the colon and the question marks and inside the quotation marks.

[Screenshot: Google Translate output]

This incorrect punctuation makes the text look a bit off. Even if the players are not actually aware of the issue, even if they can’t quite put their finger on it, it will be too late: the suspension of disbelief essential to the game experience will be damaged.
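The spacing rules described in this section are mechanical enough to be spot-checked by a script. The following Python sketch is only an illustration under my own assumptions: it accepts either U+00A0 or U+202F as a hard space, and the function name `french_typography_issues` is hypothetical. It flags the marks mentioned above when the hard space is missing, plus straight double quotes used instead of French quotation marks.

```python
import re

# Both characters are used as "hard" spaces in French typesetting:
# U+00A0 (no-break space) and U+202F (narrow no-break space).
HARD_SPACES = "\u00a0\u202f"

# A colon, semicolon, question mark, exclamation mark or percent sign that is
# not preceded by a hard space (or by another ? or !, as in "?!").
MISSING_SPACE_BEFORE = re.compile(rf"(?<![{HARD_SPACES}?!])[:;?!%]")
# French quotation marks without a hard space on the inside.
BAD_OPENING = re.compile(rf"«(?![{HARD_SPACES}])")
BAD_CLOSING = re.compile(rf"(?<![{HARD_SPACES}])»")


def french_typography_issues(text):
    """Return sorted (position, message) pairs for suspicious punctuation."""
    issues = []
    for match in MISSING_SPACE_BEFORE.finditer(text):
        issues.append((match.start(), f"no hard space before '{match.group()}'"))
    for match in BAD_OPENING.finditer(text):
        issues.append((match.start(), "no hard space after «"))
    for match in BAD_CLOSING.finditer(text):
        issues.append((match.start(), "no hard space before »"))
    if '"' in text:
        issues.append((text.index('"'), "straight double quote instead of « »"))
    return sorted(issues)


# An ordinary space before the question mark is reported; a hard space is not.
print(french_typography_issues("Vraiment ?"))
print(french_typography_issues("Vraiment\u00a0?"))
```

This is deliberately naive. It will also flag colons in URLs or timestamps and the percent sign in a %@ placeholder, so its output is a list of places to look at, not a list of errors.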

Technical checks and official names

When developers submit a game to the biggest publishers, they have to pass a number of checks (TRC for Sony, TCR for Microsoft and Lotcheck for Nintendo). These will include checking that the correct name is used for each button and feature of their product, be it the Nintendo Switch Joy-Con or the PlayStation system software. For example, if a string is “Press any button”, the translation will be different for PlayStation (Appuyez sur une touche) and for Nintendo (Appuyez sur un bouton). Translating help pages and user manuals is not as straightforward as it looks. This same issue happens with the word “Settings” when talking about “your mobile phone settings”. On iOS, the Settings app is called “Réglages”, while on Android it is called “Paramètres”. MT has no way of knowing which term should be used.
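One way to picture the problem: the right translation is a function of both the string and the target platform, and the platform is precisely the piece of information a generic MT engine never receives. A small Python sketch, using only the examples from this section (the `GLOSSARY` table and `approved_term` function are hypothetical names, not taken from any platform holder's documentation):

```python
# Platform-specific French terms, limited to the examples given in this article.
# Real glossaries come from each platform holder's own terminology guidelines.
GLOSSARY = {
    ("Press any button", "playstation"): "Appuyez sur une touche",
    ("Press any button", "nintendo"): "Appuyez sur un bouton",
    ("Settings", "ios"): "Réglages",
    ("Settings", "android"): "Paramètres",
}


def approved_term(source, platform):
    """Return the approved translation, or raise if the glossary has no entry."""
    try:
        return GLOSSARY[(source, platform.lower())]
    except KeyError:
        # Failing loudly is safer than silently falling back to a guess.
        raise LookupError(
            f"no approved term for {source!r} on {platform!r}"
        ) from None


print(approved_term("Settings", "iOS"))      # Réglages
print(approved_term("Settings", "Android"))  # Paramètres
```

The lookup itself is trivial. The hard part is knowing that the platform matters in the first place and keeping the table in line with the official terminology.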

Consistency

On the subject of user manuals, another essential issue in game localization, and in translation in general, is consistency. A given skill, item or power must always have the same translation everywhere: in game, in the manual, in the help pages, in the hints, on the website, in the store descriptions, etc. MT used on its own does not remember its previous translations and will choose the words based on the surrounding sentence, not on the past translations.

[Screenshot: the same word translated inconsistently]

Here, the same word, ‘try’, has been translated in two different ways: ‘essaient’ and ‘essayent’. While both spellings are correct, this is inconsistent.
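Inconsistencies of this kind can at least be detected after the fact. As a rough Python sketch, assuming you can export (source, translation) pairs for your terms or repeated strings (the function name `find_inconsistencies` and the input format are my own assumptions):

```python
from collections import defaultdict


def find_inconsistencies(pairs):
    """Map each source term to its translations when more than one was used.

    `pairs` is an iterable of (source, translation) tuples. Comparison ignores
    case, so "Try" and "try" count as the same source term.
    """
    seen = defaultdict(set)
    for source, target in pairs:
        seen[source.casefold()].add(target.casefold())
    return {src: sorted(tgts) for src, tgts in seen.items() if len(tgts) > 1}


pairs = [
    ("try", "essaient"),
    ("Try", "essayent"),
    ("Pause", "Pause"),
]
print(find_inconsistencies(pairs))
# {'try': ['essaient', 'essayent']}
```

Translators normally get this for free from a translation memory and a glossary in their CAT tool. The sketch only shows that the check needs a memory of past translations, which is what MT used on its own lacks.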

Confidentiality

The confidentiality issue was illustrated last year by the translate.com fiasco, which saw a large amount of highly sensitive data released online. Whatever you feed into online translation services is stored somewhere and can potentially be made available to the public. Of course, it is possible to deploy a private machine translation portal, but this costs a lot of time and money for a result that will still be far inferior to what a translator would produce.

Length restrictions

I once translated a game in which each and every string had to be below 140 characters—purely for aesthetic reasons, no connection with Twitter. Needless to say, MT, or at least free MT, doesn’t take into account any length restriction. This can be problematic for games designed for handheld devices, such as mobile phones, the 3DS and the Switch, on which there is very little space available for each button or item name. A human translator will take into account all the available pieces of information—character limitations, screenshots of the UI, etc.—to ensure that the translated text fits into the allocated space.
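A character limit like the one above is easy to verify once the translation exists. Here is a minimal Python sketch, assuming the translated strings are available as a dictionary keyed by string ID (the function name `over_limit` and the sample IDs are made up for illustration):

```python
def over_limit(strings, limit=140):
    """Return (string_id, length) for each translation longer than the limit.

    `strings` maps string IDs to translated text. Length is counted in Unicode
    code points, which is only an approximation of the space used on screen.
    """
    return [(sid, len(text)) for sid, text in strings.items() if len(text) > limit]


translations = {
    "menu.quit": "Quitter",
    "hint.long": "x" * 141,  # stand-in for an overlong translation
}
print(over_limit(translations))
# [('hint.long', 141)]
```

Counting characters catches the obvious overflows, but it says nothing about how to shorten the text. Rewording a sentence so that it fits while still sounding natural is exactly the part that needs a translator, and real UI space depends on the font and on character widths rather than on a simple count.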

Creative text

This is the heart of the matter. This part is what too many people reduce game localization to, overlooking all the aspects mentioned earlier. Creative text includes dialogs and stories, subtitles and voiceovers, hints, tutorials, store text, press releases and everything marketing, all of which can be made up of nicely flowing text with puns, jokes, rhymes or obscure references. Will MT recognize a quote from Stranger Things, a description alluding to a character in The Lord of the Rings or a phrase that stems from a recent Reddit post? No, it won’t.

[Image: a brainstorming game writer, maybe.]

This part is also the most obviously problematic with machine translation. It’s a bit like asking a child who just learned how to spell to write a feature for the New York Times. Language is more than a set of rules and that is where human creativity can’t be replaced. Translators can create new words (neologisms), use incorrect syntax to add flavour to a character (like Yoda), or replace a reference to an obscure ’90s TV show known only to American kids with a similar reference that will be meaningful to the target audience. Bending the rules is essential, and MT just can’t do it. The creative writers who worked on the source text spent weeks polishing each sentence, so it makes no sense to think that MT can produce a suitable translation in a few seconds. It is also essential to rewrite and move away from the syntax and structure of the source. And of course, phrases and idioms can’t be translated literally. They absolutely have to be adapted.

An example will be more useful than lengthy explanations, so here are some sentences from various games, translated from English into French.

“Rats” has been translated literally even though this word is not used as an exclamation in French. Similarly, “nail” has been translated as nailing the dish, with an actual nail.

Untranslated text, literal translation for “on the books” and awkward phrasing.

As we have seen, there are so many things to take into account when translating a game that there are more than a few places where automatic translation might go awry. I work from English to French and these languages are very close. The syntax is basically the same (Subject-Verb-Object), and with a few exceptions (for example “siblings”, which doesn’t have a direct translation in French), all words have a direct equivalent. This means that all the issues I keep stumbling upon only scratch the surface of the bottomless pit of potential problems. With a combination of languages that are more remote, many more issues appear. At this point, I want to mention the amazing work of Clyde Mandelin, a Japanese-to-English video game translator with loads of experience, who fed the entire text of Final Fantasy IV to Google Translate, and played the Google-translated version for hours on end.

DeepL is currently only available in seven languages. I’m waiting with sincere curiosity and impatience for them to introduce more languages, as I would be genuinely happy if MT could help me make sense of the original Japanese or Chinese when I’m working from the broken non-native English translations.
