
Deepseek is great at translation tasks thanks to its reasoning capabilities

(We're going to need an AI community lol instead of posting to genzedong all the time)

Today I want to share how I use Deepseek to translate text.

I was given a friend's game to translate into various languages. The strings usually come in JSON files or similar, structured as key:value pairs. This makes it easier for devs and translators to handle multiple languages.

It's also a file structure LLMs understand very well.

The file can look like this:

    "tx_b_menu"     : "Menu",
    "tx_b_newgame"  : "New Game",
    "tx_b_continue" : "Continue",
    "tx_b_options"  : "Options",

etc.

If the keys are descriptive, like these ones, they give the LLM additional context: it can read the key to understand what it's translating.

Now mind you, this file has only 650 lines in it - it's a small indie game. This is something Deepseek can handle in one go without needing to break it up into several tasks. The most I've sent it so far is 1060 lines of JS, so it has a decent context size.

These strings can also contain variables such as [%v] in them, which will be replaced by numbers or words in game. It can also contain other markings such as [color=yellow]text[/c] to indicate the text should display as yellow.
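A quick way to check that a translation left these markings alone is to compare the bracketed markers in the source and translated strings. This is only a minimal sketch in Python: the regex assumes every marker looks like the examples above (anything between square brackets), and the function names are made up for illustration.

```python
import re

# Assumption: every variable or tag is written between square brackets,
# like [%v], [color=yellow] or [/c]. Adjust the pattern for other formats.
MARKER = re.compile(r"\[[^\[\]]*\]")

def markers(text):
    """Return the sorted list of bracketed markers found in a string."""
    return sorted(MARKER.findall(text))

def same_markers(source, translated):
    """True if both strings contain exactly the same markers."""
    return markers(source) == markers(translated)

# Example: the variable survived the translation, so this prints True.
print(same_markers("Deal [%v] damage", "Inflige [%v] dégâts"))
```

Any string where this returns False is worth a manual look, since a changed or dropped marker would probably break the in-game display.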

I wrote a huge prompt for Deepseek just to properly frame the task, and on the first try, thanks to its reasoning capabilities, it understood the structure just fine, including that it should leave variables and other markings alone.

For the translation itself, I uploaded the JSON file rather than pasting its contents (Deepseek can read text files but not pictures). I actually sent two files: one in English, the other in French. Both were human-made, so they should be consistent with each other. That way, Deepseek can (hopefully) cross-reference the two files to resolve ambiguity when it's not sure about a string. I once saw "Options" translated as "Choices" in a game, so this kind of mistake does happen.

In my prompt, I explained:

  • that I was sending two JSON files, and which language each one was in
  • what the JSON file is and why games store their text this way
  • that Deepseek was a professional translator who knows both languages perfectly AND has experience working with video games and devs
  • that the task is to translate to [language] while leaving the JSON structure alone, i.e. in each key:value pair only the value may be touched, never the key
  • that it should retain the flair of the original strings, meaning don't add or change stuff that's not there (very important)
  • that the game has a 'fantasy' setting, so it knows what kind of words it's looking for
  • that it should read the files carefully first before doing any translation
  • how to handle special characters like \n so as not to break the UI
  • and, as a final reminder, that it should translate to X language and output the translated content
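If you reuse the same prompt for several target languages, the points above can be turned into a small template. This is a rough sketch in Python, not my actual prompt: the wording is paraphrased from the list, and `build_prompt` and its parameters are hypothetical names.

```python
def build_prompt(target_language, source_languages=("English", "French"), setting="fantasy"):
    """Assemble a translation prompt covering the points from the list above.

    The sentences are an illustrative paraphrase, not the original prompt.
    """
    first, second = source_languages
    lines = [
        "You are a professional translator who knows the languages involved "
        "perfectly and has experience working with video games and devs.",
        f"I am sending you two JSON files: one in {first}, one in {second}.",
        "They hold the game's text as key:value pairs so the game can load "
        "a different file for each language.",
        f"Your task is to translate the values to {target_language}. "
        "Only touch the value of each pair, never the key.",
        "Retain the flair of the original strings: do not add or change "
        "anything that is not there.",
        f"The game has a {setting} setting, so choose vocabulary accordingly.",
        "Read both files carefully before translating anything.",
        "Keep special characters such as \\n exactly where they are so the "
        "UI does not break.",
        f"Reminder: translate to {target_language} and output the translated JSON.",
    ]
    return "\n".join(lines)

print(build_prompt("Japanese"))
```

Keeping the target language as a parameter means the rest of the prompt stays identical between runs, which makes it easier to tell whether a bad result comes from the prompt or from the language.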

It thought for around 25 seconds, catching things I hadn't necessarily considered and planning how it would approach the task. Then it just generated a complete JSON file.

This was all done through the web interface. Because the file is so small in the first place, I don't need the API which you have to pay for. Too much of a hassle.

It does take a while to output the translated strings, but that's okay. I play another game while it does that and check back after a while.

The translated strings come in a perfect JSON format and I can even click to download the file. Then I just need to rename it, and I can test it in the game.
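Before testing in the game, it can be worth confirming that the translated file still has exactly the same keys as the original. Here is a minimal sketch in Python, assuming both files are flat JSON objects of key:value strings; `compare_keys` is a hypothetical helper name.

```python
import json

def compare_keys(original_text, translated_text):
    """Return (missing, extra) keys of the translation relative to the original.

    Assumes both inputs are JSON text holding one flat object.
    json.loads raises an error if either text is not valid JSON.
    """
    original = json.loads(original_text)
    translated = json.loads(translated_text)
    missing = sorted(set(original) - set(translated))
    extra = sorted(set(translated) - set(original))
    return missing, extra

original = '{"tx_b_menu": "Menu", "tx_b_newgame": "New Game"}'
translated = '{"tx_b_menu": "Menu", "tx_b_newgame": "Nouvelle partie"}'
print(compare_keys(original, translated))  # two empty lists when the keys match
```

For real files you would read the text from disk first, for example with `open(path, encoding="utf-8").read()`.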

With that, you can translate stuff very easily and make it accessible more broadly. There are hundreds of theory essays and books that only exist in one language. I've already used older LLMs for book translation tasks. By properly testing your prompt

Some caveats:

  • I stick to high-resource languages (HRLs), since the LLM has seen enough training data for them. It will hallucinate in some lower-resource languages.
  • Make a translation to a language you can read first so you know how it handles it, and refine your prompt afterwards. Keep doing small batch tests like this (a few strings at a time) until you're satisfied. The prompt I shared above was created after years of doing translation tasks with LLMs, I know what to tell them (mostly) now.
  • Also confirm your translations in languages you don't understand. Run some strings through google translate, ask someone who speaks the language if they can take a quick look, google the terms - for example I took its word for fireball in Japanese and looked online to confirm it was used in other contexts (I found it on magic cards lol).
  • Is it perfect? Probably not. But the original translations were done by amateurs too (e.g. me for French) because the dev, like many people, has no money to pay a professional for everything.
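For the small batch tests mentioned above, you can pull a handful of strings out of the full file and send only those while refining the prompt. A minimal sketch in Python, assuming the strings are already loaded into a dict; `sample_batch` and its defaults are made up for illustration.

```python
import random

def sample_batch(strings, size=5, seed=0):
    """Pick a small, repeatable subset of key:value pairs for a test run.

    A fixed seed returns the same strings every time, so results from two
    versions of the prompt can be compared on identical input.
    """
    rng = random.Random(seed)
    keys = rng.sample(sorted(strings), min(size, len(strings)))
    return {key: strings[key] for key in keys}

strings = {
    "tx_b_menu": "Menu",
    "tx_b_newgame": "New Game",
    "tx_b_continue": "Continue",
    "tx_b_options": "Options",
}
print(sample_batch(strings, size=2))
```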

But the good part is that it doesn't destroy the original, right? A human can always come along and do a perfect human translation, or you can always redo the translations later with better models. It's not destructive.

Hope this helps you out. If there are theory books that only exist in your language, I can only recommend making them accessible. We'd be happy to host them on prolewiki if you don't know where to disseminate them. (spoiler: usually you go through the trouble and then find a super obscure translated edition from 50 years ago as soon as you finish lol)

-> late edit: since the game is not compiled (like a ton of indie games), it's also possible for users to add their own language if it's missing and they would like it. I expect this use case will become bigger in the future: being able to customize your software and tailor it to your needs.

You can, for example, already find models that will generate subtitle files from a video (https://freesubtitles.ai/ is one I've used a few times, it's free lol). If a series you'd like to watch is not available in your language, then you can have subtitles generated for it and enjoy it today.
