
8. Pretranslating PO Files

As a translator, you can save yourself some work by starting from a reasonably good machine-produced translation and then refining that translation until it is perfect.

Thus, before you start working on a translation, you can have the PO file pretranslated.

This process, also called machine translation, is nowadays best performed through a Large Language Model (LLM). (See https://en.wikipedia.org/wiki/Machine_translation#Neural_MT, https://en.wikipedia.org/wiki/Neural_machine_translation#Generative_LLMs.)

8.1 Installing a Large Language Model

We do not recommend using machine translation through a web service in the cloud that is controlled by someone other than yourself. Such a machine translation service would have major drawbacks (it could go away at any time, it could be used to spy on you or manipulate you, or its costs could rise beyond your control); see https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html. Additionally, such a service typically has some cost (between $10 and $25 per megabyte, as of 2025).

Instead, we recommend a Large Language Model execution engine that runs on hardware under your control. This can be a desktop computer, or for instance a single-board computer in your local network.

As of 2025, one Large Language Model execution engine that is Free Software is ‘ollama’, which can be downloaded from https://ollama.com/.

Next, you will need to pick a Large Language Model. There are two properties to watch out for:

Together with an LLM of reasonable quality, such as the model ministral-3:14b, the system requirements are as follows:

Additional configuration:

8.2 Invoking the msgpre Program

 
msgpre [option...]

The msgpre program pretranslates a translation catalog.

Warning: The pretranslations might not be what you expect. They might be of the wrong form, be of poor quality, or reflect some biases.

8.2.1 Input file location

-i inputfile
--input=inputfile

Input PO file.

-D directory
--directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting ‘.po’ file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

8.2.2 Output file location

-o file
--output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

8.2.3 Message selection

--keep-fuzzy

Keep fuzzy messages unmodified. Pretranslate only untranslated messages.
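To judge whether --keep-fuzzy makes a difference for a given file, you may want to know how many fuzzy messages it contains. The following is only a rough sketch with a hypothetical helper, count_fuzzy, that greps for the ‘fuzzy’ flag. It does not parse the PO syntax, and it also counts a fuzzy header entry. For an exact selection of fuzzy messages, ‘msgattrib --only-fuzzy’ is the proper tool.

```shell
# count_fuzzy FILE: print the number of flag comment lines ("#, ...")
# in a PO file that contain the "fuzzy" flag.
# Heuristic only: the header entry is counted too if it is fuzzy.
count_fuzzy () {
  # grep -c prints 0 and exits with status 1 when nothing matches;
  # treat that case as success, but keep status 2 (e.g. missing file).
  grep -c -E '^#,.* fuzzy(,|$)' "$1" || [ $? -eq 1 ]
}
```

If the count is nonzero and you want to preserve the existing fuzzy translations for manual review, add --keep-fuzzy; without it, msgpre presumably pretranslates those messages as well.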

8.2.4 Large Language Model (LLM) options

--species=type

Specifies the type of Large Language Model execution engine. The default and only valid value is ollama.

--url=url

Specifies the URL of the server that runs the Large Language Model execution engine. For ollama, the default is http://localhost:11434.

-m model
--model=model

Specifies the model to use. This option is mandatory; no default exists. The specified model must already be installed in the Large Language Model execution engine.
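With ollama, a model is installed with ‘ollama pull’ (for example ‘ollama pull ministral-3:14b’), and ‘ollama list’ prints the installed models. The following sketch uses a hypothetical helper, model_installed, to check that list before running msgpre. It assumes that ‘ollama list’ prints a header line followed by one line per model, with the full model name, including its tag, in the first column.

```shell
# model_installed NAME: read the output of 'ollama list' on standard
# input and succeed if a model called NAME appears in it.
# Assumption: line 1 is a header; the model name is column 1.
model_installed () {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 }
                    END { exit !found }'
}
```

For example, ‘ollama list | model_installed ministral-3:14b || ollama pull ministral-3:14b’ installs the model only if it is missing. Note that the name must match exactly, tag included.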

--prompt=text

Specifies the prompt to use before each msgid from the PO file. It allows you to specify extra instructions for the LLM. The prompt should include an instruction such as "Translate into target language." Some hints for good prompts are described in the article “How to write AI prompts for translation” https://poeditor.com/blog/ai-prompts-for-translation/.
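As an illustration, here is a sketch of a prompt with a few extra instructions, passed to msgpre through a hypothetical helper, pretranslate_de. The target language (German) and the extra rules are assumptions chosen for this example, not defaults of msgpre; adapt them to your language and project.

```shell
# A hypothetical prompt for pretranslating into German.
PROMPT='Translate into German. Keep format directives such as %s and %d unchanged. Reply with the translation only.'

# pretranslate_de IN OUT: pretranslate the PO file IN into OUT,
# placing PROMPT before each msgid.
pretranslate_de () {
  msgpre --model=ministral-3:14b --prompt="$PROMPT" -i "$1" -o "$2"
}
```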

--postprocess=command

Specifies a command to post-process the output from the LLM. This should be a Bourne shell command that reads from standard input and writes to standard output.

For instance, the ministral-3:14b model often emphasizes part of the output with ‘**’ characters. To eliminate these markers, you could use the command ‘sed -e 's/[*][*]//g'’.
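To see what this sed command does before passing it to --postprocess, you can try it on sample text. The function name strip_bold below is hypothetical and serves only for experimenting.

```shell
# strip_bold: apply the post-processing command from the text above.
# It deletes every occurrence of '**'; a lone '*' is left untouched.
strip_bold () {
  sed -e 's/[*][*]//g'
}

# Try it on a sample LLM answer.
printf '%s\n' 'Das ist **sehr** wichtig.' | strip_bold
```

Since the --postprocess command is presumably run by a separate shell, a function like strip_bold is not visible to it; pass the quoted sed command itself, for example --postprocess="sed -e 's/[*][*]//g'".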

8.2.5 Input file syntax

-P
--properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

8.2.6 Output details

--color
--color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

--indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n
--add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.

-p
--properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn't support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn't support plural forms.

-w number
--width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s
--sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.

-F
--sort-by-file

Sort output by file location.

8.2.7 Informative output

-h
--help

Display this help and exit.

-V
--version

Output version information and exit.

-q
--quiet
--silent

Suppress progress indicators.

8.2.8 Examples

To pretranslate the file foo.po:

 
msgpre --model=ministral-3:14b < foo.po > foo-pretranslated.po

Note that this command can take a long time, depending on the model and the available hardware.
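A more explicit variant of the example above names the files with -i and -o, keeps fuzzy messages unmodified, and removes ‘**’ markers from the LLM output. The helper pretranslate and the MODEL variable are hypothetical; MODEL defaults to ministral-3:14b only because that model is used throughout this chapter.

```shell
# Model to use; override it by setting MODEL in the environment.
MODEL=${MODEL:-ministral-3:14b}

# pretranslate IN OUT: pretranslate only the untranslated messages
# of IN, write the result to OUT, and strip '**' emphasis markers.
pretranslate () {
  msgpre --model="$MODEL" \
         --keep-fuzzy \
         --postprocess="sed -e 's/[*][*]//g'" \
         -i "$1" -o "$2"
}
```

For instance, ‘pretranslate foo.po foo-pretranslated.po’ leaves foo.po untouched, so you can compare both files afterwards.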

8.3 Invoking the spit Program

 
spit [option...]

The spit program passes its input to a Large Language Model (LLM) instance and prints the response. With the --to option, it translates its input into the specified language and prints the translation.

Warning: The output might not be what you expect. It might be of the wrong form, be of poor quality, or reflect some biases.

8.3.1 Large Language Model (LLM) options

--species=type

Specifies the type of Large Language Model execution engine. The default and only valid value is ollama.

--url=url

Specifies the URL of the server that runs the Large Language Model execution engine. For ollama, the default is http://localhost:11434.

-m model
--model=model

Specifies the model to use. This option is mandatory; no default exists. The specified model must already be installed in the Large Language Model execution engine.

--to=language

Specifies the target language. language may be specified as an ISO 639 language code (such as fr for French), as a combination of an ISO 639 language code and an ISO 3166 country code (such as fr_CA for French in Canada, or zh_TW for Traditional Chinese), or as the English name of a language (such as French).

The effect of this option is to add a prompt similar to "Translate to language:".
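Instead of writing the instruction into the input, as in the example in 8.3.3, you can let --to supply it. A minimal sketch, with a hypothetical helper, translate_text; the model name is the one used elsewhere in this chapter.

```shell
# translate_text LANG TEXT: machine-translate TEXT into LANG with spit.
# --to adds the "Translate to LANG:" instruction, so TEXT is passed
# on standard input without any prompt of its own.
translate_text () {
  printf '%s\n' "$2" \
    | spit --model=ministral-3:14b --to="$1" \
           --postprocess="sed -e 's/[*][*]//g'"
}
```

For instance, ‘translate_text de 'Welcome to the GNU project!'’ should print a German translation which, like any LLM output, needs review.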

--prompt=text

Specifies the prompt to place before the text read from standard input. It allows you to specify extra instructions for the LLM.

This option overrides the --to option.

--postprocess=command

Specifies a command to post-process the output. This should be a Bourne shell command that reads from standard input and writes to standard output.

For instance, the ministral-3:14b model often emphasizes part of the output with ‘**’ characters. To eliminate these markers, you could use the command ‘sed -e 's/[*][*]//g'’.

8.3.2 Informative output

-h
--help

Display this help and exit.

-V
--version

Output version information and exit.

8.3.3 Examples

Machine translation of a single sentence:

 
$ echo 'Translate into German: "Welcome to the GNU project!"' \
    | spit --model=ministral-3:14b \
           --postprocess="sed -e 's/[*][*]//g'"
"Willkommen zum GNU-Projekt!"

The ideal translation would be "Willkommen beim GNU-Projekt!". As this example shows, some manual adjustment after machine translation is still needed.


This document was generated by Bruno Haible on January, 13 2026 using texi2html 1.78a.