After preparing the sources, the programmer creates a PO template file.
This section explains how to use xgettext for this purpose.
xgettext creates a file named ‘domainname.po’.  You
should then rename it to ‘domainname.pot’.  (Why doesn't
xgettext create it under the name ‘domainname.pot’
right away?  The answer is: for historical reasons.  When xgettext
was specified, the distinction between a PO file and PO file template
was fuzzy, and the suffix ‘.pot’ wasn't in use at that time.)
| xgettext [option] [inputfile] …
 | 
The xgettext program extracts translatable strings from given
input files.
‘inputfile …’
Input files.
‘-f file’, ‘--files-from=file’
Read the names of the input files from file instead of getting them from the command line.
‘-D directory’, ‘--directory=directory’
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting ‘.po’ file will be written relative to the current directory, though.
If inputfile is ‘-’, standard input is read.
‘-d name’, ‘--default-domain=name’
Use ‘name.po’ for output (instead of ‘messages.po’).
‘-o file’, ‘--output=file’
Write output to specified file (instead of ‘name.po’ or ‘messages.po’).
‘-p dir’, ‘--output-dir=dir’
Output files will be placed in directory dir.
If the output file is ‘-’ or ‘/dev/stdout’, the output is written to standard output.
‘-L name’, ‘--language=name’
Specifies the language of the input files.  The supported languages
are C, C++, ObjectiveC, PO, Shell, Python, Lisp, EmacsLisp, librep,
Scheme, Guile, Smalltalk, Java, JavaProperties, C#, awk, YCP, Tcl,
Perl, PHP, Ruby, GCC-source, NXStringTable, RST, RSJ, Glade, Lua,
JavaScript, Vala, GSettings, Desktop.
‘-C’, ‘--c++’
This is a shorthand for --language=C++.
By default the language is guessed depending on the input file name extension.
‘--from-code=name’
Specifies the encoding of the input files. This option is needed only if some untranslated message strings or their corresponding comments contain non-ASCII characters. Note that Tcl and Glade input files are always assumed to be in UTF-8, regardless of this option.
By default the input files are assumed to be in ASCII.
‘-j’, ‘--join-existing’
Join messages with existing file.
‘-x file’, ‘--exclude-file=file’
Entries from file are not extracted. file should be a PO or POT file.
‘-c[tag]’, ‘--add-comments[=tag]’
Place comment blocks starting with tag and preceding keyword lines in the output file. Without a tag, the option means to put all comment blocks preceding keyword lines in the output file.
Note that comment blocks are only extracted if there is no program code between the comment and the string that gets extracted. For example, in the following C source code:
| /* This is the first comment.  */
gettext ("foo");
/* This is the second comment: not extracted  */
gettext (
  "bar");
gettext (
  /* This is the third comment.  */
  "baz");
/* This is the fourth comment.  */
gettext ("I love blank lines in my programs");
 | 
the second comment line will not be extracted, because there is a line with some tokens between the comment line and the line that contains the string. But the fourth comment is extracted, because between it and the line with the string there is merely a blank line.
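The rule above can be tried out with a short shell session. This is only a sketch: the file name comments.c and the temporary directory are made up for illustration, and the extraction step is skipped when xgettext is not installed.

```shell
# Write a small C file with comments in different positions.
workdir=$(mktemp -d)
cat > "$workdir/comments.c" <<'EOF'
/* This is the first comment.  */
gettext ("foo");

/* This is the second comment: not extracted  */
gettext (
  "bar");

gettext (
  /* This is the third comment.  */
  "baz");
EOF

if command -v xgettext >/dev/null 2>&1; then
  # --add-comments without a tag keeps every comment block that directly
  # precedes a keyword line; "-o -" writes the template to standard output.
  (cd "$workdir" && xgettext --language=C --add-comments -o - comments.c)
else
  echo "xgettext not found; skipping extraction" >&2
fi
```

In the output, extracted comments appear as lines starting with ‘#.’ directly above the corresponding msgid.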
‘--check[=CHECK]’
Perform a syntax check on msgid and msgid_plural. The supported checks are:
‘ellipsis-unicode’
Prefer Unicode ellipsis character over ASCII ...
‘space-ellipsis’
Prohibit whitespace before an ellipsis character
‘quote-unicode’
Prefer Unicode quotation marks over ASCII "'`
‘bullet-unicode’
Prefer Unicode bullet character over ASCII * or -
The option has an effect on all input files.  To enable or disable
checks for a certain string, you can mark it with an xgettext:
special comment in the source file.  For example, if you specify the
--check=space-ellipsis option, but want to suppress the check on
a particular string, add the following comment:
| /* xgettext: no-space-ellipsis-check */
gettext ("We really want a space before ellipsis here ...");
 | 
The xgettext: comment can be followed by flags separated with a
comma.  The possible flags are of the form ‘[no-]name-check’,
where name is the name of a valid syntax check.  If a flag is
prefixed by no-, the meaning is negated.
Some tests apply the checks to each sentence within the msgid, rather
than the whole string.  xgettext detects the end of sentence by
performing a pattern match, which usually looks for a period followed by
a certain number of spaces.  The number is specified with the
--sentence-end option.
‘--sentence-end[=TYPE]’
The supported values are:
‘single-space’
Expect at least one whitespace after a period
‘double-space’
Expect at least two whitespaces after a period
‘-a’, ‘--extract-all’
Extract all strings.
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings.
‘-k[keywordspec]’, ‘--keyword[=keywordspec]’
Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.
If keywordspec is a C identifier id, xgettext looks
for strings in the first argument of each call to the function or macro
id.  If keywordspec is of the form
‘id:argnum’, xgettext looks for strings in the
argnumth argument of the call.  If keywordspec is of the form
‘id:argnum1,argnum2’, xgettext looks for
strings in the argnum1st argument and in the argnum2nd argument
of the call, and treats them as singular/plural variants for a message
with plural handling.  Also, if keywordspec is of the form
‘id:contextargnumc,argnum’ or
‘id:argnum,contextargnumc’, xgettext treats
strings in the contextargnumth argument as a context specifier.
And, as a special-purpose support for GNOME, if keywordspec is of the
form ‘id:argnumg’, xgettext recognizes the
argnumth argument as a string with context, using the GNOME glib
syntax ‘"msgctxt|msgid"’.
Furthermore, if keywordspec is of the form
‘id:…,totalnumargst’, xgettext recognizes this
argument specification only if the number of actual arguments is equal to
totalnumargs.  This is useful for disambiguating overloaded function
calls in C++.
Finally, if keywordspec is of the form
‘id:argnum...,"xcomment"’, xgettext, when
extracting a message from the specified argument strings, adds an extracted
comment xcomment to the message.  Note that when used through a normal
shell command line, the double-quotes around the xcomment need to be
escaped.
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings, Desktop.
The default keyword specifications, which are always looked for if not explicitly disabled, are language dependent. They are:
For C, C++, and GCC-source: gettext, dgettext:2,
dcgettext:2, ngettext:1,2, dngettext:2,3,
dcngettext:2,3, gettext_noop, and pgettext:1c,2,
dpgettext:2c,3, dcpgettext:2c,3, npgettext:1c,2,3,
dnpgettext:2c,3,4, dcnpgettext:2c,3,4.
For Objective C: Like for C, and also NSLocalizedString, _,
NSLocalizedStaticString, __.
For Shell scripts: gettext, ngettext:1,2, eval_gettext,
eval_ngettext:1,2, eval_pgettext:1c,2,
eval_npgettext:1c,2,3.
For Python: gettext, ugettext, dgettext:2,
ngettext:1,2, ungettext:1,2, dngettext:2,3, _.
For Lisp: gettext, ngettext:1,2, gettext-noop.
For EmacsLisp: _.
For librep: _.
For Scheme: gettext, ngettext:1,2, gettext-noop.
For Java: GettextResource.gettext:2,
GettextResource.ngettext:2,3, GettextResource.pgettext:2c,3,
GettextResource.npgettext:2c,3,4, gettext, ngettext:1,2,
pgettext:1c,2, npgettext:1c,2,3, getString.
For C#: GetString, GetPluralString:1,2,
GetParticularString:1c,2, GetParticularPluralString:1c,2,3.
For awk: dcgettext, dcngettext:1,2.
For Tcl: ::msgcat::mc.
For Perl: gettext, %gettext, $gettext, dgettext:2,
dcgettext:2, ngettext:1,2, dngettext:2,3,
dcngettext:2,3, gettext_noop.
For PHP: _, gettext, dgettext:2, dcgettext:2,
ngettext:1,2, dngettext:2,3, dcngettext:2,3.
For Glade 1: label, title, text, format,
copyright, comments, preview_text, tooltip.
For Lua: _, gettext.gettext, gettext.dgettext:2,
gettext.dcgettext:2, gettext.ngettext:1,2,
gettext.dngettext:2,3, gettext.dcngettext:2,3.
For JavaScript: _, gettext, dgettext:2,
dcgettext:2, ngettext:1,2, dngettext:2,3,
pgettext:1c,2, dpgettext:2c,3.
For Vala: _, Q_, N_, NC_, dgettext:2,
dcgettext:2, ngettext:1,2, dngettext:2,3,
dpgettext:2c,3, dpgettext2:2c,3.
For Desktop: Name, GenericName, Comment,
Keywords.
To disable the default keyword specifications, the option ‘-k’ or ‘--keyword’ or ‘--keyword=’, without a keywordspec, can be used.
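As an illustration of the keywordspec forms described above, the following sketch registers three wrapper functions. The names tr, tr_n and tr_ctx, as well as the file name keywords.c, are hypothetical and chosen only for this example; the extraction step is skipped when xgettext is not installed.

```shell
# A C file that calls project-specific wrappers instead of gettext.
workdir=$(mktemp -d)
cat > "$workdir/keywords.c" <<'EOF'
void demo (int n)
{
  puts (tr ("Hello"));
  printf (tr_n ("%d file", "%d files", n), n);
  puts (tr_ctx ("menu", "Open"));
}
EOF

if command -v xgettext >/dev/null 2>&1; then
  # tr          : string in the first argument
  # tr_n:1,2    : singular in argument 1, plural in argument 2
  # tr_ctx:1c,2 : context in argument 1, string in argument 2
  (cd "$workdir" && xgettext --language=C \
      --keyword=tr \
      --keyword=tr_n:1,2 \
      --keyword=tr_ctx:1c,2 \
      -o - keywords.c)
else
  echo "xgettext not found; skipping extraction" >&2
fi
```

The resulting template should contain a plain entry for "Hello", an entry with msgid_plural for the tr_n call, and an entry with msgctxt "menu" for the tr_ctx call.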
‘--flag=word:arg:flag’
Specifies additional flags for strings occurring as part of the argth
argument of the function word.  The possible flags are the possible
format string indicators, such as ‘c-format’, and their negations,
such as ‘no-c-format’, possibly prefixed with ‘pass-’.
The meaning of --flag=function:arg:lang-format
is that in language lang, the specified function expects as
argth argument a format string.  (For those of you familiar with
GCC function attributes, --flag=function:arg:c-format is
roughly equivalent to the declaration
‘__attribute__ ((__format__ (__printf__, arg, ...)))’ attached
to function in a C source file.)
For example, if you use the ‘error’ function from GNU libc, you can
specify its behaviour through --flag=error:3:c-format.  The effect of
this specification is that xgettext will mark as format strings all
gettext invocations that occur as argth argument of
function.
This is useful when such strings contain no format string directives:
together with the checks done by ‘msgfmt -c’ it will ensure that
translators cannot accidentally use format string directives that would
lead to a crash at runtime.
The meaning of --flag=function:arg:pass-lang-format
is that in language lang, if the function call occurs in a
position that must yield a format string, then its argth argument
must yield a format string of the same type as well.  (If you know GCC
function attributes, the --flag=function:arg:pass-c-format
option is roughly equivalent to the declaration
‘__attribute__ ((__format_arg__ (arg)))’ attached to function
in a C source file.)
For example, if you use the ‘_’ shortcut for the gettext function,
you should use --flag=_:1:pass-c-format.  The effect of this
specification is that xgettext will propagate a format string
requirement for a _("string") call to its first argument, the literal
"string", and thus mark it as a format string.
This is useful when such strings contain no format string directives:
together with the checks done by ‘msgfmt -c’ it will ensure that
translators cannot accidentally use format string directives that would
lead to a crash at runtime.
This option has an effect with most languages, namely C, C++, ObjectiveC,
Shell, Python, Lisp, EmacsLisp, librep, Scheme, Guile, Java, C#, awk,
YCP, Tcl, Perl, PHP, GCC-source, Lua, JavaScript, Vala.
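A small sketch of the --flag option in use follows. The function name log_msg and the file name flag.c are hypothetical; the point is only that a string without any format directive still gets marked as a format string when it occurs in the declared argument position. The extraction step is skipped when xgettext is not installed.

```shell
# log_msg is assumed to take a printf-style format as its second argument.
workdir=$(mktemp -d)
cat > "$workdir/flag.c" <<'EOF'
void report (void)
{
  /* No % directive here, so xgettext would not guess c-format itself.  */
  log_msg (1, gettext ("disk is full"));
}
EOF

if command -v xgettext >/dev/null 2>&1; then
  # Declare that argument 2 of log_msg is a C format string.
  (cd "$workdir" && xgettext --language=C --flag=log_msg:2:c-format -o - flag.c)
else
  echo "xgettext not found; skipping extraction" >&2
fi
```

With the flag in place, the entry for "disk is full" should carry a ‘#, c-format’ line, which lets ‘msgfmt -c’ reject translations that introduce stray format directives.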
‘--tag=word:format’
Defines the behaviour of tagged template literals with tag word.
This option has an effect only with language JavaScript.
format is a symbolic description
of the first step of the JavaScript function named word,
namely how this function constructs a format string
based on the parts of the template literal.
Currently only one value is supported: javascript-gnome-format,
which describes the construction of a format string with numbered placeholders
{0}, {1}, {2}, etc.
For example, javascript-gnome-format transforms the template literal
word`My name is ${id.name} and I am ${id.age} years old.`
into the format string "My name is {0} and I am {1} years old.".
‘-T’, ‘--trigraphs’
Understand ANSI C trigraphs for input
(deprecated, since trigraphs have been removed from ISO C 23).
This option has an effect only with the languages C, C++, ObjectiveC.
‘--qt’
Recognize Qt format strings.
This option has an effect only with the language C++.
‘--kde’
Recognize KDE 4 format strings.
This option has an effect only with the language C++.
‘--boost’
Recognize Boost format strings.
This option has an effect only with the language C++.
‘--debug’
Use the flags c-format and possible-c-format to show who was
responsible for marking a message as a format string.  The latter form is
used if the xgettext program decided, the former form is used if
the programmer prescribed it.
By default only the c-format form is used.  The translator should
not have to care about these details.
This implementation of xgettext is able to process a few awkward
cases, like strings in preprocessor macros, ANSI concatenation of
adjacent strings, and escaped end of lines for continued strings.
When some of the input files are XML files
and they are not of one of the types covered
by the system-wide installed *.its files,
a *.its file is needed for each such file type,
so that xgettext can handle them.
There are two ways to specify such a file: through the --its option,
or through the environment variable GETTEXTDATADIRS.
Together with the *.its file, you need a corresponding *.loc file
(see section Preparing Rules for XML Internationalization).
Furthermore you need to store these files
in a directory ‘parent_dir/its/’
and set the environment variable GETTEXTDATADIRS to include
parent_dir.
More generally, the value of GETTEXTDATADIRS should be
a colon-separated list of directory names.
Note that when the option --its is specified,
the system-wide installed *.its files are ignored
and the environment variable GETTEXTDATADIRS has no effect either.
‘--color’, ‘--color=when’
Specify whether or when to use colors and other text attributes.
See The --color option for details.
‘--style=style_file’
Specify the CSS style rule file to use for --color.
See The --style option for details.
‘--force-po’
Always write an output file even if no message is defined.
‘-i’, ‘--indent’
Write the .po file using indented style.
‘--no-location’
Do not write ‘#: filename:line’ lines. Note that using this option makes it harder for technically skilled translators to understand each message's context.
‘-n’, ‘--add-location[=type]’
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or
‘never’.  If it is not given or ‘full’, it generates the
lines with both file name and line number.  If it is ‘file’, the
line number part is omitted.  If it is ‘never’, it completely
suppresses the lines (same as --no-location).
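The difference between the ‘full’ and ‘file’ types can be seen in a quick comparison. This is a sketch only; the file name loc.c and the temporary directory are illustrative, and the extraction step is skipped when xgettext is not installed.

```shell
# One translatable string on line 1 of loc.c.
workdir=$(mktemp -d)
cat > "$workdir/loc.c" <<'EOF'
const char *s = gettext ("Save");
EOF

if command -v xgettext >/dev/null 2>&1; then
  # Default (full): the reference line carries file name and line number.
  (cd "$workdir" && xgettext --language=C --add-location=full -o - loc.c) | grep '^#:'
  # file: the line number part is omitted.
  (cd "$workdir" && xgettext --language=C --add-location=file -o - loc.c) | grep '^#:'
else
  echo "xgettext not found; skipping extraction" >&2
fi
```

The ‘file’ type can reduce noise in version control, since reference lines no longer change when code merely moves within a file.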
‘--strict’
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
‘--properties-output’
Write out a Java ResourceBundle in Java .properties syntax.  Note
that this file format doesn't support plural forms and silently drops
obsolete messages.
‘--stringtable-output’
Write out a NeXTstep/GNUstep localized resource file in .strings syntax.
Note that this file format doesn't support plural forms.
‘--itstool’
Write out comments recognized by itstool (http://itstool.org). Note that this is only effective with XML files.
‘-w number’, ‘--width=number’
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
‘--no-wrap’
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
‘-s’, ‘--sort-output’
Generate sorted output (deprecated). Note that using this option makes it much harder for the translator to understand each message's context.
‘-F’, ‘--sort-by-file’
Sort output by file location.
‘--omit-header’
Don't write header with ‘msgid ""’ entry. Note: Using this option may lead to an error in subsequent operations if the output contains non-ASCII characters.
This is useful for testing purposes because it eliminates a source
of variance for generated .gmo files.  With --omit-header,
two invocations of xgettext on the same files with the same
options at different times are guaranteed to produce the same results.
Note that using this option will lead to an error if the resulting file would not entirely be in ASCII.
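The reproducibility property can be checked with a short script. This is a sketch under the assumption that the input is plain ASCII; the file names are illustrative, and the check is skipped when xgettext is not installed.

```shell
# Extract the same file twice with --omit-header and compare the results.
workdir=$(mktemp -d)
cat > "$workdir/repro.c" <<'EOF'
const char *a = gettext ("first");
const char *b = gettext ("second");
EOF

if command -v xgettext >/dev/null 2>&1; then
  (cd "$workdir" && xgettext --language=C --omit-header -o run1.pot repro.c)
  (cd "$workdir" && xgettext --language=C --omit-header -o run2.pot repro.c)
  # cmp -s is silent and only sets the exit status.
  if cmp -s "$workdir/run1.pot" "$workdir/run2.pot"; then
    echo "outputs are identical"
  else
    echo "outputs differ"
  fi
else
  echo "xgettext not found; skipping comparison" >&2
fi
```

Without --omit-header, the header entry contains a POT-Creation-Date field, so two runs at different times can differ even when the sources are unchanged.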
‘--copyright-holder=string’
Set the copyright holder in the output. string should be the copyright holder of the surrounding package. (Note that the msgid strings, extracted from the package's sources, belong to the copyright holder of the package.) Translators are expected to transfer or disclaim the copyright for their translations, so that package maintainers can distribute them without legal risk. If string is empty, the output files are marked as being in the public domain; in this case, the translators are expected to disclaim their copyright, again so that package maintainers can distribute them without legal risk.
The default value for string is the Free Software Foundation, Inc.,
simply because xgettext was first used in the GNU project.
‘--foreign-user’
Omit FSF copyright in output. This option is equivalent to ‘--copyright-holder=''’. It can be useful for packages outside the GNU project that want their translations to be in the public domain.
‘--package-name=package’
Set the package name in the header of the output.
‘--package-version=version’
Set the package version in the header of the output. This option has an effect only if the ‘--package-name’ option is also used.
‘--msgid-bugs-address=email@address’
Set the reporting address for msgid bugs. This is the email address or URL to which the translators shall report bugs in the untranslated strings.
It can be your email address, or a mailing list address where translators can write to without being subscribed, or the URL of a web page through which the translators can contact you.
The default value is empty, which means that translators will be clueless! Don't forget to specify this option.
‘-m[string]’, ‘--msgstr-prefix[=string]’
Use string (or "" if not specified) as prefix for msgstr values.
‘-M[string]’, ‘--msgstr-suffix[=string]’
Use string (or "" if not specified) as suffix for msgstr values.
‘-h’, ‘--help’
Display this help and exit.
‘-V’, ‘--version’
Output version information and exit.
‘-v’, ‘--verbose’
Increase verbosity level.
A sample invocation of xgettext, in a project
that has a single source file ‘src/hello.c’
that uses ‘_’ as shorthand for the gettext function,
could be:
| xgettext -o hello.pot \
         --add-comments=TRANSLATORS: \
         --keyword=_ --flag=_:1:pass-c-format \
         --directory=.. \
         src/hello.c
 | 
When a package contains sources in different programming languages and
different, incompatible xgettext command line options are required
for these different parts of the package, the solution is to create
intermediate PO template files for each of the parts and then combine (merge)
them together.
For example, assume you have two source files ‘a.c’ and ‘b.py’, and want to extract their translatable strings in separate steps.
Each of the following command sequences does this. The output is the same.
| xgettext -o part-c.pot a.c
xgettext -o part-py.pot b.py
xgettext -o all.pot part-c.pot part-py.pot
 | 
Alternatively, use successive xgettext invocations, with a
single POT file that accumulates the translatable strings.
| xgettext -o all.pot a.c
xgettext -o all.pot --join-existing b.py
 | 
Or, using --default-domain and renaming the result:
| xgettext --default-domain=all a.c
xgettext --default-domain=all --join-existing b.py
mv all.po all.pot
 | 
One might be tempted to think that ‘msgcat’ can do the same thing, through a command sequence such as:
| xgettext -o part-c.pot a.c
xgettext -o part-py.pot b.py
msgcat -o all.pot part-c.pot part-py.pot
 | 
But no, this does not work reliably, because sometimes part-c.pot
and part-py.pot will contain different POT-Creation-Date
values, and msgcat then produces an all.pot file that has
conflict markers in the header entry.
This is because msgcat generally is meant to produce PO files that
are to be reviewed and edited by a translator; this is not desired here.
  This document was generated by Bruno Haible on June, 4 2025 using texi2html 1.78a.