TransWikia.com

Having problems with listings and UTF-8. Can it be fixed?

TeX - LaTeX Asked by KramerTheCat on March 30, 2021

I’m having some problems with listings and UTF-8 in my document. Can anyone help? Some characters, like é and ó, work, but á and some others are moved to the beginning of the word…

\documentclass[12pt,a4paper]{scrbook}
\KOMAoptions{twoside=false,open=any,chapterprefix=on,parskip=full,fontsize=14pt}

\usepackage[portuguese]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listingsutf8}

\usepackage{inconsolata}
\lstset{
    language=bash, %% Change to PHP, C, Java, etc... bash is the default
    basicstyle=\ttfamily\small,
    numberstyle=\footnotesize,
    numbers=left,
    backgroundcolor=\color{gray!10},
    frame=single,
    tabsize=2,
    rulecolor=\color{black!30},
    title=\lstname,
    escapeinside={\%*}{*)},
    breaklines=true,
    breakatwhitespace=true,
    framextopmargin=2pt,
    framexbottommargin=2pt,
    extendedchars=false,
    inputencoding=utf8
}

\begin{document}
\begin{lstlisting}
<?php

echo 'Olá mundo!';
print 'Olá mundo!';
\end{lstlisting}

\end{document}

6 Answers

One way to get around this limitation of listings is to set extendedchars=true and then add a literate entry for each accented character you are going to use (it's a bit tedious, but once you've covered all the accents of your language, you never have to worry about them again). The syntax is

literate={á}{{\'a}}1 {ã}{{\~a}}1 {é}{{\'e}}1

For each accented character, put the real character inside braces (e.g. {á}), then the replacement you want typeset inside double braces (e.g. {{\'a}}), and finally the number 1, which is the width of the output in characters; you can put a space between two entries for clarity.

Here's your example modified to use this:

\documentclass[12pt,a4paper]{scrbook}
\KOMAoptions{twoside=false,open=any,chapterprefix=on,parskip=full,fontsize=14pt}

\usepackage[portuguese]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}
\usepackage{xcolor}

\usepackage{inconsolata}
\lstset{
    language=bash, %% Change to PHP, C, Java, etc... bash is the default
    basicstyle=\ttfamily\small,
    numberstyle=\footnotesize,
    numbers=left,
    backgroundcolor=\color{gray!10},
    frame=single,
    tabsize=2,
    rulecolor=\color{black!30},
    title=\lstname,
    escapeinside={\%*}{*)},
    breaklines=true,
    breakatwhitespace=true,
    framextopmargin=2pt,
    framexbottommargin=2pt,
    inputencoding=utf8,
    extendedchars=true,
    literate={á}{{\'a}}1 {ã}{{\~a}}1 {é}{{\'e}}1,
}

\begin{document}

\begin{lstlisting}
<?php

echo 'Olá mundo!';
print 'áãé';
\end{lstlisting}

\end{document}

Correct answer by Philippe Goutet on March 30, 2021

The way the inputenc package handles non-ASCII UTF-8 characters (it makes the first byte active and then reads the following bytes as arguments) is fundamentally incompatible with the way the listings package works, which reads each byte individually and expects it to be a character of its own. For example, á is encoded as the two bytes C3 A1, which listings sees as two separate characters.

The listingsutf8 package tries to work around this for the case where your characters can be converted to some 8-bit encoding (and you are using pdfLaTeX), but this works only with \lstinputlisting (as Marc's answer pointed out), not with inline listings. For inline listings, the literate option (as pointed out by Philippe) sounds good. An alternative is escaping to LaTeX (as pointed out by Gonzalo), but then simple cut-and-paste no longer works.

The last time I had to typeset code that included non-ASCII Unicode characters (stuff like ℤ as Java identifiers, which are not in any 8-bit encoding, AFAIK), I switched to XeLaTeX, which supports UTF-8 input out of the box, without needing the inputenc package. With this, it worked nicely. I suppose LuaLaTeX would work the same way (but it was not that mature then).
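As a rough illustration of the engine switch described above, here is a minimal sketch of the question's listing set up for XeLaTeX or LuaLaTeX. The choice of fontspec and of language=PHP is an assumption, not something taken from the answer, and the sketch only exercises á; it makes no claim about characters outside the Latin-1 range.

```latex
% Compile with xelatex or lualatex; the source file must be saved as UTF-8.
\documentclass{article}
\usepackage{fontspec}  % engine-native font loading; no inputenc or fontenc
\usepackage{listings}
\usepackage{xcolor}

\lstset{
  language=PHP,                     % assumption: the listing is PHP
  basicstyle=\ttfamily\small,
  backgroundcolor=\color{gray!10},
  frame=single,
  extendedchars=true                % accept characters above 127
}

\begin{document}
\begin{lstlisting}
<?php

echo 'Olá mundo!';
\end{lstlisting}
\end{document}
```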

(But I later wanted the comments to be formatted, too, thus I started/revived my ltxdoclet project to include source code and formatted comments.)

Answered by Paŭlo Ebermann on March 30, 2021

With the listingsutf8 package and a traditional (not UTF-8) TeX engine, you can only use the \lstinputlisting command, which properly displays a UTF-8 encoded file. You can't use the lstlisting environment unless the code inside is plain ASCII.
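A minimal sketch of that approach under pdfLaTeX might look like the following. The file name hello.php is hypothetical; it is assumed to be a separate file saved as UTF-8 whose characters all exist in Latin-1. The inputencoding=utf8/latin1 value is the form the listingsutf8 package provides for converting the file on input.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listingsutf8}  % loads listings and adds the utf8/<encoding> syntax

\lstset{
  basicstyle=\ttfamily\small,
  extendedchars=true,
  inputencoding=utf8/latin1  % read the file as UTF-8, convert it to Latin-1
}

\begin{document}
% hello.php is a hypothetical external file, e.g. containing: echo 'Olá mundo!';
\lstinputlisting[language=PHP]{hello.php}
\end{document}
```

Code typed directly inside a lstlisting environment in the same document would still be limited to plain ASCII, as noted above.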

Answered by Marc Baudoin on March 30, 2021

Escape those characters to LaTeX, as the documentation (listings manual, page 14) suggests:

Similarly, if you are using UTF-8 extended characters in a listing, they must be placed within an escape to LaTeX.

\documentclass[12pt,a4paper]{scrbook}
\KOMAoptions{twoside=false,open=any,chapterprefix=on,parskip=full,fontsize=14pt}

\usepackage[portuguese]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listingsutf8}
\usepackage{xcolor}

\usepackage{inconsolata}
\lstset{
    language=bash, %% Change to PHP, C, Java, etc... bash is the default
    basicstyle=\ttfamily\small,
    numberstyle=\footnotesize,
    numbers=left,
    backgroundcolor=\color{gray!10},
    frame=single,
    tabsize=2,
    rulecolor=\color{black!30},
    title=\lstname,
    escapeinside={\%*}{*)},
    breaklines=true,
    breakatwhitespace=true,
    framextopmargin=2pt,
    framexbottommargin=2pt,
    extendedchars=false,
    inputencoding=utf8
}

\begin{document}
\begin{lstlisting}
<?php

echo '%*Olá mundo*)!';
print '%*Olá mundo*)!';
\end{lstlisting}

\end{document}

Answered by Gonzalo Medina on March 30, 2021

This is a modified version that adds support for Swedish and German characters (åäö üß) as well as Portuguese characters.

Put the following line in the preamble:

\usepackage{inconsolata} % Swedish encoding in lstlisting

and then, where you want the code listing, put the code below.

\lstset{
  language=bash, % Switch code language ... bash is the default
  basicstyle=\ttfamily\footnotesize,
  numberstyle=\tiny,
  numbers=left,
  backgroundcolor=\color{gray!10},
  frame=single,
  tabsize=2,
  rulecolor=\color{black!30},
  title=\lstname,
  escapeinside={\%*}{*)},
  breaklines=true,
  breakatwhitespace=true,
  framextopmargin=2pt,
  framexbottommargin=2pt,
  inputencoding=utf8,
  extendedchars=true,
  % Support for Swedish, German and Portuguese umlauts
  literate=%
  {Ö}{{\"O}}1
  {Ä}{{\"A}}1
  {Å}{{\AA{}}}1
  {Ü}{{\"U}}1
  {ß}{{\ss}}1
  {ü}{{\"u}}1
  {ö}{{\"o}}1
  {ä}{{\"a}}1
  {å}{{\aa{}}}1
  {á}{{\'a}}1
  {ã}{{\~a}}1
  {é}{{\'e}}1,
}
\lstinputlisting[language=bash]{your_code_file.txt}

Answered by Tobias Holm on March 30, 2021

Just to help people, here is a fairly complete literate statement for use with listings:

\lstset{
    inputencoding = utf8,  % Input encoding
    extendedchars = true,  % Extended ASCII
    literate      =        % Support additional characters
      {á}{{\'a}}1  {é}{{\'e}}1  {í}{{\'i}}1 {ó}{{\'o}}1  {ú}{{\'u}}1
      {Á}{{\'A}}1  {É}{{\'E}}1  {Í}{{\'I}}1 {Ó}{{\'O}}1  {Ú}{{\'U}}1
      {à}{{\`a}}1  {è}{{\`e}}1  {ì}{{\`i}}1 {ò}{{\`o}}1  {ù}{{\`u}}1
      {À}{{\`A}}1  {È}{{\`E}}1  {Ì}{{\`I}}1 {Ò}{{\`O}}1  {Ù}{{\`U}}1
      {ä}{{\"a}}1  {ë}{{\"e}}1  {ï}{{\"i}}1 {ö}{{\"o}}1  {ü}{{\"u}}1
      {Ä}{{\"A}}1  {Ë}{{\"E}}1  {Ï}{{\"I}}1 {Ö}{{\"O}}1  {Ü}{{\"U}}1
      {â}{{\^a}}1  {ê}{{\^e}}1  {î}{{\^i}}1 {ô}{{\^o}}1  {û}{{\^u}}1
      {Â}{{\^A}}1  {Ê}{{\^E}}1  {Î}{{\^I}}1 {Ô}{{\^O}}1  {Û}{{\^U}}1
      {œ}{{\oe}}1  {Œ}{{\OE}}1  {æ}{{\ae}}1 {Æ}{{\AE}}1  {ß}{{\ss}}1
      {ç}{{\c c}}1 {Ç}{{\c C}}1 {ø}{{\o}}1  {å}{{\r a}}1 {Å}{{\r A}}1
      {ã}{{\~a}}1  {õ}{{\~o}}1  {Ã}{{\~A}}1 {Õ}{{\~O}}1
      {ñ}{{\~n}}1  {Ñ}{{\~N}}1  {¿}{{?`}}1  {¡}{{!`}}1
      {°}{{\textdegree}}1 {º}{{\textordmasculine}}1 {ª}{{\textordfeminine}}1
      % ¿ and ¡ are not correctly displayed if inconsolata font is used
      % together with the lstlisting environment. Consider typing code in
      % external files and using \lstinputlisting to display them instead.
  }

Please feel free to edit this list with more/missing characters!
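One possible way to reuse such a list across listings is to wrap it in a named style with \lstdefinestyle. The sketch below is only an illustration: the style name accents is hypothetical, the literate list is cut down to four entries, and language=PHP is an assumption.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}

% "accents" is a hypothetical style name; extend the literate list as needed.
\lstdefinestyle{accents}{
  inputencoding=utf8,
  extendedchars=true,
  literate={á}{{\'a}}1 {ã}{{\~a}}1 {é}{{\'e}}1 {ç}{{\c c}}1
}

% Apply the style globally, together with any other settings.
\lstset{style=accents, basicstyle=\ttfamily\small}

\begin{document}
\begin{lstlisting}[language=PHP]
echo 'Olá, coração!';
\end{lstlisting}
\end{document}
```

Any accented character that is missing from the literate list will still be mishandled, so the style is only as complete as its list.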

Answered by Rmano on March 30, 2021
