
How can I generate a PDF file so that I can copy text as if it were plain text?

TeX - LaTeX Asked by aldo vaz on January 31, 2021

When I generate a PDF file, LaTeX often splits words between lines and adds a hyphen ("-") so that the text fits, for example "exam-ple". My question is: what can I do so that the hyphen is not included when I copy the text, and I get only the word "example"?

One Answer

After doing some research, I have found a pretty neat solution that works for LuaTeX.

The basic idea is that fonts in LuaTeX come with a tounicode property, which determines how each character of the font is translated into a UTF-16BE sequence when text is copied or extracted from the PDF. An example of this mapping can be found here. So we need to change this mapping so that the hyphenation character is translated to nothing. Fortunately, LuaTeX provides the \prehyphenchar parameter, which sets the character used for automatic hyphenation. The plan is therefore as follows:
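To make the "UTF-16BE sequence" concrete: LuaTeX's tounicode value for a character is written as hexadecimal UTF-16BE code units, four hex digits per unit, so the empty string used in this answer carries no text at all. The Python sketch below only models that encoding; it is not LuaTeX code, and the helper names `tounicode_hex` and `decode_tounicode` are hypothetical.

```python
def tounicode_hex(text: str) -> str:
    """Encode text as uppercase hex UTF-16BE, the notation used for tounicode values."""
    return text.encode("utf-16-be").hex().upper()


def decode_tounicode(hex_string: str) -> str:
    """Turn a tounicode hex string back into text; "" decodes to the empty string."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


examples = {
    "HYPHEN-MINUS (U+002D)": "-",
    "HYPHEN (U+2010)": "\u2010",
    "e with acute (U+00E9)": "\u00e9",
    "grinning face (U+1F600)": "\U0001F600",  # needs a surrogate pair
}

for label, char in examples.items():
    print(f"{label:26} -> {tounicode_hex(char)}")

# Mapping U+2010 to "" means it contributes nothing when text is extracted.
print(repr(decode_tounicode("")))
```

Characters outside the Basic Multilingual Plane, like U+1F600, take two 16-bit units (here `D83DDE00`), which is why tounicode values are strings rather than single numbers.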

  1. Find a "burner" hyphenation character for this purpose, because we don't want to affect the behavior of the normal hyphen (U+002D, HYPHEN-MINUS), which is also what you get when you type "-" in the source. From this table, I selected U+2010 (8208 in decimal), so I set \prehyphenchar=8208.
  2. At the end of the document, I update LuaTeX's internal fonts so that character 8208 maps to nothing. (Of course, you can map it to something else, just for fun.) To do this, call create_new_font with a pattern matching the names of the fonts whose tounicode tables should be updated. The code also prints the names of all fonts to the log file, in case you don't know which ones to update. Alternatively, you can drop the pattern-matching step in create_new_font and simply modify all available fonts.

After all these steps, in the compiled document, when you copy "contem-porary", the resulting text is "contemporary"; when you copy "a-b", the resulting text is still "a-b".
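The effect described above can be modelled in a few lines of Python. This is a hypothetical sketch of text extraction, not LuaTeX or a PDF library: every glyph copies as itself unless the override table says otherwise, and any space or newline a viewer may insert at the line break is ignored.

```python
# Hypothetical model: each glyph's copied text comes from its tounicode entry;
# glyphs without an override copy as themselves.
TOUNICODE_OVERRIDES = {"\u2010": ""}  # U+2010, the \prehyphenchar, copies as nothing


def copy_text(glyphs: str) -> str:
    """Return the text extracted for a run of glyphs (line-break whitespace ignored)."""
    return "".join(TOUNICODE_OVERRIDES.get(glyph, glyph) for glyph in glyphs)


# Automatic hyphenation inserts U+2010 before the line break ...
print(copy_text("contem\u2010porary"))  # contemporary
# ... while a hyphen typed in the source is U+002D and is left alone.
print(copy_text("a-b"))  # a-b
```

This is why the "burner" character matters: if the override were keyed on U+002D instead, "a-b" would also lose its hyphen.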

```latex
\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage{luacode}

\setmainfont{DejaVu Serif}
% use U+2010 (HYPHEN) as the automatic hyphenation character
% http://jkorpela.fi/dashes.html
\prehyphenchar=8208

\begin{document}

contemporary contemporary contemporary contemporary contemporary contemporary contemporary

a-b

\begin{luacode}
-- show all fonts in the log
for i, f in font.each() do
  texio.write_nl(f.name)
end

-- Map character 8208 (U+2010) to nothing in every font
-- whose name matches the Lua pattern `pattern`.
function create_new_font(pattern)
    local tounicodevalues = {
        [8208] = "",
    }
    for i, f in font.each() do
        if string.match(f.name, pattern) then
            for u, v in pairs(tounicodevalues) do
                -- skip fonts that have no glyph for this character
                if f.characters[u] then
                    f.characters[u].tounicode = v
                end
            end
            font.define(i, f)
        end
    end
end
\end{luacode}

% update the fonts at the end of the document
\directlua{
  create_new_font("DejaVuSerif")
}

\end{document}
```
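To make the pattern-matching choice in step 2 concrete, here is a Python model of what the loop in create_new_font does to the font tables. It is only a sketch: the font dicts are hypothetical stand-ins with just a name and a characters table, and Python's `re.search` stands in for Lua's `string.match`. Both search anywhere in the name, but Lua patterns are not regular expressions; in a Lua pattern, "-" is a quantifier and must be escaped as "%-".

```python
import re

# Hypothetical stand-ins for LuaTeX font tables: only the font name and a
# "characters" table keyed by code point are modelled.
fonts = {
    1: {"name": "DejaVuSerif",
        "characters": {0x2D: {"tounicode": "002D"}, 0x2010: {"tounicode": "2010"}}},
    2: {"name": "DejaVuSerif-Bold",
        "characters": {0x2D: {"tounicode": "002D"}, 0x2010: {"tounicode": "2010"}}},
    3: {"name": "lmroman10-regular",
        "characters": {0x2D: {"tounicode": "002D"}}},
}

# Code point -> new tounicode value; "" makes U+2010 copy as nothing.
TOUNICODE_OVERRIDES = {0x2010: ""}


def update_fonts(fonts, pattern=None):
    """Apply TOUNICODE_OVERRIDES to fonts whose name matches pattern.

    With pattern=None every font is updated, which corresponds to dropping
    the pattern-matching step. Returns the ids of the fonts that matched.
    """
    updated = []
    for font_id, font in fonts.items():
        if pattern is not None and not re.search(pattern, font["name"]):
            continue
        for code_point, value in TOUNICODE_OVERRIDES.items():
            char = font["characters"].get(code_point)
            if char is not None:  # fonts without the glyph are left unchanged
                char["tounicode"] = value
        updated.append(font_id)
    return updated


print(update_fonts(fonts, "DejaVuSerif"))  # [1, 2]
print(update_fonts(fonts))                 # [1, 2, 3]
```

The model leaves out the `font.define(i, f)` call that the LuaTeX code uses to hand each modified table back to the engine. Updating every font is simpler, while the pattern keeps the change limited to fonts you have checked in the log.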

If you want to dig deeper into this problem or figure out how to implement this in other TeX compilers, these links might be helpful:

Answered by Alan Xiang on January 31, 2021
