
How to create a compact glossary / dictionary using words in bold as keys and appending the list at the end of the page, section or chapter?

TeX - LaTeX Asked on January 19, 2021

Here is an example of a compact glossary with word meanings: first the text, then the glossary.

The combination of photovoltaic technology, solar thermal
technology, and reflective or refractive solar concentrators has been
a highly appealing option for developers and researchers since the
late 1970s and early 1980s. The result is what is known as a
concentrated photovoltaic thermal system which is a hybrid
combination of concentrated photovoltaic and photovoltaic
thermal systems.

photovoltaic: definition of this word here, solar concentrators: another definition here, photovoltaic thermal: and another here

This format would help authors writing for language learners: the text keeps its normal size, while the definitions are compact and available for reference if needed.

Such a feature can be described as follows:

  1. Words in bold have their meaning listed in the glossary.
  2. The glossary is appended at the end of the chapter/section and is compact, with little spacing and a smaller font size. Another option would be to place it in a footnote, but depending on the text format and how many words have definitions, the footnotes can take up too much space. Books for language learners, for instance, are usually small, like pocketbooks, so that the learner can "conclude" a book and stay motivated to read the next. I imagine that code for appending the glossary at the end of a section is easier anyway.
  3. Extra (not so essential): if a word has already been defined earlier, its definition would not appear again in the following pages or chapters.

Would it be possible to do this with an inline command such as \wd{word}{definition} within the text? It would be used very often: on average, about one word in every ten would have a definition attached.

The combination of \wd{photovoltaic}{def of this word} technology, solar thermal
technology, and reflective or refractive \wd{solar concentrators}{def of this word} has been
a highly appealing option for developers and researchers since the
late 1970s and early 1980s. The result is what is known as a
concentrated \wd{photovoltaic thermal}{def of this word} system which is a hybrid
combination of concentrated photovoltaic and photovoltaic
thermal systems.
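
To pin down the intended behaviour, here is a rough Python sketch. It is only a model of the desired output, not a LaTeX solution. It assumes the hypothetical \wd{word}{definition} markup from the example above with no nested braces; `build_glossary` and its `seen` argument are names made up for this illustration.

```python
import re

# Matches the hypothetical \wd{word}{definition} markup (no nested braces).
WD_PATTERN = re.compile(r"\\wd\{([^{}]*)\}\{([^{}]*)\}")


def build_glossary(text, seen=None):
    """Return (body, glossary_line) for one section.

    `seen` is a set of words already defined in earlier sections;
    those words are left out of the glossary, and the set is updated.
    """
    if seen is None:
        seen = set()
    entries = []
    for word, definition in WD_PATTERN.findall(text):
        if word not in seen:
            seen.add(word)
            entries.append(f"{word}: {definition}")
    # In the running text, every marked word is set in bold.
    body = WD_PATTERN.sub(lambda m: r"\textbf{" + m.group(1) + "}", text)
    return body, ", ".join(entries)


section = r"The \wd{photovoltaic}{def A} panel and the \wd{solar concentrators}{def B}."
body, glossary = build_glossary(section)
print(body)
print(glossary)
```

Passing the same `seen` set to the next section is what gives the "extra" behaviour from point 3: a word defined once is not listed again.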

2 Answers

In short, you have described exactly what packages like glossaries do. But I know it is difficult to jump right into it, so see the code below for a start. Try adding the further functionality you want yourself (learning by doing ^_-) and have fun.

\documentclass{article}
\usepackage[acronyms]{glossaries}

% code to define your entries
\newacronym[description={definition of this word here}]{PV}{PV}{photovoltaic}
\newacronym[description={definition of this word here}]{SC}{SC}{solar concentrators}
\newacronym[description={and another here}]{CPVT}{CPVT}{photovoltaic thermal}

% code to format the entries in bold
\renewcommand{\glstextformat}[1]{\textbf{#1}}
\renewcommand*{\glsentryfmt}{%
    \glsgenentryfmt
    \ifglsused{\glslabel}{}{\space(\glsentrysymbol{\glslabel})}%
}

% code to control the size of entries within the glossary
\renewcommand{\glossarypreamble}{\small}

% code to see the acronym description as footnote instead of inline
% \setacronymstyle{footnote-sc-desc}

\makeglossaries
\begin{document}

\section{First}
Use of \gls{PV} and \gls{SC}

\section{Second}
Use of \gls{CPVT} and again \gls{SC}


\printacronyms
\end{document}


Answered by Venez on January 19, 2021

Many aspects of this solution are highly experimental, but I think it should provide you with a framework to achieve what you want. Make sure you complete the following steps before compiling the document. This solution only works with LuaTeX.

  1. Prepare your dictionary and save it in JSON format. For demonstration purposes, I looked at a number of freely available dictionaries and chose to download FOLDOC because its text seems easier to parse. I converted it into JSON format with the Python script below.
import json

# Characters that need escaping before the text is handed to LaTeX
_latex_special_chars = {
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\^{}',
    '\\': r'\textbackslash{}',
    '\n': '\\newline ',
    '-': r'{-}',
    '\xA0': '~',  # Non-breaking space
    '[': r'{[}',
    ']': r'{]}',
}

def escape_latex(s):
    return ''.join(_latex_special_chars.get(c, c) for c in str(s))

# Very common words that should never get a glossary entry
stop_words = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself',
              'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself',
              'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these',
              'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do',
              'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while',
              'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before',
              'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again',
              'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each',
              'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than',
              'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now']

the_dict = dict()
cur_item = ''
cur_item_def = []


def flush_cur_item():
    # Store the entry collected so far (unless it is a stop word) and reset the state
    global cur_item
    if len(cur_item) > 0:
        if cur_item.lower() not in stop_words:
            defn_text = '\n'.join(cur_item_def)
            the_dict[cur_item] = str(escape_latex(defn_text))
        cur_item = ''
        cur_item_def.clear()


with open('foldoc.txt', 'r') as infile:
    for line in infile:
        line = line.rstrip()
        if line.startswith('\t'):
            # Tab-indented lines belong to the current definition
            cur_item_def.append(line.strip())
        elif len(line.strip()) == 0:
            cur_item_def.append(line.strip())
        else:
            # An unindented line starts a new headword
            flush_cur_item()
            cur_item = line

flush_cur_item()

with open('fodol.json', 'w') as outfile:
    json.dump(the_dict, outfile, indent=2)
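
As a quick sanity check of the parsing rule used above (a headword on an unindented line, followed by tab-indented definition lines), the same logic can be run on a small in-memory sample. The sample text is made up for illustration and is not taken from FOLDOC, and `parse_entries` is a hypothetical helper name. The LaTeX escaping and the stop-word filter are left out to keep the sketch short, and unlike the script above it trims blank lines at the start and end of each definition.

```python
def parse_entries(lines):
    """Group tab-indented definition lines under the preceding headword."""
    entries = {}
    headword = ''
    definition = []

    def flush():
        # Store the entry collected so far, if there is one.
        if headword:
            entries[headword] = '\n'.join(definition).strip()

    for raw in lines:
        line = raw.rstrip()
        if line.startswith('\t') or not line.strip():
            # Definition text or a blank separator line.
            definition.append(line.strip())
        else:
            # A new headword starts: store the previous entry first.
            flush()
            headword = line
            definition = []
    flush()
    return entries


sample = "chess\n\n\tA board game.\n\tTwo players.\n\nmaze\n\n\tA network of paths.\n"
for word, text in parse_entries(sample.splitlines()).items():
    print(f"{word!r}: {text!r}")
```

If your dictionary source uses a different layout (for example spaces instead of tabs for indentation), only the `startswith('\t')` test should need to change.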

  2. Download json.lua (https://github.com/rxi/json.lua) and put it in your working directory. This allows Lua to parse JSON files.
  3. Compile the following source file with LuaLaTeX.
\documentclass{article}
\usepackage{luacode}
\usepackage{fontspec}
\usepackage{expl3}

\setmainfont{DejaVu Serif}

% get the font id of bfseries font
\ExplSyntaxOn
\group_begin:
\bfseries
\directlua{
    bf_font_id = font.current()
}
\group_end:
\ExplSyntaxOff

\begin{luacode*}
-- load the dictionary
-- this step requires the json.lua library
-- https://github.com/rxi/json.lua
local json = require"json"
local infile = io.open("fodol.json", "r")
dictionary = json.decode(infile:read("a"))
infile:close()

-- only used for debug output; requires the inspect.lua library
inspect = require"inspect"

do_glossary = true
glossary_word = {}
glossary_defn = {}

local glyph_id = node.id("glyph")
local glue_id = node.id("glue")
processed_words = {}

-- true for ASCII letters A-Z and a-z
function is_letter(glyph)
    local chr = glyph.char
    return (chr >= 65 and chr <= 90) or (chr >= 97 and chr <= 122)
end

function glyph_table_to_str(tbl)
    local res = ""
    for _, item in ipairs(tbl) do
        res = res .. utf8.char(item.char)
    end
    return res
end

function process_glyphs(glyphs)
    if #glyphs == 0 then
        return
    end
    local word = glyph_table_to_str(glyphs)
    -- every word is looked up only once per document
    if processed_words[word] ~= nil then
        return
    end
    processed_words[word] = 1
    
    -- try original case and lowercase
    local defn = dictionary[word] or dictionary[string.lower(word)]
    if defn ~= nil then
        table.insert(glossary_word, word)
        table.insert(glossary_defn, defn)
        -- switch the glyphs of the word to the bold font
        for _ ,item in ipairs(glyphs) do
            item.font = bf_font_id
        end
    end
    texio.write_nl(tostring(glossary_word))
end

function show_glossary()
    texio.write_nl(inspect(glossary_defn))
    if #glossary_word > 0 then
        tex.print([[\begin{itemize}]])
        for ind, item in ipairs(glossary_word) do
            tex.print(string.format([[\item \textbf{%s}: %s]], item, glossary_defn[ind]))
        end
        tex.print([[\end{itemize}]])
        glossary_word = {}
        glossary_defn = {}
    end
end

function pre_callback(n)
    if not do_glossary then
        return n
    end
    local prev_glyph = {}
    local word = ""
    for n1 in node.traverse(n) do
        if n1.id == glyph_id and is_letter(n1) then
            table.insert(prev_glyph, n1)
        elseif n1.id == glue_id then
            -- glue marks the end of a word
            process_glyphs(prev_glyph)
            prev_glyph = {}
        end
    end
    process_glyphs(prev_glyph)
    return n
end

luatexbase.add_to_callback("pre_linebreak_filter", pre_callback, "pre_callback")
\end{luacode*}

\newcommand{\EnableGlossary}{
    \directlua{do_glossary=true}
}
\newcommand{\DisableGlossary}{
    \directlua{do_glossary=false}
}
\newcommand{\PrintGlossary}{
    \subsection{Glossary}
    \begingroup
    \small
    \DisableGlossary
    \directlua{show_glossary()}
    \EnableGlossary
    \endgroup
}

\begin{document}
% https://arstechnica.com/science/2020/12/google-develops-an-ai-that-can-learn-both-chess-and-pac-man/
\section{First section}
The first major conquest of artificial intelligence was chess. The game has a dizzying number of possible combinations, but it was relatively tractable because it was structured by a set of clear rules. An algorithm could always have perfect knowledge of the state of the game and know every possible move that both it and its opponent could make. The state of the game could be evaluated just by looking at the board.

% always call this command on a new paragraph
\PrintGlossary

\section{Second section}
But many other games aren't that simple. If you take something like Pac-Man, then figuring out the ideal move would involve considering the shape of the maze, the location of the ghosts, the location of any additional areas to clear, the availability of power-ups, etc., and the best plan can end up in disaster if Blinky or Clyde makes an unexpected move. We've developed AIs that can tackle these games, too, but they have had to take a very different approach to the ones that conquered chess and Go.


% always call this command on a new paragraph
\PrintGlossary
\end{document}

The result is shown as follows.

[Image: compiled output of the document above]
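
The lookup rule implemented by pre_callback and process_glyphs can be summarised in a few lines of Python. This is only a model of the Lua logic, not part of the solution: `find_glossary_words` is a hypothetical name, whitespace stands in for glue nodes, and the tiny dictionary is invented for the example. As in the Lua code, only ASCII letters are collected, so punctuation and hyphens are dropped from a word rather than splitting it.

```python
def is_letter(ch):
    """ASCII letters only, mirroring is_letter() in the Lua code."""
    return 'A' <= ch <= 'Z' or 'a' <= ch <= 'z'


def find_glossary_words(paragraph, dictionary, processed):
    """Return (word, definition) pairs for words not looked up before."""
    found = []
    for chunk in paragraph.split():  # whitespace plays the role of glue nodes
        word = ''.join(c for c in chunk if is_letter(c))
        if not word or word in processed:
            continue
        processed.add(word)
        # Try the original spelling first, then the lowercase form.
        definition = dictionary.get(word)
        if definition is None:
            definition = dictionary.get(word.lower())
        if definition is not None:
            found.append((word, definition))
    return found


dictionary = {"chess": "a board game", "algorithm": "a step-by-step procedure"}
processed = set()  # shared across sections, like processed_words in Lua
print(find_glossary_words("Chess was first. An algorithm plays chess.", dictionary, processed))
print(find_glossary_words("Another algorithm, another game.", dictionary, processed))
```

Because the set of processed words is never reset, a word listed in one section is not listed again later. The check is case-sensitive, though, so "Chess" and "chess" are treated as two different words and can both end up in the glossary.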

Existing problems:

  • It is difficult to control where the glossary is applied and where it is not. At this point, I only provide the \EnableGlossary and \DisableGlossary commands, but it seems quite tedious to toggle them constantly to avoid glossary entries in headings, captions, etc.
  • For now, the glossary has to be printed manually with the \PrintGlossary command. Detecting the end of a section seems to be difficult in LaTeX. See "Macro that knows it is at the end of a section?" for more ideas.
  • I forgot to implement the inline glossary function. But with this infrastructure, it should be pretty straightforward:
    \newcommand{\inlg}[2]{%
        \directlua{
            table.insert(glossary_word, "\luaescapestring{#1}")
            table.insert(glossary_defn, "\luaescapestring{#2}")
        }%
        \textbf{#1}%
    }
    
    However, due to the mechanism of LuaTeX, these inline glossary items will appear out of order.

Answered by Alan Xiang on January 19, 2021
