TeX - LaTeX Asked on November 29, 2021
Is there any guidance on creating good screen-readable documents using
LaTeX?
I’m aiming to make ‘tablet-friendly’ versions of documents primarily
intended for paper, so I’m looking for fairly lightweight class- or
style-file adaptations, or alternative toolchains. The minimal thing
to do here is to format the document for a small stock size (say 180 x
120mm), narrow the margins, and perhaps fiddle with the information in
the footer. Doing that much would be… OK, but I feel sure (a) I
could do more to help a reader read and navigate within the document;
and (b) I would be reinventing a wheel here, probably badly, and
rediscovering problems that must be well-known to someone.
What I’ve found so far:
Existing answers mention tex4ebook and, I think, pandoc (some of the answers are rather old), but it’s clear this struggles with highly-mathematical text.
Searching for ‘epub’, ‘tablet typesetting’, and so on, leads me into a winding maze of .docx converters, self-publishing gurus, and other blind alleys.
I have an answer to my own question. The following notes – something of a brain-dump, but I hope useful to someone – are correct as of early 2021. They're broadly positive; there are a few things that don't work as well as one might hope, but things should improve, on that front, in time.
In the end, I did three things here.
I adjusted the class file I'm using for these documents to produce an ‘ebook’ variant of the output. The key changes were:
A smaller page size (I use the {memoir} class, and used its option b6paper, but the {geometry} package would doubtless be able to handle this, too).
Redefining \section and \subsection (using \@startsection) to add a \newpage before each.
In {memoir} terms, the layout I used was:
% the numbers here seem to work OK on an iPad,
% but there's no deep principle behind them.
\settypeblocksize{34\baselineskip}{25pc}{*}
\setlrmargins{*}{*}{1}
\setulmargins{*}{*}{1.5}
\setheadfoot{\baselineskip}{2\baselineskip}
\setheaderspaces{*}{*}{*}
\checkandfixthelayout
and I made sure the ToC was useful with \setcounter{tocdepth}{4}.
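For readers not using {memoir}, the same changes (a B6 page, a page break before each section, a deep ToC) can be sketched with the standard article class and {geometry}. This is only an illustration under assumptions, not the class file described above: the 8mm margin is an arbitrary choice, and the \@startsection arguments are the ones article.cls uses for \section, with \newpage put in front.

```latex
\documentclass{article}
% B6 page with narrow margins; the 8mm figure is an arbitrary assumption.
\usepackage[b6paper,margin=8mm,includefoot]{geometry}

\makeatletter
% Same \@startsection arguments as article.cls uses for \section,
% preceded by a page break so each section starts a fresh screen.
\renewcommand\section{\newpage
  \@startsection{section}{1}{\z@}%
    {-3.5ex \@plus -1ex \@minus -.2ex}%
    {2.3ex \@plus .2ex}%
    {\normalfont\Large\bfseries}}
\makeatother

% List everything down to paragraph level in the ToC.
\setcounter{tocdepth}{4}

\begin{document}
\tableofcontents
\section{First}
Some text.
\section{Second}
More text.
\end{document}
```

\subsection can be treated the same way, starting from its own definition in the class being used.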
I made the page footer helpful about navigation, showing the current
section number in the footer, with a link to the ToC, and feedback on
how far into the document this page is. Expressed again in {memoir}
terms, this is:
\makeoddfoot{<mypagestyle>}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}
(that obviously depends on a \label{foo@toc} on the ToC page, and a \label{foo@lastpage} on the last page of the document).
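To show how the pieces fit together, here is a minimal sketch of a complete {memoir} document using that footer. Several things in it are my assumptions rather than part of the setup described above: the page-style name ebook stands in for <mypagestyle>, the \makepsmarks line is one way of getting the section number and title into \rightmark, and the ToC label is placed just after the ToC, which is only adequate while the ToC fits on one page.

```latex
\documentclass[b6paper,oneside]{memoir}
\usepackage{hyperref}

% "ebook" is a hypothetical page-style name standing in for <mypagestyle>.
\makepagestyle{ebook}
\makeoddfoot{ebook}
  {\hyperref[foo@toc]{\rightmark}}   % left: current section, linked to the ToC
  {}                                  % centre: empty
  {\thepage/\pageref{foo@lastpage}}   % right: this page / last page
% Assumption: put "number. title" of the current section into \rightmark.
\makepsmarks{ebook}{%
  \createmark{section}{right}{shownumber}{}{. \space}}
% Use the same footer on chapter-opening pages (including the ToC page).
\aliaspagestyle{chapter}{ebook}
\pagestyle{ebook}

\begin{document}
\tableofcontents*
% Anchor and label for the footer link; assumes a one-page ToC.
\phantomsection\label{foo@toc}

\chapter{Introduction}
\section{Background}
Some text.
\newpage
More text.
\label{foo@lastpage}
\end{document}
```

As with any \pageref, the page total in the footer only appears after a second LaTeX run.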
I expected I'd have to work quite hard to get something decent here. But along with a small number of other class-specific layout tweaks, that was all I felt I needed to do to produce a quite respectable screen-readable (though as noted above not particularly accessible) PDF document.
LaTeXML is the most successful of the LaTeX-to-HTML converters I've used, and seemed admirably robust, when used with the structurally simple but maths-heavy document I was working with. I managed to convert my LaTeX sources to EPUB in two related ways.
Using LaTeXML's EPUB generation pipeline is 90% successful. This still has a few bugs, and as of right now produces EPUB that works, but which doesn't pass the conformance checker I used (W3C, see below). But I was able to fix up the results without major difficulty. Those bugs are currently reported in the LaTeXML developers’ queue, so with time, and possibly with help from this community, they should disappear before too long.
What I in fact ran with was to use LaTeXML's intermediate XML format (from which it generates HTML and related outputs), and to develop my own XSLT stylesheets to convert this to XHTML and to generate the associated EPUB metadata. This is a flexible technique, but it's obviously more work than something canned (I happen to have experience of this general route, so this was both relatively easy and quite enjoyable to do). I mention this, not because I'm necessarily recommending it, but in order to draw attention to that well-designed intermediate format, awareness of which might be similarly useful to others with special end-result needs.
Relevant resources here are as follows.
Spec:
Validators and good practice:
EPUBCheck (java -jar epubcheck.jar book.epub). I managed to get zero errors on this checker, eventually!
PDF output from LaTeX scores terribly poorly on the ‘accessibility’ checkers I used; I got scores below 10/100 with some documents. It's not completely clear how that checker was scoring things, but it appears that a large part of the poor score is attributable to the outputs not being ‘tagged PDF’. There's a 2017 discussion about this on stackoverflow (A guide on how to produce accessible PDF files?).
There does not appear to be a simple answer to this question.
One answer to the stackoverflow question points to a Google Code project, which of course has disappeared, but sounds like it might be related to a set of templates for ‘SIGCHI’ (which starts off saying ‘This repository was...’), which points in turn to ACM article templates which may or may not be useful. Unfortunately, the only mention of accessibility on the templates page is an exhortation to produce figure descriptions, which doesn't appear to be the principal problem.
Another answer in the
stackoverflow discussion
points to the accsupp
package, which adds alt-texts, and the
pdfcomment
package, which does... something else (along with
bringing in a fearsome set of package dependencies). One of the
comments points to the
axessibility package, which
is concerned with formulae. Finally, there's a page at
CTAN covering access to various
PDF features, none of which is obviously structural tags.
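As a small illustration of what accsupp does, the following sketch attaches an alt-text to a single formula. The wording of the alt-text is just an example of mine, and note the limitation: this marks up one span of content and does not, by itself, add the structural tags that the checkers seem to want.

```latex
\documentclass{article}
\usepackage{accsupp}
\begin{document}
The area under the curve is
% Alt= supplies the alternative description for the enclosed material;
% method=escape escapes characters that are special in PDF strings.
\BeginAccSupp{method=escape,Alt={integral from a to b of f of x, d x}}%
$\int_a^b f(x)\,dx$%
\EndAccSupp{}.
\end{document}
```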
Although the accessibility
package exists, its github
page has a wide
variety of disclaimers on it, going as far as saying ‘I’d like to
discourage people from using the package any more’. I wonder,
however, if there's an 80-20 solution here, in the sense
that if only the structural tags are available, then do we get a
much higher score? The issues list for that package suggests that
its author is interested in finding development money to get
someone to work on this (issues last updated mid-2020).
There is a thing called PDF/UA, and a TUG talk about it.
Most significantly, there is a package tagpdf
by Ulrike Fischer
(CTAN and
github), which is billed as
being ‘to experiment with tagging with pdflatex and lualatex’
(last contributions appear to be late 2019). It starts off by
saying: ‘This package is not meant for normal document
production.’ and notes that it requires a current expl3
version
of LaTeX3. It suggests that the various accessibility packages in
LaTeX are fundamentally flawed, to the extent that they rely on
monkeypatching LaTeX rather than using a PDF API provided by the
LaTeX3 kernel.
The tagpdf
documentation
notes that:
I nevertheless think that the lua mode is the future and the only one that will be usable for larger documents. pdf is a page orientated format and so the ability of luatex to manipulate pages and nodes after the TeX-processing is really useful here
That is, there is significant structure in PDF tags that makes sense only after the TeX page-breaking algorithm has done its work, at a stage where (La)TeX no longer has any purchase. Thus the general problem is very hard; but even so, there might be some minimal structural-only tagging that it's possible to add (see pp.22 and 23 of the document, which you'd never want to type), and which might make a significant difference to an accessibility checker, if it's possible to automate at all.
The LaTeX3 project is interested in Tagged PDF, and one of the current goals there is to ‘provide functionality to automatically produce structured PDF, without the need for user intervention or post-processing’ (see 20:24 of a TUG 2020 talk by Frank Mittelbach). The implication of the rest of that talk is that this tagging is effectively infeasible with current LaTeX. Interestingly, they appear to have some modest funding from Adobe to do this.
Answered by Norman Gray on November 29, 2021