Guidance on producing screen-readable documents

Question

Is there any guidance on creating good screen-readable documents using
LaTeX?
I'm aiming to make ‘tablet-friendly’ versions of documents primarily
intended for paper, so I'm looking for fairly lightweight class- or
style-file adaptations, or alternative toolchains.  The minimal thing
to do here is to format the document for a small stock size (say 180 x
120mm), narrow the margins, and perhaps fiddle with the information in
the footer.  Doing that much would be... OK, but I feel sure (a) I
could do more to help a reader read and navigate within the document;
and (b) I would be reinventing a wheel here, probably badly, and
rediscovering problems that must be well-known to someone.
What I've found so far:

Stackexchange question about geometry:
this is primarily a question about how to use the geometry package,
which I don't really need advice about (memoir.cls=good).

LaTeX to epub:
these work better than I'd expect (of the toolchains mentioned there
I tried tex4ebook and I think pandoc; some of the answers are rather
old), but it's clear this struggles with highly-mathematical text,
and I get the strong impression that it heads towards a nightmare of
epub-version, reader (product and version), and MathML compatibility.

It's well known that ConTeXt can produce impressively
interactive documents.
If these tablet-friendly documents were the main goal of this
project this is the direction I'd go, but a large-scale switch of
macro package (tempting though ConTeXt is) is probably more effort
than I'm willing to invest for the side-benefit I have in mind.

A question about on-screen viewing:
the question includes pointers to ConTeXt and pdfscreen, but similarly
to the previous point, both would indicate more substantial rethinking
of the documents than I'm hoping for.  David Carlisle's answer
essentially says that one should use MathML at an early stage in such
a project; this is surely true, but would require a major rethink of
the documents I have.

Searching for ‘epub’, ‘tablet typesetting’, and so on, leads me into a
winding maze of .docx converters, self-publishing gurus, and other
blind alleys.

Norman Gray · Answer

I have an answer to my own question.  The following notes – something
of a brain-dump, but I hope useful to someone – are correct as of
early 2021.  They're broadly positive; there are a few things that
don't work as well as one might hope, but things should improve, on
that front, in time.
In the end, I did three things here.

I adjusted the class file I developed for the relevant documents,
to create a B6-sized document, which works when viewed on-screen
as PDF.  This works pretty completely, and produced something I'm
content with, at the cost of less work than I anticipated.
However it is generally
not ‘accessible’, in the sense of being friendly to, eg, text-to-speech
readers, and other tools to support folk with print disabilities
(supporting whom was a part of my motivation here).

I've used LaTeXML to produce
EPUB documents.  Amongst other virtues, these tick ‘accessibility’
boxes.  This mostly works, and the problems are
relatively minor bugs, and aesthetic points which one could tweak
more or less indefinitely.

I've learned a little more about the general accessibility of
LaTeX documents (short version: much harder than you expect now,
but potentially due to improve).

Screen PDF

I adjusted the class file I'm using for these documents to produce an
‘ebook’ variant of the output.  The key changes were:

Single-sided B6 paper looks like the right pre-set size for this
(I leant heavily on the {memoir} class, and used its option b6paper,
but the {geometry} package would doubtless be able to handle this, too).
Font: I used 10/12pt STIX2.
Lots of extra white space where it was at all reasonable.
Specifically, I redefined \section and \subsection
(using \@startsection) to add a \newpage before each.
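
A minimal sketch of a preamble along these lines might look like the following.  The font package (stix2, under pdfLaTeX) and the heading dimensions (copied from the standard article class) are assumptions made for illustration; they are not taken from the class file described here.

```latex
% Sketch only: an 'ebook' variant preamble, under the assumptions stated above.
\documentclass[b6paper,10pt,oneside]{memoir}

% Assumption: pdfLaTeX with the stix2 package; with XeLaTeX or LuaLaTeX
% one would load the STIX Two fonts via unicode-math instead.
\usepackage[T1]{fontenc}
\usepackage{stix2}

% Start every section and subsection on a fresh page.
% The skips and fonts are those of the standard article class (an assumption).
\makeatletter
\renewcommand\section{\newpage
  \@startsection{section}{1}{\z@}%
    {-3.5ex \@plus -1ex \@minus -.2ex}%
    {2.3ex \@plus .2ex}%
    {\normalfont\Large\bfseries}}
\renewcommand\subsection{\newpage
  \@startsection{subsection}{2}{\z@}%
    {-3.25ex \@plus -1ex \@minus -.2ex}%
    {1.5ex \@plus .2ex}%
    {\normalfont\large\bfseries}}
\makeatother

\begin{document}
\section{First section}
Some text.
\subsection{A subsection}
More text.
\end{document}
```

Redefining the headings this way bypasses memoir's own heading hooks, so it is only one of several ways to get the page break.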

In {memoir} terms, the layout I used was:
% the numbers here seem to work OK on an iPad,
% but there's no deep principle behind them.
\settypeblocksize{34\baselineskip}{25pc}{*}
\setlrmargins{*}{*}{1}
\setulmargins{*}{*}{1.5}
\setheadfoot{\baselineskip}{2\baselineskip}
\setheaderspaces{*}{*}{*}
\checkandfixthelayout

and I made sure the ToC was useful with \setcounter{tocdepth}{4}.
I made the page footer helpful about navigation, showing the current
section number in the footer, with a link to the ToC, and feedback on
how far into the document this page is.  Expressed again in {memoir}
terms, this is:
\makeoddfoot{<mypagestyle>}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}

(that obviously depends on a \label{foo@toc} on the ToC page, and a
\label{foo@lastpage} on the last page of the document).
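
The two labels mentioned above could be placed roughly as in the following sketch.  The document body is invented for illustration, and the exact placement of the ToC label is an assumption rather than a description of the original class.

```latex
% Sketch only: where the two labels used by the footer might go.
\documentclass[b6paper,oneside]{memoir}
\usepackage{hyperref}

\begin{document}

\tableofcontents*
% \label records the current page; \hyperref[foo@toc]{...} then jumps to
% the most recent hyperref anchor.  If that lands in the wrong place,
% a \phantomsection just before the \label sets a fresh anchor.
\label{foo@toc}

\chapter{Example}
This is page \thepage\ of \pageref{foo@lastpage};
back to the \hyperref[foo@toc]{contents}.

% Last thing in the document, so it is shipped out with the final page.
\label{foo@lastpage}
\end{document}
```

As with any cross-reference, this needs at least two LaTeX runs before the page numbers and links resolve.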
I expected I'd have to work quite hard to get something decent here.
But along with a small number of other class-specific layout tweaks,
that was all I felt I needed to do to produce a quite respectable
screen-readable (though as noted above not particularly accessible)
PDF document.

Via LaTeXML to EPUB

LaTeXML is the most successful of
the LaTeX-to-HTML converters I've used, and seemed admirably robust,
when used with the structurally simple but maths-heavy document I was
working with.  I managed to convert my LaTeX sources to EPUB in two
related ways.
Using LaTeXML's EPUB generation pipeline is 90% successful.  This
still has a few bugs, and as of right now produces EPUB that works,
but which doesn't pass the conformance checker I used (W3C, see
below).  But I was able to fix up the results without major
difficulty.  Those bugs are currently reported in the LaTeXML
developers’ queue, and so, possibly with help from this
community, they should disappear before too long.
What I in fact ran with was using LaTeXML's intermediate XML format
(from which it generates HTML and related outputs), and developing my own
XSLT stylesheets to convert this to XHTML, and to generate the
associated EPUB metadata.  This is a flexible technique, but it's
obviously more work than something canned (I happen to have experience
of this general route, so this was both relatively easy and quite
enjoyable to do).  I mention this, not because I'm necessarily
recommending it, but in order to draw attention to that well-designed
intermediate format, awareness of which might be similarly useful to
others with special end-result needs.
Relevant resources here are as follows.  Spec:

The epub 3.2 spec
Overview
Spec
Containers

Validators and good practice:

A very thorough W3C validator
(Usage: java -jar epubcheck.jar book.epub).  I managed to get zero
errors on this checker, eventually!
The Canadian National Network for Equitable Library
Service
best practice guide.
O'Reilly EPUB 3 Best Practices
from 2013
(individual chapters
of this are available online).
There is a large collection of notes on the EPUB standard,
and on EPUB accessibility in the
DAISY knowledgebase.
Apple Books Asset Guide
is essentially a manual for creating EPUB books compatible with
the macOS Books app, though much of the advice is not specific to
that app.

LaTeX and accessibility

PDF output from LaTeX scores terribly poorly on the ‘accessibility’
checkers I used; I got scores below 10/100 with some documents.  It's
not completely clear how that checker was scoring things, but it
appears that a large part of the poor score is attributable
to the outputs not being ‘tagged PDF’.  There's a 2017 discussion
about this on stackoverflow (A guide on how to produce accessible PDF
files?).
There does not appear to be a simple answer to this question.

One answer to the stackoverflow question points to
a google code project, which
of course has disappeared, but sounds like it might be related to
a set of templates for ‘SIGCHI’
(which starts off saying ‘This repository was...’),
which points in turn to
ACM article templates
which may or may not be useful.  Unfortunately, the only mention of
accessibility on the templates page is an exhortation to produce
figure descriptions, which doesn't appear to be the principal
problem.

Another answer in the
stackoverflow discussion
points to the accsupp package, which adds alt-texts, and the
pdfcomment package, which does...  something else (along with
bringing in a fearsome set of package dependencies).  One of the
comments points to the
axessibility package, which
is concerned with formulae.  Finally, there's a page at
CTAN covering access to various
PDF features, none of which is obviously structural tags.
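
To give a flavour of what accsupp offers, here is a minimal sketch that attaches an alternative description to a formula.  It assumes pdfLaTeX or LuaLaTeX, and the description text is invented for illustration.  Note that this adds alt-text only; it does not add structural tags.

```latex
% Sketch only: alternative text for one formula, using accsupp.
\documentclass{article}
\usepackage{accsupp}

\begin{document}
% The Alt key supplies a description for content that a
% text-to-speech reader could not otherwise interpret.
The result is
\BeginAccSupp{method=escape,Alt={x squared plus y squared}}%
$x^2 + y^2$%
\EndAccSupp{}.
\end{document}
```

Whether a given reader or checker actually makes use of the description varies.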

Although the accessibility
package exists, its github
page has a wide
variety of disclaimers on it, going as far as saying ‘I’d like to
discourage people from using the package any more’.  I wonder,
however, if there's an 80-20 solution here, in the sense
that if only the structural tags are available, then do we get a
much higher score?  The issues list for that package suggests that
its author is interested in finding development money to get
someone to work on this (issues last updated mid-2020).

There is a thing called PDF/UA, and a TUG talk about
it.

Most significantly, there is a package tagpdf by Ulrike Fischer
(CTAN and
github), which is billed as
being ‘to experiment with tagging with pdflatex and lualatex’
(last contributions appear to be late 2019).  It starts off by
saying: ‘This package is not meant for normal document
production.’ and notes that it requires a current expl3 version
of LaTeX3.  It suggests that the various accessibility packages in
LaTeX are fundamentally flawed, to the extent that they rely on
monkeypatching LaTeX rather than using a PDF API provided by the
LaTeX3 kernel.

The tagpdf
documentation
notes that:

I nevertheless think that the lua mode is the future and the only one
that will be usable for larger documents. pdf is a page orientated
format and so the ability of luatex to manipulate pages and nodes
after the TeX-processing is really useful here

That is, there is significant structure in PDF tags that makes sense
only after the TeX page-breaking algorithm has done its work, at a
stage where (La)TeX no longer has any purchase.  Thus the general
problem is very hard; but even so, there might be some minimal
structural-only tagging that it's possible to add (see pp.22 and 23 of
the document, which you'd never want to type), and which might make
a significant difference to an accessibility checker, if it's possible
to automate at all.
The LaTeX3 project is interested in Tagged PDF, and one of the current
goals there is to ‘provide functionality to automatically produce
structured PDF, without the need for user intervention or
post-processing’ (see 20:24 of a TUG 2020 talk by Frank Mittelbach).
The implication of the rest of that talk is that this tagging is
effectively infeasible with current LaTeX.  Interestingly, they appear
to have some modest funding from Adobe to do this.
