
Guidance on producing screen-readable documents

TeX - LaTeX Asked by Norman Gray on August 26, 2021

Is there any guidance on creating good screen-readable documents using
LaTeX?

I’m aiming to make ‘tablet-friendly’ versions of documents primarily
intended for paper, so I’m looking for fairly lightweight class- or
style-file adaptations, or alternative toolchains. The minimal thing
to do here is to format the document for a small stock size (say 180 x
120mm), narrow the margins, and perhaps fiddle with the information in
the footer. Doing that much would be… OK, but I feel sure (a) I
could do more to help a reader read and navigate within the document;
and (b) I would be reinventing a wheel here, probably badly, and
rediscovering problems that must be well-known to someone.

What I’ve found so far:

  • Stackexchange question about geometry:
    this is primarily a question about how to use the geometry package,
    which I don’t really need advice about (memoir.cls=good).
  • LaTeX to epub:
    these work better than I’d expect (of the toolchains mentioned there I tried tex4ebook and I think
    pandoc; some of the answers are rather old), but it’s clear this struggles with highly-mathematical text,
    and I get the strong impression that it heads towards a nightmare of
    epub-version, reader (product and version), and MathML compatibility.
  • It’s well known that ConTeXt can produce impressively
    interactive documents.
    If these tablet-friendly documents were the main goal of this
    project this is the direction I’d go, but a large-scale switch of
    macro package (tempting though ConTeXt is) is probably more effort
    than I’m willing to invest for the side-benefit I have in mind.
  • a question about on-screen viewing:
    the question includes pointers to ConTeXt and pdfscreen, but similarly to the previous point, both would indicate more substantial rethinking of the documents than I’m hoping for. David Carlisle’s answer essentially says that one should use MathML at an early stage in such a project; this is surely true, but would require a major rethink of the documents I have.

Searching for ‘epub’, ‘tablet typesetting’, and so on, leads me into a
winding maze of .docx converters, self-publishing gurus, and other
blind alleys.

One Answer

I have an answer to my own question. The following notes – something of a brain-dump, but I hope useful to someone – are correct as of early 2021. They're broadly positive; there are a few things that don't work as well as one might hope, but things should improve, on that front, in time.

In the end, I did three things here.

  • I adjusted the class file I developed for the relevant documents, to create a B6-sized document, which works when viewed on-screen as PDF. This works pretty completely, and produced something I'm content with, at the cost of less work than I anticipated. However it is generally not ‘accessible’, in the sense of being friendly to, eg, text-to-speech readers, and other tools to support folk with print disabilities (supporting whom was a part of my motivation here).
  • I've used LaTeXML to produce EPUB documents. Amongst other virtues, these tick ‘accessibility’ boxes. This mostly works, and the problems are relatively minor bugs, and aesthetic points which one could tweak more or less indefinitely.
  • I've learned a little more about the general accessibility of LaTeX documents (short version: much harder than you expect now, but potentially due to improve).

Screen PDF

I adjusted the class file I'm using for these documents to produce an ‘ebook’ variant of the output. The key changes were:

  • Single-sided B6 paper looks like the right pre-set size for this (I leant heavily on the {memoir} class, and used its option b6paper, but the {geometry} package would doubtless be able to handle this, too).
  • Font: I used 10/12pt STIX2.
  • Lots of extra white space where it was at all reasonable. Specifically, I redefined section and subsection (using \@startsection) to add a \newpage before each.
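
A minimal sketch of such an ‘ebook’ preamble might look like the following. It is not the class file described here: it assumes the stix2 package for the fonts and pdfLaTeX for compilation, and it forces the page break by wrapping \section and \subsection rather than by redefining them with \@startsection. The names \origsection and \origsubsection are made up for the illustration.

```latex
% Sketch only: an 'ebook' preamble along the lines described above.
\documentclass[b6paper,10pt,oneside]{memoir} % single-sided B6, 10/12pt text
\usepackage[T1]{fontenc}
\usepackage{stix2}   % assumption: STIX Two fonts via the stix2 package

% Start every section and subsection on a fresh page.
% \origsection and \origsubsection are hypothetical helper names.
\let\origsection\section
\let\origsubsection\subsection
\renewcommand{\section}{\newpage\origsection}
\renewcommand{\subsection}{\newpage\origsubsection}

\begin{document}
\chapter{Example}
\section{First section}
Some text.
\subsection{A subsection}
More text.
\end{document}
```

Wrapping the existing commands keeps their starred and optional-argument forms working, which is why it is used here in place of a full \@startsection redefinition.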

In {memoir} terms, the layout I used was:

% the numbers here seem to work OK on an iPad,
% but there's no deep principle behind them.
\settypeblocksize{34\baselineskip}{25pc}{*}
\setlrmargins{*}{*}{1}
\setulmargins{*}{*}{1.5}
\setheadfoot{\baselineskip}{2\baselineskip}
\setheaderspaces{*}{*}{*}
\checkandfixthelayout

and I made sure the ToC was useful with \setcounter{tocdepth}{4}.

I made the page footer helpful about navigation, showing the current section number in the footer, with a link to the ToC, and feedback on how far into the document this page is. Expressed again in {memoir} terms, this is:

\makeoddfoot{<mypagestyle>}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}

(that obviously depends on a \label{foo@toc} on the ToC page, and a \label{foo@lastpage} on the last page of the document).
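
To show how those pieces fit together, here is a small self-contained sketch. The page-style name ebook is hypothetical, as is the choice to set the section mark explicitly with \makepsmarks; the real class may well arrange its marks and labels differently. Two LaTeX runs are needed before the page references resolve.

```latex
% Sketch only: a navigation footer in memoir, with the two labels it needs.
\documentclass[b6paper,10pt,oneside]{memoir}
\usepackage{hyperref}

\makepagestyle{ebook}               % 'ebook' is a hypothetical style name
\makeoddfoot{ebook}
  {\hyperref[foo@toc]{\rightmark}}  % left: current section, linked to the ToC
  {}                                % centre: empty
  {\thepage/\pageref{foo@lastpage}} % right: progress through the document
\makepsmarks{ebook}{%
  % assumption: put 'number title' of the current section into \rightmark
  \renewcommand{\sectionmark}[1]{\markright{\thesection\ ##1}}}
\aliaspagestyle{chapter}{ebook}     % use the same footer on chapter pages
\pagestyle{ebook}

\begin{document}
% The anchor lands at the end of the ToC, which is fine for a short ToC.
\tableofcontents\phantomsection\label{foo@toc}

\chapter{First chapter}
\section{First section}
Some text.
\newpage
More text, still in the first section.
\label{foo@lastpage}
\end{document}
```

Since the document is single-sided, every page counts as odd, so \makeoddfoot alone is enough.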

I expected I'd have to work quite hard to get something decent here. But along with a small number of other class-specific layout tweaks, that was all I felt I needed to do to produce a quite respectable screen-readable (though as noted above not particularly accessible) PDF document.

Via LaTeXML to EPUB

LaTeXML is the most successful of the LaTeX-to-HTML converters I've used, and seemed admirably robust, when used with the structurally simple but maths-heavy document I was working with. I managed to convert my LaTeX sources to EPUB in two related ways.

Using LaTeXML's EPUB generation pipeline is 90% successful. It still has a few bugs: as of right now it produces EPUB that works, but which doesn't pass the conformance checker I used (W3C, see below). I was able to fix up the results without major difficulty, though, and those bugs are currently reported in the LaTeXML developers’ queue, so with time, and possibly with help from this community, they should disappear before too long.

What I in fact ran with was to use LaTeXML's intermediate XML format (from which it generates HTML and related outputs), and to develop my own XSLT stylesheets to convert this to XHTML and to generate the associated EPUB metadata. This is a flexible technique, but it's obviously more work than something canned (I happen to have experience of this general route, so this was both relatively easy and quite enjoyable to do). I mention this, not because I'm necessarily recommending it, but in order to draw attention to that well-designed intermediate format, awareness of which might be similarly useful to others with special end-result needs.

Relevant resources here are as follows. Spec:

Validators and good practice:

LaTeX and accessibility

PDF output from LaTeX scores very poorly on the ‘accessibility’ checkers I used; I got scores below 10/100 with some documents. It's not completely clear how that checker was scoring things, but it appears that a large part of the poor score is attributable to the outputs not being ‘tagged PDF’. There's a 2017 discussion about this on stackoverflow (A guide on how to produce accessible PDF files?).

There does not appear to be a simple answer to this question.

  • One answer to the stackoverflow question points to a google code project, which of course has disappeared, but sounds like it might be related to a set of templates for ‘SIGCHI’ (which starts off saying ‘This repository was...’), which points in turn to ACM article templates which may or may not be useful. Unfortunately, the only mention of accessibility on the templates page is an exhortation to produce figure descriptions, which doesn't appear to be the principal problem.

  • Another answer in the stackoverflow discussion points to the accsupp package, which adds alt-texts, and the pdfcomment package, which does... something else (along with bringing in a fearsome set of package dependencies). One of the comments points to the axessibility package, which is concerned with formulae. Finally, there's a page at CTAN covering access to various PDF features, none of which is obviously structural tags.

  • Although the accessibility package exists, its github page has a wide variety of disclaimers on it, going as far as saying ‘I’d like to discourage people from using the package any more’. I wonder, however, if there's an 80-20 solution here, in the sense that if only the structural tags are available, then do we get a much higher score? The issues list for that package suggests that its author is interested in finding development money to get someone to work on this (issues last updated mid-2020).

  • There is a thing called PDF/UA, and a TUG talk about it.

  • Most significantly, there is a package tagpdf by Ulrike Fischer (CTAN and github), which is billed as being ‘to experiment with tagging with pdflatex and lualatex’ (last contributions appear to be late 2019). It starts off by saying: ‘This package is not meant for normal document production.’ and notes that it requires a current expl3 version of LaTeX3. It suggests that the various accessibility packages in LaTeX are fundamentally flawed, to the extent that they rely on monkeypatching LaTeX rather than using a PDF API provided by the LaTeX3 kernel.

The tagpdf documentation notes that:

I nevertheless think that the lua mode is the future and the only one that will be usable for larger documents. pdf is a page orientated format and so the ability of luatex to manipulate pages and nodes after the TeX-processing is really useful here

That is, there is significant structure in PDF tags that makes sense only after the TeX page-breaking algorithm has done its work, at a stage where (La)TeX no longer has any purchase. Thus the general problem is very hard; but even so, there might be some minimal structural-only tagging that it's possible to add (see pp. 22 and 23 of the tagpdf documentation, for markup which you'd never want to type by hand), and which might make a significant difference to an accessibility checker, if it's possible to automate at all.
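
For contrast with that structural tagging, the per-item alternative text that accsupp (mentioned in the list above) provides is easy to write by hand. The following is only a sketch, assuming pdfLaTeX and an invented formula with invented alt-text; it attaches a description to one piece of content but creates no structure tags, so on its own it is unlikely to satisfy a checker that wants tagged PDF.

```latex
% Sketch only: alternative text for a single formula, using accsupp.
\documentclass{article}
\usepackage{accsupp}

\begin{document}
The area of the disc is
\BeginAccSupp{method=escape,Alt={pi r squared}}%
$\pi r^2$%
\EndAccSupp{}.
\end{document}
```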

The LaTeX3 project is interested in Tagged PDF, and one of the current goals there is to ‘provide functionality to automatically produce structured PDF, without the need for user intervention or post-processing’ (see 20:24 of a TUG 2020 talk by Frank Mittelbach). The implication of the rest of that talk is that this tagging is effectively infeasible with current LaTeX. Interestingly, they appear to have some modest funding from Adobe to do this.

Correct answer by Norman Gray on August 26, 2021
