Ebooks Asked by Ondrej Janacek on September 26, 2021
I have many technical PDF ebooks where there are no chapters (no clickable table of contents or other means for quick navigation through a document) and therefore it’s really painful to search for information without full-text search. How could I create them? I would like to just take a PDF book and generate exact structure of chapters and subchapters, like:
Well, you could always buy a copy of Adobe Acrobat, which is actually designed solely for the purpose of creating and editing PDF files.
Or you could import the file into Calibre, convert it to ePub format, edit the ePub to add the table of contents, then convert it back to PDF.
There are other free tools for working with PDF directly. If you do a Google search, you would find this page which lists several free tools for editing or modifying PDF in various ways.
Correct answer by Donald.McLean on September 26, 2021
The best solution I found:
https://wiki.gnome.org/Attic/PdfMod
Extract:
You can reorder, rotate, and remove pages, export images from a document, edit the title, subject, author, and keywords, and combine documents via drag and drop.
It also allows you to modify and/or create the table of contents.
Answered by Dieudo on September 26, 2021
I wrote an open source command line toolset called pdf.tocgen just for doing this. It uses the embedded font attributes and position information of headings to generate a table of contents automatically.
For example, the PDF version of Paul Graham's On Lisp is available for download on his website but comes without a table of contents. You could use the pdfxmeta tool to build a "recipe" file:
[[heading]]
level = 1
font.name = "Times-Bold"
font.size = 19.92530059814453
[[heading]]
level = 2
font.name = "Times-Bold"
font.size = 11.9552001953125
Save it as recipe.toml, and use the pdftocgen command to automatically generate an outline:
$ pdftocgen onlisp.pdf < recipe.toml
"Preface" 5
"Bottom-up Design" 5
"Plan of the Book" 7
"Examples" 9
"Acknowledgements" 9
"Contents" 11
"The Extensible Language" 14
"1.1 Design by Evolution" 14
"1.2 Programming Bottom-Up" 16
"1.3 Extensible Software" 18
"1.4 Extending Lisp" 19
"1.5 Why Lisp (or When)" 21
"Functions" 22
"2.1 Functions as Data" 22
"2.2 Defining Functions" 23
"2.3 Functional Arguments" 26
"2.4 Functions as Properties" 28
"2.5 Scope" 29
"2.6 Closures" 30
"2.7 Local Functions" 34
"2.8 Tail-Recursion" 35
"2.9 Compilation" 37
"2.10 Functions from Lists" 40
[--snip--]
You could save the output to a file called toc
$ pdftocgen onlisp.pdf < recipe.toml > toc
and import it to the PDF file using pdftocio:
$ pdftocio -o output.pdf onlisp.pdf < toc
Please read the homepage for the details on how to use this toolset. I hope you find it useful.
Answered by Krasjet on September 26, 2021
THIS PART IS EDITED
For 'software-generated' PDF files, i.e. PDFs not created from scans, I recommend using pdf.tocgen (and upvoting the answer by Krasjet). Using this package becomes even easier with the toc-mode package for (Spac)Emacs described next.
For all other PDF and DjVu documents there is a new Emacs package called toc-mode, which in my opinion provides the easiest way to add a table of contents to documents (on Linux and possibly on other operating systems as well). It includes options to extract the TOC via OCR.
If this package's functionality is not sufficient, or using Emacs is not an option, then the remaining part of this answer still applies.
END EDITED PART
(Not enough reputation points to comment) Like the answer by Patrick Bourdon, I would also recommend HandyOutliner (http://handyoutlinerfo.sourceforge.net/). However, I would suggest you try the Python script called document-contents-extractor to extract the contents.
If these options do not work, then I would also recommend PDF-XChange Viewer as a very powerful bookmark/contents extractor (selected text can easily be added to the bookmarks). It works well under Wine.
Although not related to the question, I would just like to add that at the moment PDF-XChange Viewer appears to me to be the most powerful PDF editor/viewer on Linux (although Emacs's amazing PDF-tools and Zathura are my favorite PDF editor and viewer respectively).
Answered by dalanicolai on September 26, 2021
The WPS Office suite on Windows allows creating or editing a PDF's TOC.
Be aware that it's a bit invasive though (shortcuts, defaults, runs in the background...).
Answered by Arthur on September 26, 2021
Here is my repository, in which I plan to automate the procedure: https://github.com/aminya/tocPDF
For now, it is a manual procedure (which is also inspired by other people's answers).
Answered by Amin on September 26, 2021
I am used to a simple and free tool that adds clickable bookmarks to PDF or DjVu files: http://handyoutlinerfo.sourceforge.net/.
You first have to prepare the bookmarks in a text file, as an indented list of labels and pages, and import that file into the tool. The tool then creates them in the document as bookmarks that you can expand, collapse, and click on in the left panel.
There are some good options, such as shifting all page numbers by a given constant first. This is useful when the prepared file is a copy/paste extract of the table of contents that already exists in the document (but only as text, without bookmarks), because the cover, preface, introduction, ... are generally numbered separately.
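The page shift described above can also be applied to the text file itself before importing it. Here is a minimal Python sketch, assuming each bookmark line ends with its page number; the function name shift_pages and the sample lines are made up for illustration and are not part of HandyOutliner:

```python
import re

# Assumption: a bookmark line is "<label> <page>", possibly indented.
TRAILING_PAGE = re.compile(r"^(.*?)(\d+)\s*$")

def shift_pages(lines, offset):
    """Add `offset` to the trailing page number of every line that has one."""
    shifted = []
    for line in lines:
        match = TRAILING_PAGE.match(line)
        if match:
            label, page = match.group(1), int(match.group(2))
            shifted.append(f"{label}{page + offset}")
        else:
            # Lines without a trailing number are left unchanged.
            shifted.append(line)
    return shifted

if __name__ == "__main__":
    sample = ["Preface 5", "  1.1 Design by Evolution 14"]
    # Hypothetical case: 12 unnumbered pages come before printed page 1.
    print("\n".join(shift_pages(sample, 12)))
```

The offset is simply the difference between a page's position in the PDF file and the number printed on that page.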
Answered by Patrick Bourdon on September 26, 2021
Prepare the TOC in a .txt file:
Chapter 1. The Beginning/23
Para 1.1 Child of The Beginning/25,FitWidth,96
Para 1.1.1 Child of Child of The Beginning/26,FitHeight,43
Chapter 2. The Continue/30,TopLeft,120,42
Para 2.1 Child of The Beginning/32,FitPage
You can OCR the TOC and use regex to fix it.
Load that TOC
Expand all bookmarks (Ctrl + E), select all of them, then go to Tools > Apply Page Offset
Enter the offset, i.e. the number of leading pages by which the PDF page numbers exceed the page numbers in the TOC
You can read its manual or watch a quick video tutorial. It has a command-line mode and can work on Linux and Mac.
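As an illustration of the "use regex to fix it" step, here is a rough Python sketch. It assumes the OCR'd contents page has one entry per line, ending in optional dot leaders and a page number, and rewrites each line into the Title/page form shown above. The function name clean_toc_line and the sample input are hypothetical, and real OCR output will likely need more rules than this:

```python
import re

# Assumption: "<title> <dot leaders or spaces> <page>" on each line.
ENTRY = re.compile(r"^(?P<indent>\s*)(?P<title>.*?)[\s.]*(?P<page>\d+)\s*$")

def clean_toc_line(line):
    """Turn 'Title ..... 23' into 'Title/23'; leave other lines unchanged."""
    match = ENTRY.match(line)
    if match is None or not match.group("title"):
        return line
    return f"{match.group('indent')}{match.group('title')}/{match.group('page')}"

if __name__ == "__main__":
    ocr_text = [
        "Chapter 1. The Beginning ..... 23",
        "Para 1.1 Child of The Beginning . . . 25",
    ]
    for entry in ocr_text:
        print(clean_toc_line(entry))
```

Destination options such as FitWidth would still have to be added by hand, since they are not part of a printed contents page.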
Answered by Ooker on September 26, 2021
k2pdfopt (free, open source) can also do this by supplying a text file. See the -toclist option. Use like so:
k2pdfopt -mode copy -n -toclist my_chapter_list.txt srcfile.pdf -o outfile.pdf
...where my_chapter_list.txt is a simple ASCII file with page numbers beginning each line, e.g.
1 Cover
2 Table of Contents
5 Chapter 1
25 Chapter 2
...
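If you already have an outline in the "Title" page form (as in the pdftocgen output quoted in another answer), a chapter list in the page-first layout above can be produced with a few lines of Python. This is only a sketch: the function name to_toclist is made up, it drops any nesting information, and it emits nothing beyond the "page number first" layout described here:

```python
import re

# Assumption: input lines look like  "Some Title" 14
OUTLINE_LINE = re.compile(r'^\s*"(.*)"\s+(\d+)')

def to_toclist(outline_text):
    """Convert '"Title" page' lines into 'page Title' lines."""
    lines = []
    for raw in outline_text.splitlines():
        match = OUTLINE_LINE.match(raw)
        if match:
            title, page = match.group(1), match.group(2)
            lines.append(f"{page} {title}")
        # Anything else (blank lines, "[--snip--]") is skipped.
    return "\n".join(lines)

if __name__ == "__main__":
    outline = '"Preface" 5\n"Contents" 11\n"The Extensible Language" 14'
    print(to_toclist(outline))
```

The result can be saved as my_chapter_list.txt and passed to the command above.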
Answered by willus on September 26, 2021
The full Adobe Acrobat Pro ver.8 is available as a free legal download from http://www.techspot.com/downloads/4683-adobe-acrobat-8-free.html for both Mac and Windows. Sure, it's not the latest version, but free is good, and it works just fine for adding or editing a table of contents.
Answered by Gunnar Heiberg on September 26, 2021
I have used jPdfBookmarks on both Windows and Linux to do exactly what you describe - create your own bookmarks. Find it here.
Answered by Jeffrey DeLeo on September 26, 2021
There are also free tools that allow editing/adding bookmarks. A cross platform example is jPdf Tweak.
It is a little clumsy to use, but you can create the table of contents in your favourite spreadsheet program, export as csv and then just import it.
Answered by Tim on September 26, 2021
I will expand on @Donald's answer, but I would also like to note that, because of quality issues, I personally do not recommend ever using Calibre for ebook development.
As stated, I would suggest getting a copy of Acrobat, and you can (for this example I am using Acrobat X Standard; note that I have not seen any difference in the shortcut buttons, but the GUI has changed since version 9):
Open the .pdf file in Acrobat
You can create nested bookmarks by dragging and dropping the sub-level bookmark onto the level 1 bookmark, such as:
Answered by DᴀʀᴛʜVᴀᴅᴇʀ on September 26, 2021