This document is a guide to using the pdfsplit script.
This guide uses the Adobe PDF 1.6 specification file (~9 MB).
It can be downloaded using:
$ wget https://wwwimages2.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference16.pdf
or from https://www.pdfa.org/resource/pdf-specification-index/.
The shell commands in this file assume pdfsplit is in your PATH.
There are 22 pages in the PDF file before the page numbered 1, which is
why the pdfsplit commands below pass -o 22.
This PDF conveniently numbers these front-matter pages with roman
numerals; with other PDFs you may need to count them manually or use
your PDF reader's "current page" function.
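Since 22 pages precede the page numbered 1, a printed page number p sits at physical page p + 22 when the pages in the file are counted from 1. A minimal sketch of that conversion follows. physical_page is a hypothetical helper, and it is only an assumption that pdfsplit applies the -o value to the page numbers in the list files in exactly this way.

```python
# Hypothetical helper: convert a printed (arabic) page number to the
# 1-based physical page index in the PDF file. OFFSET is the number of
# pages that come before printed page 1 (22 for the PDF 1.6 reference).
OFFSET = 22


def physical_page(printed_page: int, offset: int = OFFSET) -> int:
    """Return the 1-based physical page index for a printed page number."""
    if printed_page < 1:
        raise ValueError("printed page numbers start at 1")
    return printed_page + offset


# Printed page 1 is the first page after the 22 front-matter pages.
print(physical_page(1))  # 23
```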
pdf-1.6-chapters.txt contains the list of chapters, parsed by hand from the PDF's table of contents. With some PDFs you may need to look up the last page of each chapter manually, because there may be miscellaneous pages between chapters.
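To see why the end pages can need a manual check, consider guessing each chapter's range from its start page alone. The sketch below assumes every chapter ends on the page before the next one starts. naive_ranges is a hypothetical helper, and the page numbers are made-up example values rather than data from the PDF 1.6 table of contents or the format of pdf-1.6-chapters.txt.

```python
from typing import List, Tuple


def naive_ranges(starts: List[int], last_page: int) -> List[Tuple[int, int]]:
    """Guess inclusive (start, end) ranges from chapter start pages alone.

    Each chapter is assumed to end on the page before the next chapter
    starts, so any miscellaneous pages between chapters are silently
    folded into the preceding chapter.
    """
    ends = [start - 1 for start in starts[1:]] + [last_page]
    return list(zip(starts, ends))


# Made-up example: three chapters starting on pages 1, 11 and 33.
print(naive_ranges([1, 11, 33], 50))  # [(1, 10), (11, 32), (33, 50)]
```

If the second chapter in this example really ended on page 30, the guessed range would wrongly include pages 31 and 32. That is the case where the ending page has to be found by hand.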
The following commands extract the chapters from the PDF into a directory
called "Chapters" and name each chapter PDF chXX.pdf:
$ mkdir Chapters
$ pdfsplit PDFReference16.pdf pdf-1.6-chapters.txt "Chapters/ch{_i:02d}.pdf" -o 22
$ ls -1 Chapters
ch01.pdf
ch02.pdf
ch03.pdf
ch04.pdf
ch05.pdf
ch06.pdf
ch07.pdf
ch08.pdf
ch09.pdf
ch10.pdf
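The {_i:02d} field in the output name uses standard Python format-spec syntax: it zero-pads the number to two digits. The snippet below assumes that pdfsplit binds _i to the 1-based position of each entry in pdf-1.6-chapters.txt, which matches the listing above. It shows why the padding matters when the files are sorted by name.

```python
# Assumes _i is the 1-based position of each chapter in the list file.
padded = [f"ch{_i:02d}.pdf" for _i in range(1, 11)]
unpadded = [f"ch{_i}.pdf" for _i in range(1, 11)]

# Zero-padding makes alphabetical order match chapter order, so a
# name-sorted listing shows ch01.pdf ... ch10.pdf in sequence.
print(sorted(padded) == padded)      # True
# Without padding, "ch10.pdf" sorts before "ch2.pdf".
print(sorted(unpadded) == unpadded)  # False
```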
pdf-1.6-appendices.txt contains the list of appendices, parsed by hand from the PDF's table of contents.
Appendices are often named with letters instead of numbers. Since the
out_fmt string is interpreted as a Python f-string, we can embed Python
code that converts the 1-indexed number _i to a capital letter.
$ mkdir Appendices
$ pdfsplit \
PDFReference16.pdf \
pdf-1.6-appendices.txt \
"Appendices/Appendix {chr(ord('A') - 1 + _i)}.pdf" \
-o 22
$ ls -1 Appendices
'Appendix A.pdf'
'Appendix B.pdf'
'Appendix C.pdf'
'Appendix D.pdf'
'Appendix E.pdf'
'Appendix F.pdf'
'Appendix G.pdf'
'Appendix H.pdf'
'Appendix I.pdf'
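The following hypothetical sketch shows how the appendix template can produce these names. It evaluates an out_fmt string as a Python f-string with _i bound to a 1-based index. render_out_fmt is an illustrative name, and pdfsplit's actual implementation may differ. Wrapping the string in an f-string literal and passing it to eval() is simply one common way to do this.

```python
def render_out_fmt(out_fmt: str, index: int) -> str:
    """Evaluate out_fmt as an f-string with _i bound to a 1-based index.

    Hypothetical helper, not pdfsplit's API. It executes arbitrary code
    found in out_fmt, so it should only be used with trusted input.
    """
    # repr() turns the template into a valid string literal; prefixing
    # it with "f" turns that literal into an f-string literal.
    code = "f" + repr(out_fmt)
    # eval() adds __builtins__ to a globals dict that lacks it, so chr()
    # and ord() are available inside the template's expressions.
    return eval(code, {"_i": index})


appendix_fmt = "Appendices/Appendix {chr(ord('A') - 1 + _i)}.pdf"
for i in range(1, 10):
    print(render_out_fmt(appendix_fmt, i))  # Appendix A.pdf ... Appendix I.pdf
```

Note that chr(ord('A') - 1 + _i) only yields capital letters for _i from 1 to 26. A 27th appendix would be named with the character "[".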