Why?
You get to access it offline. It’s more readable. You can annotate, leave comments, etc. from a PDF client (Adobe Acrobat Reader, anyone?). You can even track your progress inside the document. If your client is really friendly, it can even reopen the document where you left off.
Well, for me it’s merely a matter of convenience. I always prefer a PDF manual to an HTML/CHM one for some reason (think: http://docs.python.org/download.html). Does it really matter if the PDF is three times the size of the HTML archive? Storage isn’t really a pressing concern these days, is it?
Idea:
Multiple HTML pages to a Single PDF.
There are a lot of websites which offer to convert an HTML document to PDF on the fly. That doesn’t serve the purpose here (we want many pages in one PDF), and I’m not very big on registering everywhere!
For our little experiment let’s pick a URL. I suggest http://www.catb.org/~esr/writings/homesteading/cathedral-bazaar/index.html.
(For further reading on The Cathedral and the Bazaar by Eric S. Raymond, go to https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar.)
#1 Download web pages recursively using wget
# create a directory under home; think: less clutter
$ mkdir ~/our_little_experiment; cd ~/our_little_experiment;
# download the webpage and recursively download all those webpages which are linked from this page in the current directory.
$ wget -v -r http://www.catb.org/~esr/writings/homesteading/cathedral-bazaar/index.html
FINISHED --2012-10-07 20:15:47--
Total wall clock time: 2m 51s
Downloaded: 16 files, 164K in 1.9s (88.7 KB/s)
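If you would rather keep wget on a short leash, here is a minimal sketch of the same download wrapped in a function. The name fetch_book and the WGET override (handy for a dry run with echo) are my own inventions, not part of wget. The options themselves are standard wget options, but I have not re-run this against catb.org, so treat it as a starting point.

```shell
# Hypothetical helper: recursively fetch one "book" directory of a site.
# WGET is an assumed override so the command can be previewed, e.g. WGET=echo.
fetch_book() {
    url="$1"
    # --recursive            follow links, like -r in the command above
    # --no-parent            never climb above the starting directory
    # --no-host-directories  do not create a www.catb.org/ directory
    # --cut-dirs=4           drop the four leading directories of the example URL
    #                        (~esr/writings/homesteading/cathedral-bazaar)
    ${WGET:-wget} --recursive --no-parent --no-host-directories --cut-dirs=4 "$url"
}
```

Run from inside ~/our_little_experiment, fetch_book followed by the URL above should drop the HTML files straight into the current directory instead of a deep www.catb.org/... tree. Prefix the call with WGET=echo to print the command without downloading anything.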
A little later …
# let’s look at the files generated by wget (an awesome tool btw!)
$ cd ~/our_little_experiment/www.catb.org/~esr/writings/homesteading/cathedral-bazaar
$ ls -1
ar01s02.html
ar01s03.html
ar01s04.html
ar01s05.html
ar01s06.html
ar01s07.html
ar01s08.html
ar01s09.html
ar01s10.html
ar01s11.html
ar01s12.html
ar01s13.html
ar01s14.html
ar01s15.html
ar01s16.html
index.html
# index.html is actually chapter 1, so rename it to ar01s01.html
$ mv index.html ar01s01.html
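Why bother with the rename? The shell expands *.html in sorted order, and I am assuming htmldoc takes the files in the order they appear on the command line, so index.html would end up after the last chapter. A tiny throwaway demo (the scratch directory and the three file names are made up for illustration):

```shell
# Build a scratch directory with a few empty stand-in chapter files.
demo_dir=$(mktemp -d)
touch "$demo_dir/ar01s02.html" "$demo_dir/ar01s03.html" "$demo_dir/index.html"

# Glob order before the rename: index.html sorts last.
before=$(cd "$demo_dir" && echo *.html)

# Rename chapter 1 so it sorts first, then expand the glob again.
mv "$demo_dir/index.html" "$demo_dir/ar01s01.html"
after=$(cd "$demo_dir" && echo *.html)

echo "before: $before"
echo "after:  $after"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```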
#2 Install htmldoc
$ sudo apt-get install htmldoc
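Before moving on, it does not hurt to confirm that the install actually put htmldoc on your PATH. A small sketch; have_cmd is a hypothetical helper name of mine, built on the shell's standard command -v:

```shell
# Hypothetical helper: succeed only if the named program can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd htmldoc; then
    echo "htmldoc found"
else
    echo "htmldoc not found; install it first (e.g. sudo apt-get install htmldoc)"
fi
```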
#3 Create the PDF document
$ htmldoc --webpage -t pdf14 -v -f catb_cathedral_bazaar.pdf *.html
Output: catb_cathedral_bazaar.pdf
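A quick sanity check on the result: PDF files begin with the bytes %PDF-, so peeking at the first five bytes tells you whether you got a PDF or something else. The helper name is_pdf is mine, and I am assuming a head that supports -c (GNU and BSD both do):

```shell
# Hypothetical helper: succeed if the file starts with the PDF magic bytes "%PDF-".
# A missing or unreadable file yields empty output, so the check fails.
is_pdf() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}
```

For example: is_pdf catb_cathedral_bazaar.pdf && echo "looks like a PDF". This only checks the header, not whether every chapter made it in, so open the file and flip through it too.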
Voila!
I’d be happy to guide anyone doing it on w32.
Please don’t use “win” as an abbreviation for Microsoft Windows in GNU software or
documentation. In hacker terminology, calling something a “win” is a form of praise. If you
wish to praise Microsoft Windows when speaking on your own, by all means do so, but not
in GNU software. Usually we write the name “Windows” in full, but when brevity is very
important (as in file names and sometimes symbol names), we abbreviate it to “w”. For
instance, the files and functions in Emacs that deal with Windows start with ‘w32’.
(GNU Standards, http://www.gnu.org/prep/standards/standards.html#Trademarks-1)
Happy Hacking!