PDFMunge: Improve the display of technical PDFs on eBook readers (felixcrux.com)
41 points by felixc on Jan 30, 2010 | 21 comments


Special bonus tip for HN readers! To get a set of good starting values for cropping the margins, use the existing pdfcrop utility with the --verbose flag.

It will display the existing BoundingBox property as it processes each page. Let it run for a few pages, kill it, and use those numbers as a starting point. They will probably not be as tight as you'd like, since they won't cut out page numbers or headers.
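A rough sketch of turning those numbers into a single starting crop box. It assumes the verbose output contains Ghostscript-style `%%BoundingBox: left bottom right top` lines; the exact output format may differ between pdfcrop versions, and `parse_bboxes` and `union_bbox` are hypothetical helper names, not part of pdfcrop or PDFMunge.

```python
import re

# Assumption: the captured log contains lines like "%%BoundingBox: 133 89 478 703".
BBOX_RE = re.compile(r"%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")

def parse_bboxes(log_text):
    """Return a list of (left, bottom, right, top) tuples found in the log."""
    return [tuple(int(v) for v in m.groups()) for m in BBOX_RE.finditer(log_text)]

def union_bbox(boxes):
    """Smallest box containing every per-page box, so no page loses content."""
    lefts, bottoms, rights, tops = zip(*boxes)
    return (min(lefts), min(bottoms), max(rights), max(tops))

# Made-up sample values, only to show the shape of the input.
sample_log = """\
%%BoundingBox: 133 89 478 703
%%BoundingBox: 120 95 480 700
%%BoundingBox: 133 89 470 710
"""
boxes = parse_bboxes(sample_log)
print(union_bbox(boxes))  # (120, 89, 480, 710)
```

The union is the conservative choice: it never cuts content on any sampled page, so you would tighten it by hand afterwards to drop page numbers and headers.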


For someone that has a Kindle/KindleDX/Nook: How good is the Kindle/Kindle DX for technical books? I want to buy one (since I am immigrating and will lose all my textbooks).

Is it good for technical books - since the page turn speed is apparently very slow? Will the iPad be better for this?


I have been loving my DX since Xmas, and as a former Sony Reader refugee I cannot tell you how much I do not miss needing to run my technical documents and research papers through this sort of PDF munging in order to get something that is (barely) readable on the smaller e-ink screens. If you read a lot of technical books, papers, or other docs formatted for A4/8.5x11 then do not consider any of the smaller e-ink units.

The page turn speed is not fast, but since I was used to it from my previous e-ink device I don't find it too much of a bother. What you lose compared with a physical book is the ability to flip quickly to a particular section and then drill down to the page you want. With e-ink you guess the approximate location and then guess a couple more times until you land on the right page. The biggest "fix" here would be working hyperlinks within the doc, so that you could jump from the table of contents or index to a specific page. At least with the DX I can actually go to the page I want. With reformatted docs on a smaller display (the use case for the OP's software), the original page numbers did not match the page numbers on the reader, so it was a real PITA.

Short version: if you read technical docs or papers in PDF format do not consider anything smaller than a DX or iRex.


I own a Kindle 2 and I have also pulled up programming books on the Nook.

They both suck for this application. Because their screen size is smaller than the page of the PDFs, you are left with zooming in and out (a feature which is new to the Kindle). This however leaves something to be desired as far as the reading experience goes.

My coworker owns a Kindle DX and it does much, much better at rendering technical PDFs and/or technical books.

I also think the iPad will be very good for reading technical books. Diagrams and code snippets will render beautifully, and zooming is an altogether different (much more enjoyable) experience.

So, I would definitely go with either the DX or hold out for the iPad.


>Will the iPad be better for this?

I own a DX & I hate to say it, but yeah, probably. The page speed does hurt. You have to decide if it's more important than the e-ink screen.

I personally wouldn't trade my DX, but I think many people will want to watch movies & check email instead of only reading pdfs for extended periods.


Pretty much the only thing that's making me consider switching from my Sony PRS-505 (it's the one pictured in the blog post) to a Kindle DX is the ability to read PDFs without the annoying and buggy reflowing. Cutting pages doesn't do the trick for me though... I probably *will* get the DX :(


I've had the DX for about six months. Love it. Rotate it into landscape and it does great with technical PDFs.


I've tested the DX only for a few days, but the scaling of PDFs seems to work nicely (including one scanned book, Bondy & Murty's Graph Theory, http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.h... )

On single column book-page stuff, so far I only came across one figure that looked bad in portrait and had to be seen in landscape. Otherwise fine, if your eyes can handle slightly shrunk type.

P.S. by scaling, I mean the fixed scaling the Kindle DX does to fit the PDF on the page. User-controlled font-size change is only available for the mobi/azw format.


Interesting. Some practice with this could make textbooks (especially smaller ones) useable on the regular Kindle / Nook / Sony reader.


That was exactly my use case for creating it in the first place :)


I can't help but chuckle -- this is why geeks hate the iPad, and everyone else will love it.


Can't you reflow your pdf before uploading it to the iPad? I thought that would be possible...


Great idea. Rather than re-flow, it cuts up the pages so they fit in 'landscape' mode.

Reflow works well, but not with diagrams and images.
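To make the "cut up the pages" idea concrete, here is a minimal sketch of slicing one portrait page into overlapping full-width bands sized for a landscape screen. This is not PDFMunge's actual algorithm; `landscape_bands`, the 4:3 screen aspect, and the 10% overlap are all assumptions chosen for illustration.

```python
def landscape_bands(width, height, screen_aspect=4 / 3, overlap=0.1):
    """Split a page into full-width bands, listed top to bottom.

    Each band is width / screen_aspect tall (capped at the page height), and
    consecutive bands share roughly `overlap` of a band's height.
    Boxes are (left, bottom, right, top) with the origin at the bottom-left.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    band_h = min(float(height), width / screen_aspect)
    step = band_h * (1 - overlap)
    bands = []
    top = float(height)
    while top - band_h > 0:
        bands.append((0.0, top - band_h, float(width), top))
        top -= step
    # The last band sits flush with the bottom edge so nothing is cut off.
    bands.append((0.0, 0.0, float(width), band_h))
    return bands

# US Letter page in PDF points (612 x 792) on an assumed 4:3 landscape screen.
for box in landscape_bands(612, 792):
    print(box)
```

Because the bands are plain crop boxes, figures and diagrams stay intact inside each slice, which is the advantage over reflow mentioned above.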


It would be better to publish as epub. Pragprog does this.


For technical papers in two-column format, I've thought of just chopping each page into four. (Brute force, judiciously applied. ;-)

If you feel like testing that, I'd appreciate hearing about it.
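For anyone who wants to try the chop-into-four approach, a minimal sketch of the crop boxes in two-column reading order (down the left column, then down the right). `quadrants_in_reading_order` is a hypothetical helper; it assumes the columns split exactly at the horizontal midpoint, which a figure or title spanning both columns will break.

```python
def quadrants_in_reading_order(left, bottom, right, top):
    """Return four (left, bottom, right, top) boxes for a two-column page.

    Coordinates follow the PDF convention: origin at the bottom-left corner.
    """
    mid_x = (left + right) / 2
    mid_y = (bottom + top) / 2
    return [
        (left, mid_y, mid_x, top),      # left column, upper half
        (left, bottom, mid_x, mid_y),   # left column, lower half
        (mid_x, mid_y, right, top),     # right column, upper half
        (mid_x, bottom, right, mid_y),  # right column, lower half
    ]

# US Letter page in PDF points.
for box in quadrants_in_reading_order(0, 0, 612, 792):
    print(box)
```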


There is a program called pdflrf that does this, and it is far from ideal, since most two-column papers have figures that span both columns. Another tool called PaperCrop does this right, but unfortunately only produces images as output; I have not yet figured out how to reliably join them into e-book format. The non-existence of a good tool to solve this problem is quite annoying...


Just to clarify a point, pdflrf also converts docs to images. It crops & chops, then rasterizes and performs a few image manipulation steps to make the image look better on the Sony screen.


Sigh. Well, no joy on the Mac, despite the presumed promise of platform-neutral Python.

Python 2.5.1:

line 8: !DOCTYPE: No such file or directory

line 9: syntax error near unexpected token `newline'

line 9: ` "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd>;

Python 3.0:

  File "pdfmunge.py", line 8
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    ^
SyntaxError: invalid syntax


Are you making a joke, or are you being serious? If you're being serious, then it looks like you somehow captured an HTML page instead of the Python program.


No joke, just overtired. Wow, I can't believe I did that.


As another commenter pointed out, it looks like what you've got there is a website, instead of the actual script.

A direct link to the file you need is here: http://cloud.github.com/downloads/felixc/pdfmunge/pdfmunge.p...
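A quick way to catch this mistake before running a downloaded script is to check whether the file is really an HTML page. This is a small sketch; `looks_like_html` is a hypothetical helper, and the 20-line window is an arbitrary assumption.

```python
def looks_like_html(text, max_lines=20):
    """Return True if the first few lines contain an HTML doctype or <html> tag."""
    head = "\n".join(text.splitlines()[:max_lines]).lower()
    return "<!doctype html" in head or "<html" in head

saved_page = '\n\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n<html>'
real_script = '#!/usr/bin/env python\nimport sys\nprint(sys.argv)\n'

print(looks_like_html(saved_page))   # True
print(looks_like_html(real_script))  # False
```

To check a file on disk, pass its contents, for example `looks_like_html(open("pdfmunge.py").read())`.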



