X-Git-Url: http://git.rohieb.name/www-rohieb-name.git/blobdiff_plain/34ed3458bf9a725a9d2c4883bf2a99a69f16455c..HEAD:/blag/post/optimizing-xsane-s-scanned-pdfs.mdwn

diff --git a/blag/post/optimizing-xsane-s-scanned-pdfs.mdwn b/blag/post/optimizing-xsane-s-scanned-pdfs.mdwn
index e48c024..5169a60 100644
--- a/blag/post/optimizing-xsane-s-scanned-pdfs.mdwn
+++ b/blag/post/optimizing-xsane-s-scanned-pdfs.mdwn
@@ -23,7 +23,7 @@ First (non-optimal) solution
 --------------
 
 At first, I tried to optimize the PDF using [GhostScript][gs]. I
-[[use-ghostscript-to-convert-pdf-files|already wrote]] about how GhostScript’s
+[[already wrote|use-ghostscript-to-convert-pdf-files]] about how GhostScript’s
 `-dPDFSETTINGS` option can be used to minimize PDFs by redering the pictures to
 a smaller resolution. In fact, there are [multiple rendering modes][gs-ps-pdf]
 (`screen` for 96 dpi, `ebook` for 150 dpi, `printer` for 300 dpi,
@@ -334,9 +334,27 @@ in X and Y direction, which was the resolution at which the images were scanned:
 
     $ convert image*jpg -density 200x200 document.pdf
 
+*Update:* You can also use the [`-page` parameter][page] to set the page size
+directly. It takes a multitude of predefined paper formats (see link) and will
+do the pixel density calculation for you, as well as adding any necessary
+offset if the image ratio is not quite exact:
+
+    $ convert image*jpg -page A4 document.pdf
+
 With that approach, I could reduce the size of my PDF from 250 MB with
 losslessly compressed images to 38 MB with DCT compression.
 
+*Another update (2023):* Marcus notified me that it is possible to use
+ImageMagick's `-compress jpeg` option; this way we can leave out the
+intermediate step and convert PNM to PDF directly:
+
+    $ convert image*.pnm -compress jpeg -quality 85 output.pdf
+
+You can also play around with the `-quality` parameter to set the JPEG
+compression level (100% makes almost pristine, but huge images; 1% makes very
+small, very blocky images); 85% should still be readable for most documents
+at that resolution.
+
 Too long, didn’t read
 -----------------
 
@@ -368,5 +386,6 @@ document.
 [scan-to-pdfa]: http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/ "Konrad Voelkel: Linux, OCR and PDF: Scan to PDF/A"
 [pdf-stream-objects]: http://blog.didierstevens.com/2008/05/19/pdf-stream-objects/ "Didier Stevens: PDF Stream Objects"
 [pdf-tools]: http://blog.didierstevens.com/programs/pdf-tools/ "Didier Stevens: PDF Tools"
+[page]: http://www.imagemagick.org/script/command-line-options.php#page "ImageMagick: Command-line Options"
 
 [[!tag PDF note_to_self howto ImageMagic convert file_formats reference longpost]]
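
Note (not part of the commit): the patch mentions a "pixel density calculation" that ties a scan's pixel size to the resulting PDF page size. Below is a minimal sketch of that arithmetic, assuming the usual PDF convention of 72 points per inch. `px_to_pt` is a hypothetical helper, not an ImageMagick feature, and the pixel dimensions are an assumed example of an A4 page scanned at 200 dpi.

```shell
#!/bin/sh
# Hypothetical helper: convert a pixel length to PDF points for a given
# scan density in dpi (assumes 72 points per inch, integer division).
px_to_pt() {
    echo $(( $1 * 72 / $2 ))
}

# Assumed example: an A4 page scanned at 200 dpi is roughly 1654x2339 px.
px_to_pt 1654 200   # prints 595 (A4 width in points)
px_to_pt 2339 200   # prints 842 (A4 height in points)
```

If the two results come close to 595 and 842, the scan fills an A4 page at that density, which is the situation where `-density 200x200` and `-page A4` should give similar page sizes.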