How to show a preview of a PDF page in an HTML document?

I need to create a web page showing a preview of a single PDF page. The PDF page is scanned from a document and contains image data.

I know that I could use <iframe …> and let the browser launch a PDF viewer, but I would prefer to convert the PDF to PNG (or JPG?) and let the browser display the preview.

All this will be done with a CGI script written in bash on openSUSE 11.3 (if that matters).

Question: how would you do it? What are the pros and cons of the different approaches?

vodoo wrote:

>
> All this will be done with a CGI script written in bash on openSUSE 11.3
> (if that matters).
You can use convert (ImageMagick):


convert my.pdf my.png

It will produce a PNG for every page in the PDF.
For finer-grained control, see “man convert”.
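
Since the preview only needs a single page, the conversion can be limited to that page. The following is a minimal sketch, not a tested recipe: the function name pdf_page_to_png and the values 150 dpi and 595x842 are assumptions chosen for illustration. It relies on ImageMagick's "[n]" page selector, which counts pages from 0.

```shell
#!/bin/bash
# Sketch: render one page of a PDF to a PNG preview with ImageMagick.
# pdf_page_to_png and its defaults are hypothetical, not from the thread.
pdf_page_to_png() {
    local pdf=$1 page=${2:-1} out=$3
    # ImageMagick numbers pages from 0, so page 1 is selected with "[0]".
    # -density sets the rasterisation resolution (dpi) and must come before
    # the input file; -resize fits the result inside 595x842 pixels while
    # keeping the aspect ratio.
    convert -density 150 "${pdf}[$((page - 1))]" -resize 595x842 "$out"
}
```

For example, `pdf_page_to_png my.pdf 1 my.png` would write only the first page to my.png.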

>
> Question: how would you do it? What are the pros and cons of the
> different approaches?
>
I think this is a good approach with the cgi script. It makes showing it
independent from the user having a pdf viewer installed.



vodoo wrote:
> I need to create a web page showing a preview of a single PDF page. The
> PDF page is scanned from a document and contains image data.

If what you’re showing is basically a scanned image, why not show the
scanned image? What advantage do you get from showing [a further
conversion of] a PDF page?

martin_helm wrote:
> I think this is a good approach with the cgi script. It makes showing it
> independent from the user having a pdf viewer installed.

@martin_helm: thank you for the feedback. The PNG gives me better control over integrating the preview into the web page the way I want it. Your opinion was very valuable to me. My question was in fact about design, not about the conversion process. More on this later.

@djh-novell: I am reluctant to show the scanned PDF as is for several reasons. I have no control over which PDF viewer will be used on the client side. As this is a preview, I want to control the size of the image as well. In addition, the PNG is about half the file size of the equivalent PDF, which saves a lot of bandwidth.

As for the conversion process (this is more a report than a question, but feel free to comment):

convert (which I have used before) uses gs (Ghostscript) to load the PDF image. gs seems to have some problems reading PDF scans from Canon copiers/scanners. The result is:

   **** Warning:  Generation number out of 0..65535 range, assuming 0.
   **** Warning:  File has an invalid xref entry:  2.  Rebuilding xref table.
Processing pages 1 through 1.
Page 1

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> Canon iR3045                     <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

Googling this suggests that it is possibly a gs bug and not a problem with the scan. It could also be a bug in pdftk when it splits multipage scans into single pages (this is what I do; I have not investigated the problem). The scan is read without any problem by okular or evince. I can use pdftk to “repair” the scan, which makes the gs warning go away.
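
For reference, the two pdftk steps mentioned above could look roughly like this. It is only a sketch: the function names and the page_%02d.pdf naming pattern are assumptions, and the thread does not say which pdftk invocation was actually used. Passing a file through pdftk with a plain "output" operation is the way the pdftk documentation describes for rebuilding a damaged xref table.

```shell
#!/bin/bash
# Sketch of the pdftk steps; split_scan and repair_pdf are hypothetical names.

# Split a multi-page scan into one PDF per page.
# "burst" also writes a doc_data.txt report into the current directory.
split_scan() {
    pdftk "$1" burst output "${2:-page_%02d.pdf}"
}

# Rewrite a PDF so that pdftk regenerates its cross-reference table.
repair_pdf() {
    pdftk "$1" output "$2"
}
```

For example, `split_scan scan.pdf` followed by `repair_pdf page_01.pdf page_01_fixed.pdf`.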

Anyway, gs produces a black-and-white file encoded with 1 bit per pixel. This is unusable for scanned images. The result is the same when calling gs directly:

gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -sOutputFile=my.png my.pdf
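
The bit depth of the resulting file can be checked directly in the PNG header. Ghostscript documents pnggray as an 8-bit grayscale device, so such a check shows whether the output is really stored with 1 bit per pixel or only looks black and white. The sketch below reads the two relevant bytes of the IHDR chunk with od; the function name png_depth is an assumption for illustration.

```shell
#!/bin/bash
# Sketch: report bit depth and colour type of a PNG file.
# In a PNG file, byte offset 24 holds the bit depth and byte offset 25
# the colour type (0 = grayscale, 2 = RGB, 3 = palette).
png_depth() {
    # od prints the two bytes as unsigned decimals; word splitting turns
    # them into the positional parameters $1 and $2.
    set -- $(od -An -tu1 -j24 -N2 "$1")
    printf '%s-bit, colour type %s\n' "$1" "$2"
}
```

Calling `png_depth my.png` on a 1-bit grayscale image prints "1-bit, colour type 0". The file utility reports the same information.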

A completely different result is achieved using pdftoppm from the xpdf-derived poppler tools:

pdftoppm -gray -png -scale-to-x 595 -scale-to-y 842 my.pdf my

The conversion is slower, but there is no warning and the resulting image is of good quality. Bottom line: PDF-to-PNG conversion can be done, and it reduces the file size compared to the original PDF scan.
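
Putting the pieces together, a bash CGI script serving the preview might look like the sketch below. Everything not named in the thread is an assumption: the PDF_DIR location, the "doc" query parameter, and the function names are hypothetical, and error handling is kept minimal. The pdftoppm options are the ones from the command above, restricted to page 1 with -f and -l.

```shell
#!/bin/bash
# Sketch of a CGI preview handler (assumptions: PDF_DIR, the "doc"
# parameter and all function names are illustrative, not from the thread).
PDF_DIR=${PDF_DIR:-/srv/scans}

# Extract the value of "doc" from a query string such as "doc=scan_0042".
# Only letters, digits, "_" and "-" are accepted, so the value cannot
# contain "/" or ".." and point outside PDF_DIR.
doc_from_query() {
    local value
    value=$(printf '%s\n' "$1" | tr '&' '\n' | sed -n 's/^doc=//p' | head -n 1)
    case $value in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
    esac
    printf '%s\n' "$value"
}

# Convert page 1 of the requested PDF and write a CGI response to stdout.
serve_preview() {
    local doc tmp
    if ! doc=$(doc_from_query "$1") || [ ! -f "$PDF_DIR/$doc.pdf" ]; then
        printf 'Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nNo such document\n'
        return 0
    fi
    tmp=$(mktemp -d) || return 1
    # pdftoppm appends the page number to the output root, hence the glob.
    if pdftoppm -gray -png -f 1 -l 1 -scale-to-x 595 -scale-to-y 842 \
            "$PDF_DIR/$doc.pdf" "$tmp/page"; then
        printf 'Content-Type: image/png\r\n\r\n'
        cat "$tmp"/page*.png
    else
        printf 'Status: 500 Internal Server Error\r\nContent-Type: text/plain\r\n\r\nConversion failed\n'
    fi
    rm -rf "$tmp"
}

# Web servers set GATEWAY_INTERFACE for CGI programs; only then respond.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    serve_preview "${QUERY_STRING:-}"
fi
```

The web page would then reference the script as the image source, for example with a query string of the form doc=scan_0042. Converting on every request is slow for scanned pages, so caching the generated PNG next to the PDF may be worth considering.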