I am using SuSE-11.1 (32bit). One of the tasks I need to do is to convert articles from the gazette of commerce from PDF to plain text. I tried to do this with pdftotext from the xpdf package. However, the PDF input is difficult to convert: pdftotext is losing spaces between words and adding extra (wrong) spaces in other places.
Question: do you know of other free open source tools which I could try?
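Before switching tools, it may be worth trying pdftotext's own extraction modes, since spacing problems sometimes depend on them. Below is a minimal Python sketch of a cron-friendly wrapper. The `-layout` and `-raw` switches are real pdftotext options; the function names and the directory layout are made up for illustration, and there is no guarantee that either mode fixes this particular PDF.

```python
import subprocess
from pathlib import Path

def pdftotext_command(pdf_path, txt_path, mode="layout"):
    """Build the argument list for pdftotext in the given extraction mode."""
    if mode not in ("layout", "raw", "default"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["pdftotext"]
    if mode != "default":
        cmd.append(f"-{mode}")  # -layout keeps physical layout, -raw keeps stream order
    cmd += [str(pdf_path), str(txt_path)]
    return cmd

def convert_directory(src_dir, mode="layout"):
    """Convert every *.pdf in src_dir to a .txt file next to it (hypothetical layout)."""
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # check=True raises if pdftotext exits with an error, so cron mail reports it
        subprocess.run(pdftotext_command(pdf, txt, mode), check=True)
```

Running `convert_directory` once with `mode="layout"` and once with `mode="raw"` on a sample article and comparing the two text files should show quickly whether either mode helps.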
vodoo wrote:
> Thanks for the pointer. I forgot to say: I am one of those old fashioned
> command line guys. My app will run daily as a cron job.
after the cron couldn't you run a sed/script to strip out unneeded
spaces and a spell checker to add spaces into run together words...
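The "strip out unneeded spaces" half of that idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions (the function name is invented, and the rules are deliberately crude); it does not try to split run-together words, which would need a dictionary or spell checker.

```python
import re

def tidy_spaces(text: str) -> str:
    """Collapse stray spacing in text extracted from a PDF, line by line."""
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line)         # runs of blanks become one space
        line = re.sub(r" ([,.;:!?])", r"\1", line)  # drop a space before punctuation
        cleaned.append(line.strip())                # trim leading/trailing blanks
    return "\n".join(cleaned)
```

In a cron job this could be applied to each converted text file after pdftotext has run.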
i have Adobe Reader 8 for Linux installed (9 is available)...it has a
button to "Save as Text"...i've looked at "acroread -man" in a
terminal but do not see a command line switch to do the same, but it
MUST be available from somewhere, somehow...if so you could pipe
through to happiness...
the stock reader has a -toPostScript switch...do you have something
that converts PS direct to text?
and, there is a save "as rich text format" plug-in...
i suspect a visit to the Adobe site and/or community would be worth
your time...
i bet this problem has been solved before (you might try a google)...
--
goldie
Give a hacker a fish and you feed him for a day.
Teach man and you feed him for a lifetime.
There is a slight difference between the Acrobat plugin for Firefox and the
standalone reader in that the plugin restricts you to saving as pdf while
the standalone reader has the "save as text" option. PITA on downloads!
@zmdmw52: pdfedit is a very interesting app. It does a much better job than pdftotext. I still have to figure out how to run the conversion from the command line. This seems possible but I'm struggling with the syntax.
You might also want to try pdfsam - that has a lot of cmd-line options, IIRC.
Here are 2 pictures of the PDF-related packages on the Debian-based Linux Mint 7 (on my laptop); many of them should be available for openSUSE as well. Webpin or the software search on openSUSE should give an indication.
Just thought to post my solution and say thanks for your valuable input. The pointer given by zmdmw52 for pdfedit led me on the right track. The very friendly developers of pdfedit have created a standalone version of pdf_to_text linked to their (much better) libraries. It compiles well on SuSE-11.1 and will make it into the CVS repo of pdfedit soon (well, that's what I hope).
hi buddy, that is not a tough job. you just need to download an image converter, which are ubiquitous on the internet. follow the steps on the page and the conversion should be finished.