openSUSE Forums > Looking For Something Other Than Support? » Need tool to convert PDF to text

  #1 (permalink)  
Old 25-Aug-2009, 05:39
vodoo's Avatar
Busy Penguin
 
Join Date: Jan 2009
Location: Switzerland
Posts: 251
vodoo hasn't been rated much yet
Default Need tool to convert PDF to text

I am using SuSE-11.1 (32bit). One of the tasks I need to do is to convert articles from the gazette of commerce from PDF to plain text. I tried to do this with pdftotext from the xpdf package. However, the PDF input is difficult to convert: pdftotext loses spaces between words and adds spurious spaces in other places.

Question: do you know of other free open source tools which I could try?
  #2 (permalink)  
Old 25-Aug-2009, 06:00
caf4926's Avatar
Global Moderator
 
Join Date: Jun 2008
Location: The English Lake District. UK - GMT/BST
Posts: 12,924
caf4926 has a brilliant future with this reputation
Default Re: Need tool to convert PDF to text

Not sure - Something I have thought about myself. But you can use Okular and select text in there and paste to a text file.
__________________
Box: openSUSE 11.2 | (KDE4.3.3) | M2N4-SLI | AMD 64 X2 5200+ | nVidia 8500GT | 4GB RAM
Lap: openSUSE 11.2 | Celeron 550 | (KDE4.3.3)"3" | Intel 965 GM | Lenovo R61e | 3GB RAM
  #3 (permalink)  
Old 25-Aug-2009, 06:40
vodoo's Avatar
Busy Penguin
 
Join Date: Jan 2009
Location: Switzerland
Posts: 251
vodoo hasn't been rated much yet
Default Re: Need tool to convert PDF to text

Thanks for the pointer. I forgot to say: I am one of those old fashioned command line guys. My app will run daily as a cron job.
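
For a cron-driven setup like this, here is a rough Python sketch (not something posted in this thread) of a batch wrapper around pdftotext. It relies on the `-layout`, `-raw` and `-enc` options that pdftotext provides; whether either mode actually improves the spacing on these particular PDFs is untested. The function names and the directory arguments are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path, txt_path, mode="layout"):
    """Return the argument list for one pdftotext run.

    mode is "layout" (keep the physical page layout), "raw" (keep the
    content-stream order) or "" (pdftotext's default behaviour).
    """
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if mode:
        cmd.append("-" + mode)
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def convert_all(in_dir, out_dir, mode="layout"):
    """Convert every *.pdf in in_dir; return the text files written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        txt = out_dir / (pdf.stem + ".txt")
        # check=True makes a failed conversion raise, so cron reports it
        subprocess.run(build_pdftotext_cmd(pdf, txt, mode), check=True)
        written.append(txt)
    return written
```

A daily cron job could then call something like `convert_all("/path/to/pdfs", "/path/to/text", mode="raw")` (both paths are placeholders) and compare the `layout` and `raw` results on a few sample articles.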
  #4 (permalink)  
Old 25-Aug-2009, 07:19
goldie
Guest
 
Posts: n/a
Default Re: Need tool to convert PDF to text

vodoo wrote:
> Thanks for the pointer. I forgot to say: I am one of those old fashioned
> command line guys. My app will run daily as a cron job.



after the cron job, couldn't you run a sed script to strip out unneeded
spaces and a spell checker to add spaces into run-together words?
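
The clean-up idea above can be sketched roughly as follows. This uses Python instead of sed plus a spell checker, and it is only an illustration under assumptions: `words` is a hypothetical set of known lowercase words (in practice it might be loaded from a system word list), and tokens with attached punctuation are simply left alone.

```python
import re


def squeeze_spaces(line):
    """Collapse runs of spaces/tabs into one space and trim the ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def split_run_together(token, words):
    """Split token into the fewest known words; return [token] if impossible."""
    low = token.lower()
    n = len(low)
    # best[i] is a list of pieces covering token[:i], or None if none exists
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and low[j:i] in words:
                cand = best[j] + [token[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n] if best[n] is not None else [token]


def repair_line(line, words):
    """Rejoin wrongly split words and split run-together ones."""
    tokens = squeeze_spaces(line).split(" ")
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A stray space: tok is unknown, but tok + next token is a known word
        if nxt is not None and tok.lower() not in words and (tok + nxt).lower() in words:
            out.append(tok + nxt)
            i += 2
        else:
            out.extend(split_run_together(tok, words))
            i += 1
    return " ".join(out)
```

With `words = {"the", "gazette", "of", "commerce"}`, calling `repair_line("the  gazetteof com merce", words)` gives `"the gazette of commerce"`. Real gazette text has names and abbreviations that no word list covers, so the output would still need checking.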

i have Adobe Reader 8 for Linux installed (9 is available)..it has a
button to "Save as Text"...i've looked at "acroread -man" in a
terminal but do not see a command line switch to do the same, but it
MUST be available from somewhere, somehow...if so you could pipe
through to happiness..

the stock reader has a -toPostScript switch...do you have something
that converts PS direct to text?

and, there is a save "as rich text format" plug-in..

i suspect a visit to the Adobe site and/or community would be worth
your time..

i bet this problem has been solved before (you might try a google)..

--
goldie
Give a hacker a fish and you feed him for a day.
Teach man and you feed him for a lifetime.
  #5 (permalink)  
Old 25-Aug-2009, 16:27
Will Honea
Guest
 
Posts: n/a
Default Re: Need tool to convert PDF to text

goldie wrote:

> i have Adobe Reader 8 for Linux installed (9 is available)..it has a
> button to "Save as Text"...i've looked at "acroread -man" in a
> terminal but do not see a command line switch to do the same, but it
> MUST be available from somewhere, somehow...if so you could pipe
> through to happiness..

There is a slight difference between the Acrobat plugin for Firefox and the
standalone reader in that the plugin restricts you to saving as pdf while
the standalone reader has the "save as text" option. PITA on downloads!

--
Will Honea
  #6 (permalink)  
Old 28-Aug-2009, 04:00
Busy Penguin
 
Join Date: Aug 2008
Location: /Linux/Userland
Posts: 270
zmdmw52 hasn't been rated much yet
Arrow Re: Need tool to convert PDF to text

Try 'pdfedit': it has an option to save the file as text.
Webpin
  #7 (permalink)  
Old 02-Sep-2009, 09:47
vodoo's Avatar
Busy Penguin
 
Join Date: Jan 2009
Location: Switzerland
Posts: 251
vodoo hasn't been rated much yet
Default Re: Need tool to convert PDF to text

Thanks to everyone who helped.

@zmdmw52: pdfedit is a very interesting app. It does a much better job than pdftotext. I still have to figure out how to run the conversion from the command line. This seems possible but I'm struggling with the syntax.
  #8 (permalink)  
Old 02-Sep-2009, 14:52
Busy Penguin
 
Join Date: Aug 2008
Location: /Linux/Userland
Posts: 270
zmdmw52 hasn't been rated much yet
Arrow Re: Need tool to convert PDF to text

You might also want to try pdfsam - that has a lot of cmd-line options, IIRC.

Here are 2 pics of the PDF-related packages on the Debian-based Linux Mint 7 (on my laptop); many of them should be available for openSUSE as well. Webpin or the software search on openSUSE should give an indication.

[1]


[2]
  #9 (permalink)  
Old 03-Sep-2009, 01:38
Puzzled Penguin
 
Join Date: Aug 2008
Posts: 14
ooglie hasn't been rated much yet
Default Re: Need tool to convert PDF to text

Have a look at "Convert PDF to Word (DOC) — 100% Free!", a free online service.
  #10 (permalink)  
Old 03-Sep-2009, 10:12
Busy Penguin
 
Join Date: Aug 2008
Location: /Linux/Userland
Posts: 270
zmdmw52 hasn't been rated much yet
Default Re: Need tool to convert PDF to text

Sorry, the second pic in my previous post is incorrect, but I can't edit that post. I will post the correct pic later.