Page 1 of 2
Results 1 to 10 of 12

Thread: Need tool to convert PDF to text

  1. #1
    Join Date
    Jan 2009
    Location
    Switzerland
    Posts
    1,529

    Need tool to convert PDF to text

    I am using SuSE-11.1 (32-bit). One of the tasks I need to do is convert articles from the gazette of commerce from PDF to plain text. I tried to do this with pdftotext from the xpdf package, but the PDF input is difficult to convert: pdftotext loses the spaces between some words and adds extra (wrong) spaces in other places.

    Question: do you know of any other free, open-source tools I could try?
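    Before switching tools, it may be worth comparing pdftotext's own extraction modes on a problem article, since the default, -layout and -raw modes can space words differently depending on how the PDF positions its glyphs. A minimal sketch, assuming Python 3.9+ and pdftotext on the PATH; the helper names (pdftotext_cmd, extract, write_all_modes) are hypothetical, and there is no guarantee any mode fixes the spacing for these particular PDFs.

```python
import subprocess
from pathlib import Path

# pdftotext extraction modes (xpdf/poppler). Which one copes best with lost or
# spurious word spacing depends on the PDF, so it pays to compare them.
MODES = {
    "default": [],          # reading-order reconstruction
    "layout": ["-layout"],  # keep the physical layout of the page
    "raw": ["-raw"],        # keep text in content-stream order
}


def pdftotext_cmd(pdf: str, mode: str = "default") -> list[str]:
    """Build a pdftotext command that writes UTF-8 text to stdout ('-')."""
    return ["pdftotext", *MODES[mode], "-enc", "UTF-8", pdf, "-"]


def extract(pdf: str, mode: str = "default") -> str:
    """Run pdftotext in the given mode and return the text (needs pdftotext on PATH)."""
    result = subprocess.run(pdftotext_cmd(pdf, mode), capture_output=True,
                            encoding="utf-8", check=True)
    return result.stdout


def write_all_modes(pdf: str) -> list[Path]:
    """Write <name>.<mode>.txt next to the PDF for each mode, for side-by-side comparison."""
    src = Path(pdf)
    written = []
    for mode in MODES:
        out = src.with_name(f"{src.stem}.{mode}.txt")
        out.write_text(extract(pdf, mode), encoding="utf-8")
        written.append(out)
    return written
```

    For example, write_all_modes("article.pdf") would produce article.default.txt, article.layout.txt and article.raw.txt, which can then be diffed. UTF-8 output is requested explicitly because the articles may contain accented characters.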

  2. #2
    Join Date
    Jun 2008
    Location
    The English Lake District. UK - GMT/BST
    Posts
    36,857
    Blog Entries
    20

    Re: Need tool to convert PDF to text

    Not sure; it's something I have wondered about myself. But you can open the PDF in Okular, select the text there and paste it into a text file.
    Tumbleweed_KDE

  3. #3
    Join Date
    Jan 2009
    Location
    Switzerland
    Posts
    1,529

    Re: Need tool to convert PDF to text

    Thanks for the pointer. I forgot to say: I am one of those old-fashioned command-line guys. My app will run daily as a cron job.
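    For a daily cron job, one option is a small driver that converts only the PDFs that are new or have changed since the last run. A sketch, assuming Python 3.9+, pdftotext installed, and a hypothetical setup where incoming PDFs land in one directory and the text goes to another; convert_dir and needs_update are illustrative names, not part of any existing tool.

```python
import subprocess
from pathlib import Path


def needs_update(pdf: Path, txt: Path) -> bool:
    """True if the text file is missing or older than the PDF it came from."""
    return not txt.exists() or txt.stat().st_mtime < pdf.stat().st_mtime


def convert_dir(src: Path, dst: Path, extra_args: tuple[str, ...] = ("-layout",)) -> list[Path]:
    """Convert every new or changed PDF in src to a .txt file in dst.

    Returns the text files written on this run.
    """
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        txt = dst / (pdf.stem + ".txt")
        if not needs_update(pdf, txt):
            continue  # already converted on an earlier run
        # check=True makes a failed conversion raise, so cron can report it
        subprocess.run(["pdftotext", *extra_args, "-enc", "UTF-8", str(pdf), str(txt)],
                       check=True)
        written.append(txt)
    return written
```

    A script run from cron would then call something like convert_dir(Path("/srv/gazette/pdf"), Path("/srv/gazette/txt")), where both paths are placeholders. Comparing modification times makes reruns cheap and idempotent. Note that cron runs jobs with a minimal PATH, so an absolute path to pdftotext may be needed.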

  4. #4
    goldie NNTP User

    Re: Need tool to convert PDF to text

    vodoo wrote:
    > Thanks for the pointer. I forgot to say: I am one of those old fashioned
    > command line guys. My app will run daily as a cron job.



    After the conversion, couldn't you run a sed script to strip out the
    unneeded spaces, and a spell checker to put spaces back into the
    run-together words?
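    The two-step cleanup suggested above can be sketched in Python rather than sed plus a spell checker: squeeze runs of whitespace, then split run-together tokens by dynamic programming over a word list, preferring the split with the fewest words. This swaps a dictionary-based word segmentation in for a real spell checker. Assumptions: the caller supplies a lowercase word list (for example loaded from a dictionary file), tokens with attached punctuation are left unchanged, and words that pdftotext wrongly split apart are not rejoined. The function names are hypothetical.

```python
import re


def collapse_spaces(line: str) -> str:
    """Squeeze runs of spaces/tabs into a single space and trim the ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def segment(token: str, vocab: set[str]) -> list[str]:
    """Split a run-together token into known words, preferring the fewest words.

    vocab must contain lowercase words; matching is case-insensitive but the
    original casing is kept. Returns [token] if no complete split exists.
    """
    n = len(token)
    # best[i] holds the shortest word list covering token[:i], or None if none found
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prev = best[start]
            if prev is None or token[start:end].lower() not in vocab:
                continue
            cand = prev + [token[start:end]]
            if best[end] is None or len(cand) < len(best[end]):
                best[end] = cand
    return best[n] if best[n] is not None else [token]


def fix_line(line: str, vocab: set[str]) -> str:
    """Collapse extra spaces, then split run-together tokens where possible."""
    words = []
    for tok in collapse_spaces(line).split(" "):
        words.extend(segment(tok, vocab))
    return " ".join(words)
```

    With vocab = {"the", "gazette", "of", "commerce"}, fix_line("the  gazetteof   commerce", vocab) returns "the gazette of commerce". The quality of the result depends heavily on the word list: too small and real words stay glued together, too large and short dictionary entries can produce wrong splits.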

    I have Adobe Reader 8 for Linux installed (9 is available). It has a
    button to "Save as Text". I've looked at "acroread -man" in a
    terminal but do not see a command-line switch to do the same, but it
    MUST be available from somewhere, somehow. If so, you could pipe it
    through to happiness.

    The stock reader has a -toPostScript switch. Do you have something
    that converts PS directly to text?
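    On the PostScript question: Ghostscript ships ps2ascii, which converts PostScript to text, and the xpdf package already includes pdftops. A sketch chaining the two, assuming both tools are installed and Python 3.9+; pdftops is used here in place of Adobe Reader's -toPostScript switch, whose exact syntax is not covered, and ps_route_cmds/run_ps_route are hypothetical names. Whether this route gives better word spacing than pdftotext is untested.

```python
import subprocess
from pathlib import Path


def ps_route_cmds(pdf: Path, workdir: Path) -> list[list[str]]:
    """Commands for PDF -> PostScript (pdftops) -> text (Ghostscript's ps2ascii)."""
    ps = workdir / (pdf.stem + ".ps")
    txt = workdir / (pdf.stem + ".txt")
    return [
        ["pdftops", str(pdf), str(ps)],   # pdftops PDF-file [PS-file]
        ["ps2ascii", str(ps), str(txt)],  # ps2ascii [input.ps [output.txt]]
    ]


def run_ps_route(pdf: Path, workdir: Path) -> Path:
    """Run both steps in order and return the path of the text file."""
    workdir.mkdir(parents=True, exist_ok=True)
    cmds = ps_route_cmds(pdf, workdir)
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing step
    return Path(cmds[-1][-1])
```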

    And there is a "Save as Rich Text Format" plug-in.

    I suspect a visit to the Adobe site and/or community would be worth
    your time.

    I bet this problem has been solved before (you might try Google).

    --
    goldie
    Give a hacker a fish and you feed him for a day.
    Teach man and you feed him for a lifetime.

  5. #5
    Will Honea NNTP User

    Re: Need tool to convert PDF to text

    goldie wrote:

    > vodoo wrote:
    >> Thanks for the pointer. I forgot to say: I am one of those old fashioned
    >> command line guys. My app will run daily as a cron job.

    > [...]
    > i have Adobe Reader 8 for Linux installed (9 is available)..it has a
    > button to "Save as Text"...i've looked at "acroread -man" in a
    > terminal but do not see a command line switch to do the same, but it
    > MUST be available from somewhere, somehow...if so you could pipe
    > through to happiness..
    > [...]


    There is a slight difference between the Acrobat plug-in for Firefox and the
    standalone reader: the plug-in only lets you save as PDF, while the
    standalone reader has the "Save as Text" option. A PITA for downloads!

    --
    Will Honea

  6. #6
    Join Date
    Aug 2008
    Location
    /Linux/Userland
    Posts
    279

    Re: Need tool to convert PDF to text

    Quote originally posted by vodoo:
    I am using SuSE-11.1 (32bit). One of the tasks I need to do is to convert articles from the gazette of commerce from PDF to plain text. I tried to do this with pdftotext from the xpdf package. However, the PDF input is difficult to convert. pdftotext is losing spaces between words and adds additional (wrong) spaces at other places.

    Question: do you know of other free open source tools which I could try?
    Try 'pdfedit': it has an option to save a file as text.
    Webpin
    Linux User 483705 @ http://counter.li.org/

  7. #7
    Join Date
    Jan 2009
    Location
    Switzerland
    Posts
    1,529

    Re: Need tool to convert PDF to text

    Thanks to everyone who helped.

    @zmdmw52: pdfedit is a very interesting app. It does a much better job than pdftotext. I still have to figure out how to run the conversion from the command line; this seems to be possible, but I'm struggling with the syntax.

  8. #8
    Join Date
    Aug 2008
    Location
    /Linux/Userland
    Posts
    279

    Re: Need tool to convert PDF to text

    Quote originally posted by vodoo:
    Thanks to everyone who helped.

    @zmdmw52: pdfedit is a very interesting app. It does a much better job than pdftotext. I still have to figure out how to run the conversion from the command line. This seems possible but I'm struggling with the syntax.
    You might also want to try pdfsam; IIRC it has a lot of command-line options.

    Here are two screenshots of the PDF-related packages on the Debian-based Linux Mint 7 (on my laptop). Many of them should be available for openSUSE as well; Webpin or the openSUSE software search should give an indication.

    [Screenshot 1]

    [Screenshot 2]
    Linux User 483705 @ http://counter.li.org/

  9. #9

    Re: Need tool to convert PDF to text

    Have a look at "Convert PDF to Word (DOC) - 100% Free!", a free online service.

  10. #10
    Join Date
    Aug 2008
    Location
    /Linux/Userland
    Posts
    279

    Re: Need tool to convert PDF to text

    Quote originally posted by zmdmw52:
    You might also want to try pdfsam - that has a lot of cmd-line options, IIRC.

    Here are two screenshots of the PDF-related packages on the Debian-based Linux Mint 7 (on my laptop). Many of them should be available for openSUSE as well; Webpin or the openSUSE software search should give an indication.

    [Screenshot 1]

    [Screenshot 2]
    Sorry, the second screenshot is wrong, but I can't edit that post. I will post the correct one later.
    Linux User 483705 @ http://counter.li.org/
