
Thread: OCR scan to text

  1. #1
    Join Date
    Jun 2008
    Location
    USA
    Posts
    1,125

    OCR scan to text

    I have a multipage phone directory that I want to put into a database or spreadsheet so that it is searchable. I am trying to scan it and convert it. I can convert it to PDF but not to text. I am using hplip and xsane. Trying to save as text gives me an error message that gocr is not available. I began to install that via YaST, but it is over 2,200 files, so I aborted the install. Next, I tried tesseract, which installs but does not seem to run. I uninstalled both.

    Is there an easy way to copy these pages of phone numbers and addresses to make them searchable? The newest forum posting on OCR is at least 2 years old and didn't give me anything I hadn't already tried. Some postings go back a decade!

    I tried opening the PDF as a Word document. I just thought of trying to open it with a spreadsheet. If anyone has had luck doing this, please share!
    Any sufficiently advanced technology is indistinguishable from magic. - Arthur C. Clarke

  2. #2

    Re: OCR scan to text

    I installed tesseract and `tesseract -v` delivered the version, so it runs.

    Code:
     tesseract -v
    tesseract 3.05.01
     leptonica-1.75.3
      libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.5.0 : libopenjp2 2.3.0
    Maybe try again?

  3. #3

    Re: OCR scan to text

    I just installed `pdfsandwich` from the publishing repo. The simple command

    Code:
    pdfsandwich -lang deu pdfname.pdf
    produced a sandwich PDF, and the OCR quality was quite acceptable. It is a command-line tool that runs on multiple threads, and it was fast!

    Seems like great, easy-to-use software.
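
    A minimal sketch of how the sandwich PDF could then be turned into searchable plain text. It assumes `pdftotext` from poppler is installed and relies on pdfsandwich's default output name, `<name>_ocr.pdf`; the function name `sandwich_to_text` and the file names are made up for illustration.

```shell
# Sketch: add an OCR text layer with pdfsandwich, then dump it to a .txt file.
# Assumes pdfsandwich and pdftotext (poppler) are installed.
sandwich_to_text() {
    pdf=$1            # scanned input PDF, e.g. directory.pdf
    lang=${2:-eng}    # tesseract language code, e.g. eng or deu

    # By default pdfsandwich writes <name>_ocr.pdf next to the input file.
    pdfsandwich -lang "$lang" "$pdf" || return 1

    # -layout tries to keep the column layout, which helps with
    # name / address / phone number lines.
    pdftotext -layout "${pdf%.pdf}_ocr.pdf" "${pdf%.pdf}.txt"
}
```

    For example, `sandwich_to_text directory.pdf eng && grep -i smith directory.txt` would list every line mentioning Smith. The text file can also be imported into a spreadsheet.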

  4. #4
    Join Date
    Jun 2008
    Location
    USA
    Posts
    1,125

    Re: OCR scan to text

    I did get tesseract to install. I might have had a problem confusing it with the first-person shooter of the same name!

    The question remains: how do I get it to work? I see no option in LibreOffice to use it. It is not listed as an extension or on any menu that I saw. The --help provided nothing to me that appears to answer the question of running it, only setting up options. Tried running it from the CLI, but nothing happened. Adding the file name to the CLI only brought up the help list. I was hoping for a GUI or at least a GUI interface to LibreOffice.

    I could not find pdfsandwich.

  5. #5

    Re: OCR scan to text

    Quote Originally Posted by Prexy View Post
    I did get tesseract to install. I might have had a problem confusing it with the first-person shooter of the same name!
    Ok, problem solved.

    The question remains: how do I get it to work? I see no option in LibreOffice to use it. It is not listed as an extension or on any menu that I saw. The --help provided nothing to me that appears to answer the question of running it, only setting up options.
    No, it is a command-line tool. Read an introduction to OCR.

    Tried running it from the CLI, but nothing happened. Adding the file name to the CLI only brought up the help list.
    `man tesseract` gives the manpage.
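
    In short, tesseract expects an input image and an output base name; given only the input file it appears to just print the help text, which matches what was described above. A hedged sketch for a multipage scan, assuming `pdftoppm` (poppler) is installed to split the PDF into images, since this tesseract version reads images rather than PDFs. The function name `ocr_pdf` and the file names are hypothetical.

```shell
# Sketch: OCR every page of a scanned PDF into one text file.
# Assumes pdftoppm (poppler) and tesseract are installed.
ocr_pdf() {
    pdf=$1            # scanned input PDF, e.g. directory.pdf
    lang=${2:-eng}    # tesseract language code
    base=${pdf%.pdf}

    # Render each page as a 300 dpi PNG named <base>-<page>.png.
    pdftoppm -r 300 -png "$pdf" "$base" || return 1

    # Usage is: tesseract <image> <outputbase> [-l lang]
    # tesseract adds the .txt extension to the output base itself.
    for img in "$base"-*.png; do
        tesseract "$img" "${img%.png}" -l "$lang" || return 1
    done

    # Join the per-page results, sorted by file name.
    cat "$base"-*.txt > "$base.txt"
}
```

    Running `ocr_pdf directory.pdf eng` would leave the recognized text in `directory.txt`, which can be searched with `grep` or imported into a spreadsheet.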

    I was hoping for a GUI or at least a GUI interface to LibreOffice.
    Yes, as far as I remember, there once was gImageReader.

    I could not find pdfsandwich.
    https://software.opensuse.org/packag...rm=pdfsandwich

  6. #6
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    26,823
    Blog Entries
    15

    Re: OCR scan to text

    Hi
    Yes, gImageReader is still there, just in my home repository (I've been waiting for 3.3.0 to appear and may then push it to the Publishing repo). As seen, pdfsandwich is already there...

    https://software.opensuse.org/package/gimagereader
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
    If you find this post helpful and are logged into the web interface,
    please show your appreciation and click on the star below... Thanks!

  7. #7

    Re: OCR scan to text

    Quote Originally Posted by malcolmlewis View Post
    Hi
    Yes, gImageReader is still there, just in my home repository (I've been waiting for 3.3.0 to appear and may then push it to the Publishing repo). As seen, pdfsandwich is already there...

    https://software.opensuse.org/package/gimagereader
    Your rpm doesn't contain a binary. Only doc and icons. Or did I miss something?

  8. #8
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    26,823
    Blog Entries
    15

    Re: OCR scan to text

    Quote Originally Posted by cookie170 View Post
    Your rpm doesn't contain a binary. Only doc and icons. Or did I miss something?
    Hi
    You need the gtk and/or qt5 package:
    https://build.opensuse.org/package/b...USE_Tumbleweed

  9. #9

    Re: OCR scan to text

    Quote Originally Posted by malcolmlewis View Post
    Trying to install the Leap 15 versions fails with:

    Code:
    libQt5Core.so.5(Qt_5.11)(64bit) benötigt von gimagereader-qt5-3.2.3-1.26.x86_64 wird nirgends zur Verfügung gestellt
    Translation: nothing provides libQt5Core.so.5(Qt_5.11)(64bit), which gimagereader-qt5-3.2.3-1.26.x86_64 requires.

    Correct, because Leap 15 comes with:

    Code:
    /usr/lib64/libQt5Core.so.5
    /usr/lib64/libQt5Core.so.5.9
    /usr/lib64/libQt5Core.so.5.9.4
    Would you be so kind as to compile it again? Thanks!
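
    The mismatch can be checked by hand: the package asks for the `Qt_5.11` symbol version, while the library files listed above belong to Qt 5.9.4. A small sketch of that comparison, assuming a `sort` that supports `-V` (version ordering, as in GNU coreutils); the helper names are invented for illustration.

```shell
# Extract the Qt version from a versioned library path,
# e.g. /usr/lib64/libQt5Core.so.5.9.4 -> 5.9.4
qt_version_from_path() {
    printf '%s\n' "${1##*.so.}"
}

# version_ge A B: succeeds if version A is at least version B.
# sort -V orders 5.9.4 before 5.11, unlike a plain string sort.
version_ge() {
    newest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$newest" = "$1" ]
}

installed=$(qt_version_from_path /usr/lib64/libQt5Core.so.5.9.4)
version_ge "$installed" 5.11 || echo "Qt $installed is older than 5.11"
```

    With the Leap 15 library above this prints `Qt 5.9.4 is older than 5.11`, which is why the dependency cannot be resolved there.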

  10. #10
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    26,823
    Blog Entries
    15

    Re: OCR scan to text

    Hi
    This thread is about Tumbleweed, hence my link....
    You need the versions from https://download.opensuse.org/reposi...USE_Leap_15.0/

    That is one of the reasons it's better to start a new thread, even about the same topic, if you're on a different release, since threads have prefixes.
