I have a multipage phone directory that I want to put in a database or spreadsheet so that it is searchable. I am trying to scan it and convert it. I can convert it to PDF but cannot convert it to text. I am using hplip and xsane. Trying to save as text gives me an error message that gocr is not available. I began to install that via YaST, but it is over 2,200 files! So I aborted the install. Next, I tried tesseract, which installs but does not seem to run. I uninstalled both.
Is there an easy way to copy these pages of phone numbers and addresses to make them searchable? The newest forum posting on OCR is at least 2 years old and didn’t seem to give me anything I hadn’t already tried. Some postings go back a decade!
I tried opening the PDF as a Word document. I just thought to try opening it with a spreadsheet. If anyone has had luck in doing this, please share!
I did get tesseract to install. I might have had a problem confusing it with the first-person shooter of the same name!
The question remains: how do I get it to work? I see no option in LibreOffice to use it. It is not listed as an extension or on any menu that I saw. The --help output gave me nothing that appears to answer the question of running it, only setting up options. I tried running it from the CLI, but nothing happened. Adding the file name to the CLI only brought up the help list. I was hoping for a GUI or at least a GUI interface to LibreOffice.
The question remains: how do I get it to work? I see no option in LibreOffice to use it. It is not listed as an extension or on any menu that I saw. The --help provided nothing to me that appears to answer the question of running it, only setting up options.
No, it is a command-line tool. Read an introduction to OCR.
Tried running it from the CLI, but nothing happened. Adding the file name to the CLI only brought up the help list.
man tesseract gives the manpage.
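For what it is worth, the manpage gives the basic call as `tesseract FILE OUTPUTBASE [OPTIONS]`, so passing only the input file name is a likely reason for getting just the help list. Below is a minimal Python sketch of that call, not a definitive recipe. The file names `page1.png` and `page1` and the helper names are hypothetical, and it assumes the scan was saved from xsane as an image (PNG or TIFF), since as far as I know tesseract reads images rather than PDFs.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG.

    tesseract appends ".txt" to OUTPUTBASE itself, so pass "page1",
    not "page1.txt". The file names used here are placeholders.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract on one scanned page and return the text file path."""
    # Fail early with a clear message if the binary is not installed
    # (on openSUSE the OCR package is tesseract-ocr, not the game).
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if tesseract reports failure
    )
    return output_base + ".txt"
```

Calling `run_ocr("page1.png", "page1")` should leave the recognized text in `page1.txt`, which is the same as typing `tesseract page1.png page1 -l eng` in a terminal.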
I was hoping for a GUI or at least a GUI interface to LibreOffice.
Yes, as far as I remember, there once was gImageReader.
Yes, gImageReader is still there, just in my home repository (I have been waiting for 3.3.0 to appear and then may push it to the Publishing repo). As seen, pdfsandwich is already there…
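On the original goal of a searchable spreadsheet: once either tool has produced plain text, the lines still need splitting into columns. Here is a rough sketch, assuming each directory line ends with a 10-digit phone number; the regular expression, the column names, and the function names are my own guesses and will need adjusting to the real layout of the scan.

```python
import csv
import io
import re

# Assumed line shape: "<name and address> <phone>", with the phone number
# last on the line, e.g. "555-123-4567", "555.123.4567" or "(555) 123-4567".
PHONE_RE = re.compile(r"(\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4})\s*$")


def parse_line(line):
    """Split one OCR line into (entry, phone), or None if no phone is found."""
    match = PHONE_RE.search(line)
    if match is None:
        return None
    # Drop dot leaders, commas and spaces left between the entry and the number.
    entry = line[:match.start()].strip(" .,\t")
    return entry, match.group(1)


def to_csv(text):
    """Convert OCR text to CSV with two columns, skipping unparsable lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entry", "phone"])
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is not None:
            writer.writerow(parsed)
    return buffer.getvalue()
```

Writing the result of `to_csv(...)` to a `.csv` file gives something LibreOffice Calc can open directly. OCR output of tabular pages is rarely clean, so expect to review the skipped lines by hand.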
hmmm… downloaded and installed both tesseract the game and tesseract-ocr the utility. Also installed gimagereader. Neither will start from the menu or CLI. Going to reboot to see if that means anything, but wanted to post before I lost the thread.
However, I installed the Leap 15 version and it requests the newer version of libQt5Core, as I wrote. Have a look at it. And that said, let’s drop this issue.
I began by looking for the python files and they were not available for Tumbleweed… at least in the repos I have set up. But thank you for the response.
Hi Malcolm,
I am on TW Plasma and wanted to give tesseract a go and saw your packaging of gimagereader-qt5
It failed to install because of libpodofo.so 0.9.6 which is required.
I found out that TW has version 0.9.7 so gimagereader won’t start.
Can you help me out getting it to work?
Let me echo the thank-you to Malcolm for the gImageReader packaging.
Last night I successfully installed gImageReader on my Lenovo X1 Carbon gen-9 laptop. Granted, I have a Leap 15.3 install and not Tumbleweed, but nevertheless I very much appreciate the work done to first package, and then over many years continue to package, this app for openSUSE. I’ve maintained my blog on this here: https://forums.opensuse.org/entry.php/77-openSUSE-12-1-to-openSUSE-Leap-15-3-with-gImageReader-and-Tesseract?bt=1232#comment1232 and I think the approach used for Leap 15.3 should work for Tumbleweed; only the repositories need to be changed.