openSUSE-12.1 to openSUSE Leap-15.3 with gImageReader and Tesseract

oldcpu · November 27, 2011, 3:42pm

On openSUSE-12.1, for scanning and OCR, I have installed Tesseract-ocr and gImageReader. I like being able to conduct OCR of French, German and English text, especially since I am an expatriate living abroad.

I blogged about this before with openSUSE-11.4, and posted about it with respect to openSUSE-11.3 and 11.4 in these threads:

openSUSE forums: OCR and Linux and
obtained help in this thread openSUSE forum help thread: How does one meet a python2-devel dependency requirement on openSUSE-11.3? and
this blog article (further down in the blog) Blog openSUSE forums: new 64-bit openSUSE-11.4 KDE installation on my main PC (Core i7-920)

This time I thought I would re-organize my notes/post, and make it easier for one to see exactly what commands I sent in order to do this installation.

openSUSE-12.1

Program comments

Tesseract

The ‘core’ program behind the OCR software that I use is Tesseract-ocr, which is described as follows:

Tesseract is a commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005. From 2007 it is developed by Google.

I note a packaged version of tesseract-3.00-4.1 is available from here for openSUSE-12.1:


http://download.opensuse.org/repositories/openSUSE:/Factory:/Contrib/openSUSE_12.1

and a tesseract-3.00-5.1 is available from here for openSUSE-12.1:


http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1

gImageReader

For a graphic front end to tesseract, I like to use the python package gImageReader. I have read:

gImageReader is a simple PyGtk front-end to tesseract.

Main features:

Allows the user to select the part of the image they want to be recognized or directly recognize the entire image.

Supports PDF documents.

Allows the user to acquire images from scanning devices.

Recognized text displayed directly next to the image.

Basic editing of output text, including search/replace and removing line breaks on selected text.

Spellcheck enabled for the selected language in the output textfield if corresponding dictionary installed (requires GTKSpell).

User is prompted to install missing spellcheck languages (requires PackageKit or apt-file).

Easily switch between multiple open files.

Attempts to automatically detect all necessary programs, otherwise shows a configuration prompt to the user.

I note a packaged version of python-gimagereader-0.9-2.1 for openSUSE-12.1 is available here:


http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1

and a very slightly older packaged version of python-gimagereader-0.9-1.1 for openSUSE-12.1 is available here:


http://download.opensuse.org/repositories/home:/malcolmlewis:/Python/openSUSE_12.1

Spell check dictionaries

In addition, various spell check dictionaries are very helpful when running tesseract and gimagereader, to assist in repairing words where the optical character recognition was not ideal. I confess it's never been clear to me which spell check dictionaries are necessary. I note GTKspell is needed, and I have read references to ispell, aspell and myspell being needed.

I noted that, prior to installing tesseract and gimagereader, by default on my KDE desktop I had hunspell, hunspell-tools, libaspell15, myspell-american, and ispell installed. YaST also has many additional spell check dictionaries, so I saw no need to add any extra repositories to help install desired packages here (other than the basic OSS repository).

Commands Used to Install

In the end I sent the following commands with root permissions (deciding to try deltafox’s repository) :


zypper ar http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1/ deltafox


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell

which picked up as dependencies igerman98-doc, librcc0, librcd0, and rcc-runtime. I was careful not to install aspell-ispell as that conflicts with ispell.


zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging

I noted python-imaging picked up the dependencies python-tk, tix, and tk. And I noted that python-gimagereader picked up the dependencies libbonobo, libbonoboui, libgnome, libgnomecanvas-2.0, libgnomeui, libgtkspell0, python-bonobo, python-egg, python-enchant, python-gnome-extras, python-gnomecanvas, python-gtkspell, python-imaging-sane, python-orbit, python-poppler.

I had seen references to other GNU/Linux distributions installing leptonica, but as far as I can determine, neither leptonica-tools nor liblept2 (both packaged for openSUSE) are needed.

I then removed the additional repository that I needed for much of the above.


zypper rr deltafox

This is my standard practice. I keep my repositories lean and mean, and as soon as the install was complete I removed this repository.
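The add, install, remove sequence above can be sketched as a small wrapper that always removes the temporary repository again, even if the install step fails. This is only a sketch under assumptions: the helper names `run` and `with_temp_repo` and the `DRY_RUN` variable are hypothetical, and with `DRY_RUN=1` (the default here) the zypper commands are only printed, not executed.

```shell
# Sketch: add a temporary repository, install from it, then remove it again.
# DRY_RUN=1 (the default here) only prints the zypper commands.
DRY_RUN="${DRY_RUN:-1}"

# run COMMAND... prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# with_temp_repo URL ALIAS PACKAGE...
with_temp_repo() {
    url="$1"
    alias_name="$2"
    shift 2
    run zypper ar "$url" "$alias_name" || return 1
    run zypper in "$@"
    status=$?
    # Remove the repository whether or not the install succeeded.
    run zypper rr "$alias_name"
    return "$status"
}

with_temp_repo \
    "http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1/" \
    deltafox \
    tesseract python-gimagereader
```

Running it with `DRY_RUN=0` and root permissions would send the real commands; the package list can be extended with the training data and dictionary packages listed above.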

I then typed the command:


gimagereader

and the application ran.

Some Images

Here is an example of the gimagereader GUI, with a French document OCR in progress, where the dictionaries have detected some misspellings
http://thumbnails52.imagebam.com/16155/ef70b1161548780.jpg

Here is an example of the spell check correction in progress
http://thumbnails27.imagebam.com/16155/a89503161548782.jpg

Here is a look at the language selections
http://thumbnails24.imagebam.com/16155/6d191a161548784.jpg

I use this program fairly often.

malcolmlewis · December 2, 2011, 3:14pm

Hi oldcpu
The deltafox repository links to mine. The application release is 0.9 and the build is 2.1, since it’s a copy of my build.

oldcpu · May 1, 2013, 12:11pm

Malcolmlewis was also kind enough to package gImageReader for openSUSE-12.3.

openSUSE-12.3

For information, to set up gImageReader to read/OCR German and French in openSUSE-12.3, I followed the guide above, but for 12.3 I used the following repositories/commands (as root):


zypper ar http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_12.3/ kent-ocr
zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_12.3/ malcolm
zypper ar http://download.opensuse.org/repositories/home:/vodoo/openSUSE_12.3/ vodoo-ocr

followed by


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging

and then removed the repositories that I added during the install:


zypper rr malcolm
zypper rr kent-ocr
zypper rr vodoo-ocr

and then typed (as a regular user):


gimagereader

and performed an OCR test with a scanned French language page.
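A similar test can also be made without the GUI by calling tesseract directly, which helps confirm that the language data is installed. The sketch below only prints the command; the helper names `lang_code` and `ocr_command` and the file names are hypothetical, and it assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG` form with the three-letter codes seen in the 12.1 package names (fra, deu, eng).

```shell
# Map the language names used in this thread to Tesseract's three-letter codes.
lang_code() {
    case "$1" in
        french)  echo fra ;;
        german)  echo deu ;;
        english) echo eng ;;
        *) echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# ocr_command IMAGE OUTPUT_BASE LANGUAGE prints the tesseract call to run.
ocr_command() {
    code="$(lang_code "$3")" || return 1
    printf 'tesseract %s %s -l %s\n' "$1" "$2" "$code"
}

# Hypothetical file names; the recognized text would land in page_fr.txt.
ocr_command page.jpg page_fr french
```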

Note the above repositories that I used are not official repositories, but rather private repositories of various individuals. The rpms that are on those repositories, while present today, may not be present tomorrow.
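Since such private repositories can disappear, it may be worth probing a repository URL before adding it. A minimal sketch, assuming curl is installed and that the server answers HTTP HEAD requests for the repository directory; the function name `repo_reachable` is hypothetical.

```shell
# repo_reachable URL succeeds when an HTTP HEAD request to URL succeeds.
repo_reachable() {
    curl --head --silent --fail --location --max-time 10 "$1" >/dev/null 2>&1
}

# Usage (needs network access and root permissions, so shown as a comment):
#   url="http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_12.3/"
#   if repo_reachable "$url"; then
#       zypper ar "$url" kent-ocr
#   else
#       echo "repository not reachable: $url" >&2
#   fi
```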

oldcpu · December 27, 2013, 1:15pm

openSUSE-13.1 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.1.

openSUSE-13.1

For information, to set up gImageReader to read/OCR German and French in openSUSE-13.1, I modified the guide above, since a number of the packages are now part of the baseline openSUSE repository and some package names have changed. I sent the following package manager commands to add Malcolmlewis’ repository and install the required packages:

First - to add Malcolmlewis’ repository :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.1/ malcolm

Then to install the necessary applications :


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging
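The package name changes between releases can be captured in a small lookup, sketched below. The function name `tesseract_packages` is hypothetical, and the lists only cover the releases and names that appear in this thread up to 13.1.

```shell
# tesseract_packages RELEASE prints the tesseract packages used in this thread.
tesseract_packages() {
    case "$1" in
        12.1|12.3)
            echo "tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra" ;;
        13.1)
            echo "tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french" ;;
        *)
            echo "no package list recorded for release: $1" >&2
            return 1 ;;
    esac
}

# Print the install command for 13.1 rather than running it.
echo "zypper in $(tesseract_packages 13.1)"
```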

Then finally remove Malcolmlewis’ repository (as I like to keep my repository list lean and fast) :


zypper rr malcolm

I then ran gimagereader with the command (in a terminal/konsole) :


gimagereader

…

Some messages that I noted during the install :


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell

.......

The following NEW packages are going to be installed:
  aspell aspell-de aspell-en aspell-fr aspell-spell ispell-french ispell-german myspell-french myspell-german 

The following recommended package was automatically selected:
  aspell-en 

The following package is suggested, but will not be installed:
  aspell-ispell 

9 new packages to install.
Overall download size: 8.3 MiB. After the operation, additional 36.7 MiB will be used

and


zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging

.......

The following NEW packages are going to be installed:
  docbook_4 gnome-vfs2 gnome-vfs2-lang gstreamer-0_10-plugin-gnomevfs gtkspell-lang iso_ent libbonobo libbonobo-lang libbonoboui 
  libbonoboui-lang libgnome libgnomecanvas-2-0 libgnomecanvas-lang libgnome-lang libgnomeui libgnomeui-lang libgtkspell0 libIDL-2-0 
  liblept3 libtesseract3 libyelp0 orbit2 python-bonobo python-egg python-gimagereader python-gnomecanvas python-gnome-extras 
  python-gtkspell python-imaging python-imaging-sane python-orbit python-poppler python-pyenchant python-tk sgml-skel tesseract 
  tesseract-traineddata-american tesseract-traineddata-french tesseract-traineddata-german tix tk xhost yelp yelp-xsl 

The following recommended packages were automatically selected:
  gnome-vfs2-lang gtkspell-lang libbonobo-lang libbonoboui-lang libgnomecanvas-lang libgnome-lang libgnomeui-lang 
  tesseract-traineddata-american yelp 

44 new packages to install.
Overall download size: 24.4 MiB. After the operation, additional 113.4 MiB will be used

where I obtained one error during the install (which does not appear to matter as near as I can currently determine) :


(30/44) Installing: libgnome-2.32.1-13.1.3 ...........................................................................................[done]
Additional rpm output:

(gconftool-2:7168): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.


and then when I ran ‘gimagereader’ , while the application appears to run fine I noted the following error in a terminal:


(gimagereader:7195): GLib-GObject-CRITICAL **: g_object_set_qdata: assertion 'G_IS_OBJECT (object)' failed
ERROR:dbus.proxies:Introspect error on :1.98:/org/freedesktop/PackageKit: dbus.exceptions.IntrospectionParserException: Error parsing introspect data: <class 'xml.parsers.expat.ExpatError'>: unbound prefix: line 5, column 4

… but from a functional perspective, the app currently appears to work, albeit I have more testing to do.

oldcpu · November 16, 2014, 9:40pm

openSUSE-13.2 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.2.

openSUSE-13.2

For information, to set up gImageReader to read/OCR German and French in openSUSE-13.2, I modified the guide above, since a number of the packages are either no longer required or not included.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.2/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging

I obtained the message: ‘python-imaging’ not found in package names. Trying capabilities. So it appears that python-imaging is no longer required.
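To avoid such a message, the package list can be filtered before the install. This is a minimal sketch, assuming `zypper search --match-exact` exits non-zero when no package with that exact name is found; the function names `pkg_available` and `available_only` are hypothetical.

```shell
# pkg_available NAME succeeds when zypper finds a package with exactly that name.
pkg_available() {
    zypper --non-interactive search --match-exact "$1" >/dev/null 2>&1
}

# available_only NAME... prints only the names that pkg_available accepts.
available_only() {
    for pkg in "$@"; do
        if pkg_available "$pkg"; then
            printf '%s\n' "$pkg"
        else
            printf 'skipping %s (not found)\n' "$pkg" >&2
        fi
    done
}

# Usage (as root), shown as a comment because it installs packages:
#   zypper in $(available_only tesseract python-gimagereader python-imaging)
```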

… and finally to remove the repository :


zypper rr malcolm

Again, this can be launched with the command “gimagereader”.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · April 8, 2016, 8:22pm

openSUSE-Leap-42.1. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.1.

openSUSE Leap-42.1

For information, to set up gImageReader to read/OCR German and French in openSUSE-Leap-42.1, I modified the guide above since the packages are slightly different in Leap.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.1/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

I did note the error:


QTextCursor::setPosition: Position '385' out of range

However, gimagereader did come up ok, and I was able to conduct an OCR from French (JPEG with text) to English language.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · November 26, 2016, 5:20pm

openSUSE-Leap-42.2. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.2. MANY THANKS Malcolm!!

openSUSE Leap-42.2

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-42.2. This is pretty much identical to that used with openSUSE-Leap-42.1, with only a repository change (to 42.2).
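Since only the release number in the repository URL changes between Leap versions, the URL can be built from the version. A minimal sketch; the function name `malcolm_repo_url` is hypothetical, and it assumes the repository has been published for the requested Leap version, which is not guaranteed for a private repository.

```shell
# malcolm_repo_url VERSION prints the repository URL for that Leap version.
malcolm_repo_url() {
    printf 'http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_%s/\n' "$1"
}

# Print (not run) the addrepo command for two Leap versions.
for version in 42.1 42.2; do
    echo "zypper ar $(malcolm_repo_url "$version") malcolm"
done
```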

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.2/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · September 3, 2017, 12:08am

openSUSE-Leap-42.3. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.3. MANY THANKS Malcolm!!

openSUSE Leap-42.3

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-42.3. This is pretty much identical to that used with openSUSE-Leap-42.2, with only a repository change (to 42.3).

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.3/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · August 27, 2019, 7:50am

openSUSE-Leap-15.1. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.1. MANY THANKS Malcolm!!

openSUSE Leap-15.1

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.1.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.1/ malcolm

To refresh the new repository (note that zypper update also applies any pending package updates):

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · July 4, 2020, 8:58am

Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.2. MANY THANKS Malcolm!!

openSUSE Leap-15.2

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.2.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.2/ malcolm

To refresh the new repository (note that zypper update also applies any pending package updates):

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell myspell-de myspell-fr_FR


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok. … On my PC I tested this with an OCR of a German language document.

Many thanks again to Malcolmlewis for packaging gimagereader.

oldcpu · October 14, 2021, 4:38pm

Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.3. MANY THANKS Malcolm!!

openSUSE Leap-15.3

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.3.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions) :


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.3/ malcolm

To refresh the new repository (note that zypper update also applies any pending package updates):

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell myspell-de myspell-fr_FR


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gimagereader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok. … On my PC I tested this with an OCR of a German language document.

Many thanks again to Malcolmlewis for packaging gimagereader.