openSUSE-12.1 to openSUSE Leap-15.3 with gImageReader and Tesseract

On openSUSE-12.1, for scanning and OCR, I have installed Tesseract-ocr and gImageReader. I like being able to conduct OCR of French, German and English documents, especially since I am an expatriate living abroad.

I blogged about this before with openSUSE-11.4, and posted about it with respect to openSUSE-11.3 and 11.4 in these threads.

This time I thought I would re-organize my notes/post, and make it easier for one to see exactly what commands I sent in order to do this installation.

openSUSE-12.1

Program comments

Tesseract

The ‘core’ program behind the OCR software that I use is Tesseract-ocr, which is described as follows:

Tesseract is a commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005. From 2007 it is developed by Google.

I note a packaged version of tesseract-3.00-4.1 is available from here for openSUSE-12.1:


http://download.opensuse.org/repositories/openSUSE:/Factory:/Contrib/openSUSE_12.1

and a tesseract-3.00-5.1 is available from here for openSUSE-12.1:


http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1

gImageReader

For a graphic front end to tesseract, I like to use the python package gImageReader. I have read:

gImageReader is a simple PyGtk front-end to tesseract.

Main features:

  • Allows the user to select the part of the image they want to be recognized or directly recognize the entire image.
  • Supports PDF documents.
  • Allows the user to acquire images from scanning devices.
  • Recognized text displayed directly next to the image.
  • Basic editing of output text, including search/replace and removing line breaks on selected text.
  • Spellcheck enabled for the selected language in the output textfield if corresponding dictionary installed (requires GTKSpell).
  • User is prompted to install missing spellcheck languages (requires PackageKit or apt-file).
  • Easily switch between multiple open files.
  • Attempts to automatically detect all necessary programs, otherwise shows a configuration prompt to the user.

I note a packaged version of python-gimagereader-0.9-2.1 for openSUSE-12.1 is available here:


http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1

and a very slightly older packaged version of python-gimagereader-0.9-1.1 for openSUSE-12.1 is available here:


http://download.opensuse.org/repositories/home:/malcolmlewis:/Python/openSUSE_12.1

**Spell check dictionaries**

In addition, various spell check dictionaries are very helpful when running tesseract and gimagereader, to assist in repairing words where the optical character recognition was not ideal. I confess it has never been clear to me which spell check dictionaries are necessary. I note GTKspell is needed, and I have read references to ispell, aspell and myspell being needed.

I noted that prior to installing tesseract and gimagereader, by default on my KDE desktop I had hunspell, hunspell-tools, libaspell15, myspell-american, and ispell installed. YaST also has many additional spell check dictionaries, so I saw no need to add any extra repositories to help install desired packages here (other than the basic OSS repository).

**Commands Used to Install**

In the end I sent the following commands with root permissions (deciding to try deltafox’s repository):


zypper ar http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1/ deltafox


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell

which picked up as dependencies igerman98-doc, librcc0, librcd0, and rcc-runtime. I was careful not to install aspell-ispell as that conflicts with ispell.
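One way to guard against aspell-ispell being pulled in by a later install is a zypper package lock. The following is only a minimal sketch: `lock_package` and the `ZYPPER` variable are names made up for this example (set `ZYPPER=echo` to preview the command without running zypper).

```shell
# ZYPPER is a hypothetical override: set ZYPPER=echo to preview without running zypper.
ZYPPER="${ZYPPER:-zypper}"

# Lock a package so that later installs cannot select it (run as root).
lock_package() {
    "$ZYPPER" addlock "$1"
}

# Example (as root): lock_package aspell-ispell
# The lock can be lifted again with: zypper removelock aspell-ispell
```

I did not use a lock myself; I simply left aspell-ispell out of the install command.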


zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging

I noted python-imaging picked up the dependencies python-tk, tix, and tk. And I noted that python-gimagereader picked up the dependencies libbonobo, libbonoboui, libgnome, libgnomecanvas-2.0, libgnomeui, libgtkspell0, python-bonobo, python-egg, python-enchant, python-gnome-extras, python-gnomecanvas, python-gtkspell, python-imaging-sane, python-orbit, python-poppler.

I had seen references to other GNU/Linux distributions installing leptonica, but as far as I can determine, neither leptonica-tools nor liblept2 (both packaged for openSUSE) is needed.

I then removed the additional repository that I needed for much of the above.


zypper rr deltafox

This is my standard practice. I keep my repositories lean and mean, and as soon as the install was complete I removed this repository.

I then typed the command:


gimagereader

and the application ran.
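The add, install, remove pattern above can be wrapped in a small shell function. This is only a sketch under my own assumptions: `install_from_temp_repo` and the `ZYPPER` variable are names invented for this example, and setting `ZYPPER=echo` previews the three steps instead of running them.

```shell
# ZYPPER is a hypothetical override: set ZYPPER=echo to preview without running zypper.
ZYPPER="${ZYPPER:-zypper}"

# Add a temporary repository, install packages, then remove the repository again.
# $1 = repository URL, $2 = repository alias, remaining arguments = package names.
install_from_temp_repo() {
    repo_url="$1"
    alias_name="$2"
    shift 2

    "$ZYPPER" ar "$repo_url" "$alias_name" || return 1
    "$ZYPPER" in "$@"
    status=$?
    # Remove the repository even if the install failed.
    "$ZYPPER" rr "$alias_name"
    return "$status"
}

# Example (as root):
# install_from_temp_repo http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1/ \
#     deltafox tesseract python-gimagereader
```

Removing the repository regardless of the install result keeps the repository list lean even when something goes wrong part way through.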

Some Images

Here is an example of the gimagereader GUI, with a French document OCR in progress, where the dictionaries have detected some misspellings
http://thumbnails52.imagebam.com/16155/ef70b1161548780.jpg

Here is an example of the spell check correction in progress
http://thumbnails27.imagebam.com/16155/a89503161548782.jpg

Here is a look at the language selections
http://thumbnails24.imagebam.com/16155/6d191a161548784.jpg

I use this program fairly often.

Hi oldcpu
The deltafox repository links to mine; the application release is 0.9 and the build is 2.1, since it’s a copy of my build :wink:

Malcolmlewis was also kind enough to package gImageReader for openSUSE-12.3.

openSUSE-12.3

For information, to set up gImageReader to read/OCR German and French in openSUSE-12.3, I followed the guide above, but for 12.3 used the following repositories/commands (as root):


zypper ar http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_12.3/ kent-ocr
zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_12.3/ malcolm
zypper ar http://download.opensuse.org/repositories/home:/vodoo/openSUSE_12.3/ vodoo-ocr

followed by


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging

and then removed the repositories that I added during the install:


zypper rr malcolm
zypper rr kent-ocr
zypper rr vodoo-ocr

and then typed (as a regular user):


gimagereader

and performed an OCR test with a scanned French language page.

Note the above repositories that I used are not official repositories, but rather private repositories of various individuals. The rpms that are on those repositories, while present today, may not be present tomorrow.
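Because such private repositories can disappear, it can help to check that one still answers before adding it. A minimal sketch, assuming curl is installed; `repo_available` is a name made up for this example, and the check relies on rpm-md repositories serving a `repodata/repomd.xml` file.

```shell
# Return success if the repository at URL $1 still serves rpm-md metadata.
repo_available() {
    # ${1%/} drops a trailing slash so the path is joined cleanly.
    curl --silent --head --fail --location "${1%/}/repodata/repomd.xml" >/dev/null
}

# Example:
# repo_available http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_12.3/ \
#     || echo "repository no longer available"
```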

openSUSE-13.1 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.1.

openSUSE-13.1

For information, to set up gImageReader to read/OCR German and French in openSUSE-13.1, I modified the guide above, since a number of the packages are now part of the baseline openSUSE repository and some package names have changed. I sent the following package manager commands to add Malcolmlewis’ repository and install the required packages:

First - to add Malcolmlewis’ repository:


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.1/ malcolm

Then to install the necessary applications :


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging

Then finally remove Malcolmlewis’ repository (as I like to keep my repository list lean and fast):


zypper rr malcolm

I then ran gimagereader with the command (in a terminal/konsole):


gimagereader

Some messages that I noted during the install :


zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell

.......

The following NEW packages are going to be installed:
  aspell aspell-de aspell-en aspell-fr aspell-spell ispell-french ispell-german myspell-french myspell-german 

The following recommended package was automatically selected:
  aspell-en 

The following package is suggested, but will not be installed:
  aspell-ispell 

9 new packages to install.
Overall download size: 8.3 MiB. After the operation, additional 36.7 MiB will be used

and


zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging

.......

The following NEW packages are going to be installed:
  docbook_4 gnome-vfs2 gnome-vfs2-lang gstreamer-0_10-plugin-gnomevfs gtkspell-lang iso_ent libbonobo libbonobo-lang libbonoboui 
  libbonoboui-lang libgnome libgnomecanvas-2-0 libgnomecanvas-lang libgnome-lang libgnomeui libgnomeui-lang libgtkspell0 libIDL-2-0 
  liblept3 libtesseract3 libyelp0 orbit2 python-bonobo python-egg python-gimagereader python-gnomecanvas python-gnome-extras 
  python-gtkspell python-imaging python-imaging-sane python-orbit python-poppler python-pyenchant python-tk sgml-skel tesseract 
  tesseract-traineddata-american tesseract-traineddata-french tesseract-traineddata-german tix tk xhost yelp yelp-xsl 

The following recommended packages were automatically selected:
  gnome-vfs2-lang gtkspell-lang libbonobo-lang libbonoboui-lang libgnomecanvas-lang libgnome-lang libgnomeui-lang 
  tesseract-traineddata-american yelp 

44 new packages to install.
Overall download size: 24.4 MiB. After the operation, additional 113.4 MiB will be used

where I obtained one error during the install (which does not appear to matter as near as I can currently determine) :


(30/44) Installing: libgnome-2.32.1-13.1.3 ...........................................................................................[done]
Additional rpm output:

(gconftool-2:7168): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.


and then when I ran ‘gimagereader’, while the application appears to run fine :slight_smile:, I noted the following error in a terminal:


(gimagereader:7195): GLib-GObject-CRITICAL **: g_object_set_qdata: assertion 'G_IS_OBJECT (object)' failed
ERROR:dbus.proxies:Introspect error on :1.98:/org/freedesktop/PackageKit: dbus.exceptions.IntrospectionParserException: Error parsing introspect data: <class 'xml.parsers.expat.ExpatError'>: unbound prefix: line 5, column 4

… but from a functional perspective, the app currently appears to work, albeit I have more testing to do.

openSUSE-13.2 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.2.

openSUSE-13.2

For information, to set up gImageReader to read/OCR German and French in openSUSE-13.2, I modified the guide above since a number of the packages are either no longer required or not included.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.2/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging

I obtained the message “‘python-imaging’ not found in package names. Trying capabilities.” So it appears that python-imaging is no longer required.

… and finally to remove the repository :


zypper rr malcolm

Again, this can be launched with the command “gimagereader”.

Many thanks again to Malcolmlewis for packaging gimagereader.

openSUSE-Leap-42.1. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.1.

openSUSE Leap-42.1

For information, to set up gImageReader to read/OCR German and French in openSUSE-Leap-42.1, I modified the guide above since the packages are slightly different in Leap.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.1/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

I did note the error:


QTextCursor::setPosition: Position '385' out of range

However, gimagereader did come up ok, and I was able to conduct an OCR from French (JPEG with text) to English language.

Many thanks again to Malcolmlewis for packaging gimagereader.

openSUSE-Leap-42.2. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.2. MANY THANKS Malcolm !!

openSUSE Leap-42.2

For information, the setup of gImageReader to read/OCR German and French in openSUSE-Leap-42.2 is pretty much identical to that used with openSUSE-Leap-42.1, with only a repository change (to 42.2).

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.2/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.
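Since the Leap steps differ only in the version number inside the repository URL, the URL can be built from the version. A minimal sketch: `leap_repo_url` is a name invented for this example, and there is no guarantee that the repository exists for any given Leap version.

```shell
# Build the URL of malcolmlewis' openSUSE_General repository for a Leap version.
leap_repo_url() {
    printf 'http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_%s/\n' "$1"
}

# Example (as root), taking the version from /etc/os-release:
# . /etc/os-release
# zypper ar "$(leap_repo_url "$VERSION_ID")" malcolm
```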

Many thanks again to Malcolmlewis for packaging gimagereader.

openSUSE-Leap-42.3. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.3. MANY THANKS Malcolm !!

openSUSE Leap-42.3

For information, the setup of gImageReader to read/OCR German and French in openSUSE-Leap-42.3 is pretty much identical to that used with openSUSE-Leap-42.2, with only a repository change (to 42.3).

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.3/ malcolm

Then to install the necessary applications :


zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.

Many thanks again to Malcolmlewis for packaging gimagereader.

openSUSE-Leap-15.1. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.1. MANY THANKS Malcolm !!

openSUSE Leap-15.1

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.1.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.1/ malcolm

To pick up the new repository and apply any pending updates:

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok.
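To confirm that the traineddata packages were picked up, the languages known to tesseract can be listed from a terminal. A minimal sketch, assuming the installed tesseract supports the `--list-langs` option (very old releases may not); `have_langs` is a name made up for this example.

```shell
# Check that every language code given as an argument is known to tesseract.
# deu, eng and fra are the Tesseract codes for German, English and French.
have_langs() {
    # Some releases print the list on stderr, so merge both streams.
    available=$(tesseract --list-langs 2>&1) || return 1
    for lang in "$@"; do
        printf '%s\n' "$available" | grep -qx "$lang" || {
            echo "missing traineddata: $lang" >&2
            return 1
        }
    done
}

# Example: have_langs deu eng fra && echo "all languages installed"
```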

Many thanks again to Malcolmlewis for packaging gimagereader.

Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.2. MANY THANKS Malcolm !!

openSUSE Leap-15.2

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.2.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.2/ malcolm

To pick up the new repository and apply any pending updates:

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell myspell-de myspell-fr_FR


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok. … On my PC I tested this with an OCR of a German language document.
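The same kind of test can be run without the GUI, which helps to tell a tesseract problem apart from a gImageReader problem. A minimal sketch: `ocr_image` and the file name `scan.png` are made up for this example, and it assumes the `deu` traineddata is installed.

```shell
# OCR one image with the tesseract command line.
# $1 = image file, $2 = Tesseract language code (default: deu for German).
# tesseract appends .txt to the output base, so scan.png produces scan.txt.
ocr_image() {
    image="$1"
    lang="${2:-deu}"
    tesseract "$image" "${image%.*}" -l "$lang"
}

# Example: ocr_image scan.png deu
```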

Many thanks again to Malcolmlewis for packaging gimagereader.

Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-15.3. MANY THANKS Malcolm !!

openSUSE Leap-15.3

For information, here is how I set up gImageReader to read/OCR German and French in openSUSE-Leap-15.3.

First - to add Malcolmlewis’ repository (the below commands need to be sent with root permissions):


zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_15.3/ malcolm

To pick up the new repository and apply any pending updates:

zypper update

Then to install the necessary applications :


zypper in aspell-en aspell-fr ispell-french ispell-german aspell-spell myspell-de myspell-fr_FR


zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french


zypper in gimagereader gimagereader-qt5

… and finally to remove the repository :


zypper rr malcolm

I launched gImageReader with the command “gimagereader-qt5 %U”.

Gimagereader comes up ok. … On my PC I tested this with an OCR of a German language document.

Many thanks again to Malcolmlewis for packaging gimagereader.
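As a summary, the Tesseract package names changed twice across the releases covered in this thread. The sketch below only collects the names used in the posts above; `tesseract_packages` is a name invented for this example, and the release labels it accepts are its own convention.

```shell
# Print the Tesseract package names used in this thread for a given release.
tesseract_packages() {
    case "$1" in
        12.1|12.3)
            echo "tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra" ;;
        13.1|13.2)
            echo "tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french" ;;
        42.*|15.*)
            echo "tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french" ;;
        *)
            echo "unknown release: $1" >&2
            return 1 ;;
    esac
}

# Example (as root): zypper in $(tesseract_packages 15.3)
```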