
oldcpu's meandering thoughts on Computers, GNU/Linux and openSUSE

openSUSE-12.1 to openSUSE Leap-42.3 with gImageReader and Tesseract

On openSUSE-12.1 I have installed Tesseract-ocr and gImageReader for scanning and OCR. I like being able to conduct OCR of French, German and English text, especially since I am an expatriate living abroad.

I blogged about this before with openSUSE-11.4, and posted about it with regard to openSUSE-11.3 and 11.4 in earlier forum threads.


This time I thought I would re-organize my notes/post, and make it easier for one to see exactly what commands I sent in order to do this installation.

openSUSE-12.1

Program comments

Tesseract

The 'core' program behind the OCR software that I use is Tesseract-ocr, which is described as follows:
Tesseract is a commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005. From 2007 it is developed by Google.
I note a packaged version of tesseract-3.00-4.1 is available from here for openSUSE-12.1:
Code:
http://download.opensuse.org/repositories/openSUSE:/Factory:/Contrib/openSUSE_12.1
and a tesseract-3.00-5.1 is available from here for openSUSE-12.1:
Code:
http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1
gImageReader

For a graphical front end to tesseract, I like to use the Python package gImageReader. I have read:
gImageReader is a simple PyGtk front-end to tesseract.

Main features:
  • Allows the user to select the part of the image they want to be recognized or directly recognize the entire image.
  • Supports PDF documents.
  • Allows the user to acquire images from scanning devices.
  • Recognized text displayed directly next to the image.
  • Basic editing of output text, including search/replace and removing line breaks on selected text.
  • Spellcheck enabled for the selected language in the output textfield if corresponding dictionary installed (requires GTKSpell).
  • User is prompted to install missing spellcheck languages (requires PackageKit or apt-file).
  • Easily switch between multiple open files.
  • Attempts to automatically detect all necessary programs, otherwise shows a configuration prompt to the user.
I note a packaged version of python-gimagereader-0.9-2.1 for openSUSE-12.1 is available here:
Code:
http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1
and a very slightly older packaged version of python-gimagereader-0.9-1.1 for openSUSE-12.1 is available here:
Code:
http://download.opensuse.org/repositories/home:/malcolmlewis:/Python/openSUSE_12.1
Spell check dictionaries

In addition, various spell check dictionaries are very helpful when running tesseract and gimagereader, to assist in repairing words where the optical character recognition was not ideal. I confess it has never been clear to me which spell check dictionaries are necessary. I note GTKspell is needed, and I have read references to ispell, aspell and myspell being needed.

I noted that prior to installing tesseract and gimagereader, by default on my KDE desktop I had hunspell, hunspell-tools, libaspell15, myspell-american, and ispell installed. YaST also offers many additional spell check dictionaries, so I saw no need to add any extra repositories to help install desired packages here (other than the basic OSS repository).
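A quick way to see which of these spell check packages are already present is to query the RPM database. The following is only a minimal sketch: the function name report_spell_packages is my own invention, and the package names passed to it are simply the ones mentioned in this post.

```shell
#!/bin/bash
# report_spell_packages is a hypothetical helper (not part of any package).
# It relies on 'rpm -q NAME', which exits non-zero when NAME is not installed.
report_spell_packages() {
    local pkg
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "installed: $pkg"
        else
            echo "missing:   $pkg"
        fi
    done
}
```

For example, "report_spell_packages hunspell hunspell-tools libaspell15 myspell-american ispell" prints one line per package, which makes it easy to see what still needs to be installed.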

Commands Used to Install

In the end I sent the following commands with root permissions (deciding to try deltafox's repository) :

Code:
zypper ar http://download.opensuse.org/repositories/home:/deltafox/openSUSE_12.1/ deltafox
Code:
zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
which picked up as dependencies igerman98-doc, librcc0, librcd0, and rcc-runtime. I was careful not to install aspell-ispell as that conflicts with ispell.

Code:
zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging
I noted python-imaging picked up the dependencies python-tk, tix, and tk. And I noted that python-gimagereader picked up the dependencies libbonobo, libbonoboui, libgnome, libgnomecanvas-2.0, libgnomeui, libgtkspell0, python-bonobo, python-egg, python-enchant, python-gnome-extras, python-gnomecanvas, python-gtkspell, python-imaging-sane, python-orbit, python-poppler.

I had seen references to other GNU/Linux distributions installing leptonica, but as far as I can determine, neither leptonica-tools nor liblept2 (both packaged for openSUSE) is needed.

I then removed the additional repository that I needed for much of the above.
Code:
zypper rr deltafox
This is my standard practice. I keep my repositories lean and mean, and as soon as the install was complete I removed this repository.

I then typed the command:
Code:
gimagereader
and the application ran.
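The add / install / remove sequence above can be collected into one small script so that the extra repository is always removed again, even when the install fails. This is only a sketch under my own assumptions: the names run and install_from_temp_repo and the DRY_RUN variable are hypothetical, and the zypper subcommands are the same ar, in and rr used above. It needs to be run with root permissions.

```shell
#!/bin/bash
# run: execute a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_from_temp_repo URL ALIAS PACKAGE...
# Adds the repository, installs the packages, then removes the repository
# again whether or not the install succeeded.
install_from_temp_repo() {
    local url=$1 repo_alias=$2 status
    shift 2
    run zypper ar "$url" "$repo_alias" || return 1
    run zypper in "$@"
    status=$?
    run zypper rr "$repo_alias"
    return "$status"
}
```

With DRY_RUN=1 set, calling install_from_temp_repo with the deltafox URL, the alias deltafox and the package names only prints the three zypper commands, which is a safe way to check them before running for real.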

Some Images

Here is an example of the gimagereader GUI, with a French document OCR in progress, where the dictionaries have detected some misspellings


Here is an example of the spell check correction in progress


Here is a look at the language selections


I use this program fairly often.


Updated 06-Sep-2017 at 13:52 by oldcpu


Comments

  1. malcolmlewis's Avatar
    Hi oldcpu
    The deltafox repository links to mine. The application release is 0.9 and the build is 2.1, since it's a copy of my build.
  2. oldcpu's Avatar
    Malcolmlewis was also kind enough to package gImageReader for openSUSE-12.3.

    openSUSE-12.3

    For information, to set up gImageReader to read/OCR German and French in openSUSE-12.3, I followed the guide above, but for 12.3 I used the following repositories/commands (as root):

    Code:
    zypper ar http://download.opensuse.org/repositories/home:/Lazy_Kent/openSUSE_12.3/ kent-ocr
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_12.3/ malcolm
    zypper ar http://download.opensuse.org/repositories/home:/vodoo/openSUSE_12.3/ vodoo-ocr
    followed by

    Code:
    zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    zypper in tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging
    and then removed the repositories that I added during the install:
    Code:
    zypper rr malcolm
    zypper rr kent-ocr
    zypper rr vodoo-ocr

    and then typed (as a regular user):
    Code:
    gimagereader
    and performed an OCR test with a scanned French language page.

    Note the above repositories that I used are not official repositories, but rather private repositories of various individuals. The rpms that are on those repositories, while present today, may not be present tomorrow.
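Since such private repositories can disappear, it can help to check that a repository still exists before adding it. The following is a minimal sketch, assuming curl is installed and that the repository uses the usual rpm-md layout with a repodata/repomd.xml file. The function names repo_available and check_repos are hypothetical.

```shell
#!/bin/bash
# repo_available URL: succeed if the repository metadata can still be fetched.
# Assumes the rpm-md layout (repodata/repomd.xml) and that curl is installed.
repo_available() {
    local url=${1%/}    # drop a trailing slash, if any
    curl --silent --fail --head --location \
        "$url/repodata/repomd.xml" >/dev/null
}

# check_repos URL...: print "ok" or "gone" for each repository URL.
check_repos() {
    local url
    for url in "$@"; do
        if repo_available "$url"; then
            echo "ok   $url"
        else
            echo "gone $url"
        fi
    done
}
```

For example, passing the three repository URLs used above to check_repos shows at a glance which of them are still reachable.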
    Updated 08-Apr-2016 at 11:25 by oldcpu
  3. oldcpu's Avatar
    openSUSE-13.1 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.1.

    openSUSE-13.1

    For information, to set up gImageReader to read/OCR German and French in openSUSE-13.1, I modified the guide above since a number of the packages are now part of the baseline openSUSE repository, plus some names of packages have changed. I sent the following package manager commands to add Malcolmlewis' repository and install the required packages:

    First - to add Malcolmlewis' repository :
    Code:
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.1/ malcolm
    Then to install the necessary applications :
    Code:
    zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging
    Then finally remove Malcolmlewis' repository (as I like to keep my repository list lean and fast) :
    Code:
    zypper rr malcolm
    I then ran gimagereader with the command (in a terminal/konsole) :
    Code:
    gimagereader
    ................

    Some messages that I noted during the install :
    Code:
    zypper in myspell-french myspell-german aspell aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    
    .......
    
    The following NEW packages are going to be installed:
      aspell aspell-de aspell-en aspell-fr aspell-spell ispell-french ispell-german myspell-french myspell-german 
    
    The following recommended package was automatically selected:
      aspell-en 
    
    The following package is suggested, but will not be installed:
      aspell-ispell 
    
    9 new packages to install.
    Overall download size: 8.3 MiB. After the operation, additional 36.7 MiB will be used
    and
    Code:
    zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging
    
    .......
    
    The following NEW packages are going to be installed:
      docbook_4 gnome-vfs2 gnome-vfs2-lang gstreamer-0_10-plugin-gnomevfs gtkspell-lang iso_ent libbonobo libbonobo-lang libbonoboui 
      libbonoboui-lang libgnome libgnomecanvas-2-0 libgnomecanvas-lang libgnome-lang libgnomeui libgnomeui-lang libgtkspell0 libIDL-2-0 
      liblept3 libtesseract3 libyelp0 orbit2 python-bonobo python-egg python-gimagereader python-gnomecanvas python-gnome-extras 
      python-gtkspell python-imaging python-imaging-sane python-orbit python-poppler python-pyenchant python-tk sgml-skel tesseract 
      tesseract-traineddata-american tesseract-traineddata-french tesseract-traineddata-german tix tk xhost yelp yelp-xsl 
    
    The following recommended packages were automatically selected:
      gnome-vfs2-lang gtkspell-lang libbonobo-lang libbonoboui-lang libgnomecanvas-lang libgnome-lang libgnomeui-lang 
      tesseract-traineddata-american yelp 
    
    44 new packages to install.
    Overall download size: 24.4 MiB. After the operation, additional 113.4 MiB will be used
    where I obtained one error during the install (which does not appear to matter as near as I can currently determine) :
    Code:
    (30/44) Installing: libgnome-2.32.1-13.1.3 ...........................................................................................[done]
    Additional rpm output:
    
    (gconftool-2:7168): GConf-WARNING **: Client failed to connect to the D-BUS daemon:
    Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
    .

    and then when I ran 'gimagereader' , while the application appears to run fine I noted the following error in a terminal:
    Code:
    (gimagereader:7195): GLib-GObject-CRITICAL **: g_object_set_qdata: assertion 'G_IS_OBJECT (object)' failed
    ERROR:dbus.proxies:Introspect error on :1.98:/org/freedesktop/PackageKit: dbus.exceptions.IntrospectionParserException: Error parsing introspect data: <class 'xml.parsers.expat.ExpatError'>: unbound prefix: line 5, column 4
    ... but from a functional perspective, the app currently appears to work, albeit I have more testing to do.
    .
    Updated 08-Apr-2016 at 11:25 by oldcpu
  4. oldcpu's Avatar
    openSUSE-13.2 gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-13.2.

    openSUSE-13.2

    For information, to set up gImageReader to read/OCR German and French in openSUSE-13.2, I modified the guide above since a number of the packages are either no longer required or not included.

    First - to add Malcolmlewis' repository (the below commands need to be sent with root permissions) :
    Code:
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/Miscellanous/openSUSE_13.2/ malcolm
    Then to install the necessary applications :
    Code:
    zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    Code:
    zypper in tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging
    I obtained the message "'python-imaging' not found in package names. Trying capabilities.", so it appears that python-imaging is no longer required.

    ... and finally to remove the repository :
    Code:
    zypper rr malcolm
    Again, this can be launched with the command "gimagereader".

    Many thanks again to Malcolmlewis for packaging gimagereader.
    .
    Updated 08-Apr-2016 at 11:26 by oldcpu
  5. oldcpu's Avatar
    openSUSE-Leap-42.1. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.1.

    openSUSE Leap-42.1

    For information, to set up gImageReader to read/OCR German and French in openSUSE-Leap-42.1, I modified the guide above since the packages are slightly different in Leap.

    First - to add Malcolmlewis' repository (the below commands need to be sent with root permissions) :
    Code:
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.1/ malcolm
    Then to install the necessary applications :

    Code:
    zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    Code:
    zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
    zypper in gimagereader gimagereader-qt5
    ... and finally to remove the repository :

    Code:
    zypper rr malcolm
    I launched gimagereader with the command "gimagereader-qt5 %U".

    I did note the error:
    Code:
    QTextCursor::setPosition: Position '385' out of range
    However, gimagereader did come up ok, and I was able to conduct an OCR from French (JPEG with text) to English language.
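To check that tesseract itself works, independent of the gimagereader GUI, a scanned page can also be run through the tesseract command line program. This is only a minimal sketch: ocr_page is a hypothetical helper name and scan.jpg an example file name, while the language codes (fra, deu, eng) are tesseract's own.

```shell
#!/bin/bash
# ocr_page IMAGE LANG
# Hypothetical helper: run tesseract on one image and print the name of the
# text file it wrote. LANG is a tesseract language code such as fra, deu, eng.
ocr_page() {
    local image=$1 lang=$2
    local base=${image%.*}    # strip the extension: scan.jpg -> scan
    # tesseract appends ".txt" to the output base name itself.
    tesseract "$image" "$base" -l "$lang" && echo "$base.txt"
}
```

For example, "ocr_page scan.jpg fra" should leave the recognized French text in scan.txt, provided the French trained data package is installed.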

    Many thanks again to Malcolmlewis for packaging gimagereader.
    Updated 08-Apr-2016 at 11:26 by oldcpu
  6. oldcpu's Avatar
    openSUSE-Leap-42.2. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.2. MANY THANKS Malcolm !!


    openSUSE Leap-42.2


    For information, setting up gImageReader to read/OCR German and French in openSUSE-Leap-42.2 is pretty much identical to the procedure used with openSUSE-Leap-42.1, with only a repository change (to 42.2).


    First - to add Malcolmlewis' repository (the below commands need to be sent with root permissions) :
    Code:
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.2/ malcolm
    Then to install the necessary applications :


    Code:
    zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell

    Code:
    zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french 
    zypper in gimagereader gimagereader-qt5

    ... and finally to remove the repository :


    Code:
    zypper rr malcolm

    I launched gimagereader with the command "gimagereader-qt5 %U".


    Gimagereader comes up ok.


    Many thanks again to Malcolmlewis for packaging gimagereader.
  7. oldcpu's Avatar
    openSUSE-Leap-42.3. gimagereader - Once again, Malcolmlewis was also kind enough to package gImageReader for openSUSE-Leap-42.3. MANY THANKS Malcolm !!

    openSUSE Leap-42.3

    For information, setting up gImageReader to read/OCR German and French in openSUSE-Leap-42.3 is pretty much identical to the procedure used with openSUSE-Leap-42.2, with only a repository change (to 42.3).

    First - to add Malcolmlewis' repository (the below commands need to be sent with root permissions) :
    Code:
    zypper ar http://download.opensuse.org/repositories/home:/malcolmlewis:/openSUSE_General/openSUSE_Leap_42.3/ malcolm
    Then to install the necessary applications :
    Code:
    zypper in aspell-de aspell-en aspell-fr ispell-french ispell-german aspell-spell
    Code:
    zypper in tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french
    Code:
    zypper in gimagereader gimagereader-qt5
    ... and finally to remove the repository :
    Code:
    zypper rr malcolm
    I launched gimagereader with the command "gimagereader-qt5 %U".


    Gimagereader comes up ok.


    Many thanks again to Malcolmlewis for packaging gimagereader.
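
Since the package names changed several times between openSUSE-12.1 and Leap-42.3, the differences can be summarized in one small lookup. This is only a sketch that restates the package lists from the posts above. The function name ocr_packages is hypothetical, and the spell check packages are left out to keep it short.

```shell
#!/bin/bash
# ocr_packages RELEASE
# Hypothetical helper: print the OCR package names used in the posts above
# for a given openSUSE release.
ocr_packages() {
    case "$1" in
        12.1|12.3)
            echo "tesseract tesseract-traineddata-deu tesseract-traineddata-eng tesseract-traineddata-fra python-gimagereader python-imaging" ;;
        13.1)
            echo "tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader python-imaging" ;;
        13.2)
            # python-imaging was no longer found as a package name on 13.2.
            echo "tesseract tesseract-traineddata-german tesseract-traineddata-american tesseract-traineddata-french python-gimagereader" ;;
        42.1|42.2|42.3)
            echo "tesseract-ocr tesseract-ocr-traineddata-german tesseract-ocr-traineddata-english tesseract-ocr-traineddata-french gimagereader gimagereader-qt5" ;;
        *)
            echo "no package list recorded for release '$1'" >&2
            return 1 ;;
    esac
}
```

As root, "zypper in $(ocr_packages 42.3)" would then install the Leap-42.3 set, with the command substitution left unquoted so that each package name becomes a separate argument.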