
Thread: OCR program?

  1. #1

    OCR program?

    Is there a good OCR program that works well in openSUSE?

  2. #2

    Re: OCR program?

    Quote (originally posted by 6tr6tr):
    > Is there a good OCR program that works well in openSUSE?

    This may help:
    https://forums.opensuse.org/blogs/ol...-tesseract-77/

    People who do not break things first will never learn to create anything

  3. #3
    Join Date: Aug 2008 | Location: Brazil | Posts: 2,908

    Re: OCR program?

    After much testing, I ended up using online OCR services, like Free online OCR.

    At the time, they worked much better than any standalone app; I don't know whether things have improved since.

  4. #4

    Re: OCR program?

    I would strongly recommend Cuneiform for OCR.

    There are two options:
    1. OCRFeeder with the Cuneiform engine for Linux (click on "show unstable packages")
    2. The original Cuneiform OCR program for Windows, running under Wine. This produces EXCELLENT results and runs quite well with Wine. It looks like it's still available here: ftp://mrclon.lianet.ru/Soft/CuneiForm

  5. #5

    Re: OCR program?

    Quote (originally posted by rahim123):
    > I would strongly recommend Cuneiform for OCR.
    >
    > There are two options:
    > 1. OCRFeeder with the Cuneiform engine for Linux (click on "show unstable packages")
    > 2. The original Cuneiform OCR program for Windows, running under Wine. This produces EXCELLENT results and runs quite well with Wine. It looks like it's still available here: ftp://mrclon.lianet.ru/Soft/CuneiForm

    Thanks! I'd prefer not to put Wine on my system, so I'll look into the first option.

  6. #6
    Join Date: Jun 2008 | Location: Rural Australia | Posts: 289

    Re: OCR program?

    I'm non-technical :-O

    As far as I can see, openSUSE has packages for gocr and ocrad. Do they achieve similar results?

    Paul


    Code:
    linux-xfp4:~ # zypper se ocr
    Loading repository data...
    Reading installed packages...
    
    S | Name        | Summary                                                           | Type   
    --+-------------+-------------------------------------------------------------------+--------
      | gocr        | Optical Character Recognition Program                             | package
      | gocr-gui    | Optical Character Recognition Program - Basic Graphical Interface | package
      | ocrad       | Optical Character Recognition Program                             | package
      | ocrad-devel | Development files for GNU ocrad                                   | package
    linux-xfp4:~ # zypper info gocr
    Loading repository data...
    Reading installed packages...
    
    
    Information for package gocr:
    
    Repository: openSUSE-12.1-Oss
    Name: gocr
    Version: 0.49-3.1.2
    Arch: x86_64
    Vendor: openSUSE
    Installed: No
    Status: not installed
    Installed Size: 904.0 KiB
    Summary: Optical Character Recognition Program
    Description: 
    GOCR is an optical character recognition program. It reads images in
    many formats and outputs a text file. It is also able to recognize
    and translate barcodes.
    linux-xfp4:~ # zypper info ocrad
    Loading repository data...
    Reading installed packages...
    
    
    Information for package ocrad:
    
    Repository: openSUSE-12.1-Oss
    Name: ocrad
    Version: 0.21-12.1.2
    Arch: x86_64
    Vendor: openSUSE
    Installed: No
    Status: not installed
    Installed Size: 290.0 KiB
    Summary: Optical Character Recognition Program
    Description: 
    GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature
    extraction method. It reads images in pbm (bitmap), pgm (greyscale) or ppm
    (color) formats and produces text in byte (8-bit) or UTF-8 formats.
    Also includes a layout analyser able to separate the columns or blocks of text
    normally found on printed pages.
    Ocrad can be used as a stand-alone console application, or as a backend to
    other programs.
    linux-xfp4:~ #


  7. #7
    Join Date
    Jun 2008
    Location
    West Yorkshire, UK
    Posts
    3,450

    Re: OCR program?

    Not in my experience. Go to software.opensuse.org and search for tesseract; you will find several 'unstable' versions. I installed the one in the LazyKent repository, which should also install yagf; if it doesn't, install yagf as well.

    You then need the relevant traineddata file; in my case it was eng.traineddata. I Googled for the latest 3.02 version, downloaded it and unzipped it. You will have a directory /usr/share/tessdata, and inside the unzipped files you will find a folder called 'tessdata'. Simply copy the files in that folder into your /usr/share/tessdata directory (using su - to acquire root privileges).

    (Some of the traineddata packages are in software.opensuse.org but I couldn't find the one I needed.)
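    For anyone who would rather script this than use yagf, here is a minimal Python sketch of the same check: it confirms that a language file such as eng.traineddata is present in /usr/share/tessdata and then calls the tesseract command-line program as `tesseract <image> <outputbase> -l eng`, which writes its text to `<outputbase>.txt`. The helper names are hypothetical, and the tessdata path is the one from this post; other Tesseract builds or distributions may look elsewhere.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the tessdata directory named in the post; other builds may differ.
TESSDATA_DIR = Path("/usr/share/tessdata")


def has_traineddata(lang: str, tessdata_dir: Path = TESSDATA_DIR) -> bool:
    """Return True if <lang>.traineddata (e.g. eng.traineddata) is in tessdata_dir."""
    return (tessdata_dir / f"{lang}.traineddata").is_file()


def tesseract_command(image: str, out_base: str, lang: str = "eng") -> list[str]:
    """Build the tesseract command line; tesseract itself appends '.txt' to out_base."""
    return ["tesseract", image, out_base, "-l", lang]


def ocr_page(image: str, out_base: str, lang: str = "eng") -> Path:
    """Run tesseract on one image and return the path of the text file it wrote."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    if not has_traineddata(lang):
        raise RuntimeError(f"{lang}.traineddata not found in {TESSDATA_DIR}")
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(tesseract_command(image, out_base, lang), check=True)
    return Path(out_base + ".txt")
```

    For example, `ocr_page("page.png", "page")` should leave the recognised text in page.txt, or stop early with a message if the engine or the language file is missing.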

    When you open yagf for the first time, change the OCR engine setting to tesseract.

    I got results many times better from tesseract than I have ever got from gocr or ocrad. I haven't tried cuneiform, the other engine supported by yagf.

    (Cue some comments from someone with experience of both tesseract and cuneiform.)

  8. #8
    Join Date
    Jun 2008
    Location
    West Yorkshire, UK
    Posts
    3,450

    Re: OCR program?

    I thought I'd try to answer my own question, so I scanned the same page at 300 dpi greyscale and tried gocr, cuneiform and tesseract.

    gocr found all the text but much of its output was incomprehensible; cuneiform read the middle of the page very well but failed to read any of the top and bottom paragraphs. Tesseract read the whole page with relatively few errors, most of them obvious substitutions.
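    A test like this is easy to repeat on your own scans. The Python sketch below runs gocr, cuneiform and tesseract on the same page and collects each engine's text so the outputs can be compared side by side. It assumes the three programs are installed and on PATH, that the scan is saved as a greyscale PGM (a format all three should read), and that the text is English; the function names and output file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def engine_commands(image: str) -> dict[str, tuple[list[str], str]]:
    """Map each engine name to (command line, text file that command produces)."""
    stem = Path(image).stem
    return {
        "gocr": (["gocr", "-i", image, "-o", f"{stem}.gocr.txt"], f"{stem}.gocr.txt"),
        "cuneiform": (
            ["cuneiform", "-l", "eng", "-o", f"{stem}.cuneiform.txt", image],
            f"{stem}.cuneiform.txt",
        ),
        # tesseract takes an output *base* name and appends ".txt" itself
        "tesseract": (
            ["tesseract", image, f"{stem}.tesseract", "-l", "eng"],
            f"{stem}.tesseract.txt",
        ),
    }


def compare(image: str) -> dict[str, str]:
    """Run every installed engine on the image and return its text, keyed by engine."""
    results = {}
    for name, (cmd, out_file) in engine_commands(image).items():
        if shutil.which(cmd[0]) is None:
            results[name] = "(not installed)"
            continue
        subprocess.run(cmd, check=True)
        results[name] = Path(out_file).read_text(errors="replace")
    return results
```

    Running `compare("page.pgm")` and printing each entry gives one block of text per engine; judging which output has "relatively few errors" is still done by reading them.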

  9. #9

    Re: OCR program?

    Quote (originally posted by john_hudson):
    > I thought I'd try to answer my own question, so I scanned the same page at 300 dpi greyscale and tried gocr, cuneiform and tesseract.
    >
    > gocr found all the text but much of its output was incomprehensible; cuneiform read the middle of the page very well but failed to read any of the top and bottom paragraphs. Tesseract read the whole page with relatively few errors, most of them obvious substitutions.

    Thanks for doing that test! Is there an official release of tesseract, or only unstable versions?
