
Thread: OCR engine Tesseract: how to automate text recognition on a large number of files

  1. #1


    Hi there, hello community,



    I have a large number of files that I want to parse. Here are two examples:

    http://www.foundationfinder.ch/ShowD...ge=&Type=Image
    http://www.foundationfinder.ch/ShowD...age=&Type=Html


    Well, I guess using Image::OCR::Tesseract could be interesting. I think I could parse these with Tesseract (Image::OCR::Tesseract - search.cpan.org).

    Perl code:

        use strict;
        use warnings;
        use Image::OCR::Tesseract 'get_ocr';

        # OCR a single image and print the recognized text
        my $image = './hi.jpg';
        my $text  = get_ocr($image);
        print $text;

    What do you think?

  2. #2
    Join Date: Oct 2008 | Location: near Munich | Posts: 507


    I would write a small bash script.
    It shouldn't be more than 3 or 4 lines if you have all the images in one folder.
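A minimal sketch of that one-folder loop, written here in Perl around the `get_ocr` call from the first post rather than in bash. It assumes Image::OCR::Tesseract is installed and that `get_ocr` takes an image path and returns the recognized text, as in the first post. The sub name `ocr_folder`, the list of image extensions, and writing one `.txt` file next to each image are hypothetical choices for illustration, not something from the thread.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: OCR every image in $dir and write the text next to it
# as <name>.txt. $ocr is a code ref that takes an image path and returns text;
# if omitted, get_ocr() from Image::OCR::Tesseract is loaded and used.
sub ocr_folder {
    my ($dir, $ocr) = @_;
    $ocr //= do {
        require Image::OCR::Tesseract;    # loaded only when no $ocr is given
        \&Image::OCR::Tesseract::get_ocr;
    };

    # Assumed image extensions; adjust to whatever the files actually are.
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @images = sort grep { /\.(?:jpe?g|png|tiff?)\z/i } readdir $dh;
    closedir $dh;

    my @written;
    for my $name (@images) {
        my $path = File::Spec->catfile($dir, $name);
        my $text = $ocr->($path);

        # Replace the image extension with .txt for the output file
        (my $out = $path) =~ s/\.[^.]+\z/.txt/;
        open my $fh, '>', $out or die "Cannot write $out: $!";
        print {$fh} $text;
        close $fh or die "Cannot close $out: $!";
        push @written, $out;
    }
    return @written;    # paths of the .txt files that were created
}
```

Calling `ocr_folder('/path/to/images');` would then produce one text file per image. Passing the OCR function in as a code ref keeps the loop independent of the engine, so the same loop could wrap a different OCR call or a stub for testing.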

  3. #3


    Hello Fruchtratte,

    many thanks for the quick reply. Indeed, I have all the files in one folder. Tesseract is said to be one of the three most powerful OCR engines. I am a bit unfamiliar with TA, but I will try to write the script.

    BTW, which one should I use: Google's Tesseract OCR or the Perl one (Image::OCR::Tesseract - search.cpan.org)?

    Note: the Google one should install on openSUSE 11.3 with ease, at least I guess so.

    I'd love to hear from you.
    DB1
