Assistive technology

Assistive technology help

I need to convert a Word file (.doc) to a PDF file (.pdf).

Then put the PDF through optical character recognition (OCR) to get a text file (.txt), keeping the pages and paragraphs.

Then convert the text file to digital audio files, split by chapter/page/paragraph.

OS: SUSE 11.1

I got the .doc converted to PDF.
So how do you get an OCR program to go from PDF to text?

OpenOffice > Export to PDF, then Kooka (or another scanner program using SANE), then GOCR, then OpenOffice to tidy up the text file. I'm not sure how you will do the last bit; it depends on what is required.
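
For the GOCR step, here is a rough Python sketch of running GOCR over a folder of page images. It assumes the `gocr` command is installed and that the pages are already saved as PNM-family images (for example .pgm); the function names and the folder layout are only illustrative, not anything Kooka or GOCR prescribes.

```python
import subprocess
from pathlib import Path


def gocr_command(image_path, text_path):
    """Build the GOCR command line for one page image."""
    # -i names the input image, -o names the output text file.
    return ["gocr", "-i", str(image_path), "-o", str(text_path)]


def ocr_pages(image_dir, pattern="*.pgm"):
    """Run GOCR on every matching image, writing one .txt file per page."""
    outputs = []
    # sorted() orders by file name, so page files should be zero-padded
    # (page-01, page-02, ...) if the page order matters.
    for image in sorted(Path(image_dir).glob(pattern)):
        text_path = image.with_suffix(".txt")
        subprocess.run(gocr_command(image, text_path), check=True)
        outputs.append(text_path)
    return outputs


# Example (hypothetical folder name):
# ocr_pages("scanned_pages")
```

Keeping one text file per page image preserves the page breaks; the paragraphs still need tidying by hand afterwards.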

Yes, the Word file is text!
But some of the words in the files are inside picture boxes!
That is where I am having trouble!

kc0hwa wrote:

>
> yes word is text!
> but some of the word in the files are in a pic box!
> that is where I am having troubles!

AH! I was wondering why you didn’t just save the pdf as text directly but
the presence of text as part of an image makes for a whole new ball game if
you want to extract the text from an image. The only solutions I can think
of all involve an ocr of the source as an image, not as a text doc so this
will be an interesting answer!


Will Honea

1.) How do you grid a page like Kurzweil does, in Linux?

2.) I cannot bring a PDF into the OCR program (Kooka)!

3.) While looking this up, I saw that IRS is now used instead of OCR. Does anyone know anything about this?

re 2. AFAIK you can only scan single pages in Linux, not bulk. So you either need to print out the PDF and scan each page separately or convert the PDF to single images.
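
Converting the PDF to single images can be scripted. This is a minimal sketch, assuming the `pdftoppm` tool from poppler is installed; the file names are made up for illustration.

```python
import subprocess


def pdftoppm_command(pdf_path, prefix, dpi=300):
    """Build a pdftoppm command that writes one grayscale image per page."""
    # -r sets the resolution in DPI; -gray writes PGM files named
    # <prefix>-<page number>.pgm instead of colour PPM files.
    return ["pdftoppm", "-r", str(dpi), "-gray", str(pdf_path), str(prefix)]


def pdf_to_images(pdf_path, prefix, dpi=300):
    """Render every page of the PDF to its own image file."""
    subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)


# Example (hypothetical file names):
# pdf_to_images("book.pdf", "page")
```

300 DPI is only a starting guess; a higher resolution may give an OCR program more to work with, at the cost of larger files.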

In practice, to extract the text from a PDF I would never go this route; I would simply extract the text directly as a text file and any images as separate images and then reconstitute them.
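
As a sketch of the direct route, assuming poppler's `pdftotext` is installed: it writes a form feed character at the end of each page, so the output can be split back into pages, and blank lines can be used as a rough guess at paragraph breaks. This only recovers real text; words that sit inside an image will not come out this way. The function names are illustrative.

```python
import re
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF using pdftotext."""
    # "-" as the output file sends the text to standard output.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def split_pages(text):
    """Split text into pages (form feed) and paragraphs (blank lines)."""
    pages = []
    for page in text.split("\f"):
        # Treating blank lines as paragraph breaks is a heuristic.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page) if p.strip()]
        if paragraphs:
            pages.append(paragraphs)
    return pages


# Example (hypothetical file name):
# pages = split_pages(extract_text("book.pdf"))
```

The result is a list of pages, each a list of paragraphs, which could then be written out as separate text files per page or per paragraph.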

There is now the option in OpenOffice of adding the Sun PDF extension which allows you to load a PDF in Draw and create an ODT file directly from it.

So one reason why you may be having difficulties is that there is no longer any reason for most people to take the route you are taking.

How do I turn a PDF into a picture (e.g. JPG, PNG, etc.)?
I'm copying whatever text is in the documents to a .txt file, and just taking the pictures in the file and moving them to PDF.
Now, PDF to text!
So: PDF -> (picture) -> OCR or IRS -> txt

The simplest way of creating an image from a PDF is to open it in a viewer and take a screenshot. How sharp this will be for using with an OCR will depend on your screen resolution. Alternatively, print it out and then scan it as an image and not as text.