How to best scan a book into JPEG?

I have the Canon CanoScan 9000F scanner. After debating whether to scan into JPEG or PDF first, I decided to scan into JPEG first. Now I'm trying to figure out the optimal settings. I'm using the Canon MP Navigator EX software to scan. I've selected the magazine document type even though it's a book, because there are color graphics and images. I will avoid using Canon's OCR feature. Would you guys recommend that I let the software auto-detect the document size? Ultimately I will OCR the JPEGs. Should I scan at 300 dpi or higher? I've already tried a few scans at 300 dpi and the images seem really sharp, but I don't know whether I should go higher if I want to OCR them with either OneNote or Acrobat X Professional. I think I read somewhere that the higher the dpi, the better the OCR quality will be. Comments?

I've checked the descreen setting (to reduce moire), unsharp mask sharpening, reduce show-through, correct slanted document, and even the option to detect the orientation of text documents and rotate images.

I'm also curious whether there is other software worth using with my CanoScan 9000F. Like I said, I want to scan some of the pages of the book into JPEG first. I'm not sure if there is other software that's compatible with the CanoScan 9000F and produces higher-quality images than the included software.

At some point, I would like to create a PDF file. I do have Adobe Acrobat X Professional. How would I create a PDF document, import the JPEGs, and then OCR them? If the OCR job isn't good, is there a way to manually assist the software and correct the mistakes?

Is it even possible to use Acrobat X with my scanner to scan into formats other than PDF and still use Acrobat's OCR technology?

Thanks.
 
What is OCR?

If you like the quality at 300, keep it. Run a test at a higher setting and compare.
Then use a batch converter to JPEG and test the quality first.

Then make that PDF.

Note: I'm not a pro at this.
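
To put rough numbers on the 300-versus-higher question, here is a minimal sketch of the arithmetic. The 6 x 9 inch page and the 10-point body text are assumptions for illustration, not measurements of the actual book, and the sizes are for uncompressed 24-bit color, so the JPEGs will be much smaller. It only shows how pixel counts scale with dpi; it says nothing about what any particular OCR engine needs.

```python
def scan_stats(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, raw_bytes) for one scanned page.

    bytes_per_pixel=3 assumes 24-bit RGB before JPEG compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * bytes_per_pixel
    return width_px, height_px, raw_bytes


def text_height_px(point_size, dpi):
    """Nominal height of a font size in pixels (1 point = 1/72 inch)."""
    return point_size / 72 * dpi


if __name__ == "__main__":
    # Hypothetical page: 6 x 9 inches with 10-point body text.
    for dpi in (300, 400, 600):
        w, h, raw = scan_stats(6, 9, dpi)
        glyph = text_height_px(10, dpi)
        print(f"{dpi} dpi: {w} x {h} px, "
              f"{raw / 1_000_000:.1f} MB raw, "
              f"10 pt text is about {glyph:.0f} px tall")
```

Doubling the dpi quadruples the pixel count, so going from 300 to 600 is only worth it if a test scan actually OCRs better.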
 
OCR = Optical Character Recognition. (Google??)
He wants to convert it to text and then save it in a PDF, not leave it as an image.

OP, try getting a result, then come back with the issues.
 
First link on Google.ca is Wikipedia.
http://en.wikipedia.org/wiki/OCR
4th option down:
"Optical character recognition, conversion of images of text into characters."


A search for "ocr scanner" gives tons of info.
 
Do you cut off the binding and scan page by page, or do each page manually? Sounds tedious.
 
If the bindings can be removed there are auto scanners.
Otherwise you have to do whatever is necessary to get the page flat and lined up.
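
If the pages do get scanned one at a time, the files need to be in page order before they are combined into a PDF, and a plain alphabetical sort puts page 10 ahead of page 2. A minimal sketch of a page-order sort follows; the `scan_N.jpg` naming is a hypothetical example, not something the Canon software is known to produce.

```python
import re


def page_key(filename):
    """Sort key that compares the numbers inside a filename as integers.

    re.split with a capturing group alternates text and digit chunks,
    so odd positions are always the digit runs.
    """
    parts = re.split(r"([0-9]+)", filename)
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(parts)]


def in_page_order(filenames):
    """Return the filenames sorted so that page 2 comes before page 10."""
    return sorted(filenames, key=page_key)


if __name__ == "__main__":
    # Hypothetical filenames from a page-by-page scanning session.
    scans = ["scan_10.jpg", "scan_2.jpg", "scan_1.jpg"]
    for name in in_page_order(scans):
        print(name)
```

Zero-padding the page numbers when saving (001, 002, ...) avoids the problem in the first place, if the scanning software allows it.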
 
Acrobat can OCR the text for you, just make sure the pages are clean. If you have an auto-loading scanner, it can pull the pages in for you.
 
Acrobat OCR is not that accurate. With a mix of text and technical diagrams, it gave me gobbledygook and screwed up the formatting. ABBYY FineReader is better.
 
http://www.pcworld.com/article/2684...dows-phone-now-makes-editable-word-files.html

Unfortunately I don't have a Windows Phone device, but it seems like Office Lens is exactly what I'm looking for. Fortunately, everything I intend to scan is computer-generated text. There's no handwriting, so I expect OCR accuracy to be high with the best software.

Basically, what I'm looking to do is what Office Lens does as described in the article linked above. I know that OCR software will allow me to copy and paste the text into another program afterwards. Is there any OCR software that will also allow me to copy and paste graphics or art into a Word document? I'm amazed that Office Lens will also try to recreate the art and graphics in a Word document. Of course, one's mileage may vary.
 