What's the best way to scan over network into a dedicated box and have OCR on it?

RavinDJ

Supreme [H]ardness
Joined
Apr 9, 2002
Messages
4,456

Have you guys ever done it?

A new client called to ask whether I can set up an easy system for them: a medium-grade scanner ($200-$600) that scans 5, 10, 20, or up to 50 documents at once and sends them over the NETWORK to a server, where some sort of database indexing script or application OCRs the scanned files. That way, they could look up old documents by customer name, number, address, etc. This would be used for invoices, estimates, superbills, EOBs, etc.
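To make the "indexing" part concrete, here is a minimal sketch of the kind of lookup table I have in mind, assuming the OCR step has already produced plain text for each scanned file. The table name, function names, and sample filenames are hypothetical, and it uses SQLite from the Python standard library only because that needs no extra install; it is not what any particular product does.

```python
import sqlite3

def open_index(path=":memory:"):
    """Create (or open) a tiny index of OCR'ed documents."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        " filename TEXT PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )
    return db

def add_document(db, filename, ocr_text):
    """Store the OCR text for one scanned file, replacing any older entry."""
    db.execute(
        "INSERT OR REPLACE INTO docs (filename, body) VALUES (?, ?)",
        (filename, ocr_text),
    )
    db.commit()

def search(db, term):
    """Return filenames whose OCR text contains the term.

    SQLite's LIKE ignores case for ASCII letters by default. Note that
    % and _ inside the term act as wildcards in this simple version.
    """
    rows = db.execute(
        "SELECT filename FROM docs WHERE body LIKE ? ORDER BY filename",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

Usage would look something like `db = open_index("scans.db")`, then `add_document(db, "invoice_001.pdf", text)` for each file, then `search(db, "smith")`. A plain substring search like this only finds what the OCR got right, so a misread name will not match.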

I'd like this to be automatic, since THEY would like this to be automatic.

Thanks!!!
 
Well...

It wouldn't be too hard to set up the idea, actually. It comes down to software limitations for actual implementation.

You have the scanner scan to a drop-box folder on the network; the part that throws a wrench into the process is getting some program to pick up the new data from the drop box and process it. From a networking standpoint, I'd assume there's likely a script or function that can do this, but it's terribly complex: it has to launch programs and run functions in them whenever new data shows up. I'd say human intervention is still necessary at this point to make it work at all.
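For what it's worth, the "notice new data in the drop box" half can be sketched as a simple polling loop. This is only a rough sketch under my own assumptions: the function names, the list of file extensions, and the `handle` callback (where an OCR program would be launched) are all made up for illustration, and it does nothing clever about files that are still being written.

```python
import os
import time

def new_files(folder, seen, extensions=(".pdf", ".tif", ".tiff", ".jpg")):
    """Return scan files in `folder` that are not yet in `seen`, and record them."""
    found = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (name.lower().endswith(extensions)
                and os.path.isfile(path)
                and path not in seen):
            seen.add(path)
            found.append(path)
    return found

def watch(folder, handle, interval=10.0, max_polls=None):
    """Poll `folder` and call `handle(path)` once for each new file.

    `max_polls=None` means run forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(folder, seen):
            handle(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Something like `watch(r"D:\Scans", print)` would just print each new file as it appears. A real setup would probably want to wait until a file's size stops changing before handing it off, since a 50-page scan can take a while to land.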

Hire a guy to do the OCR but the scanning can at least be partially automated.
 
So here's what I came up with already (let me know what you think)... based on equipment that I have in my office and the research that I've done online.

I have the HP LaserJet M2727 MFP All-In-One in my office...

http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/18972-18972-3328064-12004-3328082-3377075.html

I played around with the scanner feature (which I never needed until now). It does allow network scanning to a machine that has the HP software installed. So, all I need to do is have a machine tucked away somewhere, then go up to the scanner, put the documents into the ADF (Automatic Document Feeder; it'll handle up to 50 pages), hit SCAN, and select the machine that I want to SCAN TO (you can set up multiple machines as destinations). The documents then get scanned into the directory that you have pre-selected on that machine.

The tough part is the OCR... I think you are right - you do have to hire someone to do the actual OCR. But I'm going to do some research on the OCR software by Iris: http://www.irislink.com/c2-532-189/OCR-Software---Product-list.aspx

I think it's ReadIris Pro - the DESKTOP version came with my machine. But I'll do a 30-day trial of the CORPORATE version. According to http://www.irislink.com/c2-523-189/...-Edition---Ocr-software-for-high-volumes.aspx the Corporate edition has "Automated Document Processing": "Readiris Pro 11 Corporate Edition is equipped with professional options allowing automated document processing." The DESKTOP version is manual only (you have to do it yourself).

Also, what DPI should I scan the documents at? 150 DPI seems too low for good recognition; 300 DPI seems okay, but the best recognition is at 600 DPI. But EVEN THEN... I scanned a document and it did not recognize ALL of the text. Why is that? Is OCR technology still in its early development stages? I thought it had been out for a long time now...
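Some rough arithmetic on what those DPI settings mean, as a sketch only. The assumptions are mine: a US Letter page (8.5 x 11 in), 8-bit grayscale with no compression, and 10-point text. The function names are made up.

```python
def scan_size(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=8):
    """Pixel dimensions and uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes

def glyph_height_px(dpi, point_size=10.0):
    """Nominal height in pixels of text at a given point size (72 points per inch)."""
    return point_size * dpi / 72.0

for dpi in (150, 300, 600):
    w, h, size = scan_size(dpi)
    print(f"{dpi} DPI: {w} x {h} px, {size / 1e6:.1f} MB raw, "
          f"10 pt text is about {glyph_height_px(dpi):.0f} px tall")
```

Doubling the DPI doubles the pixels per letter in each direction but quadruples the raw data per page, so with 50-page batches going over the network the resolution choice is a trade-off between recognition and file size.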

I'll post as soon as I do a trial of the CORPORATE edition...
 
I dunno about the OCR portion, but generally when one of our clients wants something like this, we go with a commercial printer/copier/scanner:

http://usa.kyoceramita.com/KMAGloba...ucts_printers_details.jsp?pid=17762&cid=10570

Something like that, but they have never needed OCR. That said, I would contact a local dealer or reseller of a unit like that and see how much it costs.

You set up a share on the server, Scans, then program each user's name into the scanner software, so at the machine each user shows up as a number (0 to whatever) with their name next to it. Each one scans to Scans\Username, and the user gets access to that folder.

http://www.nuance.com/omnipage/ They make great OCR software. I've used it on a single computer-and-scanner setup and it works great, and it looks like it can handle high volumes too.
 
ftp://ftp.scansoft.com/files/support/manuals/op16guide.pdf Check out page 81. That software still looks great: batch manager with folder watching.

So you can get a big printer unit like the one above, which may sell the client if they have shitty copiers and printers. Each user gets their own scan folder; set up the Batch Manager to watch each user's scans directory, process it, and rename the output to name_document, and that would be pimp.
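The name_document renaming could be sketched like this. It is not how the Batch Manager does it (I have not checked what naming options it has); it is just a hypothetical helper that assumes scans land in Scans\Username\ and that the output should be a PDF.

```python
from pathlib import PurePosixPath

def output_name(scan_path, suffix=".pdf"):
    """Build 'username_document.pdf' from a path like Scans/username/document.tif.

    The user name is taken from the folder the scan sits in.
    Backslashes are accepted so Windows-style share paths work too.
    """
    p = PurePosixPath(scan_path.replace("\\", "/"))
    user = p.parent.name
    if not user:
        raise ValueError("scan path has no user folder: " + scan_path)
    return f"{user}_{p.stem}{suffix}"
```

For example, `output_name(r"Scans\jsmith\invoice42.tif")` gives `jsmith_invoice42.pdf`.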

Then if users need to share stuff, make a public directory that everyone can get to and set up the same kind of batch manager job for it.
 
I scanned a document and it did not recognize ALL of the text. Why is that? Is the technology in OCR still in its early development stages? I thought it's been out for a long time now...

Just for fun, count the number of letters on an A4 or 8.5x11 page full of text. Then assume 99.9% accuracy. See how many errors that permits.

Unfortunately, OCR needs exceptional accuracy to get down to even one error per page.
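To put numbers on that, here is a quick sketch. The 3,000 characters per page is my own round figure for a full page of text, not a measurement, and the last function assumes errors are independent per character, which real OCR errors are not.

```python
def expected_errors(chars_per_page, accuracy):
    """Average number of wrong characters per page at a given per-character accuracy."""
    return chars_per_page * (1.0 - accuracy)

def accuracy_for(chars_per_page, errors_per_page=1.0):
    """Per-character accuracy needed to average `errors_per_page` errors."""
    return 1.0 - errors_per_page / chars_per_page

def chance_of_clean_page(chars_per_page, accuracy):
    """Probability that a page has no errors at all, assuming independent errors."""
    return accuracy ** chars_per_page

chars = 3000  # assumed characters on a full page
print(f"errors per page at 99.9%: {expected_errors(chars, 0.999):.1f}")
print(f"accuracy needed for one error per page: {accuracy_for(chars):.4%}")
print(f"chance of a clean page at 99.9%: {chance_of_clean_page(chars, 0.999):.1%}")
```

With those assumptions, 99.9% accuracy still averages about three wrong characters per page, and only about one page in twenty comes out with no errors at all.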
 
The OCR software that comes with HP is horrible. That OmniPage stuff is the best I've used for OCR, but none are 100% accurate.

After reading your posts again, you could probably just get away with one of those scanners like you posted: install it on the server, configure it to scan to a shared folder, and have OmniPage's batch manager do the folder spying.

How big is the client? If it's only a few computers doing that, having them do some manual work to put the PDFs and OCR'd docs into a client folder would be the easiest.
 