Another Weblog

A golbew gone wrong

SharePoint and Document Scanning II October 24, 2008

Filed under: Uncategorized — kjmkehoe @ 10:34 am

Following on from my previous post on this subject, I promised an update once I’d seen the Kofax demo.

Kofax works in a similar manner to the other more advanced solutions I’ve looked at.  I did learn a thing or two from the demo, though, that I thought I’d mention:
1. To improve the metadata on a document, use a reference database to check that you’ve got the correct information.  For example, if the Client Number is a piece of metadata and you have an Access database with all your Client Numbers, you can connect to this db and retrieve additional Client information (e.g. name, address or whatever else you want to map from the db).
2. OCR is possible but limited.  For the forms my client wants to scan, all of the information is handwritten.  While the various advanced solutions I’ve looked at all use OCR, it takes a lot of work to get it right and it’s only 60-80% correct at best.  So it is not suitable for bulk scanning on the scale I’m looking at (200,000 pages).  The one case in which the accuracy of OCR could be improved is where the data on the document is written in cells or boxes.  OCR has a better chance of reading characters written this way.
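
To make point 1 a bit more concrete, here is a rough sketch of a reference database lookup in Python. It is not how Kofax does it; it only illustrates the idea. I’ve assumed an in-memory SQLite database as a stand-in for the Access database, and the table name (clients), the column names and the sample client are all made up for the example.

```python
import sqlite3


def build_reference_db():
    """Create a small in-memory reference database.

    SQLite stands in for the Access database here; the table and
    column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients ("
        "client_number TEXT PRIMARY KEY, name TEXT, address TEXT)"
    )
    conn.execute(
        "INSERT INTO clients VALUES (?, ?, ?)",
        ("C1001", "Example Client Ltd", "1 Example Street"),
    )
    conn.commit()
    return conn


def enrich_metadata(conn, metadata):
    """Return a copy of the scanned metadata with client details added.

    If the client number is not in the reference database, the document
    is flagged for manual review instead of being filed with bad data.
    """
    enriched = dict(metadata)
    row = conn.execute(
        "SELECT name, address FROM clients WHERE client_number = ?",
        (metadata["client_number"],),
    ).fetchone()
    if row is None:
        enriched["needs_review"] = True
        return enriched
    enriched["client_name"], enriched["client_address"] = row
    enriched["needs_review"] = False
    return enriched
```

A client number that is found picks up the name and address from the db; one that isn’t found (a misread digit, say) comes back with needs_review set, so somebody can check it before it goes anywhere near SharePoint.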

We’re outsourcing the bulk scanning to a third party who will bring the scanners and people to get it done.  All we need to do is ensure that the metadata captured during scanning is mapped into SharePoint correctly... but that’s a post for another day.

 

One Response to “SharePoint and Document Scanning II”

  1. chuck jackson Says:

    fortunately for anydoc & our customers, for many years we have been able to generate 98-99.9% accuracy levels by using some very reasonably priced OCR engines. Handprint (ICR) is more difficult, of course, but there are many ways to increase its overall accuracy, as well, including the reference db capability (in item #1 above) that we have used within our solutions for at least 10 years. many of our customers process 10MM to 50+ MM (million) documents per year. if any of them got the results that you believe are attainable, they would have reverted back to key-from-image a long time ago.

    spend some time with anydoc’s solutions, our website, and/or our webinars. Kofax is not the gold standard of structured and unstructured processing for data and document capture.

    chuck jackson, ceo

