Cloud Based Targeted OCR Service

Founded in 1987, Blutex manages millions of records and provides office supplies throughout the UK. Initially supplying local businesses, Blutex now serves national and multinational companies whilst still maintaining its relationships with local businesses.

A key part of Blutex's business is digital imaging: scanning physical documents so they can be stored digitally. These might be older documents that need to be digitised, or newer physical contracts that need to be converted into digital assets. Whether the job is converting outmoded printed collateral or creating new material, its digital imaging solutions can scan, convert and store any document in a secure digital format that is accessible anywhere in the world.

As part of its offering, Blutex wanted to provide a more automated, reliable and efficient OCR service that could extract key data, index the images, and let customers search the data and view the documents on demand.

Blutex contacted Impact, and together we designed a commercial solution with the following key features:

  • A cloud-based solution that can be accessed anywhere, at any time, via a browser, without the need to install any software locally.
  • A secure SFTP location for bulk uploading scanned documents.
  • A template facility, so that different types of documents can be scanned and key data extracted, e.g. Invoice Date, Delivery Date, Account Reference.
  • A secure web interface through which Blutex and its customers can access the system.

How does it work?

The key element of the solution is a clever piece of software we wrote that runs in the cloud. Each type of document that will be uploaded (e.g. Delivery Note, Invoice, Medical Record, Payment Plan) needs a template, so the system knows what data to extract and where to find it.

The template builder lets you define where each data item is and what format it is expected in, e.g. Date, Numbers, Alpha or Special characters. For number fields, you can also have the system automatically change an 'O' (oh) into a '0' (zero), or an 'I' (eye) into a '1' (one).
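To make the idea concrete, here is a minimal sketch of a template field and its clean-up step. It is not the service's actual template format: the `FieldSpec` class, the type labels, the dd/mm/yyyy date pattern and the `clean_value` helper are all hypothetical. It shows the O-to-0 and I-to-1 substitution being applied only to number fields, followed by a format check.

```python
import re
from dataclasses import dataclass

# Hypothetical template field: names, coordinates and type labels are
# illustrative assumptions, not the service's real template format.
@dataclass
class FieldSpec:
    name: str
    x: int            # left edge of the expected region, in pixels
    y: int            # top edge of the expected region, in pixels
    width: int
    height: int
    field_type: str   # "date", "number", "alpha" or "special"

# Look-alike letters swapped for digits in number fields: O (oh) -> 0, I (eye) -> 1.
NUMBER_FIXES = str.maketrans("OI", "01")

# Assumed format checks for each field type (dates assumed to be dd/mm/yyyy).
FORMATS = {
    "date": re.compile(r"\d{2}/\d{2}/\d{4}"),
    "number": re.compile(r"\d+"),
    "alpha": re.compile(r"[A-Za-z ]+"),
    "special": re.compile(r"\S.*"),
}

def clean_value(spec, raw):
    """Return (cleaned value, whether it matches the field's expected format)."""
    value = raw.strip()
    if spec.field_type == "number":
        value = value.translate(NUMBER_FIXES)
    return value, FORMATS[spec.field_type].fullmatch(value) is not None
```

For example, a number field read as "1O2I" is cleaned to "1021" and passes the check, whereas a date field is left untouched, so a misread date such as "12/O3/2020" is flagged as not matching its format.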

Another key feature is the ability to specify common interpretation errors. If the system always reads the word THREE as THR33, perhaps because of a poor font, poor scanning or poor document quality, you can configure a set of replacement rules so that, once it has captured the text, it converts THR33 to THREE and then runs any other rules. This can also be done after processing: if a lot of errors have been reported, the affected documents can be batched and reprocessed using the new rules.
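A minimal sketch of the rule ordering described above, assuming replacement rules are simple whole-word substitutions and that the raw OCR text is kept so a batch can be reprocessed later. The rule list and the function names are hypothetical.

```python
import re

# Hypothetical replacement rules for one template: (misread text, correct text).
REPLACEMENT_RULES = [("THR33", "THREE")]

def apply_replacements(text, rules):
    """Replace each misread whole word with its correction, in rule order."""
    for wrong, right in rules:
        # A function replacement avoids backslash escapes in `right` being interpreted.
        text = re.sub(rf"\b{re.escape(wrong)}\b", lambda _m: right, text)
    return text

def process_field(raw, rules, other_rules):
    """Replacement rules run first, then any other cleaning rules in order."""
    text = apply_replacements(raw, rules)
    for rule in other_rules:
        text = rule(text)
    return text

def reprocess_batch(raw_text_by_doc, rules, other_rules):
    """Re-run the current rules over a batch of stored raw OCR text."""
    return {doc: process_field(raw, rules, other_rules) for doc, raw in raw_text_by_doc.items()}
```

Matching on word boundaries means a rule for THR33 will not alter a longer token such as THR333; whether the real service matches whole words or substrings is not stated in the source.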

Building a template for your document

Upload Your Files

Once a template has been defined, the software runs as a service, waiting for documents to appear in any of its inboxes. Files can be uploaded with any FTP client or a browser, whether that is 1 file or 10,000.
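The "service waiting on inboxes" pattern can be sketched as a simple polling loop. This assumes each inbox is a folder on disk and that the handler removes or moves each file once processed; the function names, the poll interval and the `max_polls` cut-off (handy for testing) are illustrative assumptions rather than details of the actual service.

```python
import time
from pathlib import Path

def pending_files(inboxes):
    """Yield every file currently waiting in any of the inbox folders."""
    for inbox in inboxes:
        for path in sorted(Path(inbox).iterdir()):
            if path.is_file():
                yield path

def run_service(inboxes, handle, poll_seconds=5.0, max_polls=None):
    """Poll the inboxes indefinitely (or max_polls times), passing each file to `handle`.

    `handle` is expected to move or delete the file so it is not picked up again.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in list(pending_files(inboxes)):
            handle(path)
        polls += 1
        time.sleep(poll_seconds)
```

A production service would more likely use file-system notifications and guard against picking up files that are still being uploaded; polling keeps the sketch short.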


Upload Files To The Inbox

As files are uploaded, each one is identified (for example, the text at position x,y reads "Invoice"), and the system then targets the document, looking for the data items its template requests.
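Identification by "the text at x,y" might look like the following sketch, assuming the OCR engine returns text blocks with page coordinates. The `TextBlock` type, the anchor rules and the 25-pixel tolerance are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str   # text recognised by the OCR engine
    x: int      # position of the block on the page, in pixels
    y: int

# Hypothetical identification rules: document type, anchor text and the
# position where that text is expected to appear.
ID_RULES = [
    ("Invoice", "INVOICE", 50, 40),
    ("Delivery Note", "DELIVERY NOTE", 50, 40),
]

def identify(blocks, rules=ID_RULES, tolerance=25):
    """Return the first document type whose anchor text appears near its expected position."""
    for doc_type, anchor, ax, ay in rules:
        for block in blocks:
            near = abs(block.x - ax) <= tolerance and abs(block.y - ay) <= tolerance
            if near and block.text.strip().upper() == anchor:
                return doc_type
    return None
```

Once the type is known, the matching template tells the system which fields to look for next.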

The system uses a clever algorithm to compensate for skewed or slipped scans: it searches the area around each expected position for a suitable match.
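The source does not describe the actual algorithm, so the following is only one simple way to "search around the area": among the OCR text blocks within a radius of the template's expected position, take the closest one whose text fits the field's format. The radius, the `find_near` name and the block type are assumptions.

```python
import math
import re
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    x: int   # block position on the page, in pixels
    y: int

def find_near(blocks, x, y, pattern, max_radius=60):
    """Return the block closest to (x, y), within max_radius, whose text matches pattern."""
    regex = re.compile(pattern)

    def distance(block):
        return math.hypot(block.x - x, block.y - y)

    matches = [b for b in blocks if distance(b) <= max_radius and regex.fullmatch(b.text)]
    return min(matches, key=distance, default=None)
```

Requiring a format match as well as proximity stops a nearby label such as "Date:" being captured in place of the date itself when the scan has shifted.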

As each data item is located, it is stored in a database along with the location of the corresponding image. The document is then moved from the inbox to the outbox. The service runs 24/7, continuing until all inboxes are clear.
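A minimal sketch of the store-then-move step, using SQLite from the Python standard library. The table layout and function names are assumptions; the real system's database is not specified in the source.

```python
import shutil
import sqlite3
from pathlib import Path

def init_db(conn):
    """Create a simple table of extracted values (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted ("
        "file_name TEXT, image_location TEXT, field TEXT, value TEXT)"
    )

def store_and_archive(conn, image_path, outbox, fields):
    """Store each extracted field with the image location, then move the image to the outbox."""
    image_path = Path(image_path)
    outbox = Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    target = outbox / image_path.name
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO extracted (file_name, image_location, field, value) VALUES (?, ?, ?, ?)",
            [(image_path.name, str(target), name, value) for name, value in fields.items()],
        )
    shutil.move(str(image_path), str(target))
    return target
```

Recording the outbox path as the image location means the stored reference stays valid after the file leaves the inbox.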

Search and Browse Your Data

As soon as the document has been processed it becomes available in the cloud portal for customers to search, browse, view, download or print.
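Searching the extracted data could be as simple as the query below, sketched against an assumed SQLite table of (file name, image location, field, value) rows. The schema and the `search` helper are hypothetical; the portal's real search is not described in the source.

```python
import sqlite3

# Assumed table of extracted values, one row per field per document.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS extracted ("
    "file_name TEXT, image_location TEXT, field TEXT, value TEXT)"
)

def search(conn, field, term):
    """Return (file_name, image_location) for documents whose `field` contains `term`.

    SQLite's LIKE is case-insensitive for ASCII; % and _ in `term` act as wildcards.
    """
    rows = conn.execute(
        "SELECT DISTINCT file_name, image_location FROM extracted"
        " WHERE field = ? AND value LIKE ? ORDER BY file_name",
        (field, f"%{term}%"),
    )
    return rows.fetchall()
```

Each result carries the image location, so the portal can go straight from a search hit to viewing, downloading or printing the scanned document.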

If the system cannot extract some of the data, this is reflected in the document's process status. The file can then be reprocessed as it is, reprocessed after the template has been modified, or rescanned and uploaded again; if the file name stays the same, the existing data is simply updated.
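The "same file name, data just updated" behaviour maps naturally onto an upsert keyed on the file name. This sketch assumes a SQLite table with the file name as primary key and a simple complete/incomplete status; the schema, status values and helper name are hypothetical. `INSERT ... ON CONFLICT DO UPDATE` needs SQLite 3.24 or later.

```python
import json
import sqlite3

# Assumed schema: one row per uploaded file, keyed on its file name.
SCHEMA = """CREATE TABLE IF NOT EXISTS documents (
    file_name TEXT PRIMARY KEY,
    status    TEXT NOT NULL,
    fields    TEXT NOT NULL
)"""

def record_result(conn, file_name, fields, required):
    """Insert or update a document's data, marking it incomplete if required fields are missing."""
    missing = [name for name in required if not fields.get(name)]
    status = "incomplete" if missing else "complete"
    with conn:
        conn.execute(
            "INSERT INTO documents (file_name, status, fields) VALUES (?, ?, ?) "
            "ON CONFLICT(file_name) DO UPDATE SET "
            "status = excluded.status, fields = excluded.fields",
            (file_name, status, json.dumps(fields)),
        )
    return status
```

Re-uploading a rescanned "inv1.tif" therefore overwrites the earlier row rather than creating a duplicate.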

Since the service went live, the system has processed millions of documents and saved all stakeholders a significant amount of time.

The system can also store documents in other cloud storage services, e.g. Amazon S3.
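One way to support several storage back ends is a small interface with interchangeable implementations, sketched below. The class names are hypothetical; the S3 version uses boto3's `upload_file(Filename, Bucket, Key)` and needs boto3 installed and AWS credentials configured, so it is imported lazily and only the local store works without them.

```python
import shutil
from pathlib import Path

class LocalStore:
    """Keeps processed images on the local file system."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, path, key):
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        return str(target)

class S3Store:
    """Keeps processed images in an Amazon S3 bucket (requires boto3 and AWS credentials)."""

    def __init__(self, bucket):
        import boto3  # imported lazily so LocalStore works without boto3 installed
        self.bucket = bucket
        self.client = boto3.client("s3")

    def save(self, path, key):
        self.client.upload_file(str(path), self.bucket, key)
        return f"s3://{self.bucket}/{key}"
```

Because both classes expose the same `save(path, key)` method and return a location string, the rest of the pipeline can record the returned location without caring where the image ended up.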

Interested?

If you are interested in learning more about this service, or you have a similar requirement, please get in touch. We love a chat!

Project info

  • Cloud Based Targeted OCR Service
  • Bespoke Software Development, Cloud Portal

Active Andalucia
www.active-andalucia.com
Multilingual website for an outdoor pursuits centre in Spain.

Just wanted to say a huge thanks for the work you have put into creating our new website; we are really pleased with the results.

I know we have not been the easiest of clients to work with, as we had no clear direction in the beginning and then made so many unplanned changes, but thank you for your patience. It has paid off, as we now have what we think is the best site in the “activity business”: it is vibrant, easy to navigate, engaging and informative. Your advice and guidance through this process have been amazing, and fingers crossed this is the start of a great new venture for us.

We have a few other projects we are working on and look forward to talking to you about these soon.

Deborah Kelly, Marketing Director