Cloud Based Targeted OCR Service
Founded in 1987, Blutex manages millions of records and provides office supplies throughout the UK. Having started out supplying local businesses, Blutex now serves national and multinational companies while still maintaining its relationships with local businesses.
A key part of their business is digital imaging and document scanning to store physical documents. This might include older documents that need to be digitised, or newer physical contracts that need to be converted into digital assets. Whether it’s the conversion of outmoded printed collateral or the creation of new material, their digital imaging solutions can scan, convert or store any documentation in a secure digital format that is accessible anywhere in the world.
As part of this offering, Blutex wanted to provide a more automated, reliable and efficient OCR service that could extract key items of data, index the images and allow customers to search the data and view the documents on demand.
Blutex contacted Impact and together we came up with a commercial solution. The key features were:
- A cloud-based solution that can be accessed anywhere at any time, via a browser, without the need to install any software locally.
- A secure SFTP location in which to bulk upload scanned documents.
- A template facility so that different types of documents can be scanned and key data extracted, e.g. Invoice Date, Delivery Date, Account Reference.
- A secure web interface so that Blutex and its customers can access the system.
How does it work?
The key element of the solution is a piece of software that we wrote, running in the cloud. Each type of document that is going to be uploaded (e.g. Delivery Note, Invoice, Medical Record, Payment Plan) needs its own template so the system knows what data to extract and where to look for it.
The template builder allows you to define where each item of data is and what format it is expected in, e.g. date, numbers, alpha, special characters. For a number field, you can also specify that a letter 'O' (oh) is automatically changed to '0' (zero) and a letter 'I' (eye) to '1' (one).
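To illustrate the idea of a field definition with an optional character correction, here is a minimal sketch in Python. The names FieldSpec, LOOKALIKES and normalise are hypothetical and chosen for this example only; they are not taken from the production system.

```python
from dataclasses import dataclass

# Assumed lookalike map: only the two corrections described above.
LOOKALIKES = str.maketrans({"O": "0", "I": "1"})


@dataclass
class FieldSpec:
    """One field in a template (hypothetical structure)."""
    name: str
    kind: str                     # illustrative labels: "number", "alpha", "date"
    fix_lookalikes: bool = False  # opt-in, as the correction is configurable


def normalise(spec: FieldSpec, raw: str) -> str:
    """Tidy captured text according to the field definition."""
    text = raw.strip()
    # Only number fields are corrected, so real letters in text fields survive.
    if spec.kind == "number" and spec.fix_lookalikes:
        text = text.translate(LOOKALIKES)
    return text


account = FieldSpec("Account Reference", "number", fix_lookalikes=True)
print(normalise(account, " 1O45I "))
```

Restricting the correction to number fields matters: applied to an alpha field, it would corrupt genuine letters.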
Another key feature is the ability to specify common interpretation errors. If the system always reads the word THREE as THR33, perhaps because of a poor font, poor scanning or poor document quality, you can configure a set of replacement rules so that once it has captured some text it converts THR33 to THREE and then runs any other rules. This can also be done after processing: if a lot of errors have been reported, the documents can be batched and reprocessed using the new rules.
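A set of replacement rules like this could be sketched as an ordered list of (wrong, right) pairs applied to the captured text. The function name apply_replacements and the whole-word matching are assumptions made for this example, not a description of how the service itself is written.

```python
import re


def apply_replacements(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (wrong, right) rule in order, matching whole words only."""
    for wrong, right in rules:
        pattern = re.compile(rf"\b{re.escape(wrong)}\b")
        # A function replacement keeps 'right' literal, even if it has backslashes.
        text = pattern.sub(lambda _match: right, text)
    return text


rules = [("THR33", "THREE")]
print(apply_replacements("THR33 boxes delivered", rules))
```

Because the rules are plain data, the same list can be re-run over a batch of previously captured text once new errors have been reported.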
Building a template for your document
Upload Your Files
Once a template has been defined, the software runs as a service, waiting for documents to appear in one of its many inboxes. Any FTP client or browser can be used to upload them, whether it is 1 file or 10,000.
As files are uploaded they are identified (e.g. the text at x,y says Invoice), and the system then targets the document, trying to find the requested data items.
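As a rough illustration of identifying a document by the text found at a known position, the sketch below assumes the OCR step yields words with pixel coordinates. Word, Template, identify and the tolerance value are all hypothetical names and numbers used only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    x: int
    y: int


@dataclass
class Template:
    name: str
    anchor_text: str  # text expected at the anchor position
    anchor_x: int
    anchor_y: int


def identify(words: list, templates: list, tolerance: int = 10) -> Optional[Template]:
    """Return the first template whose anchor text appears near its expected position."""
    for template in templates:
        for word in words:
            if (word.text.lower() == template.anchor_text.lower()
                    and abs(word.x - template.anchor_x) <= tolerance
                    and abs(word.y - template.anchor_y) <= tolerance):
                return template
    return None  # unknown document type


templates = [
    Template("Delivery Note", "Delivery", 100, 50),
    Template("Invoice", "Invoice", 100, 50),
]
match = identify([Word("Invoice", 102, 48)], templates)
print(match.name if match else "unidentified")
```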
The system has a clever algorithm that can compensate for skewed or slipped scanning: it searches around the expected area for a suitable match.
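One simple way to search around an expected position is to widen the search area step by step until some text in the expected format turns up. The sketch below shows that idea only; it is not the algorithm used in the service, and find_near, max_shift and step are assumed names and values.

```python
import re
from typing import Optional


def find_near(words, x, y, pattern, max_shift=40, step=10) -> Optional[str]:
    """Look for text matching 'pattern' near (x, y), widening the area each pass.

    'words' is a list of (text, x, y) tuples, assumed to come from the OCR step.
    """
    regex = re.compile(pattern)
    for radius in range(0, max_shift + 1, step):
        candidates = [
            (abs(wx - x) + abs(wy - y), text)
            for text, wx, wy in words
            if abs(wx - x) <= radius and abs(wy - y) <= radius
            and regex.fullmatch(text)
        ]
        if candidates:
            # Prefer the candidate closest to where the template expected it.
            return min(candidates)[1]
    return None  # nothing suitable within the allowed shift


words = [("Invoice", 100, 50), ("12/03/2019", 325, 118)]
print(find_near(words, 300, 100, r"\d{2}/\d{2}/\d{4}"))
```

Checking the format as well as the position keeps a nearby label or stray mark from being picked up in place of the real value.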
As each item of data is located, it is stored in a database along with the relevant image location. The document is then moved from the inbox to the outbox. The service keeps running 24/7 until all inboxes are clear.
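The store-then-move step could look something like the sketch below, which uses SQLite and local folders purely for illustration. The table layout, the function name store_and_move and the file names are assumptions; the source does not say which database or storage layout the service uses.

```python
import shutil
import sqlite3
from pathlib import Path


def store_and_move(db: sqlite3.Connection, inbox_file: Path,
                   outbox: Path, fields: dict) -> Path:
    """Save the extracted fields, then move the processed file to the outbox."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(file_name TEXT, field TEXT, value TEXT, image_path TEXT)"
    )
    target = outbox / inbox_file.name
    db.executemany(
        "INSERT INTO extracted VALUES (?, ?, ?, ?)",
        [(inbox_file.name, name, value, str(target))
         for name, value in fields.items()],
    )
    db.commit()
    # Move only after the data is committed, so a failure leaves the file in the inbox.
    outbox.mkdir(parents=True, exist_ok=True)
    shutil.move(str(inbox_file), str(target))
    return target
```

Moving the file last means that if anything goes wrong while saving, the document is still waiting in the inbox for the next pass.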
Search and Browse Your Data
As soon as the document has been processed it becomes available in the cloud portal for customers to search, browse, view, download or print.
If the system could not extract some data, this is reflected in the process status. The file can then be reprocessed as it is, reprocessed after the template has been modified, or re-scanned and uploaded again. If the file name stays the same, the existing data is simply updated.
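Updating rather than duplicating data when a file with the same name is processed again can be sketched by keying the stored values on the file name and field. The example below uses SQLite's INSERT OR REPLACE; the table and the function name save_fields are assumptions made for illustration.

```python
import sqlite3


def save_fields(db: sqlite3.Connection, file_name: str, fields: dict) -> None:
    """Insert extracted values, replacing earlier values for the same file and field."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(file_name TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (file_name, field))"
    )
    db.executemany(
        "INSERT OR REPLACE INTO extracted VALUES (?, ?, ?)",
        [(file_name, name, value) for name, value in fields.items()],
    )
    db.commit()


db = sqlite3.connect(":memory:")
save_fields(db, "scan001.tif", {"Invoice Date": ""})            # first pass failed
save_fields(db, "scan001.tif", {"Invoice Date": "12/03/2019"})  # re-scan, same name
print(db.execute("SELECT value FROM extracted").fetchall())
```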
Since developing this service the system has processed millions of documents and has saved all stakeholders significant amounts of time.
The system is also capable of storing the documents in other cloud storage services, e.g. Amazon S3.
If you are interested in learning more about this service or you have similar requirements, please get in touch. We love a chat!
- Bespoke Software Development, Cloud Portal
Palacio Consulting Ltd
I have worked with Impact Technology for a number of years now on at least 3 of my projects. I have found them to be very professional with great attention to detail, enhancing the brief with their technical knowhow & experience. I have found them to be very focussed to the customer's needs and can recommend the Impact Team to deliver your project.
Mark Palacio, Managing Director