Artificial intelligence has found its way into our everyday lives and is making headlines with quite spectacular successes. At the same time, it is rightly emphasised that current AI is invariably limited to clearly defined special tasks. The AI that examines MRI images for breast cancer cannot translate texts. That is why it is also called “weak AI”. In simplified terms, the procedures used today apply statistical methods to recognise patterns, only faster and better than humans can. Not really exciting …

And yet there is one specific event that I remember as the moment when I learned something from our AI for the first time. It was some time ago, and we had just gone into production with a service for recognising information from scanned invoices. Our service had failed to recognise the invoice number on an invoice. Although the label “invoice number” followed by an ID was clearly printed on the invoice, our service returned: “no invoice number available”. Where was the error?

Illustration: invoice with the marked fields invoice number, invoice date and total amount

If you look at the illustration, it might be obvious. I, however, had to look several times. The unusual thing about this invoice is that the invoice number consists exclusively of letters! Our service had never seen anything like that “in its life”: until then, not a single invoice in our training data had contained an invoice number made up of nothing but letters. A small test confirmed the assumption. If you replace even one letter of the invoice number with a digit, our service returns the changed invoice number correctly, and with a probability of 99%.
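The small test described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `single_digit_variants`, the replacement digit and the example ID are hypothetical, and the call to the actual recognition service is deliberately left out because its interface is not described here.

```python
def single_digit_variants(invoice_number: str, digit: str = "7") -> list[str]:
    """Return copies of invoice_number in which exactly one letter
    has been replaced by the given digit (hypothetical helper)."""
    variants = []
    for i, ch in enumerate(invoice_number):
        if ch.isalpha():
            # Swap only the character at position i, keep the rest unchanged.
            variants.append(invoice_number[:i] + digit + invoice_number[i + 1:])
    return variants


# Made-up letters-only ID; the real invoice number from the anecdote is not given.
original = "ABCDEF"
for variant in single_digit_variants(original):
    print(variant)
```

Each variant could then be placed on a copy of the invoice and sent through the service, to see whether a single digit is enough for the ID to be accepted.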

This was remarkable and instructive in many ways. First of all, this behaviour demonstrated that our AI model does not rely on the label “invoice number” alone: there are situations in which the model rejects the ID even though the label is present. The example also shows how important it is to find a representative sample for training the models.
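One simple way to spot such a gap in a training sample is to count how many invoice numbers fall into each character class. The following is a minimal sketch, not a description of how our models are actually trained; the function names and the example IDs are made up for illustration.

```python
from collections import Counter


def character_class(invoice_number: str) -> str:
    """Classify an ID by its letters and digits, ignoring separators."""
    chars = [c for c in invoice_number if c.isalnum()]
    if not chars:
        return "empty"
    if all(c.isdigit() for c in chars):
        return "digits_only"
    if all(c.isalpha() for c in chars):
        return "letters_only"
    return "mixed"


def class_coverage(invoice_numbers: list[str]) -> dict[str, int]:
    """Count how often each character class occurs in the sample."""
    counts = Counter(character_class(n) for n in invoice_numbers)
    return {name: counts.get(name, 0)
            for name in ("digits_only", "letters_only", "mixed")}


# Made-up training IDs (assumption, not real data).
training_ids = ["2021-0042", "RE-10015", "INV2020/77", "000731"]
coverage = class_coverage(training_ids)
missing = [name for name, count in coverage.items() if count == 0]
print(coverage)
print("classes without any example:", missing)
```

A class with a count of zero is a hint that the model has never seen this kind of invoice number, which is exactly the situation we had run into.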

The next question was whether we should teach our model this case at all. We already knew that such invoice numbers were extremely rare. But are they even valid? The assessment was complicated by the fact that this invoice was obviously a fake. Our research revealed that the only requirement stipulated for an invoice number is uniqueness, and that can be achieved with letters alone.

We now know that there are invoice numbers at Amazon that consist exclusively of letters. So this really does happen in practice, and our service has since learned it too.

I remember this as an exciting experience. At first, I simply thought it was a mistake, and a serious one at that. In the end, I realised that our model had actually reacted correctly and had given us a lot to think about. We had learned something!

BLU DELTA is a product for the automated capture of financial documents. With BLU DELTA AI and Cloud, our partners, as well as the finance departments, accountants and tax advisors of our customers, can relieve their employees of the time-consuming and mostly manual capture of documents.

BLU DELTA is a product of Blumatix Intelligence GmbH, which advises and supports companies in the field of artificial intelligence and software development.

Author: Hans-Peter Haberlandner is a serial entrepreneur, Head of Artificial Intelligence at Blumatix Intelligence GmbH and one of the initiators of the BLU DELTA product.