Extract references and amounts from remittance advices

Automating companies with RPA and AI

https://mccminnovations.com - info@mccminnovations.com

Challenges

  • Extract reference numbers and amounts from documents of any layout.
  • Input documents can be PDF, Excel, CSV, Word or TIF files.
  • References and amounts can appear anywhere in the document.
  • References and amounts do not follow strict format rules.
  • Handle scanned documents, rotated text and occlusions.
  • Return the results as structured data.
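The slides do not show how the different input formats are handled. As a minimal sketch, assuming each file is first routed by its extension, a dispatcher could look like this. The route names and the EXTRACTION_ROUTES table are hypothetical, and actually parsing each format would need dedicated libraries.

```python
from pathlib import Path

# Hypothetical routing table: file suffix -> extraction strategy.
# PDFs may carry a text layer or only scanned images; TIF files always need OCR.
EXTRACTION_ROUTES = {
    ".pdf": "pdf",
    ".xls": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".csv": "csv",
    ".doc": "word",
    ".docx": "word",
    ".tif": "ocr",
    ".tiff": "ocr",
}


def route_document(path: str) -> str:
    """Return the (hypothetical) extraction route for an input file."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTION_ROUTES:
        raise ValueError(f"Unsupported input format: {suffix or '(no extension)'}")
    return EXTRACTION_ROUTES[suffix]


for name in ["Slides_Resources/sample1.pdf", "advice.XLSX", "scan.tif"]:
    print(name, "->", route_document(name))
```

Lower-casing the suffix keeps "advice.XLSX" and "advice.xlsx" on the same route; unknown extensions fail loudly instead of silently producing empty results.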

Case 1

Input document

[Figure: input document]

Please AI, extract text from the input document

[Figure: text extracted from the input document]
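The slide does not name the OCR engine. Assuming the OCR output is already available as a plain string, one common clean-up step for scanned pages is to undo letter/digit confusions (O for 0, l for 1, S for 5) inside number-like tokens. The helper normalize_numeric_tokens and its confusion table are illustrative assumptions, not the method used in the demo.

```python
import re

# Illustrative (not tuned) table of letters that OCR often confuses with digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1",
                                 "S": "5", "B": "8"})

# Number-like token: starts and ends with a digit or confusable letter and
# may contain "." or "," separators in between.
NUMBER_LIKE = re.compile(r"\b[0-9OolISB][0-9OolISB.,]*[0-9OolISB]\b")


def normalize_numeric_tokens(text: str) -> str:
    """Swap look-alike letters for digits inside mostly-numeric tokens."""
    def fix(match: re.Match) -> str:
        token = match.group(0)
        digits = sum(ch.isdigit() for ch in token)
        alnum = sum(ch.isalnum() for ch in token)
        if digits * 2 < alnum:
            return token  # mostly letters: probably a real word, leave it alone
        return token.translate(OCR_DIGIT_FIXES)
    return NUMBER_LIKE.sub(fix, text)


print(normalize_numeric_tokens("Ref 9O39954321 paid 468,98"))
```

Requiring at least half of the token to be real digits is what keeps ordinary words such as "SOB" or "IS" untouched while repairing "9O39954321".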

Please AI, find the references and amounts

[Figure: references and amounts found in the document]
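The slides do not describe how references and amounts are located, and the demo is presented as AI-based, so it may work quite differently. A minimal rule-based sketch, assuming (from the sample outputs) that a reference is a standalone 10-digit number and that its amount follows on the same line with two decimals and either "." or "," as the decimal separator. find_references and parse_amount are hypothetical names.

```python
from __future__ import annotations

import re
from decimal import Decimal

# Assumption: references are standalone 10-digit numbers (as in the sample outputs).
REF = re.compile(r"\b(\d{10})\b")
# Assumption: amounts have a 2-digit decimal part and may use "." or "," as
# thousands/decimal separators, e.g. "468.98", "2,145.01" or "2.145,01".
AMOUNT = re.compile(r"\b(\d{1,3}(?:[.,]?\d{3})*[.,]\d{2})\b")


def parse_amount(raw: str) -> Decimal:
    """Treat the last separator as the decimal point and drop the others."""
    digits = re.sub(r"[.,]", "", raw)
    return Decimal(digits[:-2] + "." + digits[-2:])


def find_references(text: str) -> list[dict]:
    """Pair each reference with the first amount after it on the same line."""
    results = []
    for line in text.splitlines():
        ref = REF.search(line)
        if not ref:
            continue
        amount = AMOUNT.search(line, ref.end())
        if amount:
            results.append({"id_ref": ref.group(1),
                            "amount": parse_amount(amount.group(1))})
    return results
```

For example, a line such as "Invoice 9039115468 2.145,01" yields id_ref "9039115468" with amount Decimal("2145.01"). Decimal is used instead of float so that adding up amounts gives exact cents.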

Finally, get the result as structured data

Processing file Slides_Resources/sample1.pdf
Execution time: 2.357419729232788
{'References': [{'page': 0,
   'references': [{'id_ref': '9039954321', 'amount': 468.98},
    {'id_ref': '9039115468', 'amount': 2145.01},
    {'id_ref': '9039365987', 'amount': 721.34}],
   'total_page_amount': 3335.33}],
 'Total amount ref': 3335.33,
 'Total amount max': 3335.33,
 'Total amount keyword': 3335.33,
 'Final Total': 3335.33,
 'Confidence': 100.0}
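The output reports three estimates of the total ('Total amount ref', 'Total amount max', 'Total amount keyword') plus a 'Final Total' and a 'Confidence', but the slides do not explain how they are combined. One plausible reading, sketched here purely as a hypothesis: sum the per-reference amounts, compare that sum with the largest amount in the document and with the amount next to a "total" keyword, keep the value most estimates agree on, and report the share of agreeing estimates as the confidence. build_result is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter
from decimal import Decimal


def build_result(pages: list[list[dict]], max_amount: Decimal,
                 keyword_amount: Decimal | None) -> dict:
    """Assemble a result shaped like the slide output (hypothetical rule).

    pages: one list of {"id_ref", "amount"} dicts per page, amounts as Decimal.
    max_amount: largest amount found anywhere in the document.
    keyword_amount: amount next to a "total" keyword, or None if not found.
    """
    references = []
    for page_no, refs in enumerate(pages):
        page_total = sum((r["amount"] for r in refs), Decimal("0"))
        references.append({"page": page_no, "references": refs,
                           "total_page_amount": page_total})
    ref_total = sum((p["total_page_amount"] for p in references), Decimal("0"))

    # Majority vote over the available estimates; on a tie, Counter keeps the
    # first-seen value, so the reference sum wins.
    estimates = [ref_total, max_amount]
    if keyword_amount is not None:
        estimates.append(keyword_amount)
    final_total, votes = Counter(estimates).most_common(1)[0]

    return {
        "References": references,
        "Total amount ref": ref_total,
        "Total amount max": max_amount,
        "Total amount keyword": keyword_amount,
        "Final Total": final_total,
        "Confidence": round(100 * votes / len(estimates), 1),
    }
```

With the Case 1 values, all three estimates equal 3335.33, which reproduces 'Final Total': 3335.33 and 'Confidence': 100.0; if only two of the three agreed, this rule would report 66.7.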

Case 2

Input document

[Figure: input document]

Please AI, extract text from the input document

[Figure: text extracted from the input document]

Please AI, find the references and amounts

[Figure: references and amounts found in the document]

Finally, get the result as structured data

Processing file Slides_Resources/sample2.pdf
Execution time: 2.5578198432922363
{'References': [{'page': 0,
   'references': [{'id_ref': '9039569321', 'amount': 474.48},
    {'id_ref': '9039111258', 'amount': 532.08},
    {'id_ref': '9039332845', 'amount': 709.44},
    {'id_ref': '9039326588', 'amount': 550.08},
    {'id_ref': '9039649875', 'amount': 284.4}],
   'total_page_amount': 2550.48}],
 'Total amount ref': 2550.48,
 'Total amount max': 2550.48,
 'Total amount keyword': 2550.48,
 'Final Total': 2550.48,
 'Confidence': 100.0}
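The amounts in the output are Python floats. A quick way to sanity-check a result, sketched here with the Case 2 values copied from the output above (other totals omitted), is to re-add each page with decimal.Decimal, converting through str() so the decimal value matches what was printed, and compare it with 'total_page_amount'. check_page_totals and to_money are hypothetical helpers, not part of the demo.

```python
from decimal import Decimal

# Case 2 result, copied from the output above (other totals omitted).
case2 = {'References': [{'page': 0,
   'references': [{'id_ref': '9039569321', 'amount': 474.48},
    {'id_ref': '9039111258', 'amount': 532.08},
    {'id_ref': '9039332845', 'amount': 709.44},
    {'id_ref': '9039326588', 'amount': 550.08},
    {'id_ref': '9039649875', 'amount': 284.4}],
   'total_page_amount': 2550.48}],
 'Final Total': 2550.48}


def to_money(value: float) -> Decimal:
    # str() yields the shortest repr that round-trips ("284.4"), so the
    # Decimal carries exactly the digits that were printed.
    return Decimal(str(value))


def check_page_totals(result: dict) -> bool:
    """True if every page's amounts add up exactly to its total_page_amount."""
    for page in result["References"]:
        expected = to_money(page["total_page_amount"])
        actual = sum((to_money(r["amount"]) for r in page["references"]),
                     Decimal("0"))
        if actual != expected:
            return False
    return True


print(check_page_totals(case2))  # True
```

Comparing Decimals avoids false alarms from binary floating-point rounding, which can make a float sum differ from the printed total in the last digits.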

Case 3

Input document

[Figure: input document]

Please AI, extract text from the input document

[Figure: text extracted from the input document]

Please AI, find the references and amounts

[Figure: references and amounts found in the document]

Finally, get the result as structured data

Processing file Slides_Resources/sample3.pdf
Execution time: 2.433074951171875
{'References': [{'page': 0,
   'references': [{'id_ref': '9039123456', 'amount': 670.98},
    {'id_ref': '9039123457', 'amount': 332.47},
    {'id_ref': '9039123321', 'amount': 679.5},
    {'id_ref': '9039123654', 'amount': 350.23},
    {'id_ref': '9039123568', 'amount': 352.0},
    {'id_ref': '9039312458', 'amount': 349.16},
    {'id_ref': '9039485689', 'amount': 1352.96},
    {'id_ref': '9039411533', 'amount': 335.3}],
   'total_page_amount': 4422.6}],
 'Total amount ref': 4422.6,
 'Total amount max': 4422.6,
 'Total amount keyword': 4422.6,
 'Final Total': 4422.6,
 'Confidence': 100.0}
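The structured result is nested by page. If a downstream RPA step (for example, a bot that posts payments to an accounting system) expects flat rows instead, the 'References' list can be flattened into CSV. A minimal sketch with the standard csv module; to_csv and the column names are assumptions, and the sample holds only the first and third references of Case 3.

```python
import csv
import io


def to_csv(result: dict) -> str:
    """Flatten the per-page 'References' structure into one CSV row per reference."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "id_ref", "amount"])
    for page in result["References"]:
        for ref in page["references"]:
            # id_ref stays a string; amounts are written with exactly two decimals.
            writer.writerow([page["page"], ref["id_ref"], f"{ref['amount']:.2f}"])
    return buffer.getvalue()


sample = {"References": [{"page": 0, "references": [
    {"id_ref": "9039123456", "amount": 670.98},
    {"id_ref": "9039123321", "amount": 679.5},
]}]}
print(to_csv(sample))
```

Keeping id_ref as a string preserves any leading zeros, and the fixed two-decimal format writes 679.50 where the printed output shows 679.5.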