Our client compiles public-domain data that municipal governments publish for public consumption.
The data is extracted from publicly available PDFs, stored in a central database, and served to users as dashboards. The client needed a fundamentally different approach to data processing and management: a solution offering best-in-class processes for PDF extraction, data storage, and business intelligence.
Our solution comprised two parallel work streams. The first designed the future-state end-to-end process. The second configured the technical components that process requires, using the fraXses data fabric.
We deployed fraXses on-premises to host and virtualize the final dataset, which is then delivered to users through the fraXses visualization tool.
The current end-to-end processes are loosely coupled. In the future state, they will be tightly integrated with document management software.
Data enrichment and classification will move to machine learning running on the Basis Tech NLP engine.
As a result, business analysts will be able to run these procedures automatically from a data workbench. This capability depends on the metadata catalogue approach and automation we provided for this client.