In the course of their activities, companies often accumulate very large numbers of documents stored across multiple folders. Consequently, it can quickly become tedious to find the document relevant to a specific need or to get an exhaustive overview of the accumulated information.
This project therefore aims to facilitate and accelerate the screening of large document collections by combining automated categorization and geolocation of their elements, interactive visualization, and programmable updates.
Our prototype solution is a web-based application composed of a file system crawler (FsCrawler), an indexing database (ElasticSearch) and a customizable dashboard (Kibana). In addition, the document metamodel is enriched through an ElasticSearch Python client.
The resulting tool, named File Screener, appears to be an efficient and flexible way to live-monitor and query document storage systems. Dedicated ingestion pipelines can be readily adapted to integrate geoscience-related and company-specific knowledge. The prototype was successfully applied to file aggregates from exploration studies in Canada and Australia, and easily adjusted to the needs of an oil & gas partner.
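As a rough illustration of the categorization and geolocation enrichment mentioned above, the following minimal Python sketch derives extra metadata fields from a crawled document. It is not the File Screener implementation: the `enrich` function, the `GAZETTEER` and `CATEGORIES` tables, the field names and the approximate coordinates are all hypothetical, and simple keyword matching stands in for whatever categorization logic the actual pipelines use.

```python
import re

# Hypothetical gazetteer: place name -> approximate coordinates (illustrative only).
GAZETTEER = {
    "canada": {"lat": 56.1, "lon": -106.3},
    "australia": {"lat": -25.3, "lon": 133.8},
}

# Hypothetical keyword -> category table standing in for domain knowledge.
CATEGORIES = {
    "seismic": "geophysics",
    "borehole": "drilling",
    "assay": "geochemistry",
}


def enrich(doc):
    """Return extra metadata fields to merge into an indexed document.

    `doc` is assumed to be a dict with optional "path" and "content" keys.
    """
    text = " ".join([doc.get("path", ""), doc.get("content", "")]).lower()
    # Split on anything that is not a letter, so "seismic_report" yields "seismic".
    tokens = set(re.findall(r"[a-z]+", text))
    categories = sorted({cat for kw, cat in CATEGORIES.items() if kw in tokens})
    locations = [
        {"name": name, "location": coords}
        for name, coords in sorted(GAZETTEER.items())
        if name in tokens
    ]
    return {"categories": categories, "locations": locations}


example = {
    "path": "/studies/Canada/seismic_report.pdf",
    "content": "Borehole logs and interpretation notes.",
}
extra_fields = enrich(example)
```

With the Python client, a dictionary such as `extra_fields` could then be merged into the indexed document through a partial update, and a `lat`/`lon` object of this shape is one of the forms accepted by a `geo_point` field, which is what map visualizations in Kibana typically rely on. The index name, document identifier and mapping would depend on the actual deployment.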