Data crunch toolbox - Assisted screening of document collections

Browsing massive, imperfectly sorted collections of files and documents to retrieve key information efficiently is an increasingly common need in industry. On many occasions, these datasets have been accumulated over decades and archived in a disorganized way. Meanwhile, standard document management applications lack interactivity and flexibility, which usually leads to complex archiving procedures and cumbersome knowledge exploration.


To address this data management problem, we implemented two distinct workflows. The first is an integrated framework built on the ELK stack (Elasticsearch, Logstash, Kibana), which relies on a database and indexing to provide quick searches, dashboards and views. The second is a Python toolbox containing only the Python tools we developed.

We focus on four goals: (1) automate the screening of large accumulations of files to generate an overview of the available knowledge, (2) accelerate the categorization of geoscience documents into thematic classes, (3) assist in associating the available reports with the corresponding wells in mature field studies and regional syntheses, and (4) facilitate the extraction of images of interest and related information from these reports.
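As an illustration of goal (3), the sketch below shows one simple way a report could be associated with known wells: each well name is turned into a regular expression that tolerates different separators and letter case. This is only a minimal sketch under our own assumptions, not the toolbox's actual matching logic; the function names `well_pattern` and `match_wells` and the example well names are hypothetical.

```python
import re


def well_pattern(well_name):
    """Build a case-insensitive regex that tolerates separator variations.

    Hypothetical helper: "ABC-12" should also match "abc 12", "ABC12" or "ABC_12".
    """
    # Split the name into alphabetic and numeric tokens, dropping separators.
    tokens = re.findall(r"[A-Za-z]+|\d+", well_name)
    if not tokens:
        raise ValueError(f"no usable characters in well name: {well_name!r}")
    # Allow any run of common separators (or none) between tokens.
    body = r"[\s\-_/.]*".join(re.escape(token) for token in tokens)
    # The lookarounds prevent "ABC-1" from matching inside "ABC-12".
    return re.compile(
        r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE
    )


def match_wells(text, well_names):
    """Return the well names (in input order) that are mentioned in the text."""
    return [name for name in well_names if well_pattern(name).search(text)]


if __name__ == "__main__":
    report = "Final report for well abc 12, drilled in 1987."
    print(match_wells(report, ["ABC-12", "ABC-1", "XYZ-3"]))
```

A real association step would probably combine such name matching with other evidence (file paths, metadata, coordinates), which is why the matching criteria are kept in a single small function that is easy to replace.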

With our solution, we achieved significant efficiency gains in knowledge mining across massive document collections. The flexible Python toolbox has been validated on more than 50 000 documents. In addition, the integrated solution for visual exploration has been validated on selected subsets of more than 500 documents.

Our technology can be easily and quickly customized. Current workflows can be adjusted with new keywords and new matching criteria. New workflows can also be created for different document classes or to extract other elements of interest.
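To make this kind of customization concrete, here is a minimal sketch of a keyword-based categorizer in which both the keyword lists and the matching criterion (a minimum number of hits) are plain parameters. It is an illustration under our own assumptions rather than the toolbox's implementation: the class names, keywords, and the functions `score_document` and `classify` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical thematic classes and keywords, chosen for illustration only.
CLASS_KEYWORDS = {
    "geochemistry": ["toc", "rock-eval", "vitrinite"],
    "petrophysics": ["porosity", "permeability", "well log"],
    "sedimentology": ["facies", "core description", "grain size"],
}


def score_document(text, class_keywords=CLASS_KEYWORDS):
    """Count whole-word, case-insensitive keyword occurrences per class."""
    lowered = text.lower()
    scores = Counter()
    for label, keywords in class_keywords.items():
        scores[label] += 0  # make sure every class appears in the result
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            scores[label] += len(re.findall(pattern, lowered))
    return scores


def classify(text, class_keywords=CLASS_KEYWORDS, min_hits=1):
    """Return the best-scoring class, or "unclassified" below min_hits."""
    scores = score_document(text, class_keywords)
    if not scores:
        return "unclassified"
    label, hits = max(scores.items(), key=lambda item: item[1])
    return label if hits >= min_hits else "unclassified"


if __name__ == "__main__":
    print(classify("Porosity and permeability measured on core plugs"))
    # A new workflow only needs a different keyword dictionary.
    print(classify("3D seismic survey", {"geophysics": ["seismic"]}))
```

Keeping the keywords in a dictionary means a new document class can be added without touching the scoring code, and `min_hits` stands in for a stricter or looser matching criterion.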

Join TELLUS Share and...

  1. Access detailed technical information on the TELLUS TOOLS prototypes, benefit from live demos and open discussions
  2. Listen, drive and follow IFPEN initiatives on the digital transformation of geosciences
  3. Receive quarterly newsletters for worldwide scientific intelligence on this fast-paced field
  4. ...

Check out all the benefits of TELLUS Share membership