Enhancing geoscience document mining with LLMs through GraphRAG integration and agentic architecture

Mining information from geoscience documents presents numerous challenges due to the specialized terminology and intricate relationships inherent to the field. Traditional language models often struggle to efficiently retrieve and interpret this complex information. Large Language Models (LLMs) like GPT-4 have shown significant capabilities in contextual comprehension but face limitations in specialized domains such as geosciences. Recent advancements, such as Retrieval-Augmented Generation (RAG) and its extension GraphRAG, aim to improve the quality of responses by integrating external knowledge bases and graph-based structures that capture complex entity relationships.

Our objective is to develop a robust chatbot that enhances the retrieval and use of geoscientific information. This involves integrating cutting-edge RAG techniques, graph-enhanced retrieval, and an agentic architecture (based on the ReAct framework) to address challenges in geoscience data mining. The ultimate goal is to improve LLM performance in processing specialized geoscientific texts.

To achieve this, a set of progressive enhancements was implemented within an agentic architecture:
Classic RAG: Using a baseline approach that combines LLMs with external knowledge bases.
Reranking Mechanism: Refining the relevance of retrieved documents to improve response quality.
GraphRAG: Incorporating graph-based retrieval systems to capture complex relationships in geoscientific texts.
The chatbot leverages the ReAct framework for dynamic interaction and reasoning.
The solution was tested using reports from the PilotStrategy project (Horizon 2020), focusing on CO₂ storage and geological studies.
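
The first two enhancements listed above (baseline retrieval, then reranking) can be illustrated with a minimal sketch. It is not the project's implementation: the scoring functions, the stopword list, and the passages are invented for illustration, and a production system would more likely use dense embeddings for the first stage and a learned reranker for the second. The point is only to show why a second pass can correct a noisy first-stage ranking.

```python
import math
import re
from collections import Counter

# Invented toy passages, not taken from the PilotStrategy reports.
PASSAGES = [
    "The thickness of the caprock is the key to the seal of the site.",
    "The reservoir sandstone has a porosity of 18 percent.",
    "Seismic surveys were acquired offshore.",
]
QUERY = "What is the porosity of the reservoir sandstone?"

# Hypothetical stopword list used only by the reranking stage.
STOPWORDS = {"what", "is", "the", "of", "a", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Stage 1 (classic RAG): rank passages by raw term-count cosine."""
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def rerank(query, candidates, top_n=1):
    """Stage 2: reorder candidates by coverage of non-stopword query terms."""
    q_terms = set(tokenize(query)) - STOPWORDS
    if not q_terms:
        return candidates[:top_n]

    def coverage(passage):
        return len(q_terms & set(tokenize(passage))) / len(q_terms)

    return sorted(candidates, key=coverage, reverse=True)[:top_n]


def build_prompt(query, context):
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


candidates = retrieve(QUERY, PASSAGES, k=2)
best = rerank(QUERY, candidates, top_n=1)
print(build_prompt(QUERY, best))
```

In this toy case the first stage ranks the caprock passage highest because it shares many function words with the query, and the reranker moves the porosity passage to the top because it ignores those words.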

The chatbot was evaluated across three configurations: Classic RAG, RAG with Reranking, and GraphRAG. Performance was benchmarked using a comprehensive set of technical questions. The results showed that the GraphRAG configuration running on GPT-4o significantly outperformed the others, achieving an accuracy score of 85/100, 17% higher than the reranking configuration. The superior performance was attributed to the model’s ability to understand and process complex relationships, extract precise information, and interpret technical terminology effectively.

Future work will focus on:
Testing GraphRAG with fine-tuned geoscience-specific LLMs.
Exploring the use of expert-generated graphs to further enhance retrieval.
Extending the methodology to other technical domains to validate its generalizability and scalability.

Key features and benefits

Smarter Retrieval via Knowledge Graphs

This component of the chatbot integrates GraphRAG, a graph-based retrieval mechanism, to capture the intricate relationships and specialized terminology found in geoscientific documents. It leverages a knowledge graph to enhance the contextual understanding of user queries, enabling the retrieval of more accurate and contextually appropriate information. This method significantly enhances the model’s ability to interpret complex interrelationships, typical of geoscientific datasets.
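
As a rough illustration of graph-based retrieval, the sketch below stores a knowledge graph as subject-relation-object triples, matches entity names in the query, and collects the facts within a given number of hops. Everything here is an assumption made for illustration: the triples are invented, entity matching is naive substring search, and real GraphRAG pipelines typically build the graph with an LLM and may add steps (such as graph summarization) that are not shown.

```python
from collections import defaultdict, deque

# Invented toy triples (subject, relation, object), not from the project.
TRIPLES = [
    ("caprock", "seals", "reservoir"),
    ("reservoir", "hosted_in", "sandstone formation"),
    ("reservoir", "stores", "CO2"),
    ("fault", "cuts", "caprock"),
    ("seismic survey", "images", "fault"),
]


def build_index(triples):
    """Map every entity to the triples in which it appears."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[subj].append((subj, rel, obj))
        index[obj].append((subj, rel, obj))
    return index


def find_entities(query, index):
    """Naive entity linking: entity names that occur verbatim in the query."""
    lowered = query.lower()
    return [entity for entity in index if entity.lower() in lowered]


def expand(index, seeds, max_hops=1):
    """Breadth-first collection of facts within max_hops of the seed entities."""
    seen = set(seeds)
    facts = []
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in index.get(node, []):
            if triple not in facts:
                facts.append(triple)
            for neighbour in (triple[0], triple[2]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
    return facts


index = build_index(TRIPLES)
seeds = find_entities("What could compromise the caprock?", index)
for subj, rel, obj in expand(index, seeds, max_hops=2):
    print(f"{subj} --{rel}--> {obj}")
```

With two hops, a question about the caprock also surfaces the fault that cuts it and the seismic survey that images that fault, which is the kind of indirect relationship a flat passage search can miss.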

Reasoning and Acting with Agentic AI (ReAct Framework)

Leveraging the ReAct framework, this component introduces an agentic architecture that combines reasoning and acting capabilities. The architecture allows the chatbot to dynamically interact with external knowledge sources, enabling real-time data retrieval and effective reasoning for complex queries. This setup ensures a seamless flow from user queries to refined, contextually accurate answers, particularly when handling the multifaceted nature of geoscience topics.
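
The reason-then-act cycle can be sketched as a loop that alternates model output (a thought plus an action) with tool observations until a final answer appears. The sketch below is a simplified illustration under stated assumptions: `scripted_llm` is a stub that follows a fixed script in place of a real LLM call, `search_reports` is a hypothetical tool with an invented one-entry corpus, and the `Action: tool[argument]` text format is one common convention rather than the project's confirmed prompt format.

```python
import re


def search_reports(query):
    """Hypothetical tool standing in for the retrieval backend."""
    corpus = {
        "caprock": "The caprock is a low-permeability unit above the reservoir.",
    }
    for key, text in corpus.items():
        if key in query.lower():
            return text
    return "No matching passage found."


TOOLS = {"search_reports": search_reports}


def scripted_llm(transcript):
    """Stub replacing a real LLM: acts once, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search_reports[caprock]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {observation}"


ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react(question, llm, tools, max_steps=5):
    """Run the thought/action/observation loop until a final answer or the step limit."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip(), transcript
        match = ACTION_RE.search(reply)
        if match is None:
            break  # the model produced neither an action nor an answer
        name, argument = match.groups()
        tool = tools.get(name)
        observation = tool(argument) if tool else f"Unknown tool: {name}"
        transcript += f"\nObservation: {observation}"
    return None, transcript


answer, transcript = react("What is the caprock?", scripted_llm, TOOLS)
print(transcript)
```

The step limit and the handling of unknown tools are defensive choices for the sketch; swapping the stub for a real model call would leave the loop itself unchanged.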

