
Mining information from geoscience documents presents numerous challenges due to the specialized terminology and intricate relationships inherent to the field. Traditional language models often struggle to efficiently retrieve and interpret this complex information. Large Language Models (LLMs) like GPT-4 have shown significant capabilities in contextual comprehension but face limitations in specialized domains such as geosciences. Recent advancements, such as Retrieval-Augmented Generation (RAG) and its extension GraphRAG, aim to improve the quality of responses by integrating external knowledge bases and graph-based structures that capture complex entity relationships.
Our objective is to develop a robust chatbot that enhances the retrieval and use of geoscientific information. This involves integrating cutting-edge RAG techniques, graph-enhanced retrieval, and agentic architectures (based on the ReAct framework) to address challenges in geoscience data mining. The ultimate goal is to improve LLM performance in processing specialized geoscientific texts.
To achieve this, a set of progressive enhancements was implemented within an agentic architecture:
Classic RAG: A baseline approach that combines LLMs with external knowledge bases.
Reranking Mechanism: A refinement step that reorders retrieved documents by relevance to improve response quality.
GraphRAG: A graph-based retrieval system that captures complex relationships in geoscientific texts.
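The three retrieval stages listed above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the toy corpus, the hand-written entity graph, the bag-of-words scoring, and the function names (retrieve, rerank, graph_expand) are hypothetical stand-ins and do not describe the system that was actually built, which would rely on embeddings, a trained reranker, and an extracted knowledge graph.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (illustrative sentences, not taken from the reports).
DOCS = {
    "d1": "The reservoir sandstone has high porosity suitable for CO2 storage.",
    "d2": "The caprock is a low-permeability claystone sealing the reservoir.",
    "d3": "Seismic surveys mapped faults crossing the caprock.",
}

# Hypothetical entity graph: each entity maps to the entities it is related to.
GRAPH = {
    "reservoir": {"caprock", "porosity"},
    "caprock": {"reservoir", "faults"},
    "faults": {"caprock"},
    "porosity": {"reservoir"},
}


def tokens(text):
    """Lowercase word tokens; a stand-in for a real tokenizer or embedder."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Classic RAG step: return the k documents most similar to the query."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, tokens(DOCS[d])), reverse=True)
    return ranked[:k]


def rerank(query, doc_ids):
    """Reranking step: reorder candidates with a second scorer.

    The scorer here is the fraction of query terms the document covers;
    a real system would typically use a cross-encoder instead.
    """
    q = set(tokens(query))

    def coverage(doc_id):
        return len(q & set(tokens(DOCS[doc_id]))) / len(q) if q else 0.0

    return sorted(doc_ids, key=coverage, reverse=True)


def graph_expand(query, doc_ids):
    """Graph step: append documents that mention neighbours of query entities."""
    query_entities = {t for t in tokens(query) if t in GRAPH}
    neighbours = set().union(*(GRAPH[e] for e in query_entities))
    extra = [
        d for d in DOCS
        if d not in doc_ids and neighbours & set(tokens(DOCS[d]))
    ]
    return list(doc_ids) + extra


if __name__ == "__main__":
    question = "What seals the reservoir?"
    candidates = rerank(question, retrieve(question, k=1))
    print(graph_expand(question, candidates))
```

In this sketch the graph step is what brings in the document about faults crossing the caprock: it shares no useful terms with the question, but "caprock" is a neighbour of "reservoir" in the graph, which is the kind of relationship a purely similarity-based retriever can miss.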
The chatbot leverages the ReAct framework for dynamic interaction and reasoning.
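As a rough illustration of the ReAct pattern (alternating a reasoning step, a tool call, and an observation until an answer is produced), the loop can be sketched as follows. The tool search_reports, the scripted_policy function that stands in for the LLM, and the canned note are all hypothetical; in a real agent the policy would be an LLM call and the tool would query the retrieval backend.

```python
def search_reports(query):
    """Hypothetical tool: a stand-in for the retrieval backend."""
    notes = {"caprock": "The caprock is a low-permeability claystone."}
    return next((v for k, v in notes.items() if k in query.lower()), "No result.")


# Registry of tools the agent is allowed to call (hypothetical).
TOOLS = {"search_reports": search_reports}


def scripted_policy(question, trace):
    """Stand-in for the LLM: returns (thought, action, argument).

    It searches once, then finishes with the last observation.
    """
    observations = [text for kind, text in trace if kind == "observation"]
    if not observations:
        return ("I need evidence from the reports.", "search_reports", question)
    return ("I have enough evidence to answer.", "finish", observations[-1])


def react(question, policy, max_steps=5):
    """Run a thought -> action -> observation loop until the policy finishes."""
    trace = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)
        trace.append(("thought", thought))
        if action == "finish":
            return arg, trace
        observation = TOOLS[action](arg)
        trace.append(("action", f"{action}({arg!r})"))
        trace.append(("observation", observation))
    return "No answer within the step budget.", trace


if __name__ == "__main__":
    answer, trace = react("What is the caprock made of?", scripted_policy)
    for kind, text in trace:
        print(f"{kind}: {text}")
    print("answer:", answer)
```

The max_steps bound is a design choice of this sketch: it keeps the loop from running indefinitely if the policy never decides to finish.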
The solution was tested using reports from the PilotStrategy project (Horizon 2020), focusing on CO₂ storage and geological studies.
The chatbot was evaluated in three configurations: Classic RAG, RAG with Reranking, and GraphRAG. Performance was benchmarked on a comprehensive set of technical questions. The GraphRAG configuration with GPT-4o significantly outperformed the others, achieving an accuracy score of 85/100, 17% higher than the reranking configuration. This superior performance was attributed to the model’s ability to understand and process complex relationships, extract precise information, and interpret technical terminology effectively.
Future work will focus on:
Testing GraphRAG with fine-tuned geoscience-specific LLMs.
Exploring the use of expert-generated graphs to further enhance retrieval.
Extending the methodology to other technical domains to validate its generalizability and scalability.