
Clustering well-log data into electrofacies using machine learning is a well-established problem in petrophysics and data science that has gained relevance over the last decade. However, many publications report challenges related to the dataset, the optimization of the number of clusters, and the handling of spatial autocorrelation.
Our objective is to propose an automated workflow that addresses these shortcomings through a purely unsupervised approach, cluster-number optimization, and outlier removal based on spatial autocorrelation.
We follow a five-step process.
1. Preprocessing: The well-log data are rescaled between 0 and 1, and, if too large to handle, reduced to key features using principal component analysis (PCA).
2. Clustering: A Gaussian Mixture Model (GMM) is fitted to the preprocessed data for a range of cluster numbers, each yielding a separate clustering result.
3. Internal validation: Each clustering result is assessed using silhouette analysis, which yields a performance score per clustering. The optimal number of clusters is selected based on these silhouette scores.
4. Outlier removal: Outliers are iteratively removed by relaxation, which relabels datapoints that are likely misclassified based on the classifications of their spatial neighbors.
5. External validation: If a ground truth is available, the relaxed classification is evaluated using standard clustering-distance measures, such as purity and Rand scores.
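Steps 1 to 3 above can be sketched with scikit-learn as follows. This is a minimal illustration, not the implementation used in this work: the function name `select_gmm_clustering`, the candidate range of cluster numbers, the rule of keeping the highest mean silhouette score, and the synthetic three-log input are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler


def select_gmm_clustering(logs, k_values=range(2, 9), n_components=None, seed=0):
    """Hypothetical helper: rescale, optionally reduce, fit one GMM per k, score each."""
    # Step 1: rescale every log to [0, 1]; optionally reduce with PCA.
    X = MinMaxScaler().fit_transform(logs)
    if n_components is not None:
        X = PCA(n_components=n_components).fit_transform(X)

    scores, labelings = {}, {}
    for k in k_values:
        # Step 2: one GMM per candidate number of clusters.
        labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(X)
        if len(np.unique(labels)) < 2:
            continue  # the silhouette score needs at least two populated clusters
        # Step 3: mean silhouette score of this clustering.
        scores[k] = silhouette_score(X, labels)
        labelings[k] = labels

    # Assumed selection rule: keep the k with the highest mean silhouette score.
    best_k = max(scores, key=scores.get)
    return best_k, labelings[best_k], scores


# Synthetic stand-in for well-log samples (rows = depth samples, columns = logs).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [10.0, 0.0, 5.0]])
logs = np.vstack([c + 0.3 * rng.standard_normal((100, 3)) for c in centers])

best_k, labels, scores = select_gmm_clustering(logs, k_values=range(2, 6))
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

Passing `n_components` switches on the PCA reduction, which the workflow applies only when the dataset is too large to handle directly.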
We tested our methodology on the public dataset from well Lauda-1, located in the Northern Carnarvon Basin in Australia, which includes a ground truth based on an expert classification. Our methodology identified 6 classes as optimal and improved external validation scores by about 5 % compared with a classical unrelaxed k-means algorithm.
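The relaxation and external validation steps (4 and 5) can be illustrated on a toy depth-ordered label sequence. The sketch below is only one possible reading of the relaxation step: the majority-vote rule over a fixed depth window, the `half_window` and `max_iter` parameters, and the toy labels are assumptions for illustration and are not taken from the Lauda-1 experiment.

```python
from collections import Counter
from itertools import combinations


def relax_labels(labels, half_window=1, max_iter=10):
    """Relabel samples that disagree with a strict majority of their depth neighbors."""
    current = list(labels)
    for _ in range(max_iter):
        updated = list(current)
        for i in range(len(current)):
            lo = max(0, i - half_window)
            hi = min(len(current), i + half_window + 1)
            neighbors = current[lo:i] + current[i + 1:hi]
            if not neighbors:
                continue
            label, count = Counter(neighbors).most_common(1)[0]
            # Change the label only if more than half of the neighbors agree on another one.
            if label != current[i] and count > len(neighbors) / 2:
                updated[i] = label
        if updated == current:
            break  # converged: no sample was relabeled in this pass
        current = updated
    return current


def purity(truth, predicted):
    """Fraction of samples belonging to the dominant true class of their cluster."""
    clusters = {}
    for t, p in zip(truth, predicted):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(truth)


def rand_index(truth, predicted):
    """Fraction of sample pairs on which the two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (predicted[i] == predicted[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy example: one isolated misclassified sample inside a homogeneous interval.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
raw = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
relaxed = relax_labels(raw)

print(purity(truth, raw), purity(truth, relaxed))
print(rand_index(truth, raw), rand_index(truth, relaxed))
```

In this toy case the isolated label is absorbed by its neighbors, so both scores rise after relaxation. A wider `half_window` smooths more aggressively and can shift facies boundaries, so the window size would need tuning on real logs.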
To validate our methodology further, more datasets (particularly those accompanied by ground-truth classifications) will be tested, additional clustering methods will be explored, and denoising methods will be evaluated at the preprocessing stage to improve the final results. Moreover, the approach could be improved by including horizontal spatial autocorrelation in addition to autocorrelation in depth, which would require extending the relaxation step to several wells.