An Anomaly Detection Method for Ocean Data Based on Curve Fitting and Dynamic Thresholding

宋巍; 丁水鑫; 董明媚; 岳心阳; 杨扬; 张文博

doi:10.12362/j.issn.1671-6647.20240201001

Abstract: Current quality control methods for ocean data suffer from limited accuracy and do not fully account for correlations among variables. To address these problems, this paper proposes an anomaly detection method for ocean data based on curve fitting and dynamic thresholding, aimed at improving the detection of anomalous values in sea temperature profile data. The method combines a density clustering algorithm, curve fitting, and an adaptive threshold adjustment strategy. First, a density clustering algorithm partitions the sea temperature data, and the largest cluster is selected and assumed to approximate the set of normal data. Next, curve fitting is used to model the relationship between sea temperature and water depth within the largest cluster, yielding a curve function from which the residuals between the original data and the predicted values are computed. Finally, the maximum temperature difference at each water depth is calculated, and a model relating temperature difference to water depth is built to adjust the decision threshold dynamically, so that anomalous points can be detected accurately. Experiments on fixed-station sea temperature profile observations from three regions of the western Pacific show that the proposed method clearly outperforms both current operational quality control methods and machine learning techniques, with an F1 score as high as 99.53%. The method improves the accuracy of anomaly detection in sea temperature data and provides technical support for marine scientific research.
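The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, all parameter values, the hand-written DBSCAN-style clustering, the polynomial curve model, and the linear threshold model built from per-bin maximum residuals are assumptions chosen for illustration, and the profile is synthetic.

```python
import numpy as np


def largest_dense_cluster(points, eps, min_pts):
    """DBSCAN-style clustering; returns a boolean mask of the largest cluster."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = dist <= eps
    core = neighbors.sum(axis=1) >= min_pts  # neighbor count includes the point itself
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster from core points only
            j = stack.pop()
            for k in np.flatnonzero(neighbors[j]):
                if labels[k] == -1:
                    labels[k] = cluster
                    if core[k]:
                        stack.append(k)
        cluster += 1
    if cluster == 0:
        return np.zeros(n, dtype=bool)
    sizes = np.bincount(labels[labels >= 0], minlength=cluster)
    return labels == sizes.argmax()


def detect_anomalies(depth, temp, eps=0.1, min_pts=5, degree=5,
                     n_bins=5, margin=3.0, floor=0.05):
    """Flag points whose fit residual exceeds a depth-dependent threshold."""
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    z = depth / depth.max()  # normalized depth in [0, 1]
    t = (temp - temp.min()) / (temp.max() - temp.min())

    # Stage 1: treat the largest dense cluster as approximately normal data.
    normal = largest_dense_cluster(np.column_stack([z, t]), eps, min_pts)

    # Stage 2: fit temperature against depth (assumed polynomial model).
    coeffs = np.polyfit(z[normal], temp[normal], degree)
    residual = np.abs(temp - np.polyval(coeffs, z))

    # Stage 3: model the largest normal residual per depth bin as a linear
    # function of depth, then scale it into a dynamic threshold (assumption).
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)
    centers, max_res = [], []
    for b in range(n_bins):
        in_bin = normal & (bin_idx == b)
        if in_bin.any():
            centers.append(z[in_bin].mean())
            max_res.append(residual[in_bin].max())
    slope, intercept = np.polyfit(centers, max_res, 1)
    threshold = margin * np.maximum(slope * z + intercept, floor)
    return residual > threshold


# Synthetic profile (hypothetical data): exponential decay plus small noise.
rng = np.random.default_rng(0)
depth = np.linspace(0.0, 500.0, 200)
temp = 5.0 + 23.0 * np.exp(-depth / 150.0) + rng.normal(0.0, 0.05, depth.size)
injected = [60, 120, 180]
temp[injected] += 8.0  # three artificial spikes
flagged = np.flatnonzero(detect_anomalies(depth, temp))
print(flagged)
```

In this sketch the threshold varies with depth because it is derived from the residuals of the presumed-normal cluster rather than from a single global constant. The `margin` and `floor` parameters are illustrative safeguards and would need tuning on real profiles.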
