Abstract:
As marine ecological research advances, more observations are needed to examine spatial and temporal changes in the structure of plankton communities. However, current artificial intelligence-based plankton classification algorithms, such as multi-layer convolutional neural networks (CNNs), are limited in their ability to capture the morphological diversity of plankton, which leads to relatively low efficiency and accuracy. To address this challenge, this paper proposes an innovative algorithm based on the lightweight vision transformer model MobileViT (Mobile Vision Transformer). The algorithm replaces standard convolutions with depthwise separable convolutions, which effectively reduces the number of parameters and the model's complexity, and it adopts a local-perception attention mechanism that lowers the cost of attention computation, thereby improving computational efficiency. Individual plankton images collected from the South Pacific Ocean in 2018 are used to construct a dataset covering nine plankton categories. The results show that the base model achieves a weighted average accuracy of 92.04% on a manually classified validation set. To further improve classification performance, a step-by-step probability filter is applied at the tail of the model to eliminate misclassifications, raising the weighted average precision to 96.93%. In addition, the improved model is tested on individual plankton images from another profile in the same oceanic region, where it achieves a top-1 accuracy of 93.77%. This paper provides a more efficient and accurate method for marine plankton classification.
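As a rough illustration of the parameter reduction mentioned above, the sketch below contrasts a standard convolution with a depthwise separable convolution in PyTorch. The channel sizes, kernel size, and module names are illustrative assumptions for exposition only, not the configuration used in the paper's model.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) convolution
    followed by a 1x1 (pointwise) convolution, used in place of one standard conv."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        # groups=in_ch makes each filter act on a single input channel (depthwise step).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        # 1x1 convolution mixes channels (pointwise step).
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


# Parameter comparison for hypothetical channel sizes (64 -> 128, 3x3 kernel).
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
separable = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # the standard conv has roughly 8x more weights here
```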