Rail track surface defect detection algorithm integrating MobileNetv3 and Transformer

Shanping Ning; Wenxing Wu; Yunlin Wu; Mingzhen Jiang

doi:10.1117/12.3038836

21 October 2024

Proceedings Volume 13401, International Conference on Automation and Intelligent Technology (ICAIT 2024); 1340108 (2024) https://doi.org/10.1117/12.3038836
Event: 2024 International Conference on Automation and Intelligent Technology (ICAIT 2024), 2024, Wuhan, China

Abstract

Surface damage on railway rails is a critical factor affecting train safety. To address the low efficiency and poor precision in detecting small defects such as cracks and rust, a rail damage detection algorithm combining MobileNetv3 and Transformer is proposed. The algorithm embeds spatial coordinate information into channel attention and integrates the resulting CA-Bneck module into MobileNetv3 to enhance feature extraction and generalization. Then, using MobileNetv3, CA-Bneck, and the Transformer encoding module, a streamlined backbone network named MobileNetV3-CATr is constructed. Finally, a BiFPN-Lite module is added to capture more defect features without increasing complexity, and the YOLO Head outputs the rail damage information. Experiments on our rail dataset show that the model achieves a mAP of 93.8% at 19.5 FPS, outperforming YOLOv5 by 6.5%, enabling high-precision detection of rail surface damage.

1. INTRODUCTION

Railways are key to China’s development and defense. By 2035, the network will expand to 200,000 km[1]. More trains mean more wear on rails, affecting safety and comfort. Accurate detection of rail damage is crucial for safety and reducing accidents[2].

Researchers worldwide have explored magnetic flux leakage, sensors, and electromagnetic detection for track damage, but high hardware costs limit their use. With the rise of deep learning in computer vision, new object detection frameworks are changing this field. Detection algorithms fall into two categories: two-stage (e.g., Faster R-CNN) and one-stage (e.g., YOLO), both offering new approaches for railway safety. Ye Yanfei et al. [3] enhanced YOLOv5 with attention mechanisms, reaching 76.9% mAP. He Qing et al. [4] improved YOLOv3 for recognizing B-scan images of rail defects, incorporating SE attention and extra layers, achieving 92.3% accuracy. Yang Jiajia et al. [5] introduced a lightweight YOLOX-based method with MobileNetv3, boosting detection to 92.17%, accuracy to 90.92%, speed to 115 fps, and reducing model size by 80%.

In summary, despite advances in rail surface defect recognition, current methods still suffer from slow detection speed, high cost, and both false and missed detections, making it difficult to meet application requirements. This paper proposes a detection algorithm that combines MobileNetv3 and Transformer, integrating CA-Bneck, Transformer encoding, and BiFPN-Lite modules to construct an efficient feature extraction network. The YOLO Head outputs the detection results, and the combination improves the accuracy and efficiency of detecting small defects on railway rails.

2. ESTABLISHMENT OF A DETECTION MODEL FOR SURFACE DEFECTS ON RAILWAY RAILS

2.1 Detection model combining MobileNetV3 and Transformer

By integrating the parallel design of MobileNetV3 and Transformer, our model seamlessly fuses local and global features, significantly bolstering its performance. As illustrated in Figure 1, the framework comprises a backbone feature network, SPPF module, and YOLO Head. Specifically, the backbone network, dubbed MobileNetV3-CATr, integrates MobileNetV3, CA-Bneck, and Transformer modules to extract crucial rail defect features. Initially, the raw rail defect image is fed into MobileNetV3-CATr for feature extraction. Next, the SPPF module leverages varying max pooling kernels to process the defect feature maps, further enhancing feature fusion and addressing multi-scale defects. Finally, a BiFPN-Lite module is employed, and the YOLO Head precisely outputs defect details like category, location, and confidence level.
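The SPPF step described above can be illustrated with a minimal, dependency-free sketch. It assumes the SPPF layout popularized by YOLOv5, in which three 5×5 stride-1 max pools are chained (equivalent to pooling with 5×5, 9×9, and 13×13 windows) and their outputs are concatenated with the input; the paper does not state its kernel sizes, and the 1×1 convolutions that normally surround the pooling are omitted. The names `max_pool_same` and `sppf` are hypothetical.

```python
def max_pool_same(fmap, kernel=5):
    """Stride-1 max pooling that keeps the map size (out-of-range cells are ignored)."""
    height, width = len(fmap), len(fmap[0])
    pad = kernel // 2
    pooled = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [
                fmap[a][b]
                for a in range(max(0, i - pad), min(height, i + pad + 1))
                for b in range(max(0, j - pad), min(width, j + pad + 1))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def sppf(fmap, kernel=5):
    """Return the input plus three successively pooled copies (assumed SPPF layout)."""
    p1 = max_pool_same(fmap, kernel)
    p2 = max_pool_same(p1, kernel)
    p3 = max_pool_same(p2, kernel)
    return [fmap, p1, p2, p3]  # a real SPPF concatenates these along the channel axis


# Usage: one bright pixel on a 7x7 map spreads over growing receptive fields.
feature = [[0.0] * 7 for _ in range(7)]
feature[3][3] = 1.0
branches = sppf(feature)
```

Each branch sees a larger neighbourhood of the same feature map, which is how a single module can respond to defects of different sizes.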

Figure 1.

Overall Diagram of the Detection Model

2.2 MobileNetV3-CATr

MobileNetV3-CATr integrates MobileNetV3, the CA-Bneck module, and the Transformer encoding module. By replacing the SE module in the original Bneck with coordinate attention (yielding the CA-Bneck module), MobileNetV3 can focus more precisely on the damaged areas of the rail surface during feature extraction, improving the accuracy of surface defect detection. Furthermore, a Transformer encoding module is added to the end of MobileNetV3 to strengthen the capture of global information, further improving the model’s ability to recognize small defect features.

2.2.1 MobileNetV3-CA

MobileNetv3’s Bneck module uses SE attention, which enhances performance but lacks positional information. CA-Bneck replaces SE with coordinate attention (CA), which considers both channel and positional information. This reduces information loss, captures spatial dependencies, and locates defects more precisely, improving damage detection accuracy[6].

Figure 2 shows how the CA module integrates horizontal and vertical features to generate direction-aware maps. By decoupling global pooling into horizontal and vertical pooling, it encodes features along each direction independently, creating direction-sensitive maps that preserve positional data. These operations correspond to the X Avg Pool and Y Avg Pool steps in the CA block. The two pooled maps are then merged along the spatial dimension, compressed via 1×1 convolutions, and fused using h-sigmoid activation, ensuring that the output maps retain directional sensitivity and match the input channels. The module’s output is:

Figure 2.

Structure of CA-Bneck

In the equation, represents the size of the input feature map, , represent the weights for the two spatial directions respectively.
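As a rough illustration of the directional pooling and reweighting described above, the sketch below follows the standard coordinate attention form y_c(i, j) = x_c(i, j) · g^h_c(i) · g^w_c(j), where g^h and g^w are the attention weights along the two spatial directions. This notation is assumed from the general coordinate attention literature rather than recovered from the paper, and the learned 1×1 convolutions are deliberately left out, so the gates here are simply the h-sigmoid of the pooled averages. The function names are hypothetical.

```python
def h_sigmoid(value):
    """Hard sigmoid, ReLU6(x + 3) / 6, as used in MobileNetV3."""
    return min(max(value + 3.0, 0.0), 6.0) / 6.0


def coordinate_attention(x):
    """Reweight a [channels][height][width] nested list by per-row and per-column gates.

    Simplification (assumption): the learned 1x1 convolutions of the real CA block
    are omitted, so each gate is computed directly from its pooled average.
    """
    output = []
    for channel in x:
        height, width = len(channel), len(channel[0])
        # Pool along the width: one average per row (horizontal direction).
        row_avg = [sum(channel[i]) / width for i in range(height)]
        # Pool along the height: one average per column (vertical direction).
        col_avg = [sum(channel[i][j] for i in range(height)) / height for j in range(width)]
        g_h = [h_sigmoid(v) for v in row_avg]
        g_w = [h_sigmoid(v) for v in col_avg]
        output.append(
            [[channel[i][j] * g_h[i] * g_w[j] for j in range(width)] for i in range(height)]
        )
    return output


# Usage: a single-channel 2x2 map whose second row is strongly activated.
attended = coordinate_attention([[[0.0, 0.0], [3.0, 3.0]]])
```

Because each gate depends on only one spatial coordinate, the output keeps track of where along each axis the strong responses occur, which plain channel attention such as SE cannot do.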

2.2.2 MobileNetv3-CA-Transformer

The images of surface defects on railway rails often suffer from sparsity issues, leading to poor detection performance for subtle defects with high similarity. By introducing the Transformer module, we enhance the model’s ability to capture global information while paying closer attention to the defect areas on the rail surface. This allows the model to further learn from regions with weaker defect features, thereby reducing the occurrence of misdetections and missed detections of surface defects on railway rails.

Figure 3.

Structure Diagram of the Transformer Module

In the context of rail surface defect detection, the self-attention and multi-head attention mechanisms of the Transformer ensure that the model focuses on defect information. By integrating the Transformer module at the end of MobileNetv3-CA, we leverage low-resolution feature maps to reduce computational and storage costs. This not only improves the mean average precision (mAP) but also reduces the network size, enabling it to excel in detecting dense and large defects and improving the efficiency of rail surface defect detection. As shown in Figure 3, the Transformer module comprises two sublayers: a multi-head attention layer and a multi-layer perceptron (MLP). The former enhances attention to the current pixel and its contextual semantics, while the latter serves as a fully connected layer providing the functionality of a feedforward neural network. The module also includes LayerNorm and Dropout layers to enhance network integration and prevent overfitting[7].
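The two-sublayer structure described above can be sketched as follows. The ordering of LayerNorm relative to each sublayer is not stated in the text, so a pre-norm arrangement with residual connections is assumed here. Dropout is omitted because it is inactive at inference time, and LayerNorm is shown without its learnable scale and shift. The `attention` and `mlp` arguments are hypothetical placeholders for the real sublayers.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def encoder_block(tokens, attention, mlp):
    """Assumed pre-norm encoder block: x + Attn(LN(x)), then y + MLP(LN(y)).

    tokens:    list of token vectors (flattened feature-map positions)
    attention: callable mapping a list of tokens to a list of tokens
    mlp:       callable mapping one token vector to one token vector
    """
    attended = attention([layer_norm(t) for t in tokens])
    tokens = [[a + b for a, b in zip(t, u)] for t, u in zip(tokens, attended)]
    expanded = [mlp(layer_norm(t)) for t in tokens]
    return [[a + b for a, b in zip(t, u)] for t, u in zip(tokens, expanded)]


# Usage: with sublayers that output zeros, the residual path returns the input unchanged.
zero_attention = lambda toks: [[0.0] * len(t) for t in toks]
zero_mlp = lambda tok: [0.0] * len(tok)
passthrough = encoder_block([[1.0, 2.0], [3.0, 4.0]], zero_attention, zero_mlp)
```

The residual connections mean each sublayer only has to learn a correction to its input, which helps keep the features extracted by the MobileNetV3 stages intact.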

Figure 4.

Diagram of the BiFPN-Lite Network Structure

To utilize the Transformer module, this paper flattens the backbone feature maps and applies linear positional encoding. Multi-Head Self-Attention (MHSA) dynamically aggregates information via Q, K, V interactions[8], concatenating multiple heads as:

$$\mathrm{MHSA}(X) = \mathrm{Concat}(H_1, H_2, \ldots, H_M)\, X^{out}$$

In the equation, Concat represents a tensor-level concatenation; X^{out} is a linear transformation matrix; H_m is the result of the m-th of M self-attention heads, obtained through the scaled dot-product attention in MHSA.

Finally, after the MHSA operation, two additional linear transformations are performed to obtain a feature map that focuses more on defects.
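A minimal sketch of the head-wise attention and concatenation is given below. It uses the standard scaled dot-product form softmax(QKᵀ/√d)V for each head. As a simplifying assumption, the learned Q, K, V projections and the final output matrix are replaced by identity mappings, so each head simply attends over its own slice of the token vector. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(q k^T / sqrt(d)) v, on lists of vectors."""
    dim = len(q[0])
    output = []
    for query in q:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in k]
        weights = softmax(scores)
        output.append(
            [sum(weights[j] * v[j][t] for j in range(len(v))) for t in range(len(v[0]))]
        )
    return output


def multi_head_self_attention(tokens, num_heads):
    """Split each token into num_heads slices, attend per slice, then concatenate.

    Assumption: identity Q/K/V and output projections (no learned weights).
    """
    dim = len(tokens[0])
    assert dim % num_heads == 0, "token size must be divisible by the number of heads"
    head_dim = dim // num_heads
    heads = []
    for m in range(num_heads):
        part = [tok[m * head_dim:(m + 1) * head_dim] for tok in tokens]
        heads.append(scaled_dot_product_attention(part, part, part))
    # Concat(H_1, ..., H_M): join the per-head outputs for every token.
    return [sum((head[i] for head in heads), []) for i in range(len(tokens))]


# Usage: four flattened feature-map positions, each an 4-dimensional token, two heads.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
mixed = multi_head_self_attention(tokens, num_heads=2)
```

Every output token is a weighted average over all positions, which is what gives the backbone a global view of the rail surface rather than a local receptive field.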

2.3 BiFPN-Lite module

The BiFPN-Lite module leverages top-down and bottom-up pathways to capture high-level semantic and low-level location information, enhancing multi-scale defect detection on rail surfaces. As seen in Figure 4, the SPPF module processes the 13×13 feature map with multi-scale pooling, and feature maps at the 26×26 and 52×52 scales are also used. BiFPN-Lite weights features of these sizes equally and normalizes them during fusion. The fused 13×13, 26×26, and 52×52 feature layers are then fed into the YOLO Head, whose multi-scale detection heads decode them to produce precise prediction boxes.
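The equal-weight, normalized fusion mentioned above can be sketched as follows. The sketch assumes the normalized-sum form used in BiFPN-style networks, out = Σ wᵢ·fᵢ / (Σ wᵢ + ε), with every wᵢ fixed to 1, and nearest-neighbour upsampling to align scales. Neither the resampling method nor the value of ε is given in the text, and the function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value."""
    upsampled = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        upsampled.append(wide)
        upsampled.append(list(wide))
    return upsampled


def fuse_equal_weight(maps, eps=1e-4):
    """Fuse same-sized 2D maps with equal weights, normalized by the weight sum."""
    weights = [1.0] * len(maps)  # "Lite": fixed equal weights instead of learned ones
    norm = sum(weights) + eps
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, maps)) / norm for j in range(width)]
        for i in range(height)
    ]


# Usage: bring a coarse map up to the finer scale, then fuse the two.
coarse = [[2.0]]
fine = [[4.0, 4.0], [4.0, 4.0]]
fused = fuse_equal_weight([upsample_nearest_2x(coarse), fine], eps=0.0)
```

Fixing the weights removes the learnable fusion parameters of a full BiFPN, which is consistent with the stated goal of adding features without increasing complexity.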

3. EXPERIMENTAL RESULTS AND ANALYSIS

3.1 Data collection and experimental environment

A 10 km railway track segment was selected as the experimental site, and an intelligent inspection vehicle was used to capture numerous high-quality images of track surface defects. Figure 5 displays an original image of the defects.

Figure 5.

Original Image of Rail Surface Defects

The hardware used in this experiment comprises a 13th Gen Intel(R) Core(TM) i7-13700 processor, an NVIDIA RTX A4000 graphics card, and 16 GB of RAM. The software environment consists of the Windows 11 operating system, Python 3.8, CUDA version 13.0, and the PyTorch 1.12.1 deep learning framework. The training iteration count is set to 200, with Mosaic image augmentation enabled and its coefficient set to 0.5. Unlike image classification, surface defect detection requires not only predicting the correct target class but also locating the target. The following metrics are used to evaluate defect detection performance:

(1) Mean Average Precision (mAP) is the primary metric used in this paper to evaluate the model’s performance. It is the average of Average Precisions (APs).
(2) Recall is used to measure the model’s ability to correctly detect targets.
(3) Precision is used to measure the accuracy of the model’s predictions.
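The three metrics can be made concrete with a short sketch. The paper does not state the IoU threshold or the AP interpolation scheme, so the code assumes that each detection has already been labelled as a true or false positive, and it uses all-point interpolation of the precision-recall curve. The function names are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted defects that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of real defects that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from left to right (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Usage: three detections for one class with two ground-truth defects.
ap_example = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```

In this example the false positive ranked second lowers the precision available at full recall, so the AP is 5/6 rather than 1.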

3.2 Analysis of experimental results

To validate the detection performance of the proposed method for rail surface defects, ablation experiments were conducted on a self-constructed dataset using the schemes outlined in Table 1. According to the results in Table 1, the Transformer module effectively extracts contextual defect information from the rail surface, reducing false and missed detections for some defects and enhancing the overall detection performance of the model. The CA-Bneck module enables the model to focus on both spatial and channel information of defects, allowing the backbone feature extraction network to better fit the correlation between the two and thereby facilitating the extraction of rail surface defect features. The mAP of the proposed model is 9.5 percentage points higher than that of the original model (93.8% versus 84.3%), indicating that MobileNetV3-CATr and BiFPN-Lite have a positive effect on the extraction of defect features on the rail surface. This demonstrates that the modules complement one another and together improve detection accuracy.

Table 1.

Comparative Experiments Among Different Models

Scheme	CA	Transformer	BiFPN-Lite	mAP/%	AP/%	Recall/%
1	×	×	×	84.3	44.3	88.9
2	√	×	×	86.8	45.7	89.8
3	×	√	×	89.1	51.6	90.1
4	×	×	√	86.7	49.4	86.7
5	√	√	×	90.2	58.4	91.2
6	√	×	√	91.5	54.8	92.7
7	√	√	√	93.8	60.1	95.4
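The per-scheme gains in Table 1 can be checked with a few lines of arithmetic. The mAP values below are copied from the table; the variable names are illustrative only.

```python
# mAP (%) for each ablation scheme, copied from Table 1.
ablation_map = {1: 84.3, 2: 86.8, 3: 89.1, 4: 86.7, 5: 90.2, 6: 91.5, 7: 93.8}

baseline = ablation_map[1]  # scheme 1: no CA, no Transformer, no BiFPN-Lite

# Gain of every scheme over the baseline, in percentage points.
gains = {scheme: round(value - baseline, 1) for scheme, value in ablation_map.items()}

for scheme, gain in gains.items():
    print(f"Scheme {scheme}: {ablation_map[scheme]:.1f}% mAP ({gain:+.1f} points)")
```

Among the single-module schemes (2 to 4), the Transformer gives the largest gain at 4.8 points, and the full model (scheme 7) reaches the 9.5-point improvement.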

To further demonstrate the effectiveness of our model, we visualized selected experimental results. As shown in Table 2, the proposed algorithm rarely misses defects. Unlike the original model, which is limited to detecting large, clear targets, the enhanced algorithm also detects blurry and dark defects. Moreover, where two defects overlap, the algorithm can distinguish and precisely locate the two nearby targets, including smaller ones within larger ones, thereby improving defect detection accuracy and reducing missed detections.

Table 2.

Visualization Results of Different Schemes

3.3 Experimental comparison of different algorithms

To validate the performance of the proposed method for railway rail surface defect detection, this paper conducts a comparative analysis with other detection methods. As shown in Table 3, the proposed algorithm achieves the highest mAP overall. Compared to mainstream one-stage object detection algorithms such as YOLOv3, YOLOv4, and YOLOv5, it requires the fewest parameters (Params) and achieves higher accuracy. In terms of detection speed, the proposed algorithm achieves 19.5 FPS on devices with lower configurations, which can basically meet the requirements of online real-time detection.

Table 3.

Comparison Results of Different Algorithms

Method	Backbone Network	Params/MB	FPS	mAP/%
SSD	Vgg16	71.5	61.7	88.6
YOLOv3	Darknet53	82.7	59.6	89.1
YOLOv4	CSPDarknet53	72.6	52.4	89.7
YOLOv5	CSPBottleneck+Focus	70.2	51.3	90.2
MobileNetV1-YOLOv5	MobileNetV1	39.1	46.1	82.9
MobileNetV2-YOLOv5	MobileNetV2	38.9	45.5	83.7
MobileNetV3-YOLOv5	MobileNetV3	40.9	42.3	84.3
The algorithm in this paper	MobileNetV3-CATr	65.3	19.5	93.8

4. CONCLUSION

This paper proposes a defect detection algorithm for railway rail surfaces, addressing issues of low accuracy and efficiency. The algorithm integrates MobileNetV3 and Transformer to enhance performance. A new backbone network, MobileNetV3-CATr, reduces complexity while maintaining accuracy. Incorporating the CA module enhances the network’s ability to extract rail surface defect features. Furthermore, the Transformer module extracts contextual semantic information, boosting detection accuracy. BiFPN-Lite replaces PANet, efficiently fusing features across scales and enhancing defect detection accuracy. Experimental results on a self-built dataset show that the algorithm achieves the highest mAP (93.8%) among the compared methods while running at 19.5 FPS for railway rail surface defect detection.

ACKNOWLEDGMENTS

Author Contributions: Conceptualization, Shanping Ning; methodology, Shanping Ning; validation, Shanping Ning, Wenxing Wu, and Yunlin Wu; formal analysis, Shanping Ning; resources, Mingzhen Jiang; data curation, Shanping Ning; writing (original draft preparation), Shanping Ning; writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Special Funds for Scientific and Technological Innovation Strategy in Guangdong Province in 2024 (Program No. pdjh2024b573) and the University Student Scientific and Technological Innovation Project (Program No. GDCP-ZX-2023-031-N6).

REFERENCES

[1] Choi J-Y, Han J-M., “Deep Learning (Fast R-CNN)-Based Evaluation of Rail Surface Defects,” Applied Sciences, 14 (5), 1874 (2024). https://doi.org/10.3390/app14051874

[2] W. Liu, X. Qing and J. Zhou, “A novel image segmentation algorithm based on visual saliency detection and integrated feature extraction,” in 2016 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 1–5 (2016). https://doi.org/10.1109/CESYS.2016.7889899

[3] Ye Yanfei, Cheng Li, Hou Xiangyi, “Recognition and Classification of B-Scan Images of Internal Rail Defects Based on Improved YOLO v5,” Foreign Electronic Measurement Technology, 42 (12), 70–76 (2023).

[4] He Qing, Chen Zhengxing, Wang Qihang, et al., “Research on Recognition of B-Scan Images of Rail Defects Based on Improved YOLO V3,” Journal of the China Railway Society, 44 (12), 82–88 (2022).

[5] Yang Jiajia, Xu Guiyang, Bai Tangbo, “Lightweight Rail Surface Defect Detection Algorithm Based on Improved YOLOX,” Railway Engineering, 63 (7), 34–39 (2023).

[6] Luo H, Cai L, Li C., “Rail Surface Defect Detection Based on an Improved YOLOv5s,” Applied Sciences, 13 (12), 7330 (2023). https://doi.org/10.3390/app13127330

[7] Si C, Luo H, Han Y, Ma Z., “Rail-STrans: A Rail Surface Defect Segmentation Method Based on Improved Swin Transformer,” Applied Sciences, 14 (9), 3629 (2024). https://doi.org/10.3390/app14093629

[8] Wang S, Yan B, Xu X, Wang W, Peng J, Zhang Y, Wei X, Hu W., “Automated Identification and Localization of Rail Internal Defects Based on Object Detection Networks,” Applied Sciences, 14 (2), 805 (2024). https://doi.org/10.3390/app14020805

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Shanping Ning, Wenxing Wu, Yunlin Wu, and Mingzhen Jiang "Rail track surface defect detection algorithm integrating MobileNetv3 and Transformer", Proc. SPIE 13401, International Conference on Automation and Intelligent Technology (ICAIT 2024), 1340108 (21 October 2024); https://doi.org/10.1117/12.3038836

KEYWORDS

Transformers

Object detection

Defect detection

Detection and tracking algorithms

Feature extraction

Head

Performance modeling
