496 general conduct research into detection methods for lane recognition #502
@@ -0,0 +1,136 @@
# Summary of Research for lane detection models

- [Overall](#overall)
- [Lane Detection on TuSimple](#lane-detection-on-tusimple)
- [Lane Detection on CULane](#lane-detection-on-culane)
- [Lane Detection on CurveLanes](#lane-detection-on-curvelanes)

## Overall

Lane detection is a computer vision task that involves identifying the boundaries of driving lanes in a video or image of a road scene. The goal is to accurately locate and track the lane markings in real time, even in challenging conditions such as poor lighting, glare, or complex road layouts. [1]

The website PapersWithCode [1] lists benchmarks for different lane detection models. The following sections present the best-performing model on each of the three most common datasets. The models are evaluated using accuracy, precision, recall, and F1-measure.
The formulas are shown in the image below. [3]

![Evaluation metric formulas for lane detection models](/assets/perception/Formula_for_Modelevaluation_lane_detection.png)

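For readers without access to the image, the four metrics can be sketched in Python using their standard confusion-matrix definitions. This is an illustration only: the class and function names are hypothetical, the counts are made up, and the formulas in the image from [3] remain authoritative, since each benchmark defines how true and false positives are counted.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of correct and incorrect predictions (hypothetical container)."""
    tp: int  # lane predicted, lane present
    fp: int  # lane predicted, no lane present
    tn: int  # no lane predicted, no lane present
    fn: int  # no lane predicted, lane present


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def f1_measure(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


# Made-up counts, purely for illustration.
counts = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
print(accuracy(counts), precision(counts), recall(counts), f1_measure(counts))
```

The sketch does not guard against empty denominators (for example, no positive predictions at all); a real evaluation script would need to handle that case.
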
## Lane Detection on TuSimple

The TuSimple dataset consists of 6,408 road images taken on US highways. The image resolution is 1280×720. The dataset is split into 3,626 images for training, 358 for validation, and 2,782 for testing (the TuSimple test set), with images taken under different weather conditions. [2]

The best performing model on TuSimple is called **SCNN_UNet_Attention_PL**. [3]

The study proposes a lane detection pipeline that combines self pre-training with masked sequential autoencoders and fine-tuning of a segmentation model with a customized PolyLoss. In the first stage, randomly masked images serve as inputs, and the neural network is pre-trained with the objective of reconstructing the complete images. In the second stage, the pre-trained weights are transferred to a segmentation network with the same backbone; only the structure of the output layer is adjusted. In this phase, continuous image frames without any masking serve as inputs. The network weights are then further updated and fine-tuned by minimizing PolyLoss via backpropagation.

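The two ingredients of this pipeline can be sketched in plain Python. The block below is a minimal illustration under stated assumptions, not the implementation from [3]: `mask_random_patches` and `poly1_loss` are hypothetical names, the patch size and mask ratio are arbitrary, and the loss is the generic Poly-1 form (cross-entropy plus `epsilon * (1 - p_t)`) rather than the customized PolyLoss used in the paper.

```python
import math
import random


def mask_random_patches(image, patch_size, mask_ratio, rng):
    """Return a copy of `image` (a list of rows) with random square patches set to 0."""
    height, width = len(image), len(image[0])
    patches = [(r, c)
               for r in range(0, height, patch_size)
               for c in range(0, width, patch_size)]
    n_masked = round(mask_ratio * len(patches))
    masked = [row[:] for row in image]  # leave the input untouched
    for r0, c0 in rng.sample(patches, n_masked):
        for r in range(r0, min(r0 + patch_size, height)):
            for c in range(c0, min(c0 + patch_size, width)):
                masked[r][c] = 0.0
    return masked


def poly1_loss(probs, labels, epsilon=1.0):
    """Mean Poly-1 loss for binary labels: cross-entropy + epsilon * (1 - p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
        p_t = min(max(p_t, 1e-7), 1.0)      # avoid log(0)
        total += -math.log(p_t) + epsilon * (1.0 - p_t)
    return total / len(probs)


# Stage 1 input: a toy 4x4 "image" with half of its 2x2 patches masked.
toy_image = [[1.0] * 4 for _ in range(4)]
masked_image = mask_random_patches(toy_image, patch_size=2, mask_ratio=0.5,
                                   rng=random.Random(0))

# Stage 2 objective: Poly-1 loss on per-pixel lane probabilities.
loss = poly1_loss(probs=[0.9, 0.2, 0.7], labels=[1, 0, 1])
```

With `epsilon=0` the loss reduces to ordinary binary cross-entropy, which makes the extra term easy to isolate when experimenting.
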
### SCNN_UNet_Attention_PL Advantages

- Robustness: The model is robust against difficult scenarios such as glare, shadows or obscured lines, which is ideal for simulating realistic road conditions.

- Training efficiency: Due to its fast convergence, it can be trained and implemented with moderate resource requirements.

- Continuous lane tracking: By processing multiple image sequences, the model provides reliable recognition results in dynamic driving scenarios.

### SCNN_UNet_Attention_PL Disadvantages

- No public implementation or code was found.

## Lane Detection on CULane

CULane is a large-scale, challenging dataset for academic research on traffic lane detection. It was collected by cameras mounted on six different vehicles driven by different drivers in Beijing. More than 55 hours of video were collected and 133,235 frames were extracted.
The dataset is divided into 88,880 images for the training set, 9,675 for the validation set, and 34,680 for the test set. The test set is divided into a normal category and 8 challenging categories. [4]

The best performing model on CULane is called **CLRerNet-DLA34**. [5]

The model builds on CLRNet, the best-performing row-based lane detector. Row-based means that each lane instance is represented as a set of x-coordinates at fixed image rows. CLRerNet combines this representation with a new metric, LaneIoU. LaneIoU considers local angle variations in the lane to improve accuracy when comparing model predictions to actual lane positions.

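The row-based representation and the idea of an angle-aware IoU can be sketched as follows. This is a simplified illustration, not the LaneIoU definition from [5]: the function names, the default `row_step` and `lane_width`, the widening of each lane's horizontal extent by the local tilt, and the clipping of negative overlap to zero are all assumptions made for the example.

```python
import math


def local_half_widths(xs, row_step, lane_width):
    """Horizontal half-width per row, widened where the lane is tilted."""
    if len(xs) < 2:
        raise ValueError("need at least two rows to estimate the local angle")
    last = len(xs) - 1
    half_widths = []
    for i in range(len(xs)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        dy = (hi - lo) * row_step       # vertical distance between neighbouring rows
        dx = xs[hi] - xs[lo]            # horizontal change over that distance
        half_widths.append(0.5 * lane_width * math.hypot(dx, dy) / dy)
    return half_widths


def lane_iou_sketch(pred_xs, gt_xs, row_step=10.0, lane_width=15.0):
    """Row-wise IoU of two lanes given as x-coordinates at the same fixed rows."""
    pred_hw = local_half_widths(pred_xs, row_step, lane_width)
    gt_hw = local_half_widths(gt_xs, row_step, lane_width)
    inter = union = 0.0
    for xp, wp, xg, wg in zip(pred_xs, pred_hw, gt_xs, gt_hw):
        overlap = min(xp + wp, xg + wg) - max(xp - wp, xg - wg)
        overlap = max(overlap, 0.0)     # assumption: no credit or penalty for gaps
        inter += overlap
        union += 2 * wp + 2 * wg - overlap
    return inter / union


# Two vertical lanes, 5 px apart, sampled at three rows.
iou = lane_iou_sketch([100.0, 100.0, 100.0], [105.0, 105.0, 105.0])
```

For a lane tilted at 45 degrees the half-width grows by a factor of `sqrt(2)`, so a fixed horizontal offset costs less IoU on a tilted lane than on a vertical one, which is the intuition behind making the comparison angle-aware.
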
### CLRerNet-DLA34 Advantages

- Improved accuracy: With the LaneIoU metric, the model can detect tilted or curved lanes more precisely. In testing, CLRerNet achieved higher F1 scores than other models, especially in challenging conditions such as curves or difficult lighting.

### CLRerNet-DLA34 Disadvantages

- Not perfect: Although CLRerNet shows a significant improvement in performance, there is still a gap between the best CLRerNet model’s performance (81.43%) and the oracle-confidence case (98.47%).

## Lane Detection on CurveLanes

CurveLanes is a new benchmark lane detection dataset with 150K lane images for difficult scenarios such as curves and multi-lanes in traffic lane detection. It was collected in real urban and highway scenarios in multiple cities in China.
It is the largest lane detection dataset so far and establishes a more challenging benchmark for the community. [6]

The best performing model on CurveLanes is **CondLSTR (ResNet-101)**. [7]

### CondLSTR (ResNet-101)

The paper "Generating Dynamic Kernels via Transformers for Lane Detection" presents a novel approach to lane detection, using dynamic convolutional kernels generated by transformers.
This method achieves greater flexibility by adapting kernels to specific spatial features in images, which enhances detection accuracy in challenging environments.
The dynamic kernel approach allows the model to perform well in varied lane geometries and lighting conditions. This framework demonstrates competitive performance on standard lane detection benchmarks, highlighting the potential of transformer-based architectures in dynamic feature extraction for computer vision tasks.

![Image-CondLSTR](../../../assets/perception/CondLSTR(RESNet101)_Model_Overview.jpg)

For more details, view the paper [here](https://openaccess.thecvf.com//content/ICCV2023/papers/Chen_Generating_Dynamic_Kernels_via_Transformers_for_Lane_Detection_ICCV_2023_paper.pdf).

**model explanation:**

- Input Encoding: The model begins by encoding the input lane images using a CNN backbone, capturing basic spatial features.
- Transformer Encoder: A transformer layer processes these features to capture long-range dependencies, which is particularly useful for lanes with varying curves and patterns.
- Dynamic Kernel Generation: Based on the encoded features, the model generates dynamic convolutional kernels. These kernels are adaptive, varying across spatial regions of the image, allowing the model to emphasize lane-relevant features more effectively.
- Convolutional Processing: The generated kernels are applied to process lane features dynamically, refining details at multiple levels, such as curves, visibility under different lighting, and occlusions.
- Output Layer: Finally, the model outputs lane detection maps, indicating lane positions on the input image.

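The dynamic kernel step in the list above can be illustrated with a toy example. It is not the CondLSTR architecture: the linear kernel head, the 1×1 kernel size, the nested-list feature layout, and all names are assumptions chosen to keep the sketch self-contained. The point it shows is that the kernel is computed from a per-lane embedding at inference time instead of being a fixed learned weight.

```python
def generate_kernel(query, weight, bias):
    """Map a per-lane embedding to one kernel weight per feature channel (linear head)."""
    return [sum(w * q for w, q in zip(row, query)) + b
            for row, b in zip(weight, bias)]


def apply_dynamic_kernel(features, kernel):
    """Apply a 1x1 convolution with the generated kernel to a C x H x W feature map."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [[sum(kernel[c] * features[c][y][x] for c in range(channels))
             for x in range(width)]
            for y in range(height)]


# Toy feature map with 2 channels, 1 row, 2 columns.
features = [[[1.0, 2.0]],
            [[3.0, 4.0]]]
weight = [[1.0, 0.0],
          [0.0, 1.0]]   # hypothetical kernel-head weights
bias = [0.0, 0.0]

# Two different lane embeddings yield two different response maps
# from the same shared feature map.
map_a = apply_dynamic_kernel(features, generate_kernel([1.0, 0.0], weight, bias))
map_b = apply_dynamic_kernel(features, generate_kernel([0.0, 1.0], weight, bias))
```

In a real model the embeddings would come from the transformer and the response maps would be decoded into lane positions; here they only demonstrate the conditioning mechanism.
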
**output:**

![Comparison of CondLSTR model performance](/assets/perception/Comparison_of_models_CondLSTR(RESNet101).jpg)

**advantages:**

- Dynamic Kernel Generation: The key strength of this model is its ability to generate dynamic convolutional kernels through transformers. This makes the model highly adaptable to varying lane structures, such as curves, occlusions, and intersections, compared to traditional static kernels.

- Handling Complex Lane Topologies: The transformer-based dynamic kernels capture global lane structure across the image, which is crucial for detecting lanes with complex patterns (e.g., forks, dense lanes) that static methods struggle with.

- Improved Robustness: Due to the dynamic nature of the kernels, the model performs better under challenging conditions like occlusions and varying lane configurations, which often confuse traditional methods.

**disadvantages:**

- Computational Complexity: While the model offers significant performance improvements, the use of transformers and dynamic kernel generation may result in higher computational cost and slower inference times, especially in real-time applications.

- Dependence on High-Quality Data: Like many deep learning models, this approach relies on large, well-labeled datasets for training. In environments where high-quality lane annotation is difficult or expensive to acquire, the model may struggle.

## Summary

| Model | Test Acc (%) | Precision | Recall | F1-Measure | Release | Dataset |
|-------|--------------|-----------|--------|------------|---------|---------|
| SCNN_UNet_Attention_PL | 98.36 | 0.937 | 0.911 | 0.924 | 2022 | TuSimple |
| CLRerNet-DLA34 | N/A | 0.917 | 0.818 | 0.814 | 2023 | CULane |
| CondLSTR(ResNet-101) | N/A | 0.913 | 0.858 | 0.885 | 2023 | CurveLanes |

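Because F1 is the harmonic mean of precision and recall, the rows of the table can be sanity-checked with a few lines of Python. The helper name is hypothetical and the values are copied from the table as listed.

```python
def f1_from_precision_recall(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, listed F1) taken from the summary table.
table_rows = {
    "SCNN_UNet_Attention_PL": (0.937, 0.911, 0.924),
    "CLRerNet-DLA34": (0.917, 0.818, 0.814),
    "CondLSTR(ResNet-101)": (0.913, 0.858, 0.885),
}

for model, (p, r, listed_f1) in table_rows.items():
    recomputed = f1_from_precision_recall(p, r)
    print(f"{model}: listed F1 {listed_f1:.3f}, recomputed {recomputed:.3f}")
```

The SCNN_UNet_Attention_PL and CondLSTR rows reproduce their listed F1 to three decimals. For CLRerNet-DLA34 the listed precision and recall give roughly 0.865 rather than 0.814, which suggests that these figures may come from a different evaluation setting than the F1 score and should be checked against [5].
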
![Comparison_on_dataset](/doc/assets/perception/Comparison_of_CurveLane_Dataset.jpg)

Fix incorrect image path
The image path uses
-![Comparison_on_dataset](/doc/assets/perception/Comparison_of_CurveLane_Dataset.jpg)
+![Comparison of models on CurveLane dataset](/assets/perception/Comparison_of_CurveLane_Dataset.jpg)

The paper by Robin Karlsson et al. was also reviewed but is not included in the comparison: it describes a self-supervised learning model that was not trained on the datasets above but on a smaller dataset, and it uses more sensors (6 cameras, 5 radars and 1 lidar) than are allowed in the Qualify.
Interested readers can find the paper [here](https://arxiv.org/pdf/2304.13242v2).

Comment on lines +122 to +124
🛠️ Refactor suggestion
Improve clarity of excluded research
The explanation for excluding Robin Karlsson's paper needs restructuring:
-It should be mentioned that the paper by Robin Karlsson et. al was also read, but it was not worth comparing it here as it was a selfsupervised learning model and was not trained on the above datasets but on a smaller dataset and used more sensors and cameras
-(6 cameras, 5 radars and 1 lidar) than allowed in the Qualify.
+The self-supervised learning model proposed by Karlsson et al. was excluded from comparison because:
+1. It uses a different training approach (self-supervised learning)
+2. It was trained on a smaller, incomparable dataset
+3. Its hardware requirements (6 cameras, 5 radars, and 1 lidar) exceed the project constraints

## Decision

We chose to implement CLRerNet-DLA34 and CondLSTR (ResNet-101) because code bases already exist for both models, which saves time.

Comment on lines +126 to +128
🛠️ Refactor suggestion
Enhance the decision section with technical justification
The current decision rationale focuses solely on code availability. Consider adding:

[1]: https://paperswithcode.com/task/lane-detection#datasets
[2]: https://paperswithcode.com/dataset/tusimple
[3]: https://arxiv.org/pdf/2305.17271v2
[4]: https://paperswithcode.com/dataset/culane
[5]: https://arxiv.org/pdf/2305.08366v1
[6]: https://paperswithcode.com/dataset/curvelanes
[7]: https://openaccess.thecvf.com//content/ICCV2023/papers/Chen_Generating_Dynamic_Kernels_via_Transformers_for_Lane_Detection_ICCV_2023_paper.pdf

🛠️ Refactor suggestion
Explain missing test accuracy values
The summary table shows "N/A" for test accuracy of CLRerNet-DLA34 and CondLSTR. Please: