Citrus detection algorithm in natural environment based on Dense-TRU-YOLO
DOI: https://doi.org/10.25165/ijabe.v18i1.8866

Keywords: citrus, picking robot, Dense-TRU-YOLO, Denseblock, UNet++-FPN

Abstract
Accurate detection of citrus in natural orchards is crucial for citrus-picking robots, but it remains challenging owing to variable illumination, severe occlusion by branches and leaves, and overlapping fruit. To this end, a Dense-TRU-YOLO model was proposed, which integrates a Denseblock and a Transformer and uses a UNet++ network as the neck structure. First, the Denseblock structure was incorporated into YOLOv5, feeding shallow semantic information to the deep part of the network and improving the flow of information and gradients. Second, the deepest Cross Stage Partial (CSP) bottleneck with 3 convolutions module of the backbone was replaced with a CSP Transformer with 3 convolutions module, which increases semantic resolution and improves detection accuracy under occlusion. Finally, the original neck was replaced with a combined UNet++ and feature pyramid network structure (UNet++-FPN), which adds cross-weighted links between same-size nodes and strengthens feature fusion between nodes of different sizes, making the network's regression to target boundaries more accurate. Ablation and comparison experiments showed that Dense-TRU-YOLO effectively improves citrus detection accuracy under severe occlusion and overlap: its overall precision, recall, mAP@0.5, and F1 score were 90.8%, 87.6%, 90.5%, and 87.9%, respectively. Its precision was the highest among the compared models, exceeding YOLOv5-s, YOLOv3, YOLOv5-n, YOLOv4-tiny, YOLOv4, YOLOX, and YOLOF by 3.9%, 6.45%, 1.9%, 7.4%, 3.3%, 4.9%, and 9.9%, respectively. In addition, inference was 9.2 ms, 1.7 ms, 10.5 ms, and 2.3 ms faster than YOLOv3, YOLOv5-n, YOLOv4, and YOLOX, respectively. Dense-TRU-YOLO thus improves the accuracy of fruit recognition in natural settings and strengthens the detection of small targets at long range. Minimal code sketches of the three modifications follow the citation below.

Citation: Zheng T X, Zhu Y L, Liu S Y, Li Y F, Jiang M Z. Detection of citrus in the natural environment using Dense-TRU-YOLO. Int J Agric & Biol Eng, 2025; 18(1): 260–266.
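As a rough illustration of the first modification, the sketch below shows the dense-connectivity idea behind a Denseblock in PyTorch: each layer receives the concatenation of all earlier feature maps, so shallow features reach the deep part of the block directly and gradients take short paths back. The growth rate, depth, and Conv-BN-SiLU ordering are illustrative assumptions, not the configuration used in Dense-TRU-YOLO.

```python
# Minimal PyTorch sketch of dense connectivity; hyperparameters are
# illustrative, not the settings used in Dense-TRU-YOLO.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Layer i consumes the original input plus all i earlier outputs.
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(growth_rate),
                nn.SiLU(inplace=True),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # Concatenate everything seen so far, so shallow semantic
            # information is passed straight to the deeper layers.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # channels: in_channels + num_layers * growth_rate

# e.g. DenseBlock(64)(torch.randn(1, 64, 80, 80)) -> (1, 192, 80, 80)
```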
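The second modification swaps the deepest C3-style backbone module for a Transformer variant so that global self-attention can relate occluded fruit regions. The sketch below loosely follows the C3TR pattern found in the Ultralytics YOLOv5 codebase: a CSP split in which one branch runs multi-head self-attention over flattened spatial tokens. Head count, channel split, normalization, and the omission of positional embeddings are simplifying assumptions.

```python
# Simplified CSP + Transformer module in the spirit of YOLOv5's C3TR;
# details (heads, norms, positional encoding) are assumptions.
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, c: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(c, c), nn.SiLU(), nn.Linear(c, c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x, x, x)[0]  # residual self-attention over tokens
        return x + self.mlp(x)         # residual feed-forward

class C3TR(nn.Module):
    def __init__(self, c_in: int, c_out: int, num_layers: int = 1):
        super().__init__()
        c_hid = c_out // 2  # must be divisible by num_heads
        self.cv1 = nn.Conv2d(c_in, c_hid, 1, bias=False)   # attention branch
        self.cv2 = nn.Conv2d(c_in, c_hid, 1, bias=False)   # shortcut branch
        self.cv3 = nn.Conv2d(2 * c_hid, c_out, 1, bias=False)
        self.tr = nn.Sequential(*(TransformerLayer(c_hid) for _ in range(num_layers)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.cv1(x).flatten(2).transpose(1, 2)  # (B, H*W, C) spatial tokens
        y = self.tr(y).transpose(1, 2).reshape(b, -1, h, w)
        return self.cv3(torch.cat((y, self.cv2(x)), dim=1))

# e.g. C3TR(512, 512) applied to the deepest (smallest) backbone feature map.
```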
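The third modification rebuilds the neck as a UNet++-FPN. The schematic node below captures the nested-skip idea from UNet++: an intermediate node concatenates all same-scale predecessors (the cross links between same-size nodes) with an upsampled deeper feature before a 3×3 convolution. The actual wiring, channel widths, and weighting in Dense-TRU-YOLO's neck are not reproduced here; names such as FusionNode are hypothetical.

```python
# Schematic UNet++-style fusion node; layout and channel counts are
# hypothetical, not Dense-TRU-YOLO's exact UNet++-FPN wiring.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNode(nn.Module):
    def __init__(self, same_scale_channels, deeper_channels, out_channels):
        super().__init__()
        total = sum(same_scale_channels) + deeper_channels
        self.conv = nn.Sequential(
            nn.Conv2d(total, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, same_scale_feats, deeper_feat):
        # Upsample the deeper (coarser) map to the current resolution,
        # then fuse it with every same-size predecessor at once.
        up = F.interpolate(deeper_feat, size=same_scale_feats[0].shape[-2:], mode="nearest")
        return self.conv(torch.cat([*same_scale_feats, up], dim=1))

# Example nesting (assumed shapes): x00 from P3 (B,128,80,80), x10 from P4 (B,256,40,40):
#   x01 = FusionNode([128], 256, 128)([x00], x10)
#   x02 = FusionNode([128, 128], 256, 128)([x00, x01], x11)  # dense same-scale links
```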