Review of deep learning-based weed identification in crop fields

Abstract: Automatic weed identification and detection are crucial for precision weeding operations. In recent years, deep learning (DL) has gained widespread attention for its potential in crop weed identification. This paper provides a review of the current research status and development trends of weed identification in crop fields based on DL. Through an analysis of relevant literature from both within and outside of China, the author summarizes the development history, research progress, and identification and detection methods of DL-based weed identification technology. Emphasis is placed on data sources and DL models applied to different technical tasks. Additionally, the paper discusses the challenges of time-consuming and laborious dataset preparation, poor generality, unbalanced data categories, and low accuracy of field identification in DL for weed identification. Corresponding solutions are proposed to provide a reference for future research directions in weed identification.


Introduction
The world's population is expected to reach 10 billion by 2050, leading to increased demand for agricultural production [1]. Weed infestation is a major challenge for agricultural production [2], as weeds compete with crops for resources and act as intermediate hosts for pests and diseases, resulting in significant yield losses [3,4]. To reduce weed infestation, various methods have been used, including manual, chemical [5,6], and mechanical weeding [7,8]. However, each method has its limitations, and there is an urgent need for automated precision weed control systems that can accurately target weeds in the field while minimizing herbicide use and promoting environmentally friendly agriculture.
One critical step in automated precision weeding is the accurate identification and detection of weeds. Machine learning (ML) [9], including support vector machines, multilayer perceptrons, and random forests, has been widely used for weed detection [10] based on shape [11], color [12-14], and texture [15,16] features. However, the similarity in appearance between crops and weeds presents a significant challenge in weed identification. Deep learning (DL) [17], a branch of ML that enables high-level abstraction and representation learning of raw data through multi-layer non-linear transformations, has shown potential in addressing this challenge. Compared to traditional ML, DL eliminates the need for manual feature selection and transformation in data processing, allowing for the automatic extraction of higher-dimensional feature discriminations from raw data [18].
In order to analyze the potential of DL technology in weed recognition applications, this paper aims to provide a comprehensive summary, overview, analysis, and outlook on the application of DL in the field of weed recognition. The current state of research at home and abroad is summarized, and the key technologies involved in weed recognition are described. Technical tasks of classification, detection, and segmentation in weed recognition by DL are discussed, including the analysis of data acquisition, dataset preparation, and weed recognition models. This comprehensive literature survey can serve as a reference for subsequent research on precision weed control.

To assess the current state of research on DL-based weed identification, a comprehensive search and statistical analysis of Chinese and English literature was conducted. The research question was centered on the application of DL techniques in weed identification, detection, localization, and classification. Keyword-based searches were performed in domestic and international databases such as Google Scholar and China National Knowledge Infrastructure (CNKI) for English and Chinese journal articles and conference papers using the keywords ("deep learning" or "convolutional neural network") + ("weed classification" or "weed identification" or "weed detection" or "weed localization"). Figure 1 presents the statistical results, showing the number of research papers using DL for weed identification from 2014 to 2021. The graph reveals that international research in this area began in 2014, while Chinese research began later, in 2017. Before 2017, there were few publications in this field, both at home and abroad. However, from 2018 onwards, the number of papers in English increased significantly, indicating a positive trend. Furthermore, research applications in this area in China began to rise after 2019, highlighting the growing attention given to the use of DL in weed recognition.

International relevance review
Su et al. [19] conducted a review of the challenges and applications of spectral imaging techniques in crop weed identification. However, the majority of the articles reviewed utilized traditional ML methods rather than DL techniques.
Kamilaris and Prenafeta-Boldu [20] surveyed 40 research efforts that applied DL techniques in agriculture, including weed detection. Their results showed that DL outperformed commonly used image processing techniques in terms of classification or regression performance. Wang et al. [21] provided an overview of research progress in ground-based machine vision and image processing techniques for weed detection. The authors outlined the four steps of weed detection: preprocessing, segmentation, feature extraction, and classification. They noted that differentiating between crops and weeds that share similar characteristics remains a challenge in weed detection. The study compared traditional ML and DL techniques for weed detection and discussed challenges and solutions encountered by researchers in field weed detection, such as leaf shading and overlapping, varying light conditions, and different growth stages.

Chinese relevance review
In 2019, two studies were conducted in China investigating the application of DL in agriculture. Lyu et al. [22] from South China Agricultural University conducted an overview of 65 DL research papers published in China's agriculture sector from 2014 to 2019, which revealed that 80% of the research objects were plants, with plant classification and weed identification being the most popular research topics. Another 2019 study by Weng et al. [23] from Tsinghua University compared traditional plant phenotyping methods with DL techniques in plant identification and weed detection. They analyzed the research findings and summarized the advantages and disadvantages of DL and traditional ML methods, concluding that DL-based agricultural plant phenotyping methods perform better in terms of plant identification accuracy and real-time performance.
In a 2020 study, Li's team [24] combined agricultural information imaging-aware data sources with DL techniques to provide an overview of the latest research on the application of DL in plant recognition and detection. They found that the accuracy of recognition models built using CNN-extracted higher-order features as input is significantly higher than that of similar models built using traditional image color and texture features. However, the authors highlighted the importance of data acquisition and the choice of data expansion, as these factors are closely related to the diversity of the datasets covered, which can directly affect the results of network training. The authors also conducted a comparative study of network architectures and recommended continued development of dataset construction and DL model design, comparison, and optimization for specific research subjects.

Introduction to deep learning
Deep learning refers to a neural network-based ML technique that is effective at processing complex and large-scale data [25]. It has made significant breakthroughs in various fields, including computer vision, speech recognition, and natural language processing [26]. DL differs from traditional ML in several ways. Firstly, a DL network contains many more hidden layers, up to hundreds or thousands, which emphasizes the depth of the model. Secondly, DL uses feature learning to extract useful features automatically from the original data and to form higher-level features by combining lower-level features. This process allows DL to learn features directly from big data, enabling a better description of the rich inherent information in the data. Finally, DL takes an "end-to-end" approach to data processing, which simplifies the pipeline by designing and building the right number of neuronal computation nodes and multi-level computing structures.
DL has led to remarkable progress in the field of artificial intelligence, with various network models developed, including the multilayer perceptron [27], convolutional neural network (CNN) [28], deep belief network [29], recurrent neural network [30], generative adversarial network (GAN) [31], transformer network, and graph convolutional network (GCN) [32]. Among these networks, CNN is commonly applied in image processing and computer vision, GAN is significant in data generation and augmentation, and GCN holds potential in analyzing graph-structured data.

Convolutional Neural Network
Convolutional Neural Network (CNN) is a prevalent DL model used in various applications, including image recognition, video analysis, and natural language processing. As shown in Figure 2, the CNN architecture consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer [28]. The CNN applies convolutional operations to extract local features from the input data, and multiple convolutional and pooling layers abstract and represent the input data layer by layer. This process is achieved through the use of convolutional kernels, which extract feature information from the input data at different levels and degrees of abstraction. The convolutional layer uses multiple convolution kernels to extract features by connecting each neuron of a feature map to a region of neighboring neurons in the previous layer. This layer employs a parameter-sharing mechanism to extract the same features at different locations throughout the image, reducing the number of training parameters and improving the generalization ability of the model. The pooling layer, located after the convolutional layer, downsamples the input data, reducing the dimensionality of the image matrix while preserving the features. The fully connected layer, which follows several alternating combinations of convolutional and pooling layers, produces global semantic information by connecting each neuron to all neurons in the previous layer. This transformation reduces multidimensional features to one-dimensional features, which are passed to the final classifier to produce the classification result.
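To make the convolution and pooling operations described above concrete, the following is a minimal pure-Python sketch; the 4x4 "image" and the 2x2 difference kernel are made-up illustrative values, not data from any study discussed here.

```python
# Minimal sketch of the two core CNN operations: a 2-D convolution
# (valid padding, stride 1) followed by 2x2 max pooling.

def conv2d(image, kernel):
    """Slide the kernel over the image and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Down-sample the feature map: keep the max of each size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 0],
         [1, 0, 1, 2]]
edge_kernel = [[1, 0],
               [0, -1]]  # a simple 2x2 difference kernel

feature_map = conv2d(image, edge_kernel)  # 3x3 feature map
pooled = max_pool(feature_map, size=2)    # reduced to 1x1 after pooling
```

Note that the same kernel is reused at every position of the image, which is exactly the parameter-sharing mechanism described above.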
CNN is highly effective at extracting features from images, and the use of multiple convolutional and pooling layers enables the model to learn to generalize features across different regions of the input, improving its performance in classification tasks. Furthermore, the ability to extract features at different levels and with different degrees of abstraction allows for layer-by-layer abstraction and representation of the input data for better classification, recognition, and other tasks.

Generative Adversarial Network
In recent years, the Generative Adversarial Network (GAN) has emerged as a powerful tool for generating data. A GAN consists of two separate neural networks: a generator and a discriminator [33]. The generator network takes random noise as input and produces synthetic samples that are intended to resemble real data. The discriminator network takes both real and generated samples as input and attempts to accurately identify whether each sample is real or fake. The two networks are trained in an adversarial manner: the generator attempts to deceive the discriminator into believing that its generated samples are real, while the discriminator attempts to identify the generated samples as forgeries. In other words, the generator strives to produce better samples over time, and the discriminator aims to enhance its ability to differentiate between real and synthetic samples.
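The adversarial objective described above can be illustrated numerically. This is only a hedged sketch: the probability lists are invented discriminator outputs, and real GAN training updates both networks' weights by backpropagation rather than merely computing these losses.

```python
import math

# D(x) below is the discriminator's estimated probability that a sample
# is real; the lists of probabilities are made-up illustrative values.

def discriminator_loss(d_real, d_fake):
    """D wants d_real -> 1 and d_fake -> 0 (binary cross-entropy)."""
    return -sum(math.log(p) for p in d_real) / len(d_real) \
           - sum(math.log(1 - p) for p in d_fake) / len(d_fake)

def generator_loss(d_fake):
    """G wants the discriminator to score its fakes as real (d_fake -> 1)."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A confident discriminator has low loss on these illustrative scores.
d_loss = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])

# Early in training: D easily spots fakes, so G's loss is high.
early_g = generator_loss([0.1, 0.2, 0.15])
# Near convergence: D outputs ~0.5 everywhere and cannot tell them apart.
late_g = generator_loss([0.5, 0.5, 0.5])
```

As training progresses, the generator's loss falls toward log 2, the value at which the discriminator can do no better than guessing.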
GAN has garnered significant attention in the domain of image synthesis tasks due to its ability to produce synthetic data that emulate real data distributions. Conventional DL models necessitate a substantial amount of authentic data for training, which can be expensive to obtain and may suffer from data imbalance issues. The utilization of GAN can alleviate this problem by generating copious amounts of synthetic data to supplement and balance the dataset. This not only augments the quantity of data but also enhances the diversity of the dataset, thereby improving the generalization performance of the model.

Graph Convolutional Network
A graph convolutional network is a DL model designed for graph-structured data. The fundamental idea of GCN is to extend the convolution operation from grid-structured data to graph-structured data, which can effectively capture the structural information of graphs [34]. There are two primary approaches to implementing graph convolution operations in GCN: the spectral-based approach and the spatial-based approach. The spectral-based approach converts graph convolution operations into matrix multiplication operations using the graph Laplacian matrix, which can be efficiently computed using the Fast Fourier Transform (FFT). In contrast, spatial-based methods use fixed or learnable weight matrices to directly aggregate the features of neighboring nodes.
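A spatial-style graph convolution of the kind described above can be sketched in a few lines of pure Python. The toy 3-node path graph, features, and weight matrix below are illustrative; note that common GCN implementations use the symmetric normalization D^(-1/2)(A+I)D^(-1/2), whereas this sketch uses simpler row normalization.

```python
# Each node averages its own and its neighbours' features (A + I with
# row normalisation), then applies a weight matrix and a ReLU.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weights):
    n = len(adj)
    # Add self-loops and row-normalise the adjacency matrix.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    for i in range(n):
        deg = sum(a_hat[i])
        a_hat[i] = [v / deg for v in a_hat[i]]
    h = matmul(matmul(a_hat, feats), weights)
    return [[max(0.0, v) for v in row] for row in h]  # ReLU

adj = [[0, 1, 0],   # path graph: node 0 -- node 1 -- node 2
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0], [2.0], [3.0]]   # one feature per node
weights = [[2.0]]               # 1-in, 1-out linear map

out = gcn_layer(adj, feats, weights)  # roughly [[3.0], [4.0], [5.0]]
```

Each output mixes a node's feature with its neighbours' features, which is how the structural information of the graph enters the representation.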
Compared to traditional ML models for modeling graph-structured data, GCN offers several advantages. Firstly, GCN can capture graph structure information by aggregating features of neighboring nodes, which is particularly useful in tasks such as node classification, link prediction, and graph classification. Secondly, GCN can handle graphs of different sizes and topologies, making it suitable for modeling complex and irregular data structures. Additionally, GCN can learn hierarchical representations of graphs by stacking multi-layer graph convolution operations and non-linear activation functions.
In agriculture, GCN has been applied to weed detection and classification problems. GCN can model the relationships between adjacent pixels in weed images and capture the unique features of different types of weeds. Several studies have used GCN for weed classification tasks. GCN shows great potential for weed detection and classification in agriculture, and further research in this area is expected to lead to more efficient and accurate methods for agricultural weed management.

Weed detection methods overview
The general workflow of weed identification based on DL is depicted in Figure 3. A typical DL process for weed identification involves four major steps: data acquisition, dataset preparation, DL model construction, and model training and tuning. This section presents an analysis of relevant publications based on the technical approach illustrated in Figure 3, with a focus on data sources, which encompass self-acquisition and the use of public datasets, and on dataset preparation processes, including image pre-processing, training data generation, data labeling, and dataset partitioning. Moreover, this section outlines DL models for different technical tasks, such as image classification models, target detection models, and target segmentation models.

Data Acquisition
DL models achieve high accuracy rates in image recognition. While they require more time and effort in model training compared to traditional image processing methods, the investment is worthwhile given the reliability and processing speed they offer. Adequate labeled data forms the basis for training DL models, and the more data used for training, the better the prediction accuracy of the model [35]. The dataset used for model training typically consists of thousands of original or pre-processed images, making data acquisition the first step in identifying weeds using DL. The two main ways of acquiring agricultural data are self-collection and the use of publicly available databases. The size of the dataset is positively correlated with the complexity of the problem under study; that is, the more classes of objects and the smaller the differences between classes, the larger the amount of data required for training. Given the many different types of weeds in the field and the small differences between them and crop seedlings, weed identification demands a large amount of data.

Self-Collection
Self-collection of data allows for targeted collection of image or video data based on the specific problem being studied, such as the subject of the study, the data model,
and the size of the dataset.The collection of images of crops and weeds in the field should take into account the growth state of the plants at different times and consider as many environmental factors as possible during photography.Images of weeds in the field should be taken from various angles under different lighting conditions to ensure the diversity of the data, forming the basis for producing the dataset.
Different imaging sensors, including RGB cameras, multispectral cameras, hyperspectral imagers, near-infrared spectrometers, and infrared thermal imagers, can produce images with varying data patterns [36]. As listed in Table 1, these sensors have been used in various studies to collect data for weed identification purposes. Automated image acquisition platforms such as drones and unmanned vehicles are also commonly used in agricultural image data acquisition, providing advantages such as large coverage and high image quality [37,38]. Several studies have used such platforms to capture images of weeds and crops, including multi-rotor UAVs [39-41], helicopters [42], and drones equipped with cameras [43]. These platforms have greatly increased the efficiency of the image acquisition process.

Table 1. Imaging sensors and data acquisition platforms used for weed image collection

Reference | Imaging sensor | Data acquisition platform | Image characteristics
Adhikari et al. [44] | RGB camera | Handheld | 350 RGB images of line-transplanted paddy fields, 760 RGB images of row-transplanted paddy fields
Gao et al. [45] | Nikon D7200 SLR camera (RGB) | Handheld | 652 RGB images of sugar beet fields under different lighting conditions
Ma et al. [46] | Canon IXUS 1000 HS camera (RGB) | Handheld | Rice field scenes to detect the location of crops and weeds
Teimouri et al. [35,47] | Mobile phone cameras, consumer-grade cameras, Point Grey industrial cameras (RGB) | Handheld | 9649 RGB image samples of 18 weed species
Yu et al. [48] | Sony Cyber-Shot, Canon EOS Rebel T6 digital cameras (RGB) | Handheld | Images of perennial ryegrass
Farooq et al. [49-51] | Hyperspectral imaging system, Sequoia multispectral sensor | N/A | Hyperspectral and multispectral data for four weed species: Hyme, Alli, Azol, and Hyac
Huang et al. [39-41] | RGB camera | DJI Phantom 4 multi-rotor UAV | RGB images of weeds and early crop tillering
Petrich et al. [42] | Sony alpha 7 RII (RGB) | HiSystems MK ARF-OktoXL 4S12 helicopter | Images of the plant Colchicum autumnale
Osorio et al. [43] | Parrot Sequoia multispectral camera | Mavic Pro drone | Images of lettuce fields

Note: References [49-51] do not explicitly mention the data acquisition platform, so the corresponding position in the table is marked N/A.

Moreover, an increasing number of researchers have made their image datasets publicly available. For example, Chebrolu et al. [52] developed a dataset of sugar beet fields containing weed images captured by a farm robot equipped with a four-channel multispectral camera and an RGB-D sensor. The dataset can be downloaded from http://www.ipb.uni-bonn.de/data/sugarbeets2016/. Sudars et al. [53] provided an open weed detection dataset that includes 1118 manually annotated images of six food crops and eight weed species, with 7853 annotations in total. Additionally, Leminen Madsen et al. [54] released a public dataset for plant detection and classification (OPPD), consisting of 7590 RGB images of 47 plant species collected in Denmark, which is available for further use.

Dataset Preparation
The raw data obtained from various sources may not always be suitable for DL models, and a series of processing steps, such as image pre-processing, data enhancement, and data labeling, is required to prepare the data according to the training requirements of the network model, as shown in Figure 4. Image pre-processing is a crucial step in DL that can enhance the model's performance and contribute to the accuracy and reliability of the dataset. Commonly employed pre-processing techniques for weed images include background removal, resizing, green component segmentation, motion blur removal, noise removal, extraction of color vegetation indices, and changes of the color model [55]. These pre-processing steps enable better preparation of the dataset and lead to improved model performance.
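As one concrete example of the vegetation-index extraction mentioned above, the widely used excess-green index (ExG = 2g - r - b on chromatic coordinates) can separate green plant material from soil; the pixel values and the 0.1 threshold below are illustrative choices, not taken from any reviewed study.

```python
# Sketch of green-component segmentation with the excess-green index.

def excess_green(pixel):
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0
    r, g, b = r / total, g / total, b / total  # chromatic coordinates
    return 2 * g - r - b

def vegetation_mask(image, threshold=0.1):
    """1 where ExG exceeds the threshold (likely plant), else 0."""
    return [[1 if excess_green(px) > threshold else 0 for px in row]
            for row in image]

row = [(60, 120, 40),    # green leaf pixel -> high ExG
       (120, 110, 100),  # brownish soil pixel -> ExG near zero
       (200, 200, 200)]  # grey stone pixel -> ExG near zero
mask = vegetation_mask([row])
```

The resulting binary mask is a typical input to the background-removal step, leaving only plant pixels for the recognition model.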

Data Enhancement
The success of DL models relies on a large amount of high-quality training data. However, obtaining enough data, or data of sufficient quality, can be challenging, making data augmentation techniques a common tool. Data augmentation enhances the training dataset by performing transformations on existing data to generate new data samples, thereby improving the generalization capability of the model. Commonly used data augmentation methods include random flipping, rotation, cropping, deformation scaling, adding noise, and color jittering. These techniques enable better utilization of existing data and can contribute to improved model performance.
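The geometric and noise transformations listed above can be sketched on a toy grayscale image in pure Python; the 3x3 image is illustrative, and real pipelines apply these operations to full-size images (adjusting any box or mask labels accordingly).

```python
import random

# Minimal sketches of common augmentation operations on a tiny
# 2-D grayscale "image" represented as nested lists.

def horizontal_flip(img):
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate clockwise: transpose, then reverse each resulting row."""
    return [list(col)[::-1] for col in zip(*img)]

def random_crop(img, size, rng):
    i = rng.randrange(len(img) - size + 1)
    j = rng.randrange(len(img[0]) - size + 1)
    return [row[j:j + size] for row in img[i:i + size]]

def add_noise(img, sigma, rng):
    return [[px + rng.gauss(0, sigma) for px in row] for row in img]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
rng = random.Random(0)  # fixed seed so the augmentations are reproducible

flipped = horizontal_flip(img)   # mirror image left-right
rotated = rotate_90(img)         # 90-degree clockwise rotation
crop = random_crop(img, 2, rng)  # a random 2x2 patch
noisy = add_noise(img, 0.1, rng) # same shape, jittered pixel values
```

Each transformed copy counts as a new training sample, which is how these simple operations multiply the effective size of a small weed dataset.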
Generative networks have a wide range of applications in image data enhancement, as they can generate new data by learning the probability distribution of the input data [56]. This is particularly useful in cases where obtaining high-quality training data is challenging. By generating new, diverse images from random noise inputs, generative networks can increase the size of the training dataset and improve the generalization of the model. Furthermore, they can also generate more challenging images by synthesizing the input images with random noise, thereby improving the robustness of the model.
In tasks such as image segmentation and target detection, generative networks can perturb or distort the input image to generate images with different labels and locations. This increases the diversity and richness of the data and helps improve the performance and generalization of DL models. Overall, the application of generative networks to image data enhancement is a promising area of research with significant potential to advance the field of DL.

Data Annotation
Data annotation is a crucial component of most AI algorithms: the more accurate the annotation and the larger the amount of annotated data, the better the algorithm's performance. The purpose of data annotation is to pre-label the images that need to be recognized and distinguished by the computer, so that the computer can continuously learn the features of these images and eventually achieve autonomous recognition. Common image recognition tasks include image classification, target detection, and target segmentation [57], and different annotation techniques need to be used for each of these tasks.
1) Image classification. Image classification aims to identify and distinguish between specific target classes. The classification annotation technique involves selecting appropriate labels from a given set of labels to assign to the annotated object; typically, only one label is assigned to an image. For instance, to classify images of crops and weeds, the annotator must assign the class label "maize" to an image of a maize seedling and "weed" to an image of a weed.
2) Target detection. The main annotation method for target detection is boundary annotation, which can be subdivided into two forms: rectangular and polygonal frames. Rectangular box annotation is currently the most widely used image annotation method, as it can quickly frame the target object in image or video data in a relatively simple and convenient way. Polygonal annotation uses polygonal boxes to mark out irregular objects in an image and can provide more accurate framing for irregular objects than rectangular box annotation. For example, Figure 5b shows a rectangular box annotation for a crop seedling, while Figure 5c shows a polygonal annotation for a crop seedling.
3) Target segmentation. Target segmentation usually refers to semantic segmentation, where the task is to classify all the pixel points in an image, so the annotation requires each pixel to be assigned its corresponding class label, as shown in Figure 5d.
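The three annotation styles above can be illustrated with simple records; the field names below are hypothetical and not tied to any particular labeling tool's format.

```python
# 1) Classification: one label per image.
classification_label = {"image": "maize_0001.jpg", "label": "maize"}

# 2) Detection: boundary annotations per object.
detection_label = {
    "image": "field_0001.jpg",
    "objects": [
        # Rectangular box: top-left corner (x, y), then width and height.
        {"class": "maize", "bbox": [120, 80, 60, 90]},
        # Polygon: list of (x, y) vertices outlining an irregular weed.
        {"class": "weed", "polygon": [[10, 12], [40, 15], [38, 50], [8, 45]]},
    ],
}

# 3) Semantic segmentation: one class id per pixel
#    (0 = soil, 1 = crop, 2 = weed).
segmentation_mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 0],
]
```

The annotation effort rises sharply down this list: one click per image for classification, a few clicks per object for boxes and polygons, and a decision for every pixel in segmentation.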

Dataset Partitioning
The partitioning of datasets is a crucial aspect of building and optimizing DL models and holds significant practical importance. As shown in Figure 6, the original dataset is typically divided into a training set, a validation set, and a test set in a certain proportion [58]. The training set is used for building the model, the validation set is used for determining the network structure or the parameters that control the complexity of the model, and the test set is used to evaluate the performance of the final optimal model. When partitioning the dataset, certain principles should be followed. Firstly, the training set should be as large as possible to ensure the model's generalization ability. Secondly, the validation and test sets should be as independent as possible to avoid an overly optimistic estimate of the model's performance. Finally, the dataset partitioning should be as random as possible to avoid the influence of specific patterns in the dataset on the model's training and testing.
Alternatively, if the dataset is small, cross-validation may be considered. Cross-validation involves dividing the dataset into non-overlapping subsets and using one subset at a time as the test set, with the remaining subsets serving as the training and validation sets. By averaging the results of multiple cross-validation runs, the model's performance can be assessed more accurately.
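Both partitioning schemes above can be sketched in a few lines; the 70/15/15 split ratio is a common convention assumed here, not one prescribed by the reviewed studies.

```python
import random

def train_val_test_split(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Randomly partition items into train / validation / test subsets."""
    items = items[:]
    random.Random(seed).shuffle(items)  # random partitioning
    n = len(items)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def k_fold(items, k):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

images = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = train_val_test_split(images)  # 70 / 15 / 15 images
folds = list(k_fold(images, k=5))                # 5 train/test rotations
```

Averaging a metric over the five folds gives the more stable performance estimate described above for small datasets.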

Weed identification models
The main technical tasks of DL applied to the field of weed image processing are image classification, target detection, and target segmentation.To accomplish these tasks, researchers have developed various DL algorithms that are specifically designed for each task.

Image classification
Image classification is a crucial area of research in DL for image processing, with annual competitions held in computer vision to assess progress in this field. CNN is particularly popular for this task due to its ability to generate an effective representation of the original image. Popular CNN architectures such as AlexNet, VGG16, InceptionNet, and ResNet are widely used for weed image classification and recognition, with over 50% of research in this area utilizing them.
As listed in Table 3, researchers have proposed various methods to improve the accuracy of weed identification using DL. For instance, some studies have combined CNN with transfer learning or traditional ML classifiers to achieve high accuracy rates. Others have used techniques such as DCGAN networks, transfer learning, and weighted cross-entropy loss functions to enhance recognition algorithms. Additionally, researchers have developed their own self-built CNNs or lightweight convolutional networks to identify crops and weeds.
Overall, the studies have yielded promising results, with some achieving accuracy rates of up to 99.29% for identifying crops and weeds. These findings suggest that DL has great potential for improving weed identification and could lead to more efficient and effective weed control in agriculture.

Target Detection
Target detection builds on image classification to localize targets in an image, giving the specific spatial location and boundaries of each target, so progress in image classification also drives progress in target detection. Common target detection algorithms fall into two categories. The first is the two-step algorithms based on a region proposal network: the first step extracts sub-regions of the image that may contain targets; the second step takes all the sub-regions as input, uses a CNN for feature extraction, and finally performs detection classification and bounding-box regression correction. Typical representatives are the R-CNN, Fast R-CNN, and Faster R-CNN series. The other category is the one-step algorithms based on bounding-box regression, which treat box prediction directly as a regression problem: no candidate regions are extracted in advance, and the original image is used as input to directly output the prediction result. As listed in Table 3, various studies have been conducted to detect weeds in different crops and environments using these techniques.
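Both detector families are trained and evaluated by comparing predicted boxes with ground-truth boxes via intersection over union (IoU); a minimal sketch with made-up box coordinates:

```python
# Boxes are given as (x1, y1, x2, y2) corner coordinates.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

predicted = (10, 10, 50, 50)     # e.g. a predicted weed box
ground_truth = (20, 20, 60, 60)
overlap = iou(predicted, ground_truth)  # 900 / (1600 + 1600 - 900)
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is how the accuracy and recall figures in Table 3 are typically computed.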
Various studies have been conducted to improve target detection accuracy in different agricultural settings. Osorio et al. [43] designed three weed detection algorithms based on SVM-HOG, YOLOv3, and Mask R-CNN models to estimate the weed cover of lettuce. Zhang et al. [59] used VGG-16, ResNet-50, and ResNet-101 as feature extraction networks for Faster R-CNN models to detect weeds in oilseed rape fields. Xu et al. [60] addressed the low weed recognition rate in Xinjiang cotton fields by comparing the detection effects of Faster R-CNN models with four feature extraction networks.

Table 3. DL models applied to weed classification, detection, and segmentation tasks

Task | Reference | Method | Objects | Models | Results
Image classification | — | Combining CNN with transfer learning | 6 types of weeds in rice seedling fields | AlexNet, VGG16, GoogLeNet | VGG16 achieved an accuracy of 97.48%
Image classification | Peng et al. | — | — | VGG16-SGD | VGG16-SGD achieved the highest accuracy with an average F value of 0.977
Image classification | Huang et al. (2019) [15,41] | Comparison of OBIA and CNN | Images of rice fields taken with a drone | AlexNet, VGGNet, GoogLeNet, ResNet | VGGNet was significantly more accurate and efficient
Image classification | Espejo-Garcia et al. (2019) [64] | Expanding data with DCGAN + transfer learning | Tomatoes, lobelia | GAN-Xception with ImageNet pre-training | F1 value of 99.07% on the test set
Image classification | Espejo-Garcia et al. (2021) [65] | CNN with traditional machine learning classifiers | Tomatoes, cotton, lobelia, velvet grass | DenseNet and support vector machines | F1 value of 99.29%
Image classification | Chen et al. (2021) [68] | Lightweight convolutional networks | 8 field weeds and seedling maize | Self-built CNN | Average test recognition accuracy of 98.63%
Target detection | Osorio et al. (2020) [43] | Blend detected target crops with an NDVI background subtractor to detect weeds | Home-made dataset | SVM-HOG, YOLOv3, Mask R-CNN | Improved accuracy compared with manual calculation
Target detection | Zhang et al. | Detection networks based on YOLOv3 | — | YOLOv3 | Accuracy of 95%, recall rate of 89%
Target detection | Ahmad et al. (2020) [45] | Using data augmentation to simplify image acquisition and labeling | Sugar beet and beaten bowl flowers | YOLOv3, YOLO-tiny, improved YOLO-tiny | Improved YOLO-tiny improved detection speed and accuracy
Image segmentation | Yu et al. (2020) [73] | Comparing a DL model with OTSU-based threshold segmentation | Cabbage and weeds | Mask R-CNN | Mask R-CNN got a better effect with a pass rate of 81%
Image segmentation | Champ et al. (2020) [74] | Motorised weeding robot with machine vision | Maize, soybean, ryegrass, quinoa, etc. | Mask R-CNN (ResNet-50) | Precise weed control by locating plants through their center of gravity
Image segmentation | Quan et al. | — | Oilseed rape and weed | SegNet | Combination of SegNet with an SVM classifier works best with 96% accuracy
Image segmentation | Asad and Victor (2020) [77] | Comparison | — | — | —
Image segmentation | Ma et al. [46] | SegNet based on full convolution with transfer learning | Rice seedling and weed | SegNet | —

As listed in Table 3, researchers have applied the Mask R-CNN network to various agricultural tasks, including cabbage and weed recognition, weed localization in motorized weeding robots, and detection of weeds in maize and oilseed rape fields. They have used ResNet-50, ResNet-101, VGG16, and SegNet as feature extraction networks to achieve precise segmentation of crops and weeds. ML classifiers such as SVM have been used to improve the accuracy of image segmentation in oilseed rape fields. The pixel labeling process can be accelerated by using SegNet based on ResNet-50. An improved Res-UNet model has also been used for high-accuracy segmentation of sugar beet and weed data collected by farm information collection robots.
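Segmentation quality in studies such as those above is commonly reported as pixel accuracy or per-class IoU; a minimal sketch on tiny made-up masks (0 = soil, 1 = crop, 2 = weed):

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = correct = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            total += 1
            correct += (p == t)
    return correct / total

def class_iou(pred, truth, cls):
    """Intersection over union for a single class's pixels."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += (p == cls and t == cls)
            union += (p == cls or t == cls)
    return inter / union if union else 0.0

truth = [[0, 1, 1],
         [0, 2, 2]]
pred  = [[0, 1, 1],
         [2, 2, 2]]   # one soil pixel wrongly labeled as weed

acc = pixel_accuracy(pred, truth)     # 5 of 6 pixels correct
weed_iou = class_iou(pred, truth, 2)  # weed class: 2 overlap / 3 union
```

Per-class IoU exposes errors that overall pixel accuracy hides, which matters when weed pixels are a small minority of the image.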

Discussion
DL excels at crop and weed recognition and detection. Compared with traditional image processing methods, DL eliminates the need for complex and inefficient manual feature extraction; it only requires cropping the input image to a suitable size for target recognition, which greatly reduces recognition time while maintaining accuracy. However, a survey of the recent literature on DL for weed recognition and detection shows that several issues persist in this area of research. This section details them, along with suggested solutions, in the hope of aiding future research and the development of accurate weed recognition.
1) In the field of weed identification, an important future research direction is dataset processing. Creating datasets is a laborious and time-consuming process, and reducing the workload of data acquisition and annotation is a significant challenge in DL research. For data acquisition, one approach is to capture images of only one type of weed under consistent lighting conditions, using additional lighting or shading to reduce manual input. Alternatively, GANs or data augmentation techniques such as Mixup and Mosaic can be employed to expand the dataset. For annotation, semi-supervised or weakly supervised learning can reduce the time and effort required for manual labeling. For model training, few-shot learning can be utilized, which learns to classify new categories using existing category information: a small number of samples is selected from existing data as support and query sets, and the model learns from these samples to classify new categories. Few-shot learning can be accomplished through meta-learning methods, which train a model to adapt quickly to new tasks by continuously adjusting its parameters during training, allowing it to adapt more efficiently to new classes and tasks.

2) Another challenge in weed recognition is the poor generality of existing datasets. Current weed datasets cover only the growth period of a particular crop, and the lack of a common large dataset, together with variations in light and shadow during image acquisition and differences in the growth stages of crops and weeds, all affect the final training result. It is therefore essential to construct a large benchmark dataset by collecting images of various crops and weeds from different geographical locations, weather conditions, and growth stages.

3) Furthermore, the collected data may exhibit class imbalance, with substantial differences in identification accuracy between classes, leading to overfitting. To address this problem, appropriate data redistribution methods, cost-sensitive learning methods, or class-balanced classifiers can be utilized to improve classification accuracy.

4) The field operation environment is complex, and the actual recognition rate is low. Real fields are subject to bumps, high winds, and occlusions, resulting in blurred images and clustered or obscured targets, which are unavoidable in practical applications. To keep algorithms fast and effective under real field conditions, multiple vision devices and sensors can operate in coordination, with a central system regulating the data transmitted by each device to obtain more accurate positioning and to resolve the problem of occluded in-row and between-row weeds not being fully identified and located. The fusion of multiple sensors with machine vision can be used to analyze crop and weed growth and to target weed control. Moreover, more suitable recognition algorithms should be explored, for instance by adding attention mechanisms to enhance feature extraction, or by introducing and improving architectures such as the Vision Transformer. Reduced recognition accuracy caused by blurred data, changes in light and shadow, and overlapping shadows in the field, as well as the need to lower hardware requirements, can also be addressed by optimizing DL algorithm models.

5) Currently, DL convolutional neural network algorithms are widely used for recognition and detection. To a certain extent, the deeper the network, the higher the recognition accuracy; however, deeper networks are also harder to deploy on mobile devices. Exploring the balance between lightness and depth is therefore another direction for future research.
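As an illustration of the Mixup augmentation mentioned in point 1, the sketch below blends two labeled images into a single soft-labeled training sample. This is a minimal NumPy version under simple assumptions; the function and variable names are illustrative, not taken from any of the reviewed works.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two (image, one-hot label) pairs with a Beta-sampled weight.

    x1, x2: image arrays of identical shape.
    y1, y2: one-hot label vectors of identical shape.
    alpha:  Beta distribution parameter; small values keep samples close
            to one of the two originals.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2    # pixel-wise blend of the two images
    y = lam * y1 + (1.0 - lam) * y2    # soft label reflecting the blend
    return x, y, lam
```

Blending, for example, a crop image with a weed image at lam = 0.7 produces a sample whose label is 70% crop and 30% weed; training on such samples regularizes the classifier and effectively enlarges a small weed dataset without new field collection.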
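The cost-sensitive learning suggested in point 3 can be as simple as weighting the loss by inverse class frequency, so that misclassifying a rare weed species costs more than misclassifying an abundant one. A minimal NumPy sketch, with illustrative names and no claim to match any specific reviewed method:

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency class weights, scaled so the average weight
    per training sample is 1."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its class weight.

    probs:   (N, C) predicted class probabilities.
    labels:  (N,) integer class indices.
    weights: (C,) per-class weights from class_weights().
    """
    p = probs[np.arange(len(labels)), labels]       # probability of true class
    return float(np.mean(-weights[labels] * np.log(p + 1e-12)))
```

With counts of 900 majority and 100 minority samples, the minority class gets weight 5.0, so an error on a rare weed contributes roughly nine times more loss than the same error on the common class, pushing the model away from always predicting the majority.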

Conclusions
This paper provides a comprehensive review of the current status and progress of research on weed identification in crop fields based on DL. Through an analysis and comparison of the relevant literature, it summarizes the current state of DL applications in weed identification and the technical routes and methods, and suggests directions and challenges for future research.
The paper begins by reviewing the current status of DL-based weed identification in crop fields, together with its development history and basic principles. It then introduces the basic concepts and characteristics of DL models, including CNN, GAN, and GCN.
Furthermore, the paper analyses the technical routes and methods of DL-based research on weed identification in crop fields, covering data acquisition, dataset preparation, and DL models. It highlights the characteristics and applicable scenarios of DL models for different tasks, such as image segmentation, target detection, and classification, and compares and evaluates them.
The paper concludes by discussing future directions and challenges in DL-based research on weed identification in crop fields, such as the lack of datasets, model robustness, generalization capability, and deployment. It proposes solutions such as data augmentation, few-shot learning, and lightweight models, and discusses priorities for future research.
In conclusion, DL-based weed identification technology has advanced significantly in recent years, with promising results in terms of accuracy and efficiency. To further enhance its performance, it is crucial to build comprehensive and diverse datasets for training and testing, gathering data at various times of the year and under various lighting conditions and growth stages, as this provides a more holistic understanding of weed growth patterns and characteristics. To achieve this, robotics technology, specifically a robot able to geo-reference and analyze weeds throughout their growth stages, could provide a highly efficient and effective solution. In addition, incorporating multiple camera viewpoints and angles to compensate for occlusion could further improve the accuracy and reliability of the weed identification process. Furthermore, DL algorithms require further optimization to enhance model performance and robustness. Overall, continued exploration and innovation in this field hold great potential for advancing precision agriculture practices and contributing to sustainable food production.

Figure 1 Statistics on the number of papers using deep learning methods for weed detection

Figure 2 Basic structure of Convolutional Neural Network for weed identification

Figure 3 Roadmap for DL-based weed identification

Figure 4 Basic flow of dataset preparation for DL models

Figure 5 Four types of annotation examples for DL-based weed identification

Figure 6 Basic flow of dataset partitioning for DL models

Table 1

4.1.2 Public datasets
The use of publicly available datasets can significantly reduce the workload by saving the manpower and resources involved in data acquisition. A wide range of open databases is available for agriculture, including datasets for crops, weeds, and other plant images. As listed in Table 2, some commonly used datasets are Plant Village, Syngenta Crop Challenge, Flavia Leaf, Leafsnap, LifeCLEF, MalayaKew, and the Plant Photo Bank of China (PPBC).

Table 3 Research studies on deep learning for weed identification
Image segmentation
Image segmentation is the process of partitioning an image into distinct regions based on certain criteria. Three types of image segmentation are semantic segmentation, instance segmentation, and panoptic segmentation. Semantic segmentation assigns a category label to each pixel in the image, labeling different objects with semantic information; the goal is to segment the image into regions by semantic category, including background and discrete objects. Instance segmentation extends semantic segmentation by further distinguishing individual objects within the same class, which requires accurately identifying different objects and labeling their semantic information against a complex background. Classical DL-based segmentation methods include FCN, Mask R-CNN, SegNet, and UNet.
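Segmentation quality for crop and weed masks is commonly scored with per-class intersection-over-union (IoU): the overlap between predicted and ground-truth pixels of a class divided by their union. A minimal NumPy sketch, with illustrative names, assuming both masks are integer label maps:

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Per-class intersection-over-union for semantic segmentation masks.

    pred, target: integer label maps of identical shape, where each pixel
                  holds a class index (e.g. 0 = soil, 1 = crop, 2 = weed).
    Returns an array of length num_classes, with NaN for classes absent
    from both masks (undefined IoU).
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious
```

Reporting IoU per class, rather than overall pixel accuracy, matters in weed segmentation because weeds typically occupy few pixels: a model that labels everything as soil can score high accuracy while its weed IoU is zero.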