Fast extraction of winter wheat planting area in Huang-Huai-Hai Plain using high-resolution satellite imagery on a cloud computing platform

Extracting regional winter wheat planting areas from higher-resolution satellite imagery still faces many challenges, because traditional remote sensing classification involves large data volumes and long processing times. Google Earth Engine (GEE), a cloud computing platform for global geospatial analysis, provides a new opportunity for rapid analysis of remote sensing data. In this study, high-quality Landsat-8 imagery was used to extract the winter wheat planting area of the Huang-Huai-Hai Plain in China. The random forest algorithm was used to identify and map winter wheat sown in 2019 and harvested in 2020, and Sentinel-2 imagery was used to verify the results. Spectral indices, texture, and terrain features were derived from the imagery, and their contributions to the classification accuracy of winter wheat were evaluated by importance scoring. The top nine features were then selected to form an optimal feature subset. When the full set of thirty-four features and the optimal feature subset were each used as input variables to the random forest classifier, the difference in accuracy between the two schemes was small, although the full feature set performed slightly better. In sample plot verification, the overall classification accuracy was 86%-95% and the Kappa coefficient was between 0.70 and 0.85; the percentage error of the total area was 5.42%. This research demonstrates a reliable method for mapping winter wheat planting area over a wide region and offers good prospects for precise mapping of other crops, which is of great significance for crop monitoring and agricultural development.


Introduction
Wheat is an important food crop worldwide, and its production is of great significance to global food security. In recent years, China's wheat planting area has ranked third in the world [1], and the Huang-Huai-Hai Plain (HHHP) is the largest wheat planting region in China. According to statistics, the HHHP has accounted for an average of 67.8% of the country's winter wheat planting area over the past 10 years [2]. Winter wheat planting and production in this region are therefore closely tied to national food security, and large-scale extraction of the spatial distribution of winter wheat is essential for government departments to regulate food production at the macro level, to guide agricultural production, and to formulate relevant agricultural policies [3]. At present, remote sensing monitoring relies mainly on stand-alone classification [4], in which every processing step is carried out by programs running on a local computer. When stand-alone classification is applied to a large area, the data covering the study area must first be downloaded to local storage, often amounting to hundreds or thousands of scenes, and then preprocessed and classified locally. This processing is complicated and cumbersome, with a heavy workload and high costs in storage space and time [5]. Classification is also relatively slow and demands high computing performance and storage capacity [6][7][8]. Local computing therefore cannot meet the needs of identifying and mapping crops at large regional scales [4]. For this reason, this research used the Google Earth Engine (GEE) cloud computing platform and took the HHHP as an example area to identify winter wheat from higher-resolution imagery at the regional scale.
Many studies have identified and classified crops using remote sensing technology [9,10], including a large number of experiments that identified large-scale wheat areas. However, most of them used imagery from instruments with low spatial resolution, such as MODIS and AVHRR (the spatial resolution of MODIS data is 250 m, 500 m, and 1 km, and that of AVHRR data is 1.1 km) [11][12][13]. Most regions of China are cultivated by individual farmers, which leads to fragmented field landscapes, complex planting structures, and diverse crop types. Mixed pixels are therefore severe in low-resolution images [14,15], which cannot provide a fine spatial distribution of winter wheat. The multispectral images collected by the Landsat series and Sentinel-2 satellites have higher spatial resolution (the finest band resolutions are 30 m and 10 m, respectively) and can better capture the spatial information of ground objects.
These images have been widely used in crop identification and monitoring [16][17][18], but compared with large-scale crop area extraction, their study areas are relatively small, and many studies are limited to administrative divisions at or below the provincial level [19][20][21].
For example, Zhang et al. [21] used Sentinel-2 imagery and a random forest algorithm to extract winter wheat planting areas in four counties in China. Ali et al. [22] mapped winter wheat in the Bekaa plain, the primary wheat production area in Lebanon. These studies using high-resolution images achieved good extraction results over small areas, but few studies have achieved fine spatial mapping of winter wheat at a large scale. Many global and national-scale land cover and land use products are currently available, such as the 30 m global land cover map derived from Landsat (FROM-GLC) [23] and the 10 m global land cover map derived from Sentinel-2 data (FROM-GLC10) [24]. These data sets provide useful information for studying the spatial distribution and management of land cover and land use at the global scale [22], but their main purpose is to map broad land cover types (such as farmland, forest, and grassland), and they cannot provide detailed information about specific crops.
In recent years, the GEE cloud computing platform has made it possible to monitor crop planting information over large areas. GEE processes data in parallel with powerful cloud computing capabilities and can quickly process large numbers of images [5,25]. Li et al. [26] combined Landsat and Sentinel imagery to produce a 30 m planting intensity map for provincial administrative areas in 2018. Wheat, rice, soybean, and other crops have been identified on the GEE cloud platform [4,5,26]. The rice map generated by Dong et al. [27] using phenological information achieved high accuracy, with a producer's (user's) accuracy of 73% (92%), providing a reference for large-scale regional identification of winter wheat from Landsat-8 data. Xiong et al. [28] combined 2016 Sentinel-2 and Landsat-8 data to produce an annual cropland map of Africa at 30 m spatial resolution, and mapping the growth distribution of specific crops was the stated prospect of their later work. You et al. [26] produced annual maps of maize, soybean, and rice for 2017-2019 in the three northeastern provinces of China, with overall accuracies of 81% to 86%, providing guidance for crop studies in other large regions. He et al. [4] extracted the area of winter wheat and rapeseed in Jiangsu Province based on GEE. Xu et al. [5] mapped the distribution of winter wheat in Shandong Province on the GEE platform and showed that the random forest classifier has significant advantages in large-scale crop classification, but their area of interest was limited to a single provincial administrative unit. Studies on rapid mapping of wheat planting area across the Huang-Huai-Hai Plain are still relatively lacking, and further research is urgently needed.
Therefore, the purpose of this study was to use the cloud computing capability of the GEE platform and the random forest algorithm to rapidly identify and map winter wheat over a large region from higher-resolution imagery. Feature sets of spectral bands and indices, texture, and topography were constructed from Landsat-8 imagery, and the contribution of each feature to classification accuracy was evaluated. Sentinel-2 data were then used, through sample plots, to verify the effectiveness of two classification schemes: classification with the full feature set and classification with the optimal feature subset.

Study area
The Huang-Huai-Hai Plain is located in northern China (113°04´E-122°43´E, 31°23´N-42°37´N) and is composed of the large alluvial plains of the lower Yellow River and the Huai River, with flat terrain. It spans seven provinces and municipalities (Beijing, Tianjin, Hebei, Shandong, Henan, Anhui, and Jiangsu) and covers 57 prefectures and 361 counties (cities, districts), with a total land area of approximately 620 000 km². With reference to the "Natural Agricultural Zoning of China" and the "Climate Zoning of China" (Resources and Environment Science and Data Center: http://www.resdc.cn/), the HHHP was divided into three sub-regions: the north, west, and east of the HHHP (Figure 1). The HHHP lies in the warm temperate continental monsoon climate zone, with a large latitudinal span and an uneven distribution of climate resources. The annual accumulated temperature (≥10°C) is 3600°C-4900°C, annual sunshine duration is 2300-2800 h, and annual precipitation is 600-800 mm [29,30], with more in the south than in the north [31]. The four seasons are distinct. The HHHP follows a double-cropping system of winter wheat (October 2019-June 2020) and summer maize (June 2020-October 2020). Winter wheat is usually sown in mid-to-late October and harvested in early-to-mid June.
Figure 1 Map of the study area and its division into sub-regions. Note: The three colors represent the three sub-regions, and provincial borders are marked by black vector boundaries.

Datasets
The Landsat-8 satellite was launched by NASA in 2013. Its Operational Land Imager (OLI) has 9 bands; except for band 8, a panchromatic band with a spatial resolution of 15 m, all bands have a spatial resolution of 30 m (Table 1). The revisit period of the satellite is 16 d, and the imaging width is 185 km×185 km [32]. Because the study area is large, the phenological periods of winter wheat differed significantly among the three sub-regions. Therefore, all available images from January 30, 2020, to May 10, 2020, were selected on the GEE cloud platform for the subsequent extraction of winter wheat planting areas. The MultiSpectral Instrument (MSI) carried by Sentinel-2 has three red-edge bands, which are very effective for vegetation observation [33]. The atmospherically corrected surface reflectance (SR) product of the Landsat-8 OLI sensor and the Level-2A product of Sentinel-2 are archived in the GEE data catalog; both have been orthorectified and atmospherically corrected [22,34].
In this study, Sentinel-2 images from March 25 to April 15, 2020, were selected to verify the extracted spatial distribution of winter wheat.
There are many mountains and forests in the northeastern part of the HHHP.
Digital elevation data (SRTMGL1_003, 30 m spatial resolution) were used to derive terrain features [35]. The SRTMGL1_003 elevation data were produced by the Shuttle Radar Topography Mission (SRTM) and are provided by the NASA Jet Propulsion Laboratory (JPL).
Because the HHHP covers a wide geographical area with a large latitudinal span, its climate varies greatly, which leads to large variation in winter wheat growth across the study area. Therefore, this study divided the study area into three sub-regions based on the "Natural Agricultural Zoning of China" and the "Climate Zoning of China" (Resources and Environment Science and Data Center, http://www.resdc.cn/). The "Natural Agricultural Zoning of China" divides China into thirty-eight natural agricultural zones based on temperature and humidity zones; the temperature zones reflect the accumulated daily mean temperature during the crop growing period, that is, the distribution of accumulated temperature, which varies with latitude and can significantly affect crop growth. The "Climate Zoning of China" is based on heat and moisture indexes. The two zoning maps were combined to subdivide the HHHP, and the winter wheat area was then extracted and mapped separately in each sub-region.

Data preprocessing
Both Landsat-8 and Sentinel-2 images required preprocessing steps such as cloud removal, cloud-free compositing, mosaicking, and clipping. A total of 1046 Landsat-8 SR images covered the study area during the growth period, of which 325 images with cloud cover below 20% were selected for this study. The quality assessment (QA) band of Landsat-8 data was used to mask cloud, cloud shadow, snow, and ice, reducing their contamination of the images. The remaining values of each pixel were then sorted, and the median was taken as the final value. Finally, the vector boundary of the study area was used for clipping to obtain a high-quality Landsat composite covering the study area.
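The per-pixel logic of QA masking followed by a median composite can be sketched in plain Python. This is a minimal illustration, not the study's GEE code: it assumes the Landsat-8 Collection 1 SR `pixel_qa` bit layout (bit 3 cloud shadow, bit 4 snow, bit 5 cloud), and other collections use different bit positions. Lists of (reflectance, QA) tuples stand in for a stack of co-registered images; in GEE the same steps would be a bitwise test on the QA band followed by `ImageCollection.median()`.

```python
from statistics import median

# Assumed bit positions of the Landsat-8 Collection 1 SR `pixel_qa` band.
CLOUD_SHADOW_BIT = 3
SNOW_BIT = 4
CLOUD_BIT = 5
MASK_BITS = (1 << CLOUD_SHADOW_BIT) | (1 << SNOW_BIT) | (1 << CLOUD_BIT)


def is_clear(qa_value):
    """Return True if none of the shadow, snow, or cloud flags is set."""
    return (qa_value & MASK_BITS) == 0


def median_composite(stack):
    """Per-pixel median over a time stack, ignoring flagged observations.

    `stack` is a list of images; each image is a list of (reflectance, qa)
    tuples for the same pixels in the same order. A pixel with no clear
    observation is returned as None (a gap in the composite).
    """
    composite = []
    for p in range(len(stack[0])):
        clear = [image[p][0] for image in stack if is_clear(image[p][1])]
        composite.append(median(clear) if clear else None)
    return composite


# Toy stack: three dates x three pixels of (scaled reflectance, pixel_qa).
# 322 is a clear-land QA value; 480 has the cloud bit (bit 5) set.
stack = [
    [(1000, 322), (2000, 480), (3000, 322)],
    [(1200, 322), (2200, 322), (5000, 480)],
    [(5000, 480), (2400, 322), (3200, 322)],
]
print(median_composite(stack))  # [1100.0, 2300.0, 3100.0]
```

Taking the median rather than the mean keeps a single residual cloudy or hazy observation that slipped past the QA mask from pulling the composite value.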
The Level-2A product of Sentinel-2 contains three QA bands, of which the QA60 band contains cloud mask information that can be used to remove cirrus and thick clouds from the image [36]. In this study, imagery from the key phenological period with cloud cover below 20% was acquired, and the per-pixel median of all values was taken to form a higher-quality image. Sentinel-2 bands have spatial resolutions of 10 m, 20 m, and 60 m. Because the red-edge bands are more effective for vegetation identification, this study resampled them to 10 m using the nearest neighbor algorithm, and the nine spectral bands at 10 m resolution were used as input to the classifier to identify winter wheat.
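The two Sentinel-2 steps above, QA60 cloud masking and nearest neighbor resampling of 20 m bands to 10 m, can be sketched as follows. The bit positions (10 for opaque clouds, 11 for cirrus) follow the QA60 band description in the GEE data catalog; the resampling function assumes the 20 m and 10 m grids are aligned so that the factor is exactly 2, which is a simplifying assumption for illustration.

```python
# QA60 bit positions: 10 = opaque clouds, 11 = cirrus.
OPAQUE_CLOUD_BIT = 10
CIRRUS_BIT = 11
QA60_MASK = (1 << OPAQUE_CLOUD_BIT) | (1 << CIRRUS_BIT)


def qa60_clear(qa_value):
    """Return True if neither the opaque-cloud nor the cirrus bit is set."""
    return (qa_value & QA60_MASK) == 0


def nearest_neighbour_upsample(grid, factor):
    """Resample a 2-D grid to a finer resolution by nearest neighbour.

    With an integer factor (2 for 20 m -> 10 m on aligned grids), every
    coarse pixel is copied into a factor x factor block, so no new values
    are created, unlike bilinear or cubic interpolation.
    """
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


red_edge_20m = [[1, 2], [3, 4]]
print(nearest_neighbour_upsample(red_edge_20m, 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(qa60_clear(0), qa60_clear(1 << 10), qa60_clear(1 << 11))  # True False False
```

Nearest neighbour is a reasonable choice here because it preserves the original reflectance values, which keeps the upsampled red-edge bands spectrally consistent with the 20 m source.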
In this study, imagery retrieval, image preprocessing, band resampling, feature index calculation, image classification and accuracy verification were all realized by writing code online in GEE.

Candidate feature description
Thirty-four features were used to identify winter wheat: seven spectral reflectance features (the original 30 m spectral bands B1-B7), five commonly used vegetation indices (Table 2), eighteen texture features, and four terrain features. Together they were used to explore the influence of the feature variables and their combinations on classification accuracy. Texture is an important structural attribute of an image [37]. The near-infrared band B8, which is more effective for vegetation monitoring, was selected to calculate texture features, which can be computed quickly with the glcmTexture(size, kernel, average) function provided by GEE [4]. In addition, there are large mountainous areas in the north and southeast of the study area, where terrain features can improve classification accuracy. Therefore, this study used SRTMGL1_003 data to construct elevation, slope, aspect, and hillshade features.
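To make the spectral index and texture features concrete, here is a minimal sketch of NDVI and two grey-level co-occurrence matrix (GLCM) statistics that appear among the selected features, inertia and sum average. Table 2 is not reproduced here, so only NDVI is shown, computed with the Landsat-8 OLI convention (NIR = B5, red = B4). The GLCM uses a single horizontal offset with symmetric counting and 0-based grey levels. GEE's glcmTexture returns 18 statistics per band (including `_savg` and `_inertia`) and can average over directions, so this toy version only illustrates the definitions.

```python
from collections import Counter


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


def glcm(window, dx=1, dy=0):
    """Normalised, symmetric GLCM of a quantised window for one pixel offset.

    Returns a dict mapping (i, j) grey-level pairs to probabilities.
    """
    counts = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions -> symmetric matrix
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def inertia(p):
    """Inertia: sum of (i - j)^2 * p(i, j); large for strong local contrast."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())


def sum_average(p):
    """Sum average: sum of (i + j) * p(i, j), using 0-based grey levels.

    Haralick's 1-based definition gives a value larger by 2.
    """
    return sum((i + j) * v for (i, j), v in p.items())


print(ndvi(3000, 1000))  # 0.5
checkerboard = glcm([[0, 1], [1, 0]])
print(inertia(checkerboard), sum_average(checkerboard))  # 1.0 1.0
```

A homogeneous field yields inertia close to 0, while field edges and mixed land cover raise it, which is why texture can complement the spectral features.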

Random forest algorithm for feature selection and winter wheat identification
Random Forest (RF) is a machine learning algorithm. Owing to its stable performance, high efficiency, strong adaptability, and good noise resistance, it has been widely used in remote sensing [43,44], and many studies have shown it to be suitable for crop classification. RF is an ensemble learning method that combines many independent decision trees into a forest. Each tree is trained on a random sample drawn from the training samples, and the category of an unlabeled sample is determined by the votes of all decision trees in the forest, with every tree's vote carrying the same weight [23]. In addition, many studies have shown that the random forest algorithm not only performs well in classification but also has advantages in selecting optimal features [45,46]. To explore the contributions of spectral bands, indices, textures, and terrain features to classification accuracy, this study evaluated the importance of feature variables by scoring them on the sample set with the random forest algorithm. Combining visual interpretation of Google Earth high-definition images and Sentinel-2 satellite images, samples were labeled on the Landsat OLI data online in the GEE program code: 13 000 samples evenly distributed across the study area were selected from seven land cover types (winter wheat, water, bare land, urban, forest, other vegetation, and others). Each tree of the RF classifier is trained on about two-thirds of the sample set drawn at random, and the remaining one-third that does not participate in training forms the out-of-bag (OOB) samples [47]. The OOB error computed from these samples is an unbiased estimate of the generalization error of the random forest classifier and can be used to evaluate both the classification ability of the RF classifier and the importance of feature variables [48].
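The bagging and voting mechanics described above can be sketched in a few lines. This is an illustration of the general technique, not GEE's random forest implementation. Each tree's training set is a bootstrap sample drawn with replacement, so on average about 1 − 1/e ≈ 63.2% of the distinct samples are in the bag and about 36.8% are left out; this is the origin of the "two-thirds / one-third" split. Prediction is an unweighted majority vote.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample and return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def majority_vote(tree_predictions):
    """Unweighted vote: the class predicted by the most trees wins.

    Ties are broken in favour of the label encountered first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
in_bag, oob = bootstrap_split(13000, rng)
print(f"OOB fraction: {len(oob) / 13000:.3f}")  # close to 1/e, about 0.368
print(majority_vote(["winter wheat", "forest", "winter wheat"]))  # winter wheat
```

Because each tree never sees its own OOB samples, evaluating each tree on them gives an error estimate without setting aside a separate validation set.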
The importance score of feature variable j (VI_j) is calculated as follows [49]:

$$VI_j = \frac{1}{N}\sum_{i=1}^{N}\left(\widetilde{OOB}^{\,j}_{N_i} - OOB^{\,j}_{N_i}\right)$$

where N is the number of decision trees; $OOB^{\,j}_{N_i}$ is the OOB error of decision tree i without added noise; and $\widetilde{OOB}^{\,j}_{N_i}$ is the OOB error of decision tree i after random noise is added to feature j. If adding noise to feature j changes the OOB error greatly, feature j has a strong influence on the classification result and is more important.
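The importance score can be sketched as permutation importance: for each tree, compute its OOB error, randomly shuffle feature j among that tree's OOB samples (the "random noise"), recompute the error, and average the increase over all N trees. In this sketch the "trees" are arbitrary callables and the toy stump is hypothetical. Shuffling is one common way of adding noise and is an assumption here, not necessarily the exact perturbation used in [49].

```python
import random


def oob_error(tree, X, y, oob_idx):
    """Fraction of a tree's out-of-bag samples that it misclassifies."""
    wrong = sum(1 for i in oob_idx if tree(X[i]) != y[i])
    return wrong / len(oob_idx)


def permutation_importance(trees, oob_sets, X, y, j, seed=0):
    """VI_j: mean increase in per-tree OOB error after permuting feature j.

    `trees` are callables mapping a feature vector to a class label, and
    `oob_sets[i]` lists the sample indices that tree i did not train on.
    """
    rng = random.Random(seed)
    total = 0.0
    for tree, oob_idx in zip(trees, oob_sets):
        base = oob_error(tree, X, y, oob_idx)
        # Shuffle feature j among this tree's OOB samples only.
        shuffled = [X[i][j] for i in oob_idx]
        rng.shuffle(shuffled)
        X_noisy = [list(row) for row in X]
        for i, value in zip(oob_idx, shuffled):
            X_noisy[i][j] = value
        total += oob_error(tree, X_noisy, y, oob_idx) - base
    return total / len(trees)


def stump(x):
    """Hypothetical toy 'tree' that only looks at feature 0."""
    return "wheat" if x[0] > 0.5 else "other"


X = [[0.9 if k % 2 == 0 else 0.1, k / 20] for k in range(20)]
y = ["wheat" if k % 2 == 0 else "other" for k in range(20)]
trees, oob_sets = [stump, stump], [list(range(20)), list(range(10))]
print(permutation_importance(trees, oob_sets, X, y, j=0))  # > 0: feature 0 drives predictions
print(permutation_importance(trees, oob_sets, X, y, j=1))  # 0.0: feature 1 is never used
```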
Sequential forward selection was then applied: all features were scored with the RF algorithm and input to the classifier one at a time in descending order of score. Each time a feature was added, the classification accuracy was calculated on the OOB samples, yielding a series of feature subsets. From these, the classification accuracy of the model and the dimension of the best feature subset were determined.
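The selection loop can be sketched as follows, assuming the features are already ranked by importance and that `evaluate` (a hypothetical callable) returns the OOB accuracy of a classifier trained on a given subset. The stopping rule, "smallest subset whose accuracy is within a tolerance of the best on the curve", is an assumption standing in for "no longer changes significantly".

```python
def forward_selection(ranked_features, evaluate, tolerance=0.001):
    """Sequential forward selection over features pre-sorted by importance.

    `evaluate(subset)` must return an accuracy for a classifier trained on
    `subset`. Returns the accuracy curve and the smallest subset size whose
    accuracy is within `tolerance` of the best accuracy on the curve.
    """
    curve = [evaluate(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)]
    best = max(curve)
    k_opt = next(k for k, acc in enumerate(curve, start=1) if acc >= best - tolerance)
    return curve, k_opt


def toy_accuracy(subset):
    """Hypothetical accuracy curve that stops improving after nine features."""
    return 0.80 + 0.01 * min(len(subset), 9)


features = [f"feature_{i}" for i in range(1, 35)]  # 34 ranked features
curve, k_opt = forward_selection(features, toy_accuracy)
print(k_opt)  # 9
```

With a real random forest, `evaluate` would retrain the classifier for each prefix of the ranking, so the loop costs one training run per candidate feature.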

Accuracy assessment
This study verified the results in two respects, the spatial distribution and the extracted area of winter wheat, using a confusion matrix and the area percentage error, respectively. For the spatial distribution, the winter wheat distribution obtained from Sentinel-2 imagery, which has a higher spatial resolution, was used as reference data; the distribution obtained from Landsat-8 images was compared with it, and a confusion matrix was generated to evaluate the extraction. For the area, the percentage error was calculated against the winter wheat sown area for the same growing season reported by the National Bureau of Statistics.

Confusion matrix
The confusion matrix [50], also known as the error matrix, is a standard method for evaluating the classification results of remote sensing images; it is a matrix with m rows and m columns. Evaluation indicators include overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and the Kappa coefficient (Kappa), which evaluate classification accuracy from different perspectives. The Kappa coefficient is calculated as follows [51,52]:

$$Kappa = \frac{M\sum_{i=1}^{m}A_{ii} - \sum_{i=1}^{m}A_{i+}A_{+i}}{M^{2} - \sum_{i=1}^{m}A_{i+}A_{+i}}$$

where M is the total number of pixels; m is the number of categories; $A_{ii}$ is the number of pixels on the diagonal of the confusion matrix; and $A_{i+}$ and $A_{+i}$ are the sums of the pixels in row i and column i, respectively.
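The accuracy indicators can be computed directly from the counts in a confusion matrix. This sketch assumes rows hold classified labels and columns hold reference labels, so UA_i = A_ii / A_i+ and PA_i = A_ii / A_+i. If a matrix is laid out the other way round, transpose it first. The 2 × 2 example matrix is made up for illustration.

```python
def accuracy_metrics(cm):
    """OA, per-class UA and PA, and Kappa from an m x m confusion matrix.

    Convention assumed here: rows are classified labels, columns are
    reference labels.
    """
    m = len(cm)
    M = sum(sum(row) for row in cm)                                # total pixels
    diag = sum(cm[i][i] for i in range(m))                         # sum of A_ii
    row_tot = [sum(cm[i]) for i in range(m)]                       # A_{i+}
    col_tot = [sum(cm[r][i] for r in range(m)) for i in range(m)]  # A_{+i}
    oa = diag / M
    ua = [cm[i][i] / row_tot[i] for i in range(m)]
    pa = [cm[i][i] / col_tot[i] for i in range(m)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(m))
    kappa = (M * diag - chance) / (M * M - chance)
    return oa, ua, pa, kappa


# Illustrative matrix: class 0 = winter wheat, class 1 = other.
cm = [[45, 5],
      [10, 40]]
oa, ua, pa, kappa = accuracy_metrics(cm)
print(oa, ua[0], round(pa[0], 3), kappa)  # 0.85 0.9 0.818 0.7
```

The example shows why Kappa is reported alongside OA: 85% of pixels agree, but half of that agreement (5000 / 10000) would be expected by chance given the row and column totals, so Kappa is 0.7.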

Validation of statistical data
The winter wheat planting area was estimated from the winter wheat distribution extracted in the study area and compared with the planting area reported in the statistical yearbook of each province or municipality. The area percentage error (PE) was used to evaluate the accuracy of remote sensing recognition of winter wheat in each province and municipality:

$$PE = \frac{|c - s|}{s} \times 100\%$$

where s is the statistical area and c is the estimated area.

To verify the spatial distribution of the extracted winter wheat, this study calculated a confusion matrix between the Sentinel-2 and Landsat-8 classification results. Following the principle of uniform distribution, ten sample plots (45 km×45 km) were delineated with the online editing function of GEE; their distribution is shown in Figure 2. The sample plots were exported to GEE cloud storage, and the Sentinel-2 images and Landsat-8 classification maps were clipped according to the vector boundaries of the sample plots. According to previous studies, winter wheat is most separable from other land cover types near the jointing stage, which is therefore the best time to identify it [18]. Accordingly, the distribution of winter wheat in the plots was extracted from Sentinel-2 images acquired from March 25, 2020, to April 15, 2020, and used as reference data to verify accuracy. Because the spatial resolution of Sentinel-2 is 10 m and that of Landsat-8 is 30 m, the Sentinel-2 result was resampled to 30 m with the nearest neighbor algorithm so that each pixel of the two sensors corresponds one to one before the confusion matrix is calculated.
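Both verification steps in this paragraph reduce to short calculations. The sketch assumes PE = |c − s| / s × 100%, and it assumes that the 10 m and 30 m grids are aligned, so that nearest neighbor resampling to 30 m picks the centre 10 m pixel of each 3 × 3 block (the centre of the coarse pixel falls inside that pixel). The areas and the label grid below are made-up illustration values.

```python
def percentage_error(estimated, statistical):
    """PE = |c - s| / s * 100, with c the remote sensing estimate and s the statistic."""
    return abs(estimated - statistical) / statistical * 100


def nearest_neighbour_downsample(grid, factor=3):
    """Coarsen a 2-D label grid by taking the centre pixel of each factor x factor block.

    Assumes aligned grids (e.g. 10 m -> 30 m with factor 3); partial blocks
    at the right and bottom edges are dropped.
    """
    half = factor // 2
    return [
        [grid[r + half][c + half] for c in range(0, len(grid[0]) - factor + 1, factor)]
        for r in range(0, len(grid) - factor + 1, factor)
    ]


print(percentage_error(110.0, 100.0))  # 10.0
labels_10m = [[r * 6 + c for c in range(6)] for r in range(6)]
print(nearest_neighbour_downsample(labels_10m))  # [[7, 10], [25, 28]]
```

Nearest neighbor is appropriate for class maps because it never produces intermediate values that do not correspond to a land cover class.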
The workflow for identifying the winter wheat planting distribution in this study is shown in Figure 3. First, all Landsat-8 images of the period (325 scenes in total) were acquired in GEE and processed by cloud removal, missing-value handling, mosaicking, and clipping to obtain a high-quality cloud-free image. The 2017 cultivated land distribution from a global 30 m land cover product was then used to mask out all land types other than cultivated land, reducing errors [21]. Second, classification features of four types (spectral bands, spectral indices, texture, and terrain) were constructed and scored to screen the best features, and the effects of the four feature types, their combinations, and the preferred features on classification accuracy were analyzed. Finally, both spatial distribution and area percentage were used to evaluate the classification.

Selection of optimal classification features
This study scored the 34 features generated from the 325 high-quality images covering the study area. The random forest algorithm was used to score and rank all feature variables (Figure 4). Slope had the highest score (5.18) and was the most critical feature for distinguishing winter wheat from other land cover types, followed by elevation, B6, and B5 (the short-wave infrared and near-infrared bands). Inertia (B8_inertia) had the lowest score (0.11) and was a relatively insignificant feature in winter wheat extraction.
To determine the optimal dimension of the feature subset, features were input into the classifier in descending order of importance score, and the classification accuracy was calculated with the OOB samples. Accuracy increased as features were added until the 10th feature was input, after which it no longer changed significantly. Therefore, using the top nine features as classifier input achieves the highest accuracy. However, this result reflects only the sample set, and the spatial distribution of winter wheat still needs to be evaluated by calculating the Kappa coefficient against Sentinel-2 data in the sample plots. The top nine features by importance score in Figure 4 (slope, elevation, Band 6, Band 5, sum average, NDVI, difference, Band 1, Band 2) were selected as the optimal feature subset for the study area.

Results of sample plot validation
The winter wheat distribution derived from Sentinel-2 images was regarded as the true distribution on the ground. The full feature set and the optimal subset of the first nine features, referred to as scheme AF (all features) and scheme OF (optimal features), respectively, were each used as input variables of the RF classifier for training and classification, and confusion matrices were calculated against the Sentinel-2 data. The overall accuracy of scheme AF was between 86% and 95%, with Kappa coefficients between 0.70 and 0.84; the overall accuracy of scheme OF was between 83% and 95%, with Kappa coefficients between 0.67 and 0.85. The plot maps show that winter wheat planting in plots 1, 5, 6, 9, and 10 was discontinuous and highly fragmented; correspondingly, their Kappa coefficients were all lower than 0.8 except for plot 6. The wheat fields of plots 2, 4, 6, and 8 were planted densely and continuously, and their Kappa values were all above 0.80. Table 6 shows that, in both classification schemes, the overall accuracy of the plots with continuous wheat fields was above 92% and the Kappa value was at least 0.81, so winter wheat was recognized slightly better there than in other plots. In addition, the mapping accuracy exceeded 90% in 6 plots under scheme AF and in 7 plots under scheme OF, suggesting better mapping accuracy for scheme OF. However, the average overall accuracy of scheme OF was 0.4% lower than that of scheme AF, and its average Kappa value was 0.01 lower. Thus, scheme AF showed no obvious advantage over scheme OF in recognizing winter wheat.
Both schemes can provide valuable information for future large-scale crop classification and extraction research.
Figure 6 Comparison of plot verification. Note: Green represents winter wheat; 1-10 are plot numbers; AF is the Landsat winter wheat distribution classified with all candidate features; OF is the Landsat winter wheat distribution classified with the optimal features; S is the winter wheat distribution derived from Sentinel-2 data.

Mapping the spatial distribution of winter wheat
In this study, the 34 feature variables and the optimal feature subset were compared to examine how different feature spaces affect the extraction of winter wheat area. The difference in recognition performance between the two schemes was small, in which case extracting winter wheat with the optimal feature subset is more convenient. The spatial distribution of winter wheat planting in the HHHP, extracted from the 34 feature variables with the random forest classifier, is shown in Figure 7. Winter wheat is planted on a relatively large scale in the study area, but land cover is complex and diverse, and the large mountainous and forested areas in the northern HHHP contain no winter wheat. Winter wheat is sparsely distributed north of the Yellow River and in the areas adjacent to the east of the Yellow River, where the planting structure is relatively complex and the field landscape is fragmented. Winter wheat fields south of the Yellow River and in the Huai River area are relatively continuous.

Verification of the area percentage of winter wheat extraction
Winter wheat statistics for the seven provinces and municipalities were used for verification (Table 4), and the percentage error (PE) of the total area was only 5.42%. The PE was below 10% in four provinces. Only the winter wheat area of Hebei Province was underestimated, while the other provinces were overestimated, most notably Beijing and Tianjin, with percentage errors of 26.36% and 17.22%, respectively.

Discussion
Previous research on crop recognition from remote sensing images has developed a large number of winter wheat extraction and mapping methods [14][53][54][55]. However, several problems remain to be solved. For example, Xu et al. [5] mapped the distribution of winter wheat in Shandong Province, and mapping winter wheat over a larger area was the focus of their later work; Zhang et al. [47] focused on identifying winter wheat early in the season from the perspective of phenology, but did not discuss the features that affect winter wheat identification; You et al. [29] used the random forest algorithm to produce annual 10 m crop type maps of maize, soybean, and rice in the three northeastern provinces, which strongly supports the prospects for winter wheat mapping. This study used the GEE platform to map winter wheat rapidly at a large regional scale, addressing the gaps in the above research and providing a reference case for remote sensing monitoring of crops over large areas.

Winter wheat mapping using all feature sets
This study first scored all feature variables, selected the top 9 features to form an optimal feature subset, and evaluated the feature sets and their combinations using a cross-validation method, concluding that spectral bands and indices are the main factors affecting the recognition accuracy of winter wheat. Sentinel-2 data were then used for the final accuracy evaluation through sample plot verification: the average Kappa values of the two classification schemes differed by only 0.01, and their average overall accuracies by only 0.4%.
Where computing performance is limited and the workload is large, the optimal feature subset can be chosen first for classification and mapping; however, this study used the cloud computing power of GEE to select the more accurate scheme, that is, classification with the full feature set.

Factors influencing the accuracy of winter wheat mapping
The confusion matrices calculated from the Sentinel-2 and Landsat-8 sample plot data show that the producer's accuracy was relatively high in plots 2 and 8 and relatively low in plots 1 and 9, at 76.4% and 81.6%, respectively. This may be caused by two factors. First, individual farming dominates in China, leading to a fragmented distribution of cultivated land [55]. Second, because of differences in climatic conditions and wheat phenology, the phenomena of "same object with different spectra" and "different objects with the same spectrum" frequently occur when visually interpreting winter wheat in this region. The plot distribution maps also show that producer's accuracy is higher where wheat planting is continuous and concentrated (plots 2, 3, 4, and 8) and lower where it is fragmented (plots 1, 5, and 10).

Effects of rapeseed on winter wheat mapping
In winter crops, the spectra and phenological periods of rapeseed and winter wheat are similar, and they are more prone to misclassification. According to official statistics, the sown area of rapeseed accounts for only 0.7% of the total sown area of crops in the study area [2] , but the influence of rapeseed on the extraction accuracy of winter wheat cannot be completely ruled out.
Previous research has shown that when winter wheat is at the jointing stage and rapeseed is flowering, rapeseed appears bright yellow and winter wheat green in true-color images (red, green, and blue band combination), whereas in standard false-color images (near-infrared, red, and green band combination) rapeseed appears pink-purple and winter wheat red [56]. Combining these two display methods in visual interpretation therefore helps to distinguish rapeseed from winter wheat. In this study, imagery from the greening stage to the jointing stage was used to extract winter wheat [4], and the pink-purple features were assigned to the "other vegetation" class, thereby reducing the influence of rapeseed on the extraction results.
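The two display methods differ only in which bands drive the red, green, and blue display channels. For Landsat-8 OLI, B2, B3, B4, and B5 are the blue, green, red, and near-infrared bands, so a true-color composite uses B4/B3/B2 and a standard false-color composite uses B5/B4/B3; green vegetation, which reflects strongly in the near infrared, therefore appears red in the latter. The sketch below assembles display values for one pixel under both combinations. The linear stretch range (0 to 0.4 reflectance) is an assumption chosen for illustration, and band names such as `B5` differ between Landsat data products.

```python
from typing import Dict, Tuple

# Landsat-8 OLI band assignments for the two display methods
TRUE_COLOR: Tuple[str, str, str] = ("B4", "B3", "B2")            # red, green, blue
STANDARD_FALSE_COLOR: Tuple[str, str, str] = ("B5", "B4", "B3")  # NIR, red, green


def stretch(value: float, lo: float = 0.0, hi: float = 0.4) -> int:
    """Linearly map a reflectance in [lo, hi] to a 0-255 display value (assumed range)."""
    scaled = (value - lo) / (hi - lo)
    return int(round(255 * min(max(scaled, 0.0), 1.0)))


def composite(pixel: Dict[str, float], bands: Tuple[str, str, str]) -> Tuple[int, int, int]:
    """Return the (R, G, B) display values of one pixel for a band combination."""
    r, g, b = (stretch(pixel[name]) for name in bands)
    return (r, g, b)


if __name__ == "__main__":
    # Idealised pixel that reflects only in the NIR band (illustrative values)
    pixel = {"B2": 0.0, "B3": 0.0, "B4": 0.0, "B5": 0.4}
    print(composite(pixel, TRUE_COLOR))            # (0, 0, 0)
    print(composite(pixel, STANDARD_FALSE_COLOR))  # (255, 0, 0)
```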

Uncertainty and limitation analysis
This study explored the identification and extraction of winter wheat fields over a large region and obtained preliminary results. However, some uncertainties and limitations remain to be addressed in follow-up work. First, the study area was divided into three sub-areas according to agroclimatic zoning, and only one winter wheat growing season (2019 to 2020) was studied. Taking into account the obvious differences in winter wheat phenology, this approach achieved good results in identifying winter wheat. Achieving multi-year, regional-scale mapping of winter wheat with time series or phenological knowledge is the focus of our future work. Second, Sentinel-2 data were used to verify the results spatially. Owing to time constraints and practical conditions, no comprehensive field survey was carried out; validation relied only on detailed visual interpretation of high-definition imagery from Google Earth and the GEE platform, which introduces some deviations in areas where crop planting is highly mixed and fragmented. Field survey data are therefore urgently needed to strengthen the validation of these results.

Conclusions
The results show that the GEE cloud platform has great potential for large-scale identification and mapping of winter wheat. The shortwave infrared band B6 (1610 nm) and the near-infrared band B5 (865 nm) of Landsat-8 imagery, together with slope and elevation derived from the terrain data, are the most effective features for distinguishing winter wheat from other land cover types, and the texture feature sum average and the vegetation index NDVI were more useful than the other features for identifying winter wheat. When computing performance is limited, the optimal feature subset can also be applied to extracting and mapping winter wheat and other crops. The overall validation accuracy ranged from 86% to 95%, and the Kappa coefficient was between 0.70 and 0.84, indicating that the results accurately describe the spatial distribution of winter wheat. The results also indicate that the random forest algorithm performs well in winter wheat identification. This study shows that, before the winter wheat harvest, the data resources of the GEE platform and the RF classification algorithm can be used to identify and map the main wheat-producing areas across the HHHP, yielding relatively reliable spatial distributions of winter wheat. This helps fill a gap in research on winter wheat extraction over large areas with complex climate types and under limited storage space and memory. The current research lays the groundwork for crop phenology observation and crop yield estimation, and provides a valuable reference for government agricultural decision-making and food security.
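Two of the features highlighted above can be stated explicitly. NDVI for Landsat-8 OLI is (B5 - B4) / (B5 + B4), and the GLCM sum average is Haralick's statistic, the sum over k of k·p_{x+y}(k), which equals the sum of (i + j)·p(i, j) over the normalized co-occurrence matrix. The sketch below computes both in plain Python for a toy grey-level window. It uses a single horizontal offset and assumes grey levels start at 1, whereas GEE's `ee.Image.glcmTexture()` (which reports this statistic as `savg`) may average over several directions by default, so values need not match that implementation exactly.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); for Landsat-8 OLI, NIR = B5 and Red = B4."""
    return (nir - red) / (nir + red)


def glcm(image: List[List[int]], dx: int = 1, dy: int = 0) -> Dict[Tuple[int, int], float]:
    """Normalized, symmetric grey-level co-occurrence probabilities for one offset (dx, dy)."""
    counts: Counter = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1  # count the pair in both orders (symmetric GLCM)
                counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def sum_average(p: Dict[Tuple[int, int], float]) -> float:
    """Haralick sum average: sum of (i + j) * p(i, j), grey levels assumed to start at 1."""
    return sum((i + j) * prob for (i, j), prob in p.items())


if __name__ == "__main__":
    window = [[1, 1], [2, 2]]  # toy 2x2 window of quantized grey levels
    print(sum_average(glcm(window)))  # 3.0
    print(round(ndvi(0.4, 0.1), 2))   # 0.6
```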