Integrating field images and microclimate data to realize multi-day ahead forecasting of maize crop coverage using CNN-LSTM

Abstract: Crop coverage (CC) is an important parameter for characterizing crop growth, and forecasting CC ahead of time helps to track crop growth trends and guide agricultural management decisions. In this study, a novel CNN-LSTM model that combines the advantages of the convolutional neural network (CNN) in feature extraction and long short-term memory (LSTM) in time series processing was proposed for multi-day ahead forecasting of maize CC. Considering the influence of climate change on maize growth, five microclimatic factors were combined with historical maize CC estimated from field images as the input variables of the forecasting model. Field experimental data collected at four observation points over more than three years were used to evaluate the performance of CNN-LSTM at forecasting horizons of three to seven days ahead, and the forecasting results were compared with those of CNN and LSTM. The results demonstrated that CNN-LSTM obtained the lowest RMSE and the highest R² at all forecasting horizons. Subsequently, the performance of CNN-LSTM under univariate (historical maize CC) and multivariate (historical maize CC + microclimatic factors) input was compared, and the results indicated that the additional microclimatic factors were effective in improving the forecasting performance. Furthermore, the 3-day ahead forecasting results of CNN-LSTM in different growth stages of maize were analyzed, and the results showed that the highest forecasting accuracy was obtained in the seven leaves stage. Therefore, CNN-LSTM can be considered a useful tool for forecasting maize CC.


Introduction
Precision agriculture (PA) is a new trend in agricultural development worldwide and an early form of future agriculture [1,2]. The key goal of PA is to improve crop yield and quality while optimizing inputs, improving efficiency, and reducing environmental pollution [3]. Real-time observation of changes in soil nutrient factors, climatic conditions, and crop status during the crop growth stages is the prerequisite of PA [1]. Therefore, accurately tracking and even forecasting the changes in these variables is the key to realizing PA.
Crop coverage (CC) is usually defined as the vertical projection of the green parts of the crop (including leaves, stems, and branches) onto the ground surface, expressed as a percentage of the reference area [4,5]. CC is not only an important trait for representing crop growth and development but also correlates well with other crop parameters that are difficult to measure, such as leaf area index (LAI) [6] and canopy light interception [7]. In different growth stages, CC can directly or indirectly provide a basis for agricultural managers to carry out the corresponding agricultural activities. In the seedling stages, CC can be used to judge whether the plants are too dense, which helps with thinning and fixing seedlings. In the rapid growth stages (such as from the seven leaves stage to the jointing stage of maize), the stems and leaves grow rapidly, and the soil moisture content and crop nitrogen status can be inferred from CC, which provides useful information for precision irrigation and fertilization [8,9]. In the reproductive growth stages (such as from the tasselling stage to the maturity stage of maize), CC has a strong correlation with crop yield, which provides a basis for yield prediction [10]. Therefore, continuous CC monitoring and forecasting throughout the growth stages can explain the causes of differences in crop growth and provide a decision-making basis for agricultural producers [11].
Digital photographs taken in the field are an effective means of estimating CC because of their high spatial resolution, immediate utility, and low cost [12]. Chianucci et al. [13] verified the feasibility of using digital photographs to estimate CC and applied the approach to crops, including aromatic plants. Coy et al. [14] introduced an unsupervised, threshold-based segmentation method to estimate CC from digital photographs and achieved good results in tests on four different crops (maize, oat, flax, and rapeseed). In summary, CC is mainly estimated from a digital photograph by dividing the number of pixels belonging to the target crop by the total number of pixels in the photograph. However, the studies mentioned above focus mainly on the real-time estimation of CC from digital photographs, while forecasting CC ahead of time has received little attention. Notably, forecasting CC ahead of time allows agricultural managers to anticipate crop growth trends, which is of great significance for guiding agricultural management decisions and realizing PA.
The performance of a forecasting model can be improved by using influencing factors related to the forecasted quantity as additional inputs [15]. For example, Matsumura et al. [16] used multiple linear regression and an artificial neural network (ANN) to forecast the maize yield of Jilin Province, with climatic conditions and historical yield as model inputs, and achieved good forecasting results. Ferreira and da Cunha [17] assessed the potential of five models to forecast daily reference evapotranspiration (ETo) up to seven days ahead using three input combinations, and the results showed that the best-performing models were obtained with inputs containing additional data. Meanwhile, many studies have shown that climatic factors such as temperature and precipitation affect crop growth [18,19]. Therefore, the maize CC forecasting model established in this study took into account the microclimate data affecting maize growth and adopted multivariate inputs.
Recently, deep learning algorithms have attracted much attention and have been applied in many fields, where they have often outperformed traditional machine learning algorithms [17]. Long short-term memory (LSTM) [20] and the convolutional neural network (CNN) [21] are among the most effective and widely used deep learning methods. For time series forecasting, LSTM can effectively capture sequential pattern information, while CNN can automatically extract valuable features. However, a standard LSTM has difficulty filtering noise in the input data, while a standard CNN has difficulty capturing long-term dependence [22]. Therefore, a time series forecasting model combining the advantages of CNN and LSTM could obtain better forecasting performance.
Considering the importance of maize CC forecasting, this study proposed a novel forecasting model to realize multi-day ahead forecasting of maize CC by integrating field images and microclimate data. The main contributions of this study are summarized as follows: 1) a multivariate input was constructed by introducing additional microclimatic factors related to maize growth, which provides richer information for the forecasting model; 2) a hybrid CNN-LSTM model that combines the advantages of CNN and LSTM was proposed to realize multi-day ahead forecasting of maize CC; 3) field experimental data collected at four observation points over more than three years provided sufficient data for evaluating the forecasting model.

Study region and observation data

Study region
In this study, four observation points (A1, A2, A3, and A4) were set up in three regions: A1 was located in Zhengzhou, Henan Province, China (34.46°N, 113.40°E), A2 and A3 were located in Tai'an, Shandong Province, China (36.11°N, 117.08°E), and A4 was located in Gucheng, Hebei Province, China (39.13°N, 115.67°E). These three regions are typical maize-planting regions in China. During the experiment, wheat-maize intercropping technology was adopted, and the sowing time and cultivation mode of the crops were consistent with local agricultural practice. Maize was sown with a row spacing of about 90 cm and a plant spacing of about 19 cm. Maize grows from June to early October, taking about 100 d from sowing to harvesting.

Observation data

Collection of field images
At each observation point, a digital camera (E450 Olympus) installed on a bracket about 5 m above the ground was used to capture the field images. The fixed-focus method with a focal length of 16 mm was used for shooting. The resolution of the captured images was 3648×2736 pixels, and the corresponding actual area at the time of sowing was 30 m²; this area decreased as the maize plants grew.
Field images of maize from sowing to harvesting were collected at observation points A2, A3, and A4 from 2011 to 2013, and at A1 from 2010 to 2013. On each experimental day, field images were taken every hour from 9:00 to 16:00, so eight field images were captured per day. The field images of maize from sowing to harvesting at one observation point were taken as one image series. Therefore, thirteen maize field image series were collected in this study. Detailed information on each maize field image series is listed in Table 1.
Note: A1, A2, A3, and A4 represent the four observation points set up in three regions. A1 was located in Zhengzhou, Henan, China; A2 and A3 were located in Tai'an, Shandong, China; and A4 was located in Gucheng, Hebei, China.

Estimating maize CC from field images
The core of estimating maize CC from field images is to segment maize accurately. The two most common approaches are threshold-based and machine learning-based [14]. Many researchers have used digital images to estimate CC without considering the problems caused by changes in natural light or light reflection on crop leaves [11].
Because the field images in this study were mainly captured in summer, the effects of natural light changes and reflection from crop leaves must be considered. The maize field images were segmented using the method of Ye et al. [23], who proposed a novel crop segmentation method based on a probabilistic superpixel Markov random field. This segmentation method can improve crop segmentation performance under strong illumination or shadow. After segmentation, the number of pixels in the green part of maize as a percentage of the total number of pixels is taken as maize CC (0-100%). A typical maize field image under strong natural light is shown in Figure 1a, and the corresponding segmentation result is shown in Figure 1b. Calculating the percentage of green pixels in Figure 1b gives a maize CC of 59.68%.
As previously mentioned, eight field images were taken at each observation point every day. The average of the maize CC values estimated from the eight field images was taken as the daily maize CC for that day.
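The pixel-ratio definition of CC and the daily averaging described above can be illustrated with a minimal sketch. It assumes the segmentation step has already produced a binary mask (1 for maize pixels, 0 for background); the function names and the toy masks are hypothetical, and the segmentation method of Ye et al. [23] is not reproduced here.

```python
def crop_coverage(mask):
    """Return CC (%) for a binary mask given as a list of rows (1 = maize, 0 = background)."""
    total = sum(len(row) for row in mask)
    green = sum(sum(row) for row in mask)
    return 100.0 * green / total


def daily_crop_coverage(masks):
    """Average the CC of all images taken on one day (eight per day in this study)."""
    values = [crop_coverage(m) for m in masks]
    return sum(values) / len(values)


# Toy 2x4 masks standing in for segmented field images (hypothetical data).
mask_a = [[1, 1, 0, 0],
          [1, 0, 0, 0]]   # 3 of 8 pixels are maize
mask_b = [[1, 1, 1, 0],
          [1, 1, 0, 0]]   # 5 of 8 pixels are maize

cc_a = crop_coverage(mask_a)
cc_day = daily_crop_coverage([mask_a, mask_b])
print(cc_a)    # 37.5
print(cc_day)  # 50.0
```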

Collection of microclimate data
Each study region had an agricultural microclimate observation system, which was used to collect microclimate data of maize from sowing to harvesting. Five microclimatic factors were obtained: cumulative precipitation, maximum temperature, relative humidity, dew point temperature, and maximum air pressure (Table 2). The microclimate data, collected on an hourly timescale, were converted to a daily timescale. The daily cumulative precipitation was the sum of the hourly cumulative precipitation; the daily maximum temperature and maximum air pressure were the maxima of the hourly maximum temperature and air pressure, respectively; and the daily relative humidity and dew point temperature were the averages of the hourly relative humidity and dew point temperature, respectively.
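The hourly-to-daily conversion rules above (sum for precipitation, maximum for temperature and air pressure, mean for relative humidity and dew point) can be sketched as follows. The dictionary keys and the sample values are hypothetical and only illustrate the aggregation; they are not the study's data format.

```python
def hourly_to_daily(hourly):
    """Aggregate the hourly records of one day into a single daily record.

    `hourly` is a list of dicts with the hypothetical keys
    'precip' (mm), 'tmax' (deg C), 'pmax' (hPa), 'rh' (%), and 'dew' (deg C).
    """
    n = len(hourly)
    return {
        "precip": sum(h["precip"] for h in hourly),   # daily cumulative precipitation
        "tmax": max(h["tmax"] for h in hourly),       # daily maximum temperature
        "pmax": max(h["pmax"] for h in hourly),       # daily maximum air pressure
        "rh": sum(h["rh"] for h in hourly) / n,       # daily mean relative humidity
        "dew": sum(h["dew"] for h in hourly) / n,     # daily mean dew point temperature
    }


# Three made-up hourly records standing in for a full day.
hours = [
    {"precip": 0.0, "tmax": 24.0, "pmax": 1002.0, "rh": 80.0, "dew": 18.0},
    {"precip": 1.5, "tmax": 29.0, "pmax": 1001.0, "rh": 60.0, "dew": 20.0},
    {"precip": 0.5, "tmax": 27.0, "pmax": 1003.0, "rh": 70.0, "dew": 19.0},
]
daily = hourly_to_daily(hours)
print(daily)
```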

Data management
In this study, the daily maize CC and its corresponding microclimate data were taken as one observation, so 1338 daily observations were collected at the four observation points over more than 3 years. These observations were divided into a training set and a testing set. Specifically, the data from 2010 to 2012 (931 observations) were used for model training and parameter optimization, and the data from 2013 (407 observations) were used to test the forecasting performance of the models.

Construction of multivariate input matrix
The collected observation data are used to form a time series dataset $X = [x_1, x_2, x_3, \ldots, x_t, \ldots, x_D]^T$, where $D$ is the total number of days of observation data and $x_t$ is a row vector representing maize CC and microclimate data at time $t$, $x_t = [x_t^1, x_t^2, \ldots, x_t^M]$. Among them, $x_t^1$ is the maize CC, $x_t^2$ to $x_t^M$ are microclimate data, and $M$ is the number of variables of observation data.
Note that the time series dataset must be converted into input matrices suitable for the forecasting model. Suppose that the "look back window", which is the number of days of previous historical data that a forecasting model takes into consideration when making a forecast [24], is set to $L$, and the forecasting horizon is set to $d$ days ahead, which means that the maize CC on the $d$-th day after time $t$ is forecasted. As shown in Figure 2, the input matrices of the forecasting model at time $t$ and $t+1$ are $X_t$ and $X_{t+1}$, respectively, where $X_t = [x_{t-L+1}, \ldots, x_t]^T$, and the dimension of each input matrix is $L \times M$. The goal of this study was to train the forecasting model so that when the input matrix is $X_t$, the output is $x_{t+d}^1$. That is to say, the maize CC on the next $d$-th day is forecasted by using the maize CC and microclimate data of the previous $L$ days.
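The conversion from a daily time series to (input matrix, target) pairs can be sketched as below. This is a minimal illustration under the assumption that each window ends at day t and contains the L most recent rows; the function name and the toy series are hypothetical. In practice, windows would presumably be built separately for each image series so that they do not span two seasons or observation points, which the text does not spell out.

```python
def make_samples(series, look_back, horizon):
    """Build (input matrix, target) pairs from a daily series.

    series: list of rows, one per day; row[0] is maize CC, the remaining
            entries are microclimate variables.
    Each input matrix holds `look_back` consecutive rows ending at day t
    (an L x M matrix); the target is the CC `horizon` days after day t.
    """
    samples = []
    last_t = len(series) - horizon              # exclusive upper bound for t (0-based)
    for t in range(look_back - 1, last_t):
        window = series[t - look_back + 1 : t + 1]
        target = series[t + horizon][0]
        samples.append((window, target))
    return samples


# Toy series: 10 days, 2 variables per day ([CC, temperature]); made-up values.
toy = [[float(day), 20.0 + day] for day in range(10)]
pairs = make_samples(toy, look_back=5, horizon=3)
print(len(pairs))      # 3
print(pairs[0][1])     # 7.0 (CC on day index 7, three days after day index 4)
```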

Figure 2 Input matrices of the forecasting model at time t and t+1

Proposed model
CNN [25] and LSTM [26] are the two main branches in the field of deep learning. CNN is good at extracting features from input data, while LSTM is effective for processing time series data. Inspired by this, a hybrid deep learning model (CNN-LSTM) is proposed to combine the advantages of CNN and LSTM.

CNN
The main difference between CNN and traditional neural networks is that CNN uses local connections and weight sharing, so a CNN has fewer trainable parameters than a traditional neural network [27]. The convolutional and pooling layers are the most important layer structures in CNN. The role of the convolutional layer is to automatically extract features from input data through convolutional kernels, and the role of the pooling layer is to reduce the size of the feature maps. The convolutional kernels used in this study were one-dimensional, as determined by the dimension of the input data; such a network is called a 1D CNN [15].
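To make the roles of the two layers concrete, the following sketch slides a single kernel along the time axis of a small multivariate window and then pools the resulting feature. It is an illustration only: the kernel weights and window values are made up, and appending zero rows at the end of the window is just one assumed way of keeping the output length equal to the input length; deep learning frameworks may pad differently.

```python
def conv1d_same(window, kernel, bias=0.0):
    """Apply one 1D kernel along the time axis of an L x M window, followed by ReLU.

    window: L rows of M values; kernel: K rows of M weights (shared across time).
    Zero rows are appended (an assumption) so that the output keeps length L.
    """
    L, M, K = len(window), len(window[0]), len(kernel)
    padded = window + [[0.0] * M for _ in range(K - 1)]
    out = []
    for t in range(L):
        s = bias
        for k in range(K):
            for m in range(M):
                s += kernel[k][m] * padded[t + k][m]
        out.append(max(0.0, s))                 # ReLU activation
    return out


def max_pool1d(feature, size, stride):
    """Take the maximum over windows of `size` values, moving by `stride`."""
    return [max(feature[i:i + size]) for i in range(0, len(feature), stride)]


# Toy window: L = 5 days, M = 2 variables (made-up values).
window = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
kernel = [[0.5, 0.0], [0.5, 1.0]]               # kernel size 2, hypothetical weights

features = conv1d_same(window, kernel)
pooled = max_pool1d(features, size=1, stride=2)
print(features)  # [2.5, 2.5, 4.5, 4.5, 2.5]
print(pooled)    # [2.5, 4.5, 2.5]
```

Because the same kernel weights are reused at every time step, the number of weights depends on the kernel size and the number of variables, not on the window length.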

LSTM
LSTM is a variant of the recurrent neural network (RNN). It not only retains the advantages of the traditional RNN but also alleviates the gradient explosion or vanishing problem that the traditional RNN suffers from during back-propagation [28]. In particular, LSTM introduces a memory cell to store historical information and combines three control gates (input, output, and forget gates) to read and write the memory cell. The decisions of the three control gates all depend on the previous output $h_{t-1}$ and the current input $x_t$. The internal structure of the LSTM unit is shown in Figure 3. More specifically, the LSTM can be described by Equations (1)-(6):
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \tag{1}$$
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \tag{2}$$
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \tag{3}$$
$$\tilde{c}_t = g(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
$$h_t = o_t \odot g(c_t) \tag{6}$$
where $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, and $c_t$ and $h_t$ are the memory cell state and the output at time $t$, respectively.
In order to handle time series data, $L$ LSTM units are cascaded in turn to form an LSTM layer (Figure 4), in which each LSTM unit corresponds to a time slot. All LSTM units share weights and biases. Each row of the input matrix $X_t$ is fed to the corresponding LSTM unit in the LSTM layer. The outputs of the $j$-th LSTM unit at time $t$, i.e., $h_{t-L+j}$ and $c_{t-L+j}$, are part of the inputs of the next LSTM unit. Therefore, the outputs of each LSTM unit depend not only on its own inputs but also on the inputs of the previous LSTM units.
Figure 4 The unfolded LSTM layer structure
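The gate computations and the unfolded layer can be traced with a small pure-Python sketch. It follows the standard LSTM formulation (sigmoid gates, hyperbolic tangent candidate state, element-wise products); the parameter dictionary layout, the function names, and the zero initial state are assumptions made for illustration, not details taken from the study.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(Wx, x, Wh, h, b):
    """Compute Wx @ x + Wh @ h + b using plain lists."""
    return [
        sum(w * a for w, a in zip(Wx[r], x))
        + sum(w * a for w, a in zip(Wh[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM unit: gates, candidate state, new cell state, and output."""
    i = sigmoid(affine(p["Wix"], x_t, p["Wih"], h_prev, p["bi"]))   # input gate
    f = sigmoid(affine(p["Wfx"], x_t, p["Wfh"], h_prev, p["bf"]))   # forget gate
    o = sigmoid(affine(p["Wox"], x_t, p["Woh"], h_prev, p["bo"]))   # output gate
    c_tilde = [math.tanh(a) for a in affine(p["Wcx"], x_t, p["Wch"], h_prev, p["bc"])]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def lstm_layer(X, p, hidden):
    """Feed the rows of an input matrix through L weight-sharing LSTM units."""
    h, c = [0.0] * hidden, [0.0] * hidden       # zero initial state (assumption)
    for row in X:
        h, c = lstm_step(row, h, c, p)
    return h


# Tiny check with 2 inputs, 1 hidden unit, and all-zero parameters:
# every gate equals 0.5 and the candidate state equals 0.
zeros = {k: [[0.0, 0.0]] for k in ("Wix", "Wfx", "Wox", "Wcx")}
zeros.update({k: [[0.0]] for k in ("Wih", "Wfh", "Woh", "Wch")})
zeros.update({k: [0.0] for k in ("bi", "bf", "bo", "bc")})
h1, c1 = lstm_step([1.0, 2.0], [0.0], [1.0], zeros)
print(c1)  # [0.5]: half of the previous cell state is kept by the forget gate
```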

CNN-LSTM model
In order to combine the advantages of CNN and LSTM, a hybrid CNN-LSTM model was proposed. Firstly, the convolutional layer was used to automatically extract useful local features from the fixed-length input data, and the max pooling layer was used to reduce the size of the extracted features. Then, the output was used as the input of the LSTM layer to learn the long- and short-term dependencies in the time series data. Finally, the output of the LSTM layer was used as the input of the fully connected layers to obtain the final forecasting result. The numbers of convolutional layers, LSTM layers, and fully connected layers in the network were determined through repeated tests so that the proposed model could obtain the best forecasting performance. The structure of the hybrid CNN-LSTM framework is depicted in Figure 5, including one input layer, one 1D convolutional layer, one max pooling layer, one LSTM layer, and two fully connected layers.
The parameters of the convolutional layer were set as follows: the number of convolutional kernels was 32, the kernel size was 2, the activation function was the Rectified Linear Unit (ReLU), and the padding was set to "same". In the max pooling layer, the pooling kernel size was 1, the stride was 2, and the padding was set to "same". The number of neurons in the LSTM layer was set to 60. The numbers of neurons in the first fully connected layer and the second fully connected layer (output layer) were set to 10 and 1, respectively. In the training process of the CNN-LSTM model, mean absolute error was employed as the loss function, and the Adam [29] optimizer was used to update the network weights and biases by minimizing the loss function. The learning rate was set to 0.001, and the number of training epochs was set to 300. In addition, the look-back window was set to 5 d, and, considering that sufficient time should be reserved for agricultural producers to make management decisions, the forecasting horizon was set to three to seven days ahead.
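As a sanity check on these settings, the output length of each layer and the number of trainable parameters can be worked out by hand. The sketch below does this arithmetic under stated assumptions that the text does not confirm: a convolution stride of 1, an LSTM layer that passes only its last output to the fully connected layers, and the standard parameter counts for these layer types.

```python
import math

L, M = 5, 6                      # look-back window (days) and number of input variables

# 1D convolutional layer: 32 kernels of size 2, "same" padding, stride 1 (assumed).
filters, kernel_size = 32, 2
conv_len = L                                          # "same" padding keeps the length
conv_params = kernel_size * M * filters + filters     # weights + biases

# Max pooling layer: pool size 1, stride 2, "same" padding.
pool_stride = 2
pool_len = math.ceil(conv_len / pool_stride)          # time steps passed to the LSTM

# LSTM layer with 60 units: three gates plus the candidate state, each with
# input weights, recurrent weights, and a bias vector.
units = 60
lstm_params = 4 * (filters * units + units * units + units)

# Fully connected layers with 10 and 1 neurons (last LSTM output as input, assumed).
dense1_params = units * 10 + 10
dense2_params = 10 * 1 + 1

total_params = conv_params + lstm_params + dense1_params + dense2_params
print(conv_len, pool_len)        # 5 3
print(conv_params, lstm_params)  # 416 22320
print(total_params)              # 23357
```

With a pooling kernel of size 1, the maximum is taken over a single value, so under these assumptions the pooling layer simply keeps every second time step of the convolutional output.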

Design of experiments
In order to evaluate the forecasting performance of the CNN-LSTM model, we compared it with the CNN and LSTM models (Figure 6). All three models were implemented in Python 3.5 using Keras 2.2.2.
The CNN model consisted of one input layer, one 1D convolutional layer, one max pooling layer, one flatten layer, and two fully connected layers. The parameters of the convolutional layer, max pooling layer, and fully connected layers of the CNN model were the same as those of the CNN-LSTM model. The flatten layer had no parameters to set; its role was to convert multi-dimensional features into one-dimensional data. The LSTM model included one input layer, one LSTM layer, and two fully connected layers. The parameters of the LSTM layer and fully connected layers of the LSTM model were the same as those of the CNN-LSTM model. Furthermore, the loss function, optimizer, learning rate, training epochs, look-back window, and forecasting horizon of the CNN and LSTM models were the same as those of the CNN-LSTM model.

Data preprocessing
To eliminate the negative effects of differences in variable scale on the performance of the forecasting model, all six variables (maize CC and five microclimatic factors) were normalized into [0, 1] according to Equation (7). It is worth noting that the minimum and maximum values of each variable were calculated only from the training data, not from the testing data.
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{7}$$
where $x$ and $x_{norm}$ represent the original and normalized values, respectively, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the variable in the training data.
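The min-max normalization with statistics taken from the training data only can be sketched as follows; the function names and sample values are hypothetical. One consequence worth noting is that testing values outside the training range map outside [0, 1].

```python
def fit_min_max(train_values):
    """Return the minimum and maximum of one variable, using training data only."""
    return min(train_values), max(train_values)


def normalize(values, x_min, x_max):
    """Min-max normalization: (x - x_min) / (x_max - x_min)."""
    return [(v - x_min) / (x_max - x_min) for v in values]


def denormalize(values, x_min, x_max):
    """Map normalized values (e.g., model outputs) back to the original scale."""
    return [v * (x_max - x_min) + x_min for v in values]


# Made-up CC values (%) for one variable.
train_cc = [0.0, 20.0, 40.0, 80.0]
test_cc = [10.0, 100.0]

x_min, x_max = fit_min_max(train_cc)            # statistics from the training set only
train_norm = normalize(train_cc, x_min, x_max)
test_norm = normalize(test_cc, x_min, x_max)
print(train_norm)  # [0.0, 0.25, 0.5, 1.0]
print(test_norm)   # [0.125, 1.25]
```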

Model evaluation
The performance of the forecasting model is evaluated using the root mean square error (RMSE) and the coefficient of determination (R²) [30,31]. For a good forecasting model, RMSE should be close to 0 and R² should be close to 1. They are defined in Equations (8) and (9).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{8}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{9}$$
where $y_i$ is the observed maize CC, $\hat{y}_i$ is the forecasted maize CC, $\bar{y}$ is the mean of the observed maize CC, and $N$ is the number of samples in the testing set.
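A minimal sketch of the two metrics is given below. It assumes that R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean, which matches the symbols defined above; the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and forecasted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observed and forecasted CC values (%).
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]

print(rmse(observed, forecast))  # square root of 18/4, about 2.12
print(r2(observed, forecast))    # 1 - 18/500, about 0.964
```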

Analysis of the changing trend of maize CC
In this study, 13 field image series of maize from sowing to harvesting were collected. After estimating maize CC from the field images, the change curves of maize CC can be obtained. Taking 2013 as an example, the change curves of maize CC at the four observation points are shown in Figure 7. Overall, maize CC rises slowly at first, then rises rapidly, then remains relatively stable, and finally declines slowly. However, the change trends of maize CC differ between observation points. For example, the data fluctuation at observation point A1 is the most obvious, followed by A4, while A2 and A3 are relatively stable. The difference between the change curves of maize CC at the four observation points is mainly related to climatic conditions. Therefore, when forecasting maize CC ahead of time, the traditional strategy of using only historical maize CC struggles to obtain accurate results, while combining additional microclimatic factors can alleviate this problem. According to the microclimate data over the total growth stage of maize at the four observation points in 2013, the average maximum temperature and cumulative precipitation at A1 were 32.7°C and 125.8 mm, respectively. The temperature was high and the precipitation was low, so drought often occurred, causing maize leaves to turn yellow and wilt; after rain, the leaves turned green again and unfolded, which led to the fluctuation of the change curve of maize CC. A2 and A3 shared an agricultural microclimate observation system, and the average maximum temperature and cumulative precipitation at A2 and A3 were 30.8°C and 339.1 mm, respectively. The temperature was suitable, and the precipitation was moderate and evenly distributed. Few maize leaves wilted from drought, so the change curve of maize CC was relatively stable. The average maximum temperature and cumulative precipitation at A4 were 28.8°C and 373.3 mm, respectively. The temperature was low, and the precipitation was abundant but unevenly distributed, which would also lead to slight fluctuations in maize CC.
Due to the accumulation or lag effect of microclimatic factors, there are some obvious data fluctuation points (A, B, C, D, E, F, G, and H) in the change curves of maize CC (Figure 7). The average maximum temperature and cumulative precipitation between data fluctuation points are listed in Table 3. It can be seen that in the ranges A-B, C-D, and F-G, there was almost no precipitation and the temperature was relatively high, which made maize leaves wilt and led to a decrease in maize CC. In the ranges B-C, D-E, and G-H, there was sufficient precipitation and the temperature was suitable, which made the wilted maize leaves unfold again and led to an increase in maize CC.

Comparison of forecasting performance of different models
The performance of the three models at forecasting horizons of three to seven days ahead is shown in Table 4 and Figure 8. The RMSE is ranked in the descending order of CNN, LSTM, and CNN-LSTM, which is completely consistent with the ascending order of the R². In addition, the performance of all models degrades as the forecasting horizon increases. However, even in the 7-day ahead forecasting, CNN-LSTM still obtains acceptable forecasting accuracy, with an RMSE of 8.260% and an R² of 0.922. Furthermore, the advantages of CNN-LSTM become more obvious as the number of days ahead increases. For example, compared with the second-ranked LSTM, the RMSE of CNN-LSTM decreases by 0.48% and the R² increases by 0.10% in the 3-day ahead forecasting, while the RMSE decreases by 9.53% and the R² increases by 1.99% in the 7-day ahead forecasting. This indicates that CNN-LSTM performs better at longer forecasting horizons.
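Since RMSE is itself expressed in percent of CC, the percentage changes quoted above are most plausibly relative changes with respect to LSTM rather than differences in percentage points. The text does not state this explicitly, so the following is only a sketch under that assumption; the implied LSTM values are back-calculated for illustration and are not taken from Table 4.

```latex
% Assumed definition of the reported relative changes (not stated in the text)
\Delta_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{LSTM}} - \mathrm{RMSE}_{\mathrm{CNN\text{-}LSTM}}}{\mathrm{RMSE}_{\mathrm{LSTM}}} \times 100\%,
\qquad
\Delta_{R^2} = \frac{R^2_{\mathrm{CNN\text{-}LSTM}} - R^2_{\mathrm{LSTM}}}{R^2_{\mathrm{LSTM}}} \times 100\%

% 7-day horizon, using the CNN-LSTM values quoted in the text
\mathrm{RMSE}_{\mathrm{LSTM}} \approx \frac{8.260\%}{1 - 0.0953} \approx 9.13\%,
\qquad
R^2_{\mathrm{LSTM}} \approx \frac{0.922}{1 + 0.0199} \approx 0.904
```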
In practical applications, the forecasting accuracy at longer horizons is of most concern, because the further ahead the forecast, the earlier agricultural producers can make decisions. Therefore, CNN-LSTM is considered to be the optimal model for multi-day ahead forecasting of maize CC. In order to further illustrate the difference in the forecasting performance of the three models, the forecasting results of the three models at forecasting horizons of three, five, and seven days ahead are compared in Figure 9. In general, all three models can track the change of maize CC at the three forecasting horizons, but their forecasting accuracy differs slightly. In terms of forecasting horizon, the forecasting accuracy of all three models decreases as the horizon increases. In terms of observation point, the forecasting accuracy of all three models at A2 and A3 is higher than that at A1 and A4, because the change of maize CC at A2 and A3 is relatively stable while that at A1 and A4 fluctuates obviously. However, at the same forecasting horizon and observation point, the maize CC values forecasted by CNN-LSTM are closer to the observed values than those forecasted by CNN and LSTM. This indicates that the combination of CNN and LSTM achieves additional performance gains over using CNN or LSTM individually.

Comparison of the forecasting performance of CNN-LSTM under different input types
To better illustrate the contribution of the introduced microclimate data to the forecasting performance of CNN-LSTM, the RMSE and R² obtained by CNN-LSTM at different forecasting horizons under univariate and multivariate inputs are listed in Table 5. Here, univariate input means that maize CC is the only input variable of CNN-LSTM, while multivariate input means that the five microclimatic variables are included in addition to maize CC. The CNN-LSTM models used for the two input types have the same network structure and parameters.
Compared with univariate input, the mean RMSE of multivariate input decreases by 3.36% and the mean R² increases by 0.34%, and the forecasting accuracy of multivariate input is better than that of univariate input at all forecasting horizons. In addition, existing studies have shown that a forecasting model is more stable if influencing factors related to the forecasted quantity are used as additional inputs [32]. Therefore, from the perspective of forecasting accuracy and model stability, multivariate input is considered the optimal input type for CNN-LSTM.

Analysis of forecasting results of CNN-LSTM in different growth stages of maize
From sowing to harvesting, maize passes through the following growth stages: emergence, three leaves, seven leaves, jointing, tasselling, flowering, spinning, and maturity [33]. Taking observation point A3 in 2013 as an example, the dates on which maize entered the different growth stages and the corresponding days after sowing, as recorded by field observers, are listed in Table 6. The change trends of maize CC differ considerably between growth stages, so the corresponding forecasting accuracy also varies. According to the characteristics of maize in the different growth stages, the growing season can be divided into four periods: the emergence-three leaves stage, the seven leaves stage, the jointing-flowering stage, and the spinning-maturity stage. The 3-day ahead forecasting results of CNN-LSTM in these periods are analyzed.
As shown in Figure 10, the forecasted values are basically consistent with the observed values over the total growth stage of maize. From the emergence stage to the flowering stage, maize CC increases continuously and tends to be stable in the later periods, while from the spinning stage to the maturity stage, maize CC begins to decrease, accompanied by fluctuations. As shown in Figure 10a, the forecasting errors on several days are large, which may be because maize CC is relatively small in this period and is easily disturbed by weeding and other agricultural activities. As shown in Figure 10b, maize CC increases approximately linearly, and CNN-LSTM captures this linear change well, so the highest forecasting accuracy is obtained, and the fitted line almost coincides with the 1:1 line. As shown in Figure 10c, the early part of this period is the jointing stage, when maize grows most vigorously, and the ability of CNN-LSTM to capture rapid changes in maize CC is insufficient, which leads to low forecasting accuracy. However, in the latter part of this period, maize CC tends to be stable, and the forecasted values agree closely with the observed values. As shown in Figure 10d, the maize CC curve fluctuates due to climatic conditions, but CNN-LSTM can still track and forecast maize CC accurately. This is because the microclimatic factors affecting maize CC were taken into account when establishing the forecasting model.

Conclusions
In this study, a novel forecasting model, CNN-LSTM, was proposed to forecast maize CC; it combines the advantages of CNN in feature extraction and LSTM in time series processing. The forecasting performance of CNN-LSTM was compared with that of CNN and LSTM at horizons of three to seven days ahead, and the results showed that CNN-LSTM had the lowest RMSE and the highest R², followed by LSTM, with CNN performing worst. Furthermore, the forecasting performance of CNN-LSTM under univariate and multivariate inputs was compared, and the results showed that multivariate input performed better than univariate input, which indicated that the additional microclimatic variables improved the forecasting performance. Finally, taking observation point A3 in 2013 as an example, the 3-day ahead forecasting results of CNN-LSTM in different growth stages of maize were analyzed, and the results indicated that the forecasting accuracy of CNN-LSTM differed between growth stages, with the highest R² obtained in the seven leaves stage. Overall, this study showed the potential of using CNN-LSTM to forecast maize CC by integrating field images and microclimate data.

Figure 1 Example of maize field images taken at 10 o'clock on July 29, 2010, at observation point A1 and the corresponding segmentation result
In Equations (1)-(6), $\sigma$ represents the sigmoid function, $g$ represents the hyperbolic tangent function, $W_{ix}$, $W_{fx}$, $W_{ox}$, and $W_{cx}$ are input weight matrices, $W_{ih}$, $W_{fh}$, $W_{oh}$, and $W_{ch}$ are recurrent weight matrices, $b_i$, $b_f$, $b_o$, and $b_c$ are bias vectors, $\tilde{c}_t$ is the candidate state of the memory cell, and $\odot$ stands for the element-wise multiplication operation.
Note: LSTM: Long short-term memory.

Figure 3 Internal structure of LSTM unit

Figure 5 Network structure of the forecasting model based on CNN-LSTM

Figure 7 Change curves of maize CC at four observation points A1, A2, A3, and A4 in 2013

Figure 6 Network structures of the forecasting models based on CNN and LSTM

Figure 8 Comparison of RMSE and R² obtained by different models at the forecasting horizon of three to seven days ahead

Figure 9 Comparison of forecasting results of different models at the horizon of 3-d, 5-d, and 7-d ahead

Figure 10 Comparisons and regression plots between observed CC and forecasted CC, respectively