Estimation of chlorophyll content in pepper leaves using spectral transmittance red-edge parameters

Abstract: The objective of this work was to monitor the growth status of pepper and provide precise guidance on fertilization through non-destructive detection of chlorophyll content based on spectral transmittance. Restricting the analysis to the narrow red-edge spectral region (680-760 nm) reduced the requirements for light sources and light detection sensors, and provided a simpler and more accurate method of data acquisition for developing instruments that estimate chlorophyll content in leaves. The red-edge region of spectral transmittance was demonstrated to be closely related to chlorophyll content. Regression models for estimating chlorophyll content were developed with seven different methods using the four red-edge parameters extracted from the red-edge region. The problems of multicollinearity among the red-edge parameters and of errors in the model coefficients were addressed with the ridge regression method when building the multivariate regression model. The results indicated that the ridge regression method reduces the errors of the model coefficients and constant term while improving the detection accuracy; thus, the ridge regression model could estimate the leaf chlorophyll content more accurately and repeatably.


Introduction 
Chlorophyll content is an important indicator of the photosynthesis and nutrient content of crops [1][2][3], and the study of crop chlorophyll content can provide a reliable basis for precision agriculture [4]. Measuring the chlorophyll content of leaves accurately and quickly is important for monitoring plant growth and guiding agricultural production. Numerous studies have shown that spectroscopic techniques are effective for the rapid detection of chlorophyll content in leaves, avoiding the inefficiency of traditional chemical analysis and enabling real-time data collection and analysis as well as uninterrupted monitoring [5][6][7][8].
In spectroscopy-based studies of chlorophyll content detection, researchers have used plant leaf reflection spectra for nutrient monitoring of field crops. UAV remote sensing imagery has been used to calculate multiple vegetation indices, with crop canopy reflectance spectra serving to model the nutrient status of peanuts, potatoes, maize, and apple trees [9][10][11][12][13]. Timea et al. [14] established a nonlinear estimation model of the total chlorophyll content of bell pepper at maturity using visible, near-infrared, and short-wave infrared reflection spectroscopy. These studies achieved useful results, but reflection spectral data acquisition involves a long optical path between the sensor and the sample, and detection must be carried out under a clear, cloudless sky, which prevents real-time data acquisition and leaves the results easily affected by external environmental factors. Studies that detect crop leaf chlorophyll content based on spectral transmittance often need a wide spectral range. He et al. [15] used 560-940 nm as the characteristic band to establish spectral feature parameters for rapid nondestructive estimation of chlorophyll content in rice, cucumber, and tomato leaves. Ding et al. [16] used the absorbance at 384-763 nm as the characteristic band for chlorophyll content diagnosis in greenhouse tomato leaves. The spectral ranges used in these two studies are each about 380 nm wide and span the visible and near-infrared regions. Although the detection results are accurate, the requirements on the range of the detection equipment are high, and developing the related detection instruments is difficult and costly. Thus, this study improves on the spectral transmittance acquisition method and constructs a spectral transmittance acquisition system with a streamlined structure and accurate measurement.
An estimation model that combines a narrow measurement band, simple measurement, and high detection accuracy is constructed to provide technical support for the development of a spectrum-based chlorophyll content detection instrument.
Studies of the spectral properties of plants have revealed that a steeply rising red-edge region forms in the range of 680-760 nm because leaves strongly absorb red light and strongly reflect near-infrared light [17], and many studies have shown that the red-edge parameters characterizing this region are strongly correlated with the chlorophyll content of leaves [18,19]. Ding et al. [20] extracted the red-edge position of the tomato leaf reflection spectrum using six different calculation methods and established a variety of linear models for predicting chlorophyll content. Zheng et al. [21][22][23] used hyperspectral remote sensing to model leaf chlorophyll and nitrogen content by calculating spectral red-edge parameters of potato, winter rape, and maize canopy leaves. These authors used reflection spectroscopy, which yields data susceptible to interference from external environmental factors and places high requirements on light intensity and light path angle. Transmission spectra, in contrast, can be acquired with a stable light source fixed directly above the test sample. Many researchers have used transmission spectroscopy to increase the accuracy of quantitative chlorophyll measurements. Wang et al. [24] successfully used transmission spectroscopy to detect chlorophyll content in tomato leaves. Raymond et al. [25] calibrated a chlorophyll meter using the transmittance spectra of leaves. Zhang et al. [26] established a new vegetation index using transmittance spectroscopy and used it to successfully monitor chlorophyll concentration in rice leaves. To improve the accuracy of the detection data, this study used spectral transmittance, avoiding the errors that the environment introduces into the spectral data, and analyzed chlorophyll content quantitatively based on the red-edge region.
In this study, pepper was used as plant material, and the red-edge parameters were applied to the spectral analysis of leaves using their spectral transmittance at different levels of nitrogen (N) application. The characteristic parameters of the red-edge region of the spectral transmittance of pepper leaves were calculated using the first-order derivative method, and various regression models for chlorophyll content estimation based on the red-edge parameters were constructed. The aim was to achieve a simple, efficient, and accurate estimation of the chlorophyll content of pepper leaves, and to provide a theoretical basis for the accurate fertilization of pepper plants and the development of related instruments.

Pepper cultivation
The experiment was conducted using pepper (Capsicum annuum L., Jilin Pepper 16) as plant material. The study was repeated three times in a solar greenhouse at the Jilin Academy of Vegetable and Flower Sciences (125°23ʹ37.1ʹʹE, 43°49ʹ51.9ʹʹN) from April 2018 to November 2019. Seedlings were cultivated in identical cultivation tanks (width 40 cm, height 30 cm, length 600 cm) with 16 plants per tank, using a 3:1 mixture of coconut coir and perlite as substrate. The N level in the Japanese garden formula was used as the standard solution, and the balance between nitrate-N and ammonia-N was considered; the N content of the standard solution was decreased or increased by 50% and 100% to obtain the other N treatments, while keeping the content of the other major elements the same. The temperature in the greenhouse was controlled within (26±2)°C during the day and (16±2)°C at night, with a relative humidity of 50%-70% and a photosynthetic photon flux density (PPFD) not exceeding 1000 μmol m⁻² s⁻¹ during the day, and 800 mL of nutrient solution was supplied to each plant daily. After 30 d of the different N application treatments, fully developed leaves were selected for spectral data collection and chlorophyll content measurement, and 150 samples were collected in total. The 150 samples were randomly divided into a training set of 100 samples and a validation set of 50 samples.
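The random division of the 150 samples can be illustrated with a minimal Python sketch. The function name `split_samples` and the fixed seed are assumptions made here for reproducibility; the source does not state how the random split was generated.

```python
import numpy as np

def split_samples(n_samples=150, n_train=100, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    rng = np.random.default_rng(seed)      # seed is an assumption, for repeatability
    order = rng.permutation(n_samples)     # shuffled indices 0 .. n_samples - 1
    return np.sort(order[:n_train]), np.sort(order[n_train:])

train_idx, valid_idx = split_samples()
```

The two index arrays are disjoint and together cover every sample, so each leaf is used either for fitting or for validation, never both.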

Spectral data acquisition
The spectral measuring instrument was the AvaSpec-ULS2048XL-EVO (Avantes, Netherlands). The spectral transmittance of pepper leaves was measured over the range of 350-1100 nm with a measurement step of 0.6 nm. During spectral acquisition, the light source was located directly above the detection platform, pointing vertically downward, at a distance of 5 cm. The receiver was a cosine corrector embedded in the detection platform and facing vertically upward.
The light source was a standard tungsten light source (Avantes, Netherlands). The spectrum of the light source was saved before measuring the samples, giving the value of $I_0$. Each sample was then laid flat on top of the cosine corrector so that the leaf covered it completely while avoiding the leaf veins; the spectrum of each sample was collected four times in the middle of both sides of the main leaf vein and averaged, giving the value of $I$. The measurement light source was recalibrated after every five samples. Finally, the spectral transmittance $T$ of the sample was obtained from the transmittance formula:
$$T = \frac{I}{I_0} \quad (1)$$
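The transmittance calculation can be sketched as follows, assuming the plain ratio of the averaged sample intensity to the saved light-source intensity at each wavelength. The function name `transmittance` and the intensity values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def transmittance(sample_scans, reference):
    """Return T = I / I0 per wavelength, with I the mean of the repeated scans."""
    sample_scans = np.asarray(sample_scans, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mean_intensity = sample_scans.mean(axis=0)          # average repeated scans -> I
    # Mark wavelengths with no lamp signal as NaN instead of dividing by zero.
    safe_reference = np.where(reference > 0, reference, np.nan)
    return mean_intensity / safe_reference

# Hypothetical intensities at three wavelengths: one lamp spectrum, four leaf scans.
i0 = np.array([1000.0, 2000.0, 4000.0])
scans = np.array([[100, 400, 2000],
                  [120, 380, 2100],
                  [80, 420, 1900],
                  [100, 400, 2000]])
t = transmittance(scans, i0)   # mean scan is [100, 400, 2000]
```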

Determination of chlorophyll content
Chlorophyll content was determined with a UV5 ultraviolet-visible photometer (Mettler-Toledo, Switzerland). Deveined leaves (0.2 g) were submerged in 10 mL of 80% acetone and extracted for 48 h in the dark at 4°C. The absorbance of the extract was measured at 663 nm and 645 nm, and the chlorophyll content was calculated according to the modified formulae of Arnon's method [27].
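The exact modified formulae used in this study are not reproduced in the text. As an illustration only, the sketch below uses the widely quoted classical Arnon coefficients for 80% acetone extracts, which is an assumption and may differ from the modified version in [27]. The function name `arnon_chlorophyll` and the absorbance values are hypothetical; the default extract volume (10 mL) and leaf mass (0.2 g) follow the procedure above.

```python
def arnon_chlorophyll(a663, a645, volume_ml=10.0, fresh_weight_g=0.2):
    """Chlorophyll a, b and total (mg per g fresh weight) from two absorbances.

    Coefficients are the classical Arnon values (assumed here, not taken
    from the source); they give concentrations in mg/L of extract.
    """
    scale = volume_ml / (1000.0 * fresh_weight_g)   # mg/L of extract -> mg/g of leaf
    chl_a = (12.7 * a663 - 2.69 * a645) * scale
    chl_b = (22.9 * a645 - 4.68 * a663) * scale
    total = (20.2 * a645 + 8.02 * a663) * scale
    return chl_a, chl_b, total

# Hypothetical absorbance readings.
chl_a, chl_b, chl_total = arnon_chlorophyll(a663=0.5, a645=0.2)
```

Because the published coefficients are rounded, the total computed this way can differ from the sum of chlorophyll a and b in the third decimal place.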

Modeling methods
Univariate linear regression and multivariate linear regression were used to establish diagnostic models of chlorophyll content based on the red-edge parameters of the spectral transmittance. Table 2 lists the univariate linear regression modeling methods used in this study. The multivariate linear regression included ordinary least squares regression and ridge regression. The loss function of the least squares method is:
$$J(\theta) = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y) \quad (2)$$
where $X = [x^{(1)}, x^{(2)}, \ldots, x^{(m)}]^{T}$, $Y = [y^{(1)}, y^{(2)}, \ldots, y^{(m)}]^{T}$, and $\theta = [\theta_1, \theta_2, \ldots, \theta_n]^{T}$; $m$ is the number of samples, $n$ is the feature dimension, and $x_i^{(j)}$ denotes the i-th feature of the j-th sample. Differentiating the loss function with respect to $\theta$ yields:
$$\frac{\partial J(\theta)}{\partial \theta} = X^{T}(X\theta - Y) \quad (3)$$
According to the least squares method, setting the derivative equal to the zero vector gives:
$$\theta = (X^{T}X)^{-1}X^{T}Y \quad (4)$$
Ridge regression [28] is a biased estimation regression method for data with multicollinearity among the independent variables. Hoerl and Kennard [29] derived the specific proofs for ridge regression analysis. McDonald [30] provided a brief overview of ridge regression methods and showed that ridge regression is the most frequently used regularization method for the regression of ill-posed problems. Ridge regression is a modified version of ordinary least squares estimation. It gives up the unbiasedness of the least squares method and accepts the loss of some information and some model accuracy in exchange for more realistic and reliable regression coefficients.
The ridge regression method turns the ill-posed problem into a well-posed one by adding a regularization term to the loss function:
$$J(\theta) = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y) + \frac{1}{2}k\,\theta^{T}\theta \quad (5)$$
where $k \geq 0$ is the ridge parameter. Setting the derivative of this loss function to zero leads to the equation:
$$\theta(k) = (X^{T}X + kI)^{-1}X^{T}Y \quad (6)$$
where $I$ is the unit matrix. As the value of $k$ increases, the absolute value of each element $\theta_i(k)$ of $\theta(k)$ tends to become smaller, and as $k$ tends to infinity, $\theta(k)$ tends to 0. The trajectory of $\theta(k)$ as $k$ changes is called a ridge trace. In practice, $\theta(k)$ is computed for a large number of $k$ values to form a ridge trace diagram, and the most appropriate $k$ value is determined according to the basic principles of ridge regression parameter selection.
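The closed-form least squares and ridge estimates, and the ridge trace built from them, can be sketched in a few lines of Python. The data below are synthetic: four deliberately collinear predictors stand in for the four red-edge parameters. Standardizing the predictors and centering the response before fitting is an assumption, as the source does not describe its preprocessing. All function names are hypothetical.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least squares estimate from the normal equations X^T X theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_coefficients(X, y, k):
    """Ridge estimate (X^T X + k I)^-1 X^T y for ridge parameter k."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(n_features), X.T @ y)

def ridge_trace(X, y, ks):
    """Stack the ridge coefficients for each k: one row per value of k."""
    return np.array([ridge_coefficients(X, y, k) for k in ks])

# Synthetic, strongly collinear predictors (not the study's data).
rng = np.random.default_rng(0)
base = rng.normal(size=100)
X = np.column_stack([base + 0.05 * rng.normal(size=100) for _ in range(4)])
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + 0.1 * rng.normal(size=100)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors (assumption)
y = y - y.mean()                           # center response, so no intercept term

ks = np.array([0.0, 0.01, 0.1, 1.0, 10.0, 100.0])
trace = ridge_trace(X, y, ks)
norms = np.linalg.norm(trace, axis=1)      # coefficient length shrinks as k grows
```

Plotting each column of `trace` against `ks` gives the ridge trace diagram. At k = 0 the ridge estimate coincides with the least squares estimate, and the coefficient vector shortens steadily as k increases.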

Accuracy assessment
The seven models described above were established using the training set to obtain the coefficients of determination $R_t^2$. The validation set was then substituted into the models to obtain the leaf chlorophyll content estimates, and the accuracy of the models was tested by calculating the coefficient of determination $R_v^2$ and the root mean square error (RMSE$_v$) between the estimates and the actual values. The coefficient of determination $R^2$ is calculated by the equation:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(C_i - c_i)^2}{\sum_{i=1}^{n}(c_i - \bar{c})^2} \quad (7)$$
The RMSE$_v$ is calculated by the formula:
$$\mathrm{RMSE}_v = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(C_i - c_i)^2} \quad (8)$$
where $n$ is the number of samples, $C_i$ is the estimated chlorophyll content of the leaves, $c_i$ is the actual chlorophyll content, and $\bar{c}$ is the mean of the actual chlorophyll content.
Figure 1 shows the spectral transmittance curves of pepper leaves in the range of 350-1100 nm under five nitrogen treatments (N0-N4) and their first-order derivative curves in the red-edge region (680-760 nm). In the original spectral transmittance curves (Figure 1a), the mean chlorophyll content and the spectral curves differed significantly among the N treatments. Because the leaves absorb strongly in the red band and only weakly in the near-infrared band, a trough appeared at 680 nm, and T rose approximately linearly across the 680-760 nm range. As chlorophyll content increased, both the depth of the trough and the slope of this rise changed markedly. The red-edge parameters of the spectral transmittance were calculated using the first-order derivative method [31] (Figure 1b): the maximum value of the first-order derivative curve is the red-edge amplitude ($d\lambda_{red}$), the corresponding wavelength is the red-edge position ($\lambda_{red}$), the area enclosed by the curve is the red-edge area ($S_{red}$), and the width of the curve at half of its peak value is the full width at half maximum (FWHM).
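The extraction of the four red-edge parameters from a first-order derivative curve can be sketched as below. This is one plausible reading of the definitions above, not the authors' implementation: the derivative is a numerical gradient, the area is a trapezoidal integral of the derivative over 680-760 nm, and the half-maximum crossings are located by linear interpolation. The function name `red_edge_parameters` and the logistic test spectrum are hypothetical.

```python
import numpy as np

def red_edge_parameters(wavelength, transmittance, lo=680.0, hi=760.0):
    """Red-edge amplitude, position, area and FWHM of one transmittance spectrum."""
    wl = np.asarray(wavelength, dtype=float)
    t = np.asarray(transmittance, dtype=float)
    window = (wl >= lo) & (wl <= hi)
    wl, t = wl[window], t[window]

    d = np.gradient(t, wl)                     # first-order derivative dT/d(lambda)
    peak = int(np.argmax(d))
    amplitude = float(d[peak])                 # red-edge amplitude
    position = float(wl[peak])                 # red-edge position
    # Trapezoidal area under the derivative curve across the window.
    area = float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(wl)))

    half = amplitude / 2.0
    left = peak
    while left > 0 and d[left] > half:         # walk left until below half maximum
        left -= 1
    right = peak
    while right < len(d) - 1 and d[right] > half:   # walk right likewise
        right += 1

    def edge(i_below, i_above):
        if d[i_below] > half:                  # curve never drops to half in window
            return wl[i_below]
        # Linear interpolation of the half-maximum crossing between two samples.
        return np.interp(half, [d[i_below], d[i_above]], [wl[i_below], wl[i_above]])

    fwhm = float(edge(right, right - 1) - edge(left, left + 1))
    return {"amplitude": amplitude, "position": position, "area": area, "fwhm": fwhm}

# Hypothetical smooth red edge: logistic rise centred at 715 nm.
wl = np.linspace(680.0, 760.0, 161)
t = 0.05 + 0.4 / (1.0 + np.exp(-(wl - 715.0) / 8.0))
params = red_edge_parameters(wl, t)
```

For real spectra, smoothing before differentiation would usually be needed, because numerical derivatives amplify measurement noise.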
One anomalous sample was excluded by calculating the Mahalanobis distance of the original spectral transmittance data, and the red-edge parameters were calculated for the remaining 149 samples. Figure 2 shows box plots of the red-edge parameters and chlorophyll content of leaves under the five treatments. The red-edge parameters were significantly affected by chlorophyll content and showed an obvious trend. The red-edge position, red-edge area, and FWHM were positively correlated with chlorophyll content, whereas the red-edge amplitude of the samples in the N1-N4 treatments was negatively correlated with chlorophyll content. Table 3 shows the statistics of the red-edge parameters corresponding to the spectral information collected from the 149 leaves. The four red-edge parameters were strongly correlated with chlorophyll content, and the absolute values of the correlation coefficients were all greater than 0.8. The correlation coefficients of the red-edge position and FWHM with chlorophyll content were relatively high (0.8547 and 0.9044, respectively). The four red-edge parameters can therefore be used as reference characteristic parameters for chlorophyll diagnosis.
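The screening of anomalous spectra by Mahalanobis distance can be sketched as follows. The source does not give the details, so two choices here are assumptions: the spectra are first compressed to a few principal component scores (the covariance of full spectra is otherwise singular when wavelengths outnumber samples), and a sample is flagged when its distance exceeds the mean plus three standard deviations. The function name and the synthetic spectra are hypothetical.

```python
import numpy as np

def mahalanobis_distances(spectra, n_components=3):
    """Mahalanobis distance of each spectrum, computed in PCA score space."""
    x = np.asarray(spectra, dtype=float)
    centered = x - x.mean(axis=0)
    # PCA via singular value decomposition; scores = U * S.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    inv_cov = np.linalg.inv(cov)
    d2 = np.einsum("ij,jk,ik->i", scores, inv_cov, scores)
    return np.sqrt(d2)

# Synthetic red-edge spectra with one corrupted sample (index 42).
rng = np.random.default_rng(1)
wl = np.linspace(680.0, 760.0, 81)
level = rng.uniform(0.8, 1.2, size=(150, 1))
center = 715.0 + rng.normal(0.0, 2.0, size=(150, 1))
spectra = level * (0.05 + 0.4 / (1.0 + np.exp(-(wl - center) / 8.0)))
spectra += rng.normal(0.0, 0.002, size=spectra.shape)
spectra[42] += 0.3                              # artificial baseline offset

dist = mahalanobis_distances(spectra)
threshold = dist.mean() + 3.0 * dist.std()      # cut-off rule is an assumption
outliers = np.flatnonzero(dist > threshold)
```

The number of retained components and the cut-off both affect which samples are rejected, so they should be checked against the measured spectra rather than taken from this sketch.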

Analysis of modeling results with a single red-edge parameter
Linear, quadratic polynomial, logarithmic, exponential, and power models were developed based on each spectral transmittance red-edge parameter (Table 4). In the table, x is the red-edge parameter and y is the chlorophyll content. The models based on $d\lambda_{red}$ had lower $R_t^2$ values, and the $R_v^2$ value of the quadratic polynomial model was 0.7656. Among the models based on $\lambda_{red}$, the logarithmic model had the highest $R_v^2$ value, but its $R_t^2$ and RMSE$_v$ results were not satisfactory. The $S_{red}$-based secondary exponential model had an RMSE$_v$ of 0.6091, the lowest validation error among the five models. Among the five models based on FWHM, the $R_t^2$ value of all models except the exponential model was higher than 0.9. The quadratic polynomial model based on FWHM had a coefficient of determination of 0.9472 on the training set, a coefficient of determination of 0.7909 on the validation set, and an RMSE$_v$ of 0.7074. This model can be used to develop instruments for chlorophyll content detection based on spectral transmittance.
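The five single-parameter model forms and their evaluation by coefficient of determination and root mean square error can be sketched as follows. The data are synthetic and only imitate a FWHM-like predictor. Fitting the exponential and power models by least squares on log-transformed values is an assumption, since the source does not state its fitting procedure. All function names are hypothetical.

```python
import numpy as np

def r2_score(actual, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
    ss_res = np.sum((actual - estimated) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(actual, estimated):
    """Root mean square error between actual and estimated values."""
    actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))

def fit_models(x, y):
    """Fit five univariate model forms; return {name: prediction function}."""
    p_lin = np.polyfit(x, y, 1)                    # y = a x + b
    p_quad = np.polyfit(x, y, 2)                   # y = a x^2 + b x + c
    p_log = np.polyfit(np.log(x), y, 1)            # y = a ln(x) + b
    p_exp = np.polyfit(x, np.log(y), 1)            # ln(y) = b x + ln(a)
    p_pow = np.polyfit(np.log(x), np.log(y), 1)    # ln(y) = b ln(x) + ln(a)
    return {
        "linear": lambda v: np.polyval(p_lin, v),
        "quadratic": lambda v: np.polyval(p_quad, v),
        "logarithmic": lambda v: np.polyval(p_log, np.log(v)),
        "exponential": lambda v: np.exp(np.polyval(p_exp, v)),
        "power": lambda v: np.exp(np.polyval(p_pow, np.log(v))),
    }

# Synthetic predictor and response (not the study's measurements).
rng = np.random.default_rng(0)
x = rng.uniform(20.0, 40.0, size=150)
y = 0.02 * x**2 - 0.5 * x + 6.0 + rng.normal(0.0, 0.3, size=150)
x_train, y_train = x[:100], y[:100]
x_valid, y_valid = x[100:], y[100:]

models = fit_models(x_train, y_train)
report = {
    name: (r2_score(y_train, f(x_train)),      # training R^2
           r2_score(y_valid, f(x_valid)),      # validation R^2
           rmse(y_valid, f(x_valid)))          # validation RMSE
    for name, f in models.items()
}
```

Each entry of `report` holds the training R², validation R², and validation RMSE of one model form, which is the comparison that Table 4 reports for each red-edge parameter.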

Analysis of results of the multiple red-edge parameter model
An ordinary least squares (OLS) regression model and a ridge regression model were each established based on the four red-edge parameters, and the two multivariate linear regression models were validated and analyzed (Table 5). Analysis of the ridge traces for the four red-edge parameters showed that at k = 0.2 the regression coefficient of each variable becomes stable while the coefficient of determination $R_t^2$ of the model remains high. Substituting k = 0.2 into the ridge regression model gave a biased multivariate linear regression model.
The coefficient of determination $R_t^2$ of the ridge regression model was smaller than that of the OLS regression model, and its standard error (SE), residual sum of squares (RSS), mean square error (MSE), and RMSE$_v$ were all higher than those of the OLS regression model. This shows that the accuracy of the ridge regression model is lower than that of the OLS regression model. However, the $R_v^2$ value of ridge regression was higher than that of least squares regression. In Table 6, a, b, c, and d are the coefficients of $d\lambda_{red}$, $\lambda_{red}$, $S_{red}$, and FWHM, respectively, and e is the constant term. The standard errors of the coefficients and constant term of the ridge regression model were much lower than those of the least squares regression model.

Advantages of spectral detection methods
In this experiment, accurate detection of chlorophyll was achieved using only the red-edge region of 680-760 nm, a spectral band just 80 nm wide. The method reduces the requirements that the detection device places on the light source and light detection sensor, which lowers the cost of the testing equipment. Because the leaf lies directly on the receiver, the optical path between sample and detector is effectively zero, so the measurement is not easily disturbed by the external environment and the accuracy of the data is improved. All this provides a scientific basis for the development of an instrument for estimating the chlorophyll content of leaves of pepper and possibly other species.

Validity of modeling methods
Using the ridge regression method to establish a chlorophyll content estimation model based on the four red-edge parameters solved the problem that the estimates of the univariate models were strongly influenced by measurement and calculation errors. In the OLS regression model, the coefficient of $\lambda_{red}$ was negative, whereas in the ridge regression model it was positive, which is consistent with the positive correlation between $\lambda_{red}$ and chlorophyll content. Moreover, the standard errors of the coefficients and constant term of the ridge regression model were smaller. In summary, when a multivariate linear model is established based on the four red-edge parameters, the ridge regression model is structurally more stable than the least squares regression model.

Conclusions
In this study, seven different regression models were developed using four transmittance spectral red-edge parameters. The accuracy of the different regression models in rapid nondestructive estimation of chlorophyll content in leaves was compared.
The ridge regression method provides a more appropriate, reliable, and stable regression model at the cost of reduced fitting accuracy. The results demonstrate that spectral transmittance in the red-edge region can be used to develop instruments for chlorophyll content detection, and they provide technical support for the non-destructive detection of leaf chlorophyll based on transmittance spectroscopy.