Dynamic ensemble selection of convolutional neural networks and its application in flower classification

Abstract: In recent years, convolutional neural networks (CNNs) have achieved great success in image classification. However, CNN models usually have complex network structures that tend to cause some related problems, such as redundancy of network parameters, low training efficiency, overfitting, and weak generalization ability. To solve these problems and improve the accuracy of flower classification, the advantages of CNNs were combined with those of ensemble learning and a method was developed for the dynamic ensemble selection of CNNs. First, MobileNet models pre-trained on a public dataset were transferred to flower datasets to train thirteen different MobileNet classifiers, and a resampling strategy was used to enhance the diversity of individual models. Second, the thirteen classifiers were sorted by a classifier sorting algorithm, before ensemble selection, to avoid an exhaustive search. Finally, according to the credibility of the recognition results, a classifier subset was dynamically selected and integrated to identify the flower species from their images. To verify its effectiveness, the proposed method was used to classify the images of five flower species. The accuracy of the proposed method was 95.50%, an improvement of 1.62%, 3.94%, 22.04%, 13.77%, and 0.44% over those of MobileNet, Inception-v1, ResNet-50, Inception-ResNet-v2, and the linear ensemble method, respectively. In addition, the performance of the proposed method was compared with five other methods for flower classification. The experimental results demonstrated the accuracy and robustness of the proposed method.


Introduction
Classification of flower species plays an important role in the informatization of flower management [1,2] .
The results of identification can greatly improve the efficiency of retrieving information related to flower species.
However, accurate classification of flower species remains difficult because of the similarities between different flower species, and differences between flowers in the same species.
Manual classification, whereby flower species are assessed by specialists who perform naked-eye observations according to their experience and knowledge, is laborious, time-consuming, and has low recognition accuracy [3] . Because of the rapid development of image processing and artificial intelligence technology, researchers have utilized various image-based classification methods to overcome these obstacles. Hus et al. [4] developed an interactive system for recognizing flower images taken by digital cameras.
Guru et al. [5] investigated the suitability of texture features and designed a system for flower classification, and then used the features for training and classification with a probabilistic neural network. These results demonstrated that the combination of multiple features vastly improves the classification accuracy, from 35% for the best single feature to 79% for the combination of all features. Cheng et al. [6] proposed an attribute-based method for flower recognition, which extracted a series of visual attributes from a given set of flower images and generalized them to new images with possibly unknown flowers. Soleimanipour et al. [7] developed a vision-based hybrid approach, using a hybrid of the Viola-Jones detector and multi-template matching, for highly accurate and effective identification of Anthurium flower cultivars. Their results indicated that the technique had acceptable performance in detecting the spadix region and achieved more than 99% classification accuracy.
In general, the accuracy of traditional image-based classification techniques depends largely on manually designed features to express the characteristics of flower images. Because this is a difficult process, the accuracy and generalization of classification are weak in new application scenarios.
In recent years, deep learning, a kind of efficient technology for feature representation learning, has been rapidly developed and successfully applied in many fields [8][9][10] .
Owing to their outstanding ability to learn complex and robust feature representations, the accuracy of flower image classification has been greatly improved by employing deep neural networks, particularly convolutional neural networks (CNNs). Cıbuk et al. [11] applied a CNN-based hybrid method to the classification of flower species and showed that their proposed method achieved 96.39% and 95.70% accuracy for the Flowers17 and Flowers102 datasets, respectively. Guo et al. [12] classified flowers using the Tabu_Genetic algorithm together with a CNN, which consisted of 14 layers, and it achieved 78.81% accuracy. Toğaçar et al. [13] proposed a hybrid method that used four CNN models, AlexNet, GoogLeNet, ResNet-50, and VGG-16, for feature extraction, and the classification performance was measured using an SVM classifier. Their results showed that the intersection of the features obtained using the feature selection methods improved the classification performance, and the overall accuracy obtained was 98.91%. To improve the accuracy of flower image recognition, Hiary et al. [14] adopted the FCN and VGG-16 to construct a novel two-step deep-learning classifier to distinguish flowers of a wide range of species and achieved at least 97% classification accuracy. In the study of Mitrovic et al. [15] , numerous network models were implemented for flower classification. They found that the AlexNet model, with a uniform sigmoid function for allocating initial weights, produced the best classification results. Although deep-learning-based methods have achieved great success in flower classification, there are still some limitations: it is difficult to select appropriate network structures, parameters, and algorithms, and there is a requirement for long training times for optimal recognition performance in practice.
Unlike classification approaches that use a single fixed model, ensemble learning provides methods that overcome the limitations mentioned above, to some extent. Ensemble learning uses a set of learners and applies rules to integrate the learning results, to obtain better performance than a single learner [16]. The effectiveness of ensemble learning has been widely demonstrated in a variety of applications [17][18][19]. Bae et al. [20] proposed a modified m-CNN model that integrated images and text in multi-view learning for flower classification. Experimental results demonstrated that the proposed algorithm achieved superior performance, compared with other data fusion methods. Huang et al. [21] presented a flower classification framework based on ensemble CNNs and demonstrated its effectiveness on the Flowers102 dataset. In the authors' previous work [22], a CNN ensemble method was developed for flower classification. Experimental results showed that this method had better generalization ability and a higher recognition rate than the single classifiers. Notably, after obtaining multiple learners, most ensemble algorithms employ all of them to constitute an ensemble. However, both theoretical and empirical studies have shown that, instead of using a whole ensemble, a proper subset of an ensemble can often achieve better generalization performance [23][24][25]. Another clear advantage of a selective ensemble is that storage cost is reduced and efficiency is improved because fewer individual learners need to be stored and used for the classification. However, the selection process is not easy, and choosing appropriate learners remains something of an art.
To overcome these problems and achieve better recognition performance, the advantages of CNNs and ensemble learning were combined and a method was developed using dynamic ensemble selection of CNNs for flower classification.
The main contributions of this study are as follows: 1) MobileNet models pre-trained on the ILSVRC-2012-CLS image classification dataset are adopted as single classifiers, which avoids the problems of selecting optimal single-network model parameters and designing appropriate network structures; 2) CNNs have complicated structures that tend to cause overfitting and poor generalization ability, whereas in the proposed method multiple single classifiers with a simple structure are generated randomly and their outputs are integrated, which greatly improves the accuracy and generalization of the recognition algorithm; 3) to balance ensemble accuracy and efficiency, a dynamic ensemble selection method was developed, in which an optimal classifier subset is dynamically selected according to the credibility of the recognition results and integrated to identify each test sample.

Flower image datasets
Two datasets of flower images were used to evaluate the performance of the proposed method. One dataset [26] contained 3670 flower images from five species and was divided into a training set with 3320 images and a validation set with 350 images. The other dataset contained 2670 images of the same five flower species as the first dataset. This dataset was divided into a test set with 1600 images and a validation set with 1070 images. The images in the second dataset were obtained in one of two ways. Some images were captured in the field using digital cameras or mobile phones; to ensure the robustness of the proposed method, images of the same species were taken under various illumination conditions from different angles and featuring different flowers. The other images were collected from the Internet, for example, the Flowers17 and Flowers102 datasets from the Visual Geometry Group at the University of Oxford and the Subject Dataset of China Plant [27] .
Therefore, there was no uniform image size or resolution in the datasets. Before classification, all the flower images were scaled to the same size (224×224 pixels) and then converted to TensorFlow native TFRecord format.
In this study, flower images were collected from different sources, different regions, and different seasons to represent a wide range of scenarios so that they provided a challenge for testing the performance of the proposed method. To evaluate the diversity of each base classifier in the ensemble, a new validation set with 1420 images was used which was constructed by combining the validation sets in the two flower image datasets. Representative images and the image datasets used in this study are shown in Figure 1 and Table 1, respectively.

Methods
The framework of the proposed method is depicted in Figure 2. The method has four main modules: pre-training, fine-tuning, dynamic ensemble selection, and output. First, MobileNet models pre-trained on the ILSVRC-2012-CLS image classification dataset are used as single classifiers for feature extraction. Second, the models are transferred to flower datasets to train thirteen different MobileNet classifiers, and a resampling strategy is used to enhance the diversity of individual models. Third, the thirteen classifiers are sorted by a classifier sorting algorithm, before ensemble selection, to avoid an exhaustive search. Finally, with the credibility of recognition results, a classifier subset is selected dynamically and integrated to identify the flower species.

MobileNet
MobileNet, which was developed for use in mobile and embedded systems by Howard et al. [28], can achieve a good balance between performance and computational cost. The MobileNet architecture is based on depthwise separable convolutions, each of which consists of a depthwise convolution followed by a pointwise convolution with a 1×1 convolution layer. In a standard convolution layer, each kernel is applied to all channels of the input, whereas a depthwise convolution applies a single filter to each channel separately; the pointwise convolution then combines the channels. This approach significantly reduces the number of parameters, compared with standard convolutions with the same depth. A comparison between standard convolution and depthwise separable convolution in MobileNet is shown in Figure 3.
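The parameter saving can be illustrated with a short calculation. The sketch below counts only the convolution weights (biases and batch normalization are ignored), and the layer sizes are hypothetical values chosen for illustration, not taken from the MobileNet configuration used in this study.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2

print(std, sep, round(ratio, 4))
```

For this hypothetical layer, the separable form needs roughly 12% of the weights of the standard convolution; the ratio depends only on the kernel size and the number of output channels.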

Dynamic ensemble selection
Dynamic ensemble selection is an ensemble learning paradigm in which one or more base classifiers are selected for each query instance to be classified [29][30][31] . In dynamic selection, the aim is to select the most competent classifiers for any given query sample. However, finding the optimal subset of classifiers entails searching in the space of all classifier combinations, and the computational complexity increases exponentially with the number of classifiers.
Figure 3 Comparison between standard convolution and depthwise separable convolution in MobileNet models. Note: Conv: Convolution; BN: Batch Normalization; ReLU: Rectified Linear Unit.

A method was proposed for the dynamic ensemble selection of CNNs, which combines the procedures of classifier selection and integration. A flowchart of the proposed method is shown in Figure 4. The proposed method has two stages: classifier sorting and dynamic classifier ensemble selection. First, all classifiers are sorted by the proposed classifier sorting method. Classifiers are then dynamically selected, one by one, from the sorted sequence to be integrated to identify the test sample. The number of selected classifiers is determined by the credibility of recognition results of the test sample. Details of the proposed method are presented in the following sections.

Diversity among the base classifiers is generally considered to be important when constructing a classifier ensemble. There are several diversity measures for classifier members [32].
The disagreement measure was used. For binary classification, this measure is defined as the ratio between the number of observations on which one classifier is correct and the other is incorrect to the total number of observations. The relationship between two base classifiers $C_i$ and $C_j$ is shown in Table 2.

Table 2 Relationship between two classifiers $C_i$ and $C_j$

If $N$ is the total number of samples, for a two-classifier measure, $N^{11}$ indicates the number of times both classifiers are correct, $N^{00}$ indicates the number of times that both classifiers are incorrect, and $N^{10}$ and $N^{01}$ indicate the number of times that only the first or only the second classifier is correct, respectively, and $N = N^{01} + N^{10} + N^{11} + N^{00}$. The disagreement measure (dis) between the two base classifiers $C_i$ and $C_j$ is

$$dis_{i,j} = \frac{N^{01} + N^{10}}{N^{11} + N^{10} + N^{01} + N^{00}} \qquad (1)$$

The diversity of a whole set of $L$ base classifiers (Dis) is defined as the average of the disagreement measure over all pairs of base classifiers:

$$Dis = \frac{2}{L(L-1)} \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} dis_{i,j} \qquad (2)$$

Therefore, the diversity increases with the value of the disagreement measure.
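As an illustration of the two quantities, the following sketch computes the pairwise disagreement and its average over all pairs from per-sample correctness flags. The function names and the toy correctness vectors are hypothetical and serve only to make the definitions concrete.

```python
from itertools import combinations
from typing import Sequence


def pairwise_disagreement(correct_i: Sequence[bool], correct_j: Sequence[bool]) -> float:
    """Fraction of samples on which exactly one of the two classifiers is correct."""
    if len(correct_i) != len(correct_j) or not correct_i:
        raise ValueError("correctness vectors must be non-empty and of equal length")
    differ = sum(a != b for a, b in zip(correct_i, correct_j))  # N01 + N10
    return differ / len(correct_i)


def ensemble_disagreement(correct: Sequence[Sequence[bool]]) -> float:
    """Average pairwise disagreement over all pairs of classifiers."""
    pairs = list(combinations(range(len(correct)), 2))
    if not pairs:
        raise ValueError("at least two classifiers are required")
    total = sum(pairwise_disagreement(correct[i], correct[j]) for i, j in pairs)
    return total / len(pairs)


# Toy example: correctness of three classifiers on four validation samples.
c1 = [True, True, False, True]
c2 = [True, False, False, True]
c3 = [False, True, True, True]

dis_12 = pairwise_disagreement(c1, c2)
dis_all = ensemble_disagreement([c1, c2, c3])
print(dis_12, dis_all)
```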

Classifier sorting
A static classifier sorting method (Algorithm 1) was developed by combining the selection of classifiers with the disagreement measure. The flowchart of Algorithm 1 is shown in Figure 5.
In Algorithm 1, the classifier with the highest recognition accuracy on the validation set is ranked first, and then the next classifier is selected from the remaining candidate classifiers to rank second. The selection criterion is that the classifier set composed of the newly chosen classifier and the previously selected classifier(s) maximizes the value of Dis. The selection step is repeated until all the candidate classifiers have been sorted. Thus, Algorithm 1 transforms the problem of classifier selection into a problem of classifier sorting, and thereby avoids an exhaustive classifier search and improves the efficiency of classifier selection. The top classifiers in the sorted sequence are then dynamically selected as the optimal classifier subset and integrated to identify each test sample. When only one classifier is needed, the classifier with the highest recognition rate on the validation set is selected for each query instance to be classified; when N classifiers are needed, the top N classifiers in the sorted classifier sequence are selected as the optimal classifier subset.
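The greedy ordering described for Algorithm 1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the alphabetical tie-breaking rule, and the toy correctness vectors are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def dis(correct: Dict[str, Sequence[bool]], members: List[str]) -> float:
    """Average pairwise disagreement of the classifiers named in `members`."""
    pairs = list(combinations(members, 2))
    total = 0.0
    for a, b in pairs:
        differ = sum(x != y for x, y in zip(correct[a], correct[b]))
        total += differ / len(correct[a])
    return total / len(pairs)


def sort_classifiers(correct: Dict[str, Sequence[bool]]) -> List[str]:
    """Rank classifiers: most accurate first, then greedily maximize Dis."""
    remaining = sorted(correct)  # alphabetical order gives deterministic ties (assumption)
    first = max(remaining, key=lambda name: sum(correct[name]))
    ranked = [first]
    remaining.remove(first)
    while remaining:
        # Choose the candidate whose addition gives the most diverse set so far.
        best = max(remaining, key=lambda name: dis(correct, ranked + [name]))
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy validation results for three hypothetical classifiers on five samples.
validation_correct = {
    "A": [True, True, True, True, False],    # 4 of 5 correct: ranked first
    "B": [True, True, True, False, False],   # disagrees with A on 1 sample
    "C": [False, False, True, True, True],   # disagrees with A on 3 samples
}
ranking = sort_classifiers(validation_correct)
print(ranking)
```

Each step evaluates Dis only for the candidate sets formed by adding one classifier, so the number of evaluations grows quadratically with the number of classifiers rather than exponentially.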

Dynamic classifier ensemble selection
After all base classifiers are sorted, an optimal classifier subset is selected dynamically from the sorted classifier sequence P and integrated to identify the test sample. The method for dynamic classifier ensemble selection (Algorithm 2) is presented in Figure 6. The posterior probability is computed by using Bayes' theorem [33] . Consider the specific task of flower classification, which is the focus of this study. Each input image comprises an array of pixel intensity values, and the desired output is a posterior probability distribution over all the categories of flower species. For a classifier whose output is not a posterior probability distribution, the output should be transformed to a posterior probability [34] . Algorithm 2 proceeds as follows. First, the initial credibility ε 0 is set according to the required recognition accuracy in practical application. Second, one or more base classifiers for an optimal classifier subset are selected dynamically from the sorted classifier sequence P and integrated to identify test sample x; if ε k * ≥ε 0 , the category of ε k * corresponds to the recognition result R k of sample x.
Third, if all classifiers have been selected and the recognition credibility requirement remains unsatisfied, the identification results [R 1 , R 2 , …, R N ] of each integration are voted on, and the category with the most votes is considered to be the category of sample x. Therefore, for each test sample, there is a specialized optimal classifier subset, which is dynamically selected and integrated to identify the test sample. That is, the recognition is targeted.
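The steps of Algorithm 2 can be sketched for a single test sample as shown below. Two points are assumptions made for illustration, because the text does not fix them: the outputs of the selected classifiers are integrated by averaging their posterior probability vectors, and the credibility of an integration is taken to be the largest averaged posterior. The function name and the toy posteriors are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def dynamic_ensemble_predict(
    posteriors: Sequence[Sequence[float]], eps0: float
) -> Tuple[int, int]:
    """Classify one sample.

    posteriors[k] is the posterior vector of the (k+1)-th classifier in the
    sorted sequence P. Returns (predicted class index, classifiers used).
    """
    if not posteriors:
        raise ValueError("at least one classifier output is required")
    running_sum = [0.0] * len(posteriors[0])
    results: List[int] = []  # R_1, ..., R_N: result of each integration
    for k, p in enumerate(posteriors, start=1):
        running_sum = [s + v for s, v in zip(running_sum, p)]
        fused = [s / k for s in running_sum]   # assumed rule: mean posterior
        credibility = max(fused)               # assumed credibility of this integration
        label = fused.index(credibility)
        if credibility >= eps0:
            return label, k                    # credible enough: stop adding classifiers
        results.append(label)
    # Credibility never reached: majority vote over all integration results.
    return Counter(results).most_common(1)[0][0], len(posteriors)


# Two-class toy examples (hypothetical posteriors).
early = dynamic_ensemble_predict([[0.6, 0.4], [0.9, 0.1]], eps0=0.7)
voted = dynamic_ensemble_predict([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]], eps0=0.95)
print(early, voted)
```

In the first call the threshold is met after two classifiers are integrated; in the second it is never met, so the result falls back to the vote over the three integration results.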

Evaluation criteria
The performance of the proposed classification method was evaluated according to accuracy. Accuracy (%) measures the proportion of correctly classified instances among all the samples in a test set. Given the number of correctly classified instances (NC) and the number of all the samples in the test set (NA), accuracy is defined as

$$\text{Accuracy} = \frac{NC}{NA} \times 100\% \qquad (3)$$
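A minimal sketch of this metric is given below; the function name and the label lists are hypothetical and are used only to show the calculation.

```python
from typing import Sequence


def accuracy_percent(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    nc = sum(p == a for p, a in zip(predicted, actual))  # correctly classified
    na = len(actual)                                     # all test samples
    return nc / na * 100.0


# Hypothetical labels for eight test images (class indices 0 to 4).
actual = [0, 1, 2, 3, 4, 0, 1, 2]
predicted = [0, 1, 2, 3, 4, 0, 2, 2]
acc = accuracy_percent(predicted, actual)  # 7 of 8 predictions match
print(acc)
```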

Results
All the experiments were performed using TensorFlow in the Python programming environment on a computer with an Intel® Core™ i7-7700HQ (2.8 GHz) processor, 16 GB of memory (RAM), NVIDIA GeForce 940MX, and Windows 10 operating system.

Single classifier generation
MobileNet models were selected as the single classifiers to classify flower images. MobileNet was pre-trained on the ILSVRC-2012-CLS image classification dataset and the pre-trained weights were fine-tuned using the following parameters: initial learning rate 0.01, number of training iterations 500. The Adam algorithm was used as the optimizer, and the output layer used the softmax classifier to convert the output result to a probability in the range [0, 1].
To construct a good ensemble, the bootstrap resampling method was used to create different training sets, so that each classifier in the ensemble was trained with a different training set, to ensure diversity. Moreover, different batch sizes were used for single-network training because this approach achieved better recognition performance in the authors' previous study [22]. Thirteen MobileNet single classifiers were generated randomly, and the recognition accuracy of each single classifier is shown in Table 3. The results show that MobileNet_13 had the highest recognition rate and MobileNet_2 had the lowest recognition rate on both the test and validation sets.
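Bootstrap resampling of the training set can be sketched as follows. The sketch draws index lists only; it assumes that each bootstrap sample has the same size as the original training set, which the text does not state, and the seed and function name are hypothetical. The batch sizes used for each classifier are not reproduced here because they are not given above.

```python
import random
from typing import List


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices uniformly at random with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)   # fixed seed so that the sketch is reproducible
n_train = 3320           # training-set size reported for the first dataset
n_classifiers = 13       # number of MobileNet classifiers in the ensemble

# One index list per classifier; each list selects that classifier's training images.
training_sets = [bootstrap_indices(n_train, rng) for _ in range(n_classifiers)]

unique_fraction = len(set(training_sets[0])) / n_train
print(len(training_sets), round(unique_fraction, 3))
```

Because sampling is done with replacement, each classifier sees some images several times and others not at all, which is what makes the individual training sets differ.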

Dynamic ensemble selection
The diversity among the single classifiers in Table 3 was measured on the validation set using the disagreement method. All the classifiers were sorted according to the Dis measure, defined in Equation (2), using Algorithm 1. The diversity Dis and the classifier ranking results, using Algorithm 1, are shown in Table 4. The second column shows, for each classifier, the diversity (Dis) of the set comprising that classifier and all previously selected (i.e., higher-ranked) classifiers. For example, the diversity of {MobileNet_13, MobileNet_10}, the set of the two highest-ranked classifiers, is 0.0268. The classifier ranking was used with Algorithm 2 to dynamically select a classifier subset and integrate them to identify each flower image in the test set.
The proposed method was compared with several frequently used CNN models (MobileNet, Inception-v1, ResNet-50, and Inception-ResNet-v2) and with the linear ensemble method. In the experiments, the initial credibility was set as ε 0 =1. The results of the comparison of the classification methods are shown in Table 5. The classification accuracy of the proposed method was 95.50%, an improvement of 1.62%, 3.94%, 22.04%, 13.77%, and 0.44% over the accuracies of MobileNet_13, Inception-v1, ResNet-50, Inception-ResNet-v2, and the linear ensemble method, respectively. The proposed algorithm had a higher recognition rate because an optimal classifier subset was dynamically selected and integrated to identify each flower image, so the recognition was more targeted and therefore more accurate. The linear ensemble method, by contrast, used the same subset of classifiers to identify all images and therefore lacked this focus. The results also revealed redundancy among the multiple classifiers: if all classifiers were integrated directly, some invalid decisions would be fused and the final recognition results would be affected. Among the single models compared, ResNet-50 had the lowest recognition rate. For the limited training set, a complex network structure was more likely to lead to overfitting and poor recognition performance. However, in the dynamic ensemble selection method, by randomly generating multiple single classifiers with a simple structure and integrating their outputs, the accuracy and generalization of the recognition algorithm were greatly improved.

Parameter influence
Initial credibility (ε 0 ) is an important parameter that affects the performance of the proposed dynamic ensemble selection method. To study the influence of initial credibility on the results of the integration algorithm, three initial credibility values were used. The results showed that the recognition rate of the proposed method was highest with the highest initial credibility value, but the number of identified samples whose credibility values exceeded the initial credibility was lowest (Table 6). Conversely, the recognition rate was lowest with the lowest initial credibility value, but the number of identified samples whose credibility values exceeded the initial credibility was highest. For example, for ε 0 =1, 836 samples in the test set (52.25%) had credibility values that exceeded the initial credibility. For other samples, classified with Algorithm 2, if one or a few classifiers have credibility values exceeding the initial credibility, there is no need to select and integrate any more classifiers; otherwise, classifiers are added in turn for ensemble recognition until all the classifiers have been selected. If all the classifiers have been selected but the initial credibility has not been reached, the recognition results of all integrations are voted on, and the category with the most votes is considered to be the result of sample identification. Therefore, the dynamic ensemble selection method has better pertinence than a single classifier, and the credibility of each recognition result is assessed, which ensures both the reliability of the recognition results and the accuracy of recognition. However, this means that more classifiers are involved in the integration, leading to long recognition times. Similarly, for ε 0 =0.8, 1495 samples in the test set (93.44%) had credibility values that exceeded the initial credibility. This result shows that the recognition accuracy for most samples satisfied the requirements of credibility. 
The remaining samples that were difficult to identify required the use of the dynamic integration recognition method. Therefore, the number of classifiers to be integrated was small and the recognition time was short; however, the accuracy of the corresponding recognition results was relatively low. In practical application, the appropriate initial credibility should be set according to the application scenario, taking into account both the recognition accuracy and efficiency.

Discussion
CNNs have been used widely to improve the accuracy of image classification. However, there are still some limitations, as discussed in the Introduction. To overcome these difficulties, the advantages of CNNs and ensemble learning were combined and a method was developed for the dynamic ensemble selection of CNNs. The experimental results, as shown in Table 5, demonstrated that the proposed model had the best accuracy, compared with the single-CNN models (MobileNet_13, Inception-v1, ResNet-50, and Inception-ResNet-v2) and the linear ensemble method.
In addition, the performance of the proposed approach was compared with that of previous methods, as shown in Table 7. In these experiments, the flower dataset used in the study of Toğaçar et al. [13] was also used to test the performance of the proposed method. Twenty percent of the images in each class were randomly selected for testing, and five groups of experiments were conducted. The classifier ranking shown in Table 4 was used to dynamically select a classifier subset, which was integrated to identify the flower images in the test sets.
The results in Table 7 show that the proposed method still achieved good classification results. The mean and standard deviation of the accuracy was (95.93±0.45)%. Furthermore, the results demonstrated that the proposed method in this study is accurate and robust. Notably, the method proposed by Toğaçar et al. [13] had the highest recognition accuracy (98.91%). In their study, the AlexNet, GoogLeNet, ResNet-50, and VGG-16 CNN models were used for feature extraction. In the next step, the efficient features were selected using f-regression and multiple inclusion criteria. As a result, two new feature sets were created with the mentioned feature selection methods and then the intersecting features of these two clusters were extracted. These features were then classified by the SVM method and achieved 98.91% classification accuracy. The intersection of the features obtained by feature selection methods contributed to the classification performance. For the other classification methods in Table 7, detailed classification performance analysis has been discussed in Reference [13].
Diversity is a necessary condition for high generalization capability in classifier ensembles. In this study, two methods of ensuring diversity were adopted. First, the bootstrap resampling method was used to create different training sets, so that each classifier in the ensemble was trained with a different training set. Second, the homogeneous ensemble technique was used in this proposed method: multiple identical MobileNet models were selected as the single classifiers to be integrated. Nevertheless, the results in Table 7 show that, in the study of Toğaçar et al. [13] , using different CNN models (such as AlexNet, GoogLeNet, ResNet-50, and VGG-16) for feature extraction could obtain better recognition results than the others. Therefore, the heterogeneous ensemble technique will be adopted to ensure diversity among single classifiers in future work.
Effectively identifying and analyzing materials are key procedures for breeding novel crop varieties because of the large quantities of materials and their combinations. Currently, breeding information management systems have been developed and applied, which can both provide more (and more comprehensive) breeding information and improve the accuracy and reliability of breeding decisions [38][39][40] . A crop trait information acquisition system was developed by the authors previously (Seed Breeding Cloud Platform, http://ebreed.com.cn/; in Chinese) [41] , to effectively improve breeding information management.
Furthermore, the wide application of mobile phones provides convenience for real-time and on-field management. The acquisition system has been applied in the breeding of many crops. In future research work, the authors plan to combine the dynamic ensemble learning method with the acquisition system to realize the precise management of flower breeding information. The design of a planned flower breeding management information system is shown in Figure 7. Flower images can be captured using mobile devices, and then they can be linked using the breeding management information system to display growth stages, flower colors and characteristics, and types of disease symptoms or insect infestations. Subsequent image processing, for example, accurate monitoring of flower growth status, determination of disease progression, and recognition of disease types can be implemented using the captured images.

Conclusions
In this study, a method for the dynamic ensemble selection of CNNs for flower image classification is described. Pre-trained MobileNet models were used as single classifiers for feature extraction, and thirteen different single MobileNet classifiers were generated randomly, and then dynamically selected and integrated to identify flower species. The initial credibility was adopted to ensure the reliability of classification results. Classification experiments were performed using images of five flower species. The accuracy of the dynamic ensemble selection method was 95.50%. By comparing the performance results of this proposed method with those of MobileNet, Inception-v1, ResNet-50, Inception-ResNet-v2, the linear ensemble method, and several previous flower classification methods, the accuracy and effectiveness of the proposed method were demonstrated.
Diversity is an important factor for improving the performance of ensemble models. In the future, the heterogeneous ensemble technique will be adopted to ensure diversity among single classifiers. The authors plan to combine the dynamic ensemble learning method with the Seed Breeding Cloud Platform to realize the precise management of flower breeding information.