Deep learning-based classification for lung opacities in chest x-rays through batch control and sensitivity regulation
Imbalanced classification refers to an uneven distribution of data categories in the training dataset of a classification model, and it is a widespread problem in data-driven deep learning. For example, the COVID-19 epidemic spread around the world in just a few months, affecting the lives and health of countless people. Early in the outbreak, chest X-rays (CXRs) of patients with confirmed COVID-19 were available, but building a deep learning CXR model to identify COVID-19 cases from such an unbalanced dataset would hamper precise evaluation of the resulting model's performance. In this study, we investigated the batch control method (BCM) as a potential solution to imbalanced classification. The main methodological concept is to regulate the sensitivity of the CXR model by manipulating the class distribution of the data used in the training procedure.
In the dataset, the ratio of positive to negative cases was unbalanced (about 1:4). At the start of this study, we implemented the UNet model and trained it with a vanilla approach: we shuffled the dataset, drew random batches, fed these batches into the UNet model, and calculated the loss function for mini-batch gradient descent optimization. However, the results obtained at this preliminary stage were unsatisfactory and unstable. We then adopted the BCM to address the class imbalance. The BCM fixes the number of positive and negative cases in each batch. With a batch size of six, we varied the number of positive cases per batch from six down to one (and, correspondingly, the number of negative cases from zero up to five) to create models P100 through P17. The vanilla UNet approach, trained with a random batch distribution, produced the RAND model.
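The batch construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function name `controlled_batches`, the toy item lists, and the choice to draw without replacement within a batch are all assumptions made for illustration.

```python
import random

def controlled_batches(pos_items, neg_items, n_pos, batch_size=6,
                       n_batches=100, seed=0):
    """Yield batches holding exactly n_pos positive items and
    batch_size - n_pos negative items (hypothetical helper)."""
    if not 0 <= n_pos <= batch_size:
        raise ValueError("n_pos must lie between 0 and batch_size")
    rng = random.Random(seed)
    n_neg = batch_size - n_pos
    for _ in range(n_batches):
        # Draw each class separately so the class ratio is fixed per batch.
        batch = rng.sample(pos_items, n_pos) + rng.sample(neg_items, n_neg)
        rng.shuffle(batch)  # avoid a fixed positive/negative ordering
        yield batch

# Toy dataset with a 1:4 positive-to-negative ratio (labels: 1 = positive).
positives = [(f"pos_{i}", 1) for i in range(20)]
negatives = [(f"neg_{i}", 0) for i in range(80)]

# Five positives and one negative per batch corresponds to the P83 setting.
p83_batches = list(controlled_batches(positives, negatives, n_pos=5))
```

Because every batch draws a fixed number of positives from the smaller class, positive cases are revisited more often than negative ones over the course of training.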
On the class-balanced test set (229 positive and 229 negative cases), over eight trials of the training procedure, the P83 model (F1: 0.78) outperformed the other six models (F1: 0.62-0.77) in terms of F1 score. The other measures were TPR = 0.80, FPR = 0.25, ACC = 0.78, and F1-CV = 0.012 for P83, and TPR = 0.46, FPR = 0.02, ACC = 0.72, and F1-CV = 0.135 for RAND. The F1 score of 0.62 for the RAND approach (proportion of positive samples: 22%) lay between those of the P33 (0.72) and P17 (0.60) models. These results indicate that F1 scores were associated with the distribution of the training batch data. In all three models (P33, P17, RAND), more negative than positive cases were used in each network optimization iteration. We surmise that networks trained with a higher proportion of negative samples tend to produce more negative predictions. From P100 to P17, the TPR fell from 0.93 to 0.44 and the FPR from 0.77 to 0.01, supporting the association between data distribution and model performance. The P83, P66, and P50 models had higher F1 scores (0.75 to 0.78) than the other models (0.60 to 0.72), indicating that the CXR model performed better when at least half of each training batch consisted of positive samples. The BCM therefore allows the sensitivity of CXR models to be regulated to meet the demands of different clinical environments. For example, if identifying lung opacities is the primary reason for CXR review, a P83 BCM model may be appropriate.
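As a consistency check on the reported measures, the sketch below computes TPR, FPR, ACC, and F1 from confusion-matrix counts using their standard definitions. The counts are not reported in the text; they are assumptions obtained by multiplying the stated TPR and FPR by the 229 cases per class and rounding.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                    # sensitivity (recall)
        "FPR": fp / (fp + tn),                    # 1 - specificity
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }

# Assumed counts, back-calculated from the reported TPR and FPR on a
# test set of 229 positive and 229 negative cases.
p83 = binary_metrics(tp=183, fn=46, fp=57, tn=172)
rand = binary_metrics(tp=105, fn=124, fp=5, tn=224)

for name, m in (("P83", p83), ("RAND", rand)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Under these assumed counts, the rounded values agree with the reported figures for both models, which illustrates how the low sensitivity of RAND depresses its F1 score despite a very low FPR.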
In the eight trials of the training-testing procedure, the F1-CV values of the BCM models ranged from 0.011 to 0.043, whereas those of the RAND model ranged from 0.118 to 0.135, demonstrating that the BCM produced more stable results than the RAND method. A fixed ratio of positive to negative samples resulted in a smoother loss function and led to better convergence in the BCM models. We examined three networks: UNet, SegNet, and PSPNet. Most of their performance measures (e.g., TPR and FPR) were similar, except for the F1-CV. The mean F1-CV values of the BCM models were 0.0236 (UNet), 0.0363 (SegNet), and 0.0345 (PSPNet), suggesting that UNet was the most stable.
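The stability comparison can be sketched as follows, assuming that F1-CV denotes the coefficient of variation (standard deviation divided by mean) of the F1 score across trials. That reading, the use of the sample standard deviation, and the F1 values below are all assumptions for illustration, not data from this study.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical F1 scores from eight trials of two training schemes.
stable_f1 = [0.78, 0.77, 0.79, 0.78, 0.77, 0.79, 0.78, 0.78]
unstable_f1 = [0.62, 0.50, 0.70, 0.55, 0.72, 0.60, 0.68, 0.59]

cv_stable = coefficient_of_variation(stable_f1)
cv_unstable = coefficient_of_variation(unstable_f1)
print(f"stable CV = {cv_stable:.3f}, unstable CV = {cv_unstable:.3f}")
```

A smaller value indicates that repeated training runs land on similar F1 scores, which is the sense in which the BCM models are described as more stable than RAND.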
In this study, a batch size of six was used because of the GPU memory limit (11 gigabytes on the GTX 1080Ti GPU card); therefore, the batch data distribution was limited to six variants, P100 to P17. With more GPU RAM, classification performance could be improved further by fine-tuning the data distribution. For example, P83 (positive:negative, P:N = 5:1) had the best F1 score among the BCM models. With a larger batch size, we could produce P92 (P:N = 11:1) or P90 (P:N = 9:1) models to further optimize CXR models. Supplementary Fig. 1 presents our preliminary investigations of different batch sizes (batch size = 6, 9, 12, 18) and data ratios (P33, P66, P100). The sensitivity of CXR models can also be regulated by using different batch sizes. The effect of more GPU RAM and a larger number of samples in each batch deserves further investigation.
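The link between batch size and the attainable data ratios can be made concrete with a short sketch. The helper `batch_ratios` is hypothetical; it labels each configuration by the rounded percentage of positive cases, so the 4:2 configuration that the text calls P66 appears here as P67.

```python
def batch_ratios(batch_size):
    """Map a model label such as 'P83' to its (n_pos, n_neg) split,
    for every batch composition with at least one positive case."""
    return {
        f"P{round(100 * k / batch_size)}": (k, batch_size - k)
        for k in range(batch_size, 0, -1)
    }

print(batch_ratios(6))            # the six variants available at batch size 6
print(batch_ratios(12)["P92"])    # P:N = 11:1 requires a batch size of 12
print(batch_ratios(10)["P90"])    # P:N = 9:1 requires a batch size of 10
```

A batch size of six allows only six steps between P100 and P17, whereas a batch size of twelve allows twelve, giving finer control over the positive fraction.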
In machine learning, the learning procedure is usually biased towards the majority class because classifiers aim to reduce the overall loss. The resulting model therefore tends to misclassify the minority class. Several approaches exist for dealing with class imbalance. At the data level, resampling methods such as oversampling, downsampling, or SMOTE [15,16,17] generate a new dataset with an adjusted data distribution. At the algorithm level, advanced loss functions such as class rectification loss [18] and focal loss [19], which take the data distribution into account in the loss derivation, have been shown to be effective for class imbalance problems. The BCM proposed in this study can be considered an implementation of oversampling: it adjusts the proportion of positive samples in each batch and thereby controls the oversampling rate during learning. Combining the BCM with advanced loss functions may improve the performance of CXR models. Supplementary Table 1 lists our preliminary comparison of cross-entropy loss and focal loss. The results suggest that RAND models trained with focal loss show significantly improved CXR classification measures. Future studies are needed to validate the effectiveness of the combined methods.
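For readers unfamiliar with focal loss, the following per-sample sketch shows how it differs from cross-entropy for binary classification. The defaults `gamma=2.0` and `alpha=0.25` are commonly used values assumed here for illustration; the settings used for Supplementary Table 1 are not stated in this section.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for one sample.
    p: predicted probability of the positive class; y: label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample: cross-entropy scaled by a class
    weight alpha_t and by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
print(easy_ratio, hard_ratio)
```

The scaling factor shrinks the contribution of well-classified samples, so the many easy majority-class examples no longer dominate the gradient. With `gamma=0` the expression reduces to class-weighted cross-entropy.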
We used the UNet segmentation network for the classification application. UNet is an encoder-decoder network with skip connections. In deep learning-based pattern recognition, image classification can also be performed with the encoder alone followed by a fully connected output, as in VGG16 [20], ResNet [21], or DenseNet [22]. Additionally, object detection networks such as Faster region-based CNN (Faster R-CNN) [23], YOLO [24], or Mask R-CNN [25] can be applied to CXR classification. We have not implemented or evaluated the BCM with these network architectures, which is a limitation of this study. Nevertheless, we expect the BCM to be advantageous for them as well because they are all based on convolutional networks. Further investigation of this hypothesis is warranted.
In conclusion, we presented the deep learning method that we employed in the RSNA challenge for CXR recognition. To address the class imbalance of the RSNA dataset, we developed and evaluated the BCM. The models obtained using the BCM were more stable, and their sensitivity was adjustable by manipulating the distribution of positive and negative cases. The BCM is therefore a convenient method for producing tunable and stable CXR models, whether or not the training dataset is unbalanced. The rapid increase in the number of confirmed COVID-19 infections continues to strain medical care systems and resources. As researchers in medical science, we believe that global collaborative and investigative efforts will help overcome this catastrophe.