The implementation of artificial intelligence (AI) models in radiology has shown unprecedented potential to aid in the diagnosis and treatment of diseases [1]. However, to ensure that only high-quality, robust AI algorithms reach clinical practice, the training data must be adequately prepared: selected, collected, cleaned, organized, and pre-processed so as to maximize the quality and effectiveness of the resulting models [1]. Accordingly, there has been intense discussion about the techniques and tools available for carrying out each of these processes properly [2]. We look at each stage in turn.
The first step is data selection, whose objective is to identify which medical imaging modalities are needed to train the AI model, taking into account the clinical environment and the diversity of cases, both of which are fundamental to adequate data processing [1]. We’ve covered the importance of diversity in training data for eliminating bias in another blog post here: https://gradienthealth.io/2022/10/01/the-importance-of-solving-data-bias/
The second stage is data collection, which can be carried out in different ways, often through retrospective medical image databases, data brokers, public and private hospital databases, and clinical and scientific research centers [1]. Collection matters because imaging tests can be paired with a variety of other relevant data, such as clinical, histopathological, and evolutionary data, which supports a stronger correlation between clinical findings and imaging [1-3]. This is where we can help innovators with on-demand data, saving time and cost.
The third step is data cleaning, which consists of eliminating irrelevant information, duplicate or corrupt data, and possible or confirmed instances of Protected Health Information (PHI), as well as correcting errors or inconsistencies in the data. This step is critical to the quality of the data used to train the AI model. Remember: “bad data in, bad data out” [1-3].
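The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each record is a plain dictionary, the `pixels` key and the field names in `PHI_FIELDS` are hypothetical, and a real de-identification workflow would cover far more than three fields.

```python
import hashlib

# Hypothetical metadata keys treated as PHI in this sketch; a real
# de-identification profile covers far more than these.
PHI_FIELDS = {"patient_name", "patient_id", "birth_date"}


def clean_records(records):
    """Drop corrupt and duplicate records and strip PHI fields.

    Each record is assumed to be a dict with a "pixels" entry (bytes)
    plus arbitrary metadata keys.
    """
    seen_hashes = set()
    cleaned = []
    for record in records:
        pixels = record.get("pixels")
        # Treat missing or empty pixel data as corrupt.
        if not pixels:
            continue
        # Exact duplicates share the same content hash.
        digest = hashlib.sha256(pixels).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # Keep every field except the ones flagged as PHI.
        cleaned.append({k: v for k, v in record.items() if k not in PHI_FIELDS})
    return cleaned


sample = [
    {"pixels": b"\x01\x02\x03", "modality": "CT", "patient_name": "Jane Doe"},
    {"pixels": b"\x01\x02\x03", "modality": "CT", "patient_name": "Jane Doe"},  # duplicate
    {"pixels": b"", "modality": "MRI", "patient_name": "John Roe"},  # corrupt
    {"pixels": b"\x09\x08", "modality": "XR", "patient_id": "12345"},
]
cleaned_sample = clean_records(sample)
```

Hashing the pixel data catches only byte-identical duplicates. Near-duplicates (for example, the same study re-exported with different compression) would need a different comparison.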
The fourth step is data organization, which involves classifying medical images according to predetermined criteria such as image type, clinical characteristics, age, gender, and other demographic data. The main objective is to facilitate and streamline the AI model training process [1].
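As a rough sketch of organizing by predetermined criteria, the snippet below groups studies by modality, age band, and sex. The record layout, the key names (`study_id`, `modality`, `age`, `sex`), and the 20-year band width are assumptions chosen for illustration, not a prescribed schema.

```python
from collections import defaultdict


def age_band(age_years, width=20):
    """Map an age in years to a band label such as '40-59'."""
    low = (age_years // width) * width
    return f"{low}-{low + width - 1}"


def organize(studies):
    """Group study identifiers by (modality, age band, sex)."""
    groups = defaultdict(list)
    for study in studies:
        key = (study["modality"], age_band(study["age"]), study["sex"])
        groups[key].append(study["study_id"])
    return dict(groups)


studies = [
    {"study_id": "s1", "modality": "CT", "age": 45, "sex": "F"},
    {"study_id": "s2", "modality": "CT", "age": 52, "sex": "F"},
    {"study_id": "s3", "modality": "MRI", "age": 67, "sex": "M"},
    {"study_id": "s4", "modality": "CT", "age": 30, "sex": "M"},
]
organized = organize(studies)
```

Grouping like this also makes thin or empty categories visible before training starts, which is one way to spot gaps in case diversity.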
Finally, the fifth step is data pre-processing, which includes applying image processing techniques such as segmentation, edge detection, and normalization. These techniques aim to improve image quality and prepare the images for AI model training, maximizing the quality and effectiveness of the results obtained.
Data preparation for AI in medical imaging has been the subject of several recent studies, which reinforce the relevance of this step to the effectiveness and reliability of AI models applied in the medical industry. Several works have shown that inappropriate data preparation can lead to inaccurate results and to failures in detecting medical anomalies [1-3].
Moreover, the intricacy of medical imaging data, which encompasses information from various modalities (such as CT, MRI, and X-ray) and heterogeneous formats, presents a significant obstacle to readying this data for training AI models. The absence of standardization in image acquisition protocols and formats, coupled with the considerable amount of data within each image, makes selecting and organizing data a multifaceted, error-prone process. In addition, the need to safeguard patient data privacy and security poses a further challenge when preparing data for AI implementation in medical imaging [4,5].
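Of the pre-processing techniques named above, normalization is the simplest to illustrate. The sketch below applies min-max scaling to a small 2D patch represented as nested lists. The function name and the sample intensity values are hypothetical, and production pipelines typically use an array library and modality-specific intensity windows rather than plain lists.

```python
def min_max_normalize(image):
    """Rescale a 2D image (list of rows) so its values lie in [0, 1]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    # A constant image has no intensity range; map it to all zeros.
    if hi == lo:
        return [[0.0 for _ in row] for row in image]
    scale = hi - lo
    return [[(value - lo) / scale for value in row] for row in image]


# Illustrative intensity values only, not taken from a real scan.
patch = [[-1000, 0], [400, 1000]]
normalized = min_max_normalize(patch)
```

Putting every image on the same intensity scale is one way to reduce the effect of the non-standardized acquisition protocols mentioned above, although it does not remove differences in contrast or noise between scanners.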
To implement AI successfully in medical imaging, appropriate techniques must be developed for secure data storage and sharing, and the ethical and legal regulations governing data usage must be followed. Despite these challenges, recent studies have demonstrated that proper data preparation is critical for success. Maximizing the quality and effectiveness of AI models used in healthcare requires efficient tools and techniques for preparing medical image data, which is why it is important to understand the available techniques and tools, as well as the challenges that may arise.
References:
1. Huang Y, et al. Preparation of medical image data for machine learning: A review. Engineering. 2021;7(4):364-73.
2. Raza S, et al. A review on pre-processing techniques for medical image analysis using deep learning. Neurocomputing. 2020;396:361-78.
3. Chartrand G, et al. Deep learning: a primer for radiologists. Radiographics. 2017;37(7):2113-31.
4. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJ. Artificial intelligence in radiology. Nature Reviews Cancer. 2018;18(8):500-10.
5. Kim J, Park SH, Lee K. Current status and future prospects of artificial intelligence in radiology: a review. Annals of Translational Medicine. 2020;8(8).
6. Nair A, Gavrielides MA, Sahiner B. Artificial intelligence in medical imaging: Lessons learned from challenges in data annotation. Journal of Magnetic Resonance Imaging. 2020;52(6):1611-23.