Siemens Healthineers Academy
Whitepaper - Operating Characteristics and Algorithm Performance AI-Rad Companion Chest X-ray

This whitepaper details the operating characteristics and bench testing performance of the algorithms within AI-Rad Companion Chest X-ray VA40A and newer versions.
Target group: All users
Recommended to be viewed on the following devices: All (incl. tablet, smartphone)

HOOD05162003298357 | Effective Date: 10 DEC 2022

Whitepaper
Operating Characteristics and Algorithm Performance
AI-Rad Companion Chest X-ray VA40A
Siemens Healthineers
Siemens Healthcare GmbH, 2022

Table of contents

Abstract
Introduction
Description of Multi-reader Multi-case (MRMC) Study
Results of MRMC Study
Discussion and Conclusions of MRMC Study
Performance Testing of Consolidation Detection on COVID-19 Data
Revision History
References

Abstract

The following whitepaper details the operating characteristics and bench testing performance of the algorithms within AI-Rad Companion Chest X-ray VA40A and newer versions. It describes the general principles of the software such as its workflow and results. The whitepaper also explains performance metrics like area under receiver operating characteristics curve, sensitivity, specificity etc. and how to interpret these algorithm performance metrics. The whitepaper summarizes results of two bench testing studies: an MRMC study that shows how the device performance can be benchmarked in comparison to the average performance of board-certified radiologists, and a standalone test that was conducted to demonstrate performance of the consolidation detection algorithm on a cohort of unseen data including RT-PCR confirmed COVID-19 cases. The target radiographic findings include pulmonary lesions, atelectasis, pneumothorax, consolidation, and pleural effusion. The conclusions of the study are used as a measurable baseline to quantify standard-of-care performance. It has been demonstrated that AI-Rad Companion Chest X-ray VA40A performs comparably to the average performance of radiologists participating in the study, with higher area under receiver operating characteristics curve (AUC) and sensitivity for all the target radiographic findings. With AUC of 95–99%, the AI algorithms within AI-Rad Companion Chest X-ray demonstrate high accuracy for the detection of the target radiographic findings for use as a diagnostic aid for concurrent reading of Chest X-rays to be reviewed. Additionally, we show that AI-Rad Companion Chest X-ray demonstrates high accuracy for detection of radiographical signs of consolidation in X-rays of the target population, including patients with COVID-19.

Introduction

AI-Rad Companion Chest X-ray is a diagnostic aid for qualified medical professionals which identifies and highlights the pre-specified radiographic findings using artificial intelligence algorithms. It is intended to be used by a qualified medical professional concurrently with original images before a final decision is made on a case. The results generated by AI-Rad Companion Chest X-ray summarize the identified radiographic findings with localization information in the format of secondary capture DICOM objects and as machine readable DICOM Structured Reports. AI-Rad Companion Chest X-ray is designed as a clinical extension to the AI-Rad Companion Engine and is deployed through the teamplay digital health platform. It must be used in conjunction with appropriate software, such as reporting software, to report the findings and clinical observations. The workflow for clinical integration of AI-Rad Companion Chest X-ray is shown in Figure 1.
Figure 1: Clinical Workflow of AI-Rad Companion Chest X-ray. (Diagram labels: AI-Rad Companion hosted on teamplay digital health platform; Local; Cloud; Radiology Workplace; Potential radiographic findings: Lesions, Pneumothorax, Pleural effusion, Atelectasis, Consolidation; Functionalities: Highlighting, Characterizing.)

AI-Rad Companion Chest X-ray is indicated for patients aged 22 years and above and for use in clinical settings such as routine check-up, inpatient and outpatient departments, and disease screening. AI-Rad Companion Chest X-ray may operate at a higher sensitivity than an average qualified medical professional, as intended for Computer Aided Detection devices, and it must be noted that the device may occasionally mark false positives or miss findings. AI-Rad Companion Chest X-ray is not designed to detect the presence of radiographic findings other than the prespecified list (pulmonary lesions, consolidation, atelectasis, pneumothorax, and pleural effusion). Qualified medical professionals should review original images for all suspected pathologies by following standard clinical procedures. AI-Rad Companion Chest X-ray works best if the Chest X-ray image is acquired in compliance with the practice parameter for performance of chest radiography laid out by American College of Radiology guidelines [1], such as using high-kilovoltage technique (120 to 150 kVp), using anti-scatter technique, etc. American College of Radiology guidelines [1] also recommend quality control of acquired radiographs by a trained technologist prior to archival in the PACS to ensure the anatomy is captured sufficiently and appropriate DICOM tags are filled in. Figure 2 pictorially illustrates the different elements that are generated as a part of the results created by AI-Rad Companion Chest X-ray. Figure 3 illustrates sample outputs generated by the medical device indicating the suspected findings found, their location, and their corresponding confidence scores (see Figure 2 for an illustration).

Figure 2: Description of elements in results generated by AI-Rad Companion Chest X-ray.

1 Annotated image
2 Overview table
  • Presence of finding indicated by pink dot
  • Finding types found in image → white font
  • Not found → grayed out
3 AI Confidence Score
  • Normalized algorithm confidence: 1 (low) – 10 (high)
  • Shown are only 6–10
  • For findings not found by AI, score is ≤ 5 (not shown in the image)
  • Normalized score to express the algorithm's certainty for the presence of a finding
    - no measure of actionability
    - no measure of malignancy

Text shown in the overview table: "Findings (AI Confidence above 5 are highlighted)"; "Caution: The findings shown on this summary were auto-generated."; legend "AI Findings", "Not found by AI", "Confidence Score Range (1-Low to 10-High)".

Abbreviations used in the overview table:
PTX  Pneumothorax
PEF  Pleural Effusion
LES  Pulmonary Lesions
CO   Consolidation
AT   Atelectasis

Figure 3: Examples of device output of AI-Rad Companion Chest X-ray with markings indicating Pulmonary Lesions, Consolidation, Pleural Effusions, and Pneumothorax, along with the corresponding AI-confidence scores. Image courtesy: Medizinisches Versorgungszentrum Prof. Dr. Uhlenbrock und Partner in Dortmund, Germany.

The definitions of the radiographic findings detected in AI-Rad Companion Chest X-ray are adapted from RadLex [2] and the Fleischner Society Glossary for Thoracic Imaging [3] and are as follows:

• Pulmonary Lesions: AI-Rad Companion Chest X-ray detects pulmonary lesions including lung nodules and masses. Nodules present as rounded or oval opacities < 3 cm in diameter and masses are defined as pulmonary, pleural, or mediastinal lesions greater than 3 cm in diameter. In a study by Eltorai et al. [4], a vast majority of 88.4% of US-board certified thoracic radiologists desired AI applications that aided in detection of pulmonary nodules.

• Pneumothoraces: AI-Rad Companion Chest X-ray detects radiographic signs suggestive of gas (air) in the pleural space, including both small and large pneumothoraces¹. In a study by Eltorai et al. [4], a majority of 56.8% of US-board certified thoracic radiologists desired AI applications that aided in detection of pneumothoraces.

• Atelectasis: AI-Rad Companion Chest X-ray detects increased opacities associated with volume loss which can be accompanied by abnormal displacement of fissures, bronchi, vessels, the diaphragm, the heart, or the mediastinum. According to Ruscic et al. [5], this is one of the most common respiratory complications after a surgical procedure.

• Consolidation: AI-Rad Companion Chest X-ray detects radiographic signs associated with increased parenchymal attenuation which includes consolidation and ground glass opacity. Consolidation appears as a homogeneous increase in pulmonary parenchymal attenuation that obscures the margins of vessels and airway walls. Ground glass opacity appears as an area of hazy increased lung opacity, usually extensive, within which margins of pulmonary vessels may be indistinct but not obscured.

• Pleural Effusions: AI-Rad Companion Chest X-ray detects radiographic signs suggestive of fluid in the pleural space between the visceral and parietal pleura.

¹ The characterization of the size of pneumothorax is based on Baumann, M.H., Strange, C., Heffner, J.E., Light, R., Kirby, T.J., Klein, J., Luketich, J.D., Panacek, E.A. and Sahn, S.A., 2001. Management of spontaneous pneumothorax: an American College of Chest Physicians Delphi consensus statement. Chest, 119(2), pp.590-602.

Description of MRMC Study

Motivation

Reading Chest X-rays is subject to a large inter-reader variability in clinical routine. To better estimate the performance of human readers on the task of identifying the radiographic findings scoped in this device, a clinical reader study was conducted in which the average reading performance of human readers was estimated. An Artificial Intelligence based diagnostic assistant that is intended to improve the diagnostic accuracy of readers should have a performance superior or comparable to the standard of care. Since an objective measurable definition of standard of care is not possible, the average reader performance observed in this study was used to compare and contrast the performance of AI-Rad Companion Chest X-ray. Statistical analyses were performed comparing the means of each group (AI-Rad Companion Chest X-ray vs. radiologists). The null hypothesis of both groups performing equally was tested using a paired t-test. The goal of the study is purely to form an empirical basis for standalone testing of AI models and is not intended to replace the radiologist's review of the original cases. The medical device is intended as a diagnostic aid and does not remove any cases from the radiology worklist.²

² Refer to the Instructions for Use of AI-Rad Companion Chest X-ray for further details on the intended use, indications of use, contraindications and use scenarios of the device.
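The paired t-test named under Motivation can be illustrated with a minimal sketch. It assumes the comparison is made on paired observations, one score for the AI algorithm and one for the readers per unit. The whitepaper does not state what the paired units were, so the function name `paired_t_statistic` and the example values are hypothetical and are not study data. The sketch returns only the test statistic and the degrees of freedom; converting them to a p-value needs the Student t distribution, which is left out here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two paired samples.

    The statistic is the mean of the pairwise differences divided by
    its standard error (sample standard deviation / sqrt(n)).
    """
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical paired scores (illustration only, not taken from the study).
ai_scores = [5, 7, 6, 9]
reader_scores = [3, 6, 4, 5]
t_value, dof = paired_t_statistic(ai_scores, reader_scores)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

A large positive t under these assumptions means the first sample scored consistently higher than the second across pairs; whether that difference is significant depends on the p-value for the given degrees of freedom.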
Dataset Description

In total 1019 patients were selected retrospectively from three different sites in the US and Europe: Princeton Radiology in Princeton NJ, USA, Medizinisches Versorgungszentrum Prof. Dr. Uhlenbrock und Partner in Dortmund, Germany, and Ludwig-Maximilians-Universität, Munich, Germany.

The data set is representative of the target intended population (mean age: 59.1 years, with 24.7% of data from patients < 45 years old, 48.4% female) and is sampled at random from a larger consecutively collected dataset acquired for product development. The dataset consists of patients from the United States and European Union (Germany). The data set was enriched by parsing the clinical reports with the help of Natural Language Processing (NLP) for the pre-specified radiographic findings, to ensure that enough positive cases would be present after truthing (at least 30 positive cases). The case selection is based on the initial clinical reports, which were automatically processed by an in-house NLP technique to reach the target distribution. The distribution of vendors within the dataset is shown in Figure 4. Only Posterior-anterior (PA) view Chest X-ray images fulfilling a set of pre-defined DICOM image quality gates were included in this study³. In addition to the enrichment of positive cases with targeted findings, a control group was randomly selected which only encompasses normal patients without findings referred to in the clinical reports.

³ Refer to the Data Sheet of AI-Rad Companion Chest X-ray for further details on DICOM tag prerequisites.

Figure 4: Distribution of vendors within the retrospective cohort in this MRMC Study. (Bar chart, vertical axis from 0 to 400; vendors shown: Carestream Health, FUJIFILM Corporation, SIEMENS, Agfa, Varian, KONICA MINOLTA.)

Truthing Procedures

The ground truth for the test data set was constructed in a consensus reading fashion. Three board-certified radiological experts, each with > 7 years of reading experience (two truthers hold the Fellowship of the Royal College of Radiologists), independently read and marked all the abnormalities they found in the image. In case of disagreement, the consensus with regard to the presence of an abnormality is reached by majority voting. At least two experts need to confirm and agree on a positive finding in terms of type and location before it is recognized as positive in the ground truth. The distribution of positive and negative cases truthed by consensus reading is given in Table 1. The controls are included in the negatives pool for each of the findings. The truthers also confirmed whether the datasets were of diagnostic quality, and cases were replaced if the quality was found to be insufficient for clinical use.

Reading Procedures

In the reading session, seven independent board-certified radiologists (radiology experience ranging from 2 years to 12 years, 5 male / 2 female) were recruited for the reader study. Each reader went through each case and identified the findings, indicating and marking their type and location. Considering the subjectivity in reading and interpreting chest X-rays, each finding that the reader identified was ranked with a confidence level from 1 to 10 (1 = lowest confidence, 10 = highest confidence), representing how much confidence the reader associated with the detection of the finding. The full dynamic range of the confidence level, which can be translated to the probability of the presence of a finding, ensures a better estimation of the reader performance in terms of AUC (area under the curve) of the receiver operating characteristics curve. The AUC is calculated at case level. If there are multiple instances of the same finding, the maximum confidence at case level was used. For negatives, zero-imputation was performed.

Table 1: Final composition in number of cases within the dataset for bench testing after definition of ground truth.

Radiographic Findings   Positive   Control
Pulmonary Lesions       138        306
Atelectasis             69         264
Consolidation           55         278
Pleural Effusion        69         253
Pneumothorax            61         257

Results of MRMC Study

Performance Metrics

The following performance metrics are calculated for each of the findings and are used to measure both reader and AI algorithm performance:

• Sensitivity: proportion of positive images (i.e., having a certain radiographic finding) correctly labeled as positive. This is also known as the True Positive Rate and is a measure of what part of positive cases are detected by the reader or the AI algorithm.

• Specificity: proportion of negative images (i.e., not having a certain radiographic finding) correctly labeled as negative. False Positive Rate is defined as 1 - Specificity and is a measure of how often the device / reader creates a false alarm.

• Receiver Operating Characteristics (ROC) curve: this curve is created by plotting the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity) at various threshold settings [4].

• Area under the ROC curve (AUC): this number estimates the probability of correctly ranking a positive case above a negative case. It is bounded between 0 and 1: the closer to 1, the better the system's performance. 1 means a perfect classification of positives vs. negatives. A diagnostic test producing an AUC value between 0.9 and 1.0 is considered 'Excellent' in the traditional academic point system [4].

Results on Reader and Algorithm Performance

The observed average performance of the board-certified readers and the contrasting results of the algorithm performance of AI-Rad Companion Chest X-ray are shown below for each of the target findings in Figures 5–7 for AUC, Sensitivity and Specificity respectively. To measure the average reader performance, the area under the ROC curve (AUC) and the sensitivity at a confidence of ≥ 5 with the corresponding specificity are reported.

Figure 5: Performance of readers and AI algorithm measured in terms of area under the receiver operating characteristics curve (Notation: **: p-value < 0.01 of superiority tests).

AUC (%)        LES    PTX    CO     AT     PEF
Readers        67     64.9   67.2   69.9   88.8
AI algorithm   95.1   98.3   94.9   97.8   99.5
Significance   **     **     **     **     **

Figure 6: Performance of readers and AI algorithm measured in terms of their sensitivity to detection of each of the radiographic findings (Notation: **: p-value < 0.01 of superiority tests).

Sensitivity (%)   LES    PTX    CO     AT     PEF
Readers           35.5   30     39.4   43.7   83.4
AI algorithm      82.6   85     74.7   91.1   93.2
Significance      **     **     **     **     **

Figure 7: Performance of readers and AI algorithm measured in terms of specificity in the detection of each of the target radiographic findings (Notation: *: p-value < 0.05, **: p-value < 0.01 of non-inferiority tests).

Specificity (%)   LES    PTX    CO     AT     PEF
Readers           97.9   99.7   95     95.4   94
AI algorithm      92.1   96.4   95.4   95.8   97.3
Significance      *      **     **     **     **
Discussion and Conclusions of MRMC Study

With AUC of 95–99%, the AI algorithms within AI-Rad Companion Chest X-ray demonstrate high accuracy for the detection of the target radiographic findings.

Pulmonary Lesions: The detection of pulmonary lesions is considered a challenging task while reading, and this is reflected in the performance of the readers in the reader study. In particular, the case-level sensitivity of readers was observed at 35.5, as subtle pulmonary lesions are easy to miss. In contrast, the sensitivity of the AI algorithm was 82.6 at case level, higher by a statistically significant margin of 47.1 points (p-value < 0.01).

Consolidation: The AUC of the AI algorithm for detection of consolidation was higher than that of the readers by a statistically significant margin of 27.7 (p-value < 0.01), and the sensitivity by 35.3 (p-value < 0.01).

Atelectasis: The AUC of the AI algorithm for detection of atelectasis was higher than that of the readers by a statistically significant margin of 28 (p-value < 0.01), and the sensitivity by 47.4 (p-value < 0.01).

Pleural Effusion: We observe a high reader performance, with AUC of 88.8 and high sensitivity of 83.4, for the task of detection of pleural effusion. The AI algorithm demonstrates a higher AUC of 99.5 and sensitivity of 93.2.

Pneumothorax: Subtle and small pneumothoraces are easy to miss while reading chest X-rays. The AI algorithm has a higher sensitivity (85.0) than the average reader (30) with comparable specificity (AI algorithm 96.4 vs. reader specificity of 99.7).

The intended workflow of AI-Rad Companion Chest X-ray is for the clinician to read the original images and AI results concurrently at the reading workstation. The AI algorithms would mark and identify findings with a higher sensitivity and accuracy than the average of readers participating in this study. This increased sensitivity would help alleviate potentially missed findings and act as additional confirmatory evidence prior to reporting on the image. It must however be noted that AI-Rad Companion Chest X-ray should not be used in lieu of full patient evaluation or solely relied upon to make or confirm a diagnosis. It is not intended to replace the review of the X-ray image by a qualified medical professional.

Performance Testing of Consolidation Detection on COVID-19 Data

Chest imaging, X-ray in particular, plays an important role in patient management during the COVID-19 pandemic. Patient management and clinical decisions depend on clinical outcomes and imaging reports.

For the performance evaluation of detection of the radiographic finding Consolidation, based on this algorithm (VA40A), a representative cohort consisting of 938 images has been selected. This cohort includes 145⁴ images (15.5%) acquired from patients with COVID-19 confirmed via an RT-PCR test. The data comprise both posterior anterior and anterior posterior views from three different data sources⁵ (MVZ Prof. Uhlenbrock & Partner, Dortmund, Germany 58.82%, Tokushukai Shounan Kamakuru General Hospital, Okhinawa, Japan 20.95%, Lung Image Database Consortium (LIDC) 20.23%). The vendors included in the cohort support DICOM Standards [6]. The relevant descriptive statistics from the cohort are shown in Table 2.

The ground truth for this cohort was constructed in a consensus reading fashion using three board-certified radiological experts. The truthers also confirmed whether the datasets were of diagnostic quality, and cases were replaced if the quality was found to be insufficient for clinical use. The performance evaluation was done using three standard evaluation metrics: Area under the Receiver Operating Characteristics Curve (AUC) (shown in Figure 8), Sensitivity and Specificity. The observed performance figures are reported along with their 95% confidence intervals, estimated using bootstrapping, in Table 3.

Table 2: Descriptive statistics of case distribution for consolidation.

Positive     251
Control      446
PA           602
AP           95
Mean age     66
Median age   69
IQR          26.75
Male         280
Female       276
NA age       163
NA gender    141

Table 3: Performance evaluation of consolidation detection along with 95% confidence intervals.

Metric        Value with 95% confidence interval
AUC           90% (87% – 92%)
Sensitivity   81% (76% – 85%)
Specificity   85% (81% – 92%)

Figure 8: Receiver Operating Characteristics Curve for detection of consolidation on patient cohort enriched with RT-PCR confirmed COVID-19 data. (Axes: Sensitivity against 1 - Specificity, both from 0.0 to 1.0. Legend: V9: AUC: 0.90. Marked operating point: V9@cutoff 0.16: sens 0.81, spec 0.85.)

Conclusion: With the above standalone testing, we demonstrate that AI-Rad Companion Chest X-ray has a high accuracy for detection of radiographical signs of consolidation in X-rays of the target population, including patients with COVID-19.

⁴ Based on the mean and 95% confidence intervals estimated for AI, assuming a type-1 error rate of 0.05, power of 80% and a superiority margin of 5%, a minimum sample size of 55 patients has been estimated.

⁵ AI-Rad Companion Chest X-ray is tested and validated on a multi-vendor dataset including Siemens Healthineers, GE, Philips, Carestream, Fuji and Agfa.

Revision History

Version   Comments
1.0       Contains methodology, results, and conclusions of internal MRMC study and bench testing of the algorithms of AI-Rad Companion Chest X-ray VA2X.
2.0       Revised version including feedback from Customer Relationship Management and Regulatory to include definitions of radiographic findings and findings-specific discussion and conclusions.
3.0       Updated to performance metrics of algorithm (VA40) and additional clinical validation on cohort including RT-PCR confirmed COVID-19 data.

References

[1] American College of Radiology, 2017. ACR–SPR–STR practice parameter for the performance of chest radiography.
[2] RadLex radiology lexicon. http://www.radlex.org/
[3] Hansell, D.M., Bankier, A.A., MacMahon, H., McLoud, T.C., Muller, N.L. and Remy, J., 2008. Fleischner Society: glossary of terms for thoracic imaging. Radiology, 246(3), pp.697-722.
[4] Eltorai, A.E., Bratt, A.K. and Guo, H.H., 2020. Thoracic radiologists versus computer scientists' perspectives on the future of artificial intelligence in radiology. Journal of thoracic imaging, 35(4), pp.255-259.
[5] Prevention of respiratory complications of the surgical patient: actionable plan for continued process improvement. https://journals.lww.com/co-anesthesiology/Fulltext/2017/06000/Prevention_of_respiratory_complications_of_the.22.aspx
[6] Fawcett, T., 2006. An introduction to ROC analysis. Pattern recognition letters, 27(8), pp.861-874.

AI-Rad Companion Chest X-ray is not commercially available in all countries, and its future availability cannot be ensured.

The information in this document contains general technical descriptions of specifications and options as well as standard and optional features which do not always have to be present in individual cases, and which may not be commercially available in all countries. Due to regulatory reasons their future availability cannot be guaranteed. Please contact your local Siemens organization for further details.
Siemens Healthineers reserves the right to modify the design, packaging, specifications, and options described herein without prior notice. Please contact your local Siemens Healthineers sales representative for the most current information.

Note: Any technical data contained in this document may vary within defined tolerances. Original images always lose a certain amount of detail when reproduced.

Siemens Healthineers Headquarters
Siemens Healthcare GmbH
Henkestr. 127
91052 Erlangen, Germany
Phone: +49 9131 84-0
siemens-healthineers.com

Unrestricted · Published by Siemens Healthcare GmbH, Germany · © Siemens Healthcare GmbH, 2022
