Clin Res Cardiol (2023). https://doi.org/10.1007/s00392-023-02180-w

The Influence of Image Quality on Inter-observer Variability of Myocardial Segmentation in Ultra-high Field 7T Magnetic Resonance Imaging of Porcine Hearts with Acute to Chronic Myocardial Infarction
A. Kollmann1, D. Lohr1, M. Ankenbrand2, M. Bille1, M. Terekhov1, M. Hock1, I. Elabyad1, F. Schnitter3, W. R. Bauer3, U. Hofmann3, L. Schreiber1, on behalf of the study group: DZHI
1Deutsches Zentrum für Herzinsuffizienz, Universitätsklinikum Würzburg, Würzburg; 2Center for Computational and Theoretical Biology (CCTB), Universität Würzburg, Würzburg; 3Medizinische Klinik und Poliklinik I, Universitätsklinikum Würzburg, Würzburg;
Introduction
Cardiac magnetic resonance imaging (cMRI) is considered the gold standard for the assessment of cardiac function and mass. This assessment requires segmentation of a short-axis image stack, which is time-consuming and observer-dependent when performed manually. Higher field strengths increase spatial resolution and are expected to improve diagnostic value; however, the automated segmentation approaches available for clinical use are of limited applicability to preclinical ultra-high field data. In a preclinical study, we investigated acute to chronic stages of myocardial infarction in pigs using 7T MRI. Here, we assess the influence of image quality on the inter-observer variability of left ventricular (LV) segmentation between two human observers and a deep learning (DL) model dedicated to automatic LV segmentation.
 
Methods
The study was approved by the local authorities (Government of Lower Franconia). Myocardial infarction was induced in seven animals (n=7) by 90-min occlusion of the left anterior descending coronary artery (LAD); four animals served as a sham group. All animals underwent four MR scans over a period of ~65 days. Short-axis cine images (in-plane resolution: 0.4 × 0.4 mm²) were acquired on a 7T MAGNETOM™ Terra whole-body MRI system (Siemens, Erlangen). Every end-systolic and end-diastolic image was visually scored from 1 (best) to 4 (worst) for three criteria: artefacts, noise, and general image quality. Endo- and epicardial contours were then manually labelled by two observers, with observer 1 labelling each image twice. The 7T images and the labels of observer 1 were used to retrain an existing DL model to segment the 7T cine data of this study. We calculated intra- and inter-observer variability of cardiac function metrics, compared segmentation results using the Dice score, and correlated the Dice scores with the image quality scores of the images in the test set of the DL model.
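The abstract does not spell out the formulas behind its agreement metrics. The following is a minimal Python sketch, assuming binary LV masks stored as NumPy arrays and the common Bland-Altman-style definition of the CoV (standard deviation of the paired differences divided by the mean of all measurements). The function names and toy values are hypothetical; this is not the authors' analysis code.

```python
import numpy as np


def dice_score(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary segmentation masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = int(a.sum() + b.sum())
    if total == 0:
        # Convention (assumption): two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def coefficient_of_variation(x, y) -> float:
    """CoV in % between paired measurements (e.g. EF from two observers).

    Assumed definition: sample SD of the paired differences divided by
    the mean of all measurements, times 100.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff_sd = np.std(x - y, ddof=1)  # sample SD of the differences
    grand_mean = np.mean(np.concatenate([x, y]))
    return 100.0 * diff_sd / grand_mean


# Toy example with made-up values (not study data).
obs1 = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]])
obs2 = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])
print(f"Dice: {dice_score(obs1, obs2):.3f}")  # → Dice: 0.857  (2*3 / (4+3))

ef_obs1 = [55.0, 60.0, 45.0]
ef_obs2 = [53.0, 62.0, 44.0]
print(f"CoV of EF: {coefficient_of_variation(ef_obs1, ef_obs2):.1f}%")  # → CoV of EF: 3.9%
```

Other CoV definitions exist (for example, dividing by the mean of the per-pair means), so reported values are only comparable across studies when the definition matches.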
 
Results
In the image quality assessment of 1060 images, artefacts were rated 2.6±0.6, noise 1.9±0.7, and general image quality 2.0±0.8, giving a total score of 6.5±1.6 (for examples, see Fig. 1). The coefficient of variation (CoV) of the ejection fraction and the mean Dice score were 2.4% and 0.90 for the intra-observer comparison, 8.0% and 0.82 for the inter-observer comparison of observers 1 and 2, and 6.6% and 0.84 for the comparison of observer 1 with the DL model prediction. Pearson correlation plots (Fig. 2) showed an inverse correlation between the Dice score and the image quality scores for artefacts, general image quality, and the total score, i.e., images rated worse tended to yield lower Dice scores.
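The correlation analysis can be sketched as below, assuming one Dice score and one set of quality scores per test image, and assuming the total score is the sum of the three sub-scores (consistent with the reported means, 2.6 + 1.9 + 2.0 = 6.5). The per-image values are invented for illustration, and NumPy's `corrcoef` is used to compute Pearson's r. Because 1 is the best score, an inverse relationship appears as a negative r.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Hypothetical per-image scores (1 = best, 4 = worst) and Dice values;
# these are NOT the study's data.
artefacts = [2, 3, 2, 4, 1, 3]
noise = [1, 2, 2, 3, 1, 2]
general = [2, 3, 1, 4, 1, 2]
dice = [0.88, 0.80, 0.90, 0.70, 0.93, 0.83]

# Assumed: total score = sum of the three sub-scores (range 3 to 12).
total = [a + n + g for a, n, g in zip(artefacts, noise, general)]

for name, score in [("artefacts", artefacts), ("noise", noise),
                    ("general", general), ("total", total)]:
    # Negative r: worse-rated images go with lower Dice scores.
    print(f"{name:>9}: r = {pearson_r(score, dice):+.2f}")
```

A significance test for r (for example via SciPy) would be needed before interpreting any individual correlation.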
 
Discussion
Image quality was heterogeneous but was found to be good overall. CoVs for intra- and inter-observer variability in LV segmentation are in line with or lower than those reported in the literature. However, they can still be considered somewhat high in the context of precision imaging. The DL model performed well on our data. Artefacts appear to have the strongest impact on segmentation quality (measured as Dice score). While noise has been shown to impact DL models for image reconstruction, its impact on segmentation performance appears to be rather low. The stronger correlation of each parameter for the comparison between the two human observers (relative to the comparison of observer 1 with the DL model) indicates that the DL model is less susceptible to differences in image quality than human observers are. Artificially adding noise or artefacts could yield more conclusive results and provide deeper insight into the model's limitations and the factors influencing its performance.
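The abstract does not say how noise would be added. One common option for magnitude MR images, assumed here, is to add independent Gaussian noise to the real and imaginary channels and take the magnitude, which yields Rician-distributed noise. The function name, the synthetic image, and the noise levels below are hypothetical; in a robustness study the degraded images would be segmented and their Dice scores compared with those of the clean images.

```python
from typing import Optional

import numpy as np


def add_rician_noise(magnitude: np.ndarray, sigma: float,
                     seed: Optional[int] = None) -> np.ndarray:
    """Return a copy of a magnitude image with Rician noise of level sigma.

    Gaussian noise (std sigma) is added to the real channel (the magnitude
    itself, assuming zero phase) and to an empty imaginary channel; the
    magnitude of the result follows a Rician distribution.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = np.random.default_rng(seed)
    img = np.asarray(magnitude, dtype=float)
    real = img + rng.normal(0.0, sigma, img.shape)
    imag = rng.normal(0.0, sigma, img.shape)
    return np.hypot(real, imag)


# Synthetic test image (not study data): a bright disc on a dark background.
yy, xx = np.mgrid[:64, :64]
clean = np.where((yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2, 100.0, 10.0)

for sigma in (0.0, 5.0, 20.0):  # hypothetical noise levels
    noisy = add_rician_noise(clean, sigma, seed=0)
    # Background std grows with sigma; a segmentation model would be re-run here.
    print(f"sigma={sigma:5.1f}  background std={noisy[:8, :8].std():.2f}")
```

Artefacts such as motion or B1-inhomogeneity patterns would need separate, more specific simulation models.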
 
Acknowledgements
BMBF grant: 01E1O1504
Parts of this work will be used in the MD thesis of AK


https://dgk.org/kongress_programme/jt2023/aP2156.html