Clin Res Cardiol (2022). https://doi.org/10.1007/s00392-022-02002-5

Automatically determined functional and structural cardiac measures are equivalent to human measurements for outcome prediction in chronic heart failure
S.-O. Tröbs1, T. Hauptmann2, D. Velmeden3, J. Söhne3, A. Schuch3, F. Müller1, M. Heidorn3, S. Göbel3, A. Schulz1, T. Münzel3, J. Prochaska1, S. Kramer2, P. S. Wild4
1Zentrum für Kardiologie, Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz; 2Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz; 3Kardiologie 1, Zentrum für Kardiologie, Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz; 4Präventive Kardiologie und Medizinische Prävention, Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz

Background: Assessment of cardiac structure and function is critical for clinical decision making. However, measurements are time-consuming and highly examiner-dependent. Convolutional neural networks (CNN) can derive echocardiographic parameters automatically, with good agreement with human observers and good reproducibility. A direct comparison of CNN-based and manual measurements with regard to outcome in chronic heart failure (HF) has not yet been performed.

Methods: Data from the MyoVasc study (n=3,289; NCT04064450) on chronic HF were analyzed. Comprehensive clinical phenotyping including thorough echocardiography was performed during an investigation in a dedicated study center. Left ventricular ejection fraction (LVEF) and left ventricular mass (LVM) were measured automatically using a CNN-based pipeline published by Zhang et al. Clinical outcome was assessed by a structured follow-up with subsequent validation and adjudication of endpoints. Worsening of HF was defined as a composite of cardiac death, HF hospitalization and worsening of HF symptoms. HF phenotypes were defined according to the universal definition of HF.

Results: Data from automated and human measurements were available from 2,815 subjects with a median follow-up of 3.3 [2.1/5.0] years. 10.6% of participants were categorized as HF stage A, 29.5% as stage B, and 52.6% as stage C/D. Of the latter, 20.0% were classified as HF with reduced ejection fraction (HFrEF), 23.2% as HF with mildly reduced ejection fraction (HFmrEF), and 41.1% as HF with preserved ejection fraction (HFpEF). Median NT-proBNP concentration was 166 [71/469] pg/ml. In multivariable linear regression analysis with NT-proBNP as the dependent variable, automatic (β=-0.50, 95% CI [-0.54/-0.46], p<0.0001) and manual (β=-0.71, 95% CI [-0.75/-0.67], p<0.0001) LVEF measurements were associated with NT-proBNP concentration after adjustment for age and sex, as were automatic (β=0.55, 95% CI [0.50/0.60], p<0.0001) and manual (β=0.56, 95% CI [0.51/0.61], p<0.0001) LVM measurements. In multivariable Cox analysis adjusted for age and sex, automatic and manual measurements were predictive of worsening of HF (LVEF: HRhuman 0.54 [0.50/0.58] vs. HRauto 0.57 [0.52/0.63]; LVM: HRhuman 1.61 [1.49/1.73] vs. HRauto 1.64 [1.51/1.78], all p<0.0001) and cardiac death (LVEF: HRhuman 0.93 [0.92/0.94] vs. HRauto 0.93 [0.91/0.94]; LVM: HRhuman 1.0 [1.00/1.01] vs. HRauto 1.01 [1.00/1.01], all p<0.0001). C-indices of the prediction models did not differ significantly for worsening of HF (LVEF: p=0.053, LVM: p=0.95) or cardiac death (LVEF: p=0.89, LVM: p=0.99). Sensitivity analyses showed no significant differences in model performance in HF stage C/D or within the HF phenotypes (HFrEF, HFmrEF, and HFpEF) for LVEF and LVM (all p>0.1).
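
For readers unfamiliar with the C-index used to compare the prediction models, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the study's analysis code: the abstract does not state which software or which test for the difference between C-indices was used. The function name harrell_c_index and all numeric values are hypothetical toy inputs, not MyoVasc data, and lower LVEF is simply assumed to indicate higher risk.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up
    time had an event. It is concordant when that subject also has the
    higher risk score; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (NOT study data): follow-up in years, event flags,
# and LVEF in percent from a manual and an automated reading.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 0]
lvef_manual = [25, 35, 50, 45, 60]
lvef_auto = [30, 28, 52, 47, 58]

# Assumption for illustration: lower LVEF means higher risk, so negate it.
c_manual = harrell_c_index(times, events, [-v for v in lvef_manual])
c_auto = harrell_c_index(times, events, [-v for v in lvef_auto])
print(c_manual, c_auto)  # prints 1.0 0.875
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. Deciding whether two C-indices differ significantly, as reported above, requires an additional statistical test that this sketch does not implement.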

Conclusions: CNN-based assessment of LVEF and LVM was noninferior to conventional expert measurement in predicting worsening of HF and cardiac death. This observation was confirmed consistently in symptomatic HF (stage C/D) as well as across HF phenotypes. This highlights the potential for future use of artificial intelligence in echocardiography.


https://dgk.org/kongress_programme/jt2022/aV9.html