The new age of AI-assisted echocardiographic reporting forces us to think again about the way in which we assess systolic function and, specifically, how we use ejection fraction.

I was recently involved in a research study tracking changes in ejection fraction (EF) and global longitudinal strain (GLS) across serial echocardiograms on the same group of patients. My role was to manually trace the necessary curves for these measurements using AI-supported reporting software, but with AI assistance turned off. You learn a lot about yourself and the way that you think when spending 30+ hours repeatedly tracing around the left ventricle, often on ventricles which show very little visual change from the previous study.

Efficiently working through hundreds of images

The software I was using has a game-changing feature which uses artificial intelligence to classify every view taken. For the user, this means that you simply click a measurement, and the software will automatically present the appropriate video clips and still images for you, rather than you having to go through the study finding them. When using this clinically, particularly for those patients that I find myself triple-checking before finalising (either because they’re a borderline or complex case, or it’s the end of the day and I simply can’t make decisions efficiently anymore!), this saves me so much time and mental energy vs my previous reporting software, where I would have to scroll back and forth through my entire study to have that final last glance at my views again.

Thanks to this automatic view-sorting, for this project I could jump straight to the ‘ejection fraction’ and ‘global longitudinal strain’ sections and begin measuring right away. Since I did not need to find the correct image myself, I also did not look through the cine loops before selecting them. In that sense, I was performing measurements as objectively as a human possibly could, with no prior expectation of what the EF% or GLS% would be. The only exceptions were cases where impairment was so severe that it was obvious even from flicking back and forth between a few frames, a necessary process for getting one’s eye in to where the endocardial border or mid-myocardial curves are.

Uncovering biases

Once I had performed the required measurements, I would then ‘sense check’ them by looking at the video clips afterward. I soon noticed that my eyeball EF% was frequently a little bit more optimistic about the patient’s function than my measurement was, particularly if the calculated EF% was showing the ventricle to be impaired. In these cases, I felt myself itching to bump the EF up by 2 or 3%. For very high calculated EFs, I sometimes had to battle with the opposite feeling – an urge to revisit my traces and see if I had been too generous.

The way I tackled these temptations, as most people would, was to perform more traces (if multiple captures were available). If two or three separate traces were all within a few millilitres of each other in terms of volume, then it gave me more confidence in my original tracing and I was able to resist any temptation to overrule it based on my eyeball EF.
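As a rough sketch of that repeat-and-compare check: the snippet below computes EF from end-diastolic and end-systolic volumes using the standard formula EF = (EDV − ESV) / EDV × 100, and only accepts the result if the repeated traces agree. The function names, the 5 ml tolerance and the example volumes are all assumptions made for illustration; they are not values from the study or from any reporting software.

```python
from statistics import mean


def ejection_fraction(edv_ml, esv_ml):
    """EF (%) from end-diastolic and end-systolic volumes in millilitres."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def traces_agree(volumes_ml, tolerance_ml=5.0):
    """True if every repeated volume lies within tolerance_ml of the others.

    The 5 ml default is an illustrative assumption, not a clinical standard.
    """
    return max(volumes_ml) - min(volumes_ml) <= tolerance_ml


# Hypothetical repeated traces of the same ventricle (ml).
edv_traces = [118.0, 121.0, 120.0]
esv_traces = [52.0, 54.0, 53.0]

if traces_agree(edv_traces) and traces_agree(esv_traces):
    ef = ejection_fraction(mean(edv_traces), mean(esv_traces))
    print(f"Traces agree; EF from mean volumes: {ef:.1f}%")
else:
    print("Traces disagree; retrace before trusting either number")
```

The point of the design is that agreement between traces, rather than agreement with the eyeball estimate, decides whether the number stands.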

This internal battle I was repeatedly having really opened my eyes to the biases I have as a human operator. It is true that neural networks can have inbuilt biases, but for our purposes – where artificial intelligence is tracing an endocardial curve, for example, rather than making a decision on whether or not to offer someone credit – AI has no preconceived ideas about the ventricle it is going to trace. The AI does not watch the ventricle contracting, form a subjective opinion on left ventricular systolic function, and then place and adjust its curves accordingly. During my borderline cases, especially, I longed to have AI turned on for me!

Overthinking

Most difficult of all was ignoring particular cut-offs ingrained in my mind. While I might not give a second thought to an ejection fraction coming out at 52% when I felt it should be 53%, I spent an irrational amount of time worrying over an EF coming out at 54% when I felt it should be 55%, or 35% when I thought it should be 36%, due to the significance of these numbers when reporting a patient’s left ventricular systolic function (LVSF). In these situations, we’re talking about a discrepancy between the measurement and my subjective judgement of a few percentage points (and sometimes as little as 1%), well within the normal intra-operator variability inherent in ejection fraction measurement. Despite this, no amount of reasoning with myself could stop 2% meaning so much more to me in some patients than in others. To AI, of course, there is no special significance at all.
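To make the asymmetry concrete, here is a minimal sketch that checks whether a measured EF and an eyeball EF fall on opposite sides of a reporting cut-off. The cut-offs of 36% and 55% are taken from the examples above; the function name and the convention that a value equal to a cut-off belongs to the upper band are assumptions for illustration.

```python
def crosses_cutoff(measured_ef, eyeball_ef, cutoffs=(36.0, 55.0)):
    """Return the cut-offs that separate two EF values (in %).

    Assumes a value equal to a cut-off sits in the upper band.
    """
    low, high = sorted((measured_ef, eyeball_ef))
    return [c for c in cutoffs if low < c <= high]


# The same 1% discrepancy, with and without a cut-off in between.
for measured, eyeball in [(52.0, 53.0), (54.0, 55.0), (35.0, 36.0)]:
    crossed = crosses_cutoff(measured, eyeball)
    label = f"crosses {crossed}" if crossed else "same band"
    print(f"measured {measured}% vs eyeball {eyeball}%: {label}")
```

The discrepancy is 1% in all three pairs; only its position relative to a cut-off changes.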

Facing up to human biases

I have always tried my best to approach my work as scientifically as possible, but coming face-to-face with the biases within me made me wonder (and worry) about how many times I must have bumped an ejection fraction up or down over my career, to make it fit in with what I already thought it should be. Was that the right thing to do? Was that the scientific thing to do? And how much time did I waste obsessing over the tiniest of differences?!

It does not take much introspection to understand that my years of clinical work, reporting alone without the assistance of AI, have ingrained in me a ‘when in doubt, pull values towards the mean’ habit that I never even knew I had. Yet, it makes perfect sense: whenever an echocardiographer unexpectedly finds an EF that is very low or very high, they need to be absolutely sure about reporting that number because it might change patient management – but often (primarily due to image quality constraints), we are not as confident as we would like to be.

In the context of suboptimal images and an unexpected result, we feel uneasy, and quite rightly reattempt our measurements – but this time, subconsciously bringing them closer to what is more likely to be the case, statistically speaking. We are drawing on the tens of thousands of patients we have already scanned in our careers, and when things are unusual or unclear, we use that experience to figure out what is the most likely result.

The human-AI partnership

Making measurements with a preconceived idea of what is most likely to be true is human bias in action, yes, but the value of this experience should not be dismissed. The ability to look at the whole picture and interpret results appropriately in context is powerful, and cannot be replicated or replaced by any artificial neural network. AI’s great strength – that it is not influenced by what it ‘thinks’ is right, by arbitrary cut-offs from past guidelines or outliers – is also a risk and a weakness without any human oversight. The software I was using has been built with the complementary strengths of human and machine in mind, and its system of plotting measurements along a range and flagging outliers for closer inspection is invaluable for an efficient human-AI partnership.

What AI assistance can also definitely do is safely take away much of that angst I experienced when I was deliberating pointlessly over 1% or 2% discrepancies, freeing up mental energy for more important decision-making. The process I naturally went through to build confidence in uncertain situations (taking repeated measurements) is done by the software automatically, and displayed conveniently for me so that I can see where measurements are clustered tightly together and where there might be outliers deserving a closer look. Finally, it provides an always-available expert second opinion during our times of weakness, such as fatigue, or simply poor image quality hindering our judgements (humans have been shown to be strikingly poor at knowing when our visual grading of LV function is being hampered by poor image quality).
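I do not know how the software decides which measurements count as outliers, so the following is only a sketch of the general idea, using a simple robust statistic as a stand-in: flag any repeated EF measurement that sits far from the median, relative to the median absolute deviation (MAD). The function name, the threshold of 3 and the example values are assumptions for illustration.

```python
from statistics import median


def flag_outliers(ef_values, threshold=3.0):
    """Return EF values lying more than `threshold` MADs from the median.

    A stand-in for whatever the reporting software actually does.
    """
    med = median(ef_values)
    deviations = [abs(v - med) for v in ef_values]
    mad = median(deviations)
    if mad == 0:
        # At least half the values are identical; flag anything that differs.
        return [v for v in ef_values if v != med]
    return [v for v, d in zip(ef_values, deviations) if d / mad > threshold]


# Hypothetical repeated EF measurements (%) for one study.
measurements = [54.0, 55.5, 53.8, 55.1, 47.0]
print("Deserving a closer look:", flag_outliers(measurements))
```

Here the four tightly clustered values are left alone and only the 47.0% measurement is flagged for a human to review.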

A powerful future

An echocardiographer’s knowledge and experience contribute so much more to the reporting process than our ability to perform linear measurements or trace around endocardial curves. Far from replacing the human expert, the software’s integrated AI frees us up from feeling like machines on a production line in the never-ending backlog of echo requests, and gives us back the time and energy to actually use our expertise.

I was already at a point at which I could not imagine reporting an echocardiogram without my reporting software’s interface, even in its most basic form, but after these tens of hours spent with EF, GLS and my own thoughts, I am even more excited about the partnership between human and AI. Together, they are about to take echocardiography to a whole new level in terms of job satisfaction and career progression for echocardiographers, and accuracy and reproducibility for our patients.