I have written several posts about how to address the problem of alert fatigue in medication safety screening. Alert fatigue occurs when the signal:noise ratio is so low that clinicians develop a habit of ignoring alerts, and so risk missing the important ones when they do occur.
In a recent JAMIA paper, Joffe et al. describe a novel approach to knowledge acquisition that has the potential to improve the signal:noise ratio. By way of background, the importance of a given alert is likely to depend on the patient context in which it occurs. For example, it might depend on the patient's age, sex, kidney function, recent lab results, current diagnoses, current medications, etc. In theory, groups of experts could sit down and try to write a large number of rules determining the context(s) in which a particular alert should be displayed. This process, however, would be quite labor-intensive. Alternatively, machine learning could be used, such that a computer would try to learn the rules from training data. To do so, it would need a training set consisting of both input data (a set of features, or context variables) and output data (a gold-standard human judgment on whether an alert is relevant in a given context). But what features should it use, and how well would it work?
In the method described by Joffe et al., the features were selected by having experts review a small number of anemia alerts (18 cases) using a Talk Aloud protocol, in which they verbally expressed the data they reviewed as part of their assessment. The data elements reviewed by at least two experts became the feature set. The experts also provided an opinion on the importance of each alert. A larger training set was then constructed by extracting these same features from the electronic health record for 100 additional cases of anemia. The experts were also asked to provide an opinion on the importance of the anemia alerts in these additional 100 cases, resulting in a training set of 118 cases. This training set was then fed to the computer for the machine learning exercise. The results were evaluated on a test set of 82 additional cases, for which the experts had also provided an opinion on the importance of each alert.
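The paper's actual learning algorithm and feature set are not reproduced here, but the general workflow — train a classifier on expert-labeled cases, then evaluate it on held-out cases — can be sketched in a few lines. The sketch below uses invented features (age, hemoglobin, creatinine), invented expert labels, and a simple nearest-neighbor rule; none of it is from the paper.

```python
import math

# Hypothetical feature vectors: (age, hemoglobin g/dL, creatinine mg/dL).
# Labels stand in for the experts' judgments of alert importance.
# All values are invented for illustration, not taken from the paper.
train = [
    ((72, 8.1, 2.4), "high"),
    ((65, 9.0, 1.9), "high"),
    ((34, 11.2, 0.9), "low"),
    ((29, 11.8, 1.0), "low"),
    ((80, 7.5, 2.8), "high"),
    ((41, 10.9, 1.1), "low"),
]

def classify(case):
    """1-nearest-neighbor: return the expert label of the closest training case.
    (A real system would normalize features so no one scale dominates.)"""
    nearest = min(train, key=lambda ex: math.dist(ex[0], case))
    return nearest[1]

# Held-out "test set": cases the model never saw, with expert labels.
test = [((70, 8.0, 2.5), "high"), ((30, 11.5, 1.0), "low")]
correct = sum(classify(features) == label for features, label in test)
print(f"test accuracy: {correct}/{len(test)}")  # → test accuracy: 2/2
```

The point is the shape of the exercise, not the particular learner: expert judgments supply the output labels, the Talk Aloud-derived data elements supply the input features, and held-out cases measure how well the learned rule generalizes.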
How well did the computer do? For identifying low-level alerts, the system had a precision of 0.87, meaning that 87% of the alerts the computer classified as low level had in fact also been classified as low level by the experts. The recall for identifying low-level alerts was 0.37, meaning that 37% of the alerts the experts had classified as low level were also classified as low level by the computer.
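To make those definitions concrete, precision and recall for the low-level class can be computed directly from a confusion matrix. The counts below are invented to reproduce the reported figures; they are not the paper's actual data.

```python
# Counts of (expert label, computer label) pairs — invented for illustration.
# "Low level" is the positive class here, since that is what suppression targets.
true_positives  = 87   # expert: low,  computer: low
false_positives = 13   # expert: high, computer: low
false_negatives = 148  # expert: low,  computer: high

precision = true_positives / (true_positives + false_positives)
recall    = true_positives / (true_positives + false_negatives)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# → precision = 0.87, recall = 0.37
```

Precision asks "when the computer says low level, how often is it right?", while recall asks "of all the truly low-level alerts, how many did the computer catch?" — the two numbers answer different questions, which is why the paragraph below interprets them separately.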
These results mean that if the computer classified an alert as low level, there was a good chance it was right, but there were a fair number of low-level alerts that the computer classifier essentially missed (i.e., classified as high level instead of low level). It may therefore be reasonable to suppress alerts that the computer classifies as low level, since the computer is likely to be right in these cases. These alerts are more likely to represent noise than signal.
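The suppression policy described above amounts to a simple filter in front of the alerting system: display an alert only if the model does not call it low level. A minimal sketch, with a hypothetical `classify()` and `severity_score` field standing in for the trained model and its inputs:

```python
def filter_alerts(alerts, classify):
    """Display only the alerts the model does NOT classify as low level."""
    return [alert for alert in alerts if classify(alert) != "low"]

# Hypothetical stand-in for a trained classifier (not the paper's model).
def classify(alert):
    return "low" if alert["severity_score"] < 3 else "high"

alerts = [
    {"id": 1, "severity_score": 1},  # suppressed as probable noise
    {"id": 2, "severity_score": 5},  # shown to the clinician
    {"id": 3, "severity_score": 4},  # shown to the clinician
]
shown = filter_alerts(alerts, classify)
print([alert["id"] for alert in shown])  # → [2, 3]
```

Because the classifier's precision is high, most of what this filter removes really is noise; the low recall means some noise still gets through, but nothing is lost by showing it.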
This paper therefore points to exciting possibilities for improving the signal:noise ratio of alerts, thereby reducing alert fatigue and focusing providers' attention on the more important alerts that may have a significant impact on patient safety.