Dr. Robert Wachter, Editor, AHRQ WebM&M: Can you describe the notion of trigger tools to the uninitiated?
Dr. David C. Classen: Most safety reporting systems rely on voluntary reporting; however, despite much investment and effort in the best of hands, these systems fail to detect 90%–95% of adverse events. Therefore, the notion of triggers is that most health care workers won't take the time to report adverse events, for example, a very low glucose related to insulin use, either because they're worried about the punitive nature of reporting an event or because they are very busy and just don't have the time to fill out a report—actually the most common reason. But, some things (triggers) might be tracked routinely as part of the patient's care that might indicate that an adverse event had occurred. For example, one trigger that might suggest that an adverse event is occurring is the ordering of an antidote. If a patient has a strong response to a sedative, someone might order a drug known to counteract that sedative or send an emergent stop order for the sedative, either of which would indicate that there might be a problem. Our initial work with triggers started with drug safety and then expanded to hospital-acquired infections and then to all adverse events and thus began the development of an overall approach to improve the detection of all types of adverse events with triggers.
RW: I can imagine that some triggers might be essentially perfect correlates of what you're looking for (a positive blood culture looking for evidence of a certain kind of infection) and other triggers less so. How do you differentiate between those two circumstances?
DC: Initially we spent a lot of time at LDS Hospital refining our triggers to make them more specific because we got a lot of false positives. Some were very helpful out of the box. While ordering an antidote like Narcan often was associated with an adverse event, a doubling of blood urea nitrogen (BUN) turned out not to be a very good indicator of a problem. We learned over time to refine this approach and make our triggers more and more specific. Now, we have improved the specificity of triggers by almost an order of magnitude using this experience married to electronic health systems.
RW: In most testing, there are tradeoffs between sensitivity and specificity—as you make it more specific you might actually miss some things. How do you decide where to set that bar?
DC: When we had a home grown electronic medical record at LDS Hospital and we could identify triggers in real time, we decided to cast the net a little bit wider rather than a little bit narrower. We published an article in JAMA back in the 1990s, where we were able to review on a daily basis all the positive medication safety triggers in a hospital with about 600 beds with about 3 hours of pharmacists' time per day. Slowly we have expanded from drug triggers to infection triggers, to ICU triggers, to surgical triggers, to pediatric triggers, and now to a comprehensive trigger approach called the IHI Global Trigger Tool. Finally, we now have automated the Global Trigger Tool within health systems in their commercial electronic medical record (EMR) systems, and although it's still a work in progress in terms of refining the triggers to deal with sensitivity versus specificity, we have expanded this approach to all of the leading EMR vendors at various health systems and demonstrated the proof of concept that these systems can all accommodate automated triggers.
RW: It sounds like you're making a lot of effort to move this from a relatively manual process to one that's electronic. How important is that transition?
DC: At a national level, it's incredibly important because then it allows us to move from sampling a very small percentage of patients retrospectively, which is how the IHI Global Trigger Tool is currently used, to actually sampling 100% of patients in real time. When we were focused retrospectively on triggers, as we had been through most of our IHI work, we were obviously detecting adverse events after they had occurred; thus preventability became an issue. But, when we do it electronically, it's not just preventability—it's the ability to actually intervene in real time and to mitigate rather than just prevent—so it changes the equation.
An example of this is a study we did at LDS Hospital and published in 1998 where we were improving our efforts to reduce adverse drug events. We were able to prevent 70% of them in that study, even though earlier studies had estimated that only 30% to 40% of adverse drug events were preventable, because we were beginning to identify, intervene, and prevent them in real time. As we have also seen in several recent studies, there is a cascade effect of adverse events in hospital patients—minor events can precede more serious events, in probably about a quarter of the adverse events. So that gave us the perspective that, as we did this in real time, we needed to think more broadly about preventability, moving to the concept of real-time intervention and amelioration.
The more recent automation of global triggers within EMRs allows for real-time detection of adverse events in hospital patients, and it moves patient safety from a retrospective focus on injury, after the fact, to a real-time detection that can not only prevent these events but also allow for amelioration and recovery. This real-time detection allows us to impact a far larger percentage of adverse events (many thought previously to be entirely unpreventable), the majority in fact, rather than the traditional retrospective view of only those adverse events that were clearly preventable (a much smaller percentage). This is a real shift in perspective in patient safety.
RW: Can you give an example of a situation where the electronic record triggers something and then you intervene?
DC: Clostridium difficile colitis in a patient on antibiotics is often preceded by what we call antibiotic-associated diarrhea. Those patients may not be C. difficile positive when you test them initially. However, if you identify them early when they develop diarrhea on antibiotics (from information in the EMR), it may signal that they are at increased risk for C. difficile and that their antibiotics should be stopped immediately, which we often aggressively did, before they develop a worse problem such as C. difficile colitis. Another example would be patients on anticoagulation. The electronic trigger shows a highly elevated INR level. They haven't started to bleed yet. You can intervene in real time to stop the drug or modify the dose. Whereas the old way (retrospectively), we waited until they had their bleed and there wasn't much we could do about it at that point. But, newer work is using triggers themselves to identify patients at risk for various adverse events, long before they experience them.
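The two examples above can be sketched as simple rules evaluated against a patient's current record. This is only an illustrative sketch, not the logic used at LDS Hospital or in any commercial EMR: the `PatientSnapshot` fields, the trigger names, and the `INR_ALERT_THRESHOLD` value are all hypothetical, and a fired trigger is a prompt for human review rather than a confirmed adverse event.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff chosen only for illustration; real thresholds are set locally.
INR_ALERT_THRESHOLD = 5.0


@dataclass
class PatientSnapshot:
    """Hypothetical minimal view of a patient's current EMR data."""
    patient_id: str
    on_anticoagulant: bool = False
    latest_inr: Optional[float] = None
    on_antibiotics: bool = False
    new_onset_diarrhea: bool = False


def fired_triggers(p: PatientSnapshot) -> List[str]:
    """Return the names of the triggers that fire for this patient."""
    triggers = []
    # Highly elevated INR in a patient on anticoagulation (before any bleed).
    if (p.on_anticoagulant and p.latest_inr is not None
            and p.latest_inr > INR_ALERT_THRESHOLD):
        triggers.append("elevated_inr_on_anticoagulant")
    # Diarrhea developing while on antibiotics (possible early C. difficile risk).
    if p.on_antibiotics and p.new_onset_diarrhea:
        triggers.append("diarrhea_on_antibiotics")
    return triggers


anticoagulated = PatientSnapshot("A", on_anticoagulant=True, latest_inr=7.2)
on_antibiotics = PatientSnapshot("B", on_antibiotics=True, new_onset_diarrhea=True)
unremarkable = PatientSnapshot("C", on_anticoagulant=True, latest_inr=2.4)

for patient in (anticoagulated, on_antibiotics, unremarkable):
    print(patient.patient_id, fired_triggers(patient))
```

Each fired trigger would then go to a reviewer, such as a pharmacist, who decides whether to contact the care team.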
RW: The increasing popularity of triggers seems to be part of a broader trend in moving from our focus on identifying errors and mistakes to one that's now identifying adverse events or harm. Can you talk about that evolution and how important you think it is?
DC: I think it's a critical evolution. It may be one reason that, despite an awful lot of effort thrown at improving patient safety over the last 10 years, the amount of improvement is not as much as we'd expected or hoped for. One of the reasons may be that we focused on a lot of errors that never impact a patient. It's part of my view that not all errors are created equal. Some never impact a patient and some can dramatically impact a patient. It's possible, in certain areas, to prevent 90% of errors and have no impact whatsoever on patients because those errors never cause any harm to the patient. The initial focus of patient safety on errors attracted a lot of interest, and led to the idea that we could create an error-free world in health care. I think we're now maturing in our understanding of patient safety, and beginning to focus on identifying events that actually do impact patients. The problem is our reporting systems are not necessarily well set up for this transition. That will be the challenge. The easiest thing to do is to only focus on those more serious events and not focus on all-cause harm. Certainly we've seen that in the focus on reporting of never events and hospital-acquired conditions. The problem is, at least in the recent Department of Health and Human Services (HHS) Office of the Inspector General (OIG) study, serious events represented only 0.6% of all the adverse events detected. Focusing on all-cause harm is a really important transition to make, and it's going to be hard because it's going to put a lot of pressure on improving our surveillance and reporting systems and acknowledging that despite more than 10 years of work since the IOM Report "To Err Is Human," we still have a lot of work to do to truly reduce all-cause harm and improve patient safety.
RW: How do you decide on the calibration between serious events and all harm events, given that there's a tradeoff in terms of how much noise you have to deal with?
DC: That is where future research is probably going to focus. If people experience a minor event, does that increase their risk of other events? Certainly the work we did at LDS Hospital shows that if you experience an inpatient adverse drug event, you increase not only your mortality but also your risk of another event during that hospitalization. We're going to see more and more of that research come to bear over the next 5 years. That will help us understand the epidemiology of those early harm signals that could be used to identify patients at risk for more serious events and lead to closer monitoring and intervention.
RW: Can you describe your recent Health Affairs study and its key findings?
DC: IHI supported two other trigger studies that helped put our Health Affairs study in perspective. One was an OIG study done of Medicare where they looked at adverse events in the Medicare population in hospitals. The other one was a study that was focused on safety in 10 North Carolina hospitals over a 5-year period. In our Health Affairs study we focused on three large teaching hospitals that were all part of three big health systems that were in many ways exemplars of patient safety in terms of programs, interventions, research, and degree of electronic medical record use. We went to each hospital to review a similar sample of hospital discharges using highly trained outside reviewers to review charts, patient records, and admissions and to apply the Global Trigger Tool to those medical records.
In our study of these leading hospitals, about a third of the patients were found to have adverse events during our record reviews. The adverse events that occurred in those three settings were of a range of severity. About 2% of them were associated with death, about 3.5% of them were associated with the requirement for life-saving intervention, about 2.8% were associated with permanent harm, about 33% were associated with harm that prolonged the length of stay, and about 57% were associated with harm that was temporary in nature. We found that blood stream infections were very uncommon in our study. It supported the idea that maybe the epidemiology of patient safety has been changing with the interventions that have been going on over the last decade.
In addition, we looked at how good a variety of techniques were at detecting these problems. What we found in the overall study is that chart review with the IHI Global Trigger Tool detected 354 of 393 of the events, AHRQ Patient Safety Indicators (PSIs) detected 35 of 393 events, and voluntary incident reporting detected 4 of 393 events. Basically, even if you combine the AHRQ PSIs and the voluntary reporting system, they missed 90% of the adverse events detected.
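The 90% figure can be checked from the counts quoted above. The short sketch below does only that arithmetic. Adding the PSI and voluntary-reporting counts together assumes the two methods found no events in common, which the interview does not state; any overlap would make the missed share larger.

```python
# Counts as quoted in the interview (events detected out of 393 total).
TOTAL_EVENTS = 393
detected = {
    "IHI Global Trigger Tool": 354,
    "AHRQ PSIs": 35,
    "Voluntary reporting": 4,
}


def percent_of_total(count: int, total: int = TOTAL_EVENTS) -> float:
    """Share of all adverse events, as a percentage."""
    return 100.0 * count / total


shares = {method: percent_of_total(n) for method, n in detected.items()}

# Assumption: PSIs and voluntary reporting found no events in common.
combined_other = detected["AHRQ PSIs"] + detected["Voluntary reporting"]
missed_by_others = 100.0 - percent_of_total(combined_other)

for method, share in shares.items():
    print(f"{method}: {share:.1f}% of events detected")
print(f"Missed by PSIs plus voluntary reporting: {missed_by_others:.1f}%")
```

Under that assumption the two methods together detect just under 10% of the events and miss about 90%, consistent with the figure given.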
RW: How about the specificity of those different methods?
DC: In an analysis conducted at one of the hospitals looking at the sensitivity and specificity of the major methods, we found that the IHI Global Trigger Tool had a sensitivity of 93.1%. The AHRQ PSIs had a sensitivity of 5.8%. And, we found voluntary reporting picked up no adverse events at that organization. Then we also looked at a larger version of the AHRQ PSIs, the Utah and Missouri index, and it had a sensitivity of about 39%. The IHI Global Trigger Tool had a specificity of 100%. The AHRQ PSIs had a specificity of 98.5%, and incident reporting had a specificity of 100%.
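For readers unfamiliar with the two measures, the sketch below shows the standard definitions: sensitivity is the share of patients with an adverse event that a method flags, and specificity is the share of patients without one that it correctly leaves unflagged. The counts in the example are hypothetical and are not taken from the study, which reports only the percentages.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with an adverse event that the method flagged."""
    denominator = true_pos + false_neg
    if denominator == 0:
        raise ValueError("no patients with an adverse event in the sample")
    return true_pos / denominator


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of patients without an adverse event that were not flagged."""
    denominator = true_neg + false_pos
    if denominator == 0:
        raise ValueError("no patients without an adverse event in the sample")
    return true_neg / denominator


# Hypothetical counts for illustration only (not from the Health Affairs study).
example_sensitivity = sensitivity(true_pos=90, false_neg=10)
example_specificity = specificity(true_neg=95, false_pos=5)

print(f"sensitivity = {example_sensitivity:.1%}")
print(f"specificity = {example_specificity:.1%}")
```

A method that flags nothing has 0% sensitivity but 100% specificity, which is how voluntary reporting can detect no adverse events at that hospital and still show perfect specificity.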
RW: So you're certainly not giving anything up in specificity and your net is far broader with this.
DC: Far broader.
RW: Should the other methods of error detection go away?
DC: There are some who believe that those other methods detect so few that they certainly shouldn't be used for measurement. The question is: Can they be used for learning and not necessarily measurement? I think that's still an open question. The difficulty here is that it has traditionally been said that these other methods, such as the AHRQ PSIs and the voluntary reporting, do a better job of picking up more severe events. We unfortunately did not find that in our Health Affairs study.
RW: I've spoken to hospitals and suggested that your data would indicate that if you had a fixed amount of resources to put into error detection and adverse event detection, more should go into promotion of the trigger tool and less into incident reporting. The pushback is that we need this method to know what's happening on the ground, and we've spent a lot of energy encouraging people to speak up. Moreover, they say that there's something that feels different about a provider-reported error in terms of generating conversation and maybe even a narrative and drama than events that are picked up through a more bland computerized review of records. Is there any merit to that?
DC: I think there is. Building a safety culture clearly has a big part in enabling people to report problems in the system and to fix them as we improve patient safety. One argument is that this effort is an essential part of building a culture of safety. Even if it doesn't lead to a huge volume of reports, it may play a cultural role that is just as important. The problem is you can spend an infinite number of resources on building these reporting systems. One strategy may be that we build it but we're not going to spend a fortune on the type of system that we develop. A lot of vendors have grown up in this area with very expensive systems, and that may not be necessary in terms of their cultural or learning benefit. But, a whole industry has grown up around voluntary reporting systems, and another industry has grown up around the use of administrative codes to measure patient safety such as the use of the AHRQ PSIs. These can be expensive systems. It's a great question to ask if these systems pick up less than 10% of adverse events: Is this level of investment worth it?
RW: Talk for a moment about preventability. You've demonstrated very good test characteristics of finding adverse events, and then people instinctively push back and say, "Sure, bad stuff happens, but unless it's preventable who cares?" Can you address that?
DC: In the OIG study, they found about 44% preventable. In the NEJM study in North Carolina, they found about 63% preventable. Right now we're involved in studies of this in EMRs where we're not only looking at preventability but mitigatibility and ameliorability. As we move into the electronic era, we should expand the discussion from sole focus on preventability to include mitigation and amelioration. When we do, then I think we may come back to the article I mentioned from LDS Hospital where 70% of adverse drug events were prevented in this kind of real-time review. Everyone challenged that and said, "Well that's much greater preventability than anyone else has shown with adverse drug events." But, it was a different approach. So the question is: How do we enlarge the perspective here to look at mitigatibility and ameliorability rather than just preventability?
RW: I want to shift to talking about the implications of your work. It strikes me that those three studies that you mentioned (the North Carolina study, the OIG study, and your recent one in Health Affairs) begin to suggest that maybe everybody should be measuring preventable harm through these methods. Maybe it should be required at some level. Then once it's required and we're looking at safety through this lens, then it's obviously a short shot to: this gets publicly reported, this becomes the pillar for pay-for-performance or no-pay-for-errors programs. If this becomes the method for measuring the safety of an individual institution, what happens?
DC: It's very easy to show improvements in safety when you track very few events, for example with voluntary reporting and administrative coding approaches or with a focus on never events. It's easy to make the case that dramatic improvements in safety have occurred when that may not in fact be the case. So, it's fair to say that as we learn and continually iterate in the world of patient safety and develop better ways to detect safety problems, it would probably be incumbent upon us to build them into our systems and to update our patient safety measurement approaches. The cascading of events that occurred in the Medicare study (more than a quarter of the patients had them, and minor events were sometimes followed by more severe events) makes it easier to make a strong case that we need to start focusing on all-cause harm and create better systems to measure that. From a patient perspective, I know that if I get a very bad reaction to a medication then I'm at increased risk of not only having another during the hospitalization, but I am also at increased risk of dying during that hospitalization. So although people say, "Well, we're not worried about minor events," from the patient perspective I think we do need to be worried about them. It's time to move on to much better safety detection systems because that's the only way we know if all of our patient safety interventions are really making a difference. We've spent an awful lot on this in the last 11 years since the IOM report. How do we know if they're actually working if our measurement systems detect less than 10% of all the events? As this rolls out, what are the public policy implications? As we move down the road to value-based purchasing and hospital-acquired conditions and everything else, we would certainly want to make sure that we have better performance metrics in safety if we're going to start changing reimbursement.
But, in addition, as we move down the road to meaningful use and the proof of concept has been done that this can be done in an EMR system, why wouldn't we build certain criteria into meaningful use that would help facilitate the rollout of these types of measurement systems?
RW: So let's jump ahead 3 to 5 years and imagine meaningful use incentivizes hospitals to have, and IT companies to build in, the tools to measure triggers this way and the methodologies have been standardized. Would you favor public reporting of comparative data that allows patients to decide to go to hospital A versus B based on the results of those trigger tools, or do you still think it's not quite ready for prime time?
DC: Doing that in every hospital with their health IT system will probably offer some challenges along the way in making a highly reliable measurement system. The question is when are the systems reliable enough for us to turn it on and start making comparisons. We went through this in the infectious disease world as we began to build standards for hospital-acquired infection definitions, built the system to measure them, and built the system to report them. We have finally arrived now where we're seeing some effective comparisons between hospitals. But, it took us an awful long time to get there, refining and improving definitions, adjusting for severity of illness and risk, improving the way we measured, and then implementing it all within the health IT system. The caution is that this will require fairly complex health IT systems with fairly complex clinical decision support. We published a paper in Health Affairs last year that looked at the challenges of developing complicated decision support systems. We developed a tool that evaluated clinical decision support for medication safety after implementation and found that there was still a lot of work to be done in that area as well.
RW: So, is the issue with reliability partly that a hospital with a better health IT system might actually pick up more stuff and might actually be safer, or at least more attentive to its safety concerns, and yet look worse until you have that wired?
RW: When you say better health IT systems are wired to do this work, do you get out of the chart review phase? When the tool puts out this net, it captures these signals, but then you have to go in and pull a record and review it, whether an electronic or a paper form—a human being has to do that. So is the end stage here that you don't have to do the follow-up chart review?
DC: I don't think we'll get to the end stage any time soon, probably not in our lifetimes, where it's so automated that you don't need a human to review it at all. Because if you look at the infection control world, even with our highly developed infection control surveillance programs within EMRs, we still have an infection control practitioner review that record to make sure it's true. We do that for a lot of our quality reporting, especially stuff that's reported externally, either for regulatory review or for reimbursement. So we'll always have a human review to assess and validate its accuracy. But, the IT systems allow us to expand that review to not just a small percentage of the hospital but for every patient. They also allow us the opportunity not only to document and measure but to intervene as well.
RW: Did you get depressed by the North Carolina study? How do you explain all the work that we've done over the past 10 years if you truly believe that there's been no improvement?
DC: Well, it is depressing that we see these three studies together, which suggest that there may not have been a whole lot of improvement over the last 11 years. I take the long-term view that if you look at what happened in the airlines, it took them decades to fundamentally improve their safety from all the crashes in the 1920s and 1930s—and we're an even more complex environment in health care than in aviation. Clearly there have been improvements because we have reduced hospital mortality rates in the past decade and we have reduced certain types of events such as ventilator-associated pneumonia; however, more recent studies suggest that patients are still experiencing high rates of harm in hospitals. When the IOM report "To Err Is Human" came out, I don't think that we had a very good lens for detecting patient safety problems—in hindsight I think it was a pretty crude lens. Now that we have a better lens, it will alter our perception of how we're doing. The other thing that doesn't surprise me is that we've spent a lot of effort on reducing errors and on reporting systems that may not have had a big impact on actually improving safety. I view this as a multi-decade problem to solve, much as it was for the aviation industry.
RW: Any final thoughts about things in the future that we haven't touched on?
DC: I think we've covered a lot of things that I would expect. I can tell you, having automated this at some health systems, that it does enable you to create almost an air traffic control–type monitoring system that can monitor patients when they have a fully electronic medical record in a way we've never had the opportunity before. I think that may be a perspective-changing view as that work begins to get published.
RW: What does that look like in that system? If someone sitting in air traffic control sees a hot spot on the 10th floor, what is the feedback loop to mitigate the issue?
DC: The feedback loop is usually that air traffic controller having access to the patient information, seeing a problem, and then communicating with the health care team to say, "Have you seen this problem? Are you aware of it and are you dealing with it?" I think that's going to be a very different view of safety.
RW: What's the usual answer they get when people start doing that?
DC: When we did this at LDS Hospital in the early 1990s, 50% of the time the care team didn't know that the problem was ongoing. More than 20 years later, in newer studies using triggers in commercial EMRs, we are still finding similar numbers in terms of the percentage of these trigger-identified safety events that the health care team did not know were ongoing. Much work remains to be done to improve patient safety.