Dr. Robert Wachter, Editor, AHRQ WebM&M: What are some of the fundamental challenges of measuring patient safety?
Patrick Romano: There are a number of fundamental challenges. One is that no single data source is ideal or perfect. Everyone who has looked at multiple data sources has found that, in any one data source, you see problems that aren’t documented in others. Adverse event reporting systems have a tremendous problem with underreporting. With systems based on ICD-9-CM codes, like the Patient Safety Indicators, there are problems with how the codes are generated. With systems based on triggers, there are problems with the inherent performance of those triggers and the reliability of the subsequent evaluation. Multiple approaches are necessary to get closer to concepts that we’re trying to measure.
RW: Why is measurement important? Why is it so important to figure out the answer to this question?
PR: Well, the simple answer is you can't improve what you don't measure. How do you know where your problems are? How do you know what you need to improve? How do you know where to focus your efforts? How do you know whether your efforts have been successful? The only way you know the answers to those questions is by measuring the phenomenon that you're concerned about, in this case, by measuring patient safety and safety-related outcomes.
RW: It strikes me that there is value to providers or the providing system in knowing how they're doing and using their internal motivation to improve. Then there are outside stakeholders that care about safety and want improvement as well. Is there a tension between those different audiences, and if so, how do you negotiate that?
PR: There is a tension, and I'm not sure that anyone has really solved that dilemma. Clearly, all of us who work on the front lines as physicians, nurses, and hospital employees recognize that we are in an environment where increasing transparency and accountability are expected. We're in an environment where stakeholders don't want to pay for our mistakes the way they have in the past. The stakeholders have a right to demand better care, to demand care that has fewer defects. It's our duty as health care providers to try to provide that care. From the measurement perspective, we need to have measures that promote transparency and accountability for outside stakeholders but also allow people within health care organizations to drill down and figure out where the problems are, what the root causes are, why failures are occurring, and how to fix those problems. It really requires a multi-modality approach.
RW: You've been one of the leaders in generating the AHRQ Patient Safety Indicators (PSI). Where did the idea come from and what were the early days of that effort like?
PR: The AHRQ Quality Indicators Program emerged out of a data program called the Healthcare Cost and Utilization Project. HCUP works with state health data agencies in more than 40 states to collect data on hospitalizations, emergency visits, and ambulatory surgery episodes. This stream of data has become more comprehensive and more accurate over the years. So there's been increasing demand from the stakeholders asking, how can we use this information to improve the health care system? It was really requests from HCUP partners at the state level that led AHRQ to establish the Quality Indicators Program. The Program started with a fairly limited set of measures that are now called Inpatient Quality Indicators or IQI, including risk-adjusted mortality measures and measures of volume for specific procedures for which there's known to be a strong volume–outcome association. These were indicators that were in the literature that AHRQ simply adapted to HCUP data and put into this specific module.
Over time, AHRQ began to look at ways of expanding the portfolio of quality indicators. New modules were created, including the Patient Safety Indicators. With the PSI, our original charge from AHRQ was to identify specific coded events occurring in hospitals that may represent potentially preventable patient safety events. We use the term “indicators” because we recognize that these are not measures where we can say that every single flagged event represents an act of negligence, omission, or commission. All we can say is that these are indicators of patient safety, and we believe that they're useful when interpreted in that context.
RW: So you began with a charge to take these coded data and figure out the best way within that resource to measure safety, as opposed to starting from a broader construct of how we should measure safety. How do you even begin culling this humongous dataset down to a smaller, manageable set of potential patient safety indicators?
PR: We basically worked on two tracks. The first track was that we did a fairly comprehensive review of the health services research literature to find what others had proposed as potential indicators of patient safety or preventable adverse events. This led us, for example, to the pioneering work of Lisa Iezzoni and colleagues, who developed a system called the Complication Screening Program, or CSP, in the 1990s. It was tested with Medicare data and was found to have potential value, although there were also some limitations. So we tried to borrow the best from that work and from other published studies. We literally went through the ICD-9-CM code book from beginning to end to identify every code that might represent an adverse outcome of inpatient care. Then we went through a somewhat subjective process of reviewing those codes to figure out how likely they were to represent a preventable adverse event versus something that was nonpreventable or even something that wasn't hospital acquired at all.
When we did this original work, hospitals were not reporting on whether diagnoses were present on admission, so we had to make educated guesses based on some preliminary data about whether a specific diagnosis such as hypokalemia, or low potassium, is likely to be hospital acquired or present on admission. Many diagnoses that might represent adverse events were discarded through this process, because of concern that in most cases they either would be present when the patient was admitted to the hospital or they would have been nonpreventable. Then we culled it down to a list of 40 or 50 groups of codes that we thought were candidates for further consideration.
RW: How mindful were you in the early process that these indicators would ultimately be scrutinized by various stakeholders, such as attorneys or reporters looking to see whether a hospital is having safety problems in their state? Obviously, once you've put these out there, there's relatively little control that you have over how they're used.
PR: That was certainly in our minds. Our charge from AHRQ was to consider indicators that would be useful for quality improvement purposes. So the initial focus was on hospital-based quality improvement efforts. That might involve hospitals analyzing their own data, hospital associations, vendors working with hospitals to analyze their data, or it might even involve state or county health agencies running and sharing the data with hospitals. We weren't focused on public reporting initially because the indicators obviously cover a wide spectrum of different events. Some of them aren't readily understood by lay people, and we were not told to constrain our focus to indicators that would have salience to consumers. We were asked to think broadly about what might be useful for informing quality improvement.
RW: How do you know when you're done whether you've done a good job capturing true safety?
PR: That is a critical question, and it's been an incremental process. Each year AHRQ puts resources into this program and prioritizes the activities that they feel would be most helpful in moving the program forward. Initially, the goal was to create the indicators and to put them through an expert panel process. Once we had identified these candidate indicators both from literature review and from going through the code books, we created a set of expert panels with participants from a wide variety of medical specialty organizations and other stakeholders. Then we went through a modified Delphi process, modeled on what RAND had done. It's a two-stage review of the indicators that's informed by review of relevant literature. At the end of it, we come out with scores that represent the panel's assessment of the usability of the indicators for purposes of quality evaluation and improvement. Through that process, we ended up with the final set of 20 hospital-level indicators.
Then through use, we discovered things about the indicators that we didn't realize initially. Two of the original indicators have actually been retired because of evidence that has emerged since their release. Other indicators have been found to have relatively poor predictive value. We've tried to address those problems by educating hospitals, by educating the coding community, in some cases by actually changing the ICD-9-CM codes themselves, and in other cases by re-specifying the indicators—by changing the definition to increase their specificity or their sensitivity.
RW: Can you take us through one of the indicators that was retired and what told you it wasn't working?
PR: One of the ones that we retired was the original PSI-1, which was Complications of Anesthesia. We received information from a variety of sources about this indicator. Users started telling us that the code being used in the construction of this indicator was being applied to postoperative nausea, which of course is a very common and expected side effect of general anesthesia. This wasn't the type of complication that we were looking for at all. We then took this user feedback and went to several experts in the coding community and said, help us understand how this works in terms of the ICD-9-CM coding. When we went through this question with coding experts, we realized that because of certain codes that were included, postoperative nausea might in fact be correctly coded even when the impact was really trivial. Then we did construct validation work, and found that when you look at the association of these indicators with other hospital outcomes—for example mortality, length of stay, and inpatient cost—most of the Patient Safety Indicators were clearly associated with much worse outcomes, but Complications of Anesthesia was not. There was no evidence of higher cost, higher charges, or higher length of stay among patients who had that event compared with well-matched patients who didn't have the event. So that told us that whatever we were capturing wasn't really having an impact on other outcomes or on how the patient was being treated in the hospital. The combination of that evidence led us to recommend retirement of the indicator.
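The construct-validation step Romano describes can be sketched in a few lines. This is a hedged illustration with made-up numbers and hypothetical variable names, not AHRQ's actual methodology: the idea is simply that an indicator capturing real harm should show excess cost, length of stay, or mortality among flagged patients relative to well-matched controls.

```python
from statistics import mean

# Hypothetical data: length of stay (days) for patients flagged by an
# indicator versus well-matched patients who were not flagged.
flagged_los = [9, 12, 8, 11]   # flagged by the indicator
matched_los = [5, 6, 5, 7]     # matched controls

# An indicator of real harm should show an excess in outcomes like this;
# Complications of Anesthesia showed no such excess, supporting retirement.
excess_los = mean(flagged_los) - mean(matched_los)
print(excess_los)  # 4.25 days of excess stay in this toy example
```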
RW: The indicator that gets a lot of attention in this field has been "Failure to Rescue." Tell us a little about the history of that indicator and its being built into the PSI list.
PR: Failure to Rescue was a concept originally described by Jeff Silber and colleagues at the University of Pennsylvania. It's an interesting concept—to look at patients who experience some kind of adverse event or complication and then look at whether the hospital was able to rescue the patient, in other words whether the patient ended up living or dying. Failure to Rescue is basically allowing a patient to die after they've experienced a complication.
The advantage of Failure to Rescue is that it inherently adjusts for patient severity to a large degree, because sicker patients get more complications and they get worse complications. When you start with a denominator limited to patients who had complications, then you automatically exclude all the patients who were healthy, all the patients who had routine courses in the hospital, and you focus where the action is—the subset of sicker patients—and you pose the hypothesis that better hospitals should recognize these complications earlier; they should intervene faster and more effectively. If that intervention is effective, then it should lead to rescuing patients so they go home alive. There's been a lot of debate about how to operationalize these Failure-to-Rescue measures. We actually changed the name of the AHRQ measure so it's now called Death Among Surgical Patients with Serious Potentially Treatable Complications, and that title better captures the fact that we're focusing on patients who've experienced specific complications that are of potential interest. We hope that provides a better opportunity for hospitals to look at how they're identifying and treating patients who experience those complications.
RW: Often a health care reporter will go through a state's PSI data and find a hospital that appears to be doing poorly and write an exposé. Do you feel that that's a reasonable use of PSIs? If you read that piece do you cringe a little, or are you happy it's being used for something, or somewhere in the middle?
PR: Well, I think somewhere in the middle. I cringe a little. But how much I cringe depends on how it's described. If people use the data to pose questions, I think that's great. If they can use the data to go to hospitals and say look, we saw that you reported five retained foreign bodies, for example. Or you reported 10 iatrogenic pneumothoraces, tell us about that. What have you done to evaluate those events? What are you doing to try to prevent those events? If it provides a foundation for accountability, for intelligent discussion, questions and answers, then I think it's good. On the other hand, when it's posed as an exposé, when it's presented as a scandal, then it's not constructive. Then it just leads people to get defensive and to figure out how to game their data or how to cover things up.
RW: If you were a patient thinking about going to a hospital, how would you figure out whether it was safe, and what role would looking at the PSIs have for you in your analysis?
PR: Acting as an individual consumer, I probably wouldn't give the PSIs a lot of weight in my own choice of where to go. I say that knowing that a variety of different factors affect how an individual hospital looks on the PSIs. The first is that most of these are inherently rare events, so there's a lot of random variation. In any one year, a hospital could have a run of bad luck on any single indicator that may occur purely by chance. On the other hand, if I see a pattern, a hospital that's looking poor on multiple events, year after year, that is failing to improve at the same rate that other hospitals in the market are improving, then I would ask some serious questions. My message is that this is one source of information, which has to be viewed in a broader context. It has to be viewed in the context of what's going on in terms of quality improvement, what's going on in terms of random noise, and what other information we have about quality at that hospital: information from The Joint Commission, from Hospital Compare, from process measures, risk-adjusted mortality measures, and other sources. One thing we've learned through 20 years in this business is that often hospitals that provide high quality in one domain don't provide quality to the same degree in another domain. It's surprising sometimes how institutions can perform very well, for example, at cardiac care, but not so well on cancer care. If I were interested in a particular condition, if I had cancer or heart disease, then I would probably focus on measures related to those conditions.
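Romano's point about random variation in rare events can be illustrated with a small simulation (a hypothetical sketch, not real hospital data): even hospitals with identical underlying safety draw very different annual event counts purely by chance.

```python
import math
import random

random.seed(0)

# 50 hypothetical hospitals, all with the SAME true rate of a rare
# adverse event: 2 expected events per year.
TRUE_MEAN = 2.0

def poisson_draw(mean):
    # Knuth's algorithm: count uniform draws until their product
    # falls below exp(-mean).
    limit, k, prod = math.exp(-mean), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

counts = [poisson_draw(TRUE_MEAN) for _ in range(50)]
# Identical hospitals still produce a wide spread of annual counts,
# so a single bad year may be nothing but a run of bad luck.
print(sorted(counts))
```

A persistent pattern across multiple years and multiple indicators, as Romano notes, is far more informative than any single year's count.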
RW: As you think about where this world is going over the next 5 or 10 years with more and more institutions having computerized clinical systems, are you hopeful? Where do you see this evolving as the world of clinical care and the way we capture what we do changes?
PR: It's a timely question because AHRQ has funded several pilot projects in which states have added laboratory data to their statewide all payer patient discharge data systems. In these states, the state health data agencies have worked with hospitals to directly transfer the electronic data from their laboratory information systems into the administrative data set. Going forward, I think AHRQ is trying to figure out how to align these different pathways. Clearly with the HITECH program and with the incentives for electronic health record adoption, more and more hospitals will be implementing sophisticated EHR systems. The challenge is that these systems have been well designed to support patient care on a day-to-day basis, but they haven't really been designed to support research applications and quality improvement applications. So the leading vendors are just doing this work right now in partnership with some of our leading health systems. As you know, the National Quality Forum and others are working together on identifying existing quality measures that can be operationalized electronically. I think these trends are converging, and going forward I'm hoping that it will be easier for hospitals to directly dump information that is relevant to estimation of quality indicators. But we have to rethink the quality indicators to prepare for that world, because for example, it will be relatively easy for hospitals to dump laboratory data or even radiologic findings. However, when we think about how physicians document diagnoses and how they go through the process of considering diagnoses, ruling out diagnoses, and identifying alternatives, it's very hard to convert that process into information that can be readily used in risk adjustment. It's really untested. 
I think that a lot of development work will need to be done to rethink quality indicators, like how to redesign indicators to take advantage of enhanced data and hopefully to reduce some of the biases that affect our current work that's dependent on ICD-9-CM codes.