Practice Guidelines and Outcomes Research (Part III): Scientific Results

Author: Jane M. Orient, MD
Article Type: Feature Article
Issue: Winter 1997
Volume Number: 2
Issue Number: 1

The trend in the medical literature written for practicing physicians is away from the old-fashioned study with “materials and methods,” “results,” and “discussion” sections. The new “consensus statements” on practice guidelines have a different format. Instead of a hypothesis, there is an objective (for example, “to integrate the realization that peptic ulcer most commonly reflects infection with Helicobacter pylori or the use of aspirin and other nonsteroidal antiinflammatory drugs into a disease management approach”).(1) Instead of a description of the experimental subjects, there is a list of participants — a panel of experts. A careful exposition of the experiment or the observational protocol is replaced by “evidence and consensus process,” and the evidence comes from a computerized literature search. The discussion took place at a committee meeting; the reader is simply presented with the predigested “conclusions.” And instead of listing individual authors, each of whom is supposed to be responsible for the results, the consensus statement is attributed to a committee with one or more spokespersons.

Formerly used primarily as a tool for hypothesis testing (is the experimental group different from the control group?), statistical analysis is now frequently employed to sort individual patients into risk groups for purposes of deciding whether to perform certain diagnostic tests or to offer certain interventions such as hospitalization or surgery. In 1995 alone, JAMA published more clinical prediction rules than it did in a previous four-year period (1981-1984).(2)

The new predominance of articles on practice guidelines, outcomes analysis, and prediction rules shows a major shift in emphasis. (There is even a new journal, the Journal of Clinical Outcomes Management.) The old questions pondered by the clinician were: What is this patient’s diagnosis? What course of action will most probably lead to an optimum outcome for him? Will a particular diagnostic test alter my decision about how to treat this individual? The new questions are focused on rationing care: for example, a method for targeting use of knee radiographs(3) “might reduce knee radiography by 28% for patients with a knee injury.”(2)
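A clinical prediction rule of this kind typically amounts to a short checklist: if any criterion is positive, the test is ordered. A minimal sketch in Python, with criteria modeled loosely on published knee-injury rules (the specific items and thresholds below are illustrative only, not a clinical tool):

```python
# Illustrative checklist-style decision rule for knee radiography.
# Criteria and threshold are modeled loosely on published knee-injury
# rules; this is a sketch, not a validated clinical instrument.
def radiograph_indicated(age, isolated_patellar_tenderness,
                         fibular_head_tenderness, cannot_flex_90,
                         cannot_bear_weight_4_steps):
    """Return True if any criterion is met, i.e., a film is indicated."""
    return (age >= 55
            or isolated_patellar_tenderness
            or fibular_head_tenderness
            or cannot_flex_90
            or cannot_bear_weight_4_steps)

# A younger patient with no positive criteria would be spared the film.
print(radiograph_indicated(30, False, False, False, False))
```

The claimed 28% reduction comes entirely from patients for whom every item on such a checklist is negative.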

Not only the individual patient but the individual physician is being obliterated in the new literature. A committee is interposed between the physician and the primary research, distilling it into the guidelines. (The physician is thus presented with a pony rather than Shakespeare.) Then the patients are resolved, as it were, into a number of vectors in multiple dimensions, which can be manipulated only by powerful computers with sophisticated software, so that even if the physician were presented with the raw data set, it would be incomprehensible.

“Practice guidelines” and “outcomes research” are fundamentally flawed concepts because they ask the wrong questions for the wrong purposes. The purpose of this article is to examine a few specific articles to illustrate problems that are manifest even if the research is considered on its own terms.

 

Carotid Endarterectomy and Some Other Procedures

 

One frequently cited example of the need to determine “what works” is the use of carotid endarterectomy for the prevention of stroke. The procedure became popular for a number of reasons:

1. Strokes are so devastating that aggressive means of prevention are worthwhile.

2. There is visible pathology (a plaque in the carotid artery) that can be detected with noninvasive imaging.

3. There is a rationale for believing that physical removal of the pathology should be beneficial.

4. Generous third-party payment is (or was) available both for the imaging procedure and for the surgery.

5. Physicians have reason to fear malpractice litigation if they “neglect” to perform these procedures and the patient does have a stroke.

Carotid endarterectomy has probably been overused, to the detriment of patients as well as insurers (stroke is a complication of the procedure itself). There is now a backlash of sentiment against the procedure. From an internist’s perspective, the issue was always a very complicated one, especially since the most important considerations (the technical skill and clinical judgment of a particular surgeon and the situation of the individual patient) do not fit neatly into a practice guideline.

This problem helps to illustrate two types of literature. One type emphasizes the clinical course of real patients (see the bibliography for the RAND study). The other describes clinical practices and emphasizes “utilization” issues. For example, the widely quoted RAND Corporation study(4) describes the clinical practice at four different facilities and the evaluations by an expert panel as to the “appropriateness” of various procedures, including carotid endarterectomy, coronary angiography, and diagnostic upper gastrointestinal endoscopy. A panel of experts concluded that about 35% of carotid endarterectomies were “appropriate,” 32% “equivocal,” and 32% “inappropriate,” there being no significant difference between the “high use site” and the “low use site.” In this study, “appropriate” meant the expected clinical benefit exceeded the expected negative consequences by a wide enough margin to make the procedure worthwhile, in the experts’ estimation, with economic consequences specifically excluded.

The method used by RAND was basically to have a group of nine Monday-morning quarterbacks vote retrospectively on the basis of information abstracted from charts. There were persisting disagreements, despite discussion. The experts were exceptionally well informed, but on the basis of data and technology available in 1981. There have been significant advances since that time.

The RAND study was undertaken to explore the question of whether differences in appropriateness explained geographic variations in the use of medical services. The conclusion was that they did not. The authors also noted the possibility that “low use” might be just as inappropriate as “high use.” But the study only looked at patients who had procedures, not at those who did not.

The conclusions of the RAND study were couched in appropriately cautious scientific terms. The methodology was meticulously defined. The objectives of the study were focused and appropriately modest. A careful reading of the caveats, hedged conclusions, and actual data tables gives little reason for optimism that research of this type will lead to any quantum leap in the quality of clinical decisions, especially for conditions which do not have the thick bibliography of real clinical research comparable to that which informed the experts’ opinions on the three specific procedures in the RAND study. Additionally, the experts looked at only a few procedures, and the method rested on the probably erroneous assumption that the universe of possibilities is narrowly limited to the options they considered (e.g., “surgery” vs. “no surgery”).

 

Practice Guidelines for Common Conditions

 

Commenting on the prospect of a “third revolution in medical care” based on “outcomes management,” Arnold Epstein takes a cautiously positive view but observes that “the production of guidelines will be expensive and time-consuming” and that “in many instances guidelines are not likely to produce more rational or more efficient care.”(5)

The federal government is nonetheless investing substantial research dollars in the development of guidelines. Foreseeing this development, the former dean of the University of Arizona College of Medicine, the late Louis Kettel, M.D., undertook perhaps the first (and one of few) controlled studies of the concept.

 

Algorithms for Twelve Common Chief Complaints

 

The algorithm study was performed in a Veterans Administration out-patient facility staffed by nurse practitioners.(6) Indeed, the initial rationale for practice guidelines was to expand the role of “physician extenders” rather than to direct the practice of physicians.

In the early stages of the study, there were extensive discussions about the algorithms themselves, which applied to 12 common chief complaints. (The algorithms were based on a branched-chain logic, with key findings on history or physical examination at the “nodes” or decision points.) One of the first discoveries was that it was really not easy to write down a protocol even for relatively simple problems like a sore throat. There were a very large number of variables and numerous differences of opinion in the literature. Enthusiasm for discussion paled before we finished the more straightforward complaints, and the algorithms for conditions such as headache and dizziness were simply used as written at another academic institution.
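The branched-chain logic described above can be represented as a small table of decision nodes, each asking for one key finding and branching to another node or a terminal disposition. A minimal sketch follows; the sore-throat questions, branch names, and dispositions are invented for illustration and are not those of the VA study:

```python
# Minimal sketch of branched-chain logic: each node holds a question and
# yes/no branches; leaves are dispositions. Content is hypothetical.
TREE = {
    "start": ("fever over 101F?", "exudate", "reassure"),
    "exudate": ("tonsillar exudate?", "culture", "symptomatic_care"),
}
DISPOSITIONS = {"reassure", "culture", "symptomatic_care"}

def run(tree, findings, node="start"):
    """Walk the tree using a dict mapping question -> True/False."""
    while node not in DISPOSITIONS:
        question, yes_branch, no_branch = tree[node]
        node = yes_branch if findings[question] else no_branch
    return node

print(run(TREE, {"fever over 101F?": True, "tonsillar exudate?": False}))
```

Even a toy like this makes the study's first discovery plausible: every node forces an explicit choice on which the literature, and the experts, may disagree.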

The second discovery was that the nurse practitioners hated the algorithms, the checklists, and the “error messages” sent to them by a remote computer (and distributed to them by a coordinator who was a registered nurse but hadn’t laid hands on a patient for years). The process of enrolling patients in the study was very much slower than anticipated, and the number finally enrolled (1002) was relatively small. There was undoubtedly selection bias at the triage desk where the patients’ chief complaint was ascertained and suitable patients were asked to enroll in the study.

The most important reported result was a decrease in the utilization of diagnostic radiographs, especially spine films in patients with low back pain. This might have resulted from using an algorithm that defined a set of restricted indications for these films. An alternate hypothesis is that the study effectively increased the cost of the radiograph. (At the VA, cost to the patient and practitioner is measured in inconvenience rather than dollars.) Before the algorithm was introduced, the triage nurse sent almost all patients with back pain to the radiology department to get their films while they were waiting. With the algorithms, the triage nurse stopped doing that. (We wouldn’t have needed a computer to implement this particular intervention!)

An incidental result of the study was to cast doubt on the accuracy of checklists in the medical record. The presence or absence of spine tenderness was recorded in 95% of back-pain checklists, but percussion of the spine was observed by the research assistants in only 18 to 35% of the encounters. (Research assistants were in the room and recorded the time spent on each activity.) Examination of the chest and legs was recorded in 80% but observed by research assistants in only 50% of the visits by patients complaining of chest pain.

The use of the algorithms had no effect on measurable outcomes or on the nurse practitioners’ scores on an examination concerning the conditions under study.

 

Patients Hospitalized for Chest Pain

 

Another controlled study, this one concerning 375 patients with a low risk of complications admitted to a coronary care or intermediate care unit for chest pain, concluded that study patients spent about one day less in the hospital and incurred an average of $1400 less in costs, with no adverse effect on mortality or health status at one month.(7) The intervention was to send clinicians concurrent, personalized, verbal and written reminders about the recommendations of a practice guideline. Physicians complied with the guideline in 69% of cases during the intervention period and 50% of cases in the control period.

As in the VA algorithm study, the basic intervention could be described as a “nag” factor, or an increase in the nonmonetary price of care, the need for which is occasioned by the disconnection of the normal price mechanism of a free market.

 

The Evaluation of Abdominal Pain

 

One of the conditions included in the VA algorithm study, abdominal pain, offered the opportunity to compare our branched-chain logic with a linear discriminant rule and a Bayesian method used by other investigators. I applied all three methods to the same set of patients, deriving data either from first-hand evaluation of patients myself or retrospective review of the clinical record. The result was that the unaided clinician did better than all of the protocols in making the right diagnosis or distinguishing a benign, self-limited condition (“nonspecific abdominal pain”) from a specific pathologic condition and thereby prescribing the correct course of action (e.g., surgery or “expectant observation”). Clinicians included VA nurse practitioners and physicians in training.(8)

 

Chronic Obstructive Pulmonary Disease

 

Because exacerbation of chronic obstructive pulmonary disease is such a common reason for visits to the Tucson VA ambulatory care section, I attempted to develop a protocol for deciding whether or not such patients needed hospital admission or could be safely discharged after treatment in the emergency room.(9) The conclusion was that the data routinely collected during an emergency visit were not reliably predictive of the patient’s clinical course except in the most severe cases. (These data did not include vital capacity or forced expiratory volume; the equipment for measuring these values was seldom available. However, most of the patients did have arterial blood gases.)

If the consensus-based criteria of the Peer Standards Review Organization (PSRO), used by Medicare reviewers at the time of the study to “justify” or deny hospital admission, had been applied to these patients, 14% would have been admitted unnecessarily and 26% would have been discharged inappropriately. (“Inappropriateness” is a judgment call; of the 26%, 52% demonstrably failed out-patient treatment and 48% were admitted by a clinician when they first presented for an episode of illness, almost all of them remaining in the hospital for longer than 72 hours despite the fact that a brief admission meant considerably less work for the housestaff.) The PSRO blood-gas criteria did not actually apply to 62% of patients because their baseline status was so severely impaired.

Other investigators have developed a multivariate model for identifying patients at relatively high risk of hospital admission for decompensated chronic obstructive pulmonary disease.(10) They considered “high risk” to be a greater than 20% probability of admission as calculated from a logistic regression. The specificity of the model was low; most “high-risk” patients were not actually admitted to the hospital. Four years of study did not result in a clinically useful tool. The authors concluded that “the safety and efficacy of such a program should be confirmed by a randomized, prospective clinical trial.”
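A threshold rule of this kind stands or falls on its specificity: the fraction of patients who were not admitted whom the model correctly labels low risk. A minimal sketch of the calculation, using a 20% threshold and invented data:

```python
# Sketch of evaluating a probability-threshold rule: label "high risk"
# when the predicted probability exceeds the threshold, then compute
# specificity among patients who were not actually admitted.
# The predictions and outcomes below are invented.
def specificity(predictions, admitted, threshold=0.20):
    """Fraction of non-admitted patients correctly labeled low risk."""
    negatives = [p for p, a in zip(predictions, admitted) if not a]
    true_negatives = [p for p in negatives if p <= threshold]
    return len(true_negatives) / len(negatives)

preds = [0.05, 0.30, 0.25, 0.10, 0.40, 0.15]
admit = [False, False, False, False, True, False]
print(specificity(preds, admit))  # 0.6
```

A low specificity means exactly what the investigators found: most patients flagged “high risk” never needed admission at all.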

 

A General Review of Practice Guidelines and Outcomes Research

 

In answer to the question “Are we building reform on an untested foundation?” Jerome Kassirer, MD, editor of The New England Journal of Medicine, concluded that “the data we have on guidelines so far are not very convincing with respect to the efficacy of guidelines on practice.”(11)

Specific problems(12) include these:

Only a few algorithms have been assembled after quantitative analysis of the decision under consideration, and objective assessments of the approach in actual clinical practice are rare. “Testing” a practice guideline does not necessarily mean a controlled test; it may mean simply asking whether external reviewers have judged the conclusions reasonable and whether clinicians have found the guidelines “applicable” in practice. “Explicit frameworks for judging the strength of recommendations” have been described. “The weaker the underlying evidence, the greater the argument for actually testing the guideline to determine whether its application improves patient outcomes.”(13) In other words, if the authorities find the evidence to be strong, there is no need to test the guidelines on real patients.

Algorithms are better suited to simple clinical problems and are “not optimal” for application to complex or controversial problems. Relying on published algorithms may lead to overconfidence.(14) There is a bias implicit in the conduct, reporting, review, and publication of the results of research studies on diagnostic testing. Sensitivities and specificities that are high are more likely to be published, and it is treacherous to extrapolate results to other populations, especially those having a lower probability of disease.
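The hazard of extrapolating to lower-probability populations follows directly from Bayes’ theorem: with sensitivity and specificity held fixed, the positive predictive value collapses as prevalence falls. A short illustration (the sensitivity, specificity, and prevalences below are invented for demonstration):

```python
# Positive predictive value from sensitivity, specificity, and prevalence.
# With both fixed at 90%, PPV falls from 90% to about 8% as prevalence
# drops from 50% to 1%. All numbers are illustrative.
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.50, 0.10, 0.01):
    print(prev, round(ppv(0.90, 0.90, prev), 3))
```

A test that performs well in a referral population can thus be nearly useless, or worse, in a screening population.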

Even in the test populations, the reliability of decision rules may not be very high. Depending on the thresholds used, the classification of subjects undergoing coronary angiography into groups with or without coronary disease had an error rate of 20 to 40%.(14) Although decision rules may have “come of age,” in only a very few have the effects of clinical use been prospectively tested.(2)

Most of the literature concerning practice guidelines and outcomes assessment is framed in terms of cost-effectiveness and cost-benefit analysis. The methodology used in these studies is far from rigorous, as shown in a review of 77 reports published between 1978 and 1987.(15) Only three articles adhered to six widely accepted fundamental principles of analysis. The articles adhered to a median of only three principles, and there was no improvement since 1978. The principles demanded the following:

1. explicit statement of a perspective for the analysis (e.g., that of insurers, hospitals, patients, physicians, or society);

2. explicit description of the benefits of the program or technology;

3. specification of the costs considered;

4. discounting for differential timing of costs and benefits;

5. sensitivity analysis to test important assumptions; and

6. calculation of a summary measurement of efficiency.

The authors of this review cautioned that, due to methodologic defects, the efforts of the Health Care Financing Administration to incorporate cost-benefit analyses into coverage decisions “may not result in increased efficiency of the health care system.”
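Principle 4, discounting for differential timing, reduces future costs and benefits to their present value so that dollar amounts occurring in different years can be compared. A minimal sketch with an invented discount rate and cash flows:

```python
# Present value of a stream of costs or benefits, where flows[t] occurs at
# the end of year t (t = 0, 1, 2, ...). The 5% rate and the $1000-per-year
# stream are invented for illustration.
def present_value(flows, rate):
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Three annual costs of $1000 at a 5% discount rate are worth less than
# $3000 today.
print(round(present_value([1000, 1000, 1000], 0.05), 2))
```

An analysis that skips this step systematically overstates benefits that accrue far in the future, which is one reason the reviewers insisted on it.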

As one physician stated, “clinicians should not be forced to conform to protocols (or regulations either) and should not be judged by them until there is some evidence that these protocols (or regulations) are in fact superior to the clinician in some significant way. In the present situation, the evidence is to the contrary.”(16) Of course, guidelines are generally accompanied by a caveat that physicians must be able to override them. But physicians incur increasingly severe penalties for so doing; under the Kassebaum-Kennedy legislation, the provision of “medically unnecessary” services may actually be a crime. The “possible failure of education alone to sustain substantial changes in physician practice patterns” has been noted — “when direct feedback of guideline recommendations was withdrawn, physician practice reverted to that observed before initiation of the intervention.”(7)

This is not to say that research into practice guidelines is altogether useless. Guidelines may provide a method for studying the clinical findings of patients as they occur in real situations; for determining which findings are the best predictors of outcomes; and for helping students and practitioners refine their skills.

Although protocols have yet to successfully rival the performance of an expert diagnostician or even an experienced nurse practitioner, they might prove very useful in helping patients to become more intelligent customers. Self-management for benign conditions (such as the common cold) could potentially reduce demand for medical services that are of marginal value. Providing patients with guidelines about self-management can lower rates of use of service by 7 to 17%. Guidelines can also alert patients to questions that they should ask, preventive measures that they should undertake, danger signs that demand immediate attention, and alternative treatment methods. Patients given information and alternatives have been shown, on average, to select less invasive (and less expensive) strategies than their physicians.(17)

Obviously, self-treatment has many pitfalls. But if a protocol is so reliable that it can be used to second-guess a clinician and deny payment for his services, why is a clinician needed at all? Many patients are at least as well educated as the clerks who assess physician “compliance” with guidelines (such as criteria for hospital admission), and patients have first-hand knowledge of their own condition. Patient algorithms are regularly published, for example, in the magazine Patient Care, and at least one compendium of guidelines for the layman is now available.(18)

 

Conclusions

 

“Outcomes assessment” and “practice guidelines” can be used as coercive tools for rationing medical services. Given current methods, they are not to be mistaken for scientific research into “what works” in clinical medicine. Still less do they represent authoritative guides to good practice; indeed, their implementation could cause a significant deterioration in the quality of medical care. It certainly would provide a barrier, perhaps an insurmountable one, to innovation.

On the other hand, clinical algorithms based on existing knowledge can be an educational tool, a method of increasing patient information. Better patient information is essential to true reforms intended to restore patients to their rightful place as the primary beneficiaries and purchasers of medical services. Patients, not remote bureaucratic committees, should make decisions about their medical care, ideally with the counsel of their chosen physician.

The continued availability of excellent medical care (and advanced medical technology) depends upon patients regaining control. A reduction in medical costs (which is not to be mistaken for a reduction in expenditures due to rationing of services) depends upon allowing patients to reap the financial rewards of their own cost-saving decisions.(19) The prudent consumer will not knowingly trade his life or health to pad his savings account. He will want more information, in the context of an honest statement of its pitfalls and uncertainties and caveats.

In the hands of patients and physicians, information technology can be a powerful tool. But if used to promulgate and enforce pseudoscientific “guidelines” placed in the hands of rationers and prosecutors, it could destroy American medicine.

 

References

 

1. Soll AH for the Practice Parameters Committee of the American College of Gastroenterology. Medical treatment of peptic ulcer disease. JAMA 1996;275:622-629.

2. Wasson JH, Sox HC. Clinical prediction rules: have they come of age? JAMA 1996;275:641-642.

3. Stiell IG, Greenberg GH, Wells GA, et al. Prospective validation of a decision rule for the use of radiography in acute knee injuries. JAMA 1996;275:611-615.

4. The Appropriateness of Selected Medical and Surgical Procedures, edited by Chassin M, et al., Association for Health Services Research and Health Administration Press, 1989.

5. Epstein AM. The outcomes movement: will it get us where we want to go? N Engl J Med 1990;323:266-270.

6. Orient JM, Kettel LH, Sox HC Jr., et al. The effect of algorithms on the cost and quality of patient care. Medical Care 1983;21:157-167.

7. Weingarten SR, Reidinger MS, Conner L, et al. Practice guidelines and reminders to reduce duration of hospital stay for patients with chest pain. Ann Intern Med 1994;120:257-263.

8. Orient JM. Evaluation of abdominal pain: clinicians’ performance compared with three protocols, South Med J 1986;79:793-799.

9. Orient JM. When do patients with chronic obstructive lung disease need hospital admission? Reflections based on a VA experience. South Med J 1983;76:593-602.

10. Murata GH, Gorby MS, Kapsner CO, et al. A multivariate model for predicting hospital admissions for patients with decompensated chronic obstructive pulmonary disease. Arch Intern Med 1992;152:82-86.

11. Kassirer JP. Are we building reform on an untested foundation? An interview in The Internist, January 1993, pp. 16-19.

12. Kassirer JP, Kopelman RI. Diagnosis and decisions by algorithms. Hospital Practice, March 15, 1990, pp. 23-31.

13. Hayward RS, Wilson MC, Tunis SR, et al. for the Evidence-Based Medicine Working Group. User’s guides to the medical literature. VIII. How to use clinical practice guidelines. A. Are the recommendations valid? JAMA 1995;274:570-574.

14. Detrano R, Janosi A, Steinbrunn W, et al. Am J Cardiol 1989;64:304-310.

15. Udvarhelyi IS, Colditz GA, Epstein AM. Cost-effectiveness and cost-benefit analysis in the medical literature. Ann Intern Med 1992;116:238-244.

16. Sapira JD. The moloch of the quick fix. South Med J 1986;79:791-792.

17. Fries JF, Koop E, Beadle CE, et al. Reducing health care costs by reducing the need and demand for medical services. N Engl J Med 1993;329:321-325.

18. Fries JF, Vickery DM. Take Care of Yourself: The Healthtrac Guide to Medical Care, 4th edition, Addison-Wesley, 1989.

19. Goodman J, Musgrave G. Patient Power: Solving America’s Health Care Crisis, Cato Institute, 1992.

 

Dr. Orient practices internal medicine in Tucson, Arizona, and is the Executive Director of the AAPS. Her address is 1601 N. Tucson Blvd., Suite 9, Tucson, AZ 85716.

Originally published in the Medical Sentinel 1997;2(1):19-22. Copyright ©1997 Association of American Physicians and Surgeons.

 

 

 
