The Holy Grail of Statistics
As dissatisfied as activists were with AZT and its chemical cousins, in practice it was impossible for them not to invest their energies in staking out positions on the use of the antiviral drugs that were most readily available. The problem was that the data concerning combinations of nucleoside analogues such as AZT and ddC did not lend themselves to unambiguous interpretation. ACTG 155, Fischl's study of the AZT/ddC combination, was an important and telling illustration: debates about the significance of ACTG 155 demonstrated the interpretative flexibility that could make AIDS clinical trials a source of uncertainty rather than the mechanism by which uncertainty
was resolved. "People believe passionately on both sides of the question as to whether [the combination therapy] worked," Deborah Cotton, the dissenter on the Antiviral Advisory Committee, commented more than a year after the Berlin conference. "There are schools of thought. There are the believers and the nonbelievers, and … I don't think anything is going to change. I really don't."
The ACTG 155 researchers were quick to insist, and the published study was careful to point out, that the controversial subgroup analysis presented by Fischl in Berlin was not "post-hoc": "The three CD4 cell count subgroups were specified by the study chairpersons in June 1992, which was before any interim review of the primary end-point data," Fischl and her coauthors argued. But this did not settle the issue, because the subjects were not randomized into treatment arms from the outset according to this particular breakdown in CD4 counts. Biostatisticians are typically cautious about pulling subsets out of clinical trial populations after randomization—or at least, cautious about overinterpreting what they find. Susan Ellenberg, formerly the chief biostatistician for the ACTG trials at NIAID, praised the TAG activists' suspicion of subgroup analyses: "I think that that is the kind of thing that they have learned. They have become very methodologically astute," reflected Ellenberg.
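The biostatisticians' caution about post-randomization subgroups can be illustrated with a toy simulation. The setup below is invented for illustration and uses none of the ACTG 155 data: it assumes trials in which the treatment has no effect at all, then tests each of three subgroups (standing in for the three CD4 strata) at the conventional 5% level with a two-proportion z-test. Even under this null, the chance that some subgroup shows a "significant" difference is well above 5%.

```python
import math
import random

# Toy simulation (invented rates, not the ACTG 155 data): null trials with
# NO true treatment effect, analyzed subgroup by subgroup.

random.seed(0)

def any_subgroup_significant(n_subgroups=3, n_per_subgroup=100, rate=0.30):
    """One null trial: the same progression rate in both arms of every
    subgroup. Returns True if any subgroup's two-proportion z-test
    crosses the conventional |z| > 1.96 threshold."""
    for _ in range(n_subgroups):
        events_t = sum(random.random() < rate for _ in range(n_per_subgroup))
        events_c = sum(random.random() < rate for _ in range(n_per_subgroup))
        p1 = events_t / n_per_subgroup
        p2 = events_c / n_per_subgroup
        pooled = (events_t + events_c) / (2 * n_per_subgroup)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_subgroup)
        if se > 0 and abs(p1 - p2) / se > 1.96:
            return True
    return False

n_trials = 1000
false_positive_rate = sum(any_subgroup_significant()
                          for _ in range(n_trials)) / n_trials
# Substantially above the nominal 0.05 of a single pre-specified test.
print(false_positive_rate)
```

This is the multiple-comparisons worry in miniature: with three independent looks, a chance finding somewhere becomes roughly three times as likely, which is why pre-specifying subgroups before any look at the data, as Fischl and her coauthors insisted they had done, only partly answers the objection when randomization was not stratified on those subgroups to begin with.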
Declaring that "several potential explanations exist for the overall findings in our study," the authors in fact presented a series of arguments for why combination therapy might be advantageous even though the trial seemed to suggest otherwise. Because patients had taken AZT before enrolling in the study, it was possible that they were already resistant to that drug. In that sense, what was really being studied was not combination therapy but ddC alone. Or perhaps the best time for combination therapy was simply earlier in the progression of HIV disease, before certain phenotypic changes in the virus associated with late-stage AIDS occurred. Finally, the report pointed to the consequences of "intent-to-treat" analysis. Because of the way the protocol was worded, many patients who experienced strong side effects from either AZT or ddC were taken off all their drugs. But in accordance with the logic of intent-to-treat, these patients were still counted as part of the treatment arm to which they were originally randomized. The effect was potentially to blur the differences between the arms, creating a higher standard for establishing efficacy. But biostatisticians argued that removing from the analysis the patients who go off a drug can bias trial results, because there is no reason to assume that people who go off medications are typical of those who
remain. Some clinical researchers found the logic maddening all the same: How could they study the effects of a drug in patients who weren't even using it? "As a clinician, what I really want to know is, is a drug working while a patient is taking it—not six months after he stopped taking it," commented Martin Hirsch, a virologist and one of the principal investigators on ACTG 155.
The debate over intent-to-treat analysis was a perfect example of the clash in perspective between infectious-disease researchers and biostatisticians. "I don't think that statistics is the holy grail," complained Hirsch, at the same time affirming his strong general support for the methods of the randomized clinical trial. "Many of us clinicians think that an 'as-treated' analysis … gives us at least as much information that is useful clinically as does 'intent-to-treat.'" And indeed, the authors of the published report on ACTG 155 undertook "an exploratory analysis to gain insight into the possible association between early treatment cessation and treatment outcome." That is, they performed an "as-treated" analysis, ceasing to count people in the study two months after they stopped their treatment. The results: "combination therapy was associated with a significantly lower rate of disease progression or death than were either [AZT] or [ddC] monotherapy." In other words, from this standpoint, the study was quite simply a success, and combination therapy worked. Yet "caution should be used when interpreting this exploratory analysis," the authors quickly added, because "this type of analysis is known to be biased."
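The gap between the two analyses can be seen in a toy calculation. All numbers below are invented for illustration, not drawn from the trial: if a drug reduces progression while taken but many patients stop taking it, the intent-to-treat estimate for that arm is pulled toward the untreated rate, while an as-treated estimate credits only the time on drug, at the price of the selection bias the biostatisticians warned about.

```python
# Toy illustration of intent-to-treat vs. as-treated analysis.
# All numbers are invented for illustration; none are ACTG 155 data.

n = 100                  # patients randomized to the combination arm
n_stopped = 30           # stopped therapy early (e.g., because of toxicity)
n_stayed = n - n_stopped

prog_on_drug = 0.20      # assumed progression rate while on therapy
prog_off_drug = 0.40     # assumed progression rate after stopping

# Intent-to-treat: every randomized patient stays in the combination
# arm's tally, whether or not they remained on the drugs.
itt_rate = (n_stayed * prog_on_drug + n_stopped * prog_off_drug) / n

# As-treated: count patients only while they actually took the therapy.
# (Biased if those who stop differ systematically from those who stay.)
as_treated_rate = prog_on_drug

print(round(itt_rate, 2))         # 0.26, pulled toward the untreated 0.40
print(round(as_treated_rate, 2))  # 0.2
```

On these invented numbers the drug "works" in the as-treated view (0.20 versus 0.40) but looks much weaker under intent-to-treat (0.26 versus 0.40), which is precisely Hirsch's complaint; the biostatisticians' rejoinder is that the as-treated figure is trustworthy only if the 30 patients who stopped were just like the 70 who stayed.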
For all the careful disclaimers, what was noteworthy about the published report on ACTG 155 was how intent it seemed to be on reaffirming the conclusion from which everyone began—that combination therapy really was better. Douglas Richman insisted that the results from ACTG 155 were "compatible" with those from ACTG 106; he added that ACTG 106 had now been corroborated by the final results from the Burroughs Wellcome study of the AZT/ddC combination. By contrast, Mark Harrington of TAG argued that "the high baseline CD4 group (150–300)—the one with the claimed 'benefit' of combination therapy—was exactly the arm with the fewest clinical events, thus the least [statistical] power." Harrington's bottom line: "Combination therapy with AZT/ddC in the 155 population is 50% more toxic and no more effective than monotherapy with AZT alone."
What were the true "results" of ACTG 155? How can we take the data produced by such an experiment and apply
them to the real-world dilemmas of patients who demand answers? The point here is that clinical trials do not occur in a vacuum—and when the environment in which trials are conducted and interpreted is so contentious, then these experiments, rather than settling controversies, may instead reflect and propel them. Consider the range of factors and pressures that structured the determination of the "meaning" of ACTG 155 as well as that of its precursor, ACTG 106: the methodological (and jurisdictional) disputes between infectious-disease researchers and biostatisticians; activist demands for access to drugs, alongside or against activist conceptions of "good science"; the social construction of hype; the profound need experienced by patients, and the kinds of pragmatic decisions that patients and research subjects make in response to their immediate perceptions of their interests; the marketing strategies of pharmaceutical corporations and the incentive structures to which these companies respond; the complicated role of practicing physicians in interpreting the data produced by clinical trials; the politics of regulation and deregulation; and the distinctive character of regulatory science as practiced by expert advisory bodies. In such an environment—given these stakes—is it any wonder that the interpretation of key trial results is often up for grabs? AIDS trials are not unique in this regard, as studies of cancer trials make clear. But insofar as the participation of knowledge-empowered activists increases the number of claims-makers and alters the distribution of credibility among them, AIDS trials may be particularly inclined toward conflicting readings.