HER2 Support Group Forums - Batch Processing of tumor biopsies for cell markers

gdpawel · 07-03-2013, 01:04 PM

Run Batch Effects Potentially Compromise the Usefulness of Genomic Signatures for Ovarian Cancer

Keith A. Baggerly, Kevin R. Coombes, and E. Shannon Neeley

Department of Bioinformatics & Computational Biology, University of Texas M.D. Anderson Cancer Center, Houston, TX

JCO March 1, 2008:1186-1187; DOI:10.1200/JCO.2007.15.1951.

Editor:

A major goal of personalized medicine is to predict, before administering a treatment, whether the patient will respond to it. Recently, Dressman et al1 presented an approach that appeared to move us toward this goal in the context of ovarian cancer. Using microarray expression profiles, they first identified a set of genes that could differentiate between patients who did (CR) and did not (NR) respond to primary platinum-based chemotherapy. Then, following Bild et al,2 they scored each tumor for the levels of five different oncogenic pathways. They reported that three pathways (Src, E2F3, and Myc) stratify the NRs into subgroups with significantly different survival characteristics, suggesting how further therapies might be targeted for these patients.

We examined these data in order to help investigators at our institution make better use of this approach. We were unable to reproduce the results reported, and the structure that we did find appears driven far more by run date than by clinical response. Our findings are outlined here; supplementary reports (ovca01-ovca07) provide details.

(1) The posted mapping of numbers to samples is scrambled (eg, numbers from sample 872 are labeled as belonging to sample 2476). Only 32 of the 119 mappings are correct. Three quantifications were derived from raw array data (CEL files) not used for this study (ovca01). Whether this scrambling is fatal depends on when it occurred, which we cannot assess.

(2) We assembled clinical data by combining information on subsets of the samples from Dressman et al,1 Bild et al,2 and Berchuck et al.3 Survival status changes for 15 patients in going from Bild et al2 to Dressman et al1 revealed that 14 CR patients shifted from Alive to Dead, and one NR patient shifted from Dead to Alive. Information from Berchuck et al3 suggests that the Bild et al2 annotation is correct (ovca02).

(3) We identified 107 Affymetrix (Santa Clara, CA) probeset IDs corresponding to the “best” 100 genes reported by Dressman et al1; ambiguities in annotation led to some duplication (ovca03).

(4) The CEL files can be grouped into clearly separated batches on the basis of run date. Response and survival are confounded with run date, particularly with the samples processed earliest (ovca04).

(5) We contrasted the CR and NR samples, gene by gene, using two-sample t tests. P values from the reported “best 100” genes are uniformly distributed, suggesting results no better than chance. Clustering based on these genes fails to separate CRs from NRs. There is some evidence of differential expression in the set of all genes. Gene-by-gene analyses of variance, however, suggest strong batch effects for almost every gene. After correcting for these batch effects, separation between CRs and NRs drops to low levels (ovca05).

(6) Using data from Bild et al,2 we computed our own pathway scores for each tumor sample. Our pathway gene lists differ slightly from those of Bild et al2 due to differences in array processing (Affymetrix Microarray Analysis Suite, v.5.0 in Bild et al,2 robust multi-array analysis here). These scores are relatively robust with respect to the precise gene list selected, but they show clear confounding with run batch. After correcting for batch, the scores change substantially (ovca06).

(7) Finally, we looked for differences in survival as a function of dichotomized (high/low) pathway scores. For each pathway, we looked at results for three patient subgroups (NR, CR, and all) using all combinations of (a) our quantifications or those reported, (b) our gene list or those reported, (c) ignoring or correcting for run batch, and (d) censoring according to Dressman et al1 or Bild et al.2 After correcting for batch, the only contrasts that remain modestly significant involve E2F3 and the patient subgroups of CRs or all, though the P values do not take multiple testing into account (ovca07).
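The confounding described in points (4) and (5) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' R code: the data are simulated, the batch sizes and the batch shift are invented, and simple per-batch mean-centering stands in for the gene-by-gene analysis of variance mentioned in the letter. All function names are hypothetical. None of the simulated genes has a true CR/NR difference, yet a naive t test flags most of them because one batch is mostly CR and the other mostly NR.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Two-sample (Welch) t statistic for two lists of numbers."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples (a crude batch correction)."""
    means = {}
    for b in set(batches):
        means[b] = statistics.mean([v for v, vb in zip(values, batches) if vb == b])
    return [v - means[b] for v, b in zip(values, batches)]


def cr_vs_nr_t(values, response):
    """t statistic contrasting CR and NR samples for one gene."""
    cr = [v for v, r in zip(values, response) if r == "CR"]
    nr = [v for v, r in zip(values, response) if r == "NR"]
    return welch_t(cr, nr)


def simulate(n_genes=200, batch_shift=3.0, seed=1):
    """Return the fraction of null genes with |t| > 2, before and after centering."""
    rng = random.Random(seed)
    # Assumed design: batch 0 is mostly CR, batch 1 is mostly NR (confounded).
    batches = [0] * 20 + [1] * 20
    response = ["CR"] * 16 + ["NR"] * 4 + ["CR"] * 4 + ["NR"] * 16
    naive_hits = corrected_hits = 0
    for _ in range(n_genes):
        # Expression depends on batch only; there is no true response effect.
        expr = [batch_shift * b + rng.gauss(0.0, 1.0) for b in batches]
        if abs(cr_vs_nr_t(expr, response)) > 2.0:
            naive_hits += 1
        if abs(cr_vs_nr_t(center_by_batch(expr, batches), response)) > 2.0:
            corrected_hits += 1
    return naive_hits / n_genes, corrected_hits / n_genes


naive_fraction, corrected_fraction = simulate()
print("flagged without batch correction:", naive_fraction)
print("flagged after batch centering:  ", corrected_fraction)
```

In a design like this, mean-centering would also remove part of any real response effect, which is one reason confounded batches are hard to rescue after the fact.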

Batch effects are common in large-scale expression studies, but are not commonly addressed. When such batches are confounded with biologic contrasts of interest, problems can arise. Fortunately, as noted in Ransohoff,4 these problems can be somewhat circumvented through good experimental design. Further, batch effects can be modeled if we remember to look for them.
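One design step that helps keep run batch from being confounded with response is to spread each response group evenly across batches before any arrays are run. The following is a minimal sketch of that idea, assuming hypothetical sample IDs, group sizes, and a hypothetical `assign_batches` helper; it is not taken from Ransohoff or from the letter.

```python
import random
from collections import Counter, defaultdict


def assign_batches(labels, n_batches, seed=0):
    """Randomly assign samples to run batches, balanced within each label.

    labels: dict mapping sample ID -> group label (for example "CR" or "NR").
    Returns a dict mapping sample ID -> batch index in range(n_batches).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in labels.items():
        by_label[label].append(sample_id)

    assignment = {}
    offset = 0  # rotate the starting batch so total batch sizes stay even
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)  # random order within the group
        for j, sample_id in enumerate(ids):
            assignment[sample_id] = (j + offset) % n_batches
        offset += len(ids)
    return assignment


# Hypothetical cohort: 30 responders and 18 non-responders, 4 run dates.
labels = {f"CR{i:02d}": "CR" for i in range(30)}
labels.update({f"NR{i:02d}": "NR" for i in range(18)})
assignment = assign_batches(labels, n_batches=4)

counts = Counter((assignment[s], labels[s]) for s in labels)
for batch in range(4):
    print(batch, counts[(batch, "CR")], counts[(batch, "NR")])
```

With every batch holding nearly the same mix of CR and NR samples, a batch shift can still add noise but can no longer masquerade as a response signature.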

We would be delighted if the methods and results outlined in Dressman et al1 worked. Unfortunately, based on the results shown in this analysis, we are not yet persuaded that either the signature or the pathway stratification will lead to better patient care. While there may be differences due to pathway activation, run batch effects provide an alternative explanation here, and in our experience, batch effects are often larger than biologic ones.

Details of our analysis, including our code, documentation, figures, and results are available from http://bioinformatics.mdanderson.org...proRsch-Ovary/

Given the software (freeware statistical package R version 2.5.1) and our code, all the results we report are reproducible.

References:

1. Dressman HK, Berchuck A, Chan G, et al: An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol 25:517-525, 2007

2. Bild A, Yao G, Chang JT, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439:353-357, 2006

3. Berchuck A, Iversen ES, Lancaster JM, et al: Patterns of gene expression that characterize long term survival in advanced serous ovarian cancers. Clin Cancer Res 11:3686-3696, 2005

4. Ransohoff DF: Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer 5:142-149, 2005

Note: In this case, the problem is much more serious. This was the Duke University group (Potti the lead, dishonest, fraudulent investigator), whose work was later exposed as grossly fraudulent; this paper, like many others from the group, was formally retracted.

http://jco.ascopubs.org/content/30/6/678

The results of a particular clinical trial may be true only for the batch that is used in the study. For example, batch effects, introduced by profiling samples on different days, using different lots of reagents, or at different sites, can introduce variation and confound such analyses. These considerations require reproducing the classification results in independent test cohorts of samples and using multiple hypothesis testing correction methods. In other words, uncontrolled batch effects may lead to spurious findings (Genes & Dev. 2011.25:534-555).
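As one example of a multiple hypothesis testing correction, here is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. Neither the letter nor the review above names a specific method, so the choice of Benjamini-Hochberg is an assumption, and the P values in the example are made up for illustration.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P values from four hypothetical contrasts.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A contrast that looks "modestly significant" on its own can lose that status once the number of subgroups, gene lists, and censoring choices tested alongside it is taken into account.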
