The process for evaluating chemical safety is inefficient, costly, and animal intensive. There is growing consensus that the current process of safety testing needs to be significantly altered to improve efficiency and reduce the number of untested chemicals. In this study, the use of short-term gene expression profiles was evaluated for predicting the increased incidence of mouse lung tumors. Animals were exposed to a total of 26 diverse chemicals with matched vehicle controls over a period of three years. Upon completion, significant batch-related effects were observed. Adjustment for batch effects significantly improved the ability to predict increased lung tumor incidence. For the best statistical model, the estimated predictive accuracy under honest five-fold cross-validation was 79.3% with a sensitivity and specificity of 71.4 and 86.3%, respectively. A learning curve analysis demonstrated that gains in model performance reached a plateau at 25 chemicals, indicating that the size of the current data set was sufficient to provide a robust classifier. The classification results showed a small subset of chemicals contributed disproportionately to the misclassification rate. For these chemicals, the misclassification was more closely associated with genotoxicity status than efficacy in the original bioassay. Statistical models were also used to predict dose-response increases in tumor incidence for methylene chloride and naphthalene. The average posterior probabilities for the top models matched the results from the bioassay for methylene chloride. For naphthalene, the average posterior probabilities for the top models over-predicted the tumor response, but the variability in predictions was significantly higher.
The study provides both a set of gene expression biomarkers for predicting chemically-induced mouse lung tumors as well as a broad assessment of important experimental and analysis criteria for developing microarray-based predictors of safety-related endpoints.
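The accuracy, sensitivity, and specificity quoted in the abstract are standard confusion-matrix summaries. The sketch below shows how such figures are typically computed from true and predicted class labels; the function name `classification_metrics` and the example labels are hypothetical and are not taken from the study's data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }


# Hypothetical labels for ten chemicals (1 = increased tumor incidence).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Under cross-validation, these quantities would be computed on held-out predictions only, so that no chemical is scored by a model that saw it during training.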
Use of short-term transcriptional profiles to assess the long-term cancer-related safety of environmental and industrial chemicals.
Sex, Age, Specimen part, Disease, Subject
The MAQC-II Project: A comprehensive study of common practices for the development and validation of microarray-based predictive models
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Age, Specimen part, Race, Compound
The multiple myeloma (MM) data set (endpoints F, G, H, and I) was contributed by the Myeloma Institute for Research and Therapy at the University of Arkansas for Medical Sciences (UAMS, Little Rock, AR, USA). Gene expression profiling of highly purified bone marrow plasma cells was performed in newly diagnosed patients with MM. The training set consisted of 340 cases enrolled on total therapy 2 (TT2) and the validation set comprised 214 patients enrolled in total therapy 3 (TT3). Plasma cells were enriched by anti-CD138 immunomagnetic bead selection of mononuclear cell fractions of bone marrow aspirates in a central laboratory. All samples applied to the microarray contained more than 85% plasma cells as determined by 2-color flow cytometry (CD38+ and CD45-/dim) performed after selection. Dichotomized overall survival (OS) and event-free survival (EFS) were determined based on a two-year milestone cutoff. A gene expression model of high-risk multiple myeloma was developed and validated by the data provider and later validated in three additional independent data sets.
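The two-year milestone dichotomization mentioned above can be illustrated with a short sketch. The handling of patients censored before the milestone (returned as `None`, i.e. undetermined) is an assumption made here for illustration; the description does not state how the data provider treated such cases. The function name `dichotomize` and the use of months as the time unit are likewise hypothetical.

```python
from typing import Optional

MILESTONE_MONTHS = 24.0  # two-year cutoff, expressed in months (assumed unit)


def dichotomize(time_months: float, event: bool,
                cutoff: float = MILESTONE_MONTHS) -> Optional[int]:
    """Turn a (follow-up time, event flag) pair into a binary milestone label.

    Returns 1 if the event occurred at or before the cutoff, 0 if the patient
    was followed to the cutoff or beyond without an event by then, and None if
    the patient was censored before the cutoff (status unknown; an assumption
    of this sketch).
    """
    if event and time_months <= cutoff:
        return 1
    if time_months >= cutoff:
        return 0
    return None


# Hypothetical follow-up records: (months of follow-up, event observed?)
labels = [dichotomize(t, e) for t, e in [(10, True), (30, True), (30, False), (10, False)]]
```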
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Age
The NIEHS data set (endpoint C) was provided by the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data acquired from the liver of rats exposed to hepatotoxicants to build classifiers for prediction of liver necrosis. The gene expression compendium data set was collected from 418 rats exposed to one of eight compounds (1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine, and diquat dibromide). All eight compounds were studied using standardized procedures, i.e., a common array platform (Affymetrix Rat 230 2.0 microarray), common experimental procedures, and common data retrieval and analysis processes.
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Specimen part, Compound
The human breast cancer (BR) data set (endpoints D and E) was contributed by the University of Texas M. D. Anderson Cancer Center (MDACC, Houston, TX, USA). Gene expression data from 230 stage I-III breast cancers were generated from fine needle aspiration specimens of newly diagnosed breast cancers before any therapy. The biopsy specimens were collected sequentially during a prospective pharmacogenomic marker discovery study between 2000 and 2008. These specimens represent 70-90% pure neoplastic cells with minimal stromal contamination. Patients received 6 months of preoperative (neoadjuvant) chemotherapy including paclitaxel, 5-fluorouracil, cyclophosphamide and doxorubicin followed by surgical resection of the cancer. Response to preoperative chemotherapy was categorized as a pathological complete response (pCR = no residual invasive cancer in the breast or lymph nodes) or residual invasive cancer (RD), and used as endpoint D for prediction. Endpoint E is the clinical estrogen-receptor status as established by immunohistochemistry. RNA extraction and gene expression profiling were performed in multiple batches over time using Affymetrix U133A microarrays. Genomic analyses of a subset of this sequentially accrued patient population were reported previously. For each endpoint, the first 130 cases were used as a training set and the next 100 cases were used as an independent validation set.
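The sequential split described above (first 130 cases for training, next 100 for validation) amounts to slicing the cases in accrual order rather than sampling at random. A minimal sketch follows, assuming the cases are already sorted by accrual order; the function name `sequential_split` and the placeholder case identifiers are hypothetical.

```python
from typing import List, Sequence, Tuple


def sequential_split(cases: Sequence[str], n_train: int = 130,
                     n_validation: int = 100) -> Tuple[List[str], List[str]]:
    """Split cases, assumed sorted by accrual order, into training and validation sets."""
    if len(cases) < n_train + n_validation:
        raise ValueError("not enough cases for the requested split")
    train = list(cases[:n_train])
    validation = list(cases[n_train:n_train + n_validation])
    return train, validation


# Placeholder identifiers standing in for the 230 sequentially accrued cases.
case_ids = [f"case_{i:03d}" for i in range(1, 231)]
train, validation = sequential_split(case_ids)
```

Splitting by accrual order keeps the validation cases strictly later in time than the training cases, which is relevant when profiling was performed in multiple batches over time.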
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Age, Specimen part, Race
The Hamner data set (endpoint A) was provided by The Hamner Institutes for Health Sciences (Research Triangle Park, NC, USA). The study objective was to apply microarray gene expression data from the lung of female B6C3F1 mice exposed to a 13-week treatment of chemicals to predict increased lung tumor incidence in the 2-year rodent cancer bioassays of the National Toxicology Program. If successful, the results may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals. Microarray analysis was performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four mice per treatment group, and a total of 70 mice were analyzed and used as the MAQC-II's training set (GEO Series GSE6116). Additional data from another set of 88 mice were collected later and provided as the MAQC-II's external validation set (this Series). The training dataset had already been deposited in GEO by its provider and its accession number is GSE6116.
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Specimen part, Compound
In the central nervous system (CNS), the microRNAs (miRNAs), small endogenous RNAs exerting a negative post-transcriptional regulation on mRNAs, are involved in major functions, such as neurogenesis and synaptic plasticity. Moreover, they are essential to define the specific transcriptome of the tissues and cell types. However, few studies have been performed to determine the miRNome of the different structures of the rat CNS, even though the rat is a major model in neuroscience. We determined the miRNome profile of the hippocampus, the cortex, the striatum, the spinal cord and the olfactory bulb, by small RNA-Seq. We found a total of 365 known miRNAs and 90 novel miRNAs expressed in the CNS of the rat. Novel miRNAs seemed to be important in defining structure-specific miRNomes. Differential analysis showed that several miRNAs were specifically enriched/depleted in these CNS structures. Then, we correlated miRNAs' expression with the expression of their mRNA targets by mRNA-Seq. This analysis suggests that the transcriptomic identity of each structure is regulated by specific miRNAs. Altogether, these results suggest the critical role played by these enriched/depleted miRNAs in the functional identities of CNS structures. Overall design: miRNA and mRNA profiles of 5 structures of the rat central nervous system; for each structure, we analyzed three biological replicates.
Small RNA-Seq reveals novel miRNAs shaping the transcriptomic identity of rat brain structures.
Specimen part, Cell line, Subject
Transcriptome profiling studies suggest that a large fraction of the genome is transcribed and many transcripts function independent of their protein coding potential. The relevance of noncoding RNAs (ncRNAs) in normal physiological processes and in tumorigenesis is increasingly recognized. Here, we describe consistent and significant differences in the distribution of sense and antisense transcripts between normal and neoplastic breast tissues. Many of the differentially expressed antisense transcripts likely represent long ncRNAs. A subset of genes that mainly generate antisense transcripts in normal but not cancer cells is involved in essential metabolic processes. These findings suggest fundamental differences in global RNA regulation between normal and cancer cells that might play a role in tumorigenesis. Overall design: Global strand-specific transcriptome profiling of 2 cancer samples and 1 normal sample from clinical breast tissue using asymmetrical strand-specific analysis of gene expression (ASSAGE).
Altered antisense-to-sense transcript ratios in breast cancer.
No sample metadata fields
The project used next generation sequencing for RNA-seq analysis, to identify transcriptome changes associated with tumorigenesis in two different caspase-2 knockout mouse models. We describe key changes in both lymphoma and neuroblastoma associated genes in the two tumor types that may contribute to tumor outcome following loss of Casp2. We identified a panel of genes with altered expression in Th-MYCN/Casp2-/- tumors that are strongly associated with neuroblastoma outcome, and which have roles in melanogenesis and in Wnt and Hippo pathway signaling, which also contribute to neuronal differentiation. In addition, we found that key changes in gene expression in the EµMyc/Casp2-/- tumors are associated with increased immune signaling and suggest that Casp2 deficiency augments immune signaling pathways that may, in turn, enhance lymphomagenesis. Overall, our study has identified new genes and pathways that contribute to the caspase-2 tumor suppressor function and highlights distinct roles for caspase-2 in different tissues. Overall design: We used tumors from EµMyc/Casp2-/- mice (which are more aggressive compared to their EµMyc counterparts) as well as tumors from Th-MYCN/Casp2-/- mice (which show delayed tumour onset compared to Th-MYCN mice) and compared the transcriptomes to their Casp2 wild type counterpart tumors. Sequencing was carried out with Illumina HiSeq 2000 and used short, single-end reads (1x 50bp flow cells) with 4 samples per lane. This yielded approximately 20-30 million raw reads per sample.
Transcriptome profiling of caspase-2 deficient EμMyc and Th-MYCN mouse tumors identifies distinct putative roles for caspase-2 in neuronal differentiation and immune signaling.
Specimen part, Subject
In this experiment, mucous neck cells from the gastric epithelium of normal, adult C57/B6 mice were laser-capture microdissected to determine gene expression in neck cells relative to pit cells, parietal cells, and zymogenic cells, whose expression profiles were previously deposited in GEO.
Evolution of the human gastrokine locus and confounding factors regarding the pseudogenicity of GKN3.
No sample metadata fields