The process for evaluating chemical safety is inefficient, costly, and animal-intensive. There is growing consensus that the current process of safety testing needs to be significantly altered to improve efficiency and reduce the number of untested chemicals. In this study, the use of short-term gene expression profiles was evaluated for predicting the increased incidence of mouse lung tumors. Animals were exposed to a total of 26 diverse chemicals with matched vehicle controls over a period of three years. Upon completion, significant batch-related effects were observed. Adjustment for batch effects significantly improved the ability to predict increased lung tumor incidence. For the best statistical model, the estimated predictive accuracy under honest five-fold cross-validation was 79.3%, with a sensitivity and specificity of 71.4% and 86.3%, respectively. A learning curve analysis demonstrated that gains in model performance reached a plateau at 25 chemicals, indicating that the size of the current data set was sufficient to provide a robust classifier. The classification results showed that a small subset of chemicals contributed disproportionately to the misclassification rate. For these chemicals, the misclassification was more closely associated with genotoxicity status than with efficacy in the original bioassay. Statistical models were also used to predict dose-response increases in tumor incidence for methylene chloride and naphthalene. The average posterior probabilities for the top models matched the results from the bioassay for methylene chloride. For naphthalene, the average posterior probabilities for the top models over-predicted the tumor response, but the variability in predictions was significantly higher.
The study provides both a set of gene expression biomarkers for predicting chemically-induced mouse lung tumors as well as a broad assessment of important experimental and analysis criteria for developing microarray-based predictors of safety-related endpoints.
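When accuracy, sensitivity, and specificity are computed from pooled predictions, accuracy is the prevalence-weighted average of the other two. The sketch below is illustrative and not part of the study's analysis: it uses that identity to back out the fraction of positive (tumor-increasing) cases implied by the three figures quoted in the abstract. The function name is hypothetical, and the result is only approximate if the metrics were averaged across folds rather than pooled.

```python
def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    p is the fraction of truly positive cases. All three inputs must use
    the same scale (all percentages or all proportions).
    """
    if sensitivity == specificity:
        raise ValueError("p is not identifiable when sensitivity equals specificity")
    return (specificity - accuracy) / (specificity - sensitivity)


# Figures quoted in the abstract, in percent.
p = implied_positive_fraction(accuracy=79.3, sensitivity=71.4, specificity=86.3)
print(f"implied positive fraction: {p:.3f}")
```

Under these assumptions the implied positive fraction is about 0.47, which is consistent with a data set that is roughly balanced between positive and negative cases.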
Use of short-term transcriptional profiles to assess the long-term cancer-related safety of environmental and industrial chemicals.
Sex, Age, Specimen part, Disease, Subject
Promyelocytic Leukemia Protein (PML) was first identified as a fusion product with the retinoic acid receptor alpha in Acute Promyelocytic Leukemia (APL). Although PML has previously been studied in cancer progression and various physiological processes, little is known about its functions in Embryonic Stem Cells (ESC). Here, we report that PML contributes to the maintenance of ESC self-renewal by controlling the cell cycle and sustaining the expression levels of crucial pluripotency factors. Transcriptomic analysis showed that the ablation of PML renders ESC prone to exit from the naïve state and acquire a primed-like pluripotent cell state. During differentiation, PML influences cell fate decisions by regulating Tbx3. PML loss compromises the reprogramming of embryonic fibroblasts to induced Pluripotent Stem Cells (iPSC) by inhibiting the TGF-β pathway at the very early stages. Collectively, these results designate PML as a member of the regulatory network for ESC pluripotency and somatic cell reprogramming.
Promyelocytic Leukemia Protein Is an Essential Regulator of Stem Cell Pluripotency and Somatic Cell Reprogramming.
No sample metadata fields
The MAQC-II Project: A comprehensive study of common practices for the development and validation of microarray-based predictive models
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Age, Specimen part, Race, Compound
The multiple myeloma (MM) data set (endpoints F, G, H, and I) was contributed by the Myeloma Institute for Research and Therapy at the University of Arkansas for Medical Sciences (UAMS, Little Rock, AR, USA). Gene expression profiling of highly purified bone marrow plasma cells was performed in newly diagnosed patients with MM. The training set consisted of 340 cases enrolled in total therapy 2 (TT2) and the validation set comprised 214 patients enrolled in total therapy 3 (TT3). Plasma cells were enriched by anti-CD138 immunomagnetic bead selection of mononuclear cell fractions of bone marrow aspirates in a central laboratory. All samples applied to the microarray contained more than 85% plasma cells as determined by 2-color flow cytometry (CD38+ and CD45-/dim) performed after selection. Dichotomized overall survival (OS) and event-free survival (EFS) were determined based on a two-year milestone cutoff. A gene expression model of high-risk multiple myeloma was developed and validated by the data provider and later validated in three additional independent data sets.
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Age
The NIEHS data set (endpoint C) was provided by the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data acquired from the liver of rats exposed to hepatotoxicants to build classifiers for prediction of liver necrosis. The gene expression compendium data set was collected from 418 rats exposed to one of eight compounds (1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine, and diquat dibromide). All eight compounds were studied using standardized procedures, i.e., a common array platform (Affymetrix Rat 230 2.0 microarray), common experimental procedures, and common data retrieval and analysis processes.
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Sex, Specimen part, Compound
The human breast cancer (BR) data set (endpoints D and E) was contributed by the University of Texas M. D. Anderson Cancer Center (MDACC, Houston, TX, USA). Gene expression data from 230 stage I-III breast cancers were generated from fine needle aspiration specimens of newly diagnosed breast cancers before any therapy. The biopsy specimens were collected sequentially during a prospective pharmacogenomic marker discovery study between 2000 and 2008. These specimens represent 70-90% pure neoplastic cells with minimal stromal contamination. Patients received 6 months of preoperative (neoadjuvant) chemotherapy including paclitaxel, 5-fluorouracil, cyclophosphamide and doxorubicin followed by surgical resection of the cancer. Response to preoperative chemotherapy was categorized as a pathological complete response (pCR = no residual invasive cancer in the breast or lymph nodes) or residual invasive cancer (RD), and used as endpoint D for prediction. Endpoint E is the clinical estrogen-receptor status as established by immunohistochemistry. RNA extraction and gene expression profiling were performed in multiple batches over time using Affymetrix U133A microarrays. Genomic analysis of a subset of this sequentially accrued patient population was reported previously. For each endpoint, the first 130 cases were used as a training set and the next 100 cases were used as an independent validation set.
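The training and validation sets described above are defined by accrual order rather than by random assignment. A minimal sketch of such a sequential split follows; the function name and the case identifiers are hypothetical, and the list is assumed to be already sorted by accrual order.

```python
def sequential_split(cases, n_train, n_validation):
    """Split cases in their given order.

    The first n_train cases form the training set and the next
    n_validation cases form the validation set. No shuffling is done,
    so earlier-accrued cases never end up in the validation set.
    """
    if n_train + n_validation > len(cases):
        raise ValueError("not enough cases for the requested split")
    training = cases[:n_train]
    validation = cases[n_train:n_train + n_validation]
    return training, validation


# Hypothetical identifiers standing in for the 230 sequentially accrued cases.
cases = [f"case_{i:03d}" for i in range(1, 231)]
training, validation = sequential_split(cases, n_train=130, n_validation=100)
print(len(training), len(validation))
```

Because profiling was performed in multiple batches over time, a split by accrual order means the validation cases can differ from the training cases in batch as well as in biology, which makes it a stricter test than a random split of the same cases.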
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Age, Specimen part, Race
The Hamner data set (endpoint A) was provided by The Hamner Institutes for Health Sciences (Research Triangle Park, NC, USA). The study objective was to apply microarray gene expression data from the lung of female B6C3F1 mice exposed to a 13-week treatment of chemicals to predict increased lung tumor incidence in the 2-year rodent cancer bioassays of the National Toxicology Program. If successful, the results may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals. Microarray analysis was performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four mice per treatment group, and a total of 70 mice were analyzed and used as the MAQC-II's training set (GEO Series GSE6116). Additional data from another set of 88 mice were collected later and provided as the MAQC-II's external validation set (this Series). The training dataset had already been deposited in GEO by its provider and its accession number is GSE6116.
Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.
Specimen part, Compound
Background: Local recurrence is the major manifestation of treatment failure in patients with operable laryngeal carcinoma. Established clinicopathological factors cannot sufficiently predict which patients are likely to recur after treatment. Additional tools are therefore required to accurately identify patients at high risk for recurrence. Methods: Using Affymetrix U133A Genechips, we profiled fresh-frozen tumor tissues from 59 patients with operable laryngeal cancer. All patients were treated locally with surgery, with or without radiation therapy. We performed Cox proportional hazards regression modeling to identify multigene predictors of recurrence. The end-point of our analysis was disease-free survival (DFS). Gene models were directly validated in a separate, similarly treated cohort of 50 patients using Affymetrix chips. In an attempt to further validate our results, we profiled 12 selected genes of our model in formalin-fixed tumor tissues from an independent cohort of 75 patients, using quantitative real-time polymerase chain reaction (qRT-PCR). Results: We focused on genes univariately associated with DFS (p<0.05) in the training set. Among several gene models comprising different numbers of genes, a 30-gene model demonstrated optimal performance (log-rank, p<0.001). We directly applied these gene models to the validation set, after adjusting for non-biological experimental variability, and observed similar results. Specifically, median DFS, as predicted by the 30-gene model, was 34 and 80 months for high- and low-risk patients, respectively (p=0.01). The hazard ratio (HR) for recurrence for the high-risk group was 3.87 (95% CI 1.28-11.73, p=0.017). Furthermore, unsupervised hierarchical clustering of the 75 patients, based on the qRT-PCR 12-gene profile, yielded two groups, which differed significantly in DFS (log-rank, p=0.027). The HR for recurrence was 2.26 (95% CI 1.08-4.76, p=0.031).
Conclusion: We have established and validated gene models that can successfully stratify patients with laryngeal cancer, based on their risk for recurrence. Thus, patients with unfavorable prognosis, when accurately identified, could be ideal candidates for the application of more aggressive treatment modalities.
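The hazard ratios, confidence intervals, and p-values reported in the Results can be checked against one another. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which is the usual output of a Cox model but is not stated in the abstract; the function name is hypothetical. It recovers the standard error of log(HR) from the interval width and recomputes a two-sided p-value.

```python
import math
from statistics import NormalDist


def wald_p_from_hr_ci(hr, ci_low, ci_high, level=0.95):
    """Recover SE, z statistic, and two-sided p-value from an HR and its CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald
    interval computed on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


# Values quoted in the abstract.
reported = [
    ("30-gene model, validation set", 3.87, 1.28, 11.73),
    ("12-gene qRT-PCR clustering", 2.26, 1.08, 4.76),
]
for label, hr, low, high in reported:
    se, z, p = wald_p_from_hr_ci(hr, low, high)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under this assumption the recomputed p-values come out close to the reported 0.017 and 0.031, and each HR sits near the geometric mean of its interval bounds, so the three quantities in each pair are mutually consistent.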
Identification and validation of a multigene predictor of recurrence in primary laryngeal cancer.
Age, Specimen part, Disease stage
Groucho related gene 5 (GRG5) is a multifunctional protein that has been implicated in late embryonic and postnatal mouse development. Here, we describe a previously unknown role of GRG5 in early developmental stages by analyzing its function in stem cell fate decisions. By both loss- and gain-of-function approaches, we demonstrate that ablation of GRG5 deregulates the Embryonic Stem Cell (ESC) pluripotent state, whereas its overexpression leads to enhanced self-renewal and acquisition of cancer cell-like properties. A pro-oncogenic potential for GRG5 is revealed by the malignant behavior of teratomas generated from ESCs that overexpress it. Furthermore, transcriptomic analysis and cell differentiation approaches underline GRG5 as a multifaceted signaling regulator that represses mesendodermal-related genes. When ES cells exit pluripotency, GRG5 promotes neuroectodermal specification via suppression of the Wnt and BMP signaling pathways. Moreover, GRG5 promotes the neuronal reprogramming of fibroblasts and maintains the self-renewal of Neural Stem Cells (NSC) by sustaining the activity of the Notch and Jak/Stat3 pathways. In summary, our results demonstrate that GRG5 has pleiotropic roles in stem cell biology, functioning as a stemness factor and a neural fate specifier. Overall design: Gene expression profiling of control and Grg5 knockdown (KD) embryonic stem cells with RNA-seq, in duplicate, using Ion Torrent Proton.
Groucho related gene 5 (GRG5) is involved in embryonic and neural stem cell state decisions.
Cell line, Subject
Renal excretion of water and major electrolytes exhibits a significant circadian rhythm. This functional periodicity is believed to result, at least in part, from circadian changes in secretion/reabsorption capacities of the distal nephron and collecting ducts. Here, we studied the molecular mechanisms underlying circadian rhythms in the distal nephron segments, i.e., the distal convoluted tubule (DCT) and connecting tubule (CNT), and the cortical collecting duct (CCD). Temporal expression analysis performed on microdissected mouse DCT/CNT or CCD revealed a marked circadian rhythmicity in the expression of a large number of genes crucially involved in various homeostatic functions of the kidney. This analysis also revealed that both DCT/CNT and CCD possess an intrinsic circadian timing system characterized by robust oscillations in the expression of circadian core clock genes (clock, bmal1, npas2, per, cry, nr1d1) and clock-controlled PAR bZip transcription factors dbp, hlf and tef. The clock knockout mice or mice devoid of dbp/hlf/tef (triple knockout) exhibit significant changes in renal expression of several key regulators of water or sodium balance (vasopressin V2 receptor, aquaporin-2, aquaporin-4, alphaENaC). Functionally, the loss of clock leads to a complex phenotype characterized by partial diabetes insipidus, dysregulation of sodium excretion rhythms and a significant decrease in blood pressure. Collectively, this study uncovers a major role of the molecular clock in renal function.
Molecular clock is involved in predictive circadian adjustment of renal function.
Sex, Specimen part