Description
In Huntington's disease (HD), expanded HTT CAG repeat length correlates strongly with age at motor onset, indicating that it determines the rate of the disease process leading to diagnostic clinical manifestations. Similarly, in normal individuals, HTT CAG repeat length is correlated with biochemical differences that reveal it as a functional polymorphism. Here, we tested the hypothesis that gene expression signatures can capture continuous, length-dependent effects of the HTT CAG repeat. Using gene expression datasets for 107 HD and control lymphoblastoid cell lines, we iteratively constructed mathematical models based upon CAG-correlated gene expression patterns in randomly chosen training samples and tested their predictive power in test samples. Predicted CAG repeat lengths were significantly correlated with experimentally determined CAG repeat lengths, whereas models based upon randomly permuted CAG repeat lengths were not predictive. Predictions from different batches of mRNA for the same cell lines were significantly correlated, implying that CAG length-correlated gene expression is reproducible. Notably, HTT expression was not itself correlated with HTT CAG repeat length. Taken together, these findings confirm the concept of a gene expression signature that represents the continuous effect of HTT CAG length and is not primarily dependent on the level of huntingtin expression. Such global and unbiased approaches, applied to additional cell types and tissues, may facilitate the discovery of therapies for HD by providing a comprehensive view of the molecular changes triggered by HTT CAG repeat length, for use in screening for and testing compounds that reverse the effects of the HTT CAG expansion.
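The train/test procedure described above can be illustrated with a minimal sketch. The description does not specify the model form, the number of genes, the split proportions, or the number of iterations, so everything below is an assumption made for illustration: the function names (`top_correlated_genes`, `predict_cag`), the use of ridge regression, all parameter values, and the synthetic data, which merely stand in for the 107 cell lines. It is not the procedure used in the study.

```python
import numpy as np


def top_correlated_genes(expr, cag, n_genes):
    """Indices of the n_genes with the largest |Pearson r| between expression and CAG."""
    expr_c = expr - expr.mean(axis=0)
    cag_c = cag - cag.mean()
    denom = np.sqrt((expr_c ** 2).sum(axis=0) * (cag_c ** 2).sum())
    # Constant genes get r = 0 instead of a division-by-zero warning.
    r = (expr_c.T @ cag_c) / np.where(denom == 0, np.inf, denom)
    return np.argsort(-np.abs(r))[:n_genes]


def predict_cag(expr, cag, n_iter=100, test_frac=0.3, n_genes=20, ridge=1.0, seed=0):
    """Repeated random train/test splits; returns the mean held-out prediction per sample.

    Gene selection and model fitting use the training samples only.
    Ridge regression is an assumed stand-in for the unspecified model.
    """
    rng = np.random.default_rng(seed)
    n = len(cag)
    n_test = max(1, int(round(test_frac * n)))
    sums, counts = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        order = rng.permutation(n)
        test, train = order[:n_test], order[n_test:]
        genes = top_correlated_genes(expr[train], cag[train], n_genes)
        # Standardize with training-set statistics only.
        mu = expr[train][:, genes].mean(axis=0)
        sd = expr[train][:, genes].std(axis=0)
        sd[sd == 0] = 1.0
        x_train = (expr[train][:, genes] - mu) / sd
        x_test = (expr[test][:, genes] - mu) / sd
        y_mean = cag[train].mean()
        # Closed-form ridge solution: (X'X + lambda I) w = X'(y - mean(y)).
        a = x_train.T @ x_train + ridge * np.eye(len(genes))
        w = np.linalg.solve(a, x_train.T @ (cag[train] - y_mean))
        sums[test] += x_test @ w + y_mean
        counts[test] += 1
    pred = np.full(n, np.nan)
    seen = counts > 0
    pred[seen] = sums[seen] / counts[seen]
    return pred


def pearson(a, b):
    """Pearson correlation, ignoring samples that were never held out."""
    ok = ~np.isnan(a)
    return np.corrcoef(a[ok], b[ok])[0, 1]


# Synthetic stand-in data: 107 samples, 500 genes, 30 of which track CAG length.
rng = np.random.default_rng(1)
cag = rng.integers(15, 60, size=107).astype(float)
expr = rng.normal(size=(107, 500))
expr[:, :30] += 0.05 * (cag[:, None] - cag.mean())

# Models trained on the real CAG lengths versus on randomly permuted CAG lengths;
# both sets of predictions are compared against the real CAG lengths.
r_true = pearson(predict_cag(expr, cag), cag)
r_perm = pearson(predict_cag(expr, rng.permutation(cag)), cag)
print(f"real CAG: r = {r_true:.2f}; permuted CAG: r = {r_perm:.2f}")
```

On data with a planted length-dependent signal, predictions from models trained on the real repeat lengths should track the measured lengths far more closely than predictions from models trained on permuted lengths, which is the contrast the permutation control is meant to expose.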