Monday, July 9, 2007

DNA variation and brain region-specific expression profiles exhibit different relationships between inbred mouse strains: implications for eQTL

Hovatta et al.

The Salk Institute for Biological Studies, La Jolla, CA
Genome Biology
February 26, 2007 [PubMed]

This paper was published in an open access journal and can be read in full without a subscription.

The Hovatta et al. paper appears to be another bioinformatic exercise investigating expression quantitative trait locus (eQTL) mapping. It uses data that the authors had previously published, and no new biological data were added in support of this work. The two main conclusions of the paper, that strain-specific expression differences are enriched for cis eQTLs and that different regions of the brain show distinct clustering of expression patterns, are, respectively, unsupported and unoriginal. The main strength of this paper is that it is one of the first to show the viability of applying eQTL analyses to data collected from inbred strains of mice, although this has been done more thoroughly before.

Using data from 5 regions of the brain (bed nucleus of the stria terminalis, hippocampus, hypothalamus, periaqueductal gray, and pituitary gland) in 6 inbred strains (129S6/SvEvTac, A/J, C3H/HeJ, C57BL/6J, DBA/2J, and FVB/NJ), the authors used a simple ANOVA to identify genes whose expression varied significantly by either strain or brain region. Five percent of the genes showed strain-specific expression variation and 44% showed brain region-specific expression. Dendrograms showed a correlation between the genetic history of the strains and their averaged expression patterns.
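The per-gene test described above can be sketched as a one-way ANOVA F statistic. This is only a rough illustration. The exact model the authors fitted is not described in this review, so grouping by strain alone is an assumption, and the function name and expression values are made up.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values.

    Each group holds the expression values of one gene in one strain
    (hypothetical layout). Raises ZeroDivisionError if there is no
    within-group variation at all.
    """
    k = len(groups)                      # number of groups (strains)
    n = sum(len(g) for g in groups)      # total number of samples
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up expression values for one gene in three strains.
example_gene = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_f(example_gene)
```

A large F value means the strain means differ more than the within-strain noise would suggest. Turning F into a p-value, and correcting for the number of genes tested, would need an F distribution and a multiple-testing procedure that are left out of this sketch.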

The worst part of the paper was the contention that cis eQTLs are enriched in inbred strains. This statement may be true, but the authors provide no justification for it here. The origin of this hypothesis is unclear. It seems that it is based on the idea that only a small fraction of genes were found to be strain-specific and that cis eQTLs only affect single genes. This last notion presupposes that the cis eQTL gene does not affect the transcription of any other gene in the genome, which is wrong. In fact, many of the most prominent trans eQTLs have been found to also contain a cis eQTL that likely initiated the cascade of transcriptional dysregulation that resulted in the trans band.

To support this claim, they compared the percentage of cis eQTLs found in the genes identified as having significant strain-specific differences with the percentage found in the genes that had significant brain region differences. This is an apples-to-oranges comparison without any real meaning. To generate eQTLs, you compare strain-specific differences in expression to strain-specific sequence differences. The strain-specific gene collection is therefore primed with all of the transcripts that could potentially produce eQTLs. The region-specific gene collection is just a random collection as far as eQTL analysis is concerned; most of its genes have no expression differences across the strains and therefore have no ability to produce an eQTL. So, when they express cis eQTL counts for each of the two collections as percentages of the number of genes within that collection, it is misleading. Holding up the fact that 48% of the strain-specific genes produce cis eQTL associations vs. 10% of the region-specific set as proof of their hypothesis is disingenuous, especially if only 20% or less of the region-specific set had the capacity to produce an eQTL.
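The denominator problem can be made concrete with a little arithmetic. The 48% and 10% figures come from the paper as quoted above. The 20% eligible fraction is only the hypothetical used in the argument, not a measured value. The sketch also assumes that every strain-specific gene is eligible and that all cis eQTLs in the region-specific set come from its eligible subset.

```python
def cis_rate_among_eligible(cis_fraction_of_set, eligible_fraction_of_set):
    """Rescale a cis eQTL rate so the denominator is only the genes
    that vary across strains and could therefore produce an eQTL."""
    if not 0 < eligible_fraction_of_set <= 1:
        raise ValueError("eligible fraction must be in (0, 1]")
    return cis_fraction_of_set / eligible_fraction_of_set


# Strain-specific set: assumed fully eligible (every gene varies by strain).
strain_rate = cis_rate_among_eligible(0.48, 1.0)

# Region-specific set: 10% cis eQTLs overall, hypothetical 20% eligible.
region_rate = cis_rate_among_eligible(0.10, 0.20)

print(f"strain-specific: {strain_rate:.0%}, region-specific: {region_rate:.0%}")
```

Under these assumptions the two rates come out at 48% and 50%, so the apparent fivefold enrichment disappears once both sets are measured against genes that could actually yield an eQTL.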

In conclusion, the only value that this paper provides is proof that eQTL analysis can be done with data from inbred strains. It would have been great if they had shared the cis and trans eQTLs that they mapped, but they failed to do so.

Other notes from the paper

They define a cis eQTL as an association between data from an Affymetrix probeset and a SNP that map within 4 Mb of each other.

SNPs located within probesets can create data that look like a cis eQTL but in reality reflect differential hybridization with the probe. Instead of doing a sequence search to identify problem probesets, the authors used a custom-built algorithm to guess at the presence of probeset SNPs.

They correctly note that cis eQTLs are more trustworthy than trans eQTLs because the likelihood that the associated SNP and the probeset would both fall into the same limited space of the genome by chance is small.

They cite a paper that places the heritability of expression variation at a low 0.34.

Most of the data analysis was done using software from Teragenomics.
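
As an illustration of the cis definition noted above (probeset and SNP within 4 Mb of each other), a classification rule might look like the following. This is a hedged sketch, not the authors' code. The function name, the use of a single base-pair position per probeset, and the inclusive boundary are all assumptions.

```python
CIS_WINDOW_BP = 4_000_000  # 4 Mb window, per the paper's definition


def classify_eqtl(probe_chrom, probe_pos, snp_chrom, snp_pos,
                  window_bp=CIS_WINDOW_BP):
    """Label an association "cis" if the SNP lies on the same chromosome
    as the probeset and within window_bp of it; otherwise "trans".

    Positions are single base-pair coordinates (an assumption; a real
    probeset spans a range). The boundary is treated as inclusive.
    """
    same_chromosome = probe_chrom == snp_chrom
    if same_chromosome and abs(probe_pos - snp_pos) <= window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
near = classify_eqtl("chr1", 10_000_000, "chr1", 12_500_000)
far = classify_eqtl("chr1", 10_000_000, "chr1", 15_000_000)
other = classify_eqtl("chr1", 10_000_000, "chr2", 10_000_000)
```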

Tuesday, June 12, 2007

An integrative genomics strategy for systematic characterization of genetic loci modulating phenotypes

Bao et al.
University of Tennessee Health Science Center, Memphis, TN
Human Molecular Genetics
April 11, 2007 [PubMed]

This paper is largely a bioinformatic exercise. All of the biological data had been previously generated by others, and the conclusions were not confirmed with any actual biology, but the work is still interesting from a methodological standpoint. The authors use expression QTL mapping to extend the biologically significant findings from knockout mice and other mapped gene x phenotype interactions. This work was done using the recombinant inbred line BxD (B6xDBA), but the approach could easily be extended to inbred strains.

Bao et al. first used the Mammalian Phenotype Ontology to identify genes that had previously been shown to regulate neurological phenotypes. There were 630 genes from this list that were also present on the Affymetrix M430 microarray. Brain-wide expression variation across 42 BxD lines for these 630 genes was inspected for correlation with genetic variation genome-wide. Expression variation for 40 of these genes mapped back to 53 trans regulatory regions (some genes had more than one trans region). The association was most likely done with the WebQTL mapping tool, although the paper does not explicitly state this anywhere. To identify candidate trans regulators, two methods were used.

The first method was to search the candidate region for genes known to have missense mutations between the B6 and DBA strains. This points to Adcy2 as a regulator of Ntsr2, and there is literature that links them to the same biology, in this case nociception. Transcription factors within the trans peak with missense SNPs were of particular interest because they have a direct method of modifying expression. Three of the initial 630 genes had a trans association with the region containing the Stat4 transcription factor, and all three of those genes had Stat4 binding sites in their promoters, likely confirming the association. The second method was to look for cis eQTLs within the trans region. The logic is that this will detect functional SNPs in the regulatory region of the cis gene, and if the variation in expression of the cis gene matches the expression variation of the downstream gene that produced the original trans association, the two genes are likely linked in the same pathway. This approach linked the expression of Myo7 to Myo6 and Ttc8 to Bbs4. There is literature that supports these associations.
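
The second method depends on checking whether the cis gene's expression tracks the downstream gene's expression across the BxD lines. One simple way to express that check is a Pearson correlation, sketched below. The review does not say which statistic Bao et al. used, so the choice of Pearson is an assumption, and the expression values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of values,
    e.g. per-line expression of a cis gene and of a trans target."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Invented expression values across four hypothetical BxD lines.
cis_gene_expression = [1.0, 2.0, 3.0, 4.0]
target_gene_expression = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(cis_gene_expression, target_gene_expression)
```

A correlation near +1 or -1 is consistent with the two genes sharing a pathway, but on its own it does not show which gene regulates the other.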

Almost as an afterthought, they attempted to correlate gene expression variation with phenotypic variation using 67 behavioral or neurological phenotypes that had previously been defined for BxD. One correlation, Ank2 with cocaine-induced activity in the open field, was corroborated by previously published data. This type of association is not novel and has been done previously to much better effect.

Other useful items from the paper

Genetical Genomics - a term coined by Robert Williams to refer to the methods for the study of the genetics of gene expression on a genome-wide scale.

They estimate that 25% of transcripts are highly heritable, which they define as heritability > 0.5.

They provide references for the public effort to generate a knockout mouse for every gene, but I am not sure that they refer directly to the effort currently underway.

They recognize the difficulty of traditional QTL methods, noting that the phenotypes they used have not resulted in the identification of quantitative trait genes (QTGs).

Paraphrasing Schadt et al., they give a nice breakdown of the reasons one might see an association between gene expression and a phenotype. To quote from Bao et al.:
(i) causal model, where the common QTL acts on the gene expression trait and the gene regulates the phenotype trait; (ii) reactive model, where the common QTL acts on the phenotype trait and the gene expression trait is reactive to the phenotype and (iii) independent model, where the common QTL acts on the expression trait and phenotype trait independently.
Whole proteome datasets are not yet available.