[BioC] LIMMA/VOOM: Using and interpreting natural spline coefficients for time series

Fri Mar 14 08:20:34 CET 2014

Ryan,
Thanks for your answer. By DE at 48h, I meant DE between the treatment group at 48h and the control group at 48h (the control group also receives stimulation, making it an active process). Following your notation, that would be:
3/4*(mouseTreatedTime50-mouseControlTime50)+1/4*(mouseTreatedTime42-mouseControlTime42)

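(I think you inverted the weights in your interpolation: 48h is 3/4 of the way from 42h to 50h, so the 50h time point should get the weight 3/4 and the 42h time point the weight 1/4.)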
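
The arithmetic behind these weights can be checked with a few lines of R. This is only a sketch of the interpolation itself; the variable names (t_lo, t_hi, t_target) are made up for the illustration and are not part of any limma object.

```r
# Linear interpolation weights for an unmeasured time point (48h)
# lying between two measured ones (42h and 50h).
t_lo     <- 42
t_hi     <- 50
t_target <- 48

w_hi <- (t_target - t_lo) / (t_hi - t_lo)  # weight on the 50h samples
w_lo <- 1 - w_hi                           # weight on the 42h samples

weights <- c(Time42 = w_lo, Time50 = w_hi)
print(weights)

# Sanity check: the weighted mean of the two time points recovers the target.
stopifnot(isTRUE(all.equal(sum(weights * c(t_lo, t_hi)), t_target)))
```

The time point closer to the target gets the larger weight, which is why 50h carries 3/4 here.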

Instead of a linear interpolation between two successive time points, would it make sense to perform smoothing using time point weights derived from e.g. a Gaussian kernel? 
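
To make the question concrete, here is a rough sketch of what I have in mind. The list of time points and the bandwidth are assumptions chosen for illustration (only 0h, 42h and 50h are actual mouse time points mentioned in this thread), and the coefficient names follow the notation used above.

```r
# Hypothetical mouse time points (hours); only 0, 42 and 50 come from this thread.
times    <- c(0, 6, 12, 24, 42, 50, 72)
t_target <- 48
bw       <- 4   # kernel bandwidth in hours; an arbitrary choice for this sketch

# Gaussian kernel centred on the target time.
k    <- dnorm(times, mean = t_target, sd = bw)
keep <- k / sum(k) > 0.01        # drop time points with negligible weight
w    <- k[keep] / sum(k[keep])   # renormalise so the kept weights sum to 1
names(w) <- paste0("Time", times[keep])
print(round(w, 3))

# Build a contrast string in the same style as the interpolation above.
terms <- sprintf("%.3f*(mouseTreatedTime%d-mouseControlTime%d)",
                 w, times[keep], times[keep])
contrast <- paste(terms, collapse = " + ")
print(contrast)
```

A string like this could presumably be passed to makeContrasts() in the same way as the hand-written contrast. Unlike linear interpolation, the result depends on the bandwidth, and with time points this sparse a narrow kernel ends up weighting only the nearest neighbours anyway.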

I've kept the human and mouse samples separate at this stage, as it's a bit harder to match microarray probes across two different platforms and organisms. It's true that gene-wise measurements with RNA-Seq would have made a multi-species count matrix doable (as suggested by your reference to "count" :))

Thanks,
Sam. 

> On 13 Mar 2014, at 20:57, "Ryan C. Thompson" <rct at thompsonclan.org> wrote:
> 
> I'm not very familiar with using natural splines for differential expression, but I think you need to be a little more precise about what you want. What do you mean by "genes DE at the 48h time point"? DE relative to what? If you mean genes DE between 0h and 48h, then maybe you want to select whatever linear combination of basis splines yields a minimum at 0h and a maximum at 48h and test that. This is just a guess, though.
> 
> As an alternative strategy, you could linearly interpolate the 48h mouse timepoint as the weighted mean of the neighboring time points, i.e. "3/4 * MouseTime42 + 1/4 * MouseTime50", since 48 is 3/4 of the way from 42 to 50. So then the interaction contrast would be something like:
> 
> "((3/4 * MouseTime42 + 1/4 * MouseTime50) - MouseTime0) - (HumanTime48 - HumanTime0)"
> 
> By the way, this all assumes that you have a count matrix that includes only orthologous gene pairs between mouse and human.
> 
>> On Thu 13 Mar 2014 10:40:25 AM PDT, Hayssam [guest] wrote:
>> 
>> Hello,
>> Short summary version: How to interpolate a micro-array time series to get differentially expressed genes at a time point that was not measured?
>> 
>> Long version:
>> I'm analyzing a two-species (human and mouse), two-group (control and treatment) time series (9 time points) micro-array experiment using limma.
>> For each species, I can contrast and test for differential expression (hereafter DE) between the two groups at specific time points without problems. The parametrization I chose is treatment:time and I then built my contrasts manually for the time points of interest.
>> 
>> I would now like to compare the differentially expressed genes at a given time point between the two species. The strategy would be to call for DE for each species at the time point of interest and then use homology information to determine whether a pair of homologous genes is DE.
>> 
>> The problem I face is that some of the experimental time points for the two species do not match (that's a retrospective study unfortunately). As an example I have a 48h sample for human, and 42h and 50h samples for mouse, and I would like to identify genes DE at the 48h time point in mouse.
>> 
>> I'm trying to handle this by using natural splines (with a spline_basis:treatment parametrization). Would that be the way to go?
>> Once doing so, I can contrast by considering all the interaction terms to determine the general differences between the two groups (thanks for the limma user's guide section on that!).
>> But how can I test for a specific time point with the spline parametrization?
>> 
>> Finally, how can I interpret the results of the topTable output under the spline parametrization? I do see an estimate of logFC (when I restrict to a single interaction coefficient) and of AveExpr, but they do not convey any meaning with respect to the logged probe intensities. As an example, the logFC is negative, while the treated group is undoubtedly above the control group. Looking at the coefficients of the fit, it seems to me that the logFC returned by topTable is simply the coefficient of the interaction term, which doesn't match my expectation of a logFC.
>> 
>> Thanks for your help,
>> Sam.
>> 
>>  -- output of sessionInfo():
>> 
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>> 
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> 
>> attached base packages:
>>  [1] splines   grid      parallel  stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>>  [1] BioNet_1.23.2             RBGL_1.38.0               graph_1.40.1              plyr_1.8.1                gridExtra_0.9.1           illuminaHumanv3.db_1.20.0 org.Hs.eg.db_2.10.1
>>  [8] RSQLite_0.11.4            DBI_0.2-7                 AnnotationDbi_1.24.0      statmod_1.4.18            limma_3.18.13             GEOquery_2.28.0           Biobase_2.22.0
>> [15] BiocGenerics_0.8.0        reshape2_1.2.2            ggplot2_0.9.3.1           data.table_1.9.2
>> 
>> loaded via a namespace (and not attached):
>>  [1] AnnotationForge_1.4.4 colorspace_1.2-4      dichromat_2.0-0       digest_0.6.4          gtable_0.1.2          igraph_0.7.0          IRanges_1.20.7        labeling_0.2
>>  [9] MASS_7.3-29           munsell_0.4.2         proto_0.3-10          RColorBrewer_1.0-5    Rcpp_0.11.0           RCurl_1.95-4.1        scales_0.2.3          stats4_3.0.1
>> [17] stringr_0.6.2         tools_3.0.1           XML_3.95-0.2