[BioC] DEXSeq continous variable in model

Alejandro Reyes alejandro.reyes at embl.de
Sun Aug 3 19:39:12 CEST 2014


Hi Alex,

Yes, I think that the testing using DEXSeq should also work with the 
continuous model!

I had some discussions within the lab and there was a misunderstanding 
from my side with regards to the fold changes with continuous variables. 
Wolfgang (cc-ed) clarified to me that if f(x) is the expression level 
for condition x, the log fold change is then the derivate of f(x).  As 
you mention this is how DESeq2 deals with it, but the current 
implementation for DEXSeq does not deal with this at the moment.  I will 
work on this and keep you posted, in the meantime the breakdown that you 
mention is probably the best!

Alejandro

> Hi Alejandro,
>
> Thanks for the reply! For your first question I will have to get back 
> to you as I need to rerun DEXSeq with the continuous design and make 
> that comparison - it'll take a few hours. I assume from your reply 
> that the continuous variable *should* work in fact?
>
> For your second question I guess what I was expecting was to see the 
> fold change implied from the the model from a unit change in Age (in 
> this case 1 week). This is for instance how DESeq2 reports it. But I 
> agree when it comes to actually plotting then a factorial breakdown is 
> probably simplest.
>
> Alex Gutteridge
>
> On 01-08-2014 10:29, Alejandro Reyes wrote:
>> Dear Alex Gutteridge,
>>
>> Thanks for your detailed e-mails and sorry for the delay in the reply.
>>
>> About the fact that you do get significant hits if you specify factors
>> instead of your continuous variable, Is it a thresholding effect or
>> the exons that you detect when specifying the factorial design have
>> large p-values with the continuous design?
>>
>> About the exon fold changes error, this function does not support
>> continuous variables.  I am unsure how would a fold change be with a
>> continuous variable? I would maybe partition the "Age" into more than
>> 2 factors based on e.g. quantiles or simply make plots like the one
>> you showed!
>>
>> Bests,
>> Alejandro Reyes
>>
>>
>>
>>> On 30-07-2014 10:13, Alex Gutteridge wrote:
>>>> On 29-07-2014 10:00, Alex Gutteridge wrote:
>>>>> Hi,
>>>>>
>>>>> I have a time course RNASeq experiment and I'd like to detect DEU
>>>>> between early and late stages. I am trying to use 'Age' as a
>>>>> continuous variable in my design, but I'm getting an error which I
>>>>> suspect is related to this e.g:
>>>>>
>>>>>> summary(sample.data.ipsc[,1:6])
>>>>>     Sample          CellType     Subtype Donor         Age
>>>>>  Length:28          iPSC:28   iPSC   :28   4555   :28 Min.   : 2.000
>>>>>  Class :character             HBR    : 0   D4721  : 0 1st Qu.: 4.000
>>>>>  Mode  :character             L2     : 0   D4749  : 0 Median : 8.000
>>>>>                               L3     : 0   D6002  : 0 Mean   : 7.821
>>>>>                               L4     : 0   D6005  : 0 3rd Qu.:10.500
>>>>>                               L5     : 0   D6008  : 0 Max.   :14.000
>>>>>                               (Other): 0   (Other): 0
>>>>>  Passage
>>>>>  42:3
>>>>>  48:7
>>>>>  49:5
>>>>>  52:6
>>>>>  56:7
>>>>>
>>>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>>>> sampleData=sample.data.ipsc,
>>>>>                                design= ~ sample + exon + Age:exon,
>>>>> flattenedfile=flattenedFile)
>>>>>
>>>>>> BPPARAM = MulticoreParam(workers=24)
>>>>>
>>>>>> dxd = estimateSizeFactors(dxd)
>>>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>>>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", 
>>>>>> BPPARAM=BPPARAM)
>>>>> Error: 8346 errors; first error:
>>>>>   Error in FUN(1:3[[1L]], ...): Non-factor in model frame
>>>>>
>>>>> For more information, use bplasterror(). To resume calculation, 
>>>>> re-call
>>>>>   the function and set the argument 'BPRESUME' to TRUE or wrap the
>>>>>   previous call in bpresume().
>>>>>
>>>>> First traceback:
>>>>>   31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM =
>>>>> BPPARAM)
>>>>>   30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>>>>   29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>>>>   28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed,
>>>>>           mc.silent = !BPPARAM$verbose, mc.cores = 
>>>>> bpworkers(BPPARAM),
>>>>>           mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal
>>>>> else FALSE)
>>>>>   27: lapply(seq_len(cores), inner.do)
>>>>>   26: lapply(seq_len(cores), inner.do)
>>>>>   25: FUN(1:24[[1L]], ...)
>>>>>   24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
>>>>>   23: parallel:::sendMaster(...)
>>>>>   22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>>>>>   21: tryCatch(expr, error = function(e)
>>>>>
>>>>> Can DEXSeq accept a model with a continuous variable? Does it make
>>>>> sense to do so? (I do the same thing with DESeq2 to detected DE 
>>>>> and it
>>>>> works fine). Is this error due to that? Note that all the other steps
>>>>> seem to run fine and I can get results (though I don't have many
>>>>> significant hits - not sure if this is related or not). If not 
>>>>> what is
>>>>> best practice? Just split the data set into 'early' and 'late' 
>>>>> samples
>>>>> and run that as a factor?
>>>>>
>>>>> Alex Gutteridge
>>>>
>>>> To sort of answer my own question: cutting the age vector into two
>>>> bins (early <8 weeks and late >8 weeks) and using that factor seems to
>>>> work and give many significant hits:
>>>>
>>>>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2)
>>>>
>>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>>>                                sampleData=sample.data.ipsc,
>>>>                                design= ~ sample + exon + AgeBin:exon,
>>>>                                flattenedfile=flattenedFile)
>>>>
>>>>> BPPARAM = MulticoreParam(workers=12)
>>>>
>>>>> dxd = estimateSizeFactors(dxd)
>>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>>>
>>>> Which makes me think that DEXSeq wasn't doing what I hoped with the
>>>> previous analysis (quite apart from the error). Is there any way to
>>>> rejig the design to cope with a continuous variable as opposed to a
>>>> factor? I *think* the underlying biological question makes sense: Are
>>>> there exons showing DEU that correlates with time?
>>>>
>>>> Alex Gutteridge
>>>
>>> I'm just going to bump this question one last time as I'd love to know
>>> if there's a better solution to what I have now. In short: I have a
>>> reasonable detailed time course RNASeq experiment (8 timepoints; 2-14
>>> weeks; 3-4 replicates at all time points except one) and would like to
>>> detect time dependent DEU using DEXSeq. Slicing the timecourse into two
>>> bins (at 8 weeks) and doing a straight early vs late comparison works
>>> and detects plenty of interesting stuff. Trying to model age as a
>>> continuous variable gives an error when determining fold changes and
>>> doesn't report any significant DEU.
>>>
>>> See first plot here for an example (exons 27/28 show strong DEU
>>> comparing <8 weeks to >8 week samples) shown with a DEXSeq plot:
>>>
>>> http://dx.doi.org/10.6084/m9.figshare.1124172
>>>
>>> But the 8 week split is entirely arbitrary, really time (Age) is a
>>> continuous variable and so I'd like to model it as such (maybe this is
>>> just OCD on my behalf, but I'm sure I must also be loosing some 
>>> power by
>>> crudely splitting the timecourse like this; also DESeq2 lets me model
>>> the genewise DE with a continuous variable).
>>>
>>> The second plot from the figshare link shows the RPKMs of those two
>>> exons (27/28). On top of a background of increased gene-wise expression
>>> there is clearly a time dependent effect on the relative usage of these
>>> two exons, which it must be possible to model better than a crude split
>>> at 8w (black dashed line). Is there any trick to do this with DEXSeq?
>>>
>>> Alex Gutteridge
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list