[BioC] DEXSeq continous variable in model

Alex Gutteridge alexg at ruggedtextile.com
Fri Aug 1 09:47:14 CEST 2014


On 30-07-2014 10:13, Alex Gutteridge wrote:
> On 29-07-2014 10:00, Alex Gutteridge wrote:
>> Hi,
>> 
>> I have a time course RNASeq experiment and I'd like to detect DEU
>> between early and late stages. I am trying to use 'Age' as a
>> continuous variable in my design, but I'm getting an error which I
>> suspect is related to this e.g:
>> 
>>> summary(sample.data.ipsc[,1:6])
>>     Sample          CellType     Subtype       Donor         Age
>>  Length:28          iPSC:28   iPSC   :28   4555   :28   Min.   : 2.000
>>  Class :character             HBR    : 0   D4721  : 0   1st Qu.: 4.000
>>  Mode  :character             L2     : 0   D4749  : 0   Median : 8.000
>>                               L3     : 0   D6002  : 0   Mean   : 7.821
>>                               L4     : 0   D6005  : 0   3rd Qu.:10.500
>>                               L5     : 0   D6008  : 0   Max.   :14.000
>>                               (Other): 0   (Other): 0
>>  Passage
>>  42:3
>>  48:7
>>  49:5
>>  52:6
>>  56:7
>> 
>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>                                sampleData=sample.data.ipsc,
>>                                design= ~ sample + exon + Age:exon,
>>                                flattenedfile=flattenedFile)
>> 
>>> BPPARAM = MulticoreParam(workers=24)
>> 
>>> dxd = estimateSizeFactors(dxd)
>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", 
>>> BPPARAM=BPPARAM)
>> Error: 8346 errors; first error:
>>   Error in FUN(1:3[[1L]], ...): Non-factor in model frame
>> 
>> For more information, use bplasterror(). To resume calculation, 
>> re-call
>>   the function and set the argument 'BPRESUME' to TRUE or wrap the
>>   previous call in bpresume().
>> 
>> First traceback:
>>   31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM = 
>> BPPARAM)
>>   30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>   29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>   28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed,
>>           mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM),
>>           mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal else 
>> FALSE)
>>   27: lapply(seq_len(cores), inner.do)
>>   26: lapply(seq_len(cores), inner.do)
>>   25: FUN(1:24[[1L]], ...)
>>   24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
>>   23: parallel:::sendMaster(...)
>>   22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>>   21: tryCatch(expr, error = function(e)
>> 
>> Can DEXSeq accept a model with a continuous variable? Does it make
>> sense to do so? (I do the same thing with DESeq2 to detected DE and it
>> works fine). Is this error due to that? Note that all the other steps
>> seem to run fine and I can get results (though I don't have many
>> significant hits - not sure if this is related or not). If not what is
>> best practice? Just split the data set into 'early' and 'late' samples
>> and run that as a factor?
>> 
>> Alex Gutteridge
> 
> To sort of answer my own question: cutting the age vector into two
> bins (early <8 weeks and late >8 weeks) and using that factor seems to
> work and give many significant hits:
> 
>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2)
> 
>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>                                sampleData=sample.data.ipsc,
>                                design= ~ sample + exon + AgeBin:exon,
>                                flattenedfile=flattenedFile)
> 
>> BPPARAM = MulticoreParam(workers=12)
> 
>> dxd = estimateSizeFactors(dxd)
>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
> 
> Which makes me think that DEXSeq wasn't doing what I hoped with the
> previous analysis (quite apart from the error). Is there any way to
> rejig the design to cope with a continuous variable as opposed to a
> factor? I *think* the underlying biological question makes sense: Are
> there exons showing DEU that correlates with time?
> 
> Alex Gutteridge

I'm just going to bump this question one last time as I'd love to know 
if there's a better solution to what I have now. In short: I have a 
reasonable detailed time course RNASeq experiment (8 timepoints; 2-14 
weeks; 3-4 replicates at all time points except one) and would like to 
detect time dependent DEU using DEXSeq. Slicing the timecourse into two 
bins (at 8 weeks) and doing a straight early vs late comparison works 
and detects plenty of interesting stuff. Trying to model age as a 
continuous variable gives an error when determining fold changes and 
doesn't report any significant DEU.

See first plot here for an example (exons 27/28 show strong DEU 
comparing <8 weeks to >8 week samples) shown with a DEXSeq plot:

http://dx.doi.org/10.6084/m9.figshare.1124172

But the 8 week split is entirely arbitrary, really time (Age) is a 
continuous variable and so I'd like to model it as such (maybe this is 
just OCD on my behalf, but I'm sure I must also be loosing some power by 
crudely splitting the timecourse like this; also DESeq2 lets me model 
the genewise DE with a continuous variable).

The second plot from the figshare link shows the RPKMs of those two 
exons (27/28). On top of a background of increased gene-wise expression 
there is clearly a time dependent effect on the relative usage of these 
two exons, which it must be possible to model better than a crude split 
at 8w (black dashed line). Is there any trick to do this with DEXSeq?

Alex Gutteridge



More information about the Bioconductor mailing list