# [BioC] detecting oscillating genes from RNA-seq data

Gordon K Smyth smyth at wehi.EDU.AU
Fri Apr 4 01:19:23 CEST 2014

```On Thu, 3 Apr 2014, mali salmon wrote:

> Dear Prof. Smith
> Thank you so much for the explanation of how to do the analysis.
> If you don't mind I'll describe the design of our experiment just to be
> sure that I'm not doing any mistake.
> We work on a sea animal, and we are interested with day/night oscillating
> genes (24 hours cycle) and with tidal genes (12.4 h cycle).
> In the first week we have sacrified animals every 4 hours across 48 h, and
> then repeated the same procedure the week after.
> So in total we have 24 samples, 4 "replicates" of 6 time points
> (2am,6am,10am,2pm,6pm,10pm)
> From your mail I understand that time is a vector of time points in hours,
> but I'm not sure how to define it, it can be:
> 1) time <- rep(seq(2,46,4),2)
> 2) time <- seq(2,94,4)
> 3) time <- rep(seq(2,22,4),4)

Makes no difference.  All three choices will give exactly the same
results.

> I saw this matter for the cos vector.
> I suppose that for detecting tidal genes I just have to repeat the analysis
> with 12.4 cycle.

You need to carefully identify the tidal stage applicable to each of the
24 samples.  Clearly the tidal phase will have moved in the second week.
For example you might code 'tide' to be between 0 and 1, with 0/1 for high
tide and 0.5 for low tide.  Then:

tides <- sin(2*pi*tide)
tidec <- cos(2*pi*tide)

Then you need to fit both day/night and tidal cycles to your data at the
same time:

~s1+c1+tides+tidec

It may be difficult to distinguish a tidal cycle from a 12-hour cycle.
You might try to check check this by:

~s1+c1+s2+c2+tides+tidec

Gordon

> Thanks again
> Mali
>
>
> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Mali,
>>
>> It is easy to do such an analysis directly in edgeR.
>>
>> Suppose for example that you are looking for a daily cycle, and that you
>> have at least half a dozen time points spanning more than a full day.  The
>> method is to create basis vectors for a periodic expression pattern.
>> Suppose that 'time' is time in hours:
>>
>>   s1 <- sin(2*pi*time/24)
>>   c1 <- cos(2*pi*time/24)
>>   design <- model.matrix(~s1+c1)
>>
>> This will give a design matrix with three columns: one for the intercept
>> and two for the periodic signal.  To test for a periodic cycle:
>>
>>   fit <- glmFit(y,design)
>>   lrt <- glmLRT(fit,coef=2:3)
>>
>> If you have lots of time points and you want to make a more complex curve
>> you can add a harmonic:
>>
>>   s2 <- sin(4*pi*time/24)
>>   c2 <- cos(4*pi*time/24)
>>   design <- model.matrix(~s1+c1+s2+c2)
>>
>> There are now four coefficients parametrizing the periodic curve.  (These
>> can be viewed as representing amplitudes and phase shifts for the two
>> harmonics.)  I think you would seldom want more than two harmonics for
>> RNA-seq data.
>>
>> This method can easily be adapted to find cell-cycle genes and so on.
>>
>> Best wishes
>> Gordon
>>
>>
>>  Date: Tue, 1 Apr 2014 14:33:27 +0200
>>> From: mali salmon <shalmom1 at gmail.com>
>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>> Subject: [BioC] detecting oscillating genes from RNA-seq data
>>>
>>> Hello List
>>> I have RNA-seq data from different time points and I would like to find
>>> oscillating genes.
>>> I thought of using the "cycle" package (which is based on Fourier score) ,
>>> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized
>>> values.
>>> Any suggestion what would be more appropriate?
>>> Thanks
>>> Mali
>>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the