[BioC] statistical test for time course data

Mon Feb 18 19:52:16 CET 2013

Q2) Can anyone explain for me the meaning of (~0+f) in
> design <- model.matrix(~0+f)

"~0+f" removes the intercept term from the model. For a numeric
covariate this forces the fitted line through the origin (i.e. 0):
normally, when fitting a linear model, the algorithm (be it the normal
equations, gradient descent or whatever) finds the line of best fit by
adjusting both the slope and the intercept, whereas without an
intercept only the slope is adjusted. Here, though, f is a factor, so
dropping the intercept means that the design matrix gets one column
per level of f, and each fitted coefficient is simply the mean of that
group rather than a difference from a baseline group.

This is a little difficult to get your head around if you don't have a
very solid understanding of statistical modelling. What would be
really cool is if the limma documentation listed some resources that
would allow somebody who is coming at limma from a biology or computer
science background to wrap their head around the fundamentals of
statistical modelling and really understand what they are doing with
the package. Just a suggestion, you guys have of course already done
an amazing job with limma and the documentation!!
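
To make the effect of "~0+f" concrete, here is a minimal sketch in base R.
The factor levels and the numbers in y are made up for illustration and are
not from anyone's data in this thread.

```r
# Toy factor with three time points, two samples each (hypothetical data).
f <- factor(c("t0", "t0", "t6", "t6", "t24", "t24"),
            levels = c("t0", "t6", "t24"))

# With an intercept: the first column is the baseline level (t0) and the
# remaining columns are differences from that baseline.
design_int <- model.matrix(~f)
colnames(design_int)

# Without an intercept: one indicator column per level of f.
design_noint <- model.matrix(~0 + f)
colnames(design_noint)

# On made-up numbers, the no-intercept coefficients are the group means.
y <- c(1, 3, 5, 7, 10, 12)
coef_noint <- coef(lm(y ~ 0 + f))
group_means <- tapply(y, f, mean)
```

Comparing coef_noint with group_means shows the same three values, which is
why this parametrisation is convenient when you later want to write
contrasts directly between groups.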

P.S. One very useful resource is Andrew Ng's "Machine Learning" course
on Coursera; it explains linear models very well indeed.

Paul.

On Mon, Feb 18, 2013 at 11:53 AM, Richard Friedman
<friedman at cancercenter.columbia.edu> wrote:
>
> On Feb 13, 2013, at 7:24 AM, chris Jhon wrote:
>
>> ---------- Forwarded message ----------
>> From: chris Jhon <cjhon217 at gmail.com>
>> Date: Wed, Feb 13, 2013 at 9:23 PM
>> Subject: Re: [BioC] statistical test for time course data
>> To: Richard Friedman <friedman at cancercenter.columbia.edu>
>>
>>
>> Hi,
>>
>> Thank you, Richard, for your help.
>> My data look like this table:
>>
>> Time  number
>> 0hr     #
>> 6hr     #
>> 24hr   #
>>
>> I tried to follow the example in the User's Guide, as Richard suggested,
>> but I have the following questions.
>> In the User's Guide:
>> ***************
>>> lev <- c("wt.0hr","wt.6hr","wt.24hr","mu.0hr","mu.6hr","mu.24hr")
>>> f <- factor(targets$Target, levels=lev)
>>> design <- model.matrix(~0+f)
>>> colnames(design) <- lev
>>> fit <- lmFit(eset, design)
>> ***************
>>
>
> Chris,
>
>> Q1) What about eset? At this stage I would like to test the statistical
>> significance of the differences between the numbers in the second column,
>> which are the numbers of expressed genes. Shall I replace eset with
>> mydata$number?
>>
>> When I tried that, I got the following error:
>> Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric
>
> eset does not hold the number of expressed genes. It contains the expression
> data itself: one row per gene (or probe) and one column per array.
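
One plausible way to end up with that error message is a non-numeric matrix
reaching rowMeans(). The sketch below only uses base R; mydata and its values
are hypothetical stand-ins for the table in the question, and I am assuming
the counts were read in as text.

```r
# Hypothetical stand-in for the table in the question: counts stored as text.
mydata <- data.frame(Time = c("0hr", "6hr", "24hr"),
                     number = c("120", "340", "95"),
                     stringsAsFactors = FALSE)

# Converting the data frame to a matrix gives a character matrix, and
# rowMeans() refuses anything that is not numeric.
bad <- as.matrix(mydata)
err <- tryCatch(rowMeans(bad, na.rm = TRUE),
                error = function(e) conditionMessage(e))

# What a per-gene linear model fit expects instead: a numeric matrix with
# genes in rows and samples in columns (the values here are made up).
exprs_like <- matrix(c(7.1, 7.3, 8.0, 8.2,
                       5.0, 5.1, 4.9, 5.2),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"),
                                     c("s1", "s2", "s3", "s4")))
ok <- rowMeans(exprs_like, na.rm = TRUE)
```

Here err holds the error text from the failed call, while ok holds one mean
per gene from the numeric matrix.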
>
>
>>
>> Q2) Can anyone explain for me the meaning of (~0+f) in
>> design <- model.matrix(~0+f)
>
> I myself am vague on this point, but typing
>
> design
>
> will give you your design matrix.
>
>
>>
>> Q3) How do I design different matrices for different conditions? Can anyone
>> send me a tutorial on this?
>>
>
>
> In the targets file, label every time point except the one to be singled out
> as a, and label that remaining time point b.
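
The a/b labelling described above could look roughly like this in base R.
The targets table, its Time column, and the choice of "0hr" as the
singled-out point are all hypothetical, and I have assumed two replicates
per time point purely for illustration.

```r
# Hypothetical targets table: three time points, two replicates each.
targets <- data.frame(Time = c("0hr", "0hr", "6hr", "6hr", "24hr", "24hr"))

# Single out one time point as "b"; every other time point is labelled "a".
single <- "0hr"
group <- factor(ifelse(targets$Time == single, "b", "a"),
                levels = c("a", "b"))

# Two-column design: the mean of group "a" and the b-minus-a difference.
design <- model.matrix(~group)
colnames(design)
```

The second coefficient of a fit with this design is then the difference
between the singled-out time point and all the others pooled together.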
>
> You can send me your targets file when you do this (send to the list as well).
>
> With hopes that this helps,
> Rich
>
>> Thank you very much in advance.
>>
>>
>>
>>
>> On Wed, Feb 6, 2013 at 10:30 PM, Richard Friedman <
>> friedman at cancercenter.columbia.edu> wrote:
>>
>>> Dear Chris,
>>>
>>> For the questions you are asking I recommend not using splines.
>>> For the comparison of t1 vs t2, use a design matrix which treats every
>>> time point as a separate group, and then test t2 vs t1. For t1 compared
>>> to all other points, I would label t1 A, and all other points B.
>>>
>>> If anyone on the list has a different opinion in the matter I would
>>> appreciate hearing from them.
>>>
>>> With hopes that this helps,
>>> Rich
>>>
>>>
>>>
>>> On Feb 5, 2013, at 10:09 AM, chris Jhon wrote:
>>>
>>> Hi All,
>>>
>>> Thank you Gordon and Richard very much.
>>>
>>> In my data, for each time point I have the number of expressed genes, and I
>>> would like to find out, using a statistical test, whether the number of
>>> expressed genes at t1 differs from the number at t2, or from the numbers at
>>> all other time points.
>>>
>>> The data look like this:
>>>
>>> time             t1  t2  ...  tn
>>> expressed genes  #   #   ...  #
>>>
>>> I have only one group. Shall I use the same design matrix? Shall I use df=5
>>> as in the example?
>>>
>>>
>>> Best Regards,
>>> Chris
>>>
>>> On Tue, Feb 5, 2013 at 12:02 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>>> Dear Rich,
>>>>
>>>> I have added a time course example using splines to the limma User's
>>>> Guide, see page 48:
>>>>
>>>> http://bioconductor.org/packages/2.12/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>> ------------------ original message ------------------
>>>> [BioC] statistical test for time course data
>>>> Richard Friedman friedman at cancercenter.columbia.edu
>>>> Sun Feb 3 20:18:03 CET 2013
>>>>
>>>> Dear Gordon,
>>>>
>>>> Thank you very much for the clarification. Now that I think of it, the
>>>> one-against-all comparison is straightforward. However, if there are any
>>>> worked examples you could point me towards for polynomial and spline
>>>> modeling of the time series, I would greatly appreciate it. I am especially
>>>> interested in testing the hypothesis that the temporal behavior of two
>>>> treatments differs.
>>>>
>>>> Best wishes,
>>>> Rich
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>>
>>

--
Dr. Paul Geeleher, PhD (Bioinformatics)
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St.,
KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com