[BioC] Bioconductor Digest, Vol 83, Issue 23

M. Carmen Ruiz de Villa mruiz_de_villa at ub.edu
Tue Jan 26 17:26:41 CET 2010


Josep,

Your exercise was uploaded to the UOC correctly; what I had not received were
the messages that you had apparently sent.
In any case, for a while I will try to confirm that I have received whatever
you send to the UOC, so that we can verify there is no problem.

Regards,

M. Carme
----- Original Message ----- 
From: <bioconductor-request at stat.math.ethz.ch>
To: <bioconductor at stat.math.ethz.ch>
Sent: Monday, January 25, 2010 12:00 PM
Subject: Bioconductor Digest, Vol 83, Issue 23


> Send Bioconductor mailing list submissions to
> bioconductor at stat.math.ethz.ch
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> or, via email, send a message with subject or body 'help' to
> bioconductor-request at stat.math.ethz.ch
>
> You can reach the person managing the list at
> bioconductor-owner at stat.math.ethz.ch
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioconductor digest..."
>
>
> Today's Topics:
>
>   1. Re: Seeking assistance on ROC (Susan Bosco)
>   2. Re: question about lmFit model (Sunny Srivastava)
>   3. Agilent G4112A Arrays (Chuming Chen)
>   4. Re: Agilent G4112A Arrays (Prashantha Hebbar)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 25 Jan 2010 09:25:13 +0530 (IST)
> From: Susan Bosco <susanbosco86 at yahoo.com>
> To: Sean Davis <seandavi at gmail.com>
> Cc: prashantha hebbar <prashantha.hebbar at manipal.edu>,
> bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Seeking assistance on ROC
> Message-ID: <818904.90406.qm at web95305.mail.in2.yahoo.com>
> Content-Type: text/plain
>
> Dear Sean,
>
> Thanks again.
>
> I corrected the script by setting the 'truth' variable with the rbinom()
> function. Since my data set is quite large (244K), I first tried the first
> 200 values, for which I got a proper ROC curve. However, when I include the
> complete data, the plot changes. For the whole data, I get a nearly linear
> graph with small variations.
>
> My script looks like this:
> For the first 200 values of the data:
> library(ROC)
> load("RGKma.RData")
> # truth labels and data must cover the same 200 rows
> state <- rbinom(length(RGKma$M[1:200,3]), 1, 0.33)
> data <- RGKma$M[1:200,3]
> R1 <- rocdemo.sca(truth=state, data, dxrule.sca)
> pdf("ROCk.pdf")
> plot(R1, show.thresh=TRUE, col="red")
> dev.off()
>
> For the complete data:
> library(ROC)
> load("RGKma.RData")
> state= rbinom(length(RGKma$M[,3]),1,0.33)
> data = RGKma$M[,3]
> R1<-rocdemo.sca(truth=state,data,dxrule.sca)
> pdf("ROCallk.pdf")
> plot(R1, show.thresh=TRUE,col = "red")
> dev.off()
>
> I would appreciate it if you could help me with this problem, which I
> encountered only with the large data size.
>
> Thanking you sincerely,
> Susan.
>
>
> --- On Wed, 20/1/10, Sean Davis <seandavi at gmail.com> wrote:
>
> From: Sean Davis <seandavi at gmail.com>
> Subject: Re: [BioC] Seeking assistance on ROC
> To: "Susan Bosco" <susanbosco86 at yahoo.com>
> Cc: bioconductor at stat.math.ethz.ch, "prashantha hebbar" 
> <prashantha.hebbar at manipal.edu>
> Date: Wednesday, 20 January, 2010, 12:05 PM
>
>
>
> On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86 at yahoo.com> 
> wrote:
>
>
> Dear Sean,
>
> Thank you so much for the help.
>
>
> I tried a range of thresholds from 0 to 0.9. As you had mentioned, the
> true positive rates no doubt increased with thresholds below 0.9. However,
> I did get some false positives even at a minimum threshold of 0.1. Could
> you kindly explain the reason?
>
>
>
> Is there any method of finding the optimal threshold, maximizing the true
> positive rate while minimizing the false positives, instead of randomly
> choosing between 0 and 0.9?
>
>
> Hi, Susan. The ROC curve IS that method. The ROC curve represents ALL 
> thresholds as applied to the data. If you plot with show.thresh=TRUE, you 
> will see the thresholds that were tried and where they are on the curve.
>
>
> If the threshold to which you are referring is the one that you used to
> determine the variable you called "state", then we are talking about two
> different things. The "truth" variable is meant to be assigned by some
> source other than the data themselves. If you do not know the true state
> of your samples and find yourself assigning the state from the data, then
> ROC curve analysis will not be of any use.
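A minimal base-R sketch of the point above, not the ROC package's own code: when the "truth" labels are drawn at random with rbinom(), independently of the data, every threshold gives roughly equal true and false positive rates, so the curve stays near the diagonal and the area under it is close to 0.5. The simulated `scores` vector and the helpers `roc_points` and `auc` are hypothetical stand-ins for RGKma$M[,3] and rocdemo.sca().

```r
set.seed(1)

# Hypothetical stand-in for one column of M values
n <- 20000
scores <- rnorm(n)

# Random labels, independent of the scores (as with rbinom() in the thread)
truth_random <- rbinom(n, 1, 0.33)

# Labels that really depend on the scores, for comparison
truth_linked <- rbinom(n, 1, plogis(2 * scores))

# Illustrative helper: sweep the threshold from the highest score downwards
# and record the false and true positive rates at each step
roc_points <- function(truth, scores) {
  lab <- truth[order(scores, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(1 - lab) / sum(1 - lab)),
    tpr = c(0, cumsum(lab) / sum(lab))
  )
}

# Area under the curve by the trapezoid rule
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

auc_random <- auc(roc_points(truth_random, scores))
auc_linked <- auc(roc_points(truth_linked, scores))
print(c(random = auc_random, linked = auc_linked))
```

With only 200 values, a random "truth" can still produce a curve that wanders away from the diagonal by chance; with the full data set the curve settles onto it, which would match the nearly linear graph described earlier in the thread.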
>
>
> Sean
>
>
> Thanks in advance,
>
> Susan.
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 25 Jan 2010 00:05:18 -0500
> From: Sunny Srivastava <research.baba at gmail.com>
> To: sabrina s <sabrina.shao at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] question about lmFit model
> Message-ID:
> <85bae9e21001242105x310c5ab1wc81164170b9afc6b at mail.gmail.com>
> Content-Type: text/plain
>
> Dear Sabrina,
> Experienced members of the group will have better things to say, but here
> is my $0.25.
> As a statistician, I would prefer Design 1. The reason is that data should
> never be ignored.
>
> Also, the more data there is, the more limma can take advantage of this
> information in the empirical Bayes estimation of the S.D. The lower
> p-values are because of this fact. (Using less data leaves fewer residual
> degrees of freedom and less precise SD estimates, which tends to give
> higher, not lower, p-values.)
>
> Comparing differential expression and fold change is like comparing apples
> and oranges. Statistical significance does not require a large fold change:
> a small but consistent change can be significant.
> As a statistician, I would always trust differential expression more than
> fold change.
> If you think that fold change is important for you, then you should select
> the differentially expressed genes ONLY if their absolute log fold change
> is above, say, 2.
>
> You can do this in limma using topTable and/or decideTests.
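As a rough sketch of Design 1 combined with a fold-change cutoff, assuming the limma package is installed: the expression matrix below is simulated, and the group labels, contrast names, and the cutoff of 2 are illustrative choices rather than anything taken from Sabrina's data. `topTable()` and `decideTests()` both accept `p.value` and `lfc` arguments for this kind of combined filter.

```r
library(limma)  # Bioconductor package; assumed to be installed

set.seed(1)

# 3 strains x 2 treatments x 4 arrays = 24 arrays, one design column per group
group <- factor(rep(c("A_T1", "A_T2", "B_T1", "B_T2", "C_T1", "C_T2"), each = 4))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Simulated log-expression values (hypothetical): 1000 genes,
# with the first 20 genes shifted upwards in group B_T1
expr <- matrix(rnorm(1000 * 24), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), NULL))
expr[1:20, group == "B_T1"] <- expr[1:20, group == "B_T1"] + 4

# The seven comparisons from the original question
cont.matrix <- makeContrasts(
  A_T1vsT2 = A_T1 - A_T2,
  BvsA_T1  = B_T1 - A_T1,
  CvsA_T1  = C_T1 - A_T1,
  B_T1vsT2 = B_T1 - B_T2,
  C_T1vsT2 = C_T1 - C_T2,
  IntAB    = (B_T1 - B_T2) - (A_T1 - A_T2),
  IntAC    = (C_T1 - C_T2) - (A_T1 - A_T2),
  levels = design
)

fit <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Keep genes that pass both the adjusted p-value and the log fold-change cutoff
min_lfc <- 2
top <- topTable(fit2, coef = "BvsA_T1", number = Inf, p.value = 0.05, lfc = min_lfc)
calls <- decideTests(fit2, p.value = 0.05, lfc = min_lfc)
summary(calls)
```

Fitting all 24 arrays at once lets every contrast share one pooled variance estimate per gene, which is where the extra residual degrees of freedom come from.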
>
> Pls correct me if I am wrong.
>
> Thx
> S.
>
> On Thu, Jan 21, 2010 at 1:32 PM, sabrina s <sabrina.shao at gmail.com> wrote:
>
>> Hi, Jenny:
>> Thanks for the quick reply, and thanks for pointing out the etiquette
>> about posting. I thought maybe my subject was not good enough to be
>> noticed, and that is why I posted again. This is my first post, so there
>> is a long way to go!
>> Regarding your second point: I don't think my question is a general one
>> about why ANOVA is better than a series of t-tests. I actually did both,
>> and realized that the result from one single model (using all samples)
>> gave me much lower p-values, but when I looked at the expression values,
>> the fold change was almost nothing, like 0.5. That is why I wonder whether
>> the inflated DOF gave me the much lower p-values. Any thoughts on that?
>>
>> Thanks!
>>
>> Sabrina
>>
>> On Thu, Jan 21, 2010 at 12:05 PM, Jenny Drnevich <drnevich at illinois.edu> wrote:
>>
>> > Hi Sabrina,
>> >
>> > First, a little list etiquette. If you don't get a response to a post
>> > within a day, it's not considered polite to just repost the same
>> > question verbatim the next day under a different Subject.
>> >
>> > Second: your question isn't specific to the modeling of lmFit. Instead,
>> > it's a general statistical question about why it's better to fit one
>> > ANOVA model instead of a series of t-tests. I suggest you consult a
>> > basic statistical textbook or a local statistician to find the answer.
>> >
>> > Cheers,
>> > Jenny
>> >
>> >
>> > At 10:39 AM 1/21/2010, sabrina s wrote:
>> >
>> >> Hello, everyone:
>> >>
>> >> I have a question about the conceptual understanding of lmFit.
>> >>
>> >> I have the following experiment, but I am not sure which is the right
>> >> way to use the design matrix and contrasts. Here is the experiment:
>> >>
>> >> Say I have 3 genetically different strains, A, B and C, where A is the
>> >> control. I also have two different treatments, T1 and T2. For each
>> >> strain, I have 4 arrays for each treatment, so in total I have 24
>> >> arrays. What I want to find are the significantly differentially
>> >> expressed genes for the following comparisons:
>> >> 1) for control strain A: T1 vs. T2
>> >> 2) under T1, B vs. A (control)
>> >> 3) under T1, C vs. A
>> >> 4) for B, T1 vs. T2
>> >> 5) for C, T1 vs. T2
>> >> 6) interaction between strain (A vs. B) and treatment (T1 vs. T2)
>> >> 7) interaction between strain (A vs. C) and treatment (T1 vs. T2)
>> >>
>> >> There are two ways I could use lmFit.
>> >>
>> >> One:
>> >>
>> >> For the design matrix, I include all 3 strains and 2 treatments, with
>> >> one column per group:
>> >>            A_T1  A_T2  B_T1  B_T2  C_T1  C_T2
>> >> sample1:   1     0     0     0     0     0
>> >> sample2:
>> >>
>> >> Then I make a contrast matrix and run the code below:
>> >>
>> >> fitGene <- lmFit(gene, design = design, weights = arrayWt)
>> >> fitGene2 <- contrasts.fit(fitGene, cont.matrix)
>> >> fitGene2 <- eBayes(fitGene2, proportion = p)
>> >>
>> >>
>> >> Two:
>> >> Instead of fitting all samples at once in a single lmFit call, I use
>> >> two design matrices: one that involves only A and B with T1 and T2,
>> >> and a second that involves only A and C with T1 and T2. I make a
>> >> contrast matrix for each and fit them separately, and later on I can
>> >> compare the two results if I want to.
>> >>
>> >>
>> >>
>> >> The question I have is: which one is the right one? With the first
>> >> method I will have larger DOF and much lower p-values, but it is
>> >> testing the same thing as the second one, so am I creating an
>> >> artifact? Thanks for your help!
>> >>
>> >>
>> >>
>> >>
>> >> Sabrina
>> >>
>> >>        [[alternative HTML version deleted]]
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>
>> >
>> > Jenny Drnevich, Ph.D.
>> >
>> > Functional Genomics Bioinformatics Specialist
>> > W.M. Keck Center for Comparative and Functional Genomics
>> > Roy J. Carver Biotechnology Center
>> > University of Illinois, Urbana-Champaign
>> >
>> > 330 ERML
>> > 1201 W. Gregory Dr.
>> > Urbana, IL 61801
>> > USA
>> >
>> > ph: 217-244-7355
>> > fax: 217-265-5066
>> > e-mail: drnevich at illinois.edu
>> >
>>
>>
>>
>> --
>> Sabrina
>>
>>        [[alternative HTML version deleted]]
>>
>>
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 25 Jan 2010 01:32:06 -0500
> From: Chuming Chen <chumingchen at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Agilent G4112A Arrays
> Message-ID: <4B5D3AE6.3050507 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear All,
>
> I am trying to find out the differentially expressed genes from some
> Agilent Human Whole Genome (G4112A) Arrays data.
>
> I have tried the LIMMA package, but LIMMA gave the error message "no
> residual degrees of freedom in linear model fits" and stopped. My guess
> is that this is because my data has no replicates in the experiment.
>
> Are there any other packages I can use to find differentially expressed
> genes that do not require replicates in the experiment?
>
> Thanks for your help.
>
> Chuming
>
>
>
> ------------------------------
>
> Message: 4
> Date: Sun, 24 Jan 2010 22:40:12 -0800 (PST)
> From: Prashantha Hebbar <prashantha.hebbar at yahoo.com>
> To: bioconductor at stat.math.ethz.ch, Chuming Chen
> <chumingchen at gmail.com>
> Subject: Re: [BioC] Agilent G4112A Arrays
> Message-ID: <410581.88367.qm at web110108.mail.gq1.yahoo.com>
> Content-Type: text/plain
>
> Dear Chen,
>
> You need not look for any other packages. Since you do not have any
> replicates, do not fit a linear model; instead, just normalize within
> arrays and look at the M (log ratio) values.
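A hedged sketch of that suggestion, assuming two-colour data and an installed limma package: the red and green intensities below are simulated placeholders for a single unreplicated array, not real G4112A data, and the probe names are invented. `normalizeWithinArrays()` with `method = "loess"` removes intensity-dependent dye bias, after which probes can be ranked by absolute M value.

```r
library(limma)  # Bioconductor package; assumed to be installed

set.seed(1)

# Hypothetical two-colour intensities for one unreplicated array (5000 spots)
n <- 5000
G <- matrix(2^rnorm(n, mean = 10, sd = 1.5), ncol = 1)
R <- G * 2^rnorm(n, mean = 0.3, sd = 0.3)  # dye bias of 0.3 on the log2 scale
rownames(R) <- rownames(G) <- paste0("probe", seq_len(n))
RG <- new("RGList", list(R = R, G = G))

# Within-array loess normalization; no background intensities are present here
MA <- normalizeWithinArrays(RG, method = "loess", bc.method = "none")

# Rank probes by absolute log ratio: descriptive only, with no p-values
M <- MA$M[, 1]
ranked <- sort(abs(M), decreasing = TRUE)
head(ranked, 10)
```

Without replicates there is no estimate of per-gene variability, so a ranking like this can only suggest candidates and cannot attach significance to them.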
>
> Regards,
>
> Prashantha Hebbar Kiradi,
>
> Dept. of Biotechnology,
>
> Manipal Life Sciences Center,
>
> Manipal University,
>
> Manipal, India
>
>
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
> End of Bioconductor Digest, Vol 83, Issue 23
> ********************************************
>


