[BioC] Please comment the way I'm thinking about the way to find differentially expressed genes

Kaj Chokeshaiusaha kaj.chk at gmail.com
Sat Jul 26 09:09:17 CEST 2014


Dear Prof. Davis,

Thank you very much for your patience. Your point about having only three
samples per condition really clarifies everything.
I will follow the usual approach.

Thank you very much again for your patience and kindness,
Kaj

2014-07-26 1:51 GMT+07:00, Sean Davis <sdavis2 at mail.nih.gov>:
> On Fri, Jul 25, 2014 at 2:32 PM, Kaj Chokeshaiusaha <kaj.chk at gmail.com>
> wrote:
>
>> Dear all,
>> Thank you very much for your comments. I now feel confident sticking
>> with the usual approach.
>> One thing stays on my mind, though, probably because I lack some
>> basic knowledge. Some people generate several data sets from their
>> original data using methods like leave-one-out, apply a test (such as
>> limma) to each one, and finally pick the genes that recur most often
>> across the resulting lists of differentially expressed genes (for
>> example, in 4 out of 6 lists).
>> Is this kind of approach better than sticking to the list of
>> differentially expressed genes produced from the original data?
>>
>>
> In general, you will want to use all your data when you have only 3 samples
> per condition.  Your power will be maximized this way.
>
> To answer your question, ad hoc approaches can be useful, but you really
> have to think about whether or not you can quantify how "good" your gene
> list is after applying such an approach (what is the p-value or
> false-discovery-rate).  Since you may have trouble doing that for your
> specific example, I doubt that you gain anything from even attempting it.
>
> Sean
>
>
>
>> Thank you very much in advance for your patience with me.
>>
>> With Respects,
>> Kaj
>>
>> 2014-07-25 22:53 GMT+07:00, Sean Davis <sdavis2 at mail.nih.gov>:
>> > Hi, Kaj.
>> >
>> > You may be overthinking things a bit.  Differential gene expression
>> > analysis has a lot of history and has developed around the constraints
>> > imposed by small sample sizes, so most modern tools for doing
>> > differential expression analysis will handle your data in a rational
>> > and statistically sound way.  I would consider starting with limma;
>> > the user guide is excellent and the package is very highly utilized
>> > for experiments presumably just like yours.  I don't want to
>> > discourage experimentation, but it is often best to start with a
>> > known analysis, if only for comparison if you do try something more
>> > exotic.
>> >
>> > Sean
>> >
>> >
>> >
>> > On Fri, Jul 25, 2014 at 11:20 AM, Kaj Chokeshaiusaha [guest] <
>> > guest at bioconductor.org> wrote:
>> >
>> >> Dear R helpers,
>> >>
>> >> I'm a beginner in gene expression analysis, and I apologize to
>> >> everyone in advance if my post is irritating. I am just trying to
>> >> figure out an alternative way to find differentially expressed
>> >> genes in datasets with few replicates.
>> >>
>> >> I have very few replicates per group (2-3 replicates per group).
>> >> I'm wondering whether I can generate several datasets from my
>> >> original data (using methods like the bootstrap) and then run the
>> >> test on each generated dataset to obtain lists of differentially
>> >> expressed genes. After that, I would count how often each gene
>> >> recurs across all lists and pick the top ones as differentially
>> >> expressed genes.
>> >>
>> >> Please comment on the idea; I don't want to go too far down the
>> >> wrong path.
>> >>
>> >> With Respects,
>> >> Kaj
>> >>
>> >>
>> >>  -- output of sessionInfo():
>> >>
>> >> R version 3.1.0 (2014-04-10)
>> >> Platform: x86_64-pc-linux-gnu (64-bit)
>> >>
>> >> locale:
>> >>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>> >>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>> >>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>> >>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>> >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>> >>
>> >> attached base packages:
>> >> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> >> [8] base
>> >>
>> >> other attached packages:
>> >> [1] CMA_1.22.0          Biobase_2.24.0      BiocGenerics_0.10.0
>> >> [4] e1071_1.6-3
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] class_7.3-10 tools_3.1.0
>> >>
>> >> --
>> >> Sent via the guest posting facility at bioconductor.org.
>> >>
>> >
>>
>
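
The leave-one-out counting scheme asked about in this thread (drop one
sample, rank the genes, then count how often each gene reaches the top
list, e.g. 4 out of 6) can be sketched roughly as below. This is only an
illustrative sketch in plain Python, not limma and not code posted by
anyone in the thread: a Welch t-statistic stands in for a proper
moderated test, and the gene names, the expression values, and the
helper names welch_t, top_genes and leave_one_out_votes are all made up
for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups (each needs at least 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(data, k):
    """Return the k genes with the largest |t|; data maps gene -> (group_a, group_b)."""
    ranked = sorted(data, key=lambda g: abs(welch_t(*data[g])), reverse=True)
    return ranked[:k]


def leave_one_out_votes(data, k):
    """Drop each sample in turn and count how often each gene lands in the top k."""
    n_a, n_b = (len(x) for x in next(iter(data.values())))
    votes = {g: 0 for g in data}
    # One run per dropped sample: n_a runs for group A, then n_b runs for group B.
    drops = [(0, i) for i in range(n_a)] + [(1, j) for j in range(n_b)]
    for group, idx in drops:
        subset = {}
        for g, (a, b) in data.items():
            if group == 0:
                a = a[:idx] + a[idx + 1:]
            else:
                b = b[:idx] + b[idx + 1:]
            subset[g] = (a, b)
        for g in top_genes(subset, k):
            votes[g] += 1
    return votes


# Hypothetical toy values, invented for illustration only (3 samples per group).
toy = {
    "geneA": ([5.0, 5.1, 4.9], [8.0, 8.1, 7.9]),  # consistent shift between groups
    "geneB": ([5.0, 5.2, 4.8], [5.1, 4.9, 5.0]),  # no real difference
    "geneC": ([5.0, 5.1, 9.0], [5.0, 5.1, 4.9]),  # a single outlying sample
}

votes = leave_one_out_votes(toy, k=2)
for gene, count in votes.items():
    print(gene, count, "of 6 runs")
```

On this toy data geneC, whose apparent difference comes from a single
outlying sample, is picked in 5 of the 6 leave-one-out runs and so would
pass a "4 out of 6" rule. The vote count itself carries no p-value or
false discovery rate, which is the difficulty raised in the reply above.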


