[BioC] Please comment the way I'm thinking about the way to find differentially expressed genes

Fri Jul 25 17:52:11 CEST 2014

Hi Kaj,

I don't see how resampling is going to help you at all with just 2-3 
samples per group. Anyway, the bootstrap is in general used to generate 
improved estimates of the variance, not to generate 'new' data sets.
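
As a rough illustration of why resampling adds so little here, the Python
sketch below enumerates every distinct bootstrap resample of a group with
2 or 3 replicates. The function name is hypothetical and the sketch is not
taken from any package; it only counts the possibilities.

```python
from itertools import combinations_with_replacement

def distinct_bootstrap_resamples(n):
    """Return every distinct bootstrap resample of n samples.

    A resample draws n items with replacement; order does not matter,
    so each resample is represented as a sorted tuple of sample indices.
    """
    return list(combinations_with_replacement(range(n), n))

for n in (2, 3):
    resamples = distinct_bootstrap_resamples(n)
    # Resamples made of one sample repeated n times have zero variance.
    degenerate = [r for r in resamples if len(set(r)) == 1]
    print(n, "replicates:", len(resamples), "distinct resamples,",
          len(degenerate), "with zero variance")
```

With 3 replicates there are only 10 distinct resamples per group, and 3 of
them consist of a single sample repeated, so the 'new' data sets are just
rearrangements of the same few observations.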

Figuring out ways to improve variance estimates was a fairly hot area of 
research about 10 years ago, and people have in general settled on the 
idea of empirical Bayesian estimates like you get with limma.
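
For a feel of what the empirical Bayes approach does, here is a minimal
Python sketch of the variance shrinkage step: each gene's sample variance
is pulled toward a prior variance, weighted by degrees of freedom. limma
estimates the prior variance and prior degrees of freedom from all genes
together; in this sketch they are simply assumed, and the gene names and
expression values are made up for illustration.

```python
from statistics import variance

def moderated_variance(sample_var, df_residual, prior_var, df_prior):
    """Shrink a gene-wise sample variance toward a prior variance.

    The result is a degrees-of-freedom weighted average of the two.
    """
    return (df_prior * prior_var + df_residual * sample_var) / (df_prior + df_residual)

# Hypothetical log-expression values for one group of 3 replicates.
genes = {
    "geneA": [7.1, 7.2, 7.1],   # unusually small sample variance
    "geneB": [5.0, 8.0, 6.5],   # unusually large sample variance
}

# Assumed prior values; limma would estimate these from the data.
prior_var, df_prior = 0.5, 4.0

for name, values in genes.items():
    s2 = variance(values)            # ordinary sample variance
    df = len(values) - 1             # residual degrees of freedom
    s2_mod = moderated_variance(s2, df, prior_var, df_prior)
    print(name, "sample variance:", round(s2, 4), "moderated:", round(s2_mod, 4))
```

The point of the shrinkage is that a gene with only 2 residual degrees of
freedom no longer gets a huge t-statistic just because its few replicates
happened to agree closely.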

As a self-professed 'starter' in gene expression analysis, are you sure 
you are best equipped to improve on the accepted methods that were 
developed over several years by PhD statisticians? If not, I would just 
stick with using limma, especially if you want to publish your results. 
It's much easier to say 'I used the Bioconductor limma package' than to 
explain your newfangled, unpublished method, especially if you are not a 
PhD statistician yourself.

Best,

Jim

On 7/25/2014 11:20 AM, Kaj Chokeshaiusaha [guest] wrote:
> Dear R helpers,
>
> I'm a starter in gene expression analysis, and I must apologize to everyone in advance if I'm posting something irritating. My attempt is just to figure out an alternative way to find differentially expressed genes in datasets with few replicates.
>
> Suppose I have very few replicates per group (2-3 replicates per group). I'm wondering whether I can generate several datasets from my original data (using a method like the bootstrap) and then run the test on each generated dataset to obtain a list of differentially expressed genes. After that, I would count how often each gene appears across all the lists and pick the most frequent ones as differentially expressed genes.
>
> Please comment on the idea; I don't want to go too far down the wrong path.
>
> With Respects,
> Kaj
>
>
>   -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] CMA_1.22.0          Biobase_2.24.0      BiocGenerics_0.10.0
> [4] e1071_1.6-3
>
> loaded via a namespace (and not attached):
> [1] class_7.3-10 tools_3.1.0
>
> --
> Sent via the guest posting facility at bioconductor.org.

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099