[BioC] Generating random gene lists: does sample/resample generate random sets

Thomas Hampton Thomas.H.Hampton at Dartmouth.EDU
Wed Sep 10 22:40:32 CEST 2008


I would not have taken the curated list out. That strikes me as
a significant bias. Am I missing something?

Tom
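One way to read Tom's point is that the null lists should come from the same universe as the curated list, so curated genes can also be drawn by chance. The R sketch below makes that assumption explicit. The sizes (`nGenes`, `nCurated`, `listSize`, `nLists`) are hypothetical toy values, not Scott's data, and `curated` is a hypothetical stand-in for `myCuratedList` holding row indices.

```r
# Hypothetical toy sizes; with real data nGenes would be nrow(exprs(eset))
nGenes   <- 1000
nCurated <- 100
listSize <- 100
nLists   <- 1000

set.seed(1)
curated <- sample.int(nGenes, nCurated)   # hypothetical stand-in for myCuratedList

# Null lists drawn from the FULL universe: curated genes are NOT excluded
randomFull <- replicate(nLists, sample.int(nGenes, listSize))

# Number of curated genes that land in each random list by chance
overlap <- apply(randomFull, 2, function(col) sum(col %in% curated))

# Expected value per list is listSize * nCurated / nGenes = 10
mean(overlap)
```

With the curated genes removed from the universe, as in the quoted procedure, this overlap is zero by construction for every random list.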

On Sep 10, 2008, at 4:03 PM, Ochsner, Scott A wrote:

> Dear BioC,
>
> I would like feedback as to the appropriateness of the following  
> procedure to produce a set of 1000 random gene lists, each list of  
> length 2000.  The idea is to use the set of random gene lists to  
> assess how often random gene lists of size x can reproduce or
> improve the classification performance of myCuratedList.
>
>
> # Remove myCuratedList from the universe of possible genes. The
> # "eset" object is your standard ExpressionSet object.
> # myCuratedList must hold row indices (not probe IDs) for setdiff()
> # to remove anything from 1..nrow.
>> length(myCuratedList)
>  [1] 2000
>> Index <- setdiff(seq_len(nrow(exprs(eset))), myCuratedList)
>> length(Index)
>  [1] 20277
> # Generate 1000 random gene lists using the genes in Index. The
> # code for resample is taken from the help pages for sample.
>> resample <- function(x, ...) x[sample.int(length(x), ...)]
>> randomMatrix <- replicate(1000, resample(Index, 2000))
>> dim(randomMatrix)
>  [1] 2000 1000
>
>
> I've verified that each column does not contain repeated genes as  
> should be the case with resample without replacement.
>
> Is there a standard procedure for doing the above or is what I've  
> done kosher?
>
>
> Scott A. Ochsner, Ph.D.
> NURSA Bioinformatics
> Molecular and Cellular Biology
> Baylor College of Medicine
> Houston, TX. 77030
> phone: 713-798-6227
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
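For reference, a self-contained R sketch of the procedure Scott describes, runnable without his data. It assumes `myCuratedList` holds row indices (otherwise `setdiff()` removes nothing) and uses `Index` consistently, since the quoted transcript calls `resample(index, ...)` with a lowercase name that R treats as a different object. `resample()` is the helper from the examples in `?sample`; the sizes and the `curated` vector are hypothetical toy stand-ins.

```r
# Helper from the examples in ?sample: avoids sample()'s special case
# where a length-1 numeric x is treated as 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical toy sizes (Scott used 2000 curated genes, lists of
# length 2000, and 1000 lists)
nGenes   <- 1000
nCurated <- 100
listSize <- 100
nLists   <- 1000

set.seed(1)
curated <- sample.int(nGenes, nCurated)   # hypothetical stand-in for myCuratedList

# Universe with the curated genes removed, as in the quoted procedure
Index <- setdiff(seq_len(nGenes), curated)

# One random list per column; sample.int() defaults to replace = FALSE
randomMatrix <- replicate(nLists, resample(Index, listSize))

# Checks: no repeated gene within any column, and no curated gene anywhere
noDupes   <- all(apply(randomMatrix, 2, anyDuplicated) == 0)
noCurated <- !any(randomMatrix %in% curated)
```

Changing the line that builds `Index` to `Index <- seq_len(nGenes)` gives the full-universe alternative Tom's reply argues for.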


