[R] Selecting a subsample so that it follows a distribution.

Bryo brynedal at gmail.com
Wed Mar 2 16:14:02 CET 2011


Hi All,

I want to select rows at random from a large data.frame while achieving a
particular distribution defined by a given subset of this data.frame. How
can I do this? More details and what I've done so far are given below.

I have gene expression data and gene sets of interest. To look at
enrichment of differential expression I'm using a simple permutation
approach: selecting a random set of genes (the same size as the set of
differentially expressed genes), recording the overlap with the gene set,
and repeating 10 000 times. The problem: expression level and significance
of differential expression are correlated (more power). Hence I want to do
a biased permutation, selecting random genes that together follow the same
expression level distribution as the differentially expressed genes.

This is what I've done so far:
geneExp is my data.frame with DE statistics: 6585 rows of genes, column one
is the gene ID.
geneSet is my gene set; column one is the gene ID.
index is the index of the DE genes in geneExp.

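# baseMean is a measure of expression level
dSign=density(geneExp[index,'baseMean'])

# density of the DE genes evaluated at every gene's baseMean
# (approx() is vectorised, so no lapply/unlist is needed)
prob=approx(dSign$x,dSign$y,xout=geneExp[,"baseMean"])$y
prob[is.na(prob)]=0 # approx() returns NA outside the range of dSign$x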

So when I am doing my permutation I do:

overlap=numeric(10000)

for (i in 1:10000) {
	# new name, so the DE index defined above is not overwritten
	randIndex=sample(1:6585,543,prob=prob)
	overlap[i]=sum(!is.na(match(geneSet[,1],geneExp[randIndex,1])))
}

And thereafter look at the distribution of random overlaps compared to the
initially observed overlap.
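
The comparison with the observed overlap can be summarised as an empirical p-value. The lines below are only a sketch: the null overlaps are simulated, and obsOverlap and pEmp are hypothetical names with a made-up observed value, not results from the real data.

```r
# Simulated stand-in for the 10000 overlaps from the random draws
set.seed(1)
overlap <- rbinom(10000, size = 543, prob = 0.05)

# Made-up overlap between the real DE genes and the gene set
obsOverlap <- 40

# One-sided empirical p-value. The +1 in numerator and denominator counts
# the observed set as one of the draws, so the result is never exactly 0.
pEmp <- (sum(overlap >= obsOverlap) + 1) / (length(overlap) + 1)
pEmp
```

The smallest p-value this can report with 10000 draws is 1/10001.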

However, the distribution of values that this permutation gives is NOT equal
to the distribution of the significant genes, but a lot narrower. This is
simply because my method assumes a uniform distribution of values to choose
from.
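
One possible correction, sketched under assumptions: rather than weighting each gene by the DE density alone, weight it by the ratio of the DE density to the density of all genes, so that expression ranges crowded with genes are not over-drawn. The data below are simulated, deIndex, randIndex and w are hypothetical names, and working on a log10 scale is an assumption, not something taken from the real data.

```r
set.seed(42)

# Simulated stand-in for geneExp: 6585 genes with skewed expression levels
n <- 6585
geneExp <- data.frame(id = paste0("g", seq_len(n)),
                      baseMean = rlnorm(n, meanlog = 5, sdlog = 1.5))

# Simulated DE genes, biased towards highly expressed genes
deIndex <- sample(n, 543, prob = geneExp$baseMean)

# Work on the log scale (an assumption; expression levels are very skewed)
x <- log10(geneExp$baseMean)
rng <- range(x)

# Estimate the target (DE) and background (all genes) densities
# over the same range
dTarget <- density(x[deIndex], from = rng[1], to = rng[2])
dBackground <- density(x, from = rng[1], to = rng[2])

# Evaluate both densities at every gene; rule = 2 avoids NA at the ends
fTarget <- approx(dTarget$x, dTarget$y, xout = x, rule = 2)$y
fBackground <- approx(dBackground$x, dBackground$y, xout = x, rule = 2)$y

# Sampling weight = target density / background density
w <- fTarget / fBackground
w[!is.finite(w) | w < 0] <- 0

# One biased random draw of the same size as the DE set
randIndex <- sample(n, 543, prob = w)

# Quick check: quantiles of the draw next to those of the DE genes
round(rbind(target = quantile(x[deIndex]),
            sampled = quantile(x[randIndex])), 2)
```

Because sample() draws sequentially when sampling without replacement, the match to the target distribution is only approximate, so the quantile comparison is worth repeating on the real data.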

Sorry if this was a complicated message, I would highly appreciate any help
or comments!  

Best,
Bryo 

