[BioC] select genes above a threshold from a table having multiple probes

Thomas Girke thomas.girke at ucr.edu
Mon May 12 00:06:39 CEST 2008


Do you mean something like the following example?

## Create sample data set: 20 probes (rows) x 5 samples (columns), 2 probes per gene
x <- matrix(rnorm(100), 20, 5, dimnames=list(1:20, paste("t", 1:5, sep="")))
z <- data.frame(ID=paste("g", sort(rep(1:10, 2)), sep=""), x)
myIDs <- unique(as.vector(z[,1]))

## Query A (rows): genes with a single probe >= 1 in at least two samples
rowQ <- z[rowSums(z[, -1]>=1)>=2, ]
rowQ <- sort(unique(as.vector(rowQ[,1])))

## Query B (columns): genes with at least two probes >= 1 in the same sample
colQ <- sapply(2:ncol(z), function(i) tapply(z[,i], z[,1], function(y) sum(y>=1)))
colQ <- rowSums(colQ>=2)
colQ <- sort(unique(as.vector(names(colQ[colQ>=1]))))

## Combine A & B with AND (genes meeting both criteria)
qID_AND <- intersect(rowQ, colQ)
z[z[,1] %in% qID_AND, ]

## Combine A & B with OR (genes meeting either criterion)
qID_OR <- unique(c(rowQ, colQ))
z[z[,1] %in% qID_OR, ]
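
As a quick check, the same two queries can be run on the small table from your message. This is only a sketch: the names tab, cutoff, hits, geneA, geneB, perGene, keep and result are made up for illustration, and rowsum() is used here as a shorter stand-in for the sapply/tapply step above. With a threshold of 1, g1 passes criterion A and g3 passes criterion B, so the OR combination appears to be the one that matches the result you describe.

```r
## Example table from the question: two probes per gene, samples A-E
tab <- data.frame(
  ID = c("g1", "g1", "g2", "g2", "g3", "g3"),
  A  = c(1.51, 0.69, 0.35, 0.20, 0.00, 0.96),
  B  = c(0.96, 1.22, 0.14, 0.61, 0.69, 0.77),
  C  = c(0.70, 0.73, 0.83, 0.53, 1.79, 1.28),
  D  = c(0.34, 0.62, 1.58, 0.24, 0.84, 0.32),
  E  = c(1.38, 0.74, 0.49, 0.06, 0.42, 0.82),
  stringsAsFactors = FALSE
)
cutoff <- 1                    # threshold used in the question
hits <- tab[, -1] >= cutoff    # logical matrix: TRUE where value >= cutoff

## Criterion A: one probe at or above the cutoff in two or more samples
geneA <- unique(tab$ID[rowSums(hits) >= 2])

## Criterion B: two or more probes of a gene at or above the cutoff in one sample
perGene <- rowsum(hits + 0, tab$ID)   # per-gene counts of hits in each sample
geneB <- rownames(perGene)[rowSums(perGene >= 2) >= 1]

## OR combination, then pull out all probes of the selected genes
keep <- union(geneA, geneB)
result <- tab[tab$ID %in% keep, ]
result
```

Replacing union() with intersect() gives the AND combination, which is empty for this small table because no gene meets both criteria.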


Thomas

On Sun 05/11/08 22:09, Dr Balazs Gyorffy wrote:
> Thanks, but I think my description was somewhat loose.
> 
> So: I have a table looking like this:
> 	A	B	C	D	E
> g1	1.51	0.96	0.70	0.34	1.38
> g1	0.69	1.22	0.73	0.62	0.74
> g2	0.35	0.14	0.83	1.58	0.49
> g2	0.20	0.61	0.53	0.24	0.06
> g3	0.00	0.69	1.79	0.84	0.42
> g3	0.96	0.77	1.28	0.32	0.82
> 
> so I have multiple probes per gene (up to 16, but here only 2).
> 
> I am looking for the genes, in which at least TWO values in:
> 
> A, one probe in two or more samples: e.g. A1:E1
> B, several probes in one sample: e.g. here A1:A2
> 
> are above the threshold (e.g. here g1 and g3 meet the criteria if the
> threshold is 1).
> 
> And THEN I would like to extract all probes for the gene (e.g. here the result
> table would include all g1 and all g3 probes).
> 
> Thank you:
> Balazs
> 
> 
> --- Thomas Girke <thomas.girke at ucr.edu> wrote:
> 
> > Here is a simple example for finding all genes (rows) that have at least
> > x instances of a given comparison per row (x here >=1). 
> > 
> > y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""),
> > paste("t", 1:5, sep="")))
> > myQ <- y >= 1
> > myQcount <- rowSums(myQ) # counts TRUE values per row
> > y[myQcount>=1,]
> > 
> > I hope this helps. 
> > 
> > Thomas
> > 
> 
-- 
Thomas Girke
Assistant Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Center for Plant Cell Biology (CEPCEB)
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1008 Noel T. Keen Hall
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437


