[BioC] Subsetting Affybatch objects by gene lists.

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Mon Mar 15 18:56:22 MET 2004



> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Horswell,
> Stuart
> Sent: 15 March 2004 14:22
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Subsetting Affybatch objects by gene lists.
>
>
>
> Hi all,
>
> 	I'm trying to run an analysis on 24 Affymetrix HGu95v2 chips.
>
> I've set up, via merge.AffyBatch, an affybatch object containing
> all 24 arrays.
>
> A1 <- read.affybatch("A1.cel")
> .
> .
> .
> A24 <- read.affybatch("A24.cel")
>
> A <- merge.AffyBatch(A1, A2)
> A <- merge.AffyBatch(A, A3)
> .
> .
> .
> A<- merge.AffyBatch(A, A24)
>

Why are you doing this ? If you have all the cel files in one place, then

filenames <- list.files(path="/celfile/directory/", pattern=".cel",
full.names=TRUE)
raw <- ReadAffy(filenames=filenames)


>  I then computed MAS5 type Present/Absent calls for each array
> using mas5calls.
>
> A.calls <- mas5calls(A)
> p.a.A <- exprs(A.calls)
>
>  What I'd like to do now is remove all of those genes without a
> single present call across all 24 arrays before normalizing.
>
> I can use the p.a.A file to obtain a list of the gene names/affy
> id tags that I want to remove but I can't figure out how to
> delete the relavent probe pairs from my affybatch object.

You mean probesets and not probe pairs ? The dimension of p.a.A is n x 24;
where n is the number of probesets in HGu95v2. MAS 5.0 gives one detection
call for each probeset (which contain 11-20 probe pairs).

If you want to remove all the probesets with 0 Present call, try this :

A.expr   <- mas5(A)
A.expr   <- exprs(A.expr)

A.calls  <- mas5calls(A)
A.calls  <- exprs(A.calls)

presence <- apply( A.calls, 1, function(x) sum( x=="P" ) )
bad      <- which( presence == 0 )
length(bad)

stopifnot( identical( rownames(A.expr), rownames(A.calls) ) )
stopifnot( identical( colnames(A.expr), colnames(A.calls) ) )
# check if the dataset was permutated somehow

if(length(bad) > 0 ){
	A.expr.trimmed <- A.expr[ -bad, ]
	A.call.trimmed <- A.call[ -bad, ]
}


> In fact that only things I've been able to find on the mailing
> list archive and/or vignettes are how to subset by array or how
> to remove chunks from the cdf environment - but this presents me

The resultant of using exprs() on an Affybatch is a matrix and you can
simply proceed with subset. Try class(A.expr) before and after exprs().

> with two problems, first I'm not sure I can get the pattern
> matching working well enough to identify which entry numbers in
> the cdf file correspond to the gene list I have, and secondly,
> people have already commented that this isn't neccessarily a
> sensible approach for proper analysis anyway. So I'm kind of stumped now!

Better avoid matching whenever possible as it can be slow on large dataset.
Better to check if rownames and colnames are identical.

You can be more stringent and remove any probesets with less say 3 present
calls etc. Or you can try using other normalization method like rma, gcrma
etc.


> Any help or advice would be most greatfully received,
>
>      many thanks,
>
>            Stu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>



More information about the Bioconductor mailing list