[BioC] aggregate_summarizing expression values over entrez gene ids

James W. MacDonald jmacdon at med.umich.edu
Thu Nov 13 14:37:11 CET 2008


Hi Vanessa,

Vanessa Vermeirssen wrote:
> Hi,
> 
> I have a dataframe containing RMA normalized and summarized expression 
> values for affymetrix probesets, av.data.
> I have looked up the Entrez gene ids for the probesets in the annotation 
> package, entrezids.
> Multiple probesets map of course to the same entrez id and I would like 
> to combine these data into one row,
> by averaging the expression values for the same entrez ids over the 
> different experiments.
> I tried the function "aggregate" to do this, but somehow it gives an 
> error that the arguments are not of the same length, but they are...???
> How can I solve this or is there any other way to do this?
> 
> See my code below...
> 
> av.data <- read.table("humanGPL570avdata.txt", row.names = 1, sep = 
> "\t", header = T, na.strings = "NA", fill = T)
> av.data[1:5,1:5]
>          X1_Schwann_p1 X1_Schwann_p3 X2_accumbens X2_adipose
> 1007_s_at      9.281857      9.340795     9.151775   8.319741
> 1053_at        7.000684      6.867318     4.633061   5.101534
> 117_at         6.007608      6.124562     5.425565   5.692270
> 121_at         6.543294      6.728119     7.651856   7.692947
> 1255_g_at      3.077289      2.989938     4.622865   2.955812
>          X2_adipose_omental
> 1007_s_at           7.909480
> 1053_at             4.509407
> 117_at              6.298798
> 121_at              7.598834
> 1255_g_at           3.040816
> 
> probes <- ls(hgu133plus2ENTREZID)
> entrezids <- unlist(mget(probes,hgu133plus2ENTREZID))
> newdata <- data.frame(entrezids,av.data)
> 
> sum <- aggregate(av.data,as.list(entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length

The problem here is that the 'by' argument to aggregate() needs to be a 
list of vectors, each as long as dim(av.data)[1], i.e. one entry per 
row. What you have given is a list of 54675 vectors, each of length one.

The difference is between list() and as.list(). If you use 
list(entrezids), you will get a list of length one, containing a vector 
of length 54675.

If you use as.list(entrezids) you get a list of length 54675, each item 
containing one Entrez Gene ID.
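
As a minimal sketch of the difference, here is a toy example. The data 
frame, the array names and the Entrez Gene IDs below are all made up 
for illustration (they are not from your data), and the name 'entrezid' 
for the grouping column is just my choice:

```r
# Toy stand-in for av.data: 4 probesets x 2 arrays (values are invented)
av.data <- data.frame(
  array1 = c(1, 3, 5, 7),
  array2 = c(2, 4, 6, 8),
  row.names = c("p1", "p2", "p3", "p4")
)

# Hypothetical Entrez Gene IDs: p1/p2 share one ID, p3/p4 share another
entrezids <- c(p1 = "100", p2 = "100", p3 = "200", p4 = "200")

length(list(entrezids))     # 1: one grouping vector with 4 entries
length(as.list(entrezids))  # 4: four separate items of length one

# list() supplies a single grouping vector with one entry per row,
# so aggregate() can average the rows that share an ID
avg <- aggregate(av.data, by = list(entrezid = entrezids), FUN = mean)
avg
```

With your real data the equivalent call should be 
aggregate(av.data, list(entrezids), mean). Two things to keep in mind: 
aggregate() omits rows whose grouping value is NA, so probesets without 
an Entrez Gene ID will drop out, and it is probably better to pass 
av.data rather than newdata, so that mean() is only applied to the 
numeric columns.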

Does this make sense?

Best,

Jim


> 
>  > length(as.list(entrezids))
> [1] 54675
>  > dim(av.data)
> [1] 54675    69
> 
> sumdata <- aggregate(newdata,as.list(newdata$entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length
>  > length(as.list(newdata$entrezids))
> [1] 54675
>  > dim(newdata)
> [1] 54675    70
> 
> 
> Thank you so much!
> Vanessa
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


