[BioC] illumina beadarray GEO files

James F. Reid james.reid at ifom-ieo-campus.it
Wed Jun 15 16:09:14 CEST 2011


Hi Nathalie,

On 06/15/2011 02:13 PM, Nathalie Conte wrote:
> HI
> I want to have a look at this experiment which is deposited in GEO under
> the reference:GSM 290549, this experiment contains 6 files
> GSM296418.csv.gz 293.0 Kb (ftp)
> <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Ecsv%2Egz>(http)
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Ecsv%2Egz&is_ftp=true>
> CSV
> GSM296418.locs.gz 7.2 Mb (ftp)
> <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Elocs%2Egz>(http)
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Elocs%2Egz&is_ftp=true>
> LOCS
> GSM296418.tif.gz 51.7 Mb (ftp)
> <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Etif%2Egz>(http)
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Etif%2Egz&is_ftp=true>
> TIFF
> GSM296418.txt.gz 11.4 Mb (ftp)
> <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Etxt%2Egz>(http)
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Etxt%2Egz&is_ftp=true>
> TXT
> GSM296418.xml.gz 665 b (ftp)
> <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Exml%2Egz>(http)
> <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Exml%2Egz&is_ftp=true>
> XML
>
> I am trying to find a way to reanalyse this, but am struggling to find an
> appropriate way. I just want to know whether the genes in this array are
> below or above the detection level threshold used in beadarray.
> Has anybody got any advice about a way to analyse microarray data
> deposited in GEO to get this kind of information?
> many thanks
> Nat

I am not sure what you mean by the 'detection level threshold used in
beadarray', but you can use GEOquery, as Sean suggested in a previous
mail/thread.
Looking at the density of the expression values of this array, I would
say that the large peak represents the 'non-expressed' genes, or genes
expressed at very low levels, so the position of that peak could play a
role similar to the detection level threshold you mention. Running the
following code, I find that roughly 28% of the 20K genes (5821 of 20589
probes) are below this threshold (5.37 on the log2 scale, with signals
ranging from 0 to 16.3).

## load GEOquery; library() stops with an error if the package is missing
library("GEOquery")
## download single array GSM290549
gsm <- getGEO("GSM290549")

## extract 'Illumina average value' signal data
head(Table(gsm), n=3)
##      ID_REF     VALUE
##1 ILMN_10000  105.0698
##2 ILMN_10001  2355.704
##3 ILMN_10002 -9.846933
x <- as.numeric(Table(gsm)[, 'VALUE'])
range(x)
##[1]   -35.65039 53405.58000

## transform data according to authors in original study
Meta(gsm)$data_processing
##[1] "Data were extracted with Illumina BeadStudio software using
##background subtraction and cubic spline normalization. Data were then
##adjusted by shifting the absolute minimum value for each array to be
##equal to 1; and then log2 transformed."
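## shift so that the array minimum becomes 1 (min(x) is negative here),
## then log2 transform; x - min(x) + 1 also works if min(x) is positive
y <- log2(x - min(x) + 1)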
range(y)
##[1]  0.00000 16.25923

## plot kernel density of signals
yDens <- density(y)
plot(yDens, main=Meta(gsm)$geo_accession)
## calculate the density peak value
densPeak <- yDens$x[which.max(yDens$y)]
## draw it
abline(v=densPeak, lwd=2, lty=2)
densPeak
##[1] 5.367655
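## peak position back on the shifted (not the raw) intensity scale
2^(densPeak)
##[1] 41.28812
## number of probes below the peak
sum(y < densPeak)
##[1] 5821
## number of probes above the peak
sum(y > densPeak)
##[1] 14768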
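
If you want to repeat this on other arrays, the transform and the peak
calculation above could be wrapped into two small helpers. This is only
a sketch: shiftLog2 and peakSummary are names made up for illustration
(they are not part of GEOquery or beadarray), and the input below is
simulated so that the example runs without a download. It is not the
GEO data, so the numbers it gives will differ from those above.

```r
## Hypothetical helper: shift the signals so that the minimum becomes 1,
## then log2 transform (as described in the data_processing field).
shiftLog2 <- function(x) {
    x <- x[is.finite(x)]          # drop NA/Inf before taking the minimum
    log2(x - min(x) + 1)
}

## Hypothetical helper: position of the highest kernel density peak and
## the proportion of values on either side of it.
peakSummary <- function(y) {
    d <- density(y)
    peak <- d$x[which.max(d$y)]
    list(peak  = peak,
         below = mean(y < peak),
         above = mean(y > peak))
}

## Simulated signals (an assumption, NOT the real array): a background
## component around zero plus a wide range of expressed signals.
set.seed(1)
x <- c(rnorm(6000, mean = 10, sd = 15),
       2^runif(14000, min = 6, max = 15))

y <- shiftLog2(x)
res <- peakSummary(y)
res$peak     # peak position on the log2 scale
res$below    # proportion of values below the peak
```

With the real array you would pass as.numeric(Table(gsm)[, 'VALUE'])
to shiftLog2 instead of the simulated x.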

HTH.
J.


