[BioC] Difference between number of probes and number of data rows using 'oligo' on Affy miRNA v3.0 arrays

James W. MacDonald jmacdon at uw.edu
Mon Feb 25 15:34:29 CET 2013


Hi Vicky,

On 2/24/2013 8:03 PM, Vicky Fan wrote:
> Dear all,
> I am using the 'oligo' package to process data from Affymetrix miRNA v3.0 arrays.  When I extract the probe names as follows, I get 243982 probes:
>
>
>> library(oligo)
>> celFiles<- list.celfiles()
>> rawData<- read.celfiles(celFiles)
>> pNames<- probeNames(rawData)
>> exprs.rawData<- exprs(rawData)
>
>
> However, extracting the data itself gives me a different number of rows:
>
>
>
>> length(pNames)
> [1] 243982
>
>> dim(exprs.rawData)
> [1] 292681      6
>
> I’ve verified that this result occurs using the sample CEL files from the Affymetrix website here (although there is a login required):
>
> http://www.affymetrix.com/Auth/support/downloads/demo_data/mirna_3_sample_data.zip
>
> Shouldn’t the number of probes in the CEL file be the same as the number of rows in the dataset?  I’m aware that the exprs function is for objects of type eSet and that read.celfiles returns an ExpressionFeatureSet object, not an eSet object, so maybe this has something to do with the non-matching numbers.

There are a large number of probes around the perimeter of the array (as 
well as some blocks of probes in the middle) that are primarily used for 
aligning the scanner to the array. Since these probes don't measure 
anything of interest (they are oligo-dT), they are not used in any further 
calculations.

The difference arises because all probes are scanned by the scanner, 
and those data are available in the celfile, so the dimensions of the 
raw data reflect the existence of these extra probes. But since these 
probes aren't used for anything else, the probe names you extract 
cover only the probes on the array that are intended to measure 
various transcripts.
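
As a rough sketch of the comparison, reusing the object names from your 
code (the variable names nAll, nNamed and nPM are just illustrative, and 
the pm() line assumes that probeNames() and pm() index the same set of 
probes on this array, which I haven't checked for the miRNA v3.0 design):

```r
library(oligo)

# Read the CEL files in the working directory, as in the original code
celFiles <- list.celfiles()
rawData  <- read.celfiles(celFiles)

# Every feature the scanner read, including the alignment/control probes
nAll <- nrow(exprs(rawData))

# Only the probes that have a probe name
nNamed <- length(probeNames(rawData))

# Features present in the celfile but without a probe name
nAll - nNamed

# Assumption: the PM intensity matrix has one row per named probe,
# so this count should agree with nNamed
nPM <- nrow(pm(rawData))
```

With the numbers you report, nAll - nNamed would be 
292681 - 243982 = 48699 features that are in the celfile but have no 
probe name.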

Best,

Jim
>
> Regards,
> Vicky
>
> --
> Vicky Fan
> Research Programmer
> Bioinformatics Institute
> School of Biological Sciences
> University of Auckland
> Ph: 09 373 7599 x 83777
>
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


