[BioC] how to get probe ids??

Thu Sep 15 15:40:11 CEST 2011

Hi Anand,

On 9/14/2011 11:46 AM, anand m t wrote:
> Hi all..
>
> I'm very new to microarray analysis.
> i've been given two datasets for analysis. (experimental and control
> with 3 replicates each)
> I've encountered following errors/problems..
>
> 1.whenever i tried to run mas5calls, it throws an error saying the
> presence of NA/Inf/NAN's in the data.
>
>> affy.data=ReadAffy()
>> data.mas5calls = mas5calls(affy.data)
> Getting probe level data...
> Computing p-values
> Error in FUN(1:6[[1L]], ...) :
>    NA/NaN/Inf in foreign function call (arg 2)
>
> then i tried removing NA's using following command..
>
>> na.omit(affy.data)
> AffyBatch object
> size of arrays=1050x1050 features (18 kb)
> cdf=MoGene-1_0-st-v1 (35556 affyids)
> number of samples=6
> number of genes=35556
> annotation=mogene10stv1
> notes=

This error arises because mas5calls() tests whether the perfect match 
(PM) probes are significantly different from the corresponding mismatch 
(MM) probes. The MoGene chip has no MM probes, so you cannot compute 
mas5calls in the conventional sense.

In fact, the affy package isn't really designed to process the newer 
generation of Affymetrix chips (Gene ST and Exon ST), so you are doing 
yourself a disservice by using it. Instead you should be using either 
the oligo or xps package.

The oligo package will allow you to compute DABG (detection above 
background) calls, which are the successor to mas5calls. The xps package 
will also allow you to do this, and will even allow you to compute 
mas5calls. However, given that there are no MM probes, it cannot be 
computing the conventional mas5calls, so I would recommend sticking 
with DABG.
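
As a rough sketch of what DABG calls look like in oligo (not run against 
your files): the function name detection_calls, the cel_dir argument and 
the 0.05 cutoff are placeholders of mine, and I am assuming that 
paCalls() with method "PSDABG" returns a matrix of probeset-level 
p-values for a Gene ST array.

```r
# Hypothetical helper: read the CEL files in cel_dir and flag probesets
# detected above background. Requires the Bioconductor package oligo and
# the matching pd.* annotation package for the chip.
detection_calls <- function(cel_dir, alpha = 0.05) {
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)

  # Assumption: "PSDABG" gives probeset-level DABG p-values
  # (rows = probesets, columns = arrays).
  pvals <- oligo::paCalls(raw, method = "PSDABG")

  # TRUE where the probeset is called present; alpha is an arbitrary cutoff.
  pvals < alpha
}

# Usage (not run):
# calls <- detection_calls("path/to/cel/files")
```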

>
> But even after, the same problem exists. How do i solve this??
>
> 2. I skipped this step and proceed with next step. I calculated
> p-values and extracted all statistically significant probes. But,
> look at my probe names (rma normalized data)
>
> probe_names	control_1	control_2	control_3	experimental_1	experimental_2	experimental_3
> 10338001	11.70433113	11.09411799	11.17114406	12.3810603	11.3593078	11.30987883
> 10338002	7.455822379	7.022795366	6.977515221	7.863429983	6.659503501	6.583122799
> 10338003	9.944000269	9.329330062	9.439069933	10.87092521	9.507433404	9.421644356
> 10338004	8.807458574	8.190795944	8.336526249	9.666564028	8.555489147	
>
> It doest have any probe extensions such as "_at", etc.  what might be
> the problem?? How do i proceed now ??

That isn't a problem. You aren't working with a 3'-biased array, which 
is where probeset IDs with suffixes like "_at" come from. The Gene ST 
arrays just have numeric probeset IDs, which is what you see there.

You proceed as normal with these arrays. Compute some sort of summary 
statistic for each probeset, then fit whatever univariate linear model 
you deem necessary, and extract the 'top' probesets for further exploration.
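
To make "fit a model per probeset and take the top ones" concrete, here 
is a minimal base R sketch using only the three complete rows you 
pasted. The names expr, group and fit_probeset are mine, and a plain 
lm() per row is just the simplest stand-in; with three replicates per 
group most people would use a package such as limma instead.

```r
# RMA (log2) values for the three complete probesets quoted above.
expr <- rbind(
  "10338001" = c(11.70433113, 11.09411799, 11.17114406,
                 12.3810603, 11.3593078, 11.30987883),
  "10338002" = c(7.455822379, 7.022795366, 6.977515221,
                 7.863429983, 6.659503501, 6.583122799),
  "10338003" = c(9.944000269, 9.329330062, 9.439069933,
                 10.87092521, 9.507433404, 9.421644356)
)
colnames(expr) <- c(paste0("control_", 1:3), paste0("experimental_", 1:3))

# Two-group design: control is the reference level.
group <- factor(rep(c("control", "experimental"), each = 3))

# Fit y ~ group for one probeset; return the group effect and its p-value.
fit_probeset <- function(y, group) {
  coefs <- summary(lm(y ~ group))$coefficients
  c(logFC = coefs[2, "Estimate"], p.value = coefs[2, "Pr(>|t|)"])
}

# One model per row, then sort so the 'top' probesets come first.
results <- t(apply(expr, 1, fit_probeset, group = group))
results <- results[order(results[, "p.value"]), , drop = FALSE]
print(results)
```

The numeric probeset IDs are carried along as row names, so nothing 
special is needed to handle them.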

Note however that the Gene ST arrays are basically a subset of the Exon 
ST arrays, so the notion of probeset is less fixed than it was with the 
3'-biased arrays. In other words, you can define a probeset in one of 
two different ways. There is a vignette in oligo that shows how to 
process these chips 
(http://www.bioconductor.org/packages/2.8/bioc/vignettes/oligo/inst/doc/V5ExonGene.pdf).
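
In oligo the choice between the two probeset definitions is made through 
the target argument of rma(). A hedged sketch, not run here: 
summarize_gene_st and cel_dir are placeholder names of mine, and the 
function assumes the CEL files and the matching pd.* annotation package 
are available.

```r
# Hypothetical helper: RMA-summarize Gene ST arrays at the chosen level.
#   target = "core"     -> transcript (gene) level summaries
#   target = "probeset" -> probeset (roughly exon) level summaries
summarize_gene_st <- function(cel_dir, target = c("core", "probeset")) {
  target <- match.arg(target)
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)

  # Background-correct, quantile-normalize and summarize in one step.
  eset <- oligo::rma(raw, target = target)

  # Matrix of log2 expression values, rows named by numeric probeset ID.
  Biobase::exprs(eset)
}

# Usage (not run):
# gene_level <- summarize_gene_st("path/to/cel/files", target = "core")
```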

Best,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826