[BioC] Problem getting the exact ProbeNames

Wolfgang Huber whuber at embl.de
Thu Jan 13 17:15:27 CET 2011


Hi Karsten,

if you created an AffyBatch x with ReadAffy, then exprs(x) is a matrix 
whose rows correspond to the probes on the array, one after the other as 
they are physically laid out on the chip. The mapping between row index 
in the AffyBatch and (x,y)-coordinates is provided by the functions 
indices2xy and xy2indices in the 'affy' package (whose code you can see 
by typing their names). Essentially, it is very simple:

     x = (i - 1) %% nr
     y = (i - 1) %/% nr
and in reverse:
     i = x + 1 + nr * y

where nr is the width of the chip. So one strategy is to compute the 
(x,y) index of each probe on your array by

     indices2xy(seq_len(nrow(exprs(mitdata))), abatch=mitdata)

and use this to merge with your probe-sequence table. This might be 
easier and more transparent than going through probeNames.
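
The mapping and the merge can be tried out without any CEL file. What 
follows is only a minimal sketch in base R, assuming a made-up chip of 
width nr = 4 with 12 probes, made-up intensities and made-up sequences. 
The helper names idx2xy and xy2idx are hypothetical stand-ins written 
from the formulas above, not the affy functions themselves.

```r
# Hypothetical toy chip: nr is the chip width, as in the formulas above.
nr <- 4
n_probes <- 12   # made-up size (4 x 3)

# Index -> (x, y), following the formulas quoted in this message.
idx2xy <- function(i, nr) {
  cbind(x = (i - 1) %% nr, y = (i - 1) %/% nr)
}

# (x, y) -> index, the reverse mapping.
xy2idx <- function(x, y, nr) {
  x + 1 + nr * y
}

i  <- seq_len(n_probes)
xy <- idx2xy(i, nr)

# Round trip: every index must map back to itself.
stopifnot(all(xy2idx(xy[, "x"], xy[, "y"], nr) == i))

# Made-up intensities, one row per probe in physical order
# (a stand-in for one column of exprs(x)).
intens <- data.frame(xy, intensity = seq(100, by = 10, length.out = n_probes))

# Made-up sequence table keyed by (x, y), deliberately in a different order.
seqtab <- data.frame(x = c(3, 0, 1), y = c(2, 0, 1),
                     sequence = c("AAAA", "CCCC", "GGGG"),
                     stringsAsFactors = FALSE)

# Merge on the coordinates rather than on the non-unique probe-set names.
merged <- merge(seqtab, intens, by = c("x", "y"))
print(merged)
```

Merging on (x,y) avoids relying on the two tables being in the same 
row order, which is the part that is hard to verify by eye.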

Probe sequences for many Affymetrix chips are obtained through the 
'probe' packages (whose content is complementary to the smaller 'cdf' 
packages):

  library(hgu95av2probe)
  head(as.data.frame(hgu95av2probe))
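
A table of this kind carries x and y columns next to each sequence, so 
it can also be used in the other direction: compute the row index from 
(x,y) and pick the matching intensity rows directly. The following is a 
rough sketch in base R only, with a made-up probe table, a made-up chip 
width nr = 4 and a made-up intensity matrix; it assumes the coordinates 
are 0-based as in the formulas above, which should be checked for your 
own chip.

```r
# Made-up stand-in for a probe table with sequence, x, y and probe-set name.
probetab <- data.frame(
  sequence = c("AAAA", "CCCC", "GGGG"),
  x = c(2, 0, 3),
  y = c(0, 1, 2),
  Probe.Set.Name = c("set_A", "set_A", "set_B"),
  stringsAsFactors = FALSE
)

nr <- 4  # hypothetical chip width

# Made-up intensity matrix: rows = probes in physical order, columns = arrays
# (a stand-in for exprs(x) of an AffyBatch).
e <- matrix(seq_len(12 * 2), nrow = 12,
            dimnames = list(NULL, c("array1", "array2")))

# Row index of each probe, assuming 0-based (x, y) as in the formulas above.
probetab$row <- probetab$x + 1 + nr * probetab$y

# Pull the matching intensity rows and attach them to the sequences.
result <- cbind(probetab, e[probetab$row, , drop = FALSE])
print(result)
```

Note that two probes of the same probe set (set_A here) end up on 
different rows, which is exactly the distinction that the probe-set 
name alone cannot make.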


	Best wishes
	Wolfgang


Karsten Voigt scripsit 12/01/11 15:28:
> Hi all,
>
> On 01/11/2011 07:36 PM, James W. MacDonald wrote:
>> Hi Karsten,
>>
>> On 1/11/2011 12:56 PM, Karsten Voigt wrote:
>>> Dear all,
>>>
>>> I am currently working on a project where I need to get the exact IDs of
>>> probes of a custom Affymetrix Chip in order to merge it with another
>>> list containing the sequence.
>>>
>>> I am using this small R script for creating the list:
>>>
>>> mitdata <- ReadAffy();
>>> stddata <- apply(pm(mitdata), 2, bg.adjust);
>>> nrmdata <- normalize.quantiles(stddata);
>>> namedata <- probeNames(mitdata);
>>> enddata <- cbind(namedata, nrmdata);
>>> write.table(enddata, file="probesdata.txt",sep="\t");
>>>
>>> This is an output example
>>>
>>> ...
>>> 145 TZG_ARR_0001_x_at 135.115780787133 ...
>>> 146 TZG_ARR_0001_x_at 147.346049115501 ...
>>> 147 TZG_ARR_0001_x_at 203.840215898533 ...
>>> 148 TZG_ARR_0003_x_at 48.7635207480323 ...
>>> ...
>>>
>>> As you can see, a number of probes have the same name but refer to
>>> different oligos. The number in front of the row is just added by me,
>>> therefore you can ignore it.
>>>
>>> I received a list containing the probe name, some other
>>> information AND the sequence.
>>>
>>> This is a part of it:
>>>
>>> 15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
>>> 103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
>>> 188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
>>> 15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1
>>>
>>> This should be the same area.
>>>
>>> In this received list, I can identify the unique probes using the 2
>>> numbers right after the exclamation mark, which are referring to the
>>> position on the chip, I guess. How can I extract those coordinates for
>>> my own list? I tried it with indices2xy, however I failed to get it
>>> running since I don't understand how to use this function correctly.
>>
>> Using the hgu95av2cdf as an example:
>>
>> > library(hgu95av2cdf)
>> > x <- as.list(hgu95av2cdf)
>> > x <- x[order(names(x))]
>> > x <- unlist(sapply(x, function(x) x[,1]))
>> > xys <- indices2xy(x, cdf="hgu95av2cdf")
>> > head(xys)
>> x y
>> 1000_at1 399 559
>> 1000_at2 544 185
>> 1000_at3 530 505
>> 1000_at4 617 349
>> 1000_at5 459 489
>> 1000_at6 408 545
>>
>> Best,
>>
>> Jim
>>
>
> first of all, many thanks to Jim for the quick and good answer. I ran
> your script on my own cdf and it extracts exactly what I am looking
> for.
>
> However, I still cannot identify the probes in my CEL files loaded by the
> ReadAffy() function. If I run probeNames on it, the probes are
> returned alphabetically. I cannot imagine that the CEL file probe values
> are also sorted alphabetically in the way I obtained them.
>
> I think my way of creating this list is wrong, since it is highly
> unlikely, and impossible to prove, that the probe names and the
> normalized data are listed in the same order.
>
> How can I verify that the probeNames match the probe values? Is
> it also possible to extract the x y values out of the cdf file?
>
> One other question: Is there any possibility to extract the sequence out
> of the cdf file?
>
> Many thanks in advance again,
>
> Karsten
>
>


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


