[BioC] question regarding MAS5 normalization with reduced probes

James W. MacDonald jmacdon at med.umich.edu
Tue Aug 31 22:45:29 CEST 2010


Hi James,

On 8/31/2010 2:04 PM, James Anderson wrote:
> Hi Jim,
>
> Thanks a bunch for your help on this, it works. Sorry to bother you
> again, but is there a function to convert the probe indices into the
> probe names you described? For example, for U133A, the probe
> indices are 1:247965; is there a function to convert them to
> ProbeSet1_1, ProbeSet1_2, ..., ProbeSet1_11, ProbeSet2_1, ProbeSet2_2,
> ..., ProbeSet2_11, ..., ProbeSet22283_1, ProbeSet22283_2, ...,
> ProbeSet22283_11?

So you want *all* of the probes? Not sure where you are heading with 
this, but it isn't difficult to get them. I don't have the hgu133acdf 
installed, so for an example I will use the hgu95av2cdf:

 > library(hgu95av2cdf)
 > x <- as.list(hgu95av2cdf)
 > y <- sort(unlist(sapply(x, function(q) q[,1])))
 > head(y)
31483_g_at16    33941_at4    33941_at5
          646          647          648
    31977_at2    32448_at1   38227_at12
          649          650          651
 > head(names(y))
[1] "31483_g_at16" "33941_at4"    "33941_at5"
[4] "31977_at2"    "32448_at1"    "38227_at12"
 > length(y)
[1] 201800

Note that there are actually 403600 usable probe positions on this 
particular chip, but the other 201800 are MM probes, and have the same 
exact name, so we don't need those.

Also note that there are 6000 probes on this chip that we ignore (there 
are actually 409600 rows in the exprs slot of the AffyBatch). These 
extra probes are the oligo-B2 probes that are on the outside of the 
chip, used by the scanner to align to the chip.
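
If you would rather have the underscore-separated names from your
example (ProbeSet1_1, ProbeSet1_2, ...), a minimal sketch along these
lines might do it. It uses a small made-up list 'x' in place of
as.list(hgu95av2cdf) so that it runs without the CDF package; the index
values and the variable names 'pm_index' and 'probe_name' are
assumptions for illustration only.

```r
# Toy stand-in for as.list(hgu95av2cdf): one matrix per probeset, with
# the PM index in column 1 and the MM index in column 2 (made-up values).
x <- list(
  "1000_at"  = cbind(pm = c(5L, 9L, 2L), mm = c(645L, 649L, 642L)),
  "100_g_at" = cbind(pm = c(7L, 1L),     mm = c(647L, 641L))
)

# PM index of every probe, in probeset order, without the automatic names
pm_index <- unlist(lapply(x, function(q) q[, 1]), use.names = FALSE)

# Matching names: probeset ID, an underscore, then the position of the
# probe within its probeset
probe_name <- unlist(lapply(names(x), function(id) {
  paste(id, seq_len(nrow(x[[id]])), sep = "_")
}))

# Named vector of PM indices, sorted by index as in the example above
y <- sort(setNames(pm_index, probe_name))
y
```

With the real CDF you would replace the toy list with
x <- as.list(hgu95av2cdf); I have not run it that way here.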

Best,

Jim




>
> Thanks again,
>
> -James
>
> --- On Tue, 8/31/10, James W. MacDonald<jmacdon at med.umich.edu>
> wrote:
>
> From: James W. MacDonald<jmacdon at med.umich.edu>
> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
> To: "James Anderson"<janderson_net at yahoo.com>
> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
> Date: Tuesday, August 31, 2010, 1:15 PM
>
> Hi James,
>
> On 8/31/2010 12:17 PM, James Anderson wrote:
>> Hi Jim,
>>
>> Thanks a lot for the link. I've tried the code in the link, it
>> works without any problem if I were to take the whole probesets
>> out. However, I do encounter some problem when I need to take not
>> only some probe sets, but also some probes (but not the whole probe
>> set) out, maybe because I did not provide the correct format of the
>> probes.
>>
>> (I assume you are familiar with the content in the script provided
>> in the link).
>>
>> If I randomly take out 2000 probe sets from U133A,
>>
>> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
>> RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobeSets,
>>              cleancdf)
>>
>> It works fine and whatever affyBatch object read using the cleancdf
>> has a reduced dimension.
>>
>> However, if I do
>>
>> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
>> maskedprobes = rownames(pm(A))[1:2000]
>
> Assuming that 'A' is an AffyBatch, what you will get back from that
> call to rownames is a bunch of numbers in character format.
>
> An example using the Dilution dataset:
>
> > rownames(pm(Dilution))[1:10]
>  [1] "175218" "356689" "227696" "237919" "275173" "203444" "357984" "368524"
>  [9] "285352" "304510"
>
> Which you can see is not very useful. What you want are the probeset
> IDs, along with an appended number (which is equal to the position
> of the probe in the probeset).
>
> Now, say we are concerned about the "100_g_at" probeset in the
> Dilution dataset:
>
> > pm(Dilution, "100_g_at")
>               20A   20B    10A   10B
> 100_g_at1   221.3 146.3  192.0 116.0
> 100_g_at2   685.0 479.0  493.0 328.3
> 100_g_at3  1126.3 724.3  849.0 498.3
> 100_g_at4   205.0 126.5  136.0  97.0
> 100_g_at5   580.8 341.8  374.0 226.0
> 100_g_at6   161.3 109.5  139.0  92.3
> 100_g_at7  1645.3 992.3 1006.8 670.0
> 100_g_at8   624.0 348.0  336.3 224.5
> 100_g_at9   274.0 156.0  203.8 119.0
> 100_g_at10  240.0 156.3  223.0 122.0
> 100_g_at11  438.0 278.3  362.5 198.0
> 100_g_at12  554.0 334.8  421.5 220.0
> 100_g_at13  235.0 148.0  151.0 107.5
> 100_g_at14  571.3 415.0  508.0 271.0
> 100_g_at15  904.0 562.0  689.0 330.0
> 100_g_at16  141.0  93.0  113.5  75.5
>
> And we don't like the third and seventh probes. We could use
>
> > rownames(pm(Dilution, "100_g_at"))[c(3,7)]
> [1] "100_g_at3" "100_g_at7"
>
> And feed that into RemoveProbes(), which will then work.
>
> Best,
>
> Jim
>
>
>
>> RemoveProbes(listOutProbes=maskedprobes,
>> listOutProbeSets=maskedprobeSets, cleancdf)
>>
>> The error msg shows as: Error in get(pset[i], env =
>> get(cdfpackagename)) : object '315997at' not found
>>
>> Do you know what is the correct format of the input for the probes
>> (not probe sets) to be taken out?
>>
>>
>>
>> Thanks a lot,
>>
>>
>> -James
>>
>>
>> --- On Mon, 8/30/10, James W. MacDonald<jmacdon at med.umich.edu>
>> wrote:
>>
>> From: James W. MacDonald<jmacdon at med.umich.edu>
>> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
>> To: "James Anderson"<janderson_net at yahoo.com>
>> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
>> Date: Monday, August 30, 2010, 12:25 PM
>>
>> Hi James,
>>
>> I misunderstood your question. I thought you already had a reduced
>> set of probes you wanted to run mas5() on.
>>
>> So yeah, if you want to use a reduced set of probes you could use
>> some code written by Ariel Chernomoretz (and modified by Jenny
>> Drnevitch) that has been posted and referenced many times on this
>> list:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>>
>>
>>
>> Alternatively, you could play with the affxparser package, which has
>> the capability (IIRC) to do the same.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 8/30/2010 10:29 AM, James Anderson wrote:
>>> Hi Jim,
>>>
>>> Thanks for your email. I've run mas5 before, but only using
>>> default setting. From the help, it does not look like there is a
>>> way to specify which reduced set of probes you can use. In
>>> addition, from the file, it looks like it has more to do with
>>> whether the "object" is read using a reduced set of probes. (I
>>> believe if the "object" is read using only the reduced set, mas5
>>> will do the job), so don't know whether it has more to do with
>>> the function ReadAffy, but from that, it does not look like it
>>> has the option of specifying which reduced set of probes, if we
>>> don't use alternative CDF file. Below is the usage of mas5
>>> function.
>>>
>>> mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...)
>>>
>>> Thanks,
>>>
>>> -James
>>>
>>> --- On Fri, 8/27/10, James W. MacDonald<jmacdon at med.umich.edu>
>>> wrote:
>>>
>>> From: James W. MacDonald<jmacdon at med.umich.edu>
>>> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
>>> To: "James Anderson"<janderson_net at yahoo.com>
>>> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
>>> Date: Friday, August 27, 2010, 10:04 AM
>>>
>>> Hi James,
>>>
>>> On 8/26/2010 1:05 PM, James Anderson wrote:
>>>> Hi,
>>>>
>>>> I am trying to use MAS5 to normalize some CEL files with a
>>>> reduced set of probes (probes whose PM is not significantly
>>>> higher than MM are filtered out). Does anyone know how to do
>>>> this? Does that require creating a new CDF file?
>>>
>>> Have you tried running mas5() from the affy package? Having
>>> never tried, I don't know, but it seems a simple enough test.
>>>
>>> If you do need to create a new cdf, you will want to use the
>>> affxparser package.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> thanks a bunch,
>>>>
>>>> -James
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________ Bioconductor
>>>> mailing list Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>>> archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 

