[BioC] Affy chip annotation changes

James W. MacDonald jmacdon at med.umich.edu
Tue Nov 2 15:46:44 CET 2010


Hi Yuan,

On 11/2/2010 9:27 AM, Yuan Hao wrote:
> Hi Jim,
>
> Very helpful! Thanks very much! However, "244194_at" still points to
> "ADAM22" on NetAFFX. As these two probe sets are informative in terms of
> my data set, I prefer keeping "244194_at" to dropping it. Would it be a
> problem even it maps to the 3'end of the gene?

I wouldn't presume to tell you how to do your analysis.

But I would note that this probeset doesn't map _to_ the 3' end (all of 
these probesets are supposed to map to the 3' UTR or possibly the first 
exon if the UTR is really short). Instead, according to hg18, and 
presumably hg19, the probeset maps to the intergenic region just off the 
start of the 3' UTR, so shouldn't measure anything.

However, if you are getting signal consistent with the other probesets 
that are intended to measure ADAM22, and you want to use multiple 
probesets per gene, then it may be reasonable to include it.

Best,

Jim


>
> Cheers,
> Yuan
>
> On 2 Nov 2010, at 13:08, James W. MacDonald wrote:
>
>>
>>
>> On 11/1/2010 5:18 PM, Mark Cowley wrote:
>>> Hi Yuan, what does 244194_at map to in the annotation csv file from
>>> netaffx? There may be some discrepancy between the netaffx search
>>> engine& the netaffx csv files.
>>>
>>> in these situations, I like to convince myself that affy's doing the
>>> right thing. Like Jim said, you could investigate further using
>>> various R tools listed below, or, if you haven't used those tools
>>> before, then for about 10 minutes work, you can align the 25mer
>>> probes for each probeset (which you can get from netaffx) to the
>>> genome using BLAT at the UCSC genome browser. This will tell you
>>> whether the probes will target ADAM22 or not.
>>
>> I have a little function that I use to do this sort of thing:
>>
>> blatGene <- function(affyid, probe, filename){
>> ## affyid == Affy probeset ID
>> ## probe == BioC probe package name
>> ## filename == output file name
>> require(probe, quietly = TRUE, character.only = TRUE)
>> tmp <- data.frame(get(probe))
>> if(length(affyid) > 1){
>> seqnc <- vector()
>> for(i in seq(along = affyid))
>> seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
>> }else{
>> seqnc <- tmp[tmp$Probe.Set.Name == affyid,1]
>> }
>> out <- vector()
>> if(length(seqnc) > 25) warning("Blat will only return values for 25 or
>> fewer sequences!",
>> call. = FALSE)
>> for(i in seq(along = seqnc)) out <- rbind(out, rbind(paste("> Probe",
>> i, sep=""), seqnc[i]))
>> write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
>> col.names=FALSE)
>> }
>>
>> I used this to look at the two probesets that apparently no longer
>> target ADAM22, and for hg18 both probesets are off the 3' end of the
>> gene.
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> HTH Mark
>>>
>>> On 02/11/2010, at 1:42 AM, Yuan Hao wrote:
>>>
>>>> Hi Jim and Martin,
>>>>
>>>> Thank you very much for your reply! I have a feeling that this
>>>> question is too vague to be answered unless down to the very end of
>>>> chip design. However, just for you information. I actually checked
>>>> these two missing probe sets on NetAFFX from the Affymetrix
>>>> website, and one of them, "244194_at", seems still linking to
>>>> ADAM22, while another is not any more.
>>>>
>>>> Regards, Yuan
>>>>
>>>> I actually checked on NetAffx from the affymetrix website for two
>>>> missed probe sets corresponding to ADAM22. "2 On 1 Nov 2010, at
>>>> 14:05, James W. MacDonald wrote:
>>>>
>>>>> Hi Yuan,
>>>>>
>>>>> The annotation packages we provide are simply a re-packaging of
>>>>> data that we get from Affymetrix, so the simple answer is that
>>>>> Affy decided that two of those probesets don't really interrogate
>>>>> the transcript for that gene.
>>>>>
>>>>> You could investigate this further by either going to the
>>>>> Affymetrix website and downloading the most recent annotation csv
>>>>> file, and the csv file that was contemporaneous with the older
>>>>> annotation package (2.2.11), and comparing the two. This will
>>>>> likely not be that enlightening, as I would bet the annotation
>>>>> simply changes for the two probesets that no longer point to
>>>>> ADAM22.
>>>>>
>>>>> You could also investigate further using e.g., the rtracklayer,
>>>>> Biostrings, and BSgenome.Hsapiens.UCSC.hg17
>>>>> (BSgenome.Hsapiens.UCSC.hg19) packages.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>> On 11/1/2010 7:14 AM, Yuan Hao wrote:
>>>>>> Dear list,
>>>>>>
>>>>>> I've downloaded two versions of hgu133plus2 array annotation
>>>>>> package: v2.2.11(R 2.9.2) and v2.4.5 (R 2.12.0). Some probe
>>>>>> sets annotated in the former version are no longer supported in
>>>>>> the recent version. For example, "ADAM22" was annotated by
>>>>>> eight probe sets ("1555024_at" "206615_s_at" "206616_s_at"
>>>>>> "208226_x_at" "208227_x_at" "208237_x_at" "213411_at"
>>>>>> "244194_at"), which have been reduced to six in the latest
>>>>>> annotation (the last two probe sets have been missing). I would
>>>>>> be grateful if someone could provide some insight about it,
>>>>>> especially for this case!
>>>>>>
>>>>>> Regards, Yuan
>>>>>>
>>>>>> _______________________________________________ Bioconductor
>>>>>> mailing list Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>>>>> archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>>>
>> --
>>>>> James W. MacDonald, M.S. Biostatistician Douglas Lab University
>>>>> of Michigan Department of Human Genetics 5912 Buhl 1241 E.
>>>>> Catherine St. Ann Arbor MI 48109-5618 734-615-7826
>>>>> **********************************************************
>>>>> Electronic Mail is not secure, may not be read every day, and
>>>>> should not be used for urgent or sensitive issues
>>>>
>>>> _______________________________________________ Bioconductor
>>>> mailing list Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>>>> archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should
>> not be used for urgent or sensitive issues
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 



More information about the Bioconductor mailing list