[BioC] hs133phsentrezg metadata

Tue Oct 17 22:31:50 CEST 2006

Hi Manhong,

OK, I understand that part. However, for most of the annotation data 
(including the chromosomal location), what is normally supplied is the 
information at the gene level, rather than the probe level. I guess one 
could argue that knowing where exactly the probesets are supposed to 
bind might be of interest, but the annotation packages are intended to 
annotate probesets to genes.

While it is true that some of the probes might bind to different parts 
of the genome, this can be handled by supplying multiple locations. For 
instance, in the hgu133plus2 package we have:

 > get("1007_s_at", hgu133plus2CHRLOC)
 6_qbl_hap2          6 6_cox_hap1 6_qbl_hap2 6_cox_hap1
    2098794   30959839    2300465    2099260    2300931
          6 6_cox_hap1          6 6_qbl_hap2
   30960305    2305069   30964443    2103398
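
If it helps, here is a rough sketch of how a multi-location result like this could be summarized per chromosome. It is only an illustration in base R and not something the annotation package does for you: the vector loc is retyped by hand from the output above, and the names loc, by_chr and ranges are made up for the example.

```r
# Named vector retyped from the hgu133plus2CHRLOC output above
# (names are chromosome / haplotype labels, values are positions).
loc <- c("6_qbl_hap2" = 2098794, "6" = 30959839, "6_cox_hap1" = 2300465,
         "6_qbl_hap2" = 2099260, "6_cox_hap1" = 2300931, "6" = 30960305,
         "6_cox_hap1" = 2305069, "6" = 30964443, "6_qbl_hap2" = 2103398)

# Group the positions by the label they were reported under.
by_chr <- split(unname(loc), names(loc))

# Smallest and largest reported position for each label.
ranges <- t(sapply(by_chr, range))
colnames(ranges) <- c("min", "max")

print(by_chr)
print(ranges)
```

With these values, chromosome 6 and each of the two haplotype labels end up with three positions, which is the sense in which one probeset can carry several locations.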

Best,

Jim

Manhong Dai wrote:
> Hi Jim,
> 
> 	In our custom CDFs, some probes with hits<1 are used. For example, when
> a probe matches one allele of a SNP, and the SNP's other allele has
> exactly one match (hits=1) in the genome, we use this probe and its
> genome location as a candidate for all custom CDFs, even though the
> probe itself has no hit in the genome at all. The proportion of such
> probes is small.
> 
> 
> 	Our UG and ENTREZG custom CDFs do have a rule that each probe must
> hit only one genome location and one UG cluster.
> 
> 
> 	But in the REFSEQ custom CDF, when a probe matches a REFSEQ sequence
> but has no match to the genome at all, the probe is still used because
> REFSEQ is more reliable than the genome.
> 
> 	For example, probe 4 of
> http://arrayanalysis.mbni.med.umich.edu/ps/ps_pb.jsp?p=NM_000019_at&c=Hs133P_Hs_REFSEQ_8  has no match to genome.
> 
> 
> Best,
> Manhong Dai
> 
> 
> 	
> On Tue, 2006-10-17 at 14:46 -0400, James W. MacDonald wrote:
> 
>>Hi Manhong,
>>
>>Manhong Dai wrote:
>>
>>>Hi An,
>>>
>>>	Our custom CDF annotation package has only a gene name for each probeset
>>>because we designed it this way.
>>>
>>>	A probeset's probes can have matches at different locations or on
>>>different chromosomes, and some probes have no match in the genome at
>>>all, but they belong to this probeset because they all have a perfect
>>>match to the gene's sequence.
>>
>>This doesn't make sense to me. How can a probe not match to the genome, 
>>yet have a perfect match to a gene's sequence?
>>
>>I was also under the impression that the matching for the probes that 
>>remain in an MBNI cdf was first done to the genome, and those probes 
>>that didn't blast to the genome were discarded. From
>>
>>http://brainarray.mhri.med.umich.edu/Brainarray/Database/CustomCDF/cdfreadme.htm
>>
>>I get:
>>
>>A probe must only hit one UniGene cluster and one genomic location
>>
>>A probe must hit only one genomic location
>>
>>Does this mean a probe that hits < 1 genomic location will be included? 
>>I assumed this meant a probe had to hit exactly one location.
>>
>>Best,
>>
>>Jim
>>
>>
>>

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
