[BioC] Annotating HGU133plus2 genes with number of coding changes

Thu Apr 19 15:43:23 CEST 2007

On Thursday 19 April 2007 09:33, marco zucchelli wrote:
> Hi Steffen,
>
> One more question: in the example I reported before, it seems that some
> probes are reported twice, i.e. 207893_at is listed 2 times matched to the
> same gene ID. In total, the "probes" vector contains the probes from
> hgu133plus2 (54675), but the query returns 66565 rows.
>
> I do not really understand what this means.
>
> Regards
>
> Marco
>
> probes.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
>                      filters="affy_hg_u133_plus_2", values=probes, mart=mart)
>
> head(probes.list)
>
>   ensembl_gene_id affy_hg_u133_plus_2
> 1 ENSG00000184895           207893_at
> 2 ENSG00000184895           207893_at
> 3 ENSG00000129824           201909_at
> 4 ENSG00000129824           201909_at
> 5 ENSG00000067646         207247_s_at
> 6 ENSG00000067646           207246_at
>
> On 4/3/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> > Hi Marco,
> >
> > It matches the transcripts and then maps those transcripts to the genes,
> > even if you don't include the transcript id in the query.
> > To see this you could set attributes =
> > c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2") in
> > your query.  Also if Ensembl didn't find a match for the affy probe then
> > it won't be included in the output and if they find multiple matches
> > then all of them will be returned.

Marco,

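Try the suggestion that Steffen gave above (setting the attributes to include
the transcript). The mapping is NOT done to the gene but to the transcript,
and there may be multiple transcripts for the same gene, each of which may be
mapped to one or more affy_ids. When the transcript column is left out, each
gene/probe pair is still returned once per matching transcript, which is why
rows look duplicated.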
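
To make the effect concrete, here is a minimal base-R sketch that needs no
connection to Ensembl. It uses a small hand-built data frame in place of a
real getBM() result. The gene and probe IDs are the ones from the output
above; the transcript IDs (TX_A etc.) and the probe "AFFX-example" are
made-up placeholders, so treat this as an illustration of the idea rather
than real annotation.

```r
# Toy stand-in for a getBM() result that includes the transcript column.
# Gene and probe IDs come from the thread; transcript IDs are placeholders.
with.tx <- data.frame(
  ensembl_gene_id       = c("ENSG00000184895", "ENSG00000184895",
                            "ENSG00000067646", "ENSG00000067646"),
  ensembl_transcript_id = c("TX_A", "TX_B", "TX_C", "TX_D"),
  affy_hg_u133_plus_2   = c("207893_at", "207893_at",
                            "207247_s_at", "207246_at"),
  stringsAsFactors = FALSE
)

# Dropping the transcript column reproduces the apparent duplicates.
gene.probe <- with.tx[, c("ensembl_gene_id", "affy_hg_u133_plus_2")]

# unique() on a data frame keeps one row per distinct gene/probe pair.
gene.probe.unique <- unique(gene.probe)

# Count how many transcripts sit behind each gene/probe pair.
tx.per.pair <- aggregate(
  ensembl_transcript_id ~ ensembl_gene_id + affy_hg_u133_plus_2,
  data = with.tx, FUN = length
)

# Probes that were queried but have no match are simply absent from the
# result; setdiff() lists them. "AFFX-example" is a hypothetical probe ID.
probes    <- c("207893_at", "207247_s_at", "207246_at", "AFFX-example")
unmatched <- setdiff(probes, gene.probe.unique$affy_hg_u133_plus_2)
```

On a real result the same two steps should apply: unique() on the
gene/probe columns to collapse the per-transcript repeats, and setdiff()
against the original "probes" vector to see which probes Ensembl did not
match.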

Sean