[BioC] from using biomaRt and r10kcod

Weiwei Shi helprhelp at gmail.com
Tue May 15 07:43:00 CEST 2007


Hi,

I just checked 502674 again and found that it is a rat gene ID.
Oops... But I remember seeing it shown as a human one yesterday
afternoon, more than once, which is why I raised the question. It was
probably a web browser problem (I am using Safari on a Mac), or I
must have been "drunk" :)

Sorry about that,

Weiwei

On 5/15/07, Diego Diez <diez at kuicr.kyoto-u.ac.jp> wrote:
> Hi Weiwei and James,
>
> (Sorry Weiwei, the first time I sent this email only to you, although
> I meant to send it to the list too.)
>
>
> On May 15, 2007, at 5:29 AM, Weiwei Shi wrote:
> > Hi, there:
> >
> > I would like to revisit the question of mapping Codelink probe IDs to
> > human Entrez Gene IDs. I will describe my question with an example:
> >
> > Using the r10kcod package, you can find that probe "GE16490" maps to
> > "502674", which I assume is a rat Entrez Gene ID. However, when I use
> > biomaRt to convert all rat Entrez Gene IDs on this array to human
> > ones, I find the following mappings involving 502674:
> >
> >          id MappedID rat.count human.count
> > 4167 296197    11034         1           2
> > 7021 502674    11034         1           2
> >
>
> I'm not too familiar with the biomaRt package, but I guess this
> result is telling you that you have two rat Entrez Gene IDs, 296197
> and 502674 (each appearing only once), which map to one human Entrez
> Gene ID, 11034 (appearing twice, once for each rat ID).
>
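To make that reading of the two count columns concrete, here is a minimal R sketch. It assumes (this is an interpretation, not something confirmed in the thread) that rat.count is the number of rows sharing a rat ID and human.count the number of rows sharing a human ID. The data frame name `map` is hypothetical and holds only the two destrin rows quoted above.

```r
# Toy table with only the two destrin rows quoted above. IDs are kept as
# character strings so they are never treated as numbers.
map <- data.frame(
  id       = c("296197", "502674"),   # rat Entrez Gene IDs
  MappedID = c("11034", "11034"),     # human Entrez Gene IDs
  stringsAsFactors = FALSE
)

# Assumed meaning of the columns: rows per rat ID and rows per human ID.
map$rat.count   <- ave(seq_along(map$id), map$id, FUN = length)
map$human.count <- ave(seq_along(map$id), map$MappedID, FUN = length)

print(map)
```

Under that assumption, each rat ID is counted once and the shared human ID twice, which agrees with the two rows in the table above.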
> > So basically 296197, 502674 and 11034 are all associated with the
> > protein "destrin". To be accurate, 296197 is a rat protein that is
> > similar to destrin.
> >
> > However, according to
> > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene
> > the other two (11034 and *502674*) are human IDs (if I am wrong
> > here, please correct me).
> >
>
> Well, for me, searching for 502674 in Entrez Gene brings up a link to
> the rat Destrin gene:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=502674
>
> Clicking on this entry, I can see the information about the Dstn
> (destrin) gene. At the bottom of the page there are mappings to
> different sequences (Related sequences). One is CB785830.1 and the
> other is CF111187.1. The latter is the one used in r10kcod to map
> from Codelink probe to GenBank:
>
> GE16490 -> CF111187.1
>
> and then this is used to map to Entrez Gene, if I understand even a
> little of how AnnBuilder works (which may not be the case). Of course,
> I also use the mappings from probe IDs to Entrez Gene and UniGene
> provided by the manufacturer, but for this particular probe there is
> no such mapping in the current files (last updated March 31, 2006, so
> they are pretty old).
>
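The two-step chain described above (probe to GenBank accession, then accession to Entrez Gene) can be sketched in base R as a join of two lookup tables. This only illustrates the idea and is not how AnnBuilder is actually implemented; the table names `probe2acc` and `acc2gene` are hypothetical, and the rows contain only the identifiers mentioned in this thread.

```r
# Hypothetical lookup table 1: Codelink probe -> GenBank accession.
probe2acc <- data.frame(
  probe     = "GE16490",
  accession = "CF111187.1",
  stringsAsFactors = FALSE
)

# Hypothetical lookup table 2: GenBank accession -> Entrez Gene ID.
# Both accessions are listed as related sequences of gene 502674.
acc2gene <- data.frame(
  accession = c("CB785830.1", "CF111187.1"),
  entrez    = c("502674", "502674"),
  stringsAsFactors = FALSE
)

# Join on the shared accession column to obtain probe -> Entrez Gene.
probe2gene <- merge(probe2acc, acc2gene, by = "accession")

print(probe2gene[, c("probe", "accession", "entrez")])
```

A probe whose accession is missing from the second table simply drops out of the join, which is one way a probe can end up without an Entrez Gene annotation.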
> In fact, those files also contain information about homologues in
> the other two organisms (of human, mouse and rat), and for the human
> probes that map to Entrez Gene 11034 I can see that they map to rat
> Entrez Gene 502674, in agreement with the biomaRt results.
>
> > so my questions are:
> >
> > 1. Is 502674 a rat Entrez Gene ID or a human one?
> >
>
> I would definitely say that it is a rat id.
>
> > 2. Is r10kcod wrong, is NCBI wrong, or is my understanding wrong? (I
> > assume the last one :)
> >
>
> Neither is wrong from my point of view, but let us first check that
> we see the same thing when we look up 502674 in Entrez Gene.
>
> > 3. I found many many-to-many mappings in this process of converting
> > rat to human Entrez Gene IDs, like the following:
> >
> >> t0[t0[,1]== 396527,]
> >>
> >          id MappedID rat.count human.count
> > 6608 396527    54576         9           4
> > 6609 396527    54575         9           4
> > 6610 396527    54600         9           4
> > 6611 396527    54577         9           4
> > 6612 396527    54578         9           4
> > 6613 396527    54579         9           4
> > 6614 396527    54657         9           4
> > 6615 396527    54659         9           4
> > 6616 396527    54658         9           4
> >
> >> t0[t0[,2]== 54576,]
> >>
> >          id MappedID rat.count human.count
> > 2494 113992    54576         9           4
> > 6608 396527    54576         9           4
> > 6617 396551    54576         9           4
> > 6626 396552    54576         9           4
> >
> >> t0[t0[,2]== 54577,]
> >>
> >          id MappedID rat.count human.count
> > 2497 113992    54577         9           4
> > 6611 396527    54577         9           4
> > 6620 396551    54577         9           4
> > 6629 396552    54577         9           4
> >
> > So basically all the IDs relate to different polypeptides of the UDP
> > glucuronosyltransferase 1 family. Are there other situations that
> > cause such many-to-many mappings?
> >
> >
>
> As for this, James has already answered (thanks for that). The probes
> are 30 base pairs long, so it is not strange, but on the contrary
> very common, to find probes mapping to multiple genes that can have
> related or unrelated functions. It is less common in the Codelink
> arrays to have multiple probes mapping to the same gene but, again,
> you can have multiple probes mapping to different GenBank IDs that
> correspond to the same Entrez Gene identifier. The fact that
> different paralogous and orthologous sequences, and sometimes even
> unrelated sequences, can share the same 30 base pair oligonucleotide
> makes this a very complex problem with no easy solution.
>
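One way to see how much of a mapping table is affected is to label each rat-to-human row by its kind of relation. The sketch below is a minimal base R example. It assumes the table has no duplicated rows, and it uses only the UGT1 rows quoted earlier in this thread, so its counts describe that excerpt and not the full `t0` table. The column name `relation` is hypothetical.

```r
# Excerpt of the rat -> human table: only the rows quoted in this thread.
t0 <- data.frame(
  id = c(rep("396527", 9),
         "113992", "396551", "396552",
         "113992", "396551", "396552"),
  MappedID = c("54576", "54575", "54600", "54577", "54578",
               "54579", "54657", "54659", "54658",
               "54576", "54576", "54576",
               "54577", "54577", "54577"),
  stringsAsFactors = FALSE
)

# Rows per rat ID and rows per human ID (valid because rows are unique).
human_per_rat <- ave(seq_along(t0$id), t0$id, FUN = length)
rat_per_human <- ave(seq_along(t0$id), t0$MappedID, FUN = length)

# Label each row as <rat side>-to-<human side>.
t0$relation <- paste0(
  ifelse(rat_per_human > 1, "many", "one"),
  "-to-",
  ifelse(human_per_rat > 1, "many", "one")
)

print(table(t0$relation))
```

Rows labelled "many-to-many" are the ones where neither the rat ID nor the human ID identifies a single partner, so they probably need manual inspection or an explicit rule for choosing one mapping.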
> Regards,
>
> Diego.
>
> -----------------------------------------------
>   Diego Diez, PhD.
>
>   Bioknowledge systems, Kanehisa lab.
>   Bioinformatics center,
>   Institute for Chemical Research,
>   Kyoto University.
>   Gokasho, Uji, Kyoto 611-0011 JAPAN.
>
>   e-mail:  diez at kuicr.kyoto-u.ac.jp
>   url:     http://web.kuicr.kyoto-u.ac.jp/~diez
>   tlf:     +81-774-38-3296
>   fax:     +81-774-38-3269
> -----------------------------------------------
>
>
>
> > Sorry for the long questions,
> >
> > Regards,
> >
> > --
> > Weiwei Shi, Ph.D
> > Research Scientist
> > GeneGO, Inc.
> >
> > "Did you always know?"
> > "No, I did not. But I believed..."
> > ---Matrix III
> >


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


