[BioC] biomaRt manual

Thu Mar 29 13:45:25 CEST 2007

Here is another question:
> length(unique(ids2))
[1] 12558
> length(ids2)
[1] 12558
> head(ids2)
[1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
[6] "31312_at"
> t1 <- getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values=(ids2), mart=human)
> dim(t1)
[1] 26360     2
> t1[1:20,]
   affy_hg_u95a entrezgene
1      32864_at       6736
2      32864_at       6736
3      41214_at       6192
4      41214_at       6192
5      31534_at       7544
6      31534_at       7544
7      36367_at      83259
8      36367_at      83259
9      36367_at      83259
10     36367_at      83259
11      1199_at         NA
12   35929_s_at      64591
13   35929_s_at      64591
14   35929_s_at         NA

Please look at rows 12-14 of the output above.
Why are there so many duplicated rows? And why are rows 12-14
inconsistent with each other (the same probeset maps to 64591 and to NA)?
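
To make the question concrete, here is a minimal base R sketch. It rebuilds the 14 rows printed above as a toy data frame (the name t1_sample is made up for illustration; the real t1 has 26360 rows), collapses exact duplicate rows with unique(), and then lists the probesets that still have more than one distinct result. It does not explain why the mart returns the duplicates. It only shows one way to inspect them.

```r
# Toy copy of the 14 rows of t1 printed above (t1_sample is a hypothetical name)
t1_sample <- data.frame(
  affy_hg_u95a = c("32864_at", "32864_at", "41214_at", "41214_at",
                   "31534_at", "31534_at", "36367_at", "36367_at",
                   "36367_at", "36367_at", "1199_at", "35929_s_at",
                   "35929_s_at", "35929_s_at"),
  entrezgene = c(6736, 6736, 6192, 6192, 7544, 7544, 83259, 83259,
                 83259, 83259, NA, 64591, 64591, NA),
  stringsAsFactors = FALSE
)

# Drop rows that are exact duplicates (same probeset and same Entrez Gene ID)
t1_unique <- unique(t1_sample)

# Count the distinct results per probeset; more than one means the rows disagree
n_per_probe <- table(t1_unique$affy_hg_u95a)
mixed <- names(n_per_probe)[n_per_probe > 1]

print(t1_unique)
print(mixed)
```

On the real result, the same two steps would be unique(t1) followed by the table() call on its affy_hg_u95a column.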

Thanks to all the "hardworking" people for the prompt replies to my
previous questions. I am in China now, and it should be about 6 am in the US.

Cheers,

Weiwei

On 3/29/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Thursday 29 March 2007 07:28, James W. MacDonald wrote:
> > Hi Weiwei,
> >
> > Weiwei Shi wrote:
> > > Sorry :) When I was composing the following email, I did not realize
> > > that there were already a couple of replies. I read the manual
> > > carefully, but I still have some questions like this:
> > >
> > > For example,
> > >
> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
> > >> values=head(ids2), mart=human)
> > >
> > >   affy_hg_u95a entrezgene
> > > 1     31308_at         NA
> > > 2     31310_at       2741
> > > 3     31312_at       9312
> > >
> > >>head(ids2)
> > >
> > > [1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
> > > [6] "31312_at"
> > >
> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
> > >> values="31307_at", mart=human)
> > >
> > > NULL
> > >
> > > I am confused by "NULL" and "NA". What is the difference between
> > > them?
> >
> > Steffen Durinck will know better, but I believe NULL means that Ensembl
> > doesn't think that probeset maps to anything (e.g., there is nothing
> > available), and NA means that there is no Entrez Gene ID for that probeset.
> >
> > For instance, if you pull the Entrez Gene ID for 31307_at from the
> > hgu95aENTREZID environment, it lists 9594, but if you search Entrez Gene
> > for that ID it says it has been discontinued.
> >
> > > Another question is how to make >8000 queries faster, although I
> > > have read some of the previous posts about this.
>
> Make sure that you really need to make 8000 queries.  It is much faster to
> make one or a few large queries than to make many small ones.
>
> Sean
>
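
Sean's advice above (a few large queries instead of many small ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: batched_query, mock_query, and the batch size are hypothetical names and values, not part of biomaRt, and mock_query stands in for a wrapper around getBM() so that the sketch runs without a network connection.

```r
# Split a vector of IDs into batches and run one query per batch.
# query_fn is a hypothetical stand-in for a wrapper around getBM().
batched_query <- function(ids, query_fn, batch_size = 5000) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, query_fn)
  do.call(rbind, results)
}

# Mock query function (an assumption, for illustration only).
# It counts how often it is called and returns one row per ID.
n_calls <- 0
mock_query <- function(ids) {
  n_calls <<- n_calls + 1
  data.frame(affy_hg_u95a = ids, entrezgene = NA_integer_,
             stringsAsFactors = FALSE)
}

ids <- sprintf("%d_at", 31307:31326)  # 20 made-up probe-style IDs
res <- batched_query(ids, mock_query, batch_size = 8)

print(n_calls)   # number of queries sent: one per batch, not one per ID
print(nrow(res))
```

With a real mart, query_fn could be something like function(x) getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values=x, mart=human), which is the call from this thread with the batch passed as values. Passing all IDs in a single call, as in the t1 example, is the simplest form of the same idea.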

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III