[BioC] Understanding the randomness of Biomart

Nathan Harmston iwanttobeabadger at googlemail.com
Mon Aug 18 00:15:59 CEST 2008


Hi,

I basically want to return a list of (probe, HUGO identifier) pairs,
with "" where a probe has no identifier. I have just tried the line of
code you suggested and I get the following error:

Error in values[[i]] : subscript out of bounds

So I'm afraid it doesn't work. Any ideas what the change to it should
be? I tried:

getBM(c("affy_hg_u133_plus_2", "hgnc_symbol"), filters =
c("chromosome_name", "start", "end", "with_affy_hg_u133_plus_2"),
values = list(9, 19198907, 19357826), mart = ensembl)

getBM(c("affy_hg_u133_plus_2", "hgnc_symbol"), filters =
c("chromosome_name", "start", "end", "affy_hg_u133_plus_2"), values =
list(9, 19198907, 19357826), mart = ensembl)
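
A likely cause of the "subscript out of bounds" error, offered as a sketch
rather than a confirmed diagnosis: both calls above name four filters but
supply only three values, so getBM has no fourth entry to look up. The
sketch below assumes that "with_affy_hg_u133_plus_2" is a boolean filter
that takes TRUE (keep only entries that have a probe set); fetch_probes is
a hypothetical wrapper, and it needs a mart object from useMart() and a
live connection to actually run the query.

```r
# One value per filter, in the same order as the filter names.
filters <- c("chromosome_name", "start", "end", "with_affy_hg_u133_plus_2")
values  <- list(9, 19198907, 19357826, TRUE)  # TRUE is an assumed value for the boolean filter

# Guard against the mismatch that the two calls above have (4 filters, 3 values).
stopifnot(length(filters) == length(values))

# Hypothetical wrapper: 'mart' is the object returned by useMart().
# Defining it does not contact the server; calling it does.
fetch_probes <- function(mart) {
  biomaRt::getBM(
    attributes = c("affy_hg_u133_plus_2", "hgnc_symbol"),
    filters    = filters,
    values     = values,
    mart       = mart
  )
}
```

With the mart from the original message this would be called as
fetched <- fetch_probes(ensembl). The non-boolean filter
"affy_hg_u133_plus_2" would instead need a vector of probe set IDs as its
value, which is not what is wanted here.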

Any ideas? I couldn't find a description of how to change this
behaviour in the vignette.
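
If the query itself cannot be made to return one row per probe set, the
result can be post-filtered in R. The following is a minimal sketch under
stated assumptions: collapse_probes is a hypothetical helper (not part of
biomaRt), the data frame is typed in by hand from the first result quoted
below, and "keep the row with a symbol when one exists" is my reading of
what is wanted.

```r
# Rows copied from the first query result quoted in this thread.
fetched <- data.frame(
  affy_hg_u133_plus_2 = c("226867_at", "205684_s_at", "226867_at",
                          "205684_s_at", "234968_at", "234968_at"),
  hgnc_symbol         = c("", "", "DENND4C", "DENND4C", "", "DENND4C"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: one row per (probe, symbol), with "" only for
# probes that have no symbol on any row.
collapse_probes <- function(df) {
  # Drop rows that carry no probe set ID at all.
  has_probe <- !is.na(df$affy_hg_u133_plus_2) & df$affy_hg_u133_plus_2 != ""
  df <- df[has_probe, , drop = FALSE]

  # Treat missing symbols as "" and remove exact duplicate rows.
  df$hgnc_symbol[is.na(df$hgnc_symbol)] <- ""
  df <- unique(df)

  # Keep a blank-symbol row only if the probe never appears with a symbol.
  probes_with_symbol <- unique(df$affy_hg_u133_plus_2[df$hgnc_symbol != ""])
  keep <- df$hgnc_symbol != "" |
    !(df$affy_hg_u133_plus_2 %in% probes_with_symbol)

  out <- df[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

collapse_probes(fetched)
```

A probe set that maps to several different symbols would still produce
several rows; whether those should be merged is a separate decision.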

Nathan


2008/8/17 Stephen Henderson <to.stephen.henderson at googlemail.com>:
> Hi
> getBM is returning all genes  (or transcripts??) within that area (i.e.
> filters =...) including some that do not have affy probes for them. If you
> wanted to see them then you would have to put more 'attributes' into the
> getBM function e.g.
> getBM(attributes = c("affy_hg_u133_plus_2", "entrezgene"),...
> Alternatively if you wanted only those transcripts with affy ids then you
> would need to specify this in the filters e.g.
> fetched = getBM(c("affy_hg_u133_plus_2", "hgnc_symbol"), filters =
> c("chromosome_name", "start", "end", "affy_hg_u133_plus_2"), values =
> list(as.numeric("9"),
> 19198907, 19357826), mart = ensembl)
> The repeats are inherent to the ensembl database and arise for many reasons.
> Stephen
>
>
>
> On 17 Aug 2008, at 20:23, Nathan Harmston wrote:
>
> Hi everyone,
>
> I have been playing with the biomaRt package a bit more and I am
> trying to work out what is going on here:
>
> ensembl = useMart("ensembl_mart_47", dataset =
> "hsapiens_gene_ensembl", archive = TRUE)
>
> fetched = getBM(c("affy_hg_u133_plus_2", "hgnc_symbol"), filters =
> c("chromosome_name", "start", "end"), values = list(as.numeric("9"),
> 19198907, 19357826), mart = ensembl)
>  affy_hg_u133_plus_2 hgnc_symbol
> 1           226867_at
> 2         205684_s_at
> 3           226867_at     DENND4C
> 4         205684_s_at     DENND4C
> 5           234968_at
> 6           234968_at     DENND4C
>
> fetched = getBM(c("affy_hg_u133_plus_2", "hgnc_symbol"), filters =
> c("chromosome_name", "start", "end"), values = list(as.numeric("9"),
> 33925736, 34088257), mart = ensembl)
>
>  affy_hg_u133_plus_2 hgnc_symbol
> 1
> 2           224789_at
> 3           224789_at      WDR40A
>
> I cannot understand why I am getting two rows for some probesets, one
> containing a HUGO identifier and the other not. Is there any relevance
> to this result (e.g. probeset 234968_at)? And why do some rows show no
> probeset at all? Is there a specific reason for this, or is it just
> something that needs to be post-filtered?
>
> Many thanks in advance.
>
> Nathan


