[BioC] biomaRt problem

Wolfgang Huber whuber at embl.de
Tue Aug 3 23:21:40 CEST 2010


Hi Anupam Sinha,

As always, following the posting guide (including the output of 
sessionInfo() and a reproducible example that does not depend on a 
private file that exists only on your computer) would be useful.
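
For instance, a reproducible version could replace the private file with 
a few inline gene symbols. This is only a minimal sketch: the symbols and 
the column name hgnc_symbol are illustrative placeholders, not taken from 
your file.

```r
# Inline stand-in for the private "file.txt".
# The symbols and the column name are illustrative assumptions.
genes <- data.frame(hgnc_symbol = c("TP53", "BRCA1", "EGFR"),
                    stringsAsFactors = FALSE)
print(genes)

# Include this output in the post so others can see your R and package versions
print(sessionInfo())
```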

Your getBM query seems incomplete: you specify an argument for 
'values', but not for 'filters'. In effect, your values are ignored 
and no filtering is performed, so the query is made on all ~51,000 genes 
in the dataset.
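
A filtered query might look like the sketch below. It assumes that the 
dataset offers a filter named "hgnc_symbol" (please check this against 
listFilters(mart)); the helper name get_dn_for_symbols and the two gene 
symbols are hypothetical and only serve as an illustration. The call 
needs network access to the Ensembl BioMart server.

```r
library("biomaRt")

# Hypothetical helper: restrict the query to the given HGNC symbols.
# The filter name "hgnc_symbol" is an assumption; verify it with listFilters(mart).
get_dn_for_symbols <- function(symbols, mart) {
  stopifnot(is.character(symbols), length(symbols) > 0)
  getBM(attributes = c("ensembl_gene_id", "hsapiens_dn"),
        filters    = "hgnc_symbol",     # which field to filter on
        values     = unique(symbols),   # the values that field must take
        mart       = mart)
}

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Illustrative symbols, not taken from the original file
res <- get_dn_for_symbols(c("TP53", "BRCA1"), mart)
print(res)
```

With 'filters' and 'values' given together, only the genes matching the 
listed symbols should be returned, rather than the whole dataset.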

Finally, why the attribute "hsapiens_dn" is NA for all genes in the 
dataset is a question I need to pass on to someone more familiar with 
this particular dataset. I will forward it to the Ensembl helpdesk.

Here's a code example:

#----------------
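library("biomaRt")
mart = useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")

# available filters and attributes for this dataset
filters = listFilters(mart)
attrs = listAttributes(mart)

# "hsapiens_dn" is an attribute, not a filter
print("hsapiens_dn" %in% filters$name)
print("hsapiens_dn" %in% attrs$name)

# no filters given: the query returns all genes in the dataset
res = getBM(attributes = c("ensembl_gene_id", "hsapiens_dn"),
            mart = mart)

# count how many of the returned values are NA
print(table(is.na(res$hsapiens_dn)))
print(sessionInfo())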
#----------------

and its output:

#----------------
[1] FALSE
[1] TRUE

  TRUE
51726

R version 2.12.0 Under development (unstable) (2010-08-02 r52661)
Platform: x86_64-apple-darwin10.4.0 (64-bit)

locale:
[1] C/C/C/C/C/it_IT

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] biomaRt_2.5.1  fortunes_1.3-7

loaded via a namespace (and not attached):
[1] RCurl_1.4-3 XML_3.1-0
#----------------

	Best wishes
	Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber




On Jul/27/10 3:51 PM, anupam sinha wrote:
> Hi all,
>             I am trying to download dN/dS values for human genes from
> Ensembl using biomaRt. I have used code from the website:
>
> http://www.r-bloggers.com/biomart-and-biomart/
>
>> library("biomaRt")
>> mart<- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
>> genes<- read.csv("file.txt")
> (this file contains hgnc gene symbols for Homo sapiens)
>> results<- getBM(attributes = c("ensembl_gene_id","hsapiens_dn"),
>>                 values = genes$hsapiens_dn, mart = mart)
>
> But all the values of hsapiens_dn are shown to be "NA". The first few lines
> of the output
>
>> head(results,10)
>     ensembl_gene_id hsapiens_dn
> 1  ENSG00000215781          NA
> 2  ENSG00000243259          NA
> 3  ENSG00000225566          NA
> 4  ENSG00000189096          NA
> 5  ENSG00000215750          NA
> 6  ENSG00000212884          NA
> 7  ENSG00000212886          NA
> 8  ENSG00000229617          NA
> 9  ENSG00000241176          NA
> 10 ENSG00000215705          NA
>
> Can anyone please tell me where I am going wrong? Thanks in advance for
> any suggestions.
>


