[BioC] BioMart and Ensembl questions !!!

Paul Leo p.leo at uq.edu.au
Mon Sep 21 16:23:13 CEST 2009


Hi Rhoda,
Yes, a different version is probably the cause. Something is STILL
wrong, though. Based on your suggestions:

library(biomaRt)
listMarts(host="may2009.archive.ensembl.org",path="/biomart/martservice",archive=TRUE)
mart=useMart("ensembl_mart_51", dataset="hsapiens_gene_ensembl",
host="may2009.archive.ensembl.org",path="/biomart/martservice",archive=TRUE)

works BUT queries then fail:

ann <- getBM(attributes = c("ensembl_gene_id", "external_gene_id",
                            "chromosome_name", "start_position",
                            "end_position", "strand", "hgnc_symbol",
                            "gene_biotype"),
             filters = a.filter, values = fil.vals, mart = mart)
> ann
[1] ensembl_gene_id  external_gene_id chromosome_name  start_position  
[5] end_position     strand           hgnc_symbol     
<0 rows> (or 0-length row.names)


> a.filter
[1] "chromosome_name" "start"           "end"            
> fil.vals
[[1]]
[1] NA

[[2]]
[1] 67325000

[[3]]
[1] 67620000
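
Note that the first element of fil.vals above is NA, so the
chromosome_name filter has no usable value. That may be one reason the
query returns 0 rows, though this is an assumption and not something
confirmed in this thread. A minimal sketch of a pre-flight check follows;
check_filter_values is a hypothetical helper, not part of biomaRt, and the
values simply mirror the ones printed above:

```r
# Hypothetical helper (NOT a biomaRt function): return the names of the
# filters whose values are empty or contain NA, so they can be fixed
# before the query is sent.
check_filter_values <- function(filters, values) {
  stopifnot(length(filters) == length(values))
  bad <- vapply(values,
                function(v) length(v) == 0 || anyNA(v),
                logical(1))
  filters[bad]
}

# Same filters and values as printed in this message.
a.filter <- c("chromosome_name", "start", "end")
fil.vals <- list(NA, 67325000, 67620000)

bad_filters <- check_filter_values(a.filter, fil.vals)
if (length(bad_filters) > 0) {
  message("Filters with missing values: ",
          paste(bad_filters, collapse = ", "))
}
```

If the check reports chromosome_name, the value to fix is the one that
built fil.vals[[1]], before getBM is called again.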


I will try again tomorrow... it's late at night in Australia....



-----Original Message-----
From: Rhoda Kinsella <rhoda at ebi.ac.uk>
To: Paul Leo <p.leo at uq.edu.au>
Cc: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] BioMart and Ensembl questions !!!
Date: Mon, 21 Sep 2009 15:10:42 +0100

Hi Paul
I'm not really sure why you get this error... I am using the following  
version:

 > sessionInfo()
R version 2.8.0 (2008-10-20)
i386-apple-darwin8.11.1

locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_1.16.0

loaded via a namespace (and not attached):
[1] RCurl_0.92-0 XML_1.98-1

Does anyone know why Paul is getting this error?
Regards,
Rhoda


On 21 Sep 2009, at 14:53, Paul Leo wrote:

> Hi Rhoda,
> Thanks, that seems to be exactly what I want, but it does not work for
> me...
>
>  library(biomaRt)
>> listMarts(host="nov2008.archive.ensembl.org/biomart/martservice")
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Entity 'copy' not defined
> Entity 'nbsp' not defined
> Entity 'nbsp' not defined
> Error in names(x) <- value :
>  'names' attribute [2] must be the same length as the vector [0]
>>
>
>
>
>
> http://www.ensembl.org/info/website/archives/
>
>
> Once you are there, click on the release you would like to look at and
> then on the biomart button. This will give you the URI you need to use
> in the biomaRt package to get access to that archive. For example, the
> release 51 archive biomart is available at:
>
>
> http://nov2008.archive.ensembl.org/biomart/martview/
>
>
> If you then plug this into biomaRt you can get access to the
> information you require:
>
>
>> library(biomaRt)
>> listMarts(host="may2009.archive.ensembl.org/biomart/martservice")
>               biomart              version
> 1 ENSEMBL_MART_ENSEMBL           Ensembl 54
> 2     ENSEMBL_MART_SNP Ensembl Variation 54
> 3    ENSEMBL_MART_VEGA              Vega 35
> 4             REACTOME   Reactome(CSHL US)
> 5     wormbase_current   WormBase (CSHL US)
> 6                pride       PRIDE (EBI UK)
>> mart=useMart("ENSEMBL_MART_ENSEMBL",
> host="may2009.archive.ensembl.org/biomart/martservice")
>
>
> etc....
>
>
> I hope that helps,
> Regards,
> Rhoda
>
>
>
>
>
>
> On 21 Sep 2009, at 14:25, Paul Leo wrote:
>
>> Wow, that is fairly terrible. I was surprised this thread was not
>> followed up... did I miss something?
>>
>> You can't access hg18 via BioMart, only GRCh37!!
>>
>> 1)listMarts(archive=TRUE)   # shows that marts back to 43 are there
>>
>> I'll start tracking back
>>
>>
>> 2)mart<-
>> useMart("ensembl_mart_51",dataset="hsapiens_gene_ensembl",archive
>> ### WORKS FINE but is GRCh37
>>
>> 3)mart<-
>> useMart 
>> ("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=TRUE)
>>
>> Error in value[[3L]](cond) :
>> Request to BioMart web service failed. Verify if you are still
>> connected to the internet.  Alternatively the BioMart web service is
>> temporarily down.
>> In addition: Warning message:
>> In file(file, "r") : unable to resolve 'july2008.archive.ensembl.org'
>>> #####  THAT's JUST BAD !
>>
>> 4)mart<-
>> useMart 
>> ("ensembl_mart_49",dataset="hsapiens_gene_ensembl",archive=TRUE)
>> Checking attributes ... ok
>> Checking filters ... ok
>> Warning message:
>> In bmAttrFilt("filters", mart) :
>> biomaRt warning: looks like we're connecting to an older version of
>> BioMart suite. Some biomaRt functions might not work.
>>
>> ### Works, and that is NCBI36, but the attributes have old
>> descriptions. It may still work for you (and me).
>>
>>
>>
>> I think 'july2008.archive.ensembl.org'  SHOULD BE
>> 'jul2008.archive.ensembl.org'
>> (three letter month name)
>>
>> Any way to fix that?
>>
>> Cheers
>> Paul
>>
>> NOTE also broken in production version 2.9.2 I think
>>
>>> sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-09-20 r49770)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>> [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods
>> base
>>
>> other attached packages:
>> [1] biomaRt_2.1.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.2-0 XML_2.6-0
>>
>> -----Original Message-----
>> From: jiayu wen <jiayu.jwen at gmail.com>
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] BioMart and Ensembl questions
>> Date: Tue, 1 Sep 2009 09:11:09 +0200
>>
>>
>> Dear list,
>>
>> Over a year ago, I extracted 3'UTR sequences for about 7000
>> genes using BioMart for my project. This is the command that I used:
>>
>> (my gene_list is in gene symbol)
>>> my_mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>> seq_3utr = getSequence(id = unique(gene.symbol),
>> type="hgnc_symbol",seqType="3utr",mart = my_mart)
>>> seq_3utr = seq_3utr[seq_3utr[,"3utr"] != "Sequence unavailable",]
>>> here: extract longest 3'UTR for each unique gene symbol
>>> exportFASTA(seq_3utr, file=paste("s3utr.fa",sep=""))
>>
>> As my project progresses, I now need 3'UTR genomic coordinates to get
>> phastCons conservation for some regions in the 3'UTR.
>> To do that, I first convert hgnc_symbol back to ensembl_gene_id, then
>> get the 3'UTR coordinates using getBM like this:
>>
>>> s3utr = read.DNAStringSet(paste("s3utr.fa",sep=""),format="fasta")
>>> gene_names = names(s3utr)
>>> hgnc2ensembl  =
>>> getBM(attributes=c("hgnc_symbol","ensembl_gene_id"),
>> filters="hgnc_symbol", values=gene_names, mart=my_mart)
>>> s3utr_pos  = getBM(attributes=c("ensembl_gene_id",
>> "chromosome_name","strand","3_utr_start", "3_utr_end"),
>> filters="ensembl_gene_id", values=as.character(hgnc2ensembl
>> $ensembl_gene_id), mart=my_mart)
>>> s3utr_pos = s3utr_pos[complete.cases(s3utr_pos),]
>>
>> By doing that, I can now only get about 5000 gene symbols with 3'UTR
>> coordinates (converting from hgnc_symbol back to ensembl_gene_id
>> loses about 250 genes). I was thinking it might be a version
>> difference, so I tried to use the Ensembl archive, but it gives me the
>> error below:
>>
>>> my_mart =
>> useMart("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=T)
>> Error in value[[3L]](cond) :
>>  Request to BioMart web service failed. Verify if you are still
>> connected to the internet.  Alternatively the BioMart web service is
>> temporarily down.
>> In addition: Warning message:
>> In file(file, "r") : cannot open: HTTP status was '404 Not Found'
>>
>> Is there any way that I can get 3'UTR coordinates for my whole gene
>> list?
>>
>> Thanks for any help.
>>
>> Jean
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> Rhoda Kinsella Ph.D.
> Ensembl Bioinformatician,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.
>
>



