[BioC] BioMart and Ensembl questions !!!

Rhoda Kinsella rhoda at ebi.ac.uk
Mon Sep 21 16:35:52 CEST 2009


Hi Paul,
It looks like you are using an unstable version of biomaRt (R version  
2.10.0 Under development (unstable) (2009-09-20 r49770))
so can you try this with the 2.9.0 version and see if that works? Let  
me know how you get on.
Regards,
Rhoda

On 21 Sep 2009, at 15:23, Paul Leo wrote:

> Hi Rhoda,
> Yes, a different version is probably it. There is STILL something
> wrong, based on your suggestions:
>
> library(biomaRt)
> listMarts(host="may2009.archive.ensembl.org",path="/biomart/martservice",archive=TRUE)
> mart=useMart("ensembl_mart_51", dataset="hsapiens_gene_ensembl",
>   host="may2009.archive.ensembl.org",path="/biomart/martservice",archive=TRUE)
>
> works BUT queries then fail:
>
> ann<-getBM(attributes =
>   c("ensembl_gene_id","external_gene_id","chromosome_name","start_position",
>     "end_position","strand","hgnc_symbol","gene_biotype"),
>   filters = a.filter, values=fil.vals, mart = mart)
>> ann
> [1] ensembl_gene_id  external_gene_id chromosome_name  start_position
> [5] end_position     strand           hgnc_symbol
> <0 rows> (or 0-length row.names)
>
>
>> a.filter
> [1] "chromosome_name" "start"           "end"
>> fil.vals
> [[1]]
> [1] NA
>
> [[2]]
> [1] 67325000
>
> [[3]]
> [1] 67620000
>
>
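A possible explanation for the empty result, offered only as a guess from the values printed above: the first element of fil.vals is NA, so the chromosome_name filter has nothing to match. The sketch below uses a hypothetical helper, find_na_filters (not part of biomaRt), to flag such filters before calling getBM; the values are copied from the session above.

```r
# Hypothetical helper (not part of biomaRt): return the names of the
# filters whose supplied values contain NA.
find_na_filters <- function(filters, values) {
  stopifnot(length(filters) == length(values))
  has_na <- vapply(values, function(v) any(is.na(v)), logical(1))
  filters[has_na]
}

# Filter names and values as printed in the session above.
a.filter <- c("chromosome_name", "start", "end")
fil.vals <- list(NA, 67325000, 67620000)

bad <- find_na_filters(a.filter, fil.vals)
if (length(bad) > 0) {
  message("Filters with NA values: ", paste(bad, collapse = ", "))
}
```

If the chromosome value is filled in and the query still returns zero rows, the cause lies elsewhere.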
> I will try again tomorrow... it's late at night in Australia.
>
>
>
> -----Original Message-----
> From: Rhoda Kinsella <rhoda at ebi.ac.uk>
> To: Paul Leo <p.leo at uq.edu.au>
> Cc: bioconductor <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] BioMart and Ensembl questions !!!
> Date: Mon, 21 Sep 2009 15:10:42 +0100
>
> Hi Paul
> I'm not really sure why you get this error... I am using the following
> version:
>
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-apple-darwin8.11.1
>
> locale:
> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_1.16.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_0.92-0 XML_1.98-1
>
> Does anyone know why Paul is getting this error?
> Regards,
> Rhoda
>
>
> On 21 Sep 2009, at 14:53, Paul Leo wrote:
>
>> Hi Rhoda,
>> Thanks, that seems exactly like what I want, but it does not work for
>> me...
>>
>> library(biomaRt)
>>> listMarts(host="nov2008.archive.ensembl.org/biomart/martservice")
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Entity 'copy' not defined
>> Entity 'nbsp' not defined
>> Entity 'nbsp' not defined
>> Error in names(x) <- value :
>> 'names' attribute [2] must be the same length as the vector [0]
>>>
>>
>>
>>
>>
>> http://www.ensembl.org/info/website/archives/
>>
>>
>> Once you are there, click on the release you would like to look at and
>> then on the biomart button. This will give you the URI you need to use
>> in the biomaRt package to get access to that archive. For example, the
>> release 51 archive biomart is available at:
>>
>>
>> http://nov2008.archive.ensembl.org/biomart/martview/
>>
>>
>> If you then plug this into biomaRt you can get access to the
>> information you require:
>>
>>
>>> library(biomaRt)
>>> listMarts(host="may2009.archive.ensembl.org/biomart/martservice")
>>              biomart              version
>> 1 ENSEMBL_MART_ENSEMBL           Ensembl 54
>> 2     ENSEMBL_MART_SNP Ensembl Variation 54
>> 3    ENSEMBL_MART_VEGA              Vega 35
>> 4             REACTOME   Reactome(CSHL US)
>> 5     wormbase_current   WormBase (CSHL US)
>> 6                pride       PRIDE (EBI UK)
>>> mart=useMart("ENSEMBL_MART_ENSEMBL",
>> host="may2009.archive.ensembl.org/biomart/martservice")
>>
>>
>> etc....
>>
>>
>> I hope that helps,
>> Regards,
>> Rhoda
>>
>>
>>
>>
>>
>>
>> On 21 Sep 2009, at 14:25, Paul Leo wrote:
>>
>>> Wow, that is fairly terrible. I was surprised this thread was not
>>> followed up... did I miss something?
>>>
>>> You can't access hg18 via BioMart, only GRCh37!!
>>>
>>> 1)listMarts(archive=TRUE)   # shows that marts back to 43 are there
>>>
>>> I'll start tracking back
>>>
>>>
>>> 2)mart<-
>>> useMart("ensembl_mart_51",dataset="hsapiens_gene_ensembl",archive
>>> ### WORKS FINE but is GRCh37
>>>
>>> 3)mart<-useMart("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=TRUE)
>>>
>>> Error in value[[3L]](cond) :
>>> Request to BioMart web service failed. Verify if you are still
>>> connected to the internet.  Alternatively the BioMart web service is
>>> temporarily down.
>>> In addition: Warning message:
>>> In file(file, "r") : unable to resolve 'july2008.archive.ensembl.org'
>>>> #####  THAT's JUST BAD !
>>>
>>> 4)mart<-useMart("ensembl_mart_49",dataset="hsapiens_gene_ensembl",archive=TRUE)
>>> Checking attributes ... ok
>>> Checking filters ... ok
>>> Warning message:
>>> In bmAttrFilt("filters", mart) :
>>> biomaRt warning: looks like we're connecting to an older version of
>>> BioMart suite. Some biomaRt functions might not work.
>>>
>>> ### works, and that is NCBI36; the attributes have old descriptions
>>> but may work for you (and me)
>>>
>>>
>>>
>>> I think 'july2008.archive.ensembl.org'  SHOULD BE
>>> 'jul2008.archive.ensembl.org'
>>> (three letter month name)
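If the archive host names do follow the three-letter month pattern suggested above (may2009 and nov2008 elsewhere in this thread fit it), a host name could be built as in this sketch. archive_host is a hypothetical helper, not a biomaRt function, and whether a given archive host actually exists still has to be checked against the Ensembl archive list.

```r
# Hypothetical helper: build a host name of the form
# "<mon><year>.archive.ensembl.org" from a year and a month number.
archive_host <- function(year, month) {
  stopifnot(month >= 1, month <= 12)
  # month.abb is R's built-in vector of English three-letter month names.
  paste0(tolower(month.abb[month]), year, ".archive.ensembl.org")
}

archive_host(2008, 7)   # "jul2008.archive.ensembl.org"
archive_host(2009, 5)   # "may2009.archive.ensembl.org"
```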
>>>
>>> Any way to fix that?
>>>
>>> Cheers
>>> Paul
>>>
>>> NOTE also broken in production version 2.9.2 I think
>>>
>>>> sessionInfo()
>>> R version 2.10.0 Under development (unstable) (2009-09-20 r49770)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>>> [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>>> [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>>> [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] biomaRt_2.1.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] RCurl_1.2-0 XML_2.6-0
>>>
>>> -----Original Message-----
>>> From: jiayu wen <jiayu.jwen at gmail.com>
>>> To: bioconductor at stat.math.ethz.ch
>>> Subject: [BioC] BioMart and Ensembl questions
>>> Date: Tue, 1 Sep 2009 09:11:09 +0200
>>>
>>>
>>> Dear list,
>>>
>>> Just over a year ago, I extracted 3'UTR sequences for about 7000
>>> genes using BioMart for my project. These are the commands that I used:
>>>
>>> (my gene_list is in gene symbol)
>>>> my_mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>>> seq_3utr = getSequence(id = unique(gene.symbol),
>>> type="hgnc_symbol",seqType="3utr",mart = my_mart)
>>>> seq_3utr = seq_3utr[seq_3utr[,"3utr"] != "Sequence unavailable",]
>>>> # here: extract longest 3'UTR for each unique gene symbol
>>>> exportFASTA(seq_3utr, file=paste("s3utr.fa",sep=""))
>>>
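The "extract longest 3'UTR" step is not shown in the original commands. One way it might be done, assuming the getSequence() result is a data frame with a "3utr" column and an "hgnc_symbol" column (the second column name is an assumption), is sketched here with a hypothetical helper longest_utr and invented toy data.

```r
# Hypothetical helper: keep only the longest 3'UTR for each gene symbol.
# Assumes columns "3utr" (sequence) and "hgnc_symbol" (gene symbol).
longest_utr <- function(seqs) {
  len <- nchar(seqs[["3utr"]])
  # Sort by symbol, longest sequence first, then keep the first row
  # seen for each symbol.
  seqs <- seqs[order(seqs[["hgnc_symbol"]], -len), , drop = FALSE]
  seqs[!duplicated(seqs[["hgnc_symbol"]]), , drop = FALSE]
}

# Toy data, invented for illustration only.
toy <- data.frame(
  "3utr" = c("ACGT", "ACGTACGT", "TT"),
  hgnc_symbol = c("GENE1", "GENE1", "GENE2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
longest_utr(toy)
```

Ties in length are broken by whichever row sorts first, which may or may not be what the analysis needs.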
>>> As my project has progressed, I now need 3'UTR genomic coordinates to
>>> get phastCons conservation for some regions in the 3'UTR.
>>> To do that, I first convert hgnc_symbol back to ensembl_gene_id, then
>>> get 3'UTR coordinates using getBM like this:
>>>
>>>> s3utr = read.DNAStringSet(paste("s3utr.fa",sep=""),format="fasta")
>>>> gene_names = names(s3utr)
>>>> hgnc2ensembl  = getBM(attributes=c("hgnc_symbol","ensembl_gene_id"),
>>>   filters="hgnc_symbol", values=gene_names, mart=my_mart)
>>>> s3utr_pos  = getBM(attributes=c("ensembl_gene_id",
>>>   "chromosome_name","strand","3_utr_start", "3_utr_end"),
>>>   filters="ensembl_gene_id",
>>>   values=as.character(hgnc2ensembl$ensembl_gene_id), mart=my_mart)
>>>> s3utr_pos = s3utr_pos[complete.cases(s3utr_pos),]
>>>
>>> By doing that, I can now only get about 5000 gene symbols with 3'UTR
>>> coordinates (converting from hgnc_symbol back to ensembl_gene_id
>>> loses about 250 genes). I was thinking it might be a version
>>> difference, so I tried to use the Ensembl archive, but it gives me the
>>> error below:
>>>
>>>> my_mart =
>>> useMart("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=T)
>>> Error in value[[3L]](cond) :
>>> Request to BioMart web service failed. Verify if you are still
>>> connected to the internet.  Alternatively the BioMart web service is
>>> temporarily down.
>>> In addition: Warning message:
>>> In file(file, "r") : cannot open: HTTP status was '404 Not Found'
>>>
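To see which symbols drop out in the hgnc_symbol to ensembl_gene_id conversion mentioned above, one simple check is to compare the requested symbols with those that came back. This is a sketch with a hypothetical helper, unmapped_symbols, and invented toy data; mapping is assumed to have the same hgnc_symbol column as the hgnc2ensembl result.

```r
# Hypothetical helper: list the requested symbols that received no
# Ensembl gene ID in the mapping table.
unmapped_symbols <- function(requested, mapping) {
  setdiff(unique(requested), unique(mapping[["hgnc_symbol"]]))
}

# Toy data, invented for illustration only.
gene_names <- c("GENE1", "GENE2", "GENE3")
hgnc2ensembl <- data.frame(
  hgnc_symbol = c("GENE1", "GENE3"),
  ensembl_gene_id = c("ID_A", "ID_C"),
  stringsAsFactors = FALSE
)
unmapped_symbols(gene_names, hgnc2ensembl)   # "GENE2"
```

The same comparison can be repeated on the coordinate table to find genes that have an ID but no 3'UTR start and end.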
>>> Is there any way that I can get 3'UTR coordinates for my whole gene
>>> list?
>>>
>>> Thanks for any help.
>>>
>>> Jean
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> Rhoda Kinsella Ph.D.
>> Ensembl Bioinformatician,
>> European Bioinformatics Institute (EMBL-EBI),
>> Wellcome Trust Genome Campus,
>> Hinxton
>> Cambridge CB10 1SD,
>> UK.
>>
>>
>
>
