[BioC] BioMart and Ensembl questions !!!

Steffen at stat.Berkeley.EDU Steffen at stat.Berkeley.EDU
Mon Sep 21 19:56:02 CEST 2009


Hi Paul, Rhoda,

Jim's earlier suggestion should fix this. You need to specify a value for
the chromosome name you're interested in; currently the first element of
fil.vals is NA.

fil.vals = list(1,67325000,67620000)

Then your query should return results (if there are any genes in this
region).
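A minimal sketch of a pre-flight check that catches this kind of mistake before querying. `check_filter_values()` is a hypothetical helper, not part of biomaRt. It relies only on the documented getBM() convention that, with several filters, `values` is a list with one element per filter in the same order. The getBM() call is left commented out because it needs network access and a `mart` object.

```r
# Hypothetical helper (not part of biomaRt): sanity-check the filters/values
# pair before passing it to getBM(), so an NA chromosome name is caught early
# instead of the query silently returning 0 rows.
check_filter_values <- function(filters, values) {
  if (length(filters) != length(values)) {
    stop("need one value per filter: got ", length(filters),
         " filters and ", length(values), " values")
  }
  # flag any filter whose value is empty or contains NA
  bad <- vapply(values, function(v) length(v) == 0 || anyNA(v), logical(1))
  if (any(bad)) {
    stop("missing (NA or empty) value for filter(s): ",
         paste(filters[bad], collapse = ", "))
  }
  invisible(TRUE)
}

a.filter <- c("chromosome_name", "start", "end")
fil.vals <- list(1, 67325000, 67620000)   # Steffen's corrected values
check_filter_values(a.filter, fil.vals)

# With network access, the query itself would then be (not run here):
# ann <- getBM(attributes = c("ensembl_gene_id", "chromosome_name",
#                             "start_position", "end_position"),
#              filters = a.filter, values = fil.vals, mart = mart)
```

With Paul's original values, `list(NA, 67325000, 67620000)`, the check stops with an error that names `chromosome_name`.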

Cheers,
Steffen

> Hi Paul,
> It looks like you are using an unstable version of biomaRt (it comes
> with R version 2.10.0 Under development (unstable) (2009-09-20 r49770)),
> so can you try this with R 2.9.0 and see if that works? Let
> me know how you get on.
> Regards,
> Rhoda
>
> On 21 Sep 2009, at 15:23, Paul Leo wrote:
>
>> HI Rhoda,
>> Yes, a different version is probably it. There is STILL something
>> wrong when I follow your suggestions:
>>
>> library(biomaRt)
>> listMarts(host="may2009.archive.ensembl.org",
>>   path="/biomart/martservice", archive=TRUE)
>> mart=useMart("ensembl_mart_51", dataset="hsapiens_gene_ensembl",
>>   host="may2009.archive.ensembl.org", path="/biomart/martservice",
>>   archive=TRUE)
>>
>> works BUT queries then fail:
>>
>> ann <- getBM(attributes = c("ensembl_gene_id", "external_gene_id",
>>   "chromosome_name", "start_position", "end_position", "strand",
>>   "hgnc_symbol", "gene_biotype"), filters = a.filter,
>>   values = fil.vals, mart = mart)
>>> ann
>> [1] ensembl_gene_id  external_gene_id chromosome_name  start_position
>> [5] end_position     strand           hgnc_symbol
>> <0 rows> (or 0-length row.names)
>>
>>
>>> a.filter
>> [1] "chromosome_name" "start"           "end"
>>> fil.vals
>> [[1]]
>> [1] NA
>>
>> [[2]]
>> [1] 67325000
>>
>> [[3]]
>> [1] 67620000
>>
>>
>> I will try again tomorrow... it's late at night in Australia....
>>
>>
>>
>> -----Original Message-----
>> From: Rhoda Kinsella <rhoda at ebi.ac.uk>
>> To: Paul Leo <p.leo at uq.edu.au>
>> Cc: bioconductor <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] BioMart and Ensembl questions !!!
>> Date: Mon, 21 Sep 2009 15:10:42 +0100
>>
>> Hi Paul
>> I'm not really sure why you get this error... I am using the following
>> version:
>>
>>> sessionInfo()
>> R version 2.8.0 (2008-10-20)
>> i386-apple-darwin8.11.1
>>
>> locale:
>> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_1.16.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_0.92-0 XML_1.98-1
>>
>> Does anyone know why Paul is getting this error?
>> Regards,
>> Rhoda
>>
>>
>> On 21 Sep 2009, at 14:53, Paul Leo wrote:
>>
>>> Hi Rhoda,
>>> Thanks, that seems exactly like what I want, but... it does not work
>>> for me...
>>>
>>> library(biomaRt)
>>>> listMarts(host="nov2008.archive.ensembl.org/biomart/martservice")
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'copy' not defined
>>> Entity 'nbsp' not defined
>>> Entity 'nbsp' not defined
>>> Error in names(x) <- value :
>>> 'names' attribute [2] must be the same length as the vector [0]
>>>>
>>>
>>> http://www.ensembl.org/info/website/archives/
>>>
>>> Once you are there, click on the release you would like to look at
>>> and then on the BioMart button. This will give you the URI you need
>>> to use in the biomaRt package to get access to that archive. For
>>> example, the release 51 archive BioMart is available at:
>>>
>>> http://nov2008.archive.ensembl.org/biomart/martview/
>>>
>>> If you then plug this into biomaRt, you can get access to the
>>> information you require:
>>>
>>>> library(biomaRt)
>>>> listMarts(host="may2009.archive.ensembl.org/biomart/martservice")
>>>              biomart              version
>>> 1 ENSEMBL_MART_ENSEMBL           Ensembl 54
>>> 2     ENSEMBL_MART_SNP Ensembl Variation 54
>>> 3    ENSEMBL_MART_VEGA              Vega 35
>>> 4             REACTOME   Reactome(CSHL US)
>>> 5     wormbase_current   WormBase (CSHL US)
>>> 6                pride       PRIDE (EBI UK)
>>>> mart=useMart("ENSEMBL_MART_ENSEMBL",
>>> host="may2009.archive.ensembl.org/biomart/martservice")
>>>
>>>
>>> etc....
>>>
>>>
>>> I hope that helps,
>>> Regards,
>>> Rhoda
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 21 Sep 2009, at 14:25, Paul Leo wrote:
>>>
>>>> Wow, that is fairly terrible. I was surprised this thread was not
>>>> followed up... did I miss something?
>>>>
>>>> You can't access hg18 via BioMart, only GRCh37!!
>>>>
>>>> 1) listMarts(archive=TRUE)   # shows marts back to release 43 are there
>>>>
>>>> I'll start tracking back
>>>>
>>>>
>>>> 2) mart <- useMart("ensembl_mart_51", dataset="hsapiens_gene_ensembl",
>>>>      archive=TRUE)
>>>> ### WORKS FINE but is GRCh37
>>>>
>>>> 3) mart <- useMart("ensembl_mart_50", dataset="hsapiens_gene_ensembl",
>>>>      archive=TRUE)
>>>>
>>>> Error in value[[3L]](cond) :
>>>> Request to BioMart web service failed. Verify if you are still
>>>> connected to the internet.  Alternatively the BioMart web service is
>>>> temporarily down.
>>>> In addition: Warning message:
>>>> In file(file, "r") : unable to resolve
>>>> 'july2008.archive.ensembl.org'
>>>>> #####  THAT's JUST BAD !
>>>>
>>>> 4) mart <- useMart("ensembl_mart_49", dataset="hsapiens_gene_ensembl",
>>>>      archive=TRUE)
>>>> Checking attributes ... ok
>>>> Checking filters ... ok
>>>> Warning message:
>>>> In bmAttrFilt("filters", mart) :
>>>> biomaRt warning: looks like we're connecting to an older version of
>>>> BioMart suite. Some biomaRt functions might not work.
>>>>
>>>> ### works, but that is NCBI36 and the attributes have old
>>>> descriptions; it may work for you (and me)
>>>>
>>>>
>>>>
>>>> I think 'july2008.archive.ensembl.org' SHOULD BE
>>>> 'jul2008.archive.ensembl.org'
>>>> (three-letter month name).
>>>>
>>>> Any way to fix that?
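One workaround is to skip `archive=TRUE` and pass a correctly spelled archive host to useMart() yourself. Below is a sketch under two assumptions. First, archive hosts follow the three-letter-month-plus-year pattern seen in this thread ("nov2008", "may2009"). Second, Paul's guess that release 50 lives at "jul2008" is correct. `archive_host()` is a hypothetical helper, not a biomaRt function. The useMart() call is commented out because it needs network access.

```r
# Hypothetical helper (not part of biomaRt): build an Ensembl archive host
# name from a month number and year, using R's built-in three-letter month
# abbreviations (month.abb), as in "nov2008" and "may2009".
archive_host <- function(month, year) {
  stopifnot(month %in% 1:12)
  paste0(tolower(month.abb[month]), year, ".archive.ensembl.org")
}

# "jul2008.archive.ensembl.org", not "july2008.archive.ensembl.org"
host50 <- archive_host(7, 2008)

# The host could then be given to useMart() directly (not run here):
# mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl",
#                 host = host50, path = "/biomart/martservice")
```

Using `month.abb` avoids hand-typing month names, which is where a "july" versus "jul" slip can creep in.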
>>>>
>>>> Cheers
>>>> Paul
>>>>
>>>> NOTE: I think this is also broken in the production version, R 2.9.2
>>>>
>>>>> sessionInfo()
>>>> R version 2.10.0 Under development (unstable) (2009-09-20 r49770)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>>>> [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>>>> [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>>>> [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>>>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] biomaRt_2.1.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] RCurl_1.2-0 XML_2.6-0
>>>>
>>>> -----Original Message-----
>>>> From: jiayu wen <jiayu.jwen at gmail.com>
>>>> To: bioconductor at stat.math.ethz.ch
>>>> Subject: [BioC] BioMart and Ensembl questions
>>>> Date: Tue, 1 Sep 2009 09:11:09 +0200
>>>>
>>>>
>>>> Dear list,
>>>>
>>>> Over a year ago, I extracted 3'UTR sequences for about 7000
>>>> genes using BioMart for my project. This is the command that I used:
>>>>
>>>> (my gene list consists of gene symbols)
>>>>> my_mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>>>> seq_3utr = getSequence(id = unique(gene.symbol),
>>>> type="hgnc_symbol",seqType="3utr",mart = my_mart)
>>>>> seq_3utr = seq_3utr[seq_3utr[,"3utr"] != "Sequence unavailable",]
>>>>> # here: extract the longest 3'UTR for each unique gene symbol
>>>>> exportFASTA(seq_3utr, file=paste("s3utr.fa",sep=""))
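The "extract the longest 3'UTR" step is only described in a comment above. A minimal sketch of one way to do it follows. It assumes the getSequence() result has columns named "3utr" and "hgnc_symbol", as the indexing `seq_3utr[,"3utr"]` above suggests. `longest_utr_per_gene()` is a hypothetical helper, and the toy data frame is made up purely for illustration.

```r
# Hypothetical helper (not part of biomaRt): keep the longest available 3'UTR
# for each gene symbol in a getSequence()-style data frame.
longest_utr_per_gene <- function(seqs, seq_col = "3utr",
                                 id_col = "hgnc_symbol") {
  # drop placeholder rows that have no sequence
  seqs <- seqs[seqs[[seq_col]] != "Sequence unavailable", , drop = FALSE]
  # sort by gene symbol, then by decreasing sequence length
  ord <- order(seqs[[id_col]], -nchar(seqs[[seq_col]]))
  seqs <- seqs[ord, , drop = FALSE]
  # after sorting, the first row for each symbol is its longest 3'UTR
  seqs[!duplicated(seqs[[id_col]]), , drop = FALSE]
}

# Toy input (made-up symbols and sequences)
toy <- data.frame(`3utr` = c("AC", "ACGT", "Sequence unavailable", "G"),
                  hgnc_symbol = c("GENE1", "GENE1", "GENE2", "GENE2"),
                  check.names = FALSE, stringsAsFactors = FALSE)
longest <- longest_utr_per_gene(toy)
```

Here `longest` keeps "ACGT" for GENE1 and "G" for GENE2. Ties in length are resolved by whichever row comes first after sorting, so a real analysis may want an explicit tie-break.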
>>>>
>>>> As my project goes on, I now need 3'UTR genomic coordinates to get
>>>> phastCons conservation scores for some regions in the 3'UTRs.
>>>> To do that, I first convert hgnc_symbol back to ensembl_gene_id,
>>>> then get the 3'UTR coordinates using getBM like this:
>>>>
>>>>> s3utr = read.DNAStringSet(paste("s3utr.fa",sep=""),format="fasta")
>>>>> gene_names = names(s3utr)
>>>>> hgnc2ensembl = getBM(attributes=c("hgnc_symbol","ensembl_gene_id"),
>>>> filters="hgnc_symbol", values=gene_names, mart=my_mart)
>>>>> s3utr_pos = getBM(attributes=c("ensembl_gene_id",
>>>> "chromosome_name","strand","3_utr_start", "3_utr_end"),
>>>> filters="ensembl_gene_id",
>>>> values=as.character(hgnc2ensembl$ensembl_gene_id), mart=my_mart)
>>>>> s3utr_pos = s3utr_pos[complete.cases(s3utr_pos),]
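To see which gene symbols drop out at each of the two getBM() steps above, a small diagnostic can compare the inputs and outputs. Below is a sketch that assumes the column names requested in those getBM() calls. `report_losses()` is a hypothetical helper, and the toy data frames use made-up symbols and IDs. A symbol that maps to several Ensembl IDs is counted as having coordinates if any of its IDs does.

```r
# Hypothetical diagnostic (not part of biomaRt): report which input gene
# symbols are lost at each step of the pipeline
# symbol -> Ensembl gene ID -> 3'UTR coordinates.
report_losses <- function(gene_names, hgnc2ensembl, s3utr_pos) {
  mapped <- unique(hgnc2ensembl$hgnc_symbol)
  # symbols with no Ensembl gene ID at all
  unmapped <- setdiff(gene_names, mapped)
  # Ensembl IDs that kept at least one complete 3'UTR row
  ids_with_utr <- unique(s3utr_pos$ensembl_gene_id)
  has_utr <- hgnc2ensembl$ensembl_gene_id %in% ids_with_utr
  # mapped symbols none of whose IDs has 3'UTR coordinates
  no_coords <- setdiff(mapped, hgnc2ensembl$hgnc_symbol[has_utr])
  list(unmapped = unmapped, no_coords = no_coords)
}

# Toy inputs (made-up symbols and IDs)
gene_names <- c("GENE1", "GENE2", "GENE3")
hgnc2ensembl <- data.frame(hgnc_symbol = c("GENE1", "GENE2"),
                           ensembl_gene_id = c("ENSG_toy1", "ENSG_toy2"),
                           stringsAsFactors = FALSE)
s3utr_pos <- data.frame(ensembl_gene_id = "ENSG_toy1", chromosome_name = "1",
                        strand = 1L, `3_utr_start` = 100L, `3_utr_end` = 200L,
                        check.names = FALSE, stringsAsFactors = FALSE)
losses <- report_losses(gene_names, hgnc2ensembl, s3utr_pos)
```

In this toy case GENE3 is lost at the symbol-to-ID step and GENE2 at the coordinate step. Listing the two groups separately helps decide whether a version difference or missing 3'UTR annotation is the cause.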
>>>>
>>>> By doing that, I can now only get about 5000 gene symbols with 3'UTR
>>>> coordinates (converting from hgnc_symbol back to ensembl_gene_id
>>>> loses about 250 genes). I was thinking it might be a version
>>>> difference, so I tried to use the Ensembl archive, but it gives me
>>>> the error below:
>>>>
>>>>> my_mart =
>>>> useMart("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=T)
>>>> Error in value[[3L]](cond) :
>>>> Request to BioMart web service failed. Verify if you are still
>>>> connected to the internet.  Alternatively the BioMart web service is
>>>> temporarily down.
>>>> In addition: Warning message:
>>>> In file(file, "r") : cannot open: HTTP status was '404 Not Found'
>>>>
>>>> Is there any way that I can get 3'UTR coordinates for my whole gene
>>>> list?
>>>>
>>>> Thanks for any help.
>>>>
>>>> Jean
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>> Rhoda Kinsella Ph.D.
>>> Ensembl Bioinformatician,
>>> European Bioinformatics Institute (EMBL-EBI),
>>> Wellcome Trust Genome Campus,
>>> Hinxton
>>> Cambridge CB10 1SD,
>>> UK.
>>>
>>>
>>
>>
>
>
>


