[BioC] BioMart and Ensembl questions !!!

Rhoda Kinsella rhoda at ebi.ac.uk
Mon Sep 21 15:44:12 CEST 2009


Sorry, I should have put the release 51 URI into the biomaRt package
instead of release 54 :). Here it is:

 > library(biomaRt)
 > listMarts(host="nov2008.archive.ensembl.org/biomart/martservice")
                biomart      version
1 ENSEMBL_MART_ENSEMBL   Ensembl 51
2     ENSEMBL_MART_SNP Variation 51
3    ENSEMBL_MART_VEGA      Vega 32
 >

etc....

Rhoda

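The archive hosts quoted in this thread (nov2008.archive.ensembl.org for release 51, may2009.archive.ensembl.org for release 54) use a lower-case three-letter month followed by the year, while the failing lookup further down is for 'july2008.archive.ensembl.org'. Below is a minimal R sketch of building such a host name. It assumes that naming pattern holds for every archive, and `archiveHost()` is a hypothetical helper, not part of biomaRt:

```r
# Hypothetical helper, not part of biomaRt. It assumes archive hosts
# are named <three-letter month><year>.archive.ensembl.org, as in
# nov2008.archive.ensembl.org and may2009.archive.ensembl.org.
archiveHost <- function(year, month) {
  stopifnot(month %in% 1:12)
  # month.abb is a base R constant: "Jan", "Feb", ..., "Dec"
  paste(tolower(month.abb[month]), year, ".archive.ensembl.org", sep = "")
}

archiveHost(2008, 11)  # "nov2008.archive.ensembl.org"
archiveHost(2008, 7)   # "jul2008.archive.ensembl.org", not "july2008..."

# The host could then be given to biomaRt as in the example above, e.g.
# listMarts(host = paste(archiveHost(2008, 11), "/biomart/martservice", sep = ""))
```

`paste(..., sep = "")` is used rather than `paste0()` so the sketch also runs on R versions as old as the one in the session info below.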


On 21 Sep 2009, at 14:40, Rhoda Kinsella wrote:

> Hi Paul,
> You can access the archived Ensembl marts from the following
> link:
>
> http://www.ensembl.org/info/website/archives/
>
> Once you are there, click on the release you would like to look at and
> then on the BioMart button. This will give you the
> URI you need to use in the biomaRt package to access that
> archive. For example, the release 51 archive BioMart is
> available at:
>
> http://nov2008.archive.ensembl.org/biomart/martview/
>
> If you then plug this into biomaRt, you can access the
> information you require:
>
>> library(biomaRt)
>> listMarts(host="may2009.archive.ensembl.org/biomart/martservice")
>                biomart              version
> 1 ENSEMBL_MART_ENSEMBL           Ensembl 54
> 2     ENSEMBL_MART_SNP Ensembl Variation 54
> 3    ENSEMBL_MART_VEGA              Vega 35
> 4             REACTOME   Reactome(CSHL US)
> 5     wormbase_current   WormBase (CSHL US)
> 6                pride       PRIDE (EBI UK)
>> mart=useMart("ENSEMBL_MART_ENSEMBL",
> host="may2009.archive.ensembl.org/biomart/martservice")
>
> etc....
>
> I hope that helps,
> Regards,
> Rhoda
>
>
>
> On 21 Sep 2009, at 14:25, Paul Leo wrote:
>
>> Wow, that is fairly terrible. I was surprised this thread was not
>> followed up... did I miss something?
>>
>> You can't access hg18 via BioMart, only GRCh37!!
>>
>> 1) listMarts(archive=TRUE)   # shows marts back to release 43 are there
>>
>> I'll start tracking back
>>
>>
>> 2) mart <- useMart("ensembl_mart_51", dataset="hsapiens_gene_ensembl", archive=TRUE)
>> ### WORKS FINE but is GRCh37
>>
>> 3) mart <- useMart("ensembl_mart_50", dataset="hsapiens_gene_ensembl", archive=TRUE)
>>
>> Error in value[[3L]](cond) :
>> Request to BioMart web service failed. Verify if you are still
>> connected to the internet.  Alternatively the BioMart web service is
>> temporarily down.
>> In addition: Warning message:
>> In file(file, "r") : unable to resolve 'july2008.archive.ensembl.org'
>>> #####  THAT's JUST BAD !
>>
>> 4) mart <- useMart("ensembl_mart_49", dataset="hsapiens_gene_ensembl", archive=TRUE)
>> Checking attributes ... ok
>> Checking filters ... ok
>> Warning message:
>> In bmAttrFilt("filters", mart) :
>> biomaRt warning: looks like we're connecting to an older version of
>> BioMart suite. Some biomaRt functions might not work.
>>
>> ### Works, but that is NCBI36 and the attributes have old
>> descriptions; it may still work for you (and me)
>>
>>
>>
>> I think 'july2008.archive.ensembl.org' SHOULD BE
>> 'jul2008.archive.ensembl.org' (three-letter month name).
>>
>> Any way to fix that?
>>
>> Cheers
>> Paul
>>
>> NOTE: this is also broken in the production version 2.9.2, I think.
>>
>>> sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-09-20 r49770)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>> [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_2.1.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.2-0 XML_2.6-0
>>
>> -----Original Message-----
>> From: jiayu wen <jiayu.jwen at gmail.com>
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] BioMart and Ensembl questions
>> Date: Tue, 1 Sep 2009 09:11:09 +0200
>>
>>
>> Dear list,
>>
>> A little over a year ago, I extracted 3'UTR sequences for about 7000
>> genes using Biomart for my project. This is the command that I used:
>>
>> (my gene list consists of gene symbols)
>>> my_mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>> seq_3utr = getSequence(id = unique(gene.symbol),
>> type="hgnc_symbol",seqType="3utr",mart = my_mart)
>>> seq_3utr = seq_3utr[seq_3utr[,"3utr"] != "Sequence unavailable",]
>>> # here: extract the longest 3'UTR for each unique gene symbol
>>> exportFASTA(seq_3utr, file=paste("s3utr.fa",sep=""))
>>
>> As my project has progressed, I now need 3'UTR genomic coordinates to
>> get phastCons conservation scores for some regions in the 3'UTRs.
>> To do that, I first convert hgnc_symbol back to ensembl_gene_id, then
>> get 3'UTR coordinates using getBM like this:
>>
>>> s3utr = read.DNAStringSet(paste("s3utr.fa",sep=""),format="fasta")
>>> gene_names = names(s3utr)
>>> hgnc2ensembl  = getBM(attributes=c("hgnc_symbol","ensembl_gene_id"),
>> filters="hgnc_symbol", values=gene_names, mart=my_mart)
>>> s3utr_pos  = getBM(attributes=c("ensembl_gene_id",
>> "chromosome_name","strand","3_utr_start", "3_utr_end"),
>> 		filters="ensembl_gene_id",
>> 		values=as.character(hgnc2ensembl$ensembl_gene_id), mart=my_mart)
>>> s3utr_pos = s3utr_pos[complete.cases(s3utr_pos),]
>>
>> Doing that, I now get 3'UTR coordinates for only about 5000 gene
>> symbols (converting from hgnc_symbol back to ensembl_gene_id
>> loses about 250 genes). I thought it might be a version
>> difference, so I tried to use the Ensembl archive, but it gives me
>> the error below:
>>
>>> my_mart =
>> useMart("ensembl_mart_50",dataset="hsapiens_gene_ensembl",archive=T)
>> Error in value[[3L]](cond) :
>>  Request to BioMart web service failed. Verify if you are still
>> connected to the internet.  Alternatively the BioMart web service is
>> temporarily down.
>> In addition: Warning message:
>> In file(file, "r") : cannot open: HTTP status was '404 Not Found'
>>
>> Is there any way that I can get 3'UTR coordinates for my whole gene
>> list?
>>
>> Thanks for any help.
>>
>> Jean
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> Rhoda Kinsella Ph.D.
> Ensembl Bioinformatician,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.
>


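Jean's script leaves the step "extract longest 3'UTR for each unique gene symbol" as a placeholder. Below is a minimal R sketch of that step. It assumes `seq_3utr` is the data frame returned by `getSequence()`, with a "3utr" sequence column and an "hgnc_symbol" column (the column names used in that script). The toy sequences are hypothetical:

```r
# Toy stand-in (hypothetical sequences) for the getSequence() result;
# check.names = FALSE keeps the "3utr" column name as-is.
seq_3utr <- data.frame(
  "3utr"      = c("ACGT", "ACGTACGT", "AC", "ACGTA"),
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  stringsAsFactors = FALSE, check.names = FALSE
)

# Sort by symbol, then by decreasing 3'UTR length; ties keep their
# original order because order() leaves unresolved ties in place.
utr_len <- nchar(seq_3utr[, "3utr"])
sorted  <- seq_3utr[order(seq_3utr$hgnc_symbol, -utr_len), ]

# The first row per symbol is now the longest 3'UTR for that symbol
longest <- sorted[!duplicated(sorted$hgnc_symbol), ]
longest
```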


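Jean reports that converting hgnc_symbol back to ensembl_gene_id loses about 250 genes, and the 3'UTR coordinate query drops more. Before comparing Ensembl releases, it may help to list which identifiers drop out at each step. Below is a minimal R sketch. The toy data frames are hypothetical stand-ins for `gene_names`, `hgnc2ensembl` and `s3utr_pos` from that script:

```r
# Toy stand-ins (hypothetical values) for the objects in Jean's script:
# gene_names from names(s3utr), hgnc2ensembl and s3utr_pos from getBM().
gene_names <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
hgnc2ensembl <- data.frame(
  hgnc_symbol     = c("GENE_A", "GENE_B", "GENE_D"),
  ensembl_gene_id = c("ID_A", "ID_B", "ID_D"),
  stringsAsFactors = FALSE
)
s3utr_pos <- data.frame(
  ensembl_gene_id = c("ID_A", "ID_B"),
  chromosome_name = c("1", "2"),
  strand          = c(1L, -1L),
  "3_utr_start"   = c(1000L, 5000L),
  "3_utr_end"     = c(1500L, 5600L),
  stringsAsFactors = FALSE, check.names = FALSE
)

# Step 1: symbols that were not mapped to any Ensembl gene ID
unmapped_symbols <- setdiff(gene_names, hgnc2ensembl$hgnc_symbol)

# Step 2: mapped gene IDs with no complete 3'UTR coordinate row
no_utr_ids <- setdiff(hgnc2ensembl$ensembl_gene_id, s3utr_pos$ensembl_gene_id)

unmapped_symbols  # "GENE_C"
no_utr_ids        # "ID_D"
```

Running the same two `setdiff()` calls against marts from different releases could show whether the missing genes are really a version difference.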