[BioC] reverse complement or no reverse complement on biomaRt / biomart.org

James W. MacDonald jmacdon at med.umich.edu
Mon Oct 12 17:57:07 CEST 2009


Hi Tefina,

Tefina Paloma wrote:
> Dear list,
> 
> Looking at the VEGFC gene (located on the reverse strand) on the Biomart
> website and querying the 5' UTR and the flanking sequence yields the
> following:
> 
> http://www.ensembl.org/Homo_sapiens/Transcript/Export?db=core;g=ENSG00000150630;output=fasta;r=4:177604691-177713895;strand=feature;t=ENST00000280193;param=utr5;genomic=5_flanking;_format=HTML
> 
> Doing the same in R yields essentially the same result, the only difference
> being that for the flanking sequence the reverse complement is given:
> 
> library(biomaRt)
> library(Biostrings)
> 
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> 
> vegfc_fs = getSequence(id = c("ENST00000280193"),
>                        type = "ensembl_transcript_id",
>                        seqType = "transcript_flank", upstream = 3000,
>                        mart = ensembl)
> 
> vegfc_utr = getSequence(id = c("ENST00000280193"),
>                         type = "ensembl_transcript_id",
>                         seqType = "5utr", mart = ensembl)
> 
> 
> As the gene is located on the reverse strand, one would probably be
> interested in the reverse complement of the sequence returned by
> Ensembl/Biomart.
> 
> Although it's nice that the flanking sequence is already reverse
> complemented in R, this should be documented somewhere.

The flanking sequence isn't reverse complemented in R; it is reported
exactly as it is received from the Biomart server.

I am a bit confused here as well; AFAICT, the sequences for the 5' flank
and the 5' UTR are identical from all sources (Ensembl, Biomart and biomaRt).

5' flank:

Ensembl

ccgccgccagcgcccccgccgcagcgcccgcggcccggctcctctcactt

Biomart

CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT

biomaRt

CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT

5' UTR:

Ensembl

CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC

Biomart

CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC

biomaRt

CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
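
To double-check this sort of thing locally, here is a minimal sketch in base R. It uses only the 5' flank excerpts quoted above. revcomp() is a hypothetical helper written for this example (with Biostrings loaded, reverseComplement() on a DNAString would be the usual route), and toupper() is needed because the Ensembl flank above is in lower case.

```r
# Hypothetical helper: reverse complement of a DNA string in base R.
# Only handles A/C/G/T (either case); ambiguity codes are left untouched.
revcomp <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

# 5' flank excerpts exactly as quoted in this message.
ensembl_flank <- "ccgccgccagcgcccccgccgcagcgcccgcggcccggctcctctcactt"
biomart_flank <- "CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT"

# Same strand: identical once case is ignored.
same_strand <- toupper(ensembl_flank) == biomart_flank

# Opposite strand: one is the reverse complement of the other.
is_revcomp <- toupper(revcomp(ensembl_flank)) == biomart_flank

print(c(same_strand = same_strand, is_revcomp = is_revcomp))
```

If same_strand is TRUE and is_revcomp is FALSE, the two sources report the flank on the same strand, which is what I see for the excerpts above.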

Best,

Jim




> 
> And the question arises, why does biomaRt only return the reverse complement
> of the flanking sequence but not of the utr?
> 
> I would appreciate any hints!
> Thanks a lot in advance,
> Best,
> Tefina
> 
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Biostrings_2.12.8 IRanges_1.2.3     biomaRt_2.0.0
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 RCurl_0.98-1  XML_2.5-3
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


