[BioC] getSequence ensembl biomaRt

James W. MacDonald jmacdon at med.umich.edu
Thu Aug 13 14:53:17 CEST 2009


Hi Mayra,

Mayra Eduardoff wrote:
> Hi Steffen
> 
> 
> I want to retrieve a genomic sequence with biomaRt:
> 
> 
> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] BSgenome_1.12.3         cureos_0.3              Biostrings_2.12.8
> IRanges_1.2.3           zfv2.db_1.0.0           RSQLite_0.7-1
>  [7] DBI_0.2-4               Agi4x44PreProcess_1.4.0 genefilter_1.24.2
> annotate_1.22.0         AnnotationDbi_1.6.1     venn_1.5
> [13] multtest_2.1.1          vsn_3.12.0              Biobase_2.5.5
> xtable_1.5-5            limma_2.18.2            biomaRt_2.0.0
> 
> 
> 
>> mart <- useMart("ensembl")
>> mart <- useDataset(mart=mart, "drerio_gene_ensembl")
> 
> seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987, mart = mart)
> 
> Error in getSequence(chromosome = 15, start = 18357968, end = 18360987,  :
>   Please specify the type of sequence that needs to be retrieved when using
> biomaRt in web service mode.  Choose either gene_exon,
> transcript_exon,transcript_exon_intron, gene_exon_intron, cdna,
> coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide,
> 3utr or 5utr
> 
> Apart from the fact that I want a genomic region, even if I specify type it
> doesn't seem to work:
> 
> seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987, type="gene_exon", mart = mart)
> Error in getSequence(chromosome = 15, start = 18357968, end = 18360987,  :
>   Please specify the type of sequence that needs to be retrieved when using
> biomaRt in web service mode.  Choose either gene_exon,
> transcript_exon,transcript_exon_intron, gene_exon_intron, cdna,
> coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide,
> 3utr or 5utr
> 
> 
> or, as in the documentation (although it doesn't make any sense to me to
> specify both seqType and type...)

You have to specify seqType and type because the sequences don't come 
back in the same order you requested, so the type argument is used to 
label the sequences.
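
To make that concrete, here is a minimal base-R sketch of why the ID column matters. The data frame is a made-up stand-in for what getSequence() returns (the IDs and sequences are placeholders, not real Ensembl records); the only point is that match() on the ID column puts the rows back in the order you asked for.

```r
# Toy illustration only: IDs and sequences are invented placeholders.
requested <- c("id_A", "id_B", "id_C")

# Pretend query result: rows arrive in a different order than requested.
returned <- data.frame(
  cdna = c("CCCC", "AAAA", "GGGG"),
  ensembl_transcript_id = c("id_C", "id_A", "id_B"),
  stringsAsFactors = FALSE
)

# Use the ID column (what the 'type' argument supplies) to restore the order.
idx <- match(requested, returned$ensembl_transcript_id)
aligned <- returned[idx, ]
rownames(aligned) <- NULL

aligned
```

Without the ID column there would be nothing to match on, which is why getSequence() asks for both arguments.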

Also, I don't see any way to get inter-genic sequences. For instance:

 > getSequence(15,18357968,18360987,seqType="cdna", mart=mart, type="ensembl_transcript_id")
[1] cdna                  ensembl_transcript_id
<0 rows> (or 0-length row.names)

That query comes back empty because this portion of the zebrafish genome
contains no known genes. However, if I pick a region that does contain a gene:

 > getSequence(15,18723006,18741517,seqType="cdna", mart=mart, type="ensembl_transcript_id")
                                                                     cdna
1 
AGGAGCCGCTCAGACCACACCAGTGCAGGGTCAGAACCTGGTGACAAATAATGTCTCAGTGGTGGAGGGCGAGACGGCCATCATCAGCTGCCGGGTGAAAAACAACGACGACTCCGTCATCCAACTGCTCAACCCCAACCGGCAGACTATCTACTTCAGAGACGTTAGACCTTTGAAGGACAGTCGGTTTCAGCTGGTAAACTTCTCCGACAACGAGCTCTTGGTGTCCCTGTCCAACGTGTCTCTGTCGGACGAGGGCCGCTACGTGTGTCAACTCTACACGGATCCACCGCAAGAAGCCTACGCCGACATCACTGTACTGGTTCCACCAGGCAACCCAATCTTAGAGTCCCGCGAGGAAATCGTGAGCGAGGGGAATGAGACCGAGATAACCTGCACCGCCATGGGCAGCAAACCTGCTTCCACCATCAAATGGATGAAAGGCGACCAACCACTGCAAGGTGAGGCGACTGTGGAGGAGTTATACGACAGGATGTTCACTGTCACCAGCCGGCTCAGGCTCACCGTCTCTAAGGAGGACGATGGAGTGGCCGTCATCTGCATCATTGACCATCCAGCCGTGAAGGACTTCCAGGCCCAGAAATACCTGGAAGTGCAGTATAAACCAGAAGTGAAGATTGTGGTGGGATTCCCAGAGGGTTTGACCAGAGAAGGAGAAAATCTCGAGCTGACATGCAAAGCTAAAGGAAAACCGCAGCCTCATCAAATTAACTGGCTCAAAGTGGATGATGATTTCCCCTCCCACGCCTTGGTAACTGGCTCTGATCTCTTCATCGAAAACCTTAACAAGTCCTACAACGGAACGTACCGCTGTGTGGCATCTAACTTAGTGGGAGAAGCCTACGATGATTACATCCTTTATGTATACGATTCAAGAGCAGATGGAGCGCCACAGAAAATTGATCATGCCGTCATCGGCGGAGTTGTCGCAGTGGTTGTGTTCGCCATGCTTTGTCTCCTGATTGTTC
TTGGCCGATATTTCGCCAGACACAAAGGGACCTACTTCACCCACGAAGCTAAAGGAGCGGATGACGCGGCGGACGCCGACACTGCCATCATCAACGCAGAGGGCGGACACAACAATTCGGATGACAAGAAGGAATACTACATTTAA
   ensembl_transcript_id
1    ENSDART00000062603
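
As for the "Invalid attribute(s): entrez" error quoted further down, the type argument has to be one of the attribute names the mart actually knows about. A small helper along these lines can narrow down the candidates. Note that find_attributes is a name I made up (it is not part of biomaRt), and the toy table is invented so the sketch runs offline; with a live connection you would pass listAttributes(mart) instead, which returns a data frame with name and description columns.

```r
# Hypothetical helper (not part of biomaRt): search an attribute table,
# such as the one returned by listAttributes(mart), for a pattern in
# either the name or the description column.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

# Invented stand-in table so the sketch runs without a network connection.
# The real attribute names depend on the Ensembl release.
toy_attrs <- data.frame(
  name = c("ensembl_gene_id", "ensembl_transcript_id", "entrezgene"),
  description = c("Ensembl Gene ID", "Ensembl Transcript ID", "EntrezGene ID"),
  stringsAsFactors = FALSE
)

hits <- find_attributes(toy_attrs, "entrez")
hits

# With a real mart object (requires network access):
# find_attributes(listAttributes(mart), "entrez")
```

Whatever name comes back for your Ensembl release is what goes into type=; I would not rely on the toy names above.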

Best,

Jim



> 
> seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987, type="entrez", seqType="cdna", mart = mart)
> Error in getBM(c(seqType, type), filters = c("chromosome_name", "start",  :
> 
> Invalid attribute(s): entrez
> Please use the function 'listAttributes' to get valid attribute names
> 
> 
> 
> 
> I can't load in mysql mode anymore either:
>  mart <- useMart("ensembl", mysql=TRUE)
> Error: mysql access to Ensembl is no longer available through this package
> the web service mode supports all queries.  If mysql is needed a separate
> package will become available with limited mysql query support.
> 
> 
> I would be very grateful for your help!
> 
> 
> kind regards,
> 
> Mayra
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


