[BioC] R: R: BioMart error occurred again

Steffen at stat.Berkeley.EDU Steffen at stat.Berkeley.EDU
Mon Nov 23 20:00:11 CET 2009


Hi Maura,

By "query in batch" I meant querying multiple IDs in a single request,
rather than one ID at a time. It has nothing to do with R CMD BATCH.

You should be able to replace the per-ID queries with one query that covers
all IDs at once, and then combine the results in R.

For example:

1) Make a vector of all the target transcript IDs in your miRNA set and
retrieve the 3' UTRs for all of them at once:

library(biomaRt)
hmart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

# all target transcript IDs, passed to biomaRt in one call
targets <- c("ENST00000014914", "ENST00000044462",
             "ENST00000006101", "ENST00000164305")
targets3UTR <- getSequence(id=targets, type="ensembl_transcript_id",
                           seqType="3utr", mart=hmart)

2) In a second query, retrieve the gene symbols and Ensembl gene IDs for
the same set:

idmap <- getBM(attributes=c("hgnc_symbol", "ensembl_gene_id",
                            "refseq_dna", "ensembl_transcript_id"),
               filters="ensembl_transcript_id",
               values=targets, mart=hmart)

In a final step you combine the information from targets3UTR and idmap in R,
matching rows on ensembl_transcript_id. So all you need is two queries to
biomaRt; the per-miRNA loop then runs over the results locally in R.
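
The combining step could look roughly like the sketch below. It does not
contact biomaRt: the two data frames are small stand-ins that assume the
shape of the getSequence() and getBM() results (a "3utr" column plus
ensembl_transcript_id, and the ID columns respectively). The sequences are
the truncated fragments from your example output, not full 3' UTRs, and the
names utr3, combined, headers and fasta are only illustrative.

```r
# Stand-in for the getSequence() result (assumed shape: sequence column named
# after seqType, plus the ID column). Sequences are truncated fragments.
targets3UTR <- data.frame(
  "3utr" = c("CTCTGTCCTGAA",
             "AGGCTGTGGATTCAAGGCTCCCTGCCCCCCAGATCATTTCCCCAA",
             "ACTTTCCTAGATAAATTAACATT"),
  ensembl_transcript_id = c("ENST00000014914", "ENST00000006101",
                            "ENST00000164305"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Stand-in for the getBM() result. The repeated GPRC5A row mimics what a
# one-to-many attribute such as refseq_dna can leave behind once dropped.
idmap <- data.frame(
  hgnc_symbol = c("GPRC5A", "GPRC5A", "COPZ2", "PIGB"),
  ensembl_gene_id = c("ENSG00000013588", "ENSG00000013588",
                      "ENSG00000005243", "ENSG00000069943"),
  ensembl_transcript_id = c("ENST00000014914", "ENST00000014914",
                            "ENST00000006101", "ENST00000164305"),
  stringsAsFactors = FALSE
)

# Give the sequence column a syntactic name so it is easy to refer to.
names(targets3UTR)[names(targets3UTR) == "3utr"] <- "utr3"

# Keep one row per transcript, then join the two tables on the transcript ID.
idmap <- unique(idmap[, c("hgnc_symbol", "ensembl_gene_id",
                          "ensembl_transcript_id")])
combined <- merge(idmap, targets3UTR, by = "ensembl_transcript_id")

# Build ">SYMBOL|ENSG|ENST" header lines and interleave them with sequences.
headers <- paste0(">", paste(combined$hgnc_symbol, combined$ensembl_gene_id,
                             combined$ensembl_transcript_id, sep = "|"))
fasta <- as.vector(rbind(headers, combined$utr3))
writeLines(fasta)
```

To produce the per-miRNA blocks, you would subset combined by the transcript
IDs listed for each miRNA in hsTargets before writing, which needs no further
queries.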


Let me know if this solves your problem.

Cheers,
Steffen





> I read that message and asked for some guidelines to query  biomaRt in
> batch mode.
> The PDF file available from biomaRt on-line pages shows a number of useful
> ways to extract useful data but it
> does not mention any batch interrogation mode.
> I thought R CMD BATCH would be the way to do that. If so it will take a
> while.
>
> Basically I am trying to extract the 3utr sequence for each target gene
> transcript listed in data set hsTargets.
> Since I have to save to a file the miRNA identifier, the miRNA sequence,
> followed by all its target  gene transcripts with their 3utr sequences, my
> R script  loops on each miRNA identifier, reads out all its target gene
> transcript identifiers  from
> hsTargets, and submits such an ENST  list to biomaRt  to get the
> corresponding 3UTR sequences:
>
> ## -------------------- GET 3UTR SEQUENCES FOR TARGET GENE TRANSCRIPTS
>     gene_seq <- getSequence
> (id=tmp[,"target"],type="ensembl_transcript_id",seqType="3utr",mart=hmart)
>
> In addition, to identify the target transcripts in the output file, I
> also ask biomaRt for some other target identifiers, using the ENST ID as
> the filter:
>
>  gene_map <-
> getBM(attributes=c("hgnc_symbol","ensembl_gene_id","refseq_dna","ensembl_transcript_id"),
>                                         filters = "ensembl_transcript_id",
> values=gene_seq[j,"ensembl_transcript_id"],
> mart=hmart)
>
> The  typical output file looks like the  example pasted at the bottom.
> My question is: how can I rewrite my R script so as to accomplish my task
> in batch mode ?
> I hope I won't have to get the 3utr sequences for all the target gene
> transcripts listed in hsTargets together.
>
> Thank you,
> Maura
>
>>hsa-miR-7
> UGGAAGACUAGUGAUUUUGUUGU   UGGAAGACUAGUGAUUUUGUUGU
>>GPRC5A|ENSG00000013588|ENST00000014914
> CTCTGTCCTGAA
> ............................................................................................................................................................................
> .............................................................................................................................................................................................................
>>PSMA4|ENSG00000041357|ENST00000044462
> AATCAGAGATTTTATTACTCATTTGGGGCACCATTTCAGTGTAAAAGCAGTCCTACTCTTCCACACTAGGAAGGCTTTAC
> TTTTTTTAACTGGTGCAGTGGGAAAATA.......................................................................................................................................
> .............................................................................................................................................................................................................
>>COPZ2|ENSG00000005243|ENST00000006101
> AGGCTGTGGATTCAAGGCTCCCTGCCCCCCAGATCATTTCCCCAA...................................................................................
> .............................................................................................................................................................................................................
>>PIGB|ENSG00000069943|ENST00000164305
> ACTTTCCTAGATAAATTAACATT....................................................................................................................................................
> .............................................................................................................................................................................................................
>>ZNF275|ENSG00000063587|ENST00000095634
> AAACGCCCTGTGGTCCCGCGGGACAGGGACGGAGTCCCCAGAGGGGATGGCAGAGTCAAAGGAGATGAACAGTTTT
> GTAGCGCTTATATATTTTGT..........................................................................................................................................................
> ............................................................................................................................................................................................................
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


