[BioC] GoStats and microRNA pipeline using Biomart

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Mar 30 16:56:21 CEST 2011


Hi,

On Wed, Mar 30, 2011 at 9:43 AM, David martin <vilanew at gmail.com> wrote:
> Hi,
> I open this new discussion so not to confuse with the previous one.
>
> The objective here is to look for overrepresented GoTerms from microRNA
> targets. One microRNA can have several targets (genes)  and one single gene
> can be targeted by several microRNAs. The assumption is to check for a
> specific microRNAs which GoTerms are overrepresented.
>
>
> Ok so let's say my microRNA of interest is mir-A.
>
> Step1: based on my favorite prediction algorithm i have managed to get a
> list of genes targeted by mir-A. The genes are ensembl transcripts and as i
> said before miR-A can target several times the same transcript (at different
> location) so i need to account for this.
>
> miR-A targets ->
> ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up to 300
> different transcripts.

I don't get why you'd want to have the same transcript multiple times
as a target for the miRNA -- if the miRNA targets the same transcript
in two different locations, you then want to double count the GO terms
associated with that transcript?

Somehow that seems wrong to me -- if the "hit count" of the miRNA to
the transcript is important to you, one thing you can do is store your
miR-A vector as its "table()", so the names will be the transcripts
and the values will be the number of hits.
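
A minimal sketch of what I mean, in base R. The transcript IDs are just
the placeholders from your mail, and the variable names are made up for
illustration:

```r
# Hypothetical target vector: one entry per predicted miR-A site,
# so a transcript hit at several locations appears several times.
targets <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089", "ENST099")

# table() collapses the duplicates: names are transcripts, values are hit counts.
hits <- table(targets)

# The unique transcript IDs, e.g. to pass as the 'values' of a query.
unique.targets <- names(hits)

# Number of predicted sites for one transcript.
hits["ENST001"]
```

This way you query each transcript once and still keep the number of
sites around if you decide you need it later.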

> I use biomart to get the corresponding GoIds for these transcripts
>
> ....
> #Select mart database
> mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>
> #Get go for a specific transcript
> # First problem as Biomart will not return twice GoTerms for duplicated
> transcripts. The example below show that for transcript
> c("ENST00000347770","ENST00000347770") i get the same goTerms than for
> transcript c("ENST00000347770").
> # As i said before a microRNA can target the same transcript several times, so
> there would be twice the number of goterms associated to this particular microRNA.
> Can we force biomart to return redundant GoTerms ????

I'm actually still not sure what you want to do, but if you follow my
advice above, you can manipulate the data.frame you get from getBM to
replicate rows (or whatever you're trying to do).

You will also want to add "ensembl_transcript_id" to your vector of
attributes so you can reassociate the rows in the table that is
returned to you with your original ensembl transcripts you are
querying for, eg:

R> gomir <- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
    filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
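
If you really do want each GO term counted once per hit, here is a rough
sketch of replicating the rows by hit count. The data.frame below is a
hand-made stand-in for what getBM would return (no query is run), and the
"go_id" column name and the GO IDs are assumptions for illustration only:

```r
# Hit counts per transcript (placeholder IDs), as produced by table().
hits <- table(c("ENST001", "ENST001", "ENST001", "ENST0025"))

# Mock of a getBM() result: one row per transcript/GO-term pair.
# Column 'go_id' and its values are hypothetical.
gomir <- data.frame(
  ensembl_transcript_id = c("ENST001", "ENST001", "ENST0025"),
  go_id = c("GO:0000001", "GO:0000002", "GO:0000001"),
  stringsAsFactors = FALSE)

# Look up the hit count for the transcript in each row.
n <- as.integer(hits[gomir$ensembl_transcript_id])

# Repeat each row as many times as its transcript was hit.
weighted <- gomir[rep(seq_len(nrow(gomir)), times = n), ]

# GO term counts, now weighted by the number of miRNA sites.
go.counts <- table(weighted$go_id)
```

Whether weighting by hit count is statistically sensible for the
enrichment test is a separate question, as I said above.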

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


