[BioC] using a gtf file to map reads

Alejandro Reyes alejandro.reyes at embl.de
Mon Jul 1 10:20:41 CEST 2013


Thanks Valerie!

I will make sure to change all the documentation to use "disjointExons" 
instead.

Best regards,
Alejandro


> Mike and Alejandro,
>
> disjointExons() is now in GenomicFeatures 1.13.18. It's documented on 
> the same man page as ?transcripts, ?exons, ?cds ,etc.
>
> The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a 
> few performance modifications. Primary changes were a helper function 
> with a more vectorized approach to create 'geneNames', 'transcripts' 
> and reducing the number of times metadata columns were added to the 
> GRanges.
>
> > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
>    user  system elapsed
>   8.205   0.028   8.397
> > system.time(disjointExons(txdb, TRUE, TRUE))
>    user  system elapsed
>   5.596   0.032   5.764
>
> > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
>    user  system elapsed
>  37.286   0.228  37.954
> > system.time(disjointExons(txdb, TRUE, TRUE))
>    user  system elapsed
>  28.130   0.228  28.822
>
> The multiple requests on the mailing list for a function of this type 
> made me think it should be included in one of the infrastructure 
> packages. The goal of adding this to GenomicFeatures was to increase 
> visibility and make it more readily available to packages that don't 
> depend on DEXSeq. I hope having the function in two places doesn't add 
> a layer of confusion. I have a suite of unit tests to ensure 
> disjointExons and prepareAnnotationForDEXSeq are in sync.
>
> Let me know if you have any questions or would like any changes made 
> to the man page.
>
> Thanks,
> Valerie
>
>
>
> On 06/11/13 12:30, Valerie Obenchain wrote:
>> Great. I'll go ahead and put this in GenomicFeatures with credit to you
>> and Alejandro.
>>
>> Thanks.
>> Valerie
>>
>> On 06/11/2013 09:39 AM, Michael Love wrote:
>>> hi Valerie,
>>>
>>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain
>>> <vobencha at fhcrc.org> wrote:
>>>> Hi Mike (Love),
>>>>
>>>> Would you be interested in contributing a disjointExons() to
>>>> GenomicFeatures? I think this extraction would be useful to many.
>>>> Maybe we
>>>> also want a disjointExonsBy() but the only 'by' would be genes ...?
>>>>
>>>> Valerie
>>>>
>>>
>>> I'd be happy to contribute any of this code in parathyroidSE.
>>> Alejandro Reyes has formalized this part of the code into the
>>> following function in DEXSeq >= 1.6.0:
>>>
>>>> prepareAnnotationForDEXSeq
>>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts =
>>> TRUE)
>>> {
>>>      stopifnot(is(transcriptDb, "TranscriptDb"))
>>>      exonsByGene <- exonsBy(transcriptDb, by = "gene")
>>>      exonicParts <- disjoin(unlist(exonsByGene))
>>>      if (!aggregateGenes) {
>>>          overlaps <- findOverlaps(exonicParts, exonsByGene)
>>>          geneNames <- names(exonsByGene)[subjectHits(overlaps)]
>>> .....
>>>
>>> best,
>>>
>>> Mike
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list