[BioC] using a gtf file to map reads

Michael Love michaelisaiahlove at gmail.com
Mon Jul 1 12:16:07 CEST 2013


hi Valerie,

That's great, thanks. After the next release, I will change the
parathyroidSE vignette to use disjointExons() and the new arguments
for summarizeOverlaps():

1. use inter.feature=FALSE for the exon counting rather than the
custom counting mode
2. use fragments=TRUE rather than counting pairs and singletons separately

best,

Mike

On Mon, Jul 1, 2013 at 10:20 AM, Alejandro Reyes
<alejandro.reyes at embl.de> wrote:
> Thanks Valerie!
>
> I will make sure to change all the documentation to use "disjointExons"
> instead.
>
> Best regards,
> Alejandro
>
>
>
>> Mike and Alejandro,
>>
>> disjointExons() is now in GenomicFeatures 1.13.18. It's documented on the
>> same man page as ?transcripts, ?exons, ?cds ,etc.
>>
>> The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a few
>> performance modifications. Primary changes were a helper function with a
>> more vectorized approach to create 'geneNames', 'transcripts' and reducing
>> the number of times metadata columns were added to the GRanges.
>>
>> > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
>> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
>>    user  system elapsed
>>   8.205   0.028   8.397
>> > system.time(disjointExons(txdb, TRUE, TRUE))
>>    user  system elapsed
>>   5.596   0.032   5.764
>>
>> > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
>>    user  system elapsed
>>  37.286   0.228  37.954
>> > system.time(disjointExons(txdb, TRUE, TRUE))
>>    user  system elapsed
>>  28.130   0.228  28.822
>>
>> The multiple requests on the mailing list for a function of this type made
>> me think it should be included in one of the infrastructure packages. The
>> goal of adding this to GenomicFeatures was to increase visibility and make
>> it more readily available to packages that don't depend on DEXSeq. I hope
>> having the function in two places doesn't add a layer of confusion. I have a
>> suite of unit tests to ensure disjointExons and prepareAnnotationForDEXSeq
>> are in sync.
>>
>> Let me know if you have any questions or would like any changes made to
>> the man page.
>>
>> Thanks,
>> Valerie
>>
>>
>>
>> On 06/11/13 12:30, Valerie Obenchain wrote:
>>>
>>> Great. I'll go ahead and put this in GenomicFeatures with credit to you
>>> and Alejandro.
>>>
>>> Thanks.
>>> Valerie
>>>
>>> On 06/11/2013 09:39 AM, Michael Love wrote:
>>>>
>>>> hi Valerie,
>>>>
>>>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain
>>>> <vobencha at fhcrc.org> wrote:
>>>>>
>>>>> Hi Mike (Love),
>>>>>
>>>>> Would you be interested in contributing a disjointExons() to
>>>>> GenomicFeatures? I think this extraction would be useful to many.
>>>>> Maybe we
>>>>> also want a disjointExonsBy() but the only 'by' would be genes ...?
>>>>>
>>>>> Valerie
>>>>>
>>>>
>>>> I'd be happy to contribute any of this code in parathyroidSE.
>>>> Alejandro Reyes has formalized this part of the code into the
>>>> following function in DEXSeq >= 1.6.0:
>>>>
>>>>> prepareAnnotationForDEXSeq
>>>>
>>>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts =
>>>> TRUE)
>>>> {
>>>>      stopifnot(is(transcriptDb, "TranscriptDb"))
>>>>      exonsByGene <- exonsBy(transcriptDb, by = "gene")
>>>>      exonicParts <- disjoin(unlist(exonsByGene))
>>>>      if (!aggregateGenes) {
>>>>          overlaps <- findOverlaps(exonicParts, exonsByGene)
>>>>          geneNames <- names(exonsByGene)[subjectHits(overlaps)]
>>>> .....
>>>>
>>>> best,
>>>>
>>>> Mike
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>



More information about the Bioconductor mailing list