[BioC] Deleting object rows while looping - II

Hervé Pagès hpages at fhcrc.org
Wed May 1 19:23:30 CEST 2013


Hi Steve,

On 05/01/2013 09:50 AM, Steve Lianoglou wrote:
> Hi folks,
>
> Wow -- that's a lot of great suggestions coming straight from the
> bioc-wizards themselves.
>
> I'm still not going to jump into the details on the logic of what
> Daniel *really* wants, just want to make a comment on Martin's last
> point:
>
>> This type of operation is well-suited to data.table, though I'm not sure
>> enough of the syntax and implementation to know whether Steve's
>>
>>
>> dat <- data.table(chr=chr, pos=posi, seqs=seqs, key=c('chr', 'pos'))
>> result <- dat[, {
>>    list(n.reads=.N, n.unique=length(unique(seqs)))
>> }, by=c('chr', 'pos')]
>>
>> is implemented efficiently -- I'm sure the .N is; just not whether clever
>> thinking is used behind the scenes to avoid looping through function(x)
>> length(unique(x)). The syntax is certainly clearer than my 'view' approach.
>
> I only really used this as a pedagogical example to show that one
> could access subsets of the columns directly by name within the `j`
> expression of the `[.data.table` function.
>
> The .N is essentially a no-op to call as it is already computed for
> you, but repeatedly calling a function within each grouped subset will
> incur the overhead of a function call within each subgroup.
>
> Still, I think the OP would notice a significant boost in performance
> by simply naively translating his code using data.table -- if you
> really wanted to eke out the last bit of performance (which isn't
> really necessary if you're just doing things once, but if you're
> building a pipeline, feel free) that'd be another convo ...
>
> Anyway, it looks like there's a lot of good stuff in this thread
> already. I'd be curious to hear back from Daniel when he tries a few
> of these things. Also, wasn't aware of the new(?) `SplitDataFrameList`
> mojo -- very nice stuff.

This has been in IRanges since the beginning of the package (2008).
IIRC when Michael added DataFrame/SplitDataFrameList to the package,
the primary use case was to store the values of a RangedData object
(the "values" slot of RangedData is SplitDataFrameList).
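
For anyone who hasn't met it, here is a minimal sketch of what a
SplitDataFrameList looks like in practice. It assumes a reasonably
current IRanges/S4Vectors installation; the toy data and the variable
names (df, sdf, n_reads, chr1) are made up for illustration and are not
taken from Daniel's data. Note that elementNROWS() is the current name
of the accessor; older releases called it elementLengths().

```r
# Assumption: the Bioconductor package IRanges is installed
# (loading it also attaches S4Vectors, which provides DataFrame).
suppressPackageStartupMessages(library(IRanges))

# Toy data (hypothetical): three reads at two genomic positions.
df <- DataFrame(chr  = c("chr1", "chr1", "chr2"),
                pos  = c(10L, 10L, 5L),
                seqs = c("ACGT", "ACGA", "TTGA"))

# Splitting a DataFrame by a grouping vector gives a SplitDataFrameList:
# a list-like object holding one DataFrame per group.
sdf <- split(df, df$chr)

is(sdf, "SplitDataFrameList")    # check the class relationship
names(sdf)                       # one element per chromosome

# Number of rows (here: reads) in each group, computed without an
# explicit loop over the groups.
n_reads <- elementNROWS(sdf)

# Each element is an ordinary DataFrame.
chr1 <- sdf[["chr1"]]
```

The point of the container is that per-group questions (how many rows,
which columns) can be asked of the whole object at once instead of
looping over the groups by hand.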

If *you* were not aware of SplitDataFrameList, then that means that
*we* are failing to properly communicate/document/expose/advertise
the IRanges/GenomicRanges infrastructure :-/

H.

>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Department of Bioinformatics and Computational Biology
> Genentech
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


