[R] Programming R to avoid loops

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sat Apr 18 21:30:18 CEST 2015


Oh, great. An app [1] that introduces "me too" emails with a click and uses HTML to tell us all about it. Jim, this is probably not a good place to use that function. Read the posting guide about mailing list netiquette.

[1] http://readwrite.com/2013/06/05/new-boxer-ios-email-app-is-all-about-adding-features
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On April 18, 2015 10:55:14 AM PDT, Jim Mankin <sammankin at gmail.com> wrote:
>Jim Mankin liked your message with Boxer.
>
>On April 18, 2015 at 10:48:17 AM MST, Charles C. Berry <ccberry at ucsd.edu> wrote:
>
>On Sat, 18 Apr 2015, Brant Inman wrote:
>
>> I have two large data frames with the following structure:
>>
>>> df1
>>   id       date test1.result
>> 1  a 2009-08-28            1
>> 2  a 2009-09-16            1
>> 3  b 2008-08-06            0
>> 4  c 2012-02-02            1
>> 5  c 2010-08-03            1
>> 6  c 2012-08-02            0
>>
>>> df2
>>   id       date test2.result
>> 1  a 2011-02-03            1
>> 2  b 2011-09-27            0
>> 3  b 2011-09-01            1
>> 4  c 2009-07-16            0
>> 5  c 2009-04-15            0
>> 6  c 2010-08-10            1
>>
>> I need to match items in df2 to those in df1 with specific matching
>> criteria. I have written a looped matching algorithm that works, but it
>> is very slow with my large datasets. I am requesting help on making a
>> version of this code that is faster and "vectorized", so to speak.
>
>As I see in your posted code, you match id's exactly, dates according to
>a range, and count the number of positive test results in the second
>data.frame.
>
>For this, the countOverlaps() function of the GenomicRanges package will
>do the trick with suitably defined GRanges objects. Something like:
>
>require(GenomicRanges)
>date1 date2 lagdays predays gr1 gr2
>IRanges(start=date2+predays,end=date2+lagdays), strand="*")[df2$test2.result==1,]
>df1$test2.count
>
>For the example data.frames (as rendered by Jim Lemon's code), this yields
>
>> df1
>  id       date test1.result test2.count
>1  a 2009-08-28            1           0
>2  a 2009-09-16            1           0
>3  b 2008-08-06            0           0
>4  c 2012-02-02            1           0
>5  c 2010-08-03            1           1
>6  c 2012-08-02            0           0
>
>The GenomicRanges package is at
>http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html
>where you will find installation instructions and links to vignettes.
>
>HTH,
>
>Chuck
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>	[[alternative HTML version deleted]]
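For readers without Bioconductor, here is a minimal base-R sketch of the same idea Chuck describes (exact id match, date window, count of positive df2 results), using merge() plus a vectorised comparison instead of countOverlaps(). This is a swapped-in technique, not the code from the thread. Assumptions: df1/df2 are rebuilt from the printout above; the names predays and lagdays are borrowed from Chuck's surviving fragment, but their values (-30 and 30) are illustrative guesses; count_positive_matches() is a hypothetical helper; window bounds are inclusive, mirroring closed IRanges intervals.

```r
# Example data rebuilt from the printout in the thread
df1 <- data.frame(
  id = c("a", "a", "b", "c", "c", "c"),
  date = as.Date(c("2009-08-28", "2009-09-16", "2008-08-06",
                   "2012-02-02", "2010-08-03", "2012-08-02")),
  test1.result = c(1, 1, 0, 1, 1, 0)
)
df2 <- data.frame(
  id = c("a", "b", "b", "c", "c", "c"),
  date = as.Date(c("2011-02-03", "2011-09-27", "2011-09-01",
                   "2009-07-16", "2009-04-15", "2010-08-10")),
  test2.result = c(1, 0, 1, 0, 0, 1)
)

# ASSUMED window: a df1 date matches a df2 date when it lies in
# [date2 + predays, date2 + lagdays]. Values are illustrative only.
predays <- -30
lagdays <- 30

# Hypothetical helper: count positive df2 results per df1 row
count_positive_matches <- function(df1, df2, predays, lagdays) {
  # Keep only positive second tests (like the [df2$test2.result==1,] subset)
  pos <- df2[df2$test2.result == 1, c("id", "date")]
  names(pos)[2] <- "date2"
  # Tag each df1 row so counts can be mapped back to it
  left <- data.frame(row = seq_len(nrow(df1)), id = df1$id, date1 = df1$date)
  # Exact match on id: one row per (df1 row, positive df2 row) pair
  pairs <- merge(left, pos, by = "id")
  # Vectorised window test on all pairs at once (difference in days)
  delta <- as.numeric(pairs$date1 - pairs$date2)
  hit <- delta >= predays & delta <= lagdays
  # Count hits per df1 row; rows with no hits get 0
  tabulate(pairs$row[hit], nbins = nrow(df1))
}

df1$test2.count <- count_positive_matches(df1, df2, predays, lagdays)
print(df1)
```

With these assumed window values the sketch gives the same test2.count column as Chuck's output (a single hit, on row 5). One design caveat: merge() materialises every same-id pair, so memory grows with the product of per-id record counts; an interval-overlap approach such as countOverlaps() is designed to avoid building all pairs, which may matter for very large data frames.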


