[R] generating multiple sequences in subsets of data

Jason Baucom jason.baucom at ateb.com
Fri Sep 11 22:36:46 CEST 2009


My apologies for bringing up an old topic, but I'm still having some problems!

I got this code to work, and it was running perfectly fine. I tried it with a larger data set and it crashed my machine, slowly chewing up memory until it could not allocate any more for the process. The following line killed me:

merged_cut_col$pickseq<-with(merged_cut_col,ave(as.numeric(as.Date(pickts)),cpid,FUN=seq))

So, I thought I'd try it another way, using transformBy from the doBy package:

merged_cut_col<-transformBy(~cpid,data=merged_cut_col,pickseqREDO=seq(cpid))

This too ran for hours until eventually running out of memory. I've tried it on a beefier machine and I run into the same problem.

Is there an alternative to these methods that would be less memory/time intensive? This is a fairly simple routine I'm trying, just generating sequence numbers based on simple criteria. I'm surprised it's bringing my computer to its knees. I'm running about 1M rows now, but doing other operations such as merges or adding new columns/rows seems fine.
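
One possible contributor, offered here as an assumption rather than something confirmed in this thread: seq() applied to a single number n returns 1:n, not 1. If some cpid groups contain only one row, FUN=seq (or seq(cpid) inside transformBy) could expand each such group into a vector as long as the date number or the ID value. The sketch below uses made-up data with the same column names to show the difference, then two base R alternatives: ave() with seq_along, and sequence(rle(...)$lengths), which assumes the rows are already sorted so that each cpid is contiguous.

```r
# Toy data: 'cpid' and 'pickts' mirror the column names above; values are invented.
d <- data.frame(
  cpid   = c(10, 10, 10, 20, 30, 30),
  pickts = as.Date(c("2009-01-01", "2009-01-05", "2009-02-01",
                     "2009-01-03", "2009-03-01", "2009-03-02"))
)

# seq() on a single number n expands to 1:n; seq_along() counts elements.
length(seq(14000))        # 14000
length(seq_along(14000))  # 1

# Alternative 1: ave() with seq_along, one counter value per row in each group.
d$pickseq <- ave(seq_len(nrow(d)), d$cpid, FUN = seq_along)

# Alternative 2: run lengths of the (already sorted) grouping column,
# expanded into 1..n for each run. Assumes rows of a cpid are contiguous.
d$pickseq2 <- sequence(rle(d$cpid)$lengths)

d
```

If the real data are not sorted by cpid, the second form would need an order() step first; the first form does not depend on row order.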

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, August 27, 2009 12:48 PM
To: Jason Baucom
Cc: Henrique Dallazuanna; r-help at r-project.org; Steven Few
Subject: Re: [R] generating multiple sequences in subsets of data


On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:

> I got this to work. Thanks for the insight! row7 is what I need.
>
>
>
>> checkLimit <-function(x) x<3
>
>> stuff$row6<-checkLimit(stuff$row1)

You don't actually need those intermediate steps:

 > stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
 > stuff
    row1 row2 row7
1     0    1    1
2     1    1    2
3     2    1    3
4     3    1    1
5     4    1    2
6     5    1    3
7     1    2    1
8     2    2    2
9     3    2    1
10    4    2    2

The expression row1 < 3 gets turned into a logical vector that ave()  
is perfectly happy with.
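
As a self-contained check of that point, the sketch below rebuilds the sample data from this thread and groups by row2 together with the logical vector row1 < 3. It uses seq_along rather than seq, which is a substitution made here (not the call shown above) so that a group containing a single row would still get 1.

```r
# Sample data as posted in the thread.
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# The comparison yields a logical vector, usable as a grouping variable.
below_cutoff <- stuff$row1 < 3
is.logical(below_cutoff)

# Restart the counter for each combination of row2 and (row1 < 3).
stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq_along))
stuff$row7
```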

-- 
David Winsemius

>
>> stuff$row7 <- with(stuff, ave(row1,row2, row6, FUN = sequence))
>
>> stuff
>    row1 row2 row3 row4 row5  row6 row7
> 1     0    1    1    1    1  TRUE    1
> 2     1    1    2    2    2  TRUE    2
> 3     2    1    3    3    3  TRUE    3
> 4     3    1    4    1    4 FALSE    1
> 5     4    1    5    1    5 FALSE    2
> 6     5    1    6    1    6 FALSE    3
> 7     1    2    1    1    1  TRUE    1
> 8     2    2    2    2    2  TRUE    2
> 9     3    2    3    1    3 FALSE    1
> 10    4    2    4    1    4 FALSE    2
>
>
>
> Jason
>
>
>
> ________________________________
>
> From: Henrique Dallazuanna [mailto:wwwhsd at gmail.com]
> Sent: Thursday, August 27, 2009 11:02 AM
> To: Jason Baucom
> Cc: r-help at r-project.org; Steven Few
> Subject: Re: [R] generating multiple sequences in subsets of data
>
>
>
> Try this:
>
> stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))
>
> I don't understand the fourth column
>
> On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom  
> <jason.baucom at ateb.com> wrote:
>
> I'm running into a problem I can't seem to find a solution for. I'm
> attempting to add sequences into an existing data set based on subsets
> of the data.  I've done this using a for loop with a small subset of
> data, but attempting the same process using real data (200k rows) is
> taking way too long.
>
>
>
> Here is some sample data and my ultimate goal
>
>> row1<-c(0,1,2,3,4,5,1,2,3,4)
>> row2<-c(1,1,1,1,1,1,2,2,2,2)
>> stuff<-data.frame(row1=row1,row2=row2)
>> stuff
>    row1 row2
> 1     0    1
> 2     1    1
> 3     2    1
> 4     3    1
> 5     4    1
> 6     5    1
> 7     1    2
> 8     2    2
> 9     3    2
> 10    4    2
>
>
>
>
>
> I need to derive 2 columns. I need a sequence for each unique row2, and
> then I need a sequence that restarts based on a cutoff value for row1
> and unique row2. The following table is what it -should- look like using
> a cutoff of 3 for row4:
>
>
>
>    row1 row2 row3 row4
> 1     0    1    1    1
> 2     1    1    2    2
> 3     2    1    3    3
> 4     3    1    4    1
> 5     4    1    5    2
> 6     5    1    6    3
> 7     1    2    1    1
> 8     2    2    2    2
> 9     3    2    3    1
> 10    4    2    4    2
>
>
>
> I need something like row3<-sequence(nrow(unique(stuff$row2))) that
> actually works :-) Here is the for loop that functions properly for
> row3:
>
>
>
> stuff$row3<-c(1)
> for (i in 2:nrow(stuff)) {
>   if (stuff$row2[i] == stuff$row2[i-1]) {
>     stuff$row3[i] = stuff$row3[i-1]+1
>   }
> }
>
> Thanks!
>
>
>
> Jason Baucom
>
> Ateb, Inc.
>
> 919.882.4992 O
>
> 919.872.1645 F
>
> www.ateb.com <http://www.ateb.com/>
>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> -- 
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



