[R] generating multiple sequences in subsets of data

David Winsemius dwinsemius at comcast.net
Sat Sep 12 06:27:54 CEST 2009


Have you tried running merged_cut_col$pickts through something that is  
less complex? Perhaps:

table(merged_cut_col$pickts)

... to see if there are problems with the "inner" functions? Also I
think the as.numeric might be superfluous, since Dates are really just
integers with some attitude, er, attributes.
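
One possible contributor, offered as a guess and not as a diagnosis: seq() applied to a single number n returns 1:n, so any cpid group with exactly one row would expand into a vector as long as the day count of its date. seq_along() does not behave that way. The sketch below uses a made-up data frame named toy as a stand-in for merged_cut_col; only the column names cpid and pickts come from the thread, and the rows are assumed to be already in the desired order within each cpid.

```r
# Hypothetical stand-in for merged_cut_col (values are invented).
toy <- data.frame(
  cpid   = c(101, 101, 101, 202, 303, 303),
  pickts = as.Date(c("2009-01-05", "2009-01-09", "2009-02-01",
                     "2009-03-15", "2009-04-02", "2009-04-20"))
)

# Check the group sizes first; cpid 202 has a single row here.
group_sizes <- table(toy$cpid)

# seq() on a length-one numeric n yields 1:n, so one date becomes
# a vector with thousands of elements instead of a single 1.
singleton_len <- length(seq(as.numeric(as.Date("2009-03-15"))))

# seq_along() always returns 1..length(x), whatever the values are,
# and no date conversion is needed to number the rows per group.
toy$pickseq <- ave(seq_along(toy$cpid), toy$cpid, FUN = seq_along)
```

If the real data contain many single-row cpid groups, comparing singleton_len with the expected value of 1 shows how much extra memory each such group could request under FUN = seq.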

-- 
David.

On Sep 11, 2009, at 4:36 PM, Jason Baucom wrote:

> My apologies for bringing up an old topic, but I'm still having some
> problems!
>
> I got this code to work, and it was running perfectly fine. I tried  
> it with a larger data set and it crashed my machine, slowly chewing  
> up memory until it could not allocate any more for the process. The  
> following line killed me:
>
> merged_cut_col$pickseq <- with(merged_cut_col, ave(as.numeric(as.Date(pickts)), cpid, FUN = seq))
>
> So, I thought I'd try it another way, using the transformBy in the  
> doBy package:
>
> merged_cut_col <- transformBy(~cpid, data = merged_cut_col, pickseqREDO = seq(cpid))
>
> This too ran for hours until eventually running out of memory. I've
> tried it on a beefier machine and I run into the same problem.
>
> Is there an alternative to these methods that would be less
> memory/time intensive? This is a fairly simple routine I'm trying, just
> generating sequence numbers based on simple criteria. I'm surprised
> it's bringing my computer to its knees. I'm running about 1M rows
> now, but doing other operations such as merges or adding new
> columns/rows seems fine.
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Thursday, August 27, 2009 12:48 PM
> To: Jason Baucom
> Cc: Henrique Dallazuanna; r-help at r-project.org; Steven Few
> Subject: Re: [R] generating multiple sequences in subsets of data
>
>
> On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:
>
>> I got this to work. Thanks for the insight! row7 is what I need.
>>
>>
>>
>>> checkLimit <-function(x) x<3
>>
>>> stuff$row6<-checkLimit(stuff$row1)
>
> You don't actually need those intermediate steps:
>
>> stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
>> stuff
>    row1 row2 row7
> 1     0    1    1
> 2     1    1    2
> 3     2    1    3
> 4     3    1    1
> 5     4    1    2
> 6     5    1    3
> 7     1    2    1
> 8     2    2    2
> 9     3    2    1
> 10    4    2    2
>
> The expression row1 < 3 gets turned into a logical vector that ave()
> is perfectly happy with.
>
> -- 
> David Winsemius
>
>>
>>> stuff$row7 <- with(stuff, ave(row1,row2, row6, FUN = sequence))
>>
>>> stuff
>>
>>    row1 row2 row3 row4 row5  row6 row7
>> 1     0    1    1    1    1  TRUE    1
>> 2     1    1    2    2    2  TRUE    2
>> 3     2    1    3    3    3  TRUE    3
>> 4     3    1    4    1    4 FALSE    1
>> 5     4    1    5    1    5 FALSE    2
>> 6     5    1    6    1    6 FALSE    3
>> 7     1    2    1    1    1  TRUE    1
>> 8     2    2    2    2    2  TRUE    2
>> 9     3    2    3    1    3 FALSE    1
>> 10    4    2    4    1    4 FALSE    2
>>
>>
>>
>> Jason
>>
>>
>>
>> ________________________________
>>
>> From: Henrique Dallazuanna [mailto:wwwhsd at gmail.com]
>> Sent: Thursday, August 27, 2009 11:02 AM
>> To: Jason Baucom
>> Cc: r-help at r-project.org; Steven Few
>> Subject: Re: [R] generating multiple sequences in subsets of data
>>
>>
>>
>> Try this;
>>
>> stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))
>>
>> I don't understand the fourth column
>>
>> On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom
>> <jason.baucom at ateb.com> wrote:
>>
>> I'm running into a problem I can't seem to find a solution for. I'm
>> attempting to add sequences into an existing data set based on subsets
>> of the data. I've done this using a for loop with a small subset of
>> data, but attempting the same process using real data (200k rows) is
>> taking way too long.
>>
>>
>>
>> Here is some sample data and my ultimate goal
>>
>>> row1<-c(0,1,2,3,4,5,1,2,3,4)
>>
>>> row2<-c(1,1,1,1,1,1,2,2,2,2)
>>
>>> stuff<-data.frame(row1=row1,row2=row2)
>>
>>> stuff
>>
>>    row1 row2
>> 1     0    1
>> 2     1    1
>> 3     2    1
>> 4     3    1
>> 5     4    1
>> 6     5    1
>> 7     1    2
>> 8     2    2
>> 9     3    2
>> 10    4    2
>>
>>
>>
>>
>>
>> I need to derive 2 columns. I need a sequence for each unique row2,
>> and then I need a sequence that restarts based on a cutoff value for
>> row1 and unique row2. The following table is what it -should- look
>> like using a cutoff of 3 for row4
>>
>>
>>
>>    row1 row2 row3 row4
>> 1     0    1    1    1
>> 2     1    1    2    2
>> 3     2    1    3    3
>> 4     3    1    4    1
>> 5     4    1    5    2
>> 6     5    1    6    3
>> 7     1    2    1    1
>> 8     2    2    2    2
>> 9     3    2    3    1
>> 10    4    2    4    2
>>
>>
>>
>> I need something like row3<-sequence(nrow(unique(stuff$row2))) that
>> actually works :-) Here is the for loop that functions properly for
>> row3:
>>
>>
>>
>> stuff$row3<-c(1)
>>
>> for (i in 2:nrow(stuff)) { if ( stuff$row2[i] == stuff$row2[i-1]) {
>> stuff$row3[i] = stuff$row3[i-1]+1}}
>>
>> Thanks!
>>
>>
>>
>> Jason Baucom
>>
>> Ateb, Inc.
>>
>> 919.882.4992 O
>>
>> 919.872.1645 F
>>
>> www.ateb.com <http://www.ateb.com/>
>>
>>
>>
>>
>>      [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>> -- 
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



