[R] Manage an unknown and variable number of data frames

David Winsemius dwinsemius at comcast.net
Sun Sep 13 06:35:14 CEST 2009


On Sep 12, 2009, at 10:13 PM, Mark Knecht wrote:

> Hi,
>   In the code below I create a small data.frame (dat) and then cut it
> into different groups using CutList. The lists in CutList allow me to
> choose whatever columns I want from dat and allow me to cut it into
> any number of groups by changing the lists. It seems to work OK but
> when I'm done I have a variable number of data frames that I need to
> do further operations on and I don't know how to manage them as a
> collection.

List processing.
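
As a minimal sketch of what list processing means here: keep the data frames as elements of one list and apply a function to every element with lapply. The data and the helper add_running_total below are hypothetical, made up only for illustration.

```r
# Hypothetical example data: three small data frames held in one list
frames <- list(
  first  = data.frame(a = c(1, 2, 3)),
  second = data.frame(a = c(10, 20)),
  third  = data.frame(a = c(5, 5, 5, 5))
)

# Hypothetical helper: add a running total of column a as a new column
add_running_total <- function(df) {
  df$a_cumsum <- cumsum(df$a)
  df
}

# Apply the helper to every data frame; the result is again a named list
frames <- lapply(frames, add_running_total)

length(frames)       # number of data frames, whatever it happens to be
names(frames)        # handles for reaching each one
frames[["second"]]   # a single element, with the added column
```

The code never needs to know in advance how many data frames there are; length() and names() report that after the fact.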

>
>   How do experienced R coders handle keeping all this straight so that
> if I add another column from dat and more groups in the cuts it all
> stays straight? I need to send each data frame to another function to
> add columns of specific data calculations to each of them.
>
>   Best for me (I think) would be to enumerate each data frame using
> the row.name number from CutTable if possible, but that's just my
> thought. If each data frame became an element of CutTable then I'd
> always know where they are. Really I'm needing to get a handle on
> keeping a variable and unknown number of these things straight.
>
> Thanks,
> Mark
>
> dat = data.frame(
> 	a=round(runif(100,-20,30),2),
> 	b=round(runif(100,-40,50),2)
> 	)
>
> # Give each cut list a name matching the column in dat that you
> # want to use as criteria for making the cut.
> # Create any number of cuts in each row.
>
> CutList = list(
> 	a=c(-Inf,-10,10,Inf),
> 	b=c(-Inf,0,20,Inf)
> 	)
>
> CutResults = mapply(cut,x=dat[,names(CutList)],CutList,SIMPLIFY=FALSE)
> CutTable = as.data.frame(table(CutResults))
>
> CutResultsDF = as.data.frame(CutResults)
> head(CutResultsDF, n=15)
>
> dat$aRange = CutResultsDF$a
> dat$bRange = CutResultsDF$b
> head(dat, 15)

You could have gotten the same grouping of rows into categories (as
integer codes rather than interval labels) with a combination of ave
and cut.

 > dat$arng2 <- ave(dat$a, FUN=function(x) cut(x, breaks=CutList$a) )
 > dat
          a      b     aRange    bRange arng2
1   -10.45  43.30 (-Inf,-10] (20, Inf]     1
2     9.09 -33.66   (-10,10]  (-Inf,0]     2
3    29.27  18.34  (10, Inf]    (0,20]     3
4    28.92  46.55  (10, Inf] (20, Inf]     3
5     2.07  -8.23   (-10,10]  (-Inf,0]     2
6    18.28 -35.13  (10, Inf]  (-Inf,0]     3
7   -16.26  40.59 (-Inf,-10] (20, Inf]     1
snip
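
For comparison, a small sketch (with an arbitrary seed and regenerated data, so the values will not match the listing above) suggesting that the ave wrapper is not strictly needed: cut on its own returns a factor with the interval labels, and its integer codes should agree with what ave produced.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(a = round(runif(100, -20, 30), 2))
CutList <- list(a = c(-Inf, -10, 10, Inf))

# As in the message: ave() with no grouping variable applies FUN to the whole
# vector and stores the factor's integer codes in a numeric vector
via_ave <- ave(dat$a, FUN = function(x) cut(x, breaks = CutList$a))

# cut() alone keeps the interval labels as factor levels
via_cut <- cut(dat$a, breaks = CutList$a)

levels(via_cut)
all.equal(via_ave, as.numeric(via_cut))  # compare the codes
```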


>
>
> # I don't want to do the following as it doesn't
> # get managed automatically.
>

It is unclear what you are hoping to accomplish with that
subset(subset(.)) construction. Are you after what a logical
conjunction in the subset= argument, coupled with a select= argument,
would do inside a single subset() call?

 > subset(dat, aRange==CutTable$a[1] & bRange==CutTable$b[1], select=c("a","b") )
         a      b
26 -17.50 -18.46
28 -15.48 -34.37
31 -10.04 -21.55
38 -11.73 -29.40
46 -18.28 -17.42
95 -11.62 -22.94
96 -12.16  -1.57
97 -15.44 -19.89

> Subset1 = subset(subset(dat, ,
> Subset2 = subset(subset(dat, aRange==CutTable$a[2]), bRange==CutTable$b[2])[1:2]
> Subset3 = subset(subset(dat, aRange==CutTable$a[3]), bRange==CutTable$b[3])[1:2]
> Subset4 = subset(subset(dat, aRange==CutTable$a[4]), bRange==CutTable$b[4])[1:2]

You could "automate" that with:
 > work.list <- lapply(1:4, function(x) subset(dat, aRange==CutTable$a[x] & bRange==CutTable$b[x], select=c("a","b") ) )
 > work.list[[1]]  # first element of a 4 element list
         a      b
26 -17.50 -18.46
28 -15.48 -34.37
31 -10.04 -21.55
38 -11.73 -29.40
46 -18.28 -17.42
95 -11.62 -22.94
96 -12.16  -1.57
97 -15.44 -19.89
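
Since the number of groups is not known ahead of time, one possible generalization (a sketch, not something tested against your real data) is to loop over every row of CutTable instead of the fixed 1:4, and to name the list elements by the CutTable row names so each data frame can be traced back to its row. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(a = round(runif(100, -20, 30), 2),
                  b = round(runif(100, -40, 50), 2))
CutList <- list(a = c(-Inf, -10, 10, Inf),
                b = c(-Inf, 0, 20, Inf))

CutResults <- mapply(cut, x = dat[, names(CutList)], CutList, SIMPLIFY = FALSE)
CutTable <- as.data.frame(table(CutResults))
dat$aRange <- CutResults$a
dat$bRange <- CutResults$b

# One list element per row of CutTable, however many rows there are
work.list <- lapply(seq_len(nrow(CutTable)), function(i) {
  subset(dat, aRange == CutTable$a[i] & bRange == CutTable$b[i],
         select = c("a", "b"))
})
names(work.list) <- rownames(CutTable)

length(work.list)        # equals nrow(CutTable)
sapply(work.list, nrow)  # should agree with CutTable$Freq
work.list[["1"]]         # the data frame for row 1 of CutTable
```

Adding a break point to CutList changes nrow(CutTable), and the list grows to match. Adding a third column would still need a matching condition inside the subset call.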


> Subset1
> Subset2
> Subset3
> Subset4
>
> CutTable
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
