[R] how to "apply" functions to unbalanced data in long format by factors......cant get "by" or "aggregate" to work

Petr Pikal petr.pikal at precheza.cz
Thu Mar 8 08:00:53 CET 2007


Hi

you can use aggregate to create table of medians

with(mydata, aggregate(Y, list(time, tratment, expRep,....), median)

repeats of unique factors
either by rle or aggregate with length function

Then you can do replication by

norep <- rep(your.median, each = your replicates)

Regards
Petr

On 7 Mar 2007 at 17:25, ALAN SMITH wrote:

Date sent:      	Wed, 7 Mar 2007 17:25:54 -0600
From:           	"ALAN SMITH" <alansmith2 at gmail.com>
To:             	r-help at stat.math.ethz.ch
Subject:        	[R] how to "apply" functions to unbalanced data in long format by
	factors......cant get "by" or "aggregate" to work

> Hello R users,
> 
> Problem.......I do not understand how to use "aggregate","by", or the
> appropriate "apply" to perform a function on data with more than one
> factor on unbalanced data... I have a data frame in the long format
> that does not contain balanced data. The ID is a unique identifier
> corresponding to the experimental unit that will later be examined by
> ANOVA, T-tests etc. Y is the data generated from the experiment.  The
> factors represent the differences between each sample or "run"
> measured.
> 
> str(mydata)  ### sample of table at bottom of email ###
> 'data.frame':   129982 obs. of  6 variables:
>  $ ID    : num  7 7 7 7 7 7 8 8 8 8 ...
>  $ time     : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1
>  ... $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2
>  1 1 ... $ expREP   : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1
>  3 1 1 1 1 2 2 ... $ techREP  : Factor w/ 3 levels
>  "techREP1","techREP2",..: 3 2 1 1 1 3
> 1 3 3 2 ...
>  $ Y             : num  14.4 14.1 14.2 13.8 14.1 ...
> 
> Could someone please help with doing something like the following 1. I
> would like to find the median for each unique combination of factors
> using the data in the   long format (like finding the median of a
> single column of data). 2. Create a new column where the median is
> repeated for the number of rows of the unique factor combination 3. I
> would like to learn the most efficient way to do this because I want
> to avoid recreating the table from scratch with many commands like the
> series below. I will have to perform this operation on many different
> data sets some, with many more factors then this example.
> 
> ### help me learn to use an apply or other command that will do the
> following #####
> m0<-mydata$cpdID[mydata$time=="24hr" & mydata$treatment=="control" &
> mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
> m1<-mydata$Y[mydata$time=="24hr" & mydata$treatment=="control" &
> mydata$expREP=="expREP1" & mydata$techREP=="techREP1"] m2<-median(m1)
> m3<-cbind(ID=m0,time=rep("24hr",length(m1)),
> treatment=rep("control",length(m1)), expREP=rep("expREP1",length(m1)),
> techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1)))
> ######### I would like to avoid writing the above hundreds of times
> ######
> 
> I am able to reshape into wide format and then find the column
> medians. However restacking the data and regenerating the factors
> becomes very very messy on data sets with 150 columns.  I am able to
> preform this analysis is SAS easily using BY, but I would like to know
> how to do it in R.
> 
> 
> I have tried these commands in a number of different variations with
> no luck and similar error messages
>  test1<-aggregate(mydata[,-1],
> list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
> ,median, na.rm=T)
> Error in median.default(X[[1]], ...) : need numeric data ### Y in
> numeric####
> 
> test1<-by(mydata[,-1],
> list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
> ,median, na.rm=T)
> Error in median.default(data[x, ], ...) : need numeric data
> 
> Thanks
> Alan
> winXP R 2.4.1
> 
> 
> #####Example data frame######
> mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8,
> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23,
> 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 47, 47,
> 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
> 47, 47), time = structure(as.integer(c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1,
> 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2,
> 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2,
> 2, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1,
> 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1,
> 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
> 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2,
> 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class = "factor"),
>     treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2,
>     1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1,
>     2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2,
>     1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1,
>     1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2,
>     2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2,
>     1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2,
>     2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1,
>     1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
>     2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1,
>     2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
>     2, 1)), .Label = c("control", "trt"), class = "factor"),
>     expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2,
>     2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3,
>     3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1,
>     1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2,
>     2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2,
>     3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1,
>     1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1,
>     3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2,
>     2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1,
>     1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3,
>     3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2,
>     3)), .Label = c("expREP1", "expREP2", "expREP3"), class =
>     "factor"), techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1,
>     3, 3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2, 3, 1,
>     1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2, 3, 1, 3, 2, 1,
>     1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3, 2, 3, 1, 3, 2, 2, 1, 2,
>     1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3, 1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3,
>     2, 1, 3, 1, 3, 2, 3, 1, 1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2,
>     3, 2, 2, 1, 2, 2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1,
>     3, 2, 3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1, 2,
>     1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1, 3, 2, 3, 3,
>     2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2, 2, 3)), .Label =
>     c("techREP1", "techREP2", "techREP3"), class = "factor"), log2Abun
>     = c(14.4233129144089, 14.052822741429, 14.2281422686467,
>     13.8492096005693, 14.076481601207, 14.2139395740777,
>     14.3399195756207, 14.3625602954496, 14.0141948668145,
>     14.0980320829605, 14.3152203363759, 14.4528846974866,
>     13.9591869268449, 14.4064043323413, 14.0403753485321,
>     14.2285932517829, 14.1259784261721, 13.5925738310379,
>     13.5830827675029, 13.0280787227049, 15.0198078807043,
>     12.8423503434138, 12.645883554519, 13.4644181177386,
>     12.8399910705399, 12.7879025593914, 12.4978518369511,
>     14.3949985145017, 12.8670856466168, 12.9749522735341,
>     13.3456824481868, 13.4557125040673, 12.8989792046225,
>     16.0609491915918, 13.6795900568273, 16.456466720182,
>     13.6145948287653, 13.2604785448039, 14.8573006848798,
>     13.1382718001722, 13.690761908446, 14.0557060971613,
>     13.7495552174335, 13.6336764098923, 13.7844303674846,
>     15.9518993688317, 13.2452555803066, 13.1930632791304,
>     12.1919845133603, 13.8710388986595, 13.6375305515253,
>     12.5919897676151, 17.4797250127015, 17.4014712120155,
>     17.5948202702163, 12.6031626795344, 17.8287811089804,
>     11.3613955331659, 15.8064741020529, 15.1007855146758,
>     16.0553036215393, 15.7553570530353, 15.9747058600332,
>     15.776715745005, 15.8588066550904, 16.2935434944118,
>     16.271207673964, 16.3660489506706, 16.3273070282017,
>     15.7632383068689, 14.6030467398838, 14.7118820283521,
>     14.7577545959238, 14.7315311764619, 14.8250084466403,
>     15.6652803936783, 15.8249587405285, 15.6558660906456,
>     15.5387042614836, 14.8487696278309, 15.5477380355109,
>     15.9451465974129, 16.196755792715, 15.9999119421954,
>     15.8660714836595, 15.9406577104549, 15.8754613979164,
>     16.0358944927638, 16.1785092456522, 16.1992122284106,
>     15.8087128474547, 15.9373968104322, 16.1432636222427,
>     16.2412011305004, 15.9488234774507, 15.7820255767261,
>     15.7730361533934, 15.7459893802453, 20.7777738189812,
>     21.7489122647969, 21.0374490930058, 20.9765158780184,
>     21.0464959041766, 21.6790715518273, 21.8021013715842,
>     20.7652083875471, 20.6663696521617, 20.3963413756589,
>     20.7983642126234, 20.1864915044977, 20.4422216681915,
>     20.59064186918, 20.6964531077756, 20.6822196619653,
>     20.4532414913665, 20.8126113450884, 20.4397608946311,
>     21.4603719009067, 21.5318145314919, 21.0400816517662,
>     21.0466431076593, 20.7459819969019, 20.6723053403015,
>     20.4793421418014, 20.6432035537608, 20.6831942471622,
>     21.6913537667357, 20.6562913013787, 21.0940693071186,
>     20.9473294479256, 20.5087271424267, 16.0871520250047,
>     16.3816612332698, 16.998645516939, 15.7912392142223,
>     14.5058735666446, 13.6035104425928, 14.4369066987207,
>     14.6998435295626, 14.6818972267862, 14.1086877961546,
>     14.3539049235617, 15.40862828087, 15.0657947671893,
>     14.8615716011254, 14.5538692431961, 14.2397476835569,
>     13.4381420777437, 13.4499224158638, 13.6887966810545,
>     14.6550275257018, 13.500966330283, 14.9271297886953,
>     14.7405186421119, 15.0047910398043, 14.7051463678038,
>     14.8325933769599, 12.9854861991046, 13.4203550220891,
>     15.399010832952, 15.4064707685293, 15.0953970227926,
>     15.0712109416537, 15.7587957644032, 15.0013202225009,
>     15.7608498673217, 14.7604080920677, 14.2478533598602,
>     14.4140245098782, 14.7936541075062, 14.7684428120549,
>     14.595607155062, 16.1507389488284, 16.4915712924337,
>     14.490161446684, 14.721633263063, 14.4341721012904,
>     15.8747652729112, 14.543333961671, 14.8633635585377,
>     14.6696601802386, 13.3020676725265, 14.0190694293311,
>     15.2168973938334, 12.6304946615056, 12.1972166931101,
>     12.7960396088298, 14.4285564621952, 14.5308330346953,
>     14.1496677436943, 14.0823985634278, 12.8407779235951,
>     14.6543003749437, 14.3202364452416, 15.1723493709662,
>     14.0744760007345, 14.8132801684508, 12.9183042336999,
>     14.5225202325766, 13.742309436084)), .Names = c("cpdID", "time",
>     "treatment",
> "expREP", "techREP", "Y")))
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

Petr Pikal
petr.pikal at precheza.cz



More information about the R-help mailing list