[R] adjusted values for CI in a within-subjects design

Ernesto Guerra erguerrag at uc.cl
Thu Jun 18 14:16:30 CEST 2015


Dear R-ers,
I am trying to adjust the values from a within-items, within-subjects
design (the experimental conditions are within subjects) in order to
calculate between-subjects confidence intervals (CIs). I am following the
recommendations of O'Brien & Cousineau (2014; see also Cousineau, 2005,
and Morey, 2008 for similar solutions). The formula is as follows.

# formula for the corrected CI:
# Y = Xsj - Xj. + X..
# where...
# Xsj = a single value for one trial in one time window for one participant
# Xj. = that participant's mean for the condition across trials
# X.. = the overall mean for the condition
# W = J*(Y - Y.j) + Y.j
# where...
# J = sqrt(f/(f-1))
# f = total number of conditions per subject
# Y.j = the mean for the condition across participants
# W = the corrected value from which we can calculate the between-subjects CI
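
To make the two steps concrete, here is a tiny worked example with made-up
numbers (one condition, two participants, two trials each; ave() just
computes per-group means):

x <- c(0.2, 0.4, 0.6, 0.8)   # Xsj: two trials for each of two participants
s <- c(1, 1, 2, 2)           # participant id
Xj. <- ave(x, s)             # participant means: 0.3 0.3 0.7 0.7
X.. <- mean(x)               # overall mean: 0.5
Y <- x - Xj. + X..           # normalized values: 0.4 0.6 0.4 0.6
f <- 2                       # number of conditions, as in the script below
J <- sqrt(f/(f-1))           # correction factor: sqrt(2)
Y.j <- mean(Y)               # mean across participants: 0.5
W <- J*(Y - Y.j) + Y.j       # corrected values: ~0.359 0.641 0.359 0.641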

I've written code that does this using a dataset of random values (0/1)
with the same structure as the actual dataset for which I hope to
calculate corrected CIs. The data look like this:

fixprop subj trial time
      1    1     1    1
      0    1     1    2
      1    1     1    3
      0    1     1    4
      0    1     1    5
      0    1     1    6
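
In case it helps to reproduce the timings, a file with this structure can
be generated along these lines (illustration only; rbinom() supplies the
random 0/1 values):

set.seed(1)
np <- 4; nt <- 10; twsn <- 400                       # participants, items, time steps
dummy <- expand.grid(trial = 1:nt, subj = 1:np, time = 1:twsn)
dummy$fixprop <- rbinom(nrow(dummy), size = 1, prob = 0.5)
dummy <- dummy[, c("fixprop", "subj", "trial", "time")]
write.table(dummy, "test.txt", sep = "\t", row.names = FALSE, quote = FALSE)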

The experiments deliver the time course of an effect (similar to
longitudinal data), meaning that we have N time steps across which the
effect is modulated. I've tested the script with this dummy dataset of 4
participants, 10 items, and 400 time steps, and it works nicely. The
tricky part is that in the real experiments we have many more
participants, items, and time steps, so the adjustment needs to be done
many, many times.

With the dummy dataset the process takes about 6 seconds:
> proc.time() - ptm
   user  system elapsed
   4.53    0.06    6.03

but when I add a bit more data (10 participants, 125 trials, 400 time
steps), the script takes more than an hour:
> proc.time() - ptm
   user  system elapsed
3483.64  879.31 4456.86

So I don't even want to try this with the real data, in which we have
thousands of time steps and generally over 50 participants (although
fewer items, perhaps 40 or 50).

QUESTION: does anyone know how I could optimize my script so that it does
not take forever?

Here is the script.

library(doBy)
library(plotrix)
library(matrixStats)
library(bear)
library(ggplot2)
library(reshape)

rm(list=ls())                                       # clear the workspace
setwd(??)                                           # set the working directory
infile = "test.txt"                                 # "test.txt" is the name of the fixation report
data = read.delim(file=infile, header=T, sep="\t")  # load the file
data = data[with(data, order(subj,trial)), ]        # data must be ordered by participant, then by trial
head(data)

subj = unique(data$subj)
np=length(subj); np # how many participants
trial = unique(data$trial)
nt=length(trial); nt # how many items
timewindows = unique(data$time)
twsn=length(timewindows); twsn # how many time steps

critcoln = 1 #column in which we find the dependent variable
ncoln = 4 #total number of columns of your file
f = 2 #total number of conditions per subject

tm <- rep(critcoln:twsn, each = nt*np)          # time-window index for each row
newvar <- rep(critcoln + ncoln, nt*np*twsn)     # target column for the adjusted variable
subj <- rep(1:np, each = nt, times = twsn)      # subject index for each row
count <- rep(1, nt*np*twsn)                     # (currently unused)

####################################

X..data = summaryBy(fixprop ~ time, FUN = mean, keep.names=T, data=data)         # X..: overall mean per time step
Xj.data = summaryBy(fixprop ~ subj + time, FUN = mean, keep.names=T, data=data)  # Xj.: participant mean per time step

ptm <- proc.time()
prev_tw = 0
prev_subj = 0
j = 0   # row counter into Xj.data
t = 0   # row counter into X..data
# first pass: Y = Xsj - Xj. + X.., row by row
for (i in 1:(nrow(data)))
{
  curr_tw = tm[i]
  curr_subj = subj[i]
  if (prev_subj < curr_subj)
  {j = j + 1}
  Y. = data[i,critcoln] - Xj.data[j,3]   # Xsj - Xj.
  if (prev_tw < curr_tw)
  {t = t + 1}
  Y = Y. + X..data[t,2]                  # ... + X..
  data[i,newvar[i]] <- Y                 # write Y into the new column
  prev_tw = curr_tw
  prev_subj = curr_subj
}
proc.time() - ptm
colnames(data)[ncoln+1] <- 'fixprop_adj'

Y.jdata = summaryBy(fixprop_adj ~ subj + time, FUN = mean, keep.names=T, data=data)

J = sqrt(f/(f-1))                                # correction factor
newvar <- rep(critcoln + ncoln + 1, nt*np*twsn)  # target column for the final variable

prev_tw = 0
t = 0
# second pass: W = J*(Y - Y.j) + Y.j, row by row
for (i in 1:(nrow(data)))
{
  curr_tw = tm[i]
  if (prev_tw < curr_tw)
  {t = t + 1}
  W = J*((data[i,ncoln+1]) - Y.jdata[t,3]) + Y.jdata[t,3]
  data[i,newvar[i]] <- W
  prev_tw = curr_tw
}
proc.time() - ptm
colnames(data)[ncoln+2] <- 'fixprop_final'
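
As a quick sanity check: in a balanced design the normalization should
leave the per-time-step means untouched, so the means of fixprop and
fixprop_adj should agree:

# per-time-step means before and after normalization should be identical
all.equal(summaryBy(fixprop ~ time, FUN = mean, keep.names=T, data=data)$fixprop,
          summaryBy(fixprop_adj ~ time, FUN = mean, keep.names=T, data=data)$fixprop_adj)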

That's all. The parts that really take long are the two "for" loops; I
know loops are not the best in R, but so far I couldn't think of a process
that does this better.
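
For concreteness, the kind of vectorized replacement I am after would look
something like this. It is an untested sketch, and it assumes that the j
and t counters in my loops are simply tracking the subj and time values of
row i, so that ave() can compute the same per-group means in one pass:

# vectorized sketch (untested); same column names as above
Xj. <- ave(data$fixprop, data$subj, data$time)            # participant mean per time step
X.. <- ave(data$fixprop, data$time)                       # overall mean per time step
data$fixprop_adj <- data$fixprop - Xj. + X..              # Y = Xsj - Xj. + X..

J <- sqrt(f/(f-1))                                        # correction factor
Y.j <- ave(data$fixprop_adj, data$time)                   # condition mean across participants
data$fixprop_final <- J*(data$fixprop_adj - Y.j) + Y.j    # W = J*(Y - Y.j) + Y.j

If that assumption holds, each ave() call is a single grouped pass over
the data, so this should scale to thousands of time steps.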

Any comments, suggestions, criticisms and questions are welcome...
Cheers,
Ernesto.
