[R] dynamic variable creation in lists and data frames

Marc Schwartz marc_schwartz at comcast.net
Tue Dec 5 21:21:48 CET 2006


On Tue, 2006-12-05 at 14:41 -0500, Daniel Lee Rabosky wrote:
> Hi
> 
> I have a question about the creation of variables within lists in R.  I am
> running simulations and am interested in two parameters, ESM and ESMM (the
> similarity of these names is important for my question).  I do simulations
> to generate ESMM, then plug these values into a second simulation function
> to get ESM:
> 
> x <- list()
> 
> for (i in 1:nsimulations)
> {
> 	x$ESMM[i] <- do_simulation1()
> 	x$ESM[i] <- do_simulation2(x$ESMM[i])
> }
> 
> and I return everything as a data frame, x <- as.data.frame(x)
> 
> When I do this, I find that x$ESMM is overwritten by x$ESM for the first
> simulation.  However, x$ESM is nonetheless correctly generated using
> x$ESMM.
> 
> Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations,
> ESMM is not overwritten; the error only occurs on the first instance of
> ESM.
> 
> I think I know why this is occurring: I am creating a new variable in a
> list and assigning it a value, but when R can’t find the variable, it
> overwrites the next most similar variable (ESMM).  But it still proceeds
> to create the new variable ESM, having overwritten x$ESMM[1].  And it
> doesn’t happen for subsequent simulations, because both variables then
> exist in the list.
> 
> My questions are:
> 1) how different do variable names have to be to avoid this problem?  What
> exactly is R using to decide that ESMM is the same as ESM?
> 
> or
> 
> 2) is there something fundamentally flawed with the manner in which I
> dynamically create variables in lists, without initializing them in some
> fashion?  This approach worked fine until I noticed this issue with
> variables having similar names.
> 
> Thanks very much in advance for your help.
> 
> Dan Rabosky

This is due to partial matching of names when the $ operator is used to
index list elements and data frame columns. It is the default behavior
in R, and if you search the archives using:

  RSiteSearch("partial matching")

you will find prior discussions of this.
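To check whether a particular $ lookup is silently relying on partial matching, one option is to ask R to warn about it. This is a minimal sketch, assuming a recent R release that supports the warnPartialMatchDollar option (it is not mentioned in this thread and may not be available in the R version of this era):

```r
# Assumes a recent R release that supports the warnPartialMatchDollar option
op <- options(warnPartialMatchDollar = TRUE)  # warn whenever `$` partially matches

x <- list(ESMM = 1)

# Capture the warning message (if any) raised by the partial match
msg <- tryCatch({
  x$ESM
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

options(op)  # restore the previous setting
```

With the option on, x$ESM still returns the ESMM element, but a warning reporting the partial match is raised, which makes this kind of bug easy to spot in a long simulation loop.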

A simple example:

> x <- list()
> x
list()

> x$ESMM[1] <- 1
> x
$ESMM
[1] 1

> x$ESM[1] <- 2
> x
$ESMM
[1] 2

$ESM
[1] 2


Both values are changed: x$ESM does not yet exist, so x$ESM on the
left-hand side of the assignment partially matches x$ESMM, and x$ESMM[1]
is overwritten. x$ESM is then created with the same value.
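The difference lies in how the indexing operators match names: $ on a list matches partially, while [[ matches exactly unless exact = FALSE is given. The sketch below, assuming a current R version, shows that contrast and then two ways the original loop could avoid the problem. The loop body uses hypothetical stand-in values in place of do_simulation1() and do_simulation2().

```r
# `$` on a list matches names partially; `[[` matches exactly by default
x <- list(ESMM = 1)
x$ESM                      # 1: partial match to "ESMM"
x[["ESM"]]                 # NULL: no element is named exactly "ESM"
x[["ESM", exact = FALSE]]  # 1: partial matching requested explicitly

# Option 1: index with `[[` so the lookup inside the assignment is exact
x[["ESM"]][1] <- 2
x$ESMM                     # still 1
x$ESM                      # 2: an exact match now exists

# Option 2: initialize both elements before the loop, so `$` always finds
# an exact match (the values are hypothetical stand-ins for
# do_simulation1() and do_simulation2())
nsimulations <- 5
y <- list(ESMM = numeric(nsimulations), ESM = numeric(nsimulations))
for (i in seq_len(nsimulations)) {
  y$ESMM[i] <- i
  y$ESM[i] <- 2 * y$ESMM[i]
}
y <- as.data.frame(y)
y
```

Option 2 also preallocates the vectors to their final length, so the loop no longer grows them one element at a time.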

I think that in this particular situation, you might want to try:

# Create a simple function that returns a pair of random samples from
# 'letters', the built-in vector of lowercase letters a to z
Sim <- function()
{
   list(ESMM = letters[sample(26, 1)], 
        ESM = letters[sample(26, 1)])
}

# Run it once
> Sim()
$ESMM
[1] "l"

$ESM
[1] "z"


Now use replicate() to do this 10 times. Note that by default
replicate() simplifies the returned values into a matrix, with one
column per replication.

> x <- replicate(10, Sim())
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ESMM "x"  "q"  "c"  "f"  "e"  "f"  "y"  "d"  "z"  "h"  
ESM  "u"  "c"  "j"  "v"  "u"  "j"  "o"  "p"  "g"  "g"  
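Because Sim() returns a list, the "matrix" printed above is actually a list with a dim attribute, which can be awkward to convert with as.data.frame(). A sketch of one alternative, assuming the same letters-based Sim(): ask replicate() for the unsimplified results with simplify = FALSE and bind the runs row-wise into a data frame.

```r
Sim <- function()
{
   list(ESMM = letters[sample(26, 1)],
        ESM = letters[sample(26, 1)])
}

set.seed(42)
x <- replicate(10, Sim())
is.list(x)  # TRUE: each cell of the 2 x 10 "matrix" is a list element
dim(x)

# Keep the unsimplified per-run lists and bind them row-wise instead
runs <- replicate(10, Sim(), simplify = FALSE)
df <- do.call(rbind, lapply(runs, as.data.frame))
df
```

Each row of df then holds one replication, with columns named ESMM and ESM.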


So, in your case, create a function Sim() like this:

Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  
  list(ESMM = ESMM, ESM = ESM)
}


and then use replicate() as above.  See ?replicate for more information.
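Putting the pieces together, here is a minimal runnable sketch of the whole workflow. The bodies of do_simulation1() and do_simulation2() below are hypothetical stand-ins for the real simulation code, and this Sim() returns a named numeric vector rather than a list, so that replicate() yields an ordinary numeric matrix that can be transposed into a data frame with one row per simulation.

```r
set.seed(1)

# Hypothetical stand-ins for the real simulation functions
do_simulation1 <- function() rnorm(1)
do_simulation2 <- function(esmm) esmm + runif(1)

# One replication: ESM is computed from the ESMM value of the same run
Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  c(ESMM = ESMM, ESM = ESM)  # named numeric vector, not a list
}

nsimulations <- 1000
x <- replicate(nsimulations, Sim())  # 2 x nsimulations numeric matrix
x <- as.data.frame(t(x))             # one row per simulation
head(x)
```

Since no element names are ever looked up before they exist, partial matching cannot come into play.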

HTH,

Marc Schwartz



