[R] flexible approach to subsetting data

arun smartpink111 at yahoo.com
Tue Jul 23 18:04:45 CEST 2013


Sorry, there was a mistake in my previous code: [,-5] dropped the X3 column instead of id.
It should be:
res<-reshape(df1,sep=".",varying=list(c("sim","sim.1"),c("X1","X1.1"),c("X2","X2.1"),c("X3","X3.1")),direction="long",timevar="m")[,-6]
row.names(res)<- 1:nrow(res)
head(res,2)
#  m sim X1 X2 X3
#1 1   1  5  4  5
#2 1   1  4  3  2
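
The varying list does not have to be typed by hand. Below is a minimal sketch that builds it from the column names, assuming every imputed column is named base, base.1, base.2, ... (as in the toy data); the helper names base and varying are only illustrative. The id column is dropped by name, so the code does not depend on its position.

```r
# Same toy data as in the example; column names are assumed to follow
# the pattern base, base.1, base.2, ...
df1 <- read.table(text = "
sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
1 5 4 5 1 4 3 7
1 4 3 2 1 7 4 1
1 3 9 4 1 5 8 4
2 6 4 8 2 3 9 5
2 7 8 4 2 5 4 8
2 9 6 7 2 9 5 6
", header = TRUE)

# Strip a trailing ".<digits>" to recover the base name of each column.
base <- sub("\\.[0-9]+$", "", names(df1))

# Group the column names by base name, keeping the original column order.
varying <- split(names(df1), factor(base, levels = unique(base)))

res <- reshape(df1, direction = "long",
               varying = unname(varying),   # list of column-name vectors
               v.names = names(varying),    # sim, X1, X2, X3
               timevar = "m", idvar = "id")
res$id <- NULL            # drop the generated id column by name
row.names(res) <- NULL    # reset row names to 1..n
head(res, 2)
```

With more imputations the same code should apply unchanged, as long as the naming pattern holds and every base name has the same number of columns.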
A.K.




----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Andrea Lamont <alamont082 at gmail.com>
Cc: R help <r-help at r-project.org>
Sent: Tuesday, July 23, 2013 12:00 PM
Subject: Re: [R] flexible approach to subsetting data

Hi,
It is better to provide a reproducible example using dput() (see ?dput).
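
For reference, a minimal sketch of what that looks like; the data frame small is made up for illustration. dput() prints a structure(...) call that can be pasted into a message, and evaluating that text recreates the object.

```r
# A small made-up data frame standing in for the real data.
small <- data.frame(sim = c(1, 1, 2), X1 = c(5, 4, 6), X1.1 = c(4, 7, 3))

# dput() writes R code that recreates the object; capture it as text.
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# A reader can paste that text into R to rebuild the data frame.
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
```
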
df1<- read.table(text="
sim   X1   X2   X3   sim.1   X1.1    X2.1    X3.1
1     5    4     5        1           4          3        7
1     4    3     2        1           7          4         1
1     3    9     4        1           5          8         4
2     6    4     8        2           3          9         5
2     7    8     4        2           5          4         8
2     9    6     7        2           9          5         6
",sep="",header=TRUE)
res<-reshape(df1,sep=".",varying=list(c("sim","sim.1"),c("X1","X1.1"),c("X2","X2.1"),c("X3","X3.1")),direction="long",timevar="m")[,-5]
res
    m sim X1 X2 id
1.1 1   1  5  4  1
2.1 1   1  4  3  2
3.1 1   1  3  9  3
4.1 1   2  6  4  4
5.1 1   2  7  8  5
6.1 1   2  9  6  6
1.2 2   1  4  3  1
2.2 2   1  7  4  2
3.2 2   1  5  8  3
4.2 2   2  3  9  4
5.2 2   2  5  4  5
6.2 2   2  9  5  6
 
A.K.




----- Original Message -----
From: Andrea Lamont <alamont082 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Tuesday, July 23, 2013 10:35 AM
Subject: [R] flexible approach to subsetting data

Hello:

I am running a simulation study and am stuck with a subsetting problem.

Here is the basic issue:
I generated data and am running a simulation that uses multiple imputation:
each generated dataset is imputed multiple times.  The resulting dataset is
in wide form, where each imputation is recorded as a separate set of
columns (though the different simulations are stacked).  Here is an example
of what it looks like:

sim   X1   X2   X3   sim.1   X1.1    X2.1    X3.1
1         #    #     #        #           #          #         #
1         #    #     #        #           #          #         #
1         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #
2         #    #     #        #           #          #         #

sim refers to the simulated/generated dataset. X1-X3 are the values for the
first imputed dataset, X1.1-X3.1 are the values for the second imputed
dataset.

The problem is that I want the data to be in long format, like this:

sim m X1 X2 X3
1  1   #   #    #
1  2   #   #    #
2  1   #   #    #
2  2   #   #    #

where m is the imputation number.
This will allow me to do cleaner calculations (e.g. X3-X1).

I know I can subset the data manually (e.g. [,1:10]), save each piece to a
separate dataset and then rbind them; however, I'm looking for a more flexible
approach.  The manual approach would be quite tedious as the number
of imputations (and therefore the number of columns) increases (with only 10
imputations, there are roughly 810 columns). Also, I would like to
avoid having to recode each time I change the number of imputations.
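
The subset-then-rbind idea can itself be written so that nothing is hard-coded apart from the number of columns per imputation. This is only a sketch under the assumption that each imputation occupies a block of k consecutive columns in the same order; the data frame wide and the names k, n_imp, blocks and long are made up for illustration.

```r
# Hypothetical wide data: two imputations, four columns each.
wide <- data.frame(sim   = c(1, 2), X1   = c(5, 6), X2   = c(4, 4), X3   = c(5, 8),
                   sim.1 = c(1, 2), X1.1 = c(4, 3), X2.1 = c(3, 9), X3.1 = c(7, 5))

k <- 4                      # columns per imputation (assumed known)
n_imp <- ncol(wide) / k     # number of imputations, derived from the data
stopifnot(n_imp == round(n_imp))

blocks <- lapply(seq_len(n_imp), function(m) {
  cols <- ((m - 1) * k + 1):(m * k)         # column positions of block m
  block <- wide[, cols, drop = FALSE]
  names(block) <- names(wide)[seq_len(k)]   # reuse the first block's names
  cbind(m = m, block)                       # tag rows with the imputation number
})
long <- do.call(rbind, blocks)

# Calculations across variables are then one-liners:
long$X3 - long$X1
```

Changing the number of imputations only changes ncol(wide), so the code itself does not need to be edited.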

The same is true for the reshape function, which would require naming
a huge number of columns and editing the call each time 'm' changes.


Is there a flexible way to approach this? I'm inclined to use a for loop,
but 1) I know this is generally inefficient and 2) I am having trouble with
the coding regardless.

Any suggestions are appreciated.

Thanks,
Andrea


-- 
Andrea Lamont, MA
Clinical-Community Psychology
University of South Carolina
Barnwell College
Columbia, SC 29208

