[R] flexible approach to subsetting data

Tue Jul 23 19:49:53 CEST 2013

On Jul 23, 2013, at 10:01 AM, Adams, Jean wrote:

> Check out the reshape() function of the reshape package.  Here's one of the
> examples from ?reshape.
> 
> Jean
> 
> 
> library(reshape)   # No, at least not for the reshape function

The reshape function is from the 'stats' package, which ships with base R and is attached by default. The 'reshape' and 'reshape2' packages were written (at least in part) because the reshape function was so difficult to understand.

If you do choose to use the reshape2 package, which is well-respected and often extremely helpful, the function you will want to start with is 'melt'.

> long <- reshape(wide, direction="long")

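I don't think this example will be particularly helpful: it goes from "wide" to "long" with no further arguments only because 'wide' was itself created by an earlier reshape() call that recorded how to undo it. For your data, more input would be needed.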
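To illustrate the point, here is the ?reshape example in full, using the Indometh dataset that ships with R. It is only a sketch: the last step strips the stored attribute to show that reshape() then refuses to guess on its own. The names 'plain' and 'failed' are made up for illustration.

```r
# Indometh ships with R (package 'datasets'); this follows the ?reshape example
wide <- reshape(Indometh, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")

# reshape() records how it built 'wide' in a "reshapeWide" attribute ...
str(attr(wide, "reshapeWide"))

# ... which is what lets this call work with no further arguments
long <- reshape(wide, direction = "long")

# Without that attribute, reshape() will not guess; 'varying' must be given
plain <- wide
attr(plain, "reshapeWide") <- NULL
failed <- inherits(try(reshape(plain, direction = "long"), silent = TRUE),
                   "try-error")
```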

> wide
> long
> 
> 
> 
> On Tue, Jul 23, 2013 at 9:35 AM, Andrea Lamont <alamont082 at gmail.com> wrote:
> 
>> Hello:
>> 
>> I am running a simulation study and am stuck with a subsetting problem.
>> 
>> Here is the basic issue:
>> I generated data and am running a simulation that uses multiple imputation.
>> For each generated dataset, I used multiple imputation.  The resultant
>> dataset is in wide format, where each imputation is recorded as a separate
>> set of columns (though the different simulations are stacked).  Here is an
>> example of what it looks like:
>> 
>> sim   X1   X2   X3   sim.1   X1.1   X2.1   X3.1
>> 1     #    #    #    #       #      #      #
>> 1     #    #    #    #       #      #      #
>> 1     #    #    #    #       #      #      #
>> 2     #    #    #    #       #      #      #
>> 2     #    #    #    #       #      #      #
>> 2     #    #    #    #       #      #      #
>> 
>> sim refers to the simulated/generated dataset. X1-X3 are the values for the
>> first imputed dataset, and X1.1-X3.1 are the values for the second imputed
>> dataset.
>> 
>> The problem is that I want the data to be in long format, like this:
>> 
>> sim  m  X1  X2  X3
>> 1    1  #   #   #
>> 1    2  #   #   #
>> 2    1  #   #   #
>> 2    2  #   #   #
>> 
>> where m is the imputation number.
>> This will allow me to do cleaner calculations (e.g. X3-X1).
>> 
>> I know I can subset the data manually (e.g. [,1:10]), save each subset as a
>> separate dataset, and then rbind them; however, I'm looking for a more
>> flexible approach.  This manual approach would become quite tedious as the
>> number of imputations (and therefore the number of columns) increases (with
>> only 10 imputations, there are roughly 810 columns). Also, I would like to
>> avoid having to recode each time I change the number of imputations.
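A minimal sketch of a more flexible version of the manual subset-and-rbind approach, assuming (hypothetically) that every block of columns is named as in the layout above: no suffix for the first imputation, then '.1', '.2', ... for the later ones, with at most one '.' in any name. It groups the columns by suffix with split.default() instead of hard-coding positions such as [,1:10], so it should not need editing when the number of imputations changes. The toy data frame 'wide' is invented for the example.

```r
# Hypothetical toy data in the layout from the question (m = 2 imputations)
set.seed(1)
wide <- data.frame(sim = rep(1:2, each = 3),
                   X1 = rnorm(6), X2 = rnorm(6), X3 = rnorm(6),
                   sim.1 = rep(1:2, each = 3),
                   X1.1 = rnorm(6), X2.1 = rnorm(6), X3.1 = rnorm(6))

# Suffix of each column name: "" for the first block, "1", "2", ... after it
suffix <- sub("^[^.]*\\.?", "", names(wide))

# split.default() on a data frame splits its columns, giving one data frame
# per imputation; levels = unique(suffix) keeps column order (so "10" would
# not sort before "2")
blocks <- split.default(wide, factor(suffix, levels = unique(suffix)))

long <- do.call(rbind, lapply(seq_along(blocks), function(k) {
  b <- blocks[[k]]
  names(b) <- sub("\\..*$", "", names(b))  # strip ".1", ".2", ... suffixes
  data.frame(b["sim"], m = k, b[setdiff(names(b), "sim")])
}))
long <- long[order(long$sim, long$m), ]
rownames(long) <- NULL
```

The result has columns sim, m, X1, X2, X3, one row per original row and imputation, so X3 - X1 becomes a single vectorised expression.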
>> 
>> The same is true for the reshape function, which would require naming
>> a huge number of columns and editing the code each time 'm' changes.

If the columns are named regularly, 'reshape' will attempt to split them properly without explicit naming. Details and a better description of the problem might allow more specific answers to emerge. The fact that the first instances have no numeric suffix (X1 rather than, say, X1.0) may be a problem for the algorithm.
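A sketch of the reshape() route under the same naming assumption: drop the duplicated sim.1, sim.2, ... columns, give the first block an explicit '.0' suffix so every time-varying name splits as <variable>.<imputation> on sep = ".", and let reshape() guess the variable names and times from the column names. The toy data and the names 'wide2' and 'first' are hypothetical.

```r
# Hypothetical toy data in the layout from the question (m = 2 imputations)
set.seed(1)
wide <- data.frame(sim = rep(1:2, each = 3),
                   X1 = rnorm(6), X2 = rnorm(6), X3 = rnorm(6),
                   sim.1 = rep(1:2, each = 3),
                   X1.1 = rnorm(6), X2.1 = rnorm(6), X3.1 = rnorm(6))

# The sim.1, sim.2, ... columns duplicate 'sim', so drop them
wide2 <- wide[, !grepl("^sim\\.[0-9]+$", names(wide))]

# Add a ".0" suffix to the unsuffixed first block (X1 -> X1.0, ...)
first <- !grepl(".", names(wide2), fixed = TRUE) & names(wide2) != "sim"
names(wide2)[first] <- paste0(names(wide2)[first], ".0")

# With v.names omitted, reshape() splits these names on sep to find
# the variables (X1, X2, X3) and the times (0, 1, ...)
varying <- setdiff(names(wide2), "sim")
long <- reshape(wide2, direction = "long", varying = varying, sep = ".",
                timevar = "m")
long$m <- long$m + 1  # suffixes 0, 1, ... become m = 1, 2, ...
long <- long[order(long$sim, long$m), c("sim", "m", "X1", "X2", "X3")]
```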

Why not post the output of dput(head(dfrm[ , 1:12]))?

-- 
David.

>> 
>> 
>> Is there a flexible way to approach this? I'm inclined to use a for loop,
>> but I know that 1) this is generally inefficient and 2) I am having trouble
>> with the coding regardless.
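> the coding regardless.">
If a loop is preferred, here is a sketch under the same naming assumptions. It loops over imputations rather than rows, so the body runs only m times, and it fills a preallocated list followed by a single rbind instead of growing a data frame inside the loop. Counting m from the column names (how many are named X1 or X1.<digits>) is an assumption about the naming scheme, as is the toy data.

```r
# Hypothetical toy data in the layout from the question (m = 2 imputations)
set.seed(1)
wide <- data.frame(sim = rep(1:2, each = 3),
                   X1 = rnorm(6), X2 = rnorm(6), X3 = rnorm(6),
                   sim.1 = rep(1:2, each = 3),
                   X1.1 = rnorm(6), X2.1 = rnorm(6), X3.1 = rnorm(6))

vars <- c("X1", "X2", "X3")
# Count imputations from the names instead of hard-coding m
m <- sum(grepl("^X1(\\.[0-9]+)?$", names(wide)))

pieces <- vector("list", m)  # preallocate one slot per imputation
for (k in seq_len(m)) {
  cols <- if (k == 1) vars else paste0(vars, ".", k - 1)
  piece <- wide[, cols]
  names(piece) <- vars       # drop the suffix so the blocks line up
  pieces[[k]] <- data.frame(sim = wide$sim, m = k, piece)
}
long <- do.call(rbind, pieces)
long <- long[order(long$sim, long$m), ]
rownames(long) <- NULL
```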
>> 
>> Any suggestions are appreciated.
>> 
>> Thanks,
>> Andrea
>> 
>> 
>> --
>> Andrea Lamont, MA

David Winsemius
Alameda, CA, USA