[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Crombie, Burnette N bcrombie at utk.edu
Thu Dec 18 14:09:26 CET 2014


I want to achieve a table that looks like a grid of 1's for all cases in a survey.  I'm an R beginner and don't have a clue how to do all the things you just suggested.  I really appreciate the time you took to explain all of those options, though.  -- BNC
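
For reference, here is a minimal sketch of one way to get that grid of 1's, assuming the data from the original post quoted below. The names all_types, counts, grid and wide are made up for this illustration; table(), factor() and data.frame() are base R. Treat it as one possible approach, not the only or the recommended one.

```r
# Data from the original post (quoted below).
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')

# The 14 violation types listed in the original post (hypothetical name).
all_types <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
               "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other",
               "AS.Age", "BW.WHMIS_BackWages", "HS.Hours",
               "OA.HazOccupationAg", "ON.HazOccupationNonAg",
               "R3.Reg3AgeOccupation", "RK.Records_CL", "V.Other")

# Cross-tabulate; fixing the factor levels keeps types that never occur.
counts <- table(CaseID, factor(Primary.Viol.Type, levels = all_types))

# Start from an all-NA matrix and put a 1 wherever the count is positive.
grid <- matrix(NA_integer_, nrow = nrow(counts), ncol = ncol(counts),
               dimnames = list(rownames(counts), colnames(counts)))
grid[counts > 0] <- 1L

# CaseID as the first column, followed by the 14 type columns.
wide <- data.frame(CaseID = rownames(grid), grid,
                   check.names = FALSE, stringsAsFactors = FALSE)
rownames(wide) <- NULL
```

With this data, wide should have one row per unique CaseID and 15 columns, with NA wherever a violation type was not recorded for that case.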

-----Original Message-----
From: Boris Steipe [mailto:boris.steipe at utoronto.ca] 
Sent: Thursday, December 18, 2014 5:29 AM
To: Crombie, Burnette N
Cc: r-help at r-project.org
Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

What you are describing sounds like a very spreadsheet-y thing. 

- The information is already IN your dataframe, and easy to get out by subsetting. Depending on your use case, that may actually be the "best". 

- If the number of CaseIDs is large, I would use a hash of lists (if the data is sparse), or hash of named vectors if it's not sparse. Lookup is O(1) so that may be the best. (Cf package hash, and explanations there). 

- If it must be the spreadsheet-y thing, you could make a matrix with rownames and colnames taken from unique() of the respective columns of your dataframe. Instead of 1 and NA I probably would use TRUE/FALSE. 

- If it takes less time to wait for the results than to look up how apply() works, you can write a simple loop to populate your matrix. Otherwise apply() is much faster. 

- You could even use a loop to build the data structure, checking for every cbind() whether the value in column 1 already exists in the table - but that's terrible and would make a kitten die somewhere on every iteration.
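
A rough sketch of the subsetting and matrix options above, assuming the data from the original post quoted below. The names df, viol_for_case, cases, types, grid and grid_vec are hypothetical, chosen only for this illustration. The last step relies on base R's indexing of a matrix by a two-column character matrix, which matches entries against the dimnames.

```r
# Data from the original post, under a shorter (hypothetical) name.
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')
df <- data.frame(CaseID, Primary.Viol.Type, stringsAsFactors = FALSE)

# Subsetting: all violation types recorded for a single case.
viol_for_case <- df$Primary.Viol.Type[df$CaseID == '1015285']

# Matrix option: rows and columns come from unique() of each column.
cases <- unique(df$CaseID)
types <- unique(df$Primary.Viol.Type)
grid <- matrix(FALSE, nrow = length(cases), ncol = length(types),
               dimnames = list(cases, types))

# Simple loop: flag each (CaseID, type) pair found in the data frame.
for (i in seq_len(nrow(df))) {
  grid[df$CaseID[i], df$Primary.Viol.Type[i]] <- TRUE
}

# Loop-free alternative: index by a two-column character matrix of
# (row name, column name) pairs.
grid_vec <- matrix(FALSE, nrow = length(cases), ncol = length(types),
                   dimnames = list(cases, types))
grid_vec[cbind(df$CaseID, df$Primary.Viol.Type)] <- TRUE
```

Both grid and grid_vec should hold the same TRUE/FALSE pattern. Note that this sketch only creates columns for the types that actually occur in the data, not all 14.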

All of these are possible, and you haven't told us enough about what you want to achieve to figure out what the "best" is. If you choose one of the options and need help with the code, let us know.

Cheers,
B.





On Dec 17, 2014, at 10:15 PM, bcrombie <bcrombie at utk.edu> wrote:

> # I have a dataframe that contains 2 columns:
> CaseID  <- c('1015285',
> '1005317',
> '1012281',
> '1015285',
> '1015285',
> '1007183',
> '1008833',
> '1015315',
> '1015322',
> '1015285')
> 
> Primary.Viol.Type <- c('AS.Age',
> 'HS.Hours',
> 'HS.Hours',
> 'HS.Hours',
> 'RK.Records_CL',
> 'OT.Overtime',
> 'OT.Overtime',
> 'OT.Overtime',
> 'V.Poster_Other',
> 'V.Poster_Other')
> 
> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
> 
> # CaseID's can be repeated because there can be up to 14 
> # Primary.Viol.Type's per CaseID.
> 
> # I want to transform this dataframe into one that has 15 columns, 
> # where the first column is CaseID, and the rest are the 14 primary 
> # viol. types.  The CaseID column will contain a list of the unique 
> # CaseID's (no replicates) and for each of their rows, there will be a 
> # "1" under a column corresponding to a primary violation type recorded 
> # for that CaseID.  So, technically, there could be zero to 14 "1's" in a CaseID's row.
> 
> # For example, the row for CaseID '1015285' above would have a "1" 
> # under "AS.Age", "HS.Hours", "RK.Records_CL", and "V.Poster_Other", but have "NA"
> # under the rest of the columns.
> 
> PViol.Type <- c("CaseID",
>                "BW.BackWages",
>           "LD.Liquid_Damages",
>           "MW.Minimum_Wage",
>           "OT.Overtime",
>           "RK.Records_FLSA",
>           "V.Poster_Other",
>           "AS.Age",
>           "BW.WHMIS_BackWages",
>           "HS.Hours",
>           "OA.HazOccupationAg",
>           "ON.HazOccupationNonAg",
>           "R3.Reg3AgeOccupation",
>           "RK.Records_CL",
>           "V.Other")
> 
> PViol.Type.Columns <- t(data.frame(PViol.Type))
> 
> # What is the best way to do this in R?
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


