[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Thu Dec 18 11:28:46 CET 2014

What you are describing sounds like a very spreadsheet-y thing. 

- The information is already IN your dataframe, and easy to get out by subsetting. Depending on your use case, that may actually be the best option.

- If the number of CaseIDs is large, I would use a hash of lists (if the data is sparse) or a hash of named vectors (if it is not). Lookup is O(1) on average, so that may be the best option (cf. the hash package and the explanations in its documentation).

- If it must be the spreadsheet-y thing, you could make a matrix with rownames and colnames taken from unique() of the respective columns of your dataframe. Instead of 1 and NA I would probably use TRUE/FALSE.

- If it takes less time to wait for the results than to look up how apply() works, you can write a simple loop to populate your matrix. Otherwise, apply() or another vectorized approach is usually more concise and often faster.

- You could even use a loop to build the data structure, checking for every cbind() whether the value in column 1 already exists in the table - but that's terrible and would make a kitten die somewhere on every iteration.
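
To make the subsetting and matrix options above a bit more concrete, here is a minimal sketch using the data from the question. The names df, byCase, ids, types, m and m.loop are only my own placeholders. I take the column names from unique() of the data, so only the violation types that actually occur show up; if you need all 14 columns, you could pass your full vector of type names instead. Note that the second variant uses indexing with a two-column character matrix rather than apply(); that is a substitution on my part, not the only way to do it.

```r
# Data from the original question
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')
df <- data.frame(CaseID, Primary.Viol.Type, stringsAsFactors = FALSE)

# Subsetting: all violation types recorded for one case
df$Primary.Viol.Type[df$CaseID == "1015285"]

# A named list of violation types per case (base R, no extra package)
byCase <- split(df$Primary.Viol.Type, df$CaseID)
byCase[["1015285"]]

# The spreadsheet-y version: a logical matrix, one row per unique CaseID,
# one column per violation type that occurs in the data
ids   <- unique(df$CaseID)
types <- unique(df$Primary.Viol.Type)

# Variant 1: a simple loop over the rows of the dataframe
m.loop <- matrix(FALSE, nrow = length(ids), ncol = length(types),
                 dimnames = list(ids, types))
for (i in seq_len(nrow(df))) {
  m.loop[df$CaseID[i], df$Primary.Viol.Type[i]] <- TRUE
}

# Variant 2: no loop; each row of the two-column character matrix
# names one (row, column) cell via the dimnames
m <- matrix(FALSE, nrow = length(ids), ncol = length(types),
            dimnames = list(ids, types))
m[cbind(df$CaseID, df$Primary.Viol.Type)] <- TRUE

# Which violation types were recorded for case 1015285?
names(which(m["1015285", ]))
```

Both variants should give the same matrix; the loop is just easier to read if you have not met matrix indexing before.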

All of these are possible, and you haven't told us enough about what you want to achieve for us to figure out which one is "best". If you choose one of the options and need help with the code, let us know.

Cheers,
B.

On Dec 17, 2014, at 10:15 PM, bcrombie <bcrombie at utk.edu> wrote:

> # I have a dataframe that contains 2 columns:
> CaseID  <- c('1015285',
> '1005317',
> '1012281',
> '1015285',
> '1015285',
> '1007183',
> '1008833',
> '1015315',
> '1015322',
> '1015285')
> 
> Primary.Viol.Type <- c('AS.Age',
> 'HS.Hours',
> 'HS.Hours',
> 'HS.Hours',
> 'RK.Records_CL',
> 'OT.Overtime',
> 'OT.Overtime',
> 'OT.Overtime',
> 'V.Poster_Other',
> 'V.Poster_Other')
> 
> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
> 
> # CaseID’s can be repeated because there can be up to 14 Primary.Viol.Type’s
> per CaseID.
> 
> # I want to transform this dataframe into one that has 15 columns, where the
> first column is CaseID, and the rest are the 14 primary viol. types.  The
> CaseID column will contain a list of the unique CaseID’s (no replicates) and
> for each of their rows, there will be a “1” under  a column corresponding to
> a primary violation type recorded for that CaseID.  So, technically, there
> could be zero to 14 “1’s” in a CaseID’s row.
> 
> # For example, the row for CaseID '1015285' above would have a “1” under
> “AS.Age”, “HS.Hours”, “RK.Records_CL”, and “V.Poster_Other”, but have "NA"
> under the rest of the columns.
> 
> PViol.Type <- c("CaseID",
>                "BW.BackWages",
>           "LD.Liquid_Damages",
>           "MW.Minimum_Wage",
>           "OT.Overtime",
>           "RK.Records_FLSA",
>           "V.Poster_Other",
>           "AS.Age",
>           "BW.WHMIS_BackWages",
>           "HS.Hours",
>           "OA.HazOccupationAg",
>           "ON.HazOccupationNonAg",
>           "R3.Reg3AgeOccupation",
>           "RK.Records_CL",
>           "V.Other")
> 
> PViol.Type.Columns <- t(data.frame(PViol.Type))
> 
> # What is the best way to do this in R?