[R] Transforming a dataframe into a response/predictor matrix

Ki L. Matlock kilynn at uark.edu
Fri Nov 13 00:14:04 CET 2009


I currently have a data frame in which each row corresponds to a student and each column is a variable recorded for that student, as shown below:

         Lastname        Firstname CATALOG_NBR       Email StudentID   EMPLID     Start
1       alastname       afirstname        1213  *@uark.edu  10295236        # 12/2/2008
2 anotherlastname anotherfirstname        1213 **@uark.edu        ## 10295236  9/3/2008
  Xattempts Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19
1         1  1  1  0  0  0  0  0  0  0   1   0   0   1   1   0   1   1   0   1
2         1  1  1  1  1  1  0  1  0  0   1   1   0   0   1   0   0   0   0   1
  Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32 Score Form CRSE_GRADE_OFF
1   0   0   0   0   0   0   0   0   0   1   0   0   0     9    E              D
2   0   0   0   0   0   0   0   0   0   0   1   1   0    13    G              D

Each student took a pre-test and a post-test, distinguished by the date under "Start" (column 7). Dates are given as month/day/year: a month of 8 or 9 indicates a pre-test, and a month of 11 or 12 indicates a post-test. Each test was one of four forms, E, F, G, or H, listed under "Form" (column 42). Each test had 32 questions, Q1 to Q32, coded 1 if the student answered the question correctly and 0 if not.
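To make the coding concrete, here is a minimal sketch of how the test indicator and a numeric form code might be derived. It uses a toy data frame with only the "Start" and "Form" columns; the new column names s and r are chosen here for illustration, and months other than 8, 9, 11, or 12 are assumed not to occur (they are set to NA).

```r
# Toy data frame with only the columns needed here (values are illustrative).
dat <- data.frame(
  Start = c("12/2/2008", "9/3/2008"),
  Form  = c("E", "G"),
  stringsAsFactors = FALSE
)

# The month is everything before the first "/";
# as.integer() treats "9" and "09" alike.
month <- as.integer(sub("/.*$", "", dat$Start))

# s: 0 = pre-test (month 8 or 9), 1 = post-test (month 11 or 12),
# NA for any other month.
dat$s <- ifelse(month %in% c(8, 9), 0L,
         ifelse(month %in% c(11, 12), 1L, NA_integer_))

# r: numeric form code, E = 1, F = 2, G = 3, H = 4.
dat$r <- match(dat$Form, c("E", "F", "G", "H"))
```

Parsing only the month avoids depending on a particular date format or on leading zeros.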

I need a matrix, y, with five columns labeled response, i, j, r, and s. The first column holds the response (0 or 1) of the i-th student to the j-th question (1 to 32) on the r-th form (E, F, G, or H; these could be recoded numerically as 1 for E, 2 for F, etc.) of the s-th test (0 for pre-test, 1 for post-test).
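As an illustration of the target layout, here is a rough base R sketch on toy data with three questions instead of 32. It assumes that "StudentID" identifies the same student across the pre-test and post-test rows and that every "Start" month is 8, 9, 11, or 12. The helper names qcols, n, and nq are made up for this example.

```r
# Toy data: 3 test sittings x 3 questions (the real data has Q1..Q32).
dat <- data.frame(
  StudentID = c(101, 102, 101),
  Start     = c("9/3/2008", "9/3/2008", "12/2/2008"),
  Form      = c("E", "G", "F"),
  Q1 = c(1, 1, 1), Q2 = c(0, 1, 1), Q3 = c(0, 1, 0),
  stringsAsFactors = FALSE
)

qcols <- grep("^Q[0-9]+$", names(dat), value = TRUE)  # question columns
n  <- nrow(dat)
nq <- length(qcols)

# Per-row codes: student index, form (E = 1, ..., H = 4), test (0 pre, 1 post).
i <- match(dat$StudentID, unique(dat$StudentID))
r <- match(dat$Form, c("E", "F", "G", "H"))
s <- as.integer(as.integer(sub("/.*$", "", dat$Start)) %in% c(11, 12))

# as.vector() unrolls the n x nq response matrix column by column, so the
# data-frame row cycles fastest and the question index changes slowest.
y <- cbind(
  response = as.vector(as.matrix(dat[qcols])),
  i = rep(i, times = nq),
  j = rep(seq_len(nq), each = n),
  r = rep(r, times = nq),
  s = rep(s, times = nq)
)
```

Everything here is vectorized, so about 2000 rows by 32 questions (roughly 64,000 rows of y) should not be a problem.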

The data set is long, approximately 2000 rows, so an efficient way to transform it into the desired matrix would be very helpful.  Thank you.



