[R] Dataframes and text identifier columns

Brian Willis b.h.willis at bham.ac.uk
Wed Jul 2 13:33:10 CEST 2014


Apologies, I was trying to simplify the programme and left out the four input
files. The files for Andrew, Burt, Charlie and Dave all have the same format:
one factor and 13 numeric variables with repeated measurements, e.g.
Study	v1	v2	v3 	v4	v5	v6	v7	v8	v9	v10	v11	v12	v13
A	153	4.0	2.00	2.00	145.00	0.67	0.01	49.00	0.34	0.04	0.96	-3.24	0.04
B	96	33	3.0	13.0	47.0	0.9	0.2	4.2	0.1	0.5	0.5	-0.7	-0.7

Inp_dat is 
Case	r	p	SE	n
Andrew	0.03	0.01	0.0004	500
Burt	0.08	0.111	0.04	50
Charlie	0.04	0.022	0.0005	200
Dave	0.2	0.028	0.006	85

out_put starts as an empty data frame and rows are added incrementally: one for
Andrew, one for Burt, etc.
The code is:
Andrew<-read.csv("/File/Andrew.csv")
Burt<-read.csv("/File/Burt.csv")
Charlie<-read.csv("/File/Charlie.csv")
Dave<-read.csv("/File/Dave.csv")

Inp_dat<- read.csv("/File/Input data.csv")


out_put<-data.frame(Case=character(), StdL=numeric(), StdPP=numeric(),
StdSE=numeric(), L=numeric(), MRPP=numeric(), MRSE=numeric(),
stringsAsFactors=FALSE)

for(i in 1:4)
{
if (i==1) b<-Andrew
if (i==2) b<-Burt
if (i==3) b<-Charlie
if (i==4) b<-Dave

pr <- Inp_dat$p[i]
SE_pr <- Inp_dat$SE[i]
r<- Inp_dat$r[i]
n<- Inp_dat$n[i]
Case<- Inp_dat$Case[i]
…

out_put[i,]<-data.frame(Case, stdL, stdPP, stdSE, L, PP, PP_SE)

}
out_put

  Case      StdL      StdPP      StdSE         L      MRPP       MRSE
1    1 19.466823 0.16432300 0.03137456 26.002294 0.2080145 0.03804692
2    2  2.334130 0.22566939 0.08962662  5.095703 0.3888451 0.08399101
3    3  2.588678 0.05502765 0.00454159 42.058326 0.4861511 0.02128030
4    4  7.857898 0.18457822 0.04372297  4.705487 0.1193687 0.01921609



The Cases are labelled with integers (1 for Andrew, 2 for Burt, etc.) instead
of the intended text labels Andrew, Burt, Charlie and Dave.

Note that all other columns are correct. Furthermore:

str(Case) 
Factor w/ 4 levels "Andrew","Burt",..: 4

str(out_put)

'data.frame':   4 obs. of  7 variables:
 $ Case  : chr  "1" "2" "3" "4"
 $ StdL : num  19.47 2.33 2.59 7.86
etc
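
A likely explanation for the integers, offered as a minimal sketch rather than a confirmed diagnosis: Case is a factor, and when a factor is assigned into an existing character column, its integer codes are stored as text instead of its labels. The data frame `inp` and the reduced two-column `out` below are hypothetical stand-ins for Inp_dat and out_put.

```r
# Hypothetical stand-in for Inp_dat. Case is made a factor explicitly,
# as read.csv() did by default in R versions before 4.0.
inp <- data.frame(Case = factor(c("Andrew", "Burt")), r = c(0.03, 0.08))

# Empty result frame with a character Case column, as in the post.
out <- data.frame(Case = character(), r = numeric(), stringsAsFactors = FALSE)

for (i in 1:2) {
  Case <- inp$Case[i]              # a factor of length 1
  r <- inp$r[i]
  out[i, ] <- data.frame(Case, r)  # factor assigned into a character column
}

out$Case  # the integer codes as text, not "Andrew" and "Burt"
```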


I have tried changing the line

Case<- Inp_dat$Case[i]
 to

Case<- levels(Inp_dat$Case)[i]

and this gives the following output

  Case      StdL      StdPP      StdSE         L      MRPP       MRSE
1    1 19.466823 0.16432300 0.03137456 26.002294 0.2080145 0.03804692
2    1  2.334130 0.22566939 0.08962662  5.095703 0.3888451 0.08399101
3    1  2.588678 0.05502765 0.00454159 42.058326 0.4861511 0.02128030
4    1  7.857898 0.18457822 0.04372297  4.705487 0.1193687 0.01921609

str(Case) 

chr "Dave"

and 

str(out_put)

'data.frame':   4 obs. of  7 variables:
 $ Case  : chr  "1" "1" "1" "1"
 $ StdL : num  19.47 2.33 2.59 7.86
etc
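
One possible reading of the constant "1", sketched under the assumption that data.frame() is running with its pre-4.0 default of stringsAsFactors=TRUE: the character value is turned back into a factor inside the loop, and that factor has a single level, so its integer code is always 1. The default is written out explicitly below so the behaviour does not depend on the R version; the column `x` is a hypothetical placeholder for the numeric results.

```r
Case <- "Dave"  # plain character, as str(Case) reports in the post

# data.frame() converts the string to a one-level factor
# (stringsAsFactors = TRUE was the default before R 4.0).
row <- data.frame(Case, x = 1, stringsAsFactors = TRUE)
str(row$Case)         # a factor whose only level is "Dave"
as.integer(row$Case)  # its code is 1, whichever name was supplied

out <- data.frame(Case = character(), x = numeric(), stringsAsFactors = FALSE)
out[1, ] <- row       # the code, not the label, lands in the character column
out$Case
```

Separately, levels() returns the levels in sorted order, so levels(Inp_dat$Case)[i] matches row i only because the cases happen to be listed alphabetically in the input file.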


I've also tried adding stringsAsFactors=FALSE, as suggested, to the read.csv
call:
Inp_dat<- read.csv("/File/Input data.csv", stringsAsFactors=FALSE)

This gives the same result as the second output above.
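
A change that may resolve this, given as a sketch and not a tested fix for the full programme: convert Case with as.character() and also pass stringsAsFactors=FALSE to the data.frame() call inside the loop, so that no factor is created at either step. The stand-in `inp`, the reduced two-column out_put and the value given to stdL are hypothetical, since the real calculations are elided in the post.

```r
# Hypothetical stand-in for Inp_dat; the numeric columns are omitted.
inp <- data.frame(Case = c("Andrew", "Burt", "Charlie", "Dave"),
                  stringsAsFactors = TRUE)

out_put <- data.frame(Case = character(), StdL = numeric(),
                      stringsAsFactors = FALSE)

for (i in seq_len(nrow(inp))) {
  Case <- as.character(inp$Case[i])  # the label, not the integer code
  stdL <- i * 1.5                    # placeholder for the elided calculation
  # Keep Case as character when building the one-row data frame.
  out_put[i, ] <- data.frame(Case, stdL, stringsAsFactors = FALSE)
}

out_put
```

With these two changes the Case column should hold the text labels in both older and newer versions of R, because neither step relies on the version-dependent default.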



