[R] Help with Order

Duncan Murdoch murdoch at stats.uwo.ca
Mon Jan 11 13:49:14 CET 2010


On 11/01/2010 7:37 AM, Steve Sidney wrote:
> Dear List
> 
> As a fairly new R programmer, I seem to have run into a strange problem, 
> probably due to my inexperience with R.
> 
> After reading and merging successive files into a single data frame, I find 
> that order() does not sort the data as expected.
> 
> I have multiple references in each file but each file refers to measurement 
> data obtained at a different time.
> 
> Here's the code
> 
> library(reshape)
> # Enter file name to Read & Save data
> FileName=readline("Enter File name:\n")
> # Find first occurrence of file
> for ( round1 in 1 : 6) {
> ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
> if (file.exists(ReadFile))
> break
> }
> 
> x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
> # Merge every later round that exists into z, starting from x
> z = x
> for ( round2 in seq_len(6)[-seq_len(round1)]) {
> #
> ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
> if (file.exists(ReadFile)) {
> y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
>     z = merge(y,z,all=TRUE)
> }
> }
> ordered = order(z$lab_id)
> 
> results = z[ordered,]
> 
> res = data.frame(lab=results[,"lab_id"], bw=results[,"ZBW"],
>     wi=results[,"ZWI"], pf_zbw=0, pf_zwi=0, r=results[,"rnd"])
> 
> 
> #
> # Establish no of samples recorded
> nsmpls = length(res[,c("lab")])
> 
> # Evaluate Z_scores for Between Lab Results
> for ( i in 1 : nsmpls) {
> if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
> res[i,"pf_zbw"]=1
> }
> # Evaluate Z_scores for Within Lab Results
> for ( i in 1 : nsmpls) {
> if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
> res[i,"pf_zwi"]=1
> }
> 
> dd = melt(res, id=c("lab","r"), "pf_zbw")
> b = cast(dd, lab ~ r)
> 
> If anyone could see why the ordering only works for about 55 of 70 records 
> and could steer me in the right direction, I would be obliged.

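I can't try out your code, but I'd guess it's due to conversion of 
strings to factors.  Sorting a factor orders it by its underlying integer 
codes (i.e. by the order of its levels), not alphabetically by the strings.  
When data frames whose factors have different level sets are merged, the 
combined levels are not necessarily in alphabetical order, which would 
explain why only part of your result comes out sorted.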
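
A minimal sketch of the effect, using two small made-up data frames in place of the CSV files (the lab IDs and rnd values are hypothetical). The columns are built with factor() explicitly, so the example behaves the same whatever read.csv()'s default is in your R version. merge(), like rbind(), combines the levels of the two factors in the order it encounters them, so order() on the merged column follows those levels rather than the alphabet:

```r
# Hypothetical stand-ins for two rounds of data; lab_id is a factor,
# as read.csv() produced by default in older versions of R.
x <- data.frame(lab_id = factor(c("B", "D")), rnd = 1)
y <- data.frame(lab_id = factor(c("A", "C")), rnd = 2)

# Merge as in the original code; the combined factor keeps x's levels
# first and appends y's new levels after them.
z <- merge(x, y, all = TRUE)
print(levels(z$lab_id))

# order() on a factor sorts by the integer codes behind the levels ...
by_codes <- as.character(z$lab_id[order(z$lab_id)])
# ... while order() on the strings sorts alphabetically.
by_strings <- as.character(z$lab_id[order(as.character(z$lab_id))])

print(by_codes)
print(by_strings)
```

Comparing the two printed vectors shows the partial ordering: the rows are grouped by level, not by the lab ID text.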

So the solution is to set stringsAsFactors=FALSE, either in each 
read.csv() call or globally with options(stringsAsFactors=FALSE).
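
A sketch of the per-call fix, assuming two small hypothetical CSV inputs supplied through read.csv()'s text argument instead of the real round files. With stringsAsFactors = FALSE, lab_id stays a character vector through the merge, so order() sorts it alphabetically. Wrapping the column in as.character() inside order() is an alternative that leaves the data frame itself unchanged.

```r
# Hypothetical contents of two round files (lab_id plus one score column)
csv_round1 <- "lab_id,ZBW\nB,1.2\nD,-0.4"
csv_round2 <- "lab_id,ZBW\nA,3.5\nC,0.1"

# Keep lab_id as character rather than factor in each read.csv() call
x <- read.csv(text = csv_round1, header = TRUE, stringsAsFactors = FALSE)
x$rnd <- 1
y <- read.csv(text = csv_round2, header = TRUE, stringsAsFactors = FALSE)
y$rnd <- 2

# Merge in the same argument order as the original loop, then sort
z <- merge(y, x, all = TRUE)
results <- z[order(z$lab_id), ]
print(results)
```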

Duncan Murdoch