[R] grep problem decimal points looping

David Winsemius dwinsemius at comcast.net
Tue Aug 10 16:12:51 CEST 2010


On Aug 10, 2010, at 9:51 AM, David Winsemius wrote:

>
> On Aug 10, 2010, at 9:17 AM, RCulloch wrote:
>
>>
>> Hi R Users,
>>
>> I have been trying to work out how to rename column names using grep.
>> Basically, I have generated these column names using tapply:
>>
>>   [1] "NAME"  "X1.1"  "X2.1"  "X3.1"  "X4.1"  "X5.1"  "X6.1"  "X7.1"  "X8.1"
>>  [10] "X1.2"  "X2.2"  "X3.2"  "X4.2"  "X5.2"  "X6.2"  "X7.2"  "X8.2"  "X1.3"
>>  [19] "X2.3"  "X3.3"  "X4.3"  "X5.3"  "X6.3"  "X7.3"  "X8.3"  "X1.5"  "X2.5"
>>  [28] "X3.5"  "X4.5"  "X5.5"  "X6.5"  "X7.5"  "X8.5"  "X1.6"  "X2.6"  "X3.6"
>>  [37] "X4.6"  "X5.6"  "X6.6"  "X7.6"  "X8.6"  "X1.8"  "X2.8"  "X3.8"  "X4.8"
>>  [46] "X5.8"  "X6.8"  "X7.8"  "X8.8"  "X1.9"  "X2.9"  "X3.9"  "X4.9"  "X5.9"
>>  [55] "X6.9"  "X7.9"  "X8.9"  "X1.10" "X2.10" "X3.10" "X4.10" "X5.10" "X6.10"
>>  [64] "X7.10" "X8.10" "X1.12" "X2.12" "X3.12" "X4.12" "X5.12" "X6.12" "X7.12"
>>  [73] "X8.12" "X1.13" "X2.13" "X3.13" "X4.13" "X5.13" "X6.13" "X7.13" "X8.13"
>>  [82] "X1.14" "X2.14" "X3.14" "X4.14" "X5.14" "X6.14" "X7.14" "X8.14" "X1.15"
>>  [91] "X2.15" "X3.15" "X4.15" "X5.15" "X6.15" "X7.15" "X8.15" "X1.16" "X2.16"
>> [100] "X3.16" "X4.16" "X5.16" "X6.16" "X7.16" "X8.16" "X1.17" "X2.17" "X3.17"
>> [109] "X4.17" "X5.17" "X6.17" "X7.17" "X8.17" "X1.18" "X2.18" "X3.18" "X4.18"
>> [118] "X5.18" "X6.18" "X7.18" "X8.18" "X1.19" "X2.19" "X3.19" "X4.19" "X5.19"
>> [127] "X6.19" "X7.19" "X8.19" "X1.20" "X2.20" "X3.20" "X4.20" "X5.20" "X6.20"
>> [136] "X7.20" "X8.20" "X1.21" "X2.21" "X3.21" "X4.21" "X5.21" "X6.21" "X7.21"
>> [145] "X8.21" "X1.22" "X2.22" "X3.22" "X4.22" "X5.22" "X6.22" "X7.22" "X8.22"
>> [154] "X1.23" "X2.23" "X3.23" "X4.23" "X5.23" "X6.23" "X7.23" "X8.23" "X1.24"
>> [163] "X2.24" "X3.24" "X4.24" "X5.24" "X6.24" "X7.24" "X8.24" "X1.25" "X2.25"
>> [172] "X3.25" "X4.25" "X5.25" "X6.25" "X7.25" "X8.25" "X1.26" "X2.26" "X3.26"
>> [181] "X4.26" "X5.26" "X6.26" "X7.26" "X8.26" "X1.27" "X2.27" "X3.27" "X4.27"
>> [190] "X5.27" "X6.27" "X7.27" "X8.27" "X1.28" "X2.28" "X3.28" "X4.28" "X5.28"
>> [199] "X6.28" "X7.28" "X8.28" "X1.29" "X2.29" "X3.29" "X4.29" "X5.29" "X6.29"
>> [208] "X7.29" "X8.29" "X1.30" "X2.30" "X3.30" "X4.30" "X5.30" "X6.30" "X7.30"
>> [217] "X8.30" "X1.31" "X2.31" "X3.31" "X4.31" "X5.31" "X6.31" "X7.31" "X8.31"
>> [226] "X1.32" "X2.32" "X3.32" "X4.32" "X5.32" "X6.32" "X7.32" "X8.32" "X1.33"
>> [235] "X2.33" "X3.33" "X4.33" "X5.33" "X6.33" "X7.33" "X8.33"
>>
>> The names have the form behaviour.day. The X is not important to the
>> data; it is the numbers I am trying to select on.
>>
>> So I want to split the data by day, i.e. selecting on the number after
>> the decimal point.
>>
>> I am using this code (where scananal is the data) without looping, so I
>> change the number following the decimal point manually (NB the data have
>> been changed to character):
>>
>
> You need to learn the special character "$", which matches the
> (zero-width) end of the string. After creating a replica of your column
> names with scan() and grep():
> inp <- scan(what="character")
> inX <- inp[grep("X", inp)]
>
> > DAY <- grep("(X[[:digit:]]+)\\.3$",inX)
> > inX[DAY]
> [1] "X1.3" "X2.3" "X3.3" "X4.3" "X5.3" "X6.3" "X7.3" "X8.3"
>
>> DAY <- grep("(X[[:digit:]]+).3",colnames(scananal))
>>
>> However, this will select days 3, 30, 31, 32, etc. I have tried to use
>> fixed = TRUE, but that just returns integer(0). If I use 30, though, it
>> selects only day 30. I'm not sure what I'm doing wrong here; I assumed
>> that fixed = T would fix this, but it doesn't.
>>
>> I have tried to loop this too, but with no luck, so if anyone can point
>> me in the right direction about how to loop using grep I would be most
>> grateful!
>>
>> The main problem I have is where to put the loop, for example:
>>
>> for(i in 1:33){
>> print(i)
>> DAY[[i]] <- grep("(X[[:digit:]]+).[[i]]",colnames(scananal))
>> }

I hit the send button a bit prematurely. I have not figured out what
sort of process or result you hope to achieve, but perhaps showing how
to improve the use of grep inside a loop will help:

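DAY <- vector("list", 33)   # one slot per day; days with no match stay NULL
for (i in 1:33) {
    # "\\." matches a literal dot; "$" pins the day number to the end of the name
    patt <- paste("(X[[:digit:]]+)\\.", i, "$", sep = "")
    hits <- grep(patt, inX)
    if (length(hits) > 0) DAY[i] <- list(hits)   # wrap in list() to keep all positions
}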

 > DAY[1:5]
[[1]]
[1] 1 2 3 4 5 6 7 8

[[2]]
[1]  9 10 11 12 13 14 15 16

[[3]]
[1] 17 18 19 20 21 22 23 24

[[4]]
NULL

[[5]]
[1] 25 26 27 28 29 30 31 32


This first constructs a pattern for each day by pasting the loop counter
into the regular expression. It also needs to test whether there are any
results at each iteration, because there are no names with day == 4.
Unless you supply the result of grep() as a list, DAY[i] records only the
first matching position for each day (R warns that the number of items to
replace does not match the replacement length), so it gives you only the
starting locations. If you clarified what you will be doing with this DAY
construct, there might be more of a target to shoot for. For the moment,
you could use lapply() on those column numbers.
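
As a rough sketch of that lapply idea, and assuming the goal is one group of column positions per day (the original post does not say), the day number can be stripped out with sub() and the positions grouped with split(), which avoids both the loop and the empty-day test. The days vector and the rebuilt inX below are stand-ins reconstructed from the names in the question, and by_day and names_by_day are hypothetical names:

```r
# Rebuild the 240 "X<behaviour>.<day>" names from the question
# (days 4, 7 and 11 do not occur in the posted names).
days <- c(1:3, 5:6, 8:10, 12:33)
inX  <- paste("X", rep(1:8, times = length(days)), ".",
              rep(days, each = 8), sep = "")

# Drop everything up to and including the literal dot, leaving the day number.
day <- as.integer(sub("^X[[:digit:]]+\\.", "", inX))

# One list element per day that actually occurs, named by the day number.
by_day <- split(seq_along(inX), day)

by_day[["3"]]    # positions of the day-3 names only
by_day[["30"]]   # day 30 forms its own group, separate from day 3

# lapply() over the groups, here just to recover the names in each one.
names_by_day <- lapply(by_day, function(idx) inX[idx])
```

With the real data the same positions could index colnames(scananal) instead of inX, provided the "NAME" column is accounted for first.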

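The effect of the "$" anchor on the pattern from the question can be checked on a handful of names. This is only an illustration on a made-up subset (nm is a hypothetical vector, not the poster's data). It also suggests why fixed = TRUE came back empty, since the pattern is then searched for as a literal string, and why "[[i]]" written inside the quotes is never replaced by the loop counter:

```r
nm <- c("X1.3", "X2.3", "X1.13", "X1.30", "X2.31")   # made-up subset of names

# Unanchored: anything that starts with "X<digits>.3" matches,
# so X1.30 and X2.31 are picked up along with the day-3 names.
loose <- grep("(X[[:digit:]]+).3", nm)

# Literal dot plus "$": the "3" must be the last character of the name.
strict <- grep("(X[[:digit:]]+)\\.3$", nm)

# fixed = TRUE looks for the pattern text itself, which no name contains.
literal <- grep("(X[[:digit:]]+).3", nm, fixed = TRUE)

# The loop counter has to be pasted into the pattern string explicitly.
i <- 30
patt <- paste("(X[[:digit:]]+)\\.", i, "$", sep = "")
day30 <- grep(patt, nm, value = TRUE)   # value = TRUE returns the names themselves
```
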

David Winsemius, MD
West Hartford, CT


