[R] Stringr / Regular Expressions advice

arun smartpink111 at yahoo.com
Wed Jul 2 05:00:00 CEST 2014


#or 

res <- mapply(`%in%`, accel_data, v.to.match)

res1 <- sapply(seq_len(ncol(accel_data)),function(i) accel_data[i]<=tail(v.to.match[[i]],1) & accel_data[i] >=v.to.match[[i]][1])

all.equal(res, res1, check.attributes = FALSE)
#[1] TRUE

A.K.

On Tuesday, July 1, 2014 10:56 PM, arun <smartpink111 at yahoo.com> wrote:
Hi Vincent,

You could try:
v.to.match <- list(438:445, 454:460,459:470)

sapply(seq_len(ncol(accel_data)),function(i) accel_data[i]<=tail(v.to.match[[i]],1) & accel_data[i] >=v.to.match[[i]][1])

#or use ?cut or ?findInterval
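
A minimal sketch of the findInterval() route, assuming each range is stored as a (lower, upper) pair. The four-row data frame and the names `bounds`, `in_range` and `row_match` are made up for illustration and are not part of the original data:

```r
# Hypothetical four-row sample shaped like accel_data
accel_data <- data.frame(
  x_reading = c(455L, 445L, 440L, 439L),
  y_reading = c(502L, 456L, 497L, 454L),
  z_reading = c(454L, 462L, 437L, 470L)
)

# One (lower, upper) pair per column, same order as the columns
bounds <- list(c(438, 445), c(454, 460), c(459, 470))

# findInterval() returns 1 for values inside [lower, upper] when
# rightmost.closed = TRUE, 0 below the lower bound and 2 above the upper one
in_range <- mapply(
  function(col, b) findInterval(col, b, rightmost.closed = TRUE) == 1L,
  accel_data, bounds
)

# A row matches when all three readings fall inside their ranges
row_match <- rowSums(in_range) == ncol(in_range)
which(row_match)  # which rows match
sum(row_match)    # how many rows match
```

Unlike the %in% version, this also works for non-integer readings, since it compares against the endpoints rather than against an enumerated set of values.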

A.K.







On Tuesday, July 1, 2014 2:23 PM, VINCENT DEAN BOYCE <vincentdeanboyce at gmail.com> wrote:
Sara,

Yes, I modified the code that you provided and it worked quite well. Here
is the revised code:

.....

accel_data <- data
# pattern to be identified
v.to.match <- c(438, 454, 459)
# rerun the line below whenever the "v.to.match" criteria change, to
# ensure the match is updated
v.matches <- apply(accel_data, 1, function(x) all(x == v.to.match))
which(v.matches)
[1] 405
sum(v.matches)
[1] 1

......

Again, here is the dataset:

> dput(head(accel_data, 20))

structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
"z_reading"), row.names = c(NA, 20L), class = "data.frame")

My next goal is to extend the range for each column. For instance:

v.to.match <- c(438:445, 454:460, 459:470)
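
A short sketch of why the c() form above probably does not do what is intended: c() flattens the three ranges into one long vector, so the per-column grouping is lost, whereas list() keeps one range per column. The names `flat`, `ranges`, `row` and `row_ok` are hypothetical:

```r
# c() concatenates the ranges into a single vector of 27 values
flat <- c(438:445, 454:460, 459:470)
length(flat)

# list() keeps three separate ranges, one per column
ranges <- list(438:445, 454:460, 459:470)
length(ranges)
lengths(ranges)

# Checking one hypothetical row against the per-column ranges
row <- c(x_reading = 440L, y_reading = 456L, z_reading = 462L)
row_ok <- all(mapply(`%in%`, row, ranges))
row_ok
```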

Your thoughts?

Many thanks,

Vincent








On Fri, Jun 27, 2014 at 5:51 AM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:

> Hi,
>
> It's a good idea to copy back to the list, not just to me, to keep the
> discussion all in one place.
>
>
> On Thursday, June 26, 2014, VINCENT DEAN BOYCE <vincentdeanboyce at gmail.com>
> wrote:
>
>> Sarah,
>>
>> Great feedback and direction. Here is the data I am working with*:
>>
>> > dput(head(data_log, 20))
>>
>> structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
>> 448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
>> 439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
>> 505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
>> 469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
>> 446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
>> 455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
>> "z_reading"), row.names = c(NA, 20L), class = "data.frame")
>>
>> *however, I am unsure why the letter "L" has been appended to each
>> numerical string.
>>
>
> It denotes values stored as integers, and is nothing you need to worry
> about.
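
A small illustration of the "L" suffix, using made-up values `x` and `y`: the suffix only changes the storage type, not the value used in comparisons.

```r
x <- 455L   # integer literal, as printed by dput()
y <- 455    # the same value stored as a double

class(x)    # "integer"
class(y)    # "numeric"

# Equality comparisons ignore the storage type ...
x == y
# ... but identical() does not
identical(x, y)
```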
>
>
>> In any event, as you can see there are three columns of data named
>> x_reading, y_reading and z_reading. I would like to detect patterns among
>> them.
>>
>> For instance, let's say the pattern I wish to detect is 455, 502, 454
>> across the three columns respectively. As you can see in the data, this is
>> found in the first row. This particular string recurs numerous times
>> within the dataset, and that is what I wish to quantify: how many times the
>> string 455, 502, 454 appears.
>>
>> Your thoughts?
>>
>
> Did you try the code I provided? It does what I think you're looking for.
>
> Sarah
>
>
>> Many thanks,
>>
>> Vincent
>>
>>
>> On Thu, Jun 26, 2014 at 4:46 PM, Sarah Goslee <sarah.goslee at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On Thu, Jun 26, 2014 at 12:17 PM, VINCENT DEAN BOYCE
>>> <vincentdeanboyce at gmail.com> wrote:
>>> > Hello,
>>> >
>>> > Using R, I've loaded a .csv file comprising several hundred rows and 3
>>> > columns of data. The data within maps the output of a triaxial
>>> > accelerometer, a sensor which measures an object's acceleration along the
>>> > x, y and z axes. The data for each respective column sequentially
>>> > oscillates, and ranges numerically from 100 to 500.
>>>
>>> If your data are numeric, why are you using stringr?
>>>
>>> It would be easier to provide you with an answer if we knew what your
>>> data looked like.
>>>
>>> dput(head(yourdata, 20))
>>>
>>> and paste that into your non-HTML email.
>>>
>>> > I want to create a function that parses the data and detects patterns
>>> > across the three columns.
>>> >
>>> > For instance, I would like to detect instances when the values for the
>>> > x, y and z columns equal 150, 200, 300 respectively. Additionally, when a
>>> > match is detected, I would like to know how many times the pattern appears.
>>>
>>> That's easy enough:
>>>
>>> fakedata <- data.frame(matrix(c(
>>> 100, 100, 200,
>>> 150, 200, 300,
>>> 100, 350, 100,
>>> 400, 200, 300,
>>> 200, 500, 200,
>>> 150, 200, 300,
>>> 150, 200, 300),
>>> ncol=3, byrow=TRUE))
>>>
>>> v.to.match <- c(150, 200, 300)
>>>
>>> v.matches <- apply(fakedata, 1, function(x)all(x == v.to.match))
>>>
>>> # which rows match
>>> which(v.matches)
>>>
>>> # how many rows match
>>> sum(v.matches)
>>>
>>> > I have been successful using str_detect to provide a Boolean, however it
>>> > seems to only work on a single vector, i.e. "400", not a range of values,
>>> > i.e. "400 - 450". See below:
>>>
>>> This is where I get confused, and where we need sample data. Are your
>>> data numeric, as you state above, or some other format?
>>>
>>> If your data are character, and like "400 - 450", you can still match
>>> them with the code I suggested above.
>>>
>>> > # this works
>>> >> vals <- str_detect (string = data_log$x_reading, pattern = "400")
>>> >
>>> > # this also works, but doesn't detect the particular range, rather the
>>> > # existence of the numbers
>>> >> vals <- str_detect (string = data_log$x_reading, pattern = "[400-450]")
>>>
>>> Are you trying to match any numeric value in the range 400-450? Again,
>>> actual data.
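
A brief sketch of why the pattern "[400-450]" does not express a numeric range: inside square brackets a regular expression lists single characters, so this class is just the set of digits 0 to 5. For numeric data a plain comparison is likely simpler. The sample values and the names `as_text`, `readings` and `in_400_450` are made up for illustration:

```r
# "[400-450]" is the character class {4, 0, 0-4, 5, 0}, i.e. any digit 0-5,
# so it matches any string containing at least one of those digits
as_text <- c("123", "678", "999")
grepl("[400-450]", as_text)

# For numeric readings, compare against the endpoints instead
readings <- c(123L, 400L, 425L, 451L)
in_400_450 <- readings >= 400 & readings <= 450
in_400_450
```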
>>>
>>> > Also, it appears that I can only apply it to a single column, not to all
>>> > three columns. However I may be mistaken.
>>>
>>> You answer your own question unwittingly - apply().
>>>
>>> Sarah
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>
>>
>
> --
> Sarah Goslee
> http://www.stringpage.com
> http://www.sarahgoslee.com
> http://www.functionaldiversity.org
>

    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


