[R] Finding Instances of a Pattern Throughout Data Set

R. Michael Weylandt michael.weylandt at gmail.com
Tue Apr 3 06:28:14 CEST 2012


As a purely R question, though, the OP's suggested code is already
rather efficient. I wouldn't be surprised if a hair more efficiency
could be had by storing sensorReadings[,2] in a variable instead of
extracting it twice, but that is probably a speed/memory trade-off at
that point.
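
A minimal sketch of the "extract once" variant, assuming
sensorReadings is a numeric matrix whose second column holds a
positive-valued amplitude. The simulated data and the names amp and
spikes are made up for illustration, and the OP's outer abs() is left
out to keep the comparison simple:

```r
# Simulated stand-in for the OP's data (an assumption, not the real data):
# column 1 is a time index, column 2 is the amplitude.
set.seed(1)
sensorReadings <- cbind(time = 1:1000, amplitude = rnorm(1000, mean = 10))
sensorReadings[c(100, 500), 2] <- 25   # two artificial spikes

# Extract the column once and reuse it, rather than subsetting twice.
amp <- sensorReadings[, 2]
spikes <- sensorReadings[amp > 1.2 * median(amp), , drop = FALSE]
```

The result is the same set of rows the original one-liner selects; the
only difference is that the column is copied into amp once, which is
the memory side of the trade-off.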

If he has enough data that this one-liner runs up against these sorts
of performance constraints, R might not be the best platform: it's a
rather straightforward problem in C (or friends), and converting to a
rolling/accumulating median algorithm would let this be a single loop
in C and pretty efficient -- also nicely adaptable to real-time work.
stats::runmed provides a starting point if you need code.
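
Staying at the R level, stats::runmed can also be called directly. The
sketch below is one possible rolling-median variant, not the OP's
global-median rule: it flags points that sit far from a running
median. The simulated series, the window width k, the 5 * mad() cutoff,
and the names baseline, dev, and spikeIdx are all assumptions chosen
for illustration:

```r
# Simulated signal (an assumption): slow sine wave plus noise,
# with two artificial spikes added.
set.seed(42)
x <- sin(seq(0, 20, length.out = 2000)) + rnorm(2000, sd = 0.1)
x[c(300, 1200)] <- x[c(300, 1200)] + 3

k <- 51                         # window width; runmed() requires an odd k
baseline <- runmed(x, k)        # running median tracks the slow trend
dev <- x - baseline             # deviation from the local median

# Hypothetical cutoff: 5 robust standard deviations of the deviations.
spikeIdx <- which(abs(dev) > 5 * mad(dev))
```

Because the baseline follows the local level of the series, this
catches spikes even when the signal drifts, which a single global
median would not.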

It's tangential (and perhaps I'm reading too much of myself into the
problem), but this is not an uncommon problem in finance: the OP might
want to look at the TTR package for various ideas and good
implementations.

Michael

On Tue, Apr 3, 2012 at 12:15 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> I strongly suggest you consult with a local statistician. Your
> description is far too vague (to me anyway) to make any sense of and
> probably requires a good deal of back and forth between you and a
> competent data analyst to pin down what the issues and constraints
> are. For example, what constitutes an "interesting" pattern? Do the
> data need to be analyzed in real time? Are visual displays
> sufficient, or is some kind of numeric indication needed? And so on.
>
> One tentative word of advice, though: with this much data, do not get
> embroiled with P values or other measures of statistical
> "significance": anything you can see will be "significant."
>
> -- Bert
>
> On Mon, Apr 2, 2012 at 8:42 PM, Hasan Diwan <hasan.diwan at gmail.com> wrote:
>> I have approximately 2.5 million rows from a number of sensor
>> readings. Having plotted these, I can see a given pattern (say a spike
>> in the amplitude away from the mean). I would now like to automate
>> this procedure as we're expecting a great deal more data in the near
>> future. Is there any package or function that will make this possible?
>> Many thanks! I suppose I could do something like:
>> amplitude <- abs(sensorReadings[sensorReadings[,2] > 1.2 *
>> median(sensorReadings[,2]),])
>> Is this the most efficient way to do what I want or not?
>> -- H
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>


