[R] Line similarity

Tue Apr 30 22:47:58 CEST 2013

Here is one way to do it: for each row of the data.frame v, regress the
numbers in columns 2 through 4 on the numbers 1 through 3, keep only the
slope, and then add a column saying whether the slope is greater than zero.

> v[,"Beta"] <- vapply(seq_len(nrow(v)),
      FUN=function(i) coef(lm(value ~ year,
                              data=data.frame(value=as.numeric(v[i,2:4]),
                                              year=seq_len(3))))[2],
      FUN.VALUE=0)
> v[,"Growing"] <- v[,"Beta"] > 0
> v
  Name Year_1_value Year_2_value Year_3_value Beta Growing
1    A            1            2            3  1.0    TRUE
2    B            2            7           19  8.5    TRUE
3    C            3            4            2 -0.5   FALSE
4    D           10            7            6 -2.0   FALSE
5    E            4            4            5  0.5    TRUE
6    F           NA            3            6  3.0    TRUE
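
The question also asked for a "Stable" class. A minimal sketch of one way to get
three classes from the slopes is to cut them at a tolerance. The data frame is
retyped from the question, and the tolerance of 0.75 (the name tol is made up
for this example) is an assumption chosen only because it reproduces the labels
in the question; it is not Spotfire's rule.

```r
# Data retyped from the original question.
v <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                Year_1_value = c(1, 2, 3, 10, 4, NA),
                Year_2_value = c(2, 7, 4, 7, 4, 3),
                Year_3_value = c(3, 19, 2, 6, 5, 6))

# Slope of value on year for each row, as in the reply above.
v[, "Beta"] <- vapply(seq_len(nrow(v)), function(i) {
    d <- data.frame(value = as.numeric(v[i, 2:4]), year = seq_len(3))
    unname(coef(lm(value ~ year, data = d))[2])
}, FUN.VALUE = 0)

# Hypothetical tolerance: slopes within (-tol, tol] are called "Stable".
tol <- 0.75
v[, "Trend"] <- cut(v[, "Beta"],
                    breaks = c(-Inf, -tol, tol, Inf),
                    labels = c("Declining", "Stable", "Growing"))
v[, c("Name", "Beta", "Trend")]
```

A sensible tolerance depends on the scale of the data, so it would need to be
chosen for the real data set.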

Since you are doing least-squares regression in which the predictors are the
same for all regressions (except the one with the NA in it) you can also do
> coef(lm(value ~ year, list(value=t(as.matrix(v[1:5,2:4])), year=seq_len(3))))[2,]
   1    2    3    4    5 
 1.0  8.5 -0.5 -2.0  0.5
but you then have to make a special case for each pattern of missing values.
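
One way to handle those special cases, as a sketch only: group the rows by their
pattern of missing values and do one multi-response fit per group, using only
the years observed in that group. The data frame is retyped from the question;
the names y, pattern and beta are made up for this example, and lm.fit() is
used instead of the formula interface.

```r
# Data retyped from the original question.
v <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                Year_1_value = c(1, 2, 3, 10, 4, NA),
                Year_2_value = c(2, 7, 4, 7, 4, 3),
                Year_3_value = c(3, 19, 2, 6, 5, 6))

y <- as.matrix(v[, 2:4])
year <- seq_len(ncol(y))

# One key per row recording which years are missing, e.g. "TRUEFALSEFALSE".
pattern <- apply(is.na(y), 1, paste, collapse = "")

beta <- rep(NA_real_, nrow(y))
for (p in unique(pattern)) {
    rows <- which(pattern == p)
    keep <- !is.na(y[rows[1], ])   # years observed for this pattern
    if (sum(keep) < 2) next        # a slope needs at least two observed years
    # Each row of v becomes one response column in a single fit.
    fit <- lm.fit(cbind(1, year[keep]), t(y[rows, keep, drop = FALSE]))
    beta[rows] <- matrix(fit$coefficients, nrow = 2)[2, ]
}
beta
```

For this data there are two patterns (complete rows, and row F with year 1
missing), so only two fits are done instead of six.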

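If you always use a period of 3 consecutive years, the least-squares slope is
(Year_3_value - Year_1_value)/2, so its sign depends only on the first and
last values and you can use
   Growing <- v[,"Year_1_value"] < v[, "Year_3_value"]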
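
A quick check of that shortcut against the fitted slopes, as a sketch. The data
frame is retyped from the question, and shortcut_slope, fitted_slope and
complete are names made up for this example. Note that row F has no year 1
value, so the comparison gives NA there, whereas lm() drops the NA and fits the
remaining two points.

```r
# Data retyped from the original question.
v <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                Year_1_value = c(1, 2, 3, 10, 4, NA),
                Year_2_value = c(2, 7, 4, 7, 4, 3),
                Year_3_value = c(3, 19, 2, 6, 5, 6))

# With years 1, 2, 3 the least-squares slope reduces to (y3 - y1)/2.
shortcut_slope <- (v[, "Year_3_value"] - v[, "Year_1_value"]) / 2

# Slopes from one regression per row, for comparison.
fitted_slope <- vapply(seq_len(nrow(v)), function(i) {
    d <- data.frame(value = as.numeric(v[i, 2:4]), year = seq_len(3))
    unname(coef(lm(value ~ year, data = d))[2])
}, FUN.VALUE = 0)

# Compare only the rows where both end points are observed.
complete <- !is.na(v[, "Year_1_value"]) & !is.na(v[, "Year_3_value"])
all.equal(shortcut_slope[complete], fitted_slope[complete])

Growing <- v[, "Year_1_value"] < v[, "Year_3_value"]   # NA for row F
```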

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Satsangi, Vivek (GE Capital)
> Sent: Tuesday, April 30, 2013 12:57 PM
> To: r-help at r-project.org
> Subject: [R] Line similarity
> 
> Folks,
> 
> This is probably a "help me google this properly, please" type of question.
> 
> In TIBCO Spotfire, there is a procedure called "line similarity". I use this to
> determine which observations show a growing, stable or declining pattern... sort of like a
> mini-regression on the time-line for each observation.
> 
> So if the input is something like this:
> 
> Name Year_1_value Year_2_value Year_3_value
> A 1 2 3
> B 2 7 19
> C 3 4 2
> D 10 7 6
> E 4 4 5
> F NA 3 6
> 
> Then the desired output is as follows:
> A Growing
> B Growing
> C Stable
> D Declining
> E Stable
> F Growing (or NA is also fine)
> 
> The data can also be unstacked, i.e. the three years could be separate rows if
> necessary.
> Is there a package for R that implements something like the above? I can
> obviously try to do a set of simple regressions to classify the rows, but I want to gain from
> the thoughts and learnings of others who may have taken the time to implement a
> package.
> I tried searching with the words "line similarity" or its variants to no avail.
> 
> Thanks in advance for your pointers!
> 
> Vivek Satsangi
> GE Capital
> Americas