[R] How do I identify non-sequential data?

David Reiner David.Reiner at xrtrading.com
Fri Nov 22 17:16:41 CET 2013


Similar to Don MacQueen's:

unsplit(lapply(split(DF, DF$ID), transform, cv = c(0, diff(YoS))), DF$ID)
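For anyone reading this in the archive, here is a small sketch of what that one-liner gives on Dan's sample data. DF is assumed to be the data frame from the original post (only ID, Year and YoS are rebuilt here), and the 1e-8 tolerance and the name bad_ids are my own additions for illustration, not part of the thread.

```r
# Sample data from the original post (CheckVal and dept omitted).
DF <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6)
)

# Split by ID, add the within-ID difference (0 for the first row of each ID),
# then reassemble the pieces in the original row order.
res <- unsplit(lapply(split(DF, DF$ID), transform, cv = c(0, diff(YoS))), DF$ID)

# IDs whose YoS jumps by more than one between consecutive rows.
# The tolerance is an assumption added here to guard against floating-point noise.
bad_ids <- unique(res$ID[res$cv > 1 + 1e-8])

# All rows belonging to those IDs.
res[res$ID %in% bad_ids, ]
```

The cv column plays the role of Dan's CheckVal, so the last line should return every row for the IDs with a jump.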

-- David Reiner


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lopez, Dan
Sent: Thursday, November 21, 2013 6:38 PM
To: MacQueen, Don
Cc: R help (r-help at r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

Hi Don,

Yes, I am error checking a dataset produced by a query.  It is most likely a problem with the query, but I wanted to assess the extent of the problem first.

BTW Arun provided another solution which is similar to yours but uses the function ave instead:
 testSeq[!!(with(testSeq,ave(YoS,ID,FUN=function(x) any(c(0,diff(x))>1)))),]
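A slightly unpacked sketch of the same idea, in case it helps: ave() applies the function within each ID and returns a vector as long as YoS, so every row of an ID carries that ID's flag. The tolerance, the name flag, and the use of as.logical() in place of `!!` are my own choices here, and only ID, Year and YoS are rebuilt from the sample data.

```r
# Sample data from the original post (CheckVal and dept omitted).
testSeq <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6)
)

# One flag per row: 1 if any consecutive YoS difference within the ID exceeds 1.
# ave() stores the logical result in a numeric vector, hence 0/1 rather than FALSE/TRUE.
# The 1e-8 tolerance is an assumption added to guard against floating-point noise.
flag <- with(testSeq, ave(YoS, ID, FUN = function(x) any(diff(x) > 1 + 1e-8)))

# as.logical() does the same job as the double negation `!!`.
testSeq[as.logical(flag), ]
```

Dropping the leading 0 from c(0, diff(x)) does not change the result, since 0 is never greater than 1.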

I appreciate your response on this.
Dan


-----Original Message-----
From: MacQueen, Don
Sent: Thursday, November 21, 2013 3:58 PM
To: Lopez, Dan; R help (r-help at r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

Dan,
Does this do it?

## where dt is the data
tmp <- split(dt, dt$ID)
foo <- lapply(tmp, function(x) any(diff(x$YoS) > 1))
foo <- data.frame(ID = names(foo), gap = unlist(foo))

Note that I ignored dept.
Little hard to see how YoS can increase by more than one when the year increases by only one ... unless this is a search for erroneous data.
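The code above gives one TRUE/FALSE per ID. If the rows themselves are wanted, something along these lines might work. It is only a sketch: dt is assumed to be the sample data from the original post (ID, Year and YoS only), and the names gap and gap_ids and the small tolerance are illustrative additions.

```r
# Sample data from the original post (CheckVal and dept omitted).
dt <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6)
)

tmp <- split(dt, dt$ID)

# Named logical vector, one entry per ID; names are the IDs as character strings.
# The 1e-8 tolerance is an assumption added to guard against floating-point noise.
gap <- sapply(tmp, function(x) any(diff(x$YoS) > 1 + 1e-8))

# IDs with at least one jump of more than one in YoS.
gap_ids <- names(gap)[gap]

# Every row for those IDs, in the original order.
dt[as.character(dt$ID) %in% gap_ids, ]
```

Because split() turns ID into a factor, the names come back as strings, which is why the comparison goes through as.character().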

-Don



--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/21/13 3:32 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote:

>Hi R Experts,
>
>About the data:
>My data consists of people (ID) with years of service (YoS) for each
>year. An ID can appear multiple times.
>The data is sorted by ID then by Year.
>
>Problem:
>I need to extract ID data with non-sequential YoS rows. For example
>below that would be all rows for ID 33 and 16 since they have a
>non-sequential YoS.
>To accomplish this I figured I could create a column called 'CheckVal'
>that takes current row YoS minus previous row YoS. The first instance
>for each ID will be 0. 'CheckVal' in the below data set was created in Excel.
>I want to know how to do this in R.
>Is there a package I can use or specific function or set of functions I
>can use to accomplish this?
>
>#My data looks like:
>> testSeq
>   ID Year YoS CheckVal dept
>1  12 2010 1.1      0.0    A
>2  12 2011 2.1      1.0    A
>3  44 2009 1.4      0.0    C
>4  44 2010 2.4      1.0    C
>5  44 2011 3.4      1.0    B
>6  33 2009 2.3      0.0    A
>7  33 2010 4.4      2.1    A
>8  16 2009 1.6      0.0    B
>9  16 2010 2.6      1.0    B
>10 16 2011 5.6      3.0    C
>11 16 2012 6.6      1.0    A
>
>#here is dput of data for R
>structure(list(ID = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16,
>16), Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009,
>2010, 2011, 2012), YoS = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4,
>1.6, 2.6, 5.6, 6.6), CheckVal = c(0, 1, 0, 1, 1, 0, 2.1, 0, 1,
>3, 1), dept = structure(c(1L, 1L, 3L, 3L, 2L, 1L, 1L, 2L, 2L,
>3L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("ID",
>"Year", "YoS", "CheckVal", "dept"), row.names = c(NA, 11L), class =
>"data.frame")
>
>Dan
>Workforce Analyst
>LLNL
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

