[R] Deleting Rows based on Factor and Time Period

Mikkel Grum mi2kelgrum at yahoo.com
Tue Sep 13 18:55:56 CEST 2011


The following will get you the first row for each stock in each ISO week. Is that useful?

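install.packages("surveillance")  # only needed once
library(surveillance)
# ISO 8601 year and week number for each date
alldat$year <- isoWeekYear(alldat$mydate)$ISOYear
alldat$week <- isoWeekYear(alldat$mydate)$ISOWeek
# Sort so that the earliest date within each week comes first
alldat <- alldat[order(alldat$year, alldat$week, alldat$mydate), ]
# Keep only the first row for each year/week/stock combination
alldat[!duplicated(paste(alldat$year, alldat$week, alldat$myeq)), ]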
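
Note that grouping by ISO week is not quite the same as a rolling 7-day window: a Friday and the following Monday are only three days apart but fall in different ISO weeks, so both would be kept. If the rolling reading is what is wanted, here is a minimal base-R sketch. It assumes "within a week" means a gap of 6 days or less from the previous row of the same equity; the names sorted, gap, keep and result are mine, not from the original post.

```r
# Rebuild the example data from the original post
mydate <- as.Date(c("2001-07-02", "2001-07-02", "2001-07-03", "2001-07-03",
                    "2001-07-05", "2001-07-05", "2001-07-10", "2001-07-13",
                    "2010-01-27"))
myeq <- factor(c("FCX.UN.Equity", "TIE.UN.Equity", "FCX.UN.Equity",
                 "TIE.UN.Equity", "FCX.UN.Equity", "TIE.UN.Equity",
                 "TIE.UN.Equity", "L.UN.Equity", "FCX.UN.Equity"))
alldat <- data.frame(mydate, myeq)

# Sort by equity, then date, so each row's predecessor is the previous
# date of the same equity
sorted <- alldat[order(alldat$myeq, alldat$mydate), ]

# Days since the previous row of the same equity (NA for the first row)
gap <- ave(as.numeric(sorted$mydate), sorted$myeq,
           FUN = function(d) c(NA, diff(d)))

# Keep the first row of each equity, plus rows more than 6 days after
# the row before them (assumed reading of "within a week")
keep <- is.na(gap) | gap > 6
result <- sorted[keep, ]

# Restore date order for display
result <- result[order(result$mydate), ]
result
```

Under this reading a chain of closely spaced dates is dropped entirely after its first row, even if the last date in the chain is more than 6 days after the first one. That seems to match the description in the original post, but it is an assumption worth checking.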



----- Original Message -----
From: Anna Dunietz <anna.dunietz at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Tuesday, September 13, 2011 9:04 AM
Subject: [R] Deleting Rows based on Factor and Time Period

Hi All!

I have been messing around with this problem for about a week but to no
avail! The following data has been cut down in order to make my question
reproducible.  The alldat data frame includes 2 columns: 1 date column and 1
factor column (equity names).

mydate<-as.Date(c("2001-07-02","2001-07-02","2001-07-03","2001-07-03","2001-07-05","2001-07-05","2001-07-10","2001-07-13","2010-01-27"),origin="1970-01-01")

myeq<-factor(c("FCX.UN.Equity","TIE.UN.Equity","FCX.UN.Equity","TIE.UN.Equity","FCX.UN.Equity","TIE.UN.Equity","TIE.UN.Equity","L.UN.Equity","FCX.UN.Equity"))

alldat<-data.frame(mydate,myeq)


> alldat
      mydate          myeq
1 2001-07-02 FCX.UN.Equity
2 2001-07-02 TIE.UN.Equity
3 2001-07-03 FCX.UN.Equity
4 2001-07-03 TIE.UN.Equity
5 2001-07-05 FCX.UN.Equity
6 2001-07-05 TIE.UN.Equity
7 2001-07-10 TIE.UN.Equity
8 2001-07-13   L.UN.Equity
9 2010-01-27 FCX.UN.Equity


I group the rows by factor using the split function.  For each factor, I am
interested in deleting the rows whose dates are less than or equal to the
date of the *first* row for that stock + 6 days.  Repeat the preceding
sentence, but instead of *first* use second, third, etc.  In
short, I do not want an equity that has dates within a week of one another
at any point in the data frame/list (depending on if you're looking at
alldat or divall).  For example, for FCX.UN.Equity, I would only want the
row beginning with 2001-07-02 to remain, as well as the row starting with
2010-01-27.  I cannot delete rows immediately because I need all rows in
order to determine which rows to delete.

diveq<-alldat$myeq
divall<-split(alldat,diveq)

I try to pick out those rows that I want to delete by using a double loop
(inefficient and awful, I know).  For better or for worse, the double loop
does not work.  I get integer(0) for all elements of workin.  I put the
second condition in the which function, so that the first date is saved.  I
use the third condition, so that the dates looked at are all greater than or
equal to the date being looked at.  I have spent many, many hours on this
and still cannot figure it out.

workin<-list()
for(j in 1:length(divall)){
  for(i in 1:nrow(divall[[j]])){
    workin[[j]]<-which(divall[[j]][,1]<=divall[[j]][i,1]+6 &
                       divall[[j]][,1]!=divall[[j]][i,1] &
                       divall[[j]][,1]>=divall[[j]][i,1])
  }
}

If I could get the workin list to work, I would use unique and unlist in
order to find the index that would show me which rows in divall/alldat need
to be deleted.
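
A likely reason for the integer(0) results: the inner loop assigns to workin[[j]] on every pass of i, so only the result for the last row of each group survives, and the last (latest) row has no later dates to match. The sketch below accumulates the matches instead of overwriting them. It keeps the same three conditions, collapsed into two comparisons, and the names d, flagged, hits, kept and cleaned are introduced here purely for illustration.

```r
# Rebuild the example data and the split list from the post
mydate <- as.Date(c("2001-07-02", "2001-07-02", "2001-07-03", "2001-07-03",
                    "2001-07-05", "2001-07-05", "2001-07-10", "2001-07-13",
                    "2010-01-27"))
myeq <- factor(c("FCX.UN.Equity", "TIE.UN.Equity", "FCX.UN.Equity",
                 "TIE.UN.Equity", "FCX.UN.Equity", "TIE.UN.Equity",
                 "TIE.UN.Equity", "L.UN.Equity", "FCX.UN.Equity"))
alldat <- data.frame(mydate, myeq)
divall <- split(alldat, alldat$myeq)

workin <- vector("list", length(divall))
names(workin) <- names(divall)

for (j in seq_along(divall)) {
  d <- divall[[j]]$mydate
  flagged <- integer(0)
  for (i in seq_along(d)) {
    # Rows dated after d[i] but no more than 6 days after it; equivalent
    # to the original "<= d+6 & != d & >= d" test
    hits <- which(d > d[i] & d <= d[i] + 6)
    flagged <- c(flagged, hits)   # accumulate instead of overwriting
  }
  workin[[j]] <- unique(flagged)
}

# Drop the flagged rows from each group, then stack the groups again
kept <- Map(function(df, idx) if (length(idx)) df[-idx, , drop = FALSE] else df,
            divall, workin)
cleaned <- do.call(rbind, kept)
cleaned
```

The length(idx) check matters because df[-integer(0), ] would select no rows at all rather than all of them.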

I hope this has been clear.  Please let me know if you need any more
information!

Thank you very much!
Anna





