[R] data and parameters

analyst41 at hotmail.com analyst41 at hotmail.com
Tue Jan 25 03:05:32 CET 2011


Thanks.  I finally got around to implementing it and it works.

But I think the steps to produce master_reduced can be compressed into

master_reduced = merge(master,control)

> master
  clientId date value
1        1 1001 10001
2        2 1002 10002
3        3 1003 10003
4        4 1004 10004
5        2 1005 10005
> control
  clientId mindate maxdate control.params
1        2     100    1005              1
2        3    1005    1005              2


>  merge(master,control)
  clientId date value mindate maxdate control.params
1        2 1002 10002     100    1005              1
2        2 1005 10005     100    1005              1
3        3 1003 10003    1005    1005              2

with the added advantage that clientId doesn't occur twice.  Is this
just a coincidence, or can I use this technique reliably for merges of
this sort?

For comparison, the cbind version of master_reduced repeats clientId:

> master_reduced
  clientId date value clientId mindate maxdate control.params
2        2 1002 10002        2     100    1005              1
3        3 1003 10003        3    1005    1005              2
5        2 1005 10005        2     100    1005              1
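
As far as I understand ?merge, this is not a coincidence: by default it
joins on the column names the two data frames share (here only clientId)
and keeps only the rows that match on both sides (all = FALSE).  A minimal
sketch that names the key explicitly, so the result does not depend on
which names happen to coincide.  The inclusive date bounds (>= and <=) are
my assumption about what "in range" should mean:

```r
# Dummy data from the quoted message below
master  <- data.frame(clientId = c(1:4, 2), date = 1001:1005,
                      value = 10001:10005)
control <- data.frame(clientId = c(2, 3), mindate = c(100, 1005),
                      maxdate = c(1005, 1005), control.params = c(1, 2))

# Inner join on an explicitly named key; clients 1 and 4 are dropped
# because they have no row in control
merged <- merge(master, control, by = "clientId")

# Keep rows whose date lies in [mindate, maxdate]
# (inclusive bounds are an assumption, not part of the original code)
in_range <- merged$date >= merged$mindate & merged$date <= merged$maxdate
final    <- merged[in_range, ]
```

Two things to watch for: if control ever has more than one row per
clientId, merge() returns one output row per matching pair, and any
non-key column name present in both data frames gets a .x/.y suffix.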


On Jan 21, 5:20 am, "Moritz Grenke" <r-l... at 360mix.de> wrote:
> #dummy data:
> master=as.data.frame(list(clientId=c(1:4,2), date=1001:1005,
> value=10001:10005))
> control=as.data.frame(list(clientId=c(2,3), mindate=c(100,1005),
> maxdate=c(1005,1005), control.params=c(1,2)))
>
> #reducing master df:
> #generating "TRUE FALSE index":
> idIndex=master$clientId %in% control$clientId
>
> #choose only those lines where index==TRUE
> master_reduced=master[idIndex,]
> master_reduced
>
> #merging dfs:
> mergingIndex= match(master_reduced$clientId, control$clientId)
> master_reduced=cbind(master_reduced, control[mergingIndex,])
> master_reduced
>
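> #finally choose those lines where date is strictly between mindate and maxdate
> #(use >= and <= instead if dates equal to the endpoints should be kept)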
> dateIndex=master_reduced$date>master_reduced$mindate &
> master_reduced$date<master_reduced$maxdate
> finalDF=master_reduced[dateIndex,]
> finalDF
>
> Hope this helps
> Moritz
> _________________________
> Moritz Grenke http://www.360mix.de
>
> -----Original Message-----
> From: r-help-boun... at r-project.org [mailto:r-help-boun... at r-project.org] On
> Behalf Of analys... at hotmail.com
> Sent: Friday, 21 January 2011 03:02
> To: r-h... at r-project.org
> Subject: [R] data and parameters
>
> (1) I have a master data frame that reads
>
> ClientID |date |value
>
> (2) I also have a control data frame that reads
>
> Client ID| Min date| Max date| control parameters
>
> The control data set may not have all client IDs .
>
> I want to use the control data frame on the master data frame to
> remove client IDS that don't exist in the control data set and for
> those that do, remove dates outside the required range.
>
> (3) We can either put the control parameters on all rows corresponding
> to a client ID or look it up from the control data frame
>
> (4) The basic function call looks like
>
> do.something(df,control parameters)
>
> where df is the subset of the master data set that corresponds to a
> single client with unwanted dates removed and the control parameters
> pertain to that client.
>
> Any help would be appreciated.
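
For step (4), one way to make the per-client call is split() followed by
lapply().  This is only a sketch: the do.something() below is a
hypothetical stand-in that just counts rows, and the inclusive date bounds
are an assumption.

```r
master  <- data.frame(clientId = c(1:4, 2), date = 1001:1005,
                      value = 10001:10005)
control <- data.frame(clientId = c(2, 3), mindate = c(100, 1005),
                      maxdate = c(1005, 1005), control.params = c(1, 2))

# Hypothetical placeholder for the real do.something(df, control parameters)
do.something <- function(df, params) {
  data.frame(clientId = df$clientId[1], n = nrow(df), params = params)
}

# Attach the control columns, then drop out-of-range dates
# (inclusive bounds are an assumption)
merged <- merge(master, control, by = "clientId")
kept   <- merged[merged$date >= merged$mindate &
                 merged$date <= merged$maxdate, ]

# One data frame per remaining client, then one call per client
pieces  <- split(kept, kept$clientId)
results <- lapply(pieces, function(df) {
  do.something(df[, c("clientId", "date", "value")],
               df$control.params[1])
})
```

Clients whose rows are all filtered out (client 3 in the dummy data) do
not appear in results at all, so the real do.something() is never called
with an empty data frame.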
>
> ______________________________________________
> R-h... at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


