[R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Jul 2 18:37:08 CEST 2016


I can understand you not wanting to supply your actual data online, but only you know what your data looks like so only you can create a simulated data set that we could show you how to work with. 
-- 
Sent from my phone. Please excuse my brevity.

On July 2, 2016 2:57:39 AM PDT, Kevin Wamae <KWamae at kemri-wellcome.org> wrote:
>I have a drug-trial study dataset (attached image).
>
>Since its a large and complex dataset (at least to me) and I hope to be
>as clear as possible with my question.
>The dataset is from a study where individuals are given drugs and
>followed up over a period spanning two consecutive years. Individuals
>do not start treatment on the same day and once they start, the
>variable "drug-admin" is marked "x" as well as the time they stop
>treatment in the following year.
>There exists another variable, "study_id", that I hope to populate as
>can be seen in the dataset, with the following conditions:
>
>For every individual
>•    if the individual has entries that show they received drugs both
>on the start and end date (marked with the "x")
>•    if the start of drug administration falls in month == 2 | 3 and
>end of administration falls in month == 2 | 4
>•    then, using the date that marks the start of drug administration,
>populate the variable _"study_id"_ in all the rows that fall within the
>timeframe that the individual was given drugs but excluding the end of
>drug administration.
>I have tried my level best and while I have explored several examples
>online, I haven't managed to solve this. The dataset contains close to
>6000 individuals spanning 10 years and my best bet was to use a loop
>which keeps crushing R after running for close to 30min. I have also
>read that dplyr may do the job but my attempts have been in vain.
>
>sample code
>-------------------------------------------------------------------------------------------------------------------------------------------------------------------
>individual <- unique (df$ID)  #vector of individuals
>datalength <- dim(df)[1]      #number of rows in dataframe
>
>for (i in 1:length(individual)) {
>  for (j in 1:datalength) {
>start_admin <- df[(df$year == 2007] & df$drug_admin == "x" & c(df$month
>== 2 | df$month == 3),1]  #capture date of start
>end_admin <- df[(df$year == 2008] & df$drug_admin == "x" & c(df$month
>== 2 | df$month == 4),1]    #capture date of end
>
>if(df[datalength,1] == individual(i) & df[datalength,2] >= start_admin
>& df[datalength,2] < end_admin) {
>df[datalength,6] <- start_admin #populate respective row if condition
>is met
>      }
>    }
>  }
>
>-------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>Above is the code that keeps failing..
>
>Any help is highly appreciated....
>
>
>______________________________________________________________________
>
>This e-mail contains information which is confidential. It is intended
>only for the use of the named recipient. If you have received this
>e-mail in error, please let us know by replying to the sender, and
>immediately delete it from your system.  Please note, that in these
>circumstances, the use, disclosure, distribution or copying of this
>information is strictly prohibited. KEMRI-Wellcome Trust Programme
>cannot accept any responsibility for the  accuracy or completeness of
>this message as it has been transmitted over a public network. Although
>the Programme has taken reasonable precautions to ensure no viruses are
>present in emails, it cannot accept responsibility for any loss or
>damage arising from the use of the email or attachments. Any views
>expressed in this message are those of the individual sender, except
>where the sender specifically states them to be the views of
>KEMRI-Wellcome Trust Programme.
>______________________________________________________________________
>
>
>------------------------------------------------------------------------
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list