[R] Speeding up code?

Thu Jul 16 05:34:15 CEST 2015

Hi Ignacio, If I am reading your code correctly then the top while loop is
essentially seeking to select a random set of names from the original set,
then using unique to reduce it down, you then iterate until you have built
your quota.  Ultimately this results in a very inefficient attempt at
sampling without replacement.  Why not just sample without replacement
rather than loop iteratively and use unique?  Or if the set of possible
names are short enough why not just randomize it and then pull the first n
items off?

    Best,
    Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez <ignacio82 at gmail.com>
wrote:

> Hi R-Help!
>
> I'm hoping that some of you may give me some tips that could make my code
> more efficient. More precisely, I would like to make the answer to my
> stakoverflow
> <
> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
> >
> question more efficient.
>
> This is the code:
>
> library(dplyr)
> library(randomNames)
> library(geosphere)
> set.seed(7142015)# Define Parameters
> n.Schools <- 20
> first.grade<-3
> last.grade<-5
> n.Grades <-last.grade-first.grade+1
> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
> teacher
> # Define Random names function:
> gen.names <- function(n, which.names = "both", name.order = "last.first"){
>   names <- unique(randomNames(n=n, which.names = which.names,
> name.order = name.order))
>   need <- n - length(names)
>   while(need>0){
>     names <- unique(c(randomNames(n=need, which.names = which.names,
> name.order = name.order), names))
>     need <- n - length(names)
>   }
>   return(names)}
> # Generate n.Schools names
> gen.schools <- function(n.schools) {
>   School.ID <-
>     paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
>   Schools <-
>     data.frame(School.ID, School.lat, School.long, School.RE) %>%
>     mutate(School.ID = as.character(School.ID)) %>%
>     rowwise() %>%  mutate (School.distance = distHaversine(
>       p1 = c(School.long, School.lat),
>       p2 = c(21.7672, 58.8471), r = 3961
>     ))
>   return(Schools)}
>
> Schools <- gen.schools(n.schools = n.Schools)
> # Generate Grades
> Grades <- c(first.grade:last.grade)
> # Generate n.Classrooms
>
> Classrooms <- LETTERS[1:n.Classrooms]
> # Group schools and grades
>
> SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
> FUN="paste")#head(SchGr)
> # Group SchGr and Classrooms
>
> SchGrClss <- outer(SchGr, paste0(Classrooms, '-'),
> FUN="paste")#head(SchGrClss)
> # These are the combination of  School-Grades-Classroom
> SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
> SchGrClssEnd <- as.data.frame(SchGrClssTmp)
> # Assign n.Teachers (2 classroom in a given school-grade)
> Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
> AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
>
> library(stringr)
> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
> "-"))
> separoPairs <- as.data.frame(t(separoPairs))
> row.names(separoPairs) <- NULL
> separoPairs <- separoPairs %>% select(-V7)  %>%  #Drops empty column
>   mutate(V1=as.character(V1), V4=as.character(V4), V2=as.numeric(V2),
> V5=as.numeric(V5)) %>% mutate(V4 = trimws(V4, which = "both"))
>
> separoPairs[120,]$V4#Only the rows with V1=V4 and V2=V5 are valid
> validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2,
> V3, V6)
> # Generate n.Teachers
>
> gen.teachers <- function(n.teachers){
>   Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
>   Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
>   Teacher.Other <- sample(c(0,1), replace = T, prob = c(0.5, 0.5),
> size = n.teachers)
>   Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
>   Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other,
> Teacher.RE)
>   return(Teachers)}
> Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
>   mutate(Teacher.ID = as.character(Teacher.ID))
> # Randomly assign n.Teachers teachers to the "ValidPairs"
> TmpAssignments <- validPairs[sample(1:nrow(validPairs), n.Teachers), ]
> Assignments <- cbind.data.frame(Teachers$Teacher.ID, TmpAssignments)
> names(Assignments) <- c("Teacher.ID", "School.ID", "Grade", "Class_1",
> "Class_2")
> # Tidy Data
> library(tidyr)
> TeacherClassroom <- Assignments %>%
>   gather(x, Classroom, Class_1,Class_2) %>%
>   select(-x) %>%
>   mutate(Teacher.ID = as.character(Teacher.ID))
> # Merge
> DF_Classrooms <- TeacherClassroom %>% full_join(Teachers,
> by="Teacher.ID") %>% full_join(Schools, by="School.ID")
> rm(list=setdiff(ls(), "DF_Classrooms")) # Clean the work space!
>
> *I want to end up with the same*  'DF_Classrooms *data frame* but getting
> there in a more efficient way. In particular, when is use n.Classrooms <-4
> the
> code run fast, but *if I increase it to something like 20 it is painfully
> slow.*
>
> Thanks!!!
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]