[R] subset data using a vector

Michael Dewey lists at dewey.myzen.co.uk
Mon Nov 23 17:16:32 CET 2015


Try running this:

length(strsplit(as.character(mydata$ranges2use), ","))

Was the result what you expected? I think not.
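
To spell the hint out, here is a minimal sketch on a toy data frame (made up for illustration, not the posted data). The posted data frame has a column named rangestouse but none named ranges2use, and inside fun1 the expression dataset$ranges2use looks for a column with that literal name rather than using the function argument. The names m1_wrong and m1_right are only for this example.

```r
# Toy data frame (an assumption for illustration); only the column name matters.
mydata <- data.frame(ranges = 1:4,
                     rangestouse = c("1,2", "1,2,3", "2,3,4", "3,4"),
                     stringsAsFactors = TRUE)

# There is no column called ranges2use, so $ returns NULL.
is.null(mydata$ranges2use)             # TRUE

# as.character(NULL) is character(0), and splitting that gives an empty list.
m1_wrong <- strsplit(as.character(mydata$ranges2use), ",")
length(m1_wrong)                       # 0, so m1_wrong[[1]] is out of bounds

# With the real column name there is one character vector per row.
m1_right <- strsplit(as.character(mydata$rangestouse), ",")
length(m1_right)                       # 4
m1_right[[2]]                          # "1" "2" "3"
```

Inside fun1 the argument ranges2use already holds the column values, so strsplit(as.character(ranges2use), ",") would likely get past this particular error. Other lines would probably still fail: m2[i] <- m1[[i]] tries to store several values in one element, and trait, dat1$trait and dat1$control do not refer to anything that exists in the function or the data.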

On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote:
> Dear R users,
> I would like to subset my data using a vector built from the variable "ranges". For a given plotid, this vector holds the current range (ranges), the preceding range (ranges - 1), and the following range (ranges + 1). If the preceding or following range falls outside the levels of ranges in the data set, I would like to drop it and keep only the ranges that are available. The variable "rangestouse" lists all the ranges I want to use when subsetting for a given plotid.
>
> After subsetting the data set to these ranges, I would like to extract the yield of the checks in those ranges and adjust the yield of each plotid by dividing it by the check average for those ranges.
>
> I have created this function (fun1) but when I run it, I get the following error:
>
> Error in m1[[i]] : subscript out of bounds
>
> Any help will be highly appreciated!
> Thanks, Nilesh
>
> Dataset:
> dput(mydata)
> structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3",
> "4"), class = "factor"), cols = structure(c(1L, 10L, 11L, 12L,
> 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L,
> 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
> 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L,
> 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L, 5L,
> 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5", "6", "7",
> "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "factor"),
>      plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
>      290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
>      374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
>      379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
>      399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
>      480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
>      477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
>      4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
>      5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
>      5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
>      5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
>      5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
>      ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
>      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
>      2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>      1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>      2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
>      "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
>      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
>      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
>      4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
>      ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
>      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
>      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
>      4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
>      "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names = c("rows",
> "cols", "plotid", "yield", "linecode", "ranges", "rangestouse"
> ), class = "data.frame", row.names = c(NA, -64L))
>
> Function:
>
> fun1<- function (dataset, plot.id, ranges2use, control)
> {
>      m1 <- strsplit(as.character(dataset$ranges2use), ",")
>      dat1 <- data.frame()
>      m2 <- c()
>      row_check_mean <- c()
>      row_check_adj_yield <- c()
>      x <- length(plot.id)
>      for (i in (1:x)) {
>          m2[i] <- m1[[i]]
>          dat1 <- dataset[dataset$ranges %in% m2[i], ]
>          row_check_mean[i] <- tapply(dat1$trait, dat1$control,
>              mean, na.rm = TRUE)[1]
>          row_check_adj_yield[i] <- ifelse(control[i] == "variety",
>              trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
>      }
>      data.frame(dataset, row_check_adj_yield)
> }
>
> Apply function:
> fun1(mydata, plot.id=mydata$plotid, ranges2use = mydata$rangestouse,control=mydata$linecode)
>
> Error:
> Error in m1[[i]] : subscript out of bounds
>
> Session info:
> R version 3.2.1 (2015-06-18)
> Platform: i386-w64-mingw32/i386 (32-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
>   [1] magrittr_1.5    plyr_1.8.3      tools_3.2.1     reshape2_1.4.1  Rcpp_0.12.1     stringi_1.0-1
>   [7] grid_3.2.1      agridat_1.12    stringr_1.0.0   lattice_0.20-31
>
> Nilesh Dighe
> (806)-252-7492 (Cell)
> (806)-741-2019 (Office)
>
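
For the calculation described in the quoted message, one possible vectorised sketch follows. It is an assumption about the intent, not a tested fix for the posted data: the toy data frame is made up, adjust_by_checks is a hypothetical name, and checks are given an adjusted value of 1 because the quoted function computes trait[i]/trait[i] for them.

```r
# Toy data, made up for illustration (not the posted data set).
toy <- data.frame(
  plotid      = 1:6,
  yield       = c(4, 6, 6, 9, 8, 7),
  linecode    = c("check", "variety", "check", "variety", "check", "variety"),
  ranges      = c(1L, 1L, 2L, 2L, 3L, 3L),
  rangestouse = c("1,2", "1,2", "1,2,3", "1,2,3", "2,3", "2,3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: divide each yield by the mean check yield
# taken over the ranges listed in that row's rangestouse.
adjust_by_checks <- function(dataset) {
  # One character vector of range labels per row, e.g. c("1", "2").
  wanted <- strsplit(as.character(dataset$rangestouse), ",")
  is_check <- dataset$linecode == "check"

  # Mean yield of the checks that sit in the wanted ranges, row by row.
  check_mean <- vapply(wanted, function(r) {
    keep <- is_check & dataset$ranges %in% as.integer(r)
    mean(dataset$yield[keep], na.rm = TRUE)
  }, numeric(1))

  # Checks get 1, varieties get yield / check mean.
  adj <- ifelse(is_check, 1, dataset$yield / check_mean)
  data.frame(dataset, row_check_mean = check_mean, row_check_adj_yield = adj)
}

res <- adjust_by_checks(toy)
res[, c("plotid", "row_check_mean", "row_check_adj_yield")]
```

If a set of ranges contains no checks, mean() of an empty vector gives NaN, so such rows would need handling of their own.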

-- 
Michael
http://www.dewey.myzen.co.uk/home.html


