[R] subset data using a vector

Jim Lemon drjimlemon at gmail.com
Tue Nov 24 09:53:23 CET 2015


Hi Nilesh,
I simplified your code a bit:

fun1<-function (dataset, plot.id, ranges2use, control) {
 m1 <- strsplit(as.character(ranges2use), ",")
 dat1 <- data.frame()
 row_check_mean <- NA
 row_check_adj_yield <- NA
 x <- length(plot.id)
 for (i in 1:x) {
  cat(i,"\n")
  dat1 <- dataset[dataset$ranges %in% m1[[i]], ]
  row_check_mean[i] <- tapply(unlist(dat1$trait),unlist(dat1$control),
   mean, na.rm = TRUE)[1]
  row_check_adj_yield[i] <- ifelse(control[i] == "variety",
  trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
 }
 data.frame(dataset, row_check_adj_yield)
}

 and got it to run down to this line:

row_check_mean[i]<-tapply(dat1$trait,dat1$control,mean,na.rm=TRUE)[1]

which generates the error:

Error in split.default(X, group) : first argument must be a vector
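
The same failure can be reproduced in isolation. This is a minimal sketch,
assuming a hypothetical toy data frame ("toy", not your data) that, like
"mydata", has no "trait", "control" or "ranges2use" column. The point is
that `$` quietly returns NULL for a column that does not exist:

```r
# Hypothetical toy data: like "mydata", it has "yield", "linecode" and
# "rangestouse", but no "trait", "control" or "ranges2use" column.
toy <- data.frame(yield = c(5, 4, 6),
                  linecode = c("check", "variety", "check"),
                  rangestouse = c("1,2", "1,2", "1,2"))

# `$` on a missing column gives NULL rather than an error
is.null(toy$trait)      # TRUE
is.null(toy$control)    # TRUE

# tapply() on NULL inputs fails inside split(), as in the error above
res <- tryCatch(tapply(toy$trait, toy$control, mean, na.rm = TRUE),
                error = function(e) e)
inherits(res, "error")  # TRUE

# The same behaviour explains the earlier "subscript out of bounds":
# splitting a missing column yields an empty list, so m1[[1]] cannot exist
m1 <- strsplit(as.character(toy$ranges2use), ",")
length(m1)              # 0
```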

As far as I can see, there is no element in "mydata" named "trait", and
"control" is not an element of the local variable "dat1", so both arguments
passed to tapply are NULL. I can't get past this, but perhaps it will help
you to sort it out.
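
For what it is worth, here is one way the intended calculation might look
once real columns are passed in. It is only a sketch, and the assumptions
are mine rather than anything confirmed in this thread: that "trait" was
meant to be the yield column, that "control" is linecode, and that the
check mean should use only the "check" plots in the listed ranges. The
function name adjust_by_checks and the toy data are hypothetical.

```r
# Hypothetical toy data using the same column names as "mydata"
toy <- data.frame(
  plotid      = 1:6,
  yield       = c(5, 4, 6, 3, 8, 2),
  linecode    = c("check", "variety", "check", "variety", "check", "variety"),
  ranges      = c(1L, 1L, 2L, 2L, 3L, 3L),
  rangestouse = c("1,2", "1,2", "1,2,3", "1,2,3", "2,3", "2,3"),
  stringsAsFactors = FALSE
)

adjust_by_checks <- function(dataset, trait, ranges2use, control) {
  # One vector of range ids per row, e.g. "1,2,3" -> c("1", "2", "3")
  range_sets <- strsplit(as.character(ranges2use), ",")
  n <- nrow(dataset)
  row_check_mean <- numeric(n)
  for (i in seq_len(n)) {
    # Rows lying in the ranges listed for plot i
    in_ranges <- dataset$ranges %in% as.integer(range_sets[[i]])
    # Mean trait value of the check plots within those ranges
    is_check <- in_ranges & control == "check"
    row_check_mean[i] <- mean(trait[is_check], na.rm = TRUE)
  }
  # Varieties are divided by their check mean; checks are set to 1
  row_check_adj_yield <- ifelse(control == "variety",
                                trait / row_check_mean, 1)
  data.frame(dataset, row_check_mean, row_check_adj_yield)
}

res <- adjust_by_checks(toy, trait = toy$yield,
                        ranges2use = toy$rangestouse,
                        control = toy$linecode)
```

With your data the call would be adjust_by_checks(mydata, mydata$yield,
mydata$rangestouse, mydata$linecode). Checks get an adjusted value of 1
here, which is what trait[i]/trait[i] gives whenever the yield is non-zero
and not missing.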

Jim


On Tue, Nov 24, 2015 at 10:10 AM, DIGHE, NILESH [AG/2362] <
nilesh.dighe at monsanto.com> wrote:

> Michael:  I tried using your suggestion of using length and still get the
> same error:
> Error in m1[[i]] : subscript out of bounds
>
> I also checked the length of m1 and x and they both are of same length
> (64).
>
> After trying several things, I was able to extract the list but this was
> done outside the function I am trying to create.
> Code that worked is listed below:
>
> for(i in (1:length(mydata$plotid))){
>         v1<-as.numeric(strsplit(as.character(mydata$rangestouse),
> ",")[[i]])
>         print(head(v1))}
>
> However, when I try to get this code in a function (fun3) listed below, I
> get the following error:
> Error in strsplit(as.character(dataset$ranges2use), ",")[[i]] :
>   subscript out of bounds
>
> fun3<- function (dataset, plot.id, ranges2use, control)
> {
>     m1 <- c()
>     x <- length(plot.id)
>     for (i in (1:x)) {
>         m1 <- as.numeric(strsplit(as.character(dataset$ranges2use),
>             ",")[[i]])
>     }
>     m2
> }
>
> I am not sure where I am making a mistake.
> Thanks.
> Nilesh
>
> -----Original Message-----
> From: Michael Dewey [mailto:lists at dewey.myzen.co.uk]
> Sent: Monday, November 23, 2015 12:11 PM
> To: DIGHE, NILESH [AG/2362]; r-help at r-project.org
> Subject: Re: [R] subset data using a vector
>
> Try looking at your function and work through what happens if the length
> is what I suggested.
>
>  >>       x <- length(plot.id)
>  >>
>  >>       for (i in (1:x)) {
>  >>
>  >>           m2[i] <- m1[[i]]
>
> So unless m1 has length at least x you are doomed.
>
> On 23/11/2015 16:26, DIGHE, NILESH [AG/2362] wrote:
> > Michael:  I would like to use the actual range ids listed in column
> > "rangestouse" to subset my data, not the length of that vector.
> >
> > Thanks.
> > Nilesh
> >
> > -----Original Message-----
> > From: Michael Dewey [mailto:lists at dewey.myzen.co.uk]
> > Sent: Monday, November 23, 2015 10:17 AM
> > To: DIGHE, NILESH [AG/2362]; r-help at r-project.org
> > Subject: Re: [R] subset data using a vector
> >
> > length(strsplit(as.character(mydata$ranges2use), ","))
> >
> > was that what you expected? I think not.
> >
> > On 23/11/2015 16:05, DIGHE, NILESH [AG/2362] wrote:
> >> Dear R users,
> >> I would like to split my data by a vector created from the variable
> >> "ranges".  This vector will hold the current range (ranges), the
> >> preceding range (ranges - 1), and the following range (ranges + 1) for a
> >> given plotid.  If the preceding or following range falls outside the
> >> levels of ranges in the data set, I would like to drop it and include
> >> only the ranges that are available.  The variable "rangestouse" lists all
> >> the desired ranges for subsetting a given plotid.  After subsetting the
> >> dataset by these ranges, I would like to extract the yield data for the
> >> checks in those ranges and adjust the yield of each plotid by dividing it
> >> by the check average for those ranges.
> >>
> >> I have created this function (fun1) but when I run it, I get the
> >> following error:
> >>
> >> Error in m1[[i]] : subscript out of bounds
> >>
> >> Any help will be highly appreciated!
> >> Thanks, Nilesh
> >>
> >> Dataset:
> >> dput(mydata)
> >> structure(list(rows = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> >> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >> 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> >> 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
> >> cols = structure(c(1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L, 4L,
> >> 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L, 3L,
> >> 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 2L,
> >> 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
> >> 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", "3", "4", "5",
> >> "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class =
> >> "factor"),
> >>       plotid = c(289L, 298L, 299L, 300L, 301L, 302L, 303L, 304L,
> >>       290L, 291L, 292L, 293L, 294L, 295L, 296L, 297L, 384L, 375L,
> >>       374L, 373L, 372L, 371L, 370L, 369L, 383L, 382L, 381L, 380L,
> >>       379L, 378L, 377L, 376L, 385L, 394L, 395L, 396L, 397L, 398L,
> >>       399L, 400L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
> >>       480L, 471L, 470L, 469L, 468L, 467L, 466L, 465L, 479L, 478L,
> >>       477L, 476L, 475L, 474L, 473L, 472L), yield = c(5.1, 5, 3.9,
> >>       4.6, 5, 4.4, 5.1, 4.3, 5.5, 5, 5.5, 6.2, 5.1, 5.5, 5.2, 5,
> >>       5.6, 4.7, 5.4, 4.8, 4.6, 3.9, 4.2, 4.4, 5.3, 5.5, 5.8, 4.6,
> >>       5.8, 4.8, 5.3, 5.5, 5.6, 4.2, 4.6, 4.2, 4.2, 4, 3.9, 4.5,
> >>       5, 4.8, 4.9, 5.2, 5.3, 4.6, 4.8, 5.3, 4.5, 4.5, 5.1, 4.9,
> >>       5.2, 4.6, 4.8, 5.4, 5.9, 4.9, 5.8, 5.3, 4.8, 4.7, 5.2, 5.8
> >>       ), linecode = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>       2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
> >>       2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>       1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>       2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("check",
> >>       "variety"), class = "factor"), ranges = c(1L, 1L, 1L, 1L,
> >>       1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> >>       2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
> >>       3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
> >>       4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
> >>       ), rangestouse = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> >>       1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>       2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
> >>       3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
> >>       4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1,2",
> >>       "1,2,3", "2,3,4", "3,4"), class = "factor")), .Names =
> >> c("rows", "cols", "plotid", "yield", "linecode", "ranges", "rangestouse"
> >> ), class = "data.frame", row.names = c(NA, -64L))
> >>
> >> Function:
> >>
> >> fun1<- function (dataset, plot.id, ranges2use, control)
> >> {
> >>       m1 <- strsplit(as.character(dataset$ranges2use), ",")
> >>       dat1 <- data.frame()
> >>       m2 <- c()
> >>       row_check_mean <- c()
> >>       row_check_adj_yield <- c()
> >>       x <- length(plot.id)
> >>       for (i in (1:x)) {
> >>           m2[i] <- m1[[i]]
> >>           dat1 <- dataset[dataset$ranges %in% m2[i], ]
> >>           row_check_mean[i] <- tapply(dat1$trait, dat1$control,
> >>               mean, na.rm = TRUE)[1]
> >>           row_check_adj_yield[i] <- ifelse(control[i] == "variety",
> >>               trait[i]/dataset$row_check_mean[i], trait[i]/trait[i])
> >>       }
> >>       data.frame(dataset, row_check_adj_yield)
> >> }
> >>
> >> Apply function:
> >> fun1(mydata, plot.id=mydata$plotid, ranges2use =
> >> mydata$rangestouse,control=mydata$linecode)
> >>
> >> Error:
> >>
> >> Error in m1[[i]] : subscript out of bounds
> >>
> >> Session info:
> >>
> >> R version 3.2.1 (2015-06-18)
> >> Platform: i386-w64-mingw32/i386 (32-bit)
> >> Running under: Windows 7 x64 (build 7601) Service Pack 1
> >>
> >> locale:
> >> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> >> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> >> [5] LC_TIME=English_United States.1252
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>
> >> loaded via a namespace (and not attached):
> >>    [1] magrittr_1.5    plyr_1.8.3      tools_3.2.1     reshape2_1.4.1  Rcpp_0.12.1     stringi_1.0-1
> >>    [7] grid_3.2.1      agridat_1.12    stringr_1.0.0   lattice_0.20-31
> >>
> >> Nilesh Dighe
> >> (806)-252-7492 (Cell)
> >> (806)-741-2019 (Office)
> >>
> >>
> >> This e-mail message may contain privileged and/or confidential
> >> information, and is intended to be received only by persons entitled
> >> to receive such information. If you have received this e-mail in error,
> please notify the sender immediately. Please delete it and all attachments
> from any servers, hard drives or any other media. Other use of this e-mail
> by you is strictly prohibited.
> >>
> >> All e-mails and attachments sent and received are subject to
> >> monitoring, reading and archival by Monsanto, including its
> subsidiaries. The recipient of this e-mail is solely responsible for
> checking for the presence of "Viruses" or other "Malware".
> >> Monsanto, along with its subsidiaries, accepts no liability for any
> >> damage caused by any such code transmitted by or accompanying this
> e-mail or any attachment.
> >>
> >>
> >> The information contained in this email may be subject to the export
> >> control laws and regulations of the United States, potentially
> >> including but not limited to the Export Administration Regulations
> >> (EAR) and sanctions regulations issued by the U.S. Department of
> Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this
> information you are obligated to comply with all applicable U.S. export
> laws and regulations.
> >>
> >>      [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > --
> > Michael
> > http://www.dewey.myzen.co.uk/home.html
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html
>
