[R] Defining Variables from a Matrix for 10-Fold Cross Validation

Zach Simpson zp@|mp@o @end|ng |rom gm@||@com
Thu Oct 11 05:37:06 CEST 2018


Hey Matthew,

In addition to what's been mentioned, you may want to look at the
'caret' package, as it provides a nice system for whatever flavor of
cross-validation you're after *and* has a built-in method for `kknn`:

http://topepo.github.io/caret/available-models.html
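
As a rough sketch of what that could look like (not tested against your
zip.train setup): the data below are made up, the tuning grid values are
arbitrary, and it assumes both 'caret' and 'kknn' are installed. `train()`
and `trainControl()` are the caret functions doing the work.

```r
library(caret)  # assumes the caret and kknn packages are installed

set.seed(1)
# Made-up stand-in data: numeric response in column 1, predictors after it
dat <- matrix(rnorm(200 * 5), nrow = 200)
dat[, 1] <- dat[, 2] - dat[, 3] + rnorm(200, sd = 0.1)
mydata <- data.frame(y = dat[, 1], dat[, -1])

# 10-fold cross-validation, handled by caret
ctrl <- trainControl(method = "cv", number = 10)

# Arbitrary illustrative grid; caret's "kknn" method tunes kmax, distance, kernel
grid <- expand.grid(kmax = c(5, 10, 20), distance = 2, kernel = "optimal")

fit <- train(y ~ ., data = mydata, method = "kknn",
             trControl = ctrl, tuneGrid = grid)

fit$results  # cross-validated RMSE etc. for each row of the grid
```

`fit$bestTune` then gives the grid row with the best resampled performance.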

Hope this helps,
Zach Simpson

On October 9, 2018 15:34:15 -0700, David Winsemius
<dwinsemius using comcast.net> wrote:
>
> > On Oct 9, 2018, at 3:04 PM, matthew campbell <mcc3qb using virginia.edu> wrote:
> >
> > Good afternoon,
> >
> > I am trying to run a 10-fold CV, using a matrix as my data set.
> > Essentially, I want "y" to be the first column of the matrix, and my "x" to
> > be all remaining columns (2-257). I've posted some of the code I used
> > below, and the data set (called "zip.train") is in the "ElemStatLearn"
> > package. The error message is highlighted in red, and the corresponding
> > section of code is bolded. (I am not concerned with the warning message,
> > just the error message).
> >
> > The issue I am experiencing is the error message below the code: I haven't
> > come across that specific message before, and am not exactly sure how to
> > interpret its meaning. What exactly is this error message trying to tell
> > me?  Any suggestions or insights are appreciated!
> >
> > Thank you all,
> >
> > Matthew Campbell
> >
> >
> >> library (ElemStatLearn)
> >> library(kknn)
> >> data(zip.train)
> >> train=zip.train[which(zip.train[,1] %in% c(2,3)),]
> >> test=zip.test[which(zip.test[,1] %in% c(2,3)),]
> >> nfold = 10
> >> infold = sample(rep(1:10, length.out = (x)))
>
> I don't see a definition for x.
>
> > Warning message:
> > In rep(1:10, length.out = (x)) :
> >  first element used of 'length.out' argument
>
> But apparently it has a length greater than 1, and you are getting a sample whose length is specified by the first element of x.
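
To make that concrete, here is a minimal sketch of the fold assignment with
`length.out` set to a single number, the row count. The matrix is a made-up
stand-in for the filtered zip.train (response in column 1, 256 predictors
after it), and using `nrow(train)` is my assumption about what was intended.

```r
set.seed(1)
# Made-up stand-in for the filtered zip.train matrix
train <- cbind(sample(c(2, 3), 50, replace = TRUE),
               matrix(runif(50 * 256), nrow = 50))

nfold <- 10
# One fold label per row; length.out has to be a single number
infold <- sample(rep(1:nfold, length.out = nrow(train)))
table(infold)  # how many rows landed in each fold

# 2:257 keeps all 256 predictor columns; c(2, 257) would keep only two of them
mydata <- data.frame(y = train[, 1], train[, 2:257])
dim(mydata)
```

Note also that `for (l in nfold)` runs the loop body once, with l = 10;
`for (l in 1:nfold)` visits every fold.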
>
>
> >>
> > *> mydata = data.frame(x = train[ , c(2,257)] , y = train[ , 1])*
> >>
> >> K = 20
> >> errorMatrix = matrix(NA, K, 10)
> >>
> >> for (l in nfold)
> > + {
> > +   for (k in 1:20)
> > +   {
> > +     knn.fit = kknn(y ~ x, train = mydata[infold != l, ], test =
> > mydata[infold == l, ], k = k)
> > +     errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold ==
> > l])^2)
> > +   }
> > + }
> > Error in model.frame.default(formula, data = train) :
> >  variable lengths differ (found for 'x')
>
> So the warning above is probably a great clue to the source of this error.
>
> Moral of the tale: Always read the warnings, even if your code proceeds.
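
One plausible reading of how the two are connected, sketched on made-up data:
`data.frame(x = <matrix>, ...)` names its columns x.1, x.2, ..., so the data
frame has no column called `x`, and the formula `y ~ x` then falls back to
whatever `x` is sitting in the workspace. The stray `x` below is hypothetical,
and I am only guessing that something like it was present in the original
session.

```r
set.seed(1)
x <- 1:7  # hypothetical stray object left over in the workspace

# Matrix columns come out as x.1 and x.2; there is no column named "x"
mydata <- data.frame(x = matrix(rnorm(20), nrow = 10, ncol = 2), y = rnorm(10))
names(mydata)

# y comes from mydata (10 values), x comes from the workspace (7 values)
res <- try(model.frame(y ~ x, data = mydata), silent = TRUE)
cat(res)  # prints the captured error message
```

Writing the formula as `y ~ .` uses the columns that are actually in the data
frame and does not depend on anything in the workspace.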
>
>
> David Winsemius
> Alameda, CA, USA
>
> "The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts." - Bertrand Russell



