[BioC] How to do k-fold validation using SVM

Robert Gentleman rgentlem@jimmy.harvard.edu
Fri, 24 Jan 2003 09:09:34 -0500


On Fri, Jan 24, 2003 at 01:26:07PM -0000, Stephen Henderson wrote:
> No, not before you start, but within each fold, so that each training
> round uses a slightly different set of genes/features.

 Typically you need to do some filtering (what I've been calling
 non-specific filtering, since it does not use the class labels)
 before any model fitting. Genes that show little variation across
 samples are not interesting and can be excluded.
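A minimal sketch of such a non-specific filter in base R follows. The helper name nonspecific_filter, the choice of the interquartile range as the measure of variation, and the simulated matrix are all assumptions made for illustration; this is not the genefilter API.

```r
# Hypothetical helper: keep the `top` genes (rows) with the largest spread
# across samples (columns). No class labels are used.
nonspecific_filter <- function(x, top = 100) {
  spread <- apply(x, 1, IQR)                    # one IQR per gene
  keep <- order(spread, decreasing = TRUE)[seq_len(min(top, nrow(x)))]
  x[sort(keep), , drop = FALSE]                 # preserve original row order
}

# Simulated data: 200 genes by 20 samples, ten genes with inflated variation.
set.seed(1)
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("s", 1:20)))
expr[1:10, ] <- expr[1:10, ] * 10
filtered <- nonspecific_filter(expr, top = 50)
dim(filtered)
```

Because the class labels never enter the calculation, a filter of this kind can reasonably be run once, before any cross-validation.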

 Then, inside the cross-validation, I usually do something like:
   cvX <- function(data, filter, otherargs)
 where filter is a function that takes an exprSet and returns the
 appropriate subset.
 On each iteration, apply filter to the training set only, then build
 the model on the selected genes and test it on the held-out samples.

 If you make the filter a parameter of the cv function, then you can
 change your gene selection method (say from t-test to ROC) without
 having to do much more than write a new gene selection function.
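The idea can be sketched in base R as below. Several things here are assumptions rather than anything from the original code: the data is a plain genes-by-samples matrix with a factor of labels instead of an exprSet, filter returns row indices so the same genes can be picked out of the test samples, and a nearest-centroid rule stands in for the SVM fit. The names ttest_filter and auc_filter are hypothetical.

```r
# x: genes-by-samples matrix; y: two-level factor of class labels.
# filter: function(x_train, y_train) returning row indices of genes to keep.
cvX <- function(x, y, filter, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = ncol(x)))
  wrong <- 0
  for (i in seq_len(k)) {
    test <- folds == i
    ytr <- y[!test]
    genes <- filter(x[, !test, drop = FALSE], ytr)   # training samples only
    xtr <- x[genes, !test, drop = FALSE]
    xte <- x[genes, test, drop = FALSE]
    # Stand-in classifier: nearest class centroid (an SVM fit would go here).
    cents <- lapply(levels(y), function(l)
      rowMeans(xtr[, ytr == l, drop = FALSE]))
    dists <- matrix(unlist(lapply(cents, function(m) colSums((xte - m)^2))),
                    ncol = length(cents))
    pred <- levels(y)[apply(dists, 1, which.min)]
    wrong <- wrong + sum(pred != as.character(y[test]))
  }
  wrong / ncol(x)                                    # cross-validated error
}

# Two interchangeable gene selection methods, each keeping n genes.
ttest_filter <- function(n) function(x, y) {
  a <- y == levels(y)[1]
  stat <- apply(x, 1, function(g) abs(t.test(g[a], g[!a])$statistic))
  order(stat, decreasing = TRUE)[seq_len(n)]
}
auc_filter <- function(n) function(x, y) {
  a <- y == levels(y)[1]
  n1 <- sum(a); n2 <- sum(!a)
  auc <- apply(x, 1, function(g)
    (sum(rank(g)[a]) - n1 * (n1 + 1) / 2) / (n1 * n2))
  order(abs(auc - 0.5), decreasing = TRUE)[seq_len(n)]
}

# Simulated data: 200 genes, 40 samples, ten informative genes.
set.seed(2)
y <- factor(rep(c("A", "B"), each = 20))
x <- matrix(rnorm(200 * 40), nrow = 200)
x[1:10, y == "B"] <- x[1:10, y == "B"] + 3
err_t   <- cvX(x, y, ttest_filter(10))
err_auc <- cvX(x, y, auc_filter(10))
```

Switching from the t-test to the ROC-based selection only changes the function passed as filter; the cross-validation loop itself is untouched.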



> 
> -----Original Message-----
> From: Robert Gentleman [mailto:rgentlem@jimmy.harvard.edu] 
> Sent: Friday, January 24, 2003 1:16 PM
> To: Stephen Henderson
> Subject: Re: [BioC] How to do k-fold validation using SVM
> 
> On Fri, Jan 24, 2003 at 01:08:23PM -0000, Stephen Henderson wrote:
> > Is there a simple??? way to do a gene/feature selection for each round of
> > cross validation-- using the ipred errorest function?
> > 
>   do you mean take a subset before you start? there is a whole package
>   called genefilter that does all sorts of things in that regard?
> 
>   robert
> 
> 
> > I do not mean select some set of genes and then do a cv on this subset,
> > but rather to reselect the subset for each fold?
> > 
> > I had written a rather long-winded loop before this posting (had
> > missed ipred) but now wonder if there is a shortcut?
> > 
> > -----Original Message-----
> > From: Torsten Hothorn [mailto:Torsten.Hothorn@rzmail.uni-erlangen.de] 
> > Sent: Friday, January 24, 2003 7:13 AM
> > To: Adaikalavan Ramasamy
> > Cc: Song, Guangchun; bioconductor@stat.math.ethz.ch
> > Subject: RE: [BioC] How to do k-fold validation using SVM
> > 
> > On Fri, 24 Jan 2003, Adaikalavan Ramasamy wrote:
> > 
> > > You might want to use the function svm() in the e1071 library with the
> > > option 'cross'.
> > >
> > > Or you can manually break the dataset into k subsets and write a loop.
> > > This might be better if you prefer to do stratified sampling for the
> > > fold rather than random sampling.
> > >
> > 
> > or you can use the "errorest" function in the ipred-package (see R News
> > 2(2) for examples)
> > 
> > Torsten
> > 
> > > -----Original Message-----
> > > From: Song, Guangchun [mailto:Guangchun.Song@stjude.org]
> > > Sent: Friday, January 24, 2003 7:35 AM
> > > To: bioconductor@stat.math.ethz.ch
> > > Subject: [BioC] How to do k-fold validation using SVM
> > >
> > >
> > >
> > > > Does anyone know how to do k-fold validation on the training data
> > > > set with SVM?
> > >
> > > Thanks.
> > >
> > >
> > > Guangchun
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor@stat.math.ethz.ch
> > > http://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> > >

-- 
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617)  632-2444                   |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem@jimmy.dfci.harvard.edu   |
+---------------------------------------------------------------------------+