[R] data frame is killing me! help

James W. MacDonald jmacdon at med.umich.edu
Fri Oct 23 16:00:16 CEST 2009



bbslover wrote:
> 
> 
> Steve Lianoglou-6 wrote:
>> Hi,
>>
>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>>
>>> Usage
>>> data(gasoline)
>>> Format
>>> A data frame with 60 observations on the following 2 variables.
>>> octane
>>> a numeric vector. The octane number.
>>> NIR
>>> a matrix with 401 columns. The NIR spectrum
>>>
>>> and when I look at the gasoline data I see the following:
>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
>>>
>>> Note the column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm.
>>>
>>> How can I add the prefix NIR to my variables? My 600 independent
>>> variables do not have NIR as a prefix, but it seems to be needed to
>>> fit the plsr model, for example aa=plsr(y~NIR, data=data ,....).
>>> How can I do this?
>> I'm not really sure that I'm getting you, but if your problem is that  
>> the column names of your data.frame don't match the variable names  
>> you'd like to use in your formula, just change the colnames of your  
>> data.frame to match your formula.
>>
>> BTW - I have no idea where to get this gasoline data set, so I'm just  
>> imagining:
>>
>> eg.
>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',  
>> 'you', 'want', 'here')
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>    |  Memorial Sloan-Kettering Cancer Center
>>    |  Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> 
> Thank you. But there are so many independent variables that it is not easy
> to identify them one by one. Is there a better way?

You don't need to identify anything. What you need to do is read the 
help page for the function you want to use, so you (at the very least) 
know how to use the function.

 > library(pls)
 > data(gasoline)
 > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
 > summary(fit)
Data: 	X dimension: 60 401
	Y dimension: 60 1
Fit method: kernelpls
Number of components considered: 53

VALIDATION: RMSEP
Cross-validated using 10 random segments.
        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
adjCV   0.2257   0.2303   0.2395    0.2473    0.2646    0.2705    0.2726
        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
CV       0.2913    0.2932    0.2985    0.3137    0.3289    0.3323    0.3391
adjCV    0.2808    0.2821    0.2863    0.3008    0.3141    0.3172    0.3228
        21 comps  22 comps  23 comps  24 comps  25 comps  26 comps  27 comps
CV       0.3476    0.3384    0.3316    0.3213    0.3155    0.3118    0.3062
adjCV    0.3307    0.3217    0.3154    0.3057    0.3002    0.2964    0.2908
        28 comps  29 comps  30 comps  31 comps  32 comps  33 comps  34 comps
CV       0.3033    0.3034    0.3074    0.3083    0.3094    0.3087    0.3105
adjCV    0.2881    0.2881    0.2917    0.2926    0.2936    0.2929    0.2946
        35 comps  36 comps  37 comps  38 comps  39 comps  40 comps  41 comps
CV       0.3108    0.3106    0.3105    0.3104    0.3104    0.3105    0.3105
adjCV    0.2949    0.2947    0.2946    0.2945    0.2945    0.2945    0.2946
        42 comps  43 comps  44 comps  45 comps  46 comps  47 comps  48 comps
CV       0.3105    0.3105    0.3105    0.3105    0.3105    0.3105    0.3105
adjCV    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946
        49 comps  50 comps  51 comps  52 comps  53 comps
CV       0.3105    0.3105    0.3105    0.3105    0.3105
adjCV    0.2946    0.2946    0.2946    0.2946    0.2946

TRAINING: % variance explained
         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
         9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
X         98.32     98.71     98.84     99.00     99.21     99.46     99.52
octane    99.20     99.24     99.36     99.44     99.49     99.51     99.58
         16 comps  17 comps  18 comps  19 comps  20 comps  21 comps  22 comps
X          99.57     99.64     99.68     99.76     99.78     99.82     99.84
octane     99.65     99.69     99.78     99.81     99.86     99.89     99.92
         23 comps  24 comps  25 comps  26 comps  27 comps  28 comps  29 comps
X          99.88     99.91     99.92     99.93     99.94     99.95     99.96
octane     99.93     99.94     99.95     99.97     99.98     99.99     99.99
         30 comps  31 comps  32 comps  33 comps  34 comps  35 comps  36 comps
X          99.96     99.97     99.97     99.98     99.98     99.98     99.98
octane     99.99    100.00    100.00    100.00    100.00    100.00    100.00
         37 comps  38 comps  39 comps  40 comps  41 comps  42 comps  43 comps
X          99.99     99.99     99.99     99.99       100       100       100
octane    100.00    100.00    100.00    100.00       100       100       100
         44 comps  45 comps  46 comps  47 comps  48 comps  49 comps  50 comps
X            100       100       100       100       100       100       100
octane       100       100       100       100       100       100       100
         51 comps  52 comps  53 comps
X            100       100       100
octane       100       100       100
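
The "NIR." prefix in the printed gasoline data is only how R displays a
data frame column that holds a whole matrix; nothing has to be renamed
column by column. For your own data, the same layout can probably be built
with I(). What follows is a minimal sketch on made-up data, not your actual
file: the names raw, X and mydata, the 20 x 600 dimensions and ncomp = 5 are
all assumptions chosen for illustration.

```r
# Hypothetical stand-in for the poster's data: 20 samples, one response
# column y and 600 predictor columns with arbitrary names (X1 ... X600).
set.seed(1)
n <- 20
p <- 600
raw <- data.frame(y = rnorm(n), matrix(rnorm(n * p), nrow = n))

# Collect every predictor column into one numeric matrix.
X <- as.matrix(raw[, setdiff(names(raw), "y")])

# I() keeps the matrix as a single column called NIR instead of letting
# data.frame() split it back into 600 separate columns.
mydata <- data.frame(y = raw$y, NIR = I(X))

dim(mydata)       # 20 rows, 2 columns
dim(mydata$NIR)   # 20 rows, 600 columns

# The formula can now refer to the whole predictor block by name.
# Only run the fit if the pls package is installed.
if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  fit <- plsr(y ~ NIR, ncomp = 5, data = mydata, validation = "CV")
}
```

Any column name can be used in place of NIR, as long as the formula uses
the same name.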



-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826



