[R] data frame is killing me! help

bbslover dluthm at yeah.net
Fri Oct 23 18:43:56 CEST 2009


I have read that one. I want to use this method on my data, but I don't
know how to put my data into R.
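
Here is a minimal sketch of one way to put your own data into the same shape as `gasoline`: one response column plus one matrix column named `NIR`. It assumes your 600 predictors are all the numeric columns other than the response. `mydata`, `y`, `dat` and the file name `mydata.csv` are hypothetical, and random numbers stand in for real measurements. The key step is wrapping the matrix in `I()`, which stops `data.frame()` from splitting it into 600 separate columns.

```r
# Hypothetical stand-in for your own data. In practice this would be
# something like mydata <- read.csv("mydata.csv"), with a response column
# 'y' and 600 numeric predictor columns (file and column names are assumptions).
set.seed(1)
n <- 20
mydata <- data.frame(y = rnorm(n), matrix(rnorm(n * 600), nrow = n))

# Collect every column except the response into one numeric matrix
X <- as.matrix(mydata[, setdiff(names(mydata), "y")])

# I() keeps data.frame() from splitting the matrix into 600 columns,
# so it is stored as a single column named NIR, as in the gasoline data
dat <- data.frame(y = mydata$y, NIR = I(X))

dim(dat)      # 20 rows, 2 columns (y and NIR)
dim(dat$NIR)  # the 20 x 600 predictor matrix

# Fit only if pls is installed; ncomp = 5 is an arbitrary illustrative choice
if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  fit <- plsr(y ~ NIR, ncomp = 5, data = dat, validation = "CV")
  summary(fit)
}
```

With the data in this form, the formula `y ~ NIR` is used exactly as in the `gasoline` example quoted below. If you would rather keep the predictors as ordinary columns, `plsr(y ~ ., data = mydata)` should also work without renaming anything, because `.` in a formula stands for all the other columns.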

James W. MacDonald wrote:
> 
> 
> 
> bbslover wrote:
>> 
>> 
>> Steve Lianoglou-6 wrote:
>>> Hi,
>>>
>>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>>>
>>>> Usage
>>>>      data(gasoline)
>>>> Format
>>>>      A data frame with 60 observations on the following 2 variables.
>>>>      octane  a numeric vector. The octane number.
>>>>      NIR     a matrix with 401 columns. The NIR spectrum.
>>>>
>>>> When I look at the gasoline data, I see the following:
>>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
>>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
>>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
>>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
>>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
>>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
>>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
>>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
>>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
>>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
>>>>
>>>> Look at these names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm, NIR.1692 nm,
>>>> NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm.
>>>>
>>>> How can I add the letters NIR to my variable names? My 600
>>>> independent variables do not have NIR as a prefix, but the prefix seems
>>>> to be needed to fit the model with plsr, for example
>>>> aa = plsr(y ~ NIR, data = data, ...). How can I do this?
>>> I'm not really sure that I'm getting you, but if your problem is that
>>> the column names of your data.frame don't match the variable names
>>> you'd like to use in your formula, just change the colnames of your
>>> data.frame to match your formula.
>>>
>>> BTW - I have no idea where to get this gasoline data set, so I'm just  
>>> imagining:
>>>
>>> e.g.
>>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',  
>>> 'you', 'want', 'here')
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>    |  Memorial Sloan-Kettering Cancer Center
>>>    |  Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> 
>> Thank you. But there are so many independent variables that it is not
>> easy to identify them one by one. Is there a better way?
> 
> You don't need to identify anything. What you need to do is read the 
> help page for the function you want to use, so you (at the very least) 
> know how to use the function.
> 
>  > library(pls)
>  > data(gasoline)
>  > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
>  > summary(fit)
> Data: 	X dimension: 60 401
> 	Y dimension: 60 1
> Fit method: kernelpls
> Number of components considered: 53
> 
> VALIDATION: RMSEP
> Cross-validated using 10 random segments.
>         (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
>         7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
> adjCV   0.2257   0.2303   0.2395    0.2473    0.2646    0.2705    0.2726
>        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
> CV       0.2913    0.2932    0.2985    0.3137    0.3289    0.3323    0.3391
> adjCV    0.2808    0.2821    0.2863    0.3008    0.3141    0.3172    0.3228
>        21 comps  22 comps  23 comps  24 comps  25 comps  26 comps  27 comps
> CV       0.3476    0.3384    0.3316    0.3213    0.3155    0.3118    0.3062
> adjCV    0.3307    0.3217    0.3154    0.3057    0.3002    0.2964    0.2908
>        28 comps  29 comps  30 comps  31 comps  32 comps  33 comps  34 comps
> CV       0.3033    0.3034    0.3074    0.3083    0.3094    0.3087    0.3105
> adjCV    0.2881    0.2881    0.2917    0.2926    0.2936    0.2929    0.2946
>        35 comps  36 comps  37 comps  38 comps  39 comps  40 comps  41 comps
> CV       0.3108    0.3106    0.3105    0.3104    0.3104    0.3105    0.3105
> adjCV    0.2949    0.2947    0.2946    0.2945    0.2945    0.2945    0.2946
>        42 comps  43 comps  44 comps  45 comps  46 comps  47 comps  48 comps
> CV       0.3105    0.3105    0.3105    0.3105    0.3105    0.3105    0.3105
> adjCV    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946
>        49 comps  50 comps  51 comps  52 comps  53 comps
> CV       0.3105    0.3105    0.3105    0.3105    0.3105
> adjCV    0.2946    0.2946    0.2946    0.2946    0.2946
> 
> TRAINING: % variance explained
>         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
> X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
> octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
>         9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
> X         98.32     98.71     98.84     99.00     99.21     99.46     99.52
> octane    99.20     99.24     99.36     99.44     99.49     99.51     99.58
>         16 comps  17 comps  18 comps  19 comps  20 comps  21 comps  22 comps
> X          99.57     99.64     99.68     99.76     99.78     99.82     99.84
> octane     99.65     99.69     99.78     99.81     99.86     99.89     99.92
>         23 comps  24 comps  25 comps  26 comps  27 comps  28 comps  29 comps
> X          99.88     99.91     99.92     99.93     99.94     99.95     99.96
> octane     99.93     99.94     99.95     99.97     99.98     99.99     99.99
>         30 comps  31 comps  32 comps  33 comps  34 comps  35 comps  36 comps
> X          99.96     99.97     99.97     99.98     99.98     99.98     99.98
> octane     99.99    100.00    100.00    100.00    100.00    100.00    100.00
>         37 comps  38 comps  39 comps  40 comps  41 comps  42 comps  43 comps
> X          99.99     99.99     99.99     99.99       100       100       100
> octane    100.00    100.00    100.00    100.00       100       100       100
>         44 comps  45 comps  46 comps  47 comps  48 comps  49 comps  50 comps
> X            100       100       100       100       100       100       100
> octane       100       100       100       100       100       100       100
>         51 comps  52 comps  53 comps
> X            100       100       100
> octane       100       100       100
> 
> 
>> 
>> 
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> 
> 
> 
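
On the follow-up question of naming hundreds of variables without typing them one by one, a small sketch may help. It assumes a hypothetical unnamed predictor matrix `X`; the wavelengths and octane values are invented for illustration. `paste()` builds every name in one call. Printing the result also shows where labels such as `NIR.1686 nm` come from: when a data frame holds a matrix column, the printed header joins the column's name (`NIR`) and the matrix's own column names with a dot, so no `NIR` prefix has to be added to the variables themselves.

```r
# Hypothetical predictor matrix: 3 samples x 4 variables with no names yet
set.seed(2)
X <- matrix(round(runif(12, 1.1, 1.3), 4), nrow = 3)

# Build all column names in one call instead of typing them one by one;
# the wavelengths 1686, 1688, ... are invented for illustration
colnames(X) <- paste(seq(1686, by = 2, length.out = ncol(X)), "nm")
colnames(X)   # "1686 nm" "1688 nm" "1690 nm" "1692 nm"

# Store the matrix as one column called NIR (the octane values are made up)
dat <- data.frame(octane = c(85.3, 86.1, 88.4), NIR = I(X))

# Printed headers combine the two names: NIR.1686 nm, NIR.1688 nm, ...
print(dat)
```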

-- 
View this message in context: http://www.nabble.com/data-frame-is-killing-me%21-help-tp26015079p26029667.html
Sent from the R help mailing list archive at Nabble.com.



