[R] help please: put output into dataframe

David Winsemius dwinsemius at comcast.net
Fri Mar 18 16:20:05 CET 2011


On Mar 18, 2011, at 10:53 AM, Ram H. Sharma wrote:

> Thanks, Jim, for the idea.
>
> I tried saving the output as a list. I cannot write it to a table with
> "write.table", and I could not find a function such as write.list or an
> equivalent. Even as a list, I think it would be more difficult to
> post-process than a table.
>
> outx<- as.list(apply(datafr1, 2, fout))
> write.table (outx, "outlier.csv", sep=",")
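
The `write.table` call above fails for the same reason `data.frame` did: the list elements have different lengths, so they cannot be coerced to columns of one table. A minimal sketch of one way around this, assuming a long-format table (one row per extreme value) is acceptable. The seed, the `out_long` name, and the column names are illustrative assumptions, not part of the thread; `fout` is Dennis's function and `datafr1` follows the original simulated data. `lapply` is used here because it always returns a list, whereas `apply` returns a matrix when every column happens to yield the same number of values.

```r
# Simulated data, same shape as the example earlier in the thread
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# Dennis's function: values outside median +/- 2 * MAD
fout <- function(x) {
  lim <- median(x) + c(-2, 2) * mad(x)
  x[x < lim[1] | x > lim[2]]
}

# lapply() over a data frame returns a list, one element per column
outx <- lapply(datafr1, fout)

# stack() turns a named list of vectors into a two-column data frame:
# 'values' holds the numbers, 'ind' the name of the originating variable
out_long <- stack(outx)
names(out_long) <- c("value", "variable")  # hypothetical column names

# A rectangular object can be written out as usual
write.csv(out_long, "outlier.csv", row.names = FALSE)
```

Because each row carries its variable name, the result can be split, filtered, or summarised per variable later without knowing how many values each variable contributed.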

Use `dump` to save it as an R object that can later be `source()`-ed,
which is what I think you want, or `capture.output` to save the text
representation you would see at the console, which would be difficult
to restore as an R object.
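
The two options can be sketched roughly as follows. The small `outx` list, its values, and the file names are made up for illustration; only `dump`, `source`, and `capture.output` come from the advice above.

```r
# A small list with elements of different lengths (made-up values)
outx <- list(var1 = c(17.5, 0.7, -1.2), var2 = c(37.4, -1.7))

# Option 1: dump() writes R code that recreates the object;
# source() runs that code and restores the object
dump_file <- file.path(tempdir(), "outx.R")   # hypothetical file name
dump("outx", file = dump_file)
saved <- outx
rm(outx)
source(dump_file)        # recreates 'outx' in the workspace
identical(outx, saved)   # check that the round trip preserved the list

# Option 2: capture.output() stores the printed form, one console line
# per line of the file
txt_file <- file.path(tempdir(), "outx.txt")  # hypothetical file name
capture.output(print(outx), file = txt_file)
printed <- readLines(txt_file)  # plain text lines, not an R object
```

The dumped file stays usable as data in a later session, while the captured text is convenient to read but would need parsing before it could be used in further computation.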

-- 
David.
>
> Ram
>
>
>
> On Fri, Mar 18, 2011 at 10:04 AM, jim holtman <jholtman at gmail.com> wrote:
>
>> I think it was suggested that you save your output to a 'list' and
>> then you will have it in a format that can accept variable numbers of
>> items in each element and it is also in a form that you can easily
>> process it to create whatever other output you might need.
>>
>> On Fri, Mar 18, 2011 at 7:24 AM, Ram H. Sharma <sharma.ram.h at gmail.com> wrote:
>>> Hi Dennis and R-users
>>>
>>> Thank you for more help. I am pretty close, but the remaining
>>> challenge is forcing output of different lengths into an output
>>> data frame.
>>>
>>> > x <- data.frame(apply(datafr1, 2, fout))
>>> Error in data.frame(var1 = c(-0.70777998321315, 0.418602152926712,
>>> 2.08356737154810,  :
>>> arguments imply differing number of rows: 28, 12, 20, 19
>>>
>>> As I need to work with more than 2000 variables, my intention here is
>>> to save this output in such a way that it can be manipulated further.
>>> The first priority is to save it in a data frame that holds the
>>> extreme values for each variable; the second is to automate saving
>>> the output printed on the screen to a text file.
>>>
>>> Thank you for help once again.
>>>
>>> Ram
>>>
>>>
>>> On Fri, Mar 18, 2011 at 3:16 AM, Dennis Murphy <djmuser at gmail.com> wrote:
>>>
>>>> Hi:
>>>>
>>>> Is this what you're after?
>>>>
>>>> fout <- function(x) {
>>>>     lim <- median(x) + c(-2, 2) * mad(x)
>>>>     x[x < lim[1] | x > lim[2]]
>>>> }
>>>> > apply(datafr1, 2, fout)
>>>> $var1
>>>>  [1] 17.5462078 18.4548214  0.7083442  1.9207578 -1.2296787 17.4948240
>>>>  [7] 19.5702558  1.6181150 20.9791652 -1.3542099  1.8215087 -1.0296303
>>>> [13] 20.5237930 17.5366497 18.5657566  0.9335419 19.7519983 17.8607968
>>>> [19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
>>>> [25]  1.9712347
>>>>
>>>> $var2
>>>>  [1]  37.3822087  35.6490641  35.6000785  38.5981086  -1.6504275  37.1419290
>>>>  [7]  37.7605230  40.3508689   0.6639900   2.4695841  38.8209491  39.9087921
>>>> [13]  38.9907585  35.8279437   2.7870799  37.0941113   0.6308583  36.4556638
>>>> [19] -10.2384849   2.8480199  -7.7680457  35.7076539  -0.5467739   3.4702765
>>>> [25]  40.4818580   3.2864273   1.4917174
>>>>
>>>> $var3
>>>>  [1]  74.252563  68.396391  68.845461  -5.006545  66.083402  76.036577
>>>>  [7]  75.112586  -6.374241  63.883549  64.041216 -19.764360 -15.051017
>>>> [13]  -9.782767  64.696013  70.970648  -4.562031 -22.135003  70.549310
>>>> [19]  69.495915  -4.095587  86.612375  87.029526  70.072126  -6.421695
>>>> [25]  65.737536
>>>>
>>>> $var4
>>>>  [1]  81.476483  87.098767 -10.451616  91.927329  86.588952  85.080950
>>>>  [7]  84.958645  -9.456368  86.270876 -22.936779  83.314032
>>>>
>>>> Double checks:
>>>> > apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
>>>>          var1      var2      var3      var4
>>>> [1,]  2.12167  3.779415 -3.736066 -3.471752
>>>> [2,] 17.37176 34.929800 62.969733 80.224799
>>>> > apply(datafr1, 2, range)
>>>>           var1      var2      var3      var4
>>>> [1,] -2.668841 -10.23848 -22.13500 -22.93678
>>>> [2,] 21.803714  40.48186  87.02953  91.92733
>>>>
>>>> Assuming you wanted to do this columnwise (by variable), it appears
>>>> to be doing the right thing.
>>>>
>>>> HTH,
>>>> Dennis
>>>>
>>>>
>>>> On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma <sharma.ram.h at gmail.com> wrote:
>>>>
>>>>> Dear R community members
>>>>>
>>>>> I have been struggling with this simple question but have not found
>>>>> an appropriate solution, so please help.
>>>>>
>>>>> # my data, though I have a large number of variables
>>>>> var1 <- rnorm(500, 10,4)
>>>>> var2 <- rnorm(500, 20, 8)
>>>>> var3 <- rnorm(500, 30, 18)
>>>>> var4 <- rnorm(500, 40, 20)
>>>>> datafr1 <- data.frame(var1, var2, var3, var4)
>>>>>
>>>>> # my unsuccessful codes
>>>>> nvar <- ncol(datafr1)
>>>>> for (i in 1:nvar) {
>>>>>             out1 <- NULL
>>>>>             out2 <- NULL
>>>>>             medianx <- median(getdata[,i], na.rm = TRUE)
>>>>>             show(madx <- mad(getdata[,i], na.rm = TRUE))
>>>>>             MD1 <- c(medianx + 2*madx)
>>>>>             MD2 <- c(medianx - 2*madx)
>>>>>             # store data that are greater than median + 2 mad
>>>>>             out1[i] <- which(getdata[,i] > MD1)
>>>>>             # store data that are less than median - 2 mad
>>>>>             out2[i] <- which (getdata[,1] < MD2)
>>>>>            resultdf <- data.frame(out1, out2)
>>>>>            write.table (resultdf, "out.csv", sep=",")
>>>>>             }
>>>>>
>>>>>
>>>>> My idea here is to store those values that are either greater than
>>>>> median + 2*MAD or less than median - 2*MAD. Each variable has a
>>>>> different length of output.
>>>>>
>>>>> The last error message is the following:
>>>>> Error in data.frame(out1, out2) :
>>>>> arguments imply differing number of rows: 2, 0
>>>>> In addition: Warning messages:
>>>>> 1: In out1[i] <- which(getdata[, i] > MD1) :
>>>>> number of items to replace is not a multiple of replacement length
>>>>> 2: In out2[i] <- which(getdata[, 1] < MD2) :
>>>>> number of items to replace is not a multiple of replacement length
>>>>> 3: In out1[i] <- which(getdata[, i] > MD1) :
>>>>> number of items to replace is not a multiple of replacement length
>>>>>
>>>>> Thank you in advance for helping me.
>>>>>
>>>>> Best regards;
>>>>> RHS
>>>>>
>>>>>       [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>>
>

David Winsemius, MD
West Hartford, CT


