[R] File normalization

Phil Spector spector at stat.berkeley.edu
Tue May 25 19:57:03 CEST 2010


The scale function can use whatever vector you choose for
subtraction and division.  (It's basically a wrapper for
the sweep function.) For example, to subtract the 
median and divide by the median absolute deviation, use

scale(x,center=apply(x,2,median),scale=apply(x,2,mad))
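
A minimal sketch of the call above on a small made-up matrix. The data and the names x, med, md, z1 and z2 are illustrative and do not come from the thread. Note that mad() multiplies by 1.4826 by default, so that it estimates the standard deviation for normally distributed data; pass constant = 1 if you want the raw median absolute deviation. The second version spells out the same arithmetic with sweep(), in line with scale() being a wrapper for it:

```r
# Toy data standing in for the real file (hypothetical values)
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Column-wise medians and median absolute deviations
# (mad() scales by 1.4826 unless constant = 1 is given)
med <- apply(x, 2, median)
md  <- apply(x, 2, mad)

# Robust standardization with scale()
z1 <- scale(x, center = med, scale = md)

# The same two steps written out with sweep(): subtract, then divide
z2 <- sweep(sweep(x, 2, med, "-"), 2, md, "/")

# scale() adds "scaled:center"/"scaled:scale" attributes, so compare values only
all.equal(as.vector(z1), as.vector(z2))
```

After this, each column of z1 has a median of 0 and a MAD of 1, up to rounding.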

If you only want to divide or only want to subtract, set the other
argument to FALSE (center=FALSE or scale=FALSE).  Omitting it does not
skip that step, because both arguments default to TRUE.
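
A sketch of the divide-only case the original poster asked about, under the same assumptions (made-up data; mav, y1 and y2 are illustrative names). Each column is divided by its median absolute value, and center = FALSE switches off the subtraction. The result should agree with the corrected apply() version posted further down the thread:

```r
# Toy data standing in for the real file (hypothetical values)
set.seed(2)
x <- matrix(rnorm(30), nrow = 10, ncol = 3)

# Median absolute value of each column
mav <- apply(x, 2, function(col) median(abs(col)))

# Divide only: center = FALSE disables centering
# (leaving center out would subtract the column means, since it defaults to TRUE)
y1 <- scale(x, center = FALSE, scale = mav)

# Same result with apply(), as in the corrected code from the thread
y2 <- apply(x, 2, function(col) col / median(abs(col)))

all.equal(as.vector(y1), as.vector(y2))
```

Each column of y1 then has a median absolute value of 1.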

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu





On Tue, 25 May 2010, Joris Meys wrote:

> Scale is written to do that IF you want to normalize according to the mean
> and the sd. For any other form of normalization, apply or sweep constructs
> will have to be used.
>
> I couldn't really see a way of using the absolute median value in a
> sweep-statement.
>
> On Tue, May 25, 2010 at 7:11 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>
>> ?scale
>>
>> is specifically written for this. See also ?sweep
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Joris Meys
>> Sent: Tuesday, May 25, 2010 9:54 AM
>> To: cobbler_squad
>> Cc: r-help at r-project.org
>> Subject: Re: [R] File normalization
>>
>> My code subtracts the median absolute value. If you want to divide by it,
>> the code must be:
>> apply(some_dataset,2,function(x){
>>    x/median(abs(x))
>> })
>>
>>
>> Thanks to Peter Langfelder for pointing out my mistake.
>>
>> On Tue, May 25, 2010 at 6:24 PM, Joris Meys <jorismeys at gmail.com> wrote:
>>
>>> What kind of normalization do you want to do?
>>> If you want to divide all columns by the median absolute value, try :
>>>
>>> apply(some_dataset,2,function(x){
>>>    x-median(abs(x))
>>> })
>>>
>>> also look at ?scale for normalization using the average and the sd.
>>> Cheers
>>> Joris
>>>
>>>
>>> On Tue, May 25, 2010 at 6:01 PM, cobbler_squad <la.foma at gmail.com> wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> I have a file with 57 columns (671 time points in each column)
>>>>
>>>> File looks like this:
>>>> 1    0.279191   -1.203200e-02   -0.166772  6.12080e-02  0.196379  4.591900e-02  0.293689
>>>> 2    0.267017   -1.150700e-02   -0.159463  5.85400e-02  0.187775  4.392200e-02  0.280854
>>>> 3    0.053778   -2.322000e-03   -0.032103  1.18490e-02  0.037921  8.867000e-03  0.056571
>>>> 4    0.035469   -1.531000e-03   -0.021166  7.79200e-03  0.024937  5.843000e-03  0.037273
>>>> 5    0.040774   -1.761000e-03   -0.024342  8.96000e-03  0.028674  6.726000e-03  0.042910
>>>> 6   -0.359709    1.547400e-02    0.214844 -7.87320e-02 -0.253034 -5.905100e-02 -0.378322
>>>>
>>>> I need to normalize it -- is it possible?
>>>>
>>>> I looked into normalizing the columns of a matrix by their median absolute
>>>> value in R, but I am not sure how to apply it in this case. I would very
>>>> much appreciate any input you could give me.
>>>>
>>>> Thank you all in advance,
>>>>
>>>> Cobbler
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/File-normalization-tp2230251p2230251.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Joris Meys
>>> Statistical Consultant
>>>
>>> Ghent University
>>> Faculty of Bioscience Engineering
>>> Department of Applied mathematics, biometrics and process control
>>>
>>> Coupure Links 653
>>> B-9000 Gent
>>>
>>> tel : +32 9 264 59 87
>>> Joris.Meys at Ugent.be
>>> -------------------------------
>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>>
>>
>>
>>
>>
>>
>
>
>


