[R] Inverse normal transformation of ranked data
data_analyst20bhrl at yahoo.com
Fri May 20 23:06:18 CEST 2016
I am using ddply on a data set that contains 2+ million rows. I am trying to rank the values of a variable within groups and then transform the ranks to (approximate) z-scores, i.e., generate quantiles on the normal scale.
Here is some sample data for one group:
x <- c(NA, 0.3640951, 0.1175880, 0.3453916, 0.4214050, 0.7469022, 0.1091423, 0.6099482, NA, NA, 0.6786140, 0.1785854, 0.9750262, NA)
I have tried the following two alternatives:
(1) Using the qnorm function from the stats package in conjunction with the percent_rank function from the dplyr package. For example:
y <- qnorm(percent_rank(x))
This produces -Inf and Inf for the extreme values in the sample data. This issue is resolved if I use the rank function from base R instead, for example:
y <- qnorm(rank(x, na.last = "keep", ties.method = "average")/length(x))
but if there are no NAs in a certain group, the largest rank equals length(x), so the upper extreme data point is still evaluated to Inf.
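Both results come from the endpoints of the probability scale: qnorm(0) is -Inf and qnorm(1) is Inf. The following is a minimal sketch in base R, not a definitive solution. It assumes that percent_rank is equivalent to (minimum rank - 1)/(n - 1) over the non-missing values, and it uses a hypothetical helper, inv_norm_rank (not from any package), that applies the common (rank - 0.5)/n offset, with n counted over non-missing values, so that the probabilities stay strictly between 0 and 1:

```r
x <- c(NA, 0.3640951, 0.1175880, 0.3453916, 0.4214050, 0.7469022,
       0.1091423, 0.6099482, NA, NA, 0.6786140, 0.1785854, 0.9750262, NA)

n_obs <- sum(!is.na(x))   # number of non-missing values

# Base-R stand-in for percent_rank (assumed definition):
# the smallest value maps to exactly 0 and the largest to exactly 1
r_min <- rank(x, na.last = "keep", ties.method = "min")
p_percent <- (r_min - 1) / (n_obs - 1)
range(qnorm(p_percent), na.rm = TRUE)   # -Inf  Inf

# Hypothetical helper: offset the ranks and divide by the number of
# non-missing values, so no probability is ever exactly 0 or 1
inv_norm_rank <- function(v, offset = 0.5) {
  n <- sum(!is.na(v))
  r <- rank(v, na.last = "keep", ties.method = "average")
  qnorm((r - offset) / n)
}

y <- inv_norm_rank(x)
all(is.finite(y[!is.na(x)]))   # NAs in x stay NA in y
```

Because the denominator counts only non-missing values, the result should not depend on whether a group happens to contain NAs. The offset of 0.5 is one conventional choice; other offsets are possible.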
(2) Using the ztransform function from the GenABEL package:
For example:
y <- ztransform(percent_rank(x))
This preserves the extreme values but produces one of the following types of errors when used on my full data set.
Error in ztransform(x) : trait is binary
OR
Error in ztransform(x) : trait is monomorphic
I suspect these errors may be due to the fact that there are very few observations and/or several missing values (NAs) within certain groups, but I am not sure since there are several hundred groups.
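One way to check this suspicion is to count the distinct non-missing values in each group. The sketch below assumes, based only on the wording of the error messages, that "monomorphic" corresponds to a single distinct value and "binary" to two; the data frame dat and its columns group and value are hypothetical placeholders for the real data:

```r
# Hypothetical toy data; `group` and `value` are placeholder column names
dat <- data.frame(
  group = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
  value = c(0.36, 0.12, 0.35, NA, 0.5, 0.5, NA, 0.2, 0.9, 0.2)
)

# Number of distinct non-missing values within each group
n_distinct <- tapply(dat$value, dat$group,
                     function(v) length(unique(v[!is.na(v)])))

# Groups with fewer than three distinct values are the ones to inspect
suspect <- names(n_distinct)[n_distinct < 3]
suspect
```

With several hundred groups, a listing like this would narrow down which groups to look at before deciding whether to skip or handle them separately.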
Is there a better way?