[R] Finding non-normal distributions per row of data frame?

DB1984 dannybolg at gmail.com
Sat Feb 5 00:21:01 CET 2011


Greg, Dennis - thanks for your input. I really appreciate the feedback, as it
is not easy to come by.

As for the data: I described it as 20 columns, which is the smallest
dataset, but it can run to 320 columns, so in some cases there is likely
to be enough power to detect non-normality. That said, a better solution
would be useful.

As a first approximation, I looked at the mean/median ratio to indicate
simple skew in the data. It suggested that most rows are roughly symmetric,
which is consistent with (though not proof of) a normal distribution. I took
the 'nuggets' to be those with a mean/median ratio in the top or bottom 1%
of the data. This was a small group - overall the data appears relatively
normally distributed within rows.
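
A minimal sketch of this mean/median screen in R. The matrix `dat` is simulated stand-in data (200 rows by 20 columns, an assumption made only for illustration), and the names `ratio`, `cuts` and `flagged` are hypothetical. Note that the ratio becomes unstable for rows whose median is close to zero, so this only makes sense for data sitting well away from zero.

```r
# Simulated stand-in for the real data: 200 rows, 20 columns (assumption).
set.seed(1)
dat <- matrix(rnorm(200 * 20, mean = 10, sd = 1), nrow = 200)

# Per-row mean/median ratio; values far from 1 hint at skew.
ratio <- rowMeans(dat) / apply(dat, 1, median)

# Flag rows whose ratio falls in the bottom or top 1% of all rows.
cuts <- quantile(ratio, probs = c(0.01, 0.99))
flagged <- which(ratio < cuts[1] | ratio > cuts[2])
flagged
```

Because the cut-offs are quantiles of the ratios themselves, about 2% of rows are flagged whether or not anything unusual is present, so the flagged rows are candidates for inspection rather than findings.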

The aim is really to find those nuggets with significantly non-normal
distributions. My hope was to be able to take the tails of the p-values for
Shapiro-Wilk, or some similar test, and find these enriched with nuggets.
This may not be an appropriately robust approach - but is there a better
option?
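
The per-row Shapiro-Wilk idea could be sketched as below. The data are simulated, with five deliberately skewed rows planted only so that the example has something to find, and the Benjamini-Hochberg adjustment via `p.adjust` is an added suggestion for handling many simultaneous tests, not something from the discussion so far.

```r
# Simulated stand-in data: 200 rows, 20 columns (assumption).
set.seed(2)
dat <- matrix(rnorm(200 * 20), nrow = 200)

# Plant five skewed rows, purely for illustration.
dat[1:5, ] <- matrix(rexp(5 * 20), nrow = 5)

# One Shapiro-Wilk p-value per row.
pvals <- apply(dat, 1, function(x) shapiro.test(x)$p.value)

# Adjust for testing many rows at once (Benjamini-Hochberg).
padj <- p.adjust(pvals, method = "BH")

# Rows ordered from most to least evidence against normality.
ranked <- order(pvals)
head(ranked, 10)
```

With only 20 values per row the test has limited power, so treating the ordering as a ranking of candidates is probably safer than applying a fixed p-value cut-off.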

One idea was to sort the data in each row and perform a linear regression
on the sorted values. For normal distributions I would expect the intercept
to be close to the mean. Filtering on (intercept - mean) and on the p-value
for the fit of the regression would be another way to pick the nuggets out
of the dataset.
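
The post does not say what the sorted values are regressed on. One reading, which is an assumption here, is a regression on theoretical normal quantiles, i.e. the line through a normal Q-Q plot. Under that reading the intercept equals the row mean by construction, because the quantiles from `qnorm(ppoints(n))` are symmetric about zero, so (intercept - mean) carries no information and the R-squared of the fit is the more useful number. The helper `qq_r2` and the simulated `dat` are hypothetical.

```r
# Simulated stand-in data: 200 rows, 20 columns (assumption).
set.seed(3)
dat <- matrix(rnorm(200 * 20), nrow = 200)

# R-squared from regressing the sorted row on theoretical normal quantiles.
# Values close to 1 mean the row lies near a straight line on a Q-Q plot.
qq_r2 <- function(x) {
  x <- sort(x)
  q <- qnorm(ppoints(length(x)))  # symmetric about zero
  fit <- lm(x ~ q)
  summary(fit)$r.squared
}

r2 <- apply(dat, 1, qq_r2)

# The ten rows that look least like a straight line.
candidates <- order(r2)[1:10]
candidates
```

Regressing on the rank 1..n instead would compare each row against a uniform rather than a normal shape, which is why the normal quantiles are used as the predictor in this sketch.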

If it helps, the nuggets I am expecting are either 80% grouped around the
mean with 20% forming a uni-directional tail, or approximately bimodal.
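
One way to judge any of these screens is to plant rows of the two expected shapes in otherwise normal data and see whether they come out on top. The sketch below only builds such a test set. Every number in it (the shift of 5 for the tail, the centres at -2.5 and 2.5, 190 background rows) is an arbitrary assumption, and `tail_row`, `bimodal_row` and `sim` are hypothetical names.

```r
set.seed(4)
n_col <- 20

# Shape 1: 80% of values near the mean, 20% forming a one-sided tail.
tail_row <- function(n) {
  n_tail <- round(0.2 * n)
  c(rnorm(n - n_tail, mean = 0), rnorm(n_tail, mean = 5))
}

# Shape 2: roughly bimodal, half of the values around each centre.
bimodal_row <- function(n) {
  n_a <- n %/% 2
  c(rnorm(n_a, mean = -2.5), rnorm(n - n_a, mean = 2.5))
}

# 190 ordinary normal rows plus 5 planted rows of each shape.
background <- matrix(rnorm(190 * n_col), nrow = 190)
nuggets <- rbind(t(replicate(5, tail_row(n_col))),
                 t(replicate(5, bimodal_row(n_col))))
sim <- rbind(nuggets, background)  # rows 1:10 are the planted nuggets
dim(sim)
```

Running a candidate screen on `sim` and checking how many of rows 1 to 10 land near the top of its ranking gives a rough feel for its sensitivity at a given number of columns.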

As I'd imagine is obvious - I don't have an ideal solution to finding these
nuggets, and so coming up with the R code to do so is harder still. If
anybody has insight into this sort of problem, and can point me in the
direction of further reading, that would be helpful. If there is a
ready-made solution, even better!

As I said, thanks for your time with this...

