[R] Finding non-normal distributions per row of data frame?

Greg Snow Greg.Snow at imail.org
Sat Feb 5 01:23:52 CET 2011


Have you looked into Bioconductor?  There is a separate mailing list and many packages designed for genetic analysis within the Bioconductor project.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of DB1984
> Sent: Friday, February 04, 2011 4:21 PM
> To: r-help at r-project.org
> Subject: Re: [R] Finding non-normal distributions per row of data frame?
> 
> 
> Greg, Dennis - thanks for your input, I really appreciate the feedback,
> as it is not easy to source.
> 
> In terms of the data; I've described it as 20 columns, which is the
> smallest dataset, but this can run to 320 columns, so in some cases
> there is likely to be enough power to detect non-normality. That said,
> a better solution would be useful.
> 
> As a first approximation, I looked at the mean/median ratio to indicate
> simple skew in the data - which suggested that most of the data was
> normally distributed. I took the 'nuggets' to be those with a
> mean/median ratio in the top or bottom 1% of the data. This was a small
> group - overall the data appears relatively normally distributed within
> rows.
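
A minimal sketch of that mean/median screen, assuming the data sit in a numeric matrix with one feature per row. The name `x` and the simulated values are illustrative and not from the original post. The ratio is only meaningful when the row medians are well away from zero, which the simulated data are built to satisfy.

```r
# Illustrative data (assumption): 1000 rows x 20 columns, mostly normal.
set.seed(1)
x <- matrix(rnorm(1000 * 20, mean = 10), nrow = 1000)
x[1:10, 17:20] <- x[1:10, 17:20] + 5   # rows 1-10: 20% of values form a tail

# Mean/median ratio for every row.
ratio <- rowMeans(x) / apply(x, 1, median)

# Flag rows whose ratio falls in the bottom or top 1% of all rows.
cutoffs <- quantile(ratio, c(0.01, 0.99))
flagged <- which(ratio < cutoffs[1] | ratio > cutoffs[2])
```

Because the cutoffs are quantiles of the ratio itself, roughly 2% of rows are flagged whether or not any row is truly non-normal, so the flagged set is a shortlist for inspection rather than a test result.
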
> 
> The aim is really to find those nuggets with significantly non-normal
> distributions. My hope was to be able to take the tails of the p-values
> for Shapiro-Wilk, or some similar test, and find these enriched with
> nuggets. This may not be an appropriately robust approach - but is
> there a better option?
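
A minimal sketch of the per-row Shapiro-Wilk idea, using `shapiro.test()` and `p.adjust()` from the stats package. The matrix `x`, its dimensions, and the choice of the Benjamini-Hochberg adjustment are assumptions made for illustration, not something stated in the thread.

```r
# Illustrative data (assumption): 500 rows x 20 columns.
set.seed(2)
x <- matrix(rnorm(500 * 20), nrow = 500)
x[1:5, ] <- matrix(rexp(5 * 20), nrow = 5)   # rows 1-5: skewed on purpose

# One Shapiro-Wilk p-value per row.
# shapiro.test() requires between 3 and 5000 non-missing values.
p <- apply(x, 1, function(r) shapiro.test(r)$p.value)

# Adjust for the number of rows tested, then rank rows by p-value.
p_adj <- p.adjust(p, method = "BH")
candidates <- order(p)[1:10]   # the ten rows with the smallest p-values
```

With thousands of rows, some small raw p-values are expected by chance alone, which is why the sketch keeps an adjusted p-value alongside the ranking.
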
> 
> One idea was to sort the data in each row, and perform a linear
> regression. For normal distributions I am expecting the intercept to be
> close to the mean. Using the (intercept-mean) and p-values for the fit
> of the regression was again another way to filter out the nuggets in
> the dataset.
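
One caveat on the regression idea: in an ordinary least-squares fit the intercept equals the mean of the response minus the slope times the mean of the predictor, so regressing sorted values on their rank mostly recovers the row's mean and spread rather than its shape. A related approach, swapped in here as an alternative, is to compare the sorted values with normal quantiles, as a normal Q-Q plot does. The sketch below is a rough version of that; the helper name `qq_cor` and the simulated data are hypothetical.

```r
# Correlation between a row's sorted values and approximate normal
# quantiles; values near 1 suggest the row is close to normal.
qq_cor <- function(r) {
  r <- sort(r[!is.na(r)])
  cor(r, qnorm(ppoints(length(r))))
}

# Illustrative data (assumption): 200 rows x 20 columns.
set.seed(3)
x <- matrix(rnorm(200 * 20), nrow = 200)
x[1, ] <- c(rnorm(10, -3), rnorm(10, 3))   # row 1: roughly bimodal

r_qq <- apply(x, 1, qq_cor)
lowest <- head(order(r_qq))   # rows with the lowest correlation first
```

The rows in `lowest` are the ones whose Q-Q plots bend most, and plotting a few of them with `qqnorm()` is a quick way to check whether they look like the expected tail or bimodal patterns.
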
> 
> If it helps, the nuggets I am expecting are either 80% grouped around
> the mean with 20% forming a uni-directional tail, or an approximately
> bimodal distribution.
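
Those two shapes pull the sample moments in different directions: a one-sided tail shows up as skewness, while a symmetric two-cluster row tends to have low kurtosis. A minimal base-R sketch of a per-row summary along those lines follows; the helper `row_shape` and the simulated rows are hypothetical, and with only 20 values per row these moment estimates are noisy.

```r
# Sample skewness and excess kurtosis from central moments (base R only).
row_shape <- function(r) {
  r <- r[!is.na(r)]
  d <- r - mean(r)
  m2 <- mean(d^2)
  c(skew = mean(d^3) / m2^1.5, exkurt = mean(d^4) / m2^2 - 3)
}

# Illustrative data (assumption): 300 rows x 20 columns.
set.seed(4)
x <- matrix(rnorm(300 * 20), nrow = 300)
x[1, 17:20] <- x[1, 17:20] + 6             # row 1: 80% bulk, 20% tail
x[2, ] <- c(rnorm(10, -3), rnorm(10, 3))   # row 2: roughly bimodal

shape <- t(apply(x, 1, row_shape))         # 300 x 2: columns skew, exkurt
```

Sorting `shape` by the absolute value of `skew`, or by `exkurt` ascending, gives two separate shortlists, one for each kind of nugget.
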
> 
> As I'd imagine is obvious - I don't have an ideal solution to finding
> these nuggets, and so coming up with the R code to do so is harder
> still. If anybody has insight into this sort of problem, and can point
> me in the direction of further reading, that would be helpful. If there
> is a ready-made solution, even better!
> 
> As I said, thanks for your time with this...
> 
> 
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3261203.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
