[R] 2 D density plot interpretation and manipulating the data

Ana Marija
Fri Oct 9 17:24:44 CEST 2020


Hi Abby,

thank you for getting back to me and for this useful information.

I'm trying to detect the outliers in my distribution based on mean and
variance. Can I see that from the plot I provided? Would outliers be
outside of ellipses? If so how do I extract those from my data frame,
based on which parameter?

So I am trying to connect outliers based on what the plot is showing:
s <- ggplot(SNP, mapping = aes(x = mean, y = var))
s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")

versus what is in the data:

> head(SNP)
               mean      var     sd
FQC.10090295 0.0327 0.002678 0.0517
FQC.10119363 0.0220 0.000978 0.0313
FQC.10132112 0.0275 0.002088 0.0457
FQC.10201128 0.0169 0.000289 0.0170
FQC.10208432 0.0443 0.004081 0.0639
FQC.10218466 0.0116 0.000131 0.0115
...

The distribution is not normal; it is right-skewed.
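
A minimal sketch of one way to tie the plotted contours back to rows of the
data frame. It assumes the contours come from MASS::kde2d(), which
geom_density_2d() uses by default, and it flags the points sitting in the
lowest-density regions. The simulated SNP data, the grid size of 100 and the
5% cutoff are all illustrative assumptions, not values from my real data.

```r
library(MASS)  # kde2d(): the 2D kernel density estimator

# Simulated stand-in for the SNP data frame (hypothetical, right-skewed values)
set.seed(1)
SNP <- data.frame(mean = rgamma(500, shape = 2, rate = 60))
SNP$var <- SNP$mean^2 * rlnorm(500, meanlog = 0, sdlog = 0.3)

# Kernel density estimate evaluated on a 100 x 100 grid
dens <- kde2d(SNP$mean, SNP$var, n = 100)

# Look up the estimated density at the grid node at or just below each point
ix <- findInterval(SNP$mean, dens$x, all.inside = TRUE)
iy <- findInterval(SNP$var,  dens$y, all.inside = TRUE)
SNP$density <- dens$z[cbind(ix, iy)]

# Flag the points with the lowest estimated density (5% is an arbitrary choice)
cutoff   <- quantile(SNP$density, 0.05)
outliers <- SNP[SNP$density < cutoff, ]
```

The row names of `outliers` would then identify the SNPs that lie outside the
outermost density region, without assuming the data is normal.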

Cheers,
Ana

On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdle.a using gmail.com> wrote:
>
> > My understanding is that this represents bivariate normal
> > approximation of the data which uses the kernel density function to
> > test for inclusion within a level set. (please correct me)
>
> You can fit a bivariate normal distribution by computing five parameters.
> Two means, two standard deviations (or two variances) and one
> correlation (or covariance) coefficient.
> The bivariate normal *has* elliptical contours.
>
> A kernel density estimate is usually regarded as an estimate of an
> unknown density function.
> Often they use a normal (or Gaussian) kernel, but I wouldn't describe
> them as normal approximations.
> In general, bivariate kernel density estimates do *not* have
> elliptical contours.
> But in saying that, if the data is close to normality, then contours
> will be close to elliptical.
>
> Kernel density estimates do not test for inclusion, as such.
> (But technically, there are some exceptions to that).
>
> I'm not sure what you're trying to achieve here.
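
The five-parameter fit described above can be sketched as follows. This is
only an illustration on simulated data (the vectors x and y are hypothetical),
and the chi-squared cutoff is a judgment call that relies on approximate
bivariate normality, so it may be a poor fit for right-skewed data.

```r
# Simulated stand-in for two numeric columns (hypothetical data)
set.seed(1)
x  <- rnorm(200, mean = 0.03, sd = 0.01)
y  <- 0.5 * x + rnorm(200, mean = 0, sd = 0.005)
xy <- cbind(x, y)

# The five parameters: two means, two standard deviations, one correlation
mu  <- colMeans(xy)
sds <- apply(xy, 2, sd)
rho <- cor(x, y)

# The same fit expressed as a covariance matrix
Sigma <- cov(xy)

# Squared Mahalanobis distance of each point from the fitted centre
d2 <- mahalanobis(xy, center = mu, cov = Sigma)

# Under bivariate normality d2 is approximately chi-squared with 2 df,
# so this marks points outside the roughly 97.5% ellipse
outside <- d2 > qchisq(0.975, df = 2)
```

Points with `outside == TRUE` fall beyond the chosen elliptical contour of the
fitted bivariate normal.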


