[R] qqnorm & huge datasets
pdalgd at gmail.com
Thu Dec 22 19:09:15 CET 2011
On Dec 22, 2011, at 18:17 , Sam Steingold wrote:
>> * peter dalgaard <cqnytq at tznvy.pbz> [2011-12-21 23:59:18 +0100]:
>> On Dec 21, 2011, at 23:10 , Sam Steingold wrote:
>>> When qqnorm on a vector of length 10M+ I get a huge pdf file which
>>> cannot be loaded by acroread or evince.
>>> Any suggestions? (apart from sampling the data).
>> Sample intelligently? Things like
>>> qq <- seq(-4,4,,10001)
> Perfect! Thanks!
> m <- mean(x); s <- sd(x);
> qq <- seq(min(x), max(x),, sqrt(length(x)));
> qu <- quantile(x, pnorm(qq, mean=m, sd=s));
> qqplot(qq, qu, type="l", xlab=paste("normal(",m,",",s,")"),
> ylab="log scaled weights",
> main="log scaled weight quantile");
I tried this with exponentially distributed x, and it did reveal a weakness. If qq has values way off the normal range, you end up with the last bit of your curve being a horizontal line through max(x) because pnorm(qq,...) is essentially 1.00.
So somehow you should restrict the range of qq to what is compatible with a normal distribution, rather than what is observed in data.
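One way to do that, as a rough sketch only: build the grid on the standard normal scale (as in the seq(-4,4,,10001) suggestion) and clamp it to the most extreme normal quantiles that a sample of size n can resolve, so that pnorm() never saturates at 0 or 1. The simulated exponential x, the helper name thin_qq, and the 0.5/n cutoff are assumptions made for illustration, not anything taken from the original data.

```r
set.seed(1)
x <- rexp(1e5)                       # stand-in for the real data (assumption)

# thin_qq is a hypothetical helper: returns ~npts points of the normal Q-Q curve
thin_qq <- function(x, npts = 1001) {
  n <- length(x)
  # most extreme standard-normal quantile that n observations can resolve
  zmax <- qnorm(1 - 0.5 / n)
  z <- seq(-zmax, zmax, length.out = npts)
  # empirical quantiles at the matching probabilities
  q <- quantile(x, pnorm(z), names = FALSE)
  list(z = z, q = q)
}

pts <- thin_qq(x)
plot(pts$z, pts$q, type = "l",
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
```

Because the grid lives on the normal scale rather than the data scale, the curve no longer ends in a flat segment at max(x), and the axes match what qqnorm(x) would show.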
> Now, how do I add the perfect line there?
abline(0,1), perhaps? Or maybe retrace the logic of qqline and work out the line through the quartiles. Lessee... Does this do it?
qua <- quantile(x, c(.25,.75))
slope <- diff(qua)/diff(qnorm(c(.25,.75),mean=m,sd=s))
int <- mean(qua)-slope*m
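Spelled out as a self-contained sketch, with the line actually drawn via abline(int, slope). This assumes the x axis of the plot is on the normal(m, s) scale, as in the qqplot call quoted above; the simulated x and the name nqua are illustrative assumptions.

```r
set.seed(1)
x <- rnorm(1e5, mean = 3, sd = 2)    # simulated data (assumption)
m <- mean(x); s <- sd(x)

# quartiles of the data and of the fitted normal
qua  <- quantile(x, c(.25, .75), names = FALSE)
nqua <- qnorm(c(.25, .75), mean = m, sd = s)

slope <- diff(qua) / diff(nqua)
int   <- mean(qua) - slope * m       # mean(nqua) equals m by symmetry

# thinned Q-Q curve on the normal(m, s) scale, then the reference line
p <- pnorm(seq(-4, 4, length.out = 1001))
plot(qnorm(p, mean = m, sd = s), quantile(x, p, names = FALSE), type = "l",
     xlab = "normal quantiles", ylab = "sample quantiles")
abline(int, slope, col = "red")
```

The intercept formula works because the two normal quartiles are symmetric about m, so the line has to pass through the point (m, mean(qua)).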
> Why do neither qqline(qq) nor qqline(qu) add anything to the plot?
Why should they? I suspect that if you do qqnorm(qq) and qqnorm(qu), you'll realize that the scales don't match...
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://www.memritv.org http://palestinefacts.org
> http://thereligionofpeace.com http://mideasttruth.com http://pmw.org.il
> nobody's life, liberty or property are safe while the legislature is in session
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com