[R] selecting significant predictors from ANOVA result

Ista Zahn istazahn at gmail.com
Thu Jan 28 21:00:57 CET 2010


Hi Ram,
As others have pointed out, writing the code is the least of your
problems. In case this isn't sinking in, try the following exercise:

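set.seed(10)
P <- numeric(999)  # one p-value per predictor
# 100 rows by 1000 columns: 999 predictors (x1..x999) plus the response y
DF <- as.data.frame(matrix(rep(NA, 100000), nrow=100))
names(DF) <- c(paste("x",1:999, sep=""), "y")

# fill every column, y included, with independent standard normal noise
for(i in 1:1000) {
  DF[,i] <- rnorm(100)
}

# regress y on each predictor in turn and keep the p-value of the slope
for(i in 1:999) {
  P[i] <- summary(lm(DF$y ~ DF[,i]))$coefficients[2,4]
}

# indices of the predictors that come out "significant" at p < .05
which(P < .05)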

Notice that the variables in the data set DF are random numbers. The
fact that 53 of them are 'significantly' correlated with y at p < .05
doesn't change that; it is about what chance alone predicts, since 5%
of 999 tests is roughly 50. So in this example, those 53 "significant"
predictors are meaningless. And your actual problem is even worse than
this example, because you're running way more than 999 models.
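
To put rough numbers on this, here is a minimal sketch of how many
false positives pure noise would produce at the scale described in the
quoted message, and of what a standard multiple-testing adjustment does
to p-values from null data. It assumes every one of the 75000 x 243
tests is null and, for simplicity, that null p-values are uniform on
[0, 1]. p.adjust() is from base R's stats package; the variable names
(n_tests, p_null, and so on) are made up for illustration.

```r
# Assumed scale, taken from the quoted message: 75000 responses x 243 predictors
n_tests <- 75000 * 243
# Expected number of tests with p < .05 if no predictor has any real effect
expected_false_pos <- 0.05 * n_tests

# Simulate p-values from 999 null tests (assumption: uniform on [0, 1])
set.seed(1)
p_null <- runif(999)

# Count "hits" at the .05 level before and after adjustment
raw_hits  <- sum(p_null < 0.05)                                   # unadjusted
bonf_hits <- sum(p.adjust(p_null, method = "bonferroni") < 0.05)  # family-wise error rate
bh_hits   <- sum(p.adjust(p_null, method = "BH") < 0.05)          # false discovery rate

print(c(expected = expected_false_pos, raw = raw_hits,
        bonferroni = bonf_hits, BH = bh_hits))
```

With nothing real in the data, the raw count should hover around 5% of
the tests, while the adjusted counts will typically be at or near zero.
An adjustment like this only addresses the number of tests; it does not
make the one-predictor-at-a-time design itself sound.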

As has already been suggested, it's time to consult a statistician.

-Ista

On Thu, Jan 28, 2010 at 3:39 AM, ram basnet <basnetabc at yahoo.com> wrote:
> Dear Sir,
>
> Thanks for your message. My problem is in writing the code. I ran an ANOVA for 75000 response variables (let's say Y) against 243 predictors (let's say the X-matrix), one by one, with a "for" loop in R. I stored the p-values of all predictors; however, the resulting file is huge because I have p-values of 243 predictors for all 75000 Y-variables.
> Now I want to find some code that automatically selects only the significant X-predictors from the whole list. If you have ideas on that, it will be a great help.
> Thanks in advance
>
> Sincerely,
> Ram
>
> --- On Wed, 1/27/10, Bert Gunter <gunter.berton at gene.com> wrote:
>
>
> From: Bert Gunter <gunter.berton at gene.com>
> Subject: RE: [R] selecting significant predictors from ANOVA result
> To: "'ram basnet'" <basnetabc at yahoo.com>, "'R help'" <r-help at r-project.org>
> Date: Wednesday, January 27, 2010, 7:56 AM
>
>
> Ram:
>
> You do not say how many cases (rows in your dataset) you have, but I suspect
> it may be small (a few hundred, say).
>
> In any case, what you describe is probably just a complicated way to
> generate random numbers -- it is **highly** unlikely that any meaningful,
> replicable scientific results would result from your proposed approach.
>
> Not surprising -- this appears to be a very difficult data analysis issue.
> It is obvious that you have only a minimal statistical background, so I
> would strongly recommend that you find a competent local statistician to
> help you with your work. Remote help from this list is wholly inadequate.
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of ram basnet
> Sent: Wednesday, January 27, 2010 2:52 AM
> To: R help
> Subject: [R] selecting significant predictors from ANOVA result
>
> Dear all,
>
> I did ANOVA for many response variables (Var1, Var2, ..., Var75000), and I
> got p-values like those below. Now, I want to select those
> predictors which have a p-value less than or equal to 0.05 for each response
> variable. For example, X1, X2, X3, X4, X5 and X6 in the case of Var1, and
> similarly X1, X2, ..., X5 in the case of Var2, only X1 in the case of Var3, and none
> of the predictors in the case of Var4.
>
> predictors  Var1      Var2    Var3  Var4
> X1          0.00005   0.001   0.05  0.36
> X2          0.0001    0.001   0.09  0.37
> X3          0.0002    0.005   0.13  0.38
> X4          0.0003    0.01    0.17  0.39
> X5          0.01      0.05    0.21  0.4
> X6          0.05      0.0455  0.25  0.41
> X7          0.038063  0.0562  0.29  0.42
> X8          0.04605   0.0669  0.33  0.43
> X9          0.054038  0.0776  0.37  0.44
> X10         0.062025  0.0883  0.41  0.45
>
> I have very large data sets (# of response variables = ~75,000), so I need
> some kind of automated procedure, but I have no idea how to do it.
> If somebody could help, that would be great.
>
> Thanks in advance.
>
> Sincerely,
>
> Ram Kumar Basnet,
> Ph. D student
> Wageningen University,
> The Netherlands.



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


