[R] Linear Regression Problem

Ravi Varadhan RVaradhan at jhmi.edu
Tue Jul 14 17:48:49 CEST 2009


I am not sure that you really want to do a separate regression for each
column of X, with the same y.  This does not make much sense.

Why do you think multiple linear regression is not possible just because X'X
is not invertible?  You have 2 main options here:

1.  Obtain the minimum-norm solution using the SVD (also known as the
Moore-Penrose inverse). This solution minimizes ||b|| among all b that
minimize ||y - Xb||.
2.  Obtain a regularized solution, such as ridge regression, as Vito
suggested.

You can do (1) as follows:

	require(MASS)

	soln <- c(ginv(X) %*% y)  # ginv() takes only the matrix; multiply by y

Here is an example:

	X <- matrix(rnorm(1000), 10, 100)  # matrix with rank = 10

	b <- rep(1, 100)

	y <- crossprod(t(X), b)  # same as X %*% b

	soln <- c(ginv(X) %*% y)  # this will not be close to b

	max(abs(crossprod(t(X), soln) - y))  # essentially zero: the fit is exact
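For option (2), here is a minimal sketch using MASS::lm.ridge. The value
lambda = 1 is purely illustrative (an assumption, not a recommendation); in
practice you would choose lambda by, e.g., generalized cross-validation:

```r
library(MASS)  # for lm.ridge()

set.seed(1)
X <- matrix(rnorm(1000), 10, 100)  # more columns than rows, so X'X is singular
y <- rnorm(10)

# Ridge regression effectively adds lambda to the diagonal of X'X, which
# makes the system solvable even when p > n; lambda = 1 is illustrative only
fit <- lm.ridge(y ~ X, lambda = 1)

length(coef(fit))  # intercept plus one coefficient per column of X
```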
 
Hope this helps,
Ravi.


--------------------------------------------------------------------------------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

 

--------------------------------------------------------------------------------


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Alex Roy
Sent: Tuesday, July 14, 2009 11:29 AM
To: Vito Muggeo (UniPa)
Cc: r-help at r-project.org
Subject: Re: [R] Linear Regression Problem

Dear Vito,
                Thanks for your comments. But I want to do simple linear
regression, not multiple linear regression. Multiple linear regression is not
possible here, as the number of variables is much larger than the number of
samples (X is ill-conditioned, so the inverse of X^T X does not exist!). I
just want to take one predictor variable at a time, regress y on it, and
store the regression coefficients, p-values and R^2 values. The loop goes up
to 40,000 predictors.

Alex
On Tue, Jul 14, 2009 at 5:18 PM, Vito Muggeo (UniPa)
<vito.muggeo at unipa.it>wrote:

> dear Alex,
> I think your problem with a large number of predictors and a 
> relatively small number of subjects may be faced via some 
> regularization approach (ridge or lasso regression..)
>
> hope this helps you,
> vito
>
> Alex Roy ha scritto:
>
>>  Dear All,
>>                 I have a matrix, say X (100 x 40,000), and a vector,
>> say y (100 x 1). I want to perform linear regression. I have scaled the
>> X matrix using scale() to get mean zero and s.d. 1, but I still get
>> very high values for the regression coefficients. If I scale the X
>> matrix, then the regression coefficients should behave like correlation
>> coefficients and should not be more than 1. Am I right? I do not know
>> what is going wrong.
>> Thanks for your help.
>> Alex
>>
>>
>> *Code:*
>>
>> UniBeta <- sapply(1:dim(X)[2], function(k)
>>   summary(lm(y ~ X[, k]))$coefficients[2, 1])
>>
>> pval <- sapply(1:dim(X)[2], function(l)
>>   summary(lm(y ~ X[, l]))$coefficients[2, 4])
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> --
> ====================================
> Vito M.R. Muggeo
> Dip.to Sc Statist e Matem `Vianelli'
> Università di Palermo
> viale delle Scienze, edificio 13
> 90128 Palermo - ITALY
> tel: 091 6626240
> fax: 091 485726/485612
> http://dssm.unipa.it/vmuggeo
> ====================================
>
