[R] Regression using age and Duration of disease as continuous factors

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Jul 21 20:24:02 CEST 2009


> it looks like the experts individuals just come to poke fun at our
> expense who has no background of statistics.

This isn't really a fair statement ... I'd simply suggest being
mindful of what you ask. It was as if you couldn't be bothered to take
the time to fully describe your problem (how was anybody supposed to
deduce what you explained below from your original email??), but
wanted other people to take their time to understand what you want
and do your work for you.

When you look at it that way, it's not a big surprise that you
received some of the answers you received. Lastly, I'm not sure how
universally true this is, or how relevant it is to *this particular
scenario*, but when people post to a somewhat professional list such
as this one, I think it's generally frowned upon to use some bizarre
alias instead of a real name (my 2 cents, there).

In any event, perhaps we can all move on.

As a disclaimer, anything I say from here on out should be taken
with a grain of salt:

> I have 8 proteins and I have two groups with 840 samples in control
> and 1140 samples in diseases further stratified by sex, draw age,
> duration of disease. all these groups and sub groups is making the
> thing very confusing as how to do the regression in R. the purpose is
> to show the changes in the levels of these proteins as the disease
> progress or changes in their levels with respect to progression in
> age, effect of gender, SNPs for these proteins, it is a pretty big
> dataset.

I'd start by creating some clever graphics to see if you can eyeball
any trends, and to judge whether you can get some juice out of
further downstream analysis.
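
For what it's worth, a first pass at such graphics could look
something like the sketch below. It uses base R only and simulated
stand-in data, since I don't have yours: the data frame and all of its
column names (group, sex, age, duration, protein1) are made up for
illustration, and so is the choice to set duration to 0 for controls.

```r
# Simulated stand-in data; every column name here is hypothetical.
set.seed(1)
n_control <- 840
n_disease <- 1140
n <- n_control + n_disease

dat <- data.frame(
  group = factor(rep(c("control", "disease"), c(n_control, n_disease))),
  sex   = factor(sample(c("F", "M"), n, replace = TRUE)),
  age   = round(runif(n, 20, 80))
)

# Assumption: disease duration only applies to cases; controls get 0.
dat$duration <- ifelse(dat$group == "disease", round(runif(n, 0, 30)), 0)

# One simulated protein level with a small age and duration effect.
dat$protein1 <- 5 + 0.02 * dat$age + 0.05 * dat$duration + rnorm(n)

# Protein level by group and sex.
boxplot(protein1 ~ group + sex, data = dat, ylab = "protein1 level")

# Protein level against age, one panel per group/sex combination,
# with a smoother drawn through each panel.
coplot(protein1 ~ age | group * sex, data = dat, panel = panel.smooth)

# Among cases only: protein level against disease duration.
cases <- subset(dat, group == "disease")
plot(protein1 ~ duration, data = cases)
lines(lowess(cases$duration, cases$protein1), col = "red")
```

You'd repeat this per protein (or loop over the protein columns) and
see whether anything jumps out before fitting any model.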

Anyway, I don't think there is a simple answer you can get from an  
email, and I'm a bit surprised that your statistician mentor doesn't  
have at least some idea of where to start. It sounds like you want to  
build some predictive model that uses the values in your predictor  
variables to predict some real-valued expression of your protein(s) --
and the problem is that there is no guarantee that you can do this  
with the data you have anyway (repeat after me: "research is fun").

That being said, one (overly) simple approach (there is no
grouping/subgrouping here) is to use glmnet and try lasso or elastic
net regression, using all the factors you mention as predictor
variables for the 8 different output vectors, which would be the
individual expression of your proteins (so -- that's 8 different
models you're trying to learn).

The hope is that the lasso will nuke some of the predictors (by
setting their coefficients to 0) and help you find "the most
important" factors that influence the protein expression ... in all
likelihood, this won't work ... and if this is the type of
answer you are looking to get, I'm not sure you will get anything
satisfactory (repeat after me: "research is fun").
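
A rough sketch of that idea, assuming the glmnet package is installed.
The data are simulated and every variable name (disease, sex, age,
duration, snp1, protein1..protein8) is hypothetical, so treat this as
an outline rather than a recipe. alpha = 1 gives the lasso; a value
between 0 and 1 gives an elastic net. Picking the penalty by
cross-validation and reading coefficients at "lambda.1se" is just one
common choice, not the only sensible one.

```r
library(glmnet)

# Simulated stand-in data; all names below are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  disease = rbinom(n, 1, 0.5),                        # 0 = control, 1 = case
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  age     = runif(n, 20, 80),
  snp1    = sample(0:2, n, replace = TRUE)            # minor-allele count
)
dat$duration <- dat$disease * runif(n, 0, 30)         # 0 for controls

# Simulated responses: one column per protein, 8 in total.
Y <- sapply(1:8, function(j) {
  0.03 * dat$age + 0.1 * dat$duration + rnorm(n)
})
colnames(Y) <- paste0("protein", 1:8)

# glmnet wants a numeric matrix. model.matrix() expands factors into
# dummy columns; drop its intercept column since glmnet adds its own.
X <- model.matrix(~ disease + sex + age + duration + snp1, data = dat)[, -1]

# One cross-validated lasso fit per protein (8 separate models).
fits <- lapply(colnames(Y), function(p) cv.glmnet(X, Y[, p], alpha = 1))
names(fits) <- colnames(Y)

# Coefficients at the "lambda.1se" penalty; predictors the lasso
# dropped show up as exact zeros.
coefs <- sapply(fits, function(f) as.matrix(coef(f, s = "lambda.1se"))[, 1])
print(round(coefs, 3))
```

Each column of coefs is one protein's model, so you can scan across
rows to see which predictors survive for which proteins.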

> I am not here to ask someone to do my data analysis, but to get an
> understanding of the process as well as a proper direction to look
> for the analysis. after all I do have to explain all these things to
> my boss as well.

I'm not an expert, but there is no canned process to do this ... and
like I said, there is no guarantee you can do this ... I mean, does it
make sense to set up your problem in this way and expect a reasonable
outcome (biologically speaking)? Do you have to somehow take into
account how these 8 proteins are interacting with each other? Many
questions to answer ...

Anyway ... I'm not sure there's any real value in this email, but I've  
got my own fish to fry so time to move on ...

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact



