Re: [R] RE: more on lm(y~x) question: removing NA´s

Thomas Lumley tlumley at u.washington.edu
Tue May 4 17:30:39 CEST 2004


On Tue, 4 May 2004, Christoph Scherber wrote:

> it all works fine (the regression lines fit correctly to the data) as
> long as there are not both missing values in j and k.

That's very strange.  The lines
 for (k in 1:length(foranalysis[93:174,i]))
     number[k]_substring(plotcode[foranalysis[k,1]],1,5)

should result in k being the scalar value 82 (the length of 93:174) after
the loop is over, so k no longer holds the data column when lm(j~k) is
called. In R (unlike S-PLUS), loop indices are just ordinary variables in
the environment where the loop is executed. I'd expect this code to work
in S-PLUS but not in R.
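
A minimal sketch of the loop-index behaviour in R, using a made-up vector
(the values and the name k_after are hypothetical, not taken from the
original data):

```r
# Hypothetical stand-in for the data column stored in k before the loop
k <- c(0.2, 0.5, NA, 0.9)

# Re-using k as the loop index overwrites the vector
for (k in 1:4) {
    # the loop body is irrelevant for this illustration
}

# k now holds the last value taken by the index: a single integer
k_after <- k
print(k_after)
print(length(k_after))
```

Using a different name for the index (or dropping the loop altogether)
avoids the problem.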

That loop is actually redundant, since substring() is vectorised:
	number <- substring(plotcode[foranalysis[93:174,1]],1,5)
should work just as well.
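
To illustrate the equivalence, here is a small sketch with invented plot
codes (the vector codes and the names number_loop, number_vec and idx are
assumptions; the real plotcode is not shown in this thread):

```r
# Hypothetical plot codes standing in for plotcode[...]
codes <- c("B1A01xyz", "B2A17abc", "B4A22def")

# Element-by-element loop, in the style of the original code
number_loop <- character(length(codes))
for (idx in seq_along(codes)) {
    number_loop[idx] <- substring(codes[idx], 1, 5)
}

# Single vectorised call on the whole vector
number_vec <- substring(codes, 1, 5)

# Both approaches give the same character vector
print(identical(number_loop, number_vec))
```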

It's also strange that you create a data frame df from j and k but don't
use it in the lm() call (or AFAICS anywhere else).

>
> What suggestions would you have for this? Or, more precisely, how would
> you create multiple graphs from subsequent columns of a data.frame?

I'd probably use lsfit. The following is obviously not tested, since I
don't have the data (or even understand fully the data layout).

L <- length(93:174)
for(i in p) {
    X <- foranalysis[93:174, i]
    Y <- foranalysis[93:174, i+1]
    ## use="complete.obs" drops pairs with an NA in either variable
    corr <- cor(X, Y, use="complete.obs")
    corrtrunc <- cor(X[X<0.9], Y[X<0.9], use="complete.obs")
    mainlab <- paste(substring(names(foranalysis[i]), 2, 8),
                     "; corr.:", round(corr, 4),
                     "; excl.Mono:", round(corrtrunc, 4))
    plot(X, Y, main=mainlab,
         xlab="% of total biomass", ylab="% of total cover", pch="n")
    number <- substring(plotcode[foranalysis[1:L,1]], 1, 5)
    text(X, Y, number)
    ## lsfit() deletes cases with missing values (with a warning)
    model <- lsfit(X, Y)
    abline(model)
    abline(0, 1, lty=2)
}
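
As a self-contained check of how the missing values are handled, here is a
small sketch with made-up numbers (X, Y, ok, fit_lm and fit_ls are
illustrative names and values, not the poster's data). It assumes the
default na.action of na.omit is in effect for lm():

```r
# Made-up data with missing values in both variables
X <- c(0.10, 0.25, NA, 0.40, 0.55, 0.70, NA, 0.95)
Y <- c(0.15, NA, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90)

# Rows where both X and Y are observed
ok <- complete.cases(X, Y)

# lm() drops incomplete rows under the default na.action (na.omit)
fit_lm <- lm(Y ~ X)

# lsfit() on the complete rows only, so no missing-value warning is raised
fit_ls <- lsfit(X[ok], Y[ok])

# Both fits use the same rows, so the coefficients should agree
print(all.equal(unname(coef(fit_lm)), unname(fit_ls$coefficients)))

# cor() has to be told how to treat NAs
print(cor(X, Y, use = "complete.obs"))
```

Either fit can be passed to abline() to draw the regression line.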


	-thomas

> >>>
> >>>par(mfrow=c(5,5))
> >>>p_seq(3,122,2)
> >>>i_0
> >>>k_0
> >>>number_0
> >>>for (i in p) {
> >>>   j_foranalysis[93:174,i+1]
> >>>   k_foranalysis[93:174,i]
> >>>   df_data.frame(j,k)
> >>>   mainlab1_substring(names(foranalysis[i]),2,8)
> >>>   mainlab2_"; corr.:"
> >>>   mainlab3_round(cor(j,k,na.method="available"),4)
> >>>   mainlab4_"; excl.Mono:"
> >>>   mainlab5_round(cor(j[j<0.9],k[j<0.9],na.method="available"),4)
> >>>   mainlab_paste(mainlab1,mainlab2,mainlab3,mainlab4,mainlab5)
> >>>   plot(k,j,main=mainlab,xlab="% of total biomass",ylab="% of total
> >>>cover",pch="n")
> >>>   for (k in 1:length(foranalysis[93:174,i]))
> >>>number[k]_substring(plotcode[foranalysis[k,1]],1,5)
> >>>   text(foranalysis[93:174,i],foranalysis[93:174,i+1],number)
> >>>**********************************
> >>>   model_lm(j~k,na.action=na.exclude])
> >>>**********************************
> >>>   abline(model)
> >>>   abline(0,1,lty=2)
> >>>    }
> >>>
> >>>Does anyone have any suggestions on this?
> >>>
> >>>Best regards
> >>>Chris.,
> >>>
> >>>
> >>>
> >>>
> >>>Liaw, Andy wrote:
> >>>
> >>>
> >>>
> >>>>By (`factory') default that's done for you automagically, because
> >>>>options("na.action") is `na.omit'.
> >>>>
> >>>>If you really want to do it `by hand', and have the data in a data
> >>>>frame, you can use something like:
> >>>>
> >>>>lm(y ~ x, df[complete.cases(df),])
> >>>>
> >>>>HTH,
> >>>>Andy
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>From: Christoph Scherber
> >>>>>
> >>>>>Dear all,
> >>>>>
> >>>>>I have a data frame with different numbers of NA´s in each
> >>>>>column, e.g.:
> >>>>>
> >>>>>x       y
> >>>>>1      2
> >>>>>NA  3
> >>>>>NA  4
> >>>>>4     NA
> >>>>>1     5
> >>>>>NA NA
> >>>>>
> >>>>>
> >>>>>I now want to do a linear regression on y~x with all the NA´s
> >>>>>removed.
> >>>>>The problem now is that is.na(x) (and is.na(y)) obviously give
> >>>>>vectors with different lengths. How could I solve this problem?
> >>>>>
> >>>>>Thank you very much for any help.
> >>>>>
> >>>>>Best regards
> >>>>>Chris
> >>>>>
> >>>>>
> >>>>>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >>
> >>
> >>
> >
> >Thomas Lumley			Assoc. Professor, Biostatistics
> >tlumley at u.washington.edu	University of Washington, Seattle
> >
> >
> >
>
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



