[R] converting dataframe columns to vector and missing values

Thomas W Blackwell tblackw at umich.edu
Fri Sep 12 23:59:19 CEST 2003


On Fri, 12 Sep 2003, Spencer Graves wrote:

> Have you considered:
>
> 	  x <- Data[!is.na(Data[,n]), n]
>
> Does this do what you want?  Vectors, arrays, and data frames can be
> indexed by number or by a logical vector -- and by names if such are
> supplied.  In this case, "!is.na(Data[,n])" is a logical vector of
> length = number of rows of Data.
>
> hope this helps.  spencer graves
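
To make the logical indexing above concrete, here is a minimal sketch on a
made-up data frame (the column names and values are invented for the
example).  It uses a loop counter i as the column index; the quoted loop
uses n there, which would select the last column on every pass.

```r
# Toy data frame; the column names and values are made up for illustration.
Data <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

i <- 1                      # column to extract
keep <- !is.na(Data[, i])   # logical vector, one element per row of Data
x <- Data[keep, i]          # non-missing values of column i, as a plain vector

keep   # TRUE FALSE TRUE
x      # 1 3
```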

Tom Blackwell adds ... Or even do the whole thing in one line as:

UCL.all <- lapply(InputData, function(x) HallBoot(na.omit(x)))
UCL.all      #  displays the result

Now the object  UCL.all  is a named list whose names are the names of
the columns in InputData, in order, and whose values are the output
from  HallBoot().  Spencer is protecting you from having to learn all
the intricacies of R list structure.
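
A self-contained sketch of the one-liner, for anyone who wants to try it
without the real data.  HallBoot() is not shown in this thread, so the
version below is only a stand-in that returns the mean, and the data
frame is invented.

```r
# Stand-in for the poster's HallBoot(), which is not shown in the thread.
# Here it just returns the mean so that the example runs on its own.
HallBoot <- function(x) mean(x)

# Toy input with a different number of missing values in each column.
InputData <- data.frame(Cu = c(1, 2, NA, 4), Zn = c(NA, NA, 5, 7))

# A data frame is a list of columns, so lapply() visits each column in turn.
UCL.all <- lapply(InputData, function(x) HallBoot(na.omit(x)))

names(UCL.all)   # "Cu" "Zn"
UCL.all$Zn       # 6
```

If HallBoot() returns a single number per column, sapply() in place of
lapply() should give a named numeric vector instead of a list.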

I think the example given in  help("na.omit")  is just a bit misleading.
It shows  na.omit()  used alone on a very small data frame named x.
It demonstrates that the return value of  na.omit(x)  omits the entire
third row, because that row has an NA in the second column, and it does
this by relying on autoprinting to display the result.  Nothing makes it
clear that the command  na.omit(x)  has not changed the object x at all.
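
A short sketch of the point, using a made-up vector.  It mirrors the
quoted loop, where  na.omit(x)  is called but its result is never
assigned, so the later call to  HallBoot(x)  still sees the NAs.

```r
x <- c(3, NA, 7)

na.omit(x)                 # autoprints the cleaned values; result is then discarded
still_has_na <- anyNA(x)   # TRUE: x itself is unchanged

x <- na.omit(x)            # assigning the result is what actually updates x
length(x)                  # 2
```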

But that is a common paradigm in R.  The arguments to a function are
passed by value, not by reference, and they ALWAYS remain unchanged by
having been used, unless they are explicitly overwritten.  (There must
be a few exceptions to this, but I can't think of them.)
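
A minimal sketch of that behaviour; the function and variable names are
made up for the example.  Environments are one such exception, since
they are not copied when passed to a function.

```r
# A function that modifies its argument only changes its own local copy.
set_first <- function(v) {
  v[1] <- 0
  v
}

a <- c(1, 2, 3)
b <- set_first(a)
a   # 1 2 3  (the caller's vector is untouched)
b   # 0 2 3

# Environments behave differently: the function sees the same object.
bump <- function(e) {
  e$count <- e$count + 1
  invisible(NULL)
}

env <- new.env()
env$count <- 0
bump(env)
env$count   # 1
```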

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

> Bock, Michael wrote:
> > I am relatively new to R, but very pleased with what I can do with it so
> > far.
> > I am embarrassed to ask what seems like a simple question but I am at my
> > wits' end. Basically I have written a function to calculate a bootstrapped
> > statistic on a list of values. The function works perfectly if I can feed it
> > the right data. I am exporting data into R as a dataframe and then assigning
> > each column to the list and running the function using a for loop. The problem
> > is finding the best way to convert the columns to a list. The column names
> > and the number of columns will vary depending on the dataset. I am currently
> > converting the dataframe to a matrix and then assigning each column of the
> > matrix to the list in turn:
> >
> > #InputData is the dataframe
> > RunTests <- function (InputData)
> > 	{
> > 	n <- length(InputData)
> > 	Chem <- colnames(InputData)
> > 	for (i in 1:n){
> > 		print (Chem[i])
> > 		Data <- data.matrix(InputData)
> > 		x <- Data[,n]
> > 		na.omit(x)
> > 		#print(x)
> > 		UCL <- HallBoot(x)
> > 		print (UCL)
> > 		}
> >
> > 	}
> > Although this works some of the time, missing values are not removed. This
> > is a huge problem as the number of observations in each column is quite
> > variable. Obviously the na.omit is not working the way I expect. Any help
> > would be appreciated, including a whole new approach to sending the data to
> > the HallBoot function.
> >
> > Michael J. Bock, PhD.
> > ARCADIS
> > 24 Preble St. Suite 100
> > Portland, ME 04101
> > 207.828.0046
> > fax 207.828.0062



