[R] how to read csv file having variables unequal column siz

(Ted Harding) Ted.Harding at manchester.ac.uk
Fri Feb 12 13:18:31 CET 2010


On 12-Feb-10 11:48:15, Henrique Dallazuanna wrote:
> Try this:
> 
>#DF <- read.table(...)
> lapply(DF, function(col)col[!is.na(col)])
> 
> On Fri, Feb 12, 2010 at 9:31 AM, Amelia Livington
> <amelia_livington at yahoo.com> wrote:
>> Dear R helpers
>>
>> Suppose e.g. I have a csv file having three variables defined and each
>> of these variables have data items of say 40, 50, 45 length. When I
>> open this csv_file in 'R', I get 10 trailing 'NA's under first column
>> and 5 'NA' s in case of 3rd column.
>>
>> How do I get rid of these NA's s.t. when I read the first column,
>> there should be only 40 data items, 2nd column should have only 50
>> data items and last one should have 45 data items as in the original
>> csv file.
>>
>> Thanking in advance
>> Amelia

Note that Henrique's suggestion changes the result from a dataframe
to a list. Here is a small example.

  # file "temp.csv":
  # X1,X2,X3
  #  1, 2, 3
  #  4, 5, 6
  #  7, 8, 9
  # 10,11,12
  #   ,14,15
  #   ,17,18
  #   ,  ,21
  #   ,  ,22

  DF  <- read.csv("temp.csv")
  DF1 <- lapply(DF, function(col) col[!is.na(col)])

  DF
  #   X1 X2 X3
  # 1  1  2  3
  # 2  4  5  6
  # 3  7  8  9
  # 4 10 11 12
  # 5 NA 14 15
  # 6 NA 17 18
  # 7 NA NA 21
  # 8 NA NA 22

  DF1
  # $X1
  # [1]  1  4  7 10
  # $X2
  # [1]  2  5  8 11 14 17
  # $X3
  # [1]  3  6  9 12 15 18 21 22

  str(DF)
  # 'data.frame':   8 obs. of  3 variables:
  #  $ X1: int  1 4 7 10 NA NA NA NA
  #  $ X2: int  2 5 8 11 14 17 NA NA
  #  $ X3: int  3 6 9 12 15 18 21 22
  str(DF1)
  # List of 3
  #  $ X1: int [1:4] 1 4 7 10
  #  $ X2: int [1:6] 2 5 8 11 14 17
  #  $ X3: int [1:8] 3 6 9 12 15 18 21 22

Basically, a dataframe has "matrix-like" structure, and all the
columns must be the same length. So you cannot delete NAs from
individual columns of a dataframe and still have a dataframe
(you can only drop whole rows). On the other hand, the components
of a list can be anything, including vectors of different lengths.
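
To make the difference concrete, here is a minimal sketch. It assumes
the same eight-row data as in the example above, rebuilt inline so that
it does not depend on "temp.csv"; the names col_lengths and back are
only illustrative.

```r
# Same data as the example above, built inline instead of read from temp.csv
DF <- data.frame(X1 = c(1L, 4L, 7L, 10L, NA, NA, NA, NA),
                 X2 = c(2L, 5L, 8L, 11L, 14L, 17L, NA, NA),
                 X3 = c(3L, 6L, 9L, 12L, 15L, 18L, 21L, 22L))
DF1 <- lapply(DF, function(col) col[!is.na(col)])

# The components of the list have different lengths
col_lengths <- sapply(DF1, length)
col_lengths
# X1 X2 X3
#  4  6  8

# Turning the ragged list back into a dataframe fails here, because the
# lengths 4, 6, 8 cannot be reconciled (only exact multiples get recycled)
back <- try(as.data.frame(DF1), silent = TRUE)
inherits(back, "try-error")
# [1] TRUE

# na.omit() does return a dataframe, but only by dropping every row
# that contains an NA, so data in X2 and X3 is lost as well
nrow(na.omit(DF))
# [1] 4
```

So the choice is between a list (ragged, no NAs) and a dataframe
(rectangular, NAs kept or whole rows dropped).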

Amelia: It may be worth asking why you want to have "columns"
with lengths, respectively, 40, 50, 45? What do you intend to do
with the data read in from the CSV file? Depending on how you
want to use the data, there may be more straightforward ways
of dealing with these NAs.
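
For instance, if the aim is only to summarise each variable, the NAs
can stay in the dataframe and be skipped at the point of use. The
following is a sketch under that assumption, again using the small
example data rather than your real file; n_obs, col_means and x1 are
illustrative names.

```r
# Same data as the example above, built inline instead of read from temp.csv
DF <- data.frame(X1 = c(1L, 4L, 7L, 10L, NA, NA, NA, NA),
                 X2 = c(2L, 5L, 8L, 11L, 14L, 17L, NA, NA),
                 X3 = c(3L, 6L, 9L, 12L, 15L, 18L, 21L, 22L))

# Number of genuine (non-NA) data items in each column
n_obs <- colSums(!is.na(DF))
n_obs
# X1 X2 X3
#  4  6  8

# Column means that ignore the NAs
col_means <- sapply(DF, mean, na.rm = TRUE)
col_means
#    X1    X2    X3
#  5.50  9.50 13.25

# A single column without its NAs, extracted only when it is needed
x1 <- DF$X1[!is.na(DF$X1)]
x1
# [1]  1  4  7 10
```

Many summary functions (mean, sum, sd, range, ...) accept na.rm = TRUE,
so the padding often need not be removed at all.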

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Feb-10                                       Time: 12:18:26
------------------------------ XFMail ------------------------------


