[R] problems with function read.table

Petr PIKAL petr.pikal at precheza.cz
Fri Sep 9 09:23:14 CEST 2011


Hi


> 
> Hi,
> 
> If you read the help page for read.table carefully, you find this:
> 
> 
> na.strings: a character vector of strings which are to be interpreted as
> NA values. Blank fields are also considered to be missing values in
> logical, integer, numeric and complex fields.
> 
> So, both NAs and blank fields are considered as NAs directly by
> read.table.
> 
> Once you have imported your data, you can use any of the string
> manipulation functions (sub() or gsub()) to change your "#DIV/0!" to the
> string "NA". Another option is to edit your Excel file and guard the
> division by zero with an "IF" that returns NA when that happens.

The only problem is that in that case every column that contains "#DIV/0!"
is converted to a factor, and you need to convert it back to numeric.
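
If the data have already been read in with such a factor column, the
conversion back could look roughly like this. It is only a sketch: the
vector r is made up to resemble the r column in the str() output further
down, and r_chr and r_num are names picked for illustration.

```r
# A factor column, as read.delim produced when strings became factors.
# The values are illustrative, not the original data.
r <- factor(c("0.333333333", "1", "1.285714286", "#DIV/0!", "#DIV/0!", "0.5"))

# as.numeric(r) on its own would return the internal level codes,
# not the numbers, so convert to character first.
r_chr <- as.character(r)

# Mark the Excel error string as missing before converting.
r_chr[r_chr == "#DIV/0!"] <- NA

r_num <- as.numeric(r_chr)
r_num
```

Going through as.character() is the important step; calling as.numeric()
directly on a factor silently gives the level codes instead of the values.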

The read.* functions accept not only a single value for na.strings but 
also a vector of values, so you can get rid of all non-numeric and other 
weird Excel values by listing them in na.strings in the read.table call.

> x <- read.delim("clipboard")

> str(x)
'data.frame':   6 obs. of  3 variables:
 $ a: int  1 5 9 8 6 3
 $ b: int  3 5 7 0 NA 6
 $ r: Factor w/ 5 levels "#DIV/0!","0.333333333",..: 2 4 5 1 1 3

> y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> str(y)
'data.frame':   6 obs. of  3 variables:
 $ a: int  1 5 9 8 6 3
 $ b: int  3 5 7 0 NA 6
 $ r: num  0.333 1 1.286 NA NA ...
>
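
Since "clipboard" cannot be reproduced on another machine, here is a
self-contained sketch of the same idea. It assumes a version of R in which
read.table has the text argument, and the tab-separated string txt is
invented to resemble the data above (including one blank field in column b).

```r
# Tab-separated data resembling the clipboard contents (illustrative values).
# Row 5 has a blank field in column b.
txt <- paste(
  "a\tb\tr",
  "1\t3\t0.333333333",
  "5\t5\t1",
  "9\t7\t1.285714286",
  "8\t0\t#DIV/0!",
  "6\t\t#DIV/0!",
  "3\t6\t0.5",
  sep = "\n"
)

# Both "NA" and the Excel error string are treated as missing on import,
# so column r is read as numeric straight away.
y <- read.delim(text = txt, na.strings = c("NA", "#DIV/0!"))
str(y)
```

The blank field in b becomes NA without any extra setting, because blank
fields in numeric columns are treated as missing by read.table itself.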

Regards
Petr


> 
> And finally, instead of using na.omit, use the option na.rm=TRUE in
> your calculations:
> 
> > mean(c(12,23,24,45,67,NA), na.rm=TRUE)
> [1] 34.2
> 
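
To illustrate the difference between na.omit and na.rm on a whole table,
here is a small sketch. The data frame d and its columns s1 and s2 are
hypothetical stand-ins for two return series.

```r
# Two made-up return series with missing values in different rows.
d <- data.frame(s1 = c(0.01, NA, 0.03, -0.02),
                s2 = c(0.02, 0.01, NA, 0.00))

# na.omit() drops every row that has an NA in any column.
nrow(na.omit(d))

# na.rm = TRUE drops only the missing values within each column,
# so each column still uses all of its own observed values.
colMeans(d, na.rm = TRUE)
sapply(d, sd, na.rm = TRUE)
```

With na.omit only the two complete rows survive, while with na.rm=TRUE each
column mean is based on three values.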
> 
> 
> Regards,
> Carlos Ortega
> www.qualityexcellence.es
> 
> On Thu, Sep 8, 2011 at 4:23 PM, Samir Benzerfa <benzerfa at gmx.ch> wrote:
> 
> > Hello everyone
> >
> >
> >
> > I have a couple of questions about the usage of the R function
> > "read.table(.)". My point of departure is that I want to import a
> > matrix (consisting of time and daily stock returns of many stocks) in
> > R. Most of the data is numeric, however some values are missing
> > (blanks) and in other cases I have the character "#DIV/0!" (from
> > Excel). My goal is to do some regression analysis with this matrix.
> > My questions now are the following ones:
> >
> >
> >
> > 1.       How can I in general tell R to automatically replace some
> > specific numbers or characters in tables by others? (for example to
> > replace all characters "#DIV/0!" by the number 0 or simply "NA")
> >
> > 2.       How can I tell R to fill blanks with a number 0 or "NA"?
> >
> > 3.       How can I tell R to omit the "NA" fields in the calculations
> > but not the whole row or column? (I realized that the function
> > "na.omit" omits the whole row)
> >
> >
> >
> > Many thanks for your help!
> >
> >
> >
> > Sincerely,
> >
> > Samir
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >