[R] problems with function read.table

Petr PIKAL petr.pikal at precheza.cz
Fri Sep 9 14:29:52 CEST 2011


Hi

> Hello,
> 
> in the meanwhile i found the problem for the #. The problem was, that # 
and
> “none” are comments by default, so I turned comments of with 
comment.char=””
> as stated in the help.

Hm. You are not telling the whole story.
I made a sample excel sheet with blank values and #DIV/0! result (simple 
two columns with missing values and division by these missing values in 
first column)

> x<-read.delim("clipboard")
> x
            r a  b
1        1.25 5  4
2 1.166666667 7  6
3 0.142857143 1  7
4     #DIV/0! 5 NA
5         4.5 9  2
6           2 6  3
7     #DIV/0! 4 NA

simple read.delim gave me correctly NA in place of blank values and 
#DIV/0! in first column, which therefore became factor.

> y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> y
          r a  b
1 1.2500000 5  4
2 1.1666667 7  6
3 0.1428571 1  7
4        NA 5 NA
5 4.5000000 9  2
6 2.0000000 6  3
7        NA 4 NA

When I defined vector of na.strings I correctly got NA values for both 
blank and #DIV/0! in Excel sheet. So I wonder where you get problems with 
blank values 

> 
> For the problem with blanks I still didn’t find the solution, even it 
seems
> to be a frequent problem. I got the solution with fill=T, but this fills 
my
> row at the end with a NA. I want to have the NA exactly in the field 
where
> there is a blank in the .txt file.

It seems to me that you do not use correct delimiter. For Excel it is tab 
- "\t". For csv it can be either ";" or "," depending on your locale. You 
need to show us some small part of a file, preferably together with your 
read.* command and the result you got.

Regards
Petr

> 
> Date                      rtn               vwretd                ewretd
> sprtrn
> 19700102             0.000686              0.00547 0.033450
> 0.010211
> 19700105             0.009596 0.018947
> 0.004946
> 19700106             #DIV0!                -0.007233
> -0.006848
> 19700107             0.000678              -0.001272
> 0.003559              -0.002047
> 19700108             0.002034              0.000564                0.11
> 0.000540
> 19700109                                  -0.002797               0
> -0.003021
> 19700113             0.017335              0.000737 -0.001090
> 0
> 
> Can you provide a solution?
> 
> Thanks,
> Samir
> 
> p.s. I know that my questions seem obvious to you, I’m sorry for that, 
but I
> just started to work with R ;)
> 
> -----Ursprüngliche Nachricht-----
> Von: Petr PIKAL [mailto:petr.pikal at precheza.cz] 
> Gesendet: Freitag, 9. September 2011 09:23
> An: Carlos Ortega
> Cc: Samir Benzerfa; r-help at r-project.org
> Betreff: Re: [R] problems with function read.table
> 
> Hi
> 
> 
> > 
> > Hi,
> > 
> > If you read carefully the help pages for read.table you get this:
> > 
> > 
> > na.stringsa character vector of strings which are to be interpreted as
> > NA<../../utils/help/NA> values.
> > Blank fields are also considered to be missing values in logical, 
> integer,
> > numeric and complex fields.
> > 
> > So, both NAs and blank fields are considered as NAs directly by 
> read.table.
> > 
> > Once you have imported your data, you can modify with any of the 
string
> > manipulation functions (sub() or gsub()) to change your "#DIV/0!" to 
the
> > string "NAs". Another option is to manipulate your Excel file and 
> consider
> > the division by cero with a "IF" and get back a NA if that happens.
> 
> The only problem is that in such case all columns which has "#DIV/0!" 
are 
> converted to factors and you need to consider changing it back to 
numeric.
> 
> read.* functions accept as na.string definition not only one value but 
> also vector of values and you can get rid of all non numeric and other 
> weird Excel values by defining it as a na.strings in read.table call.
> 
> > x <- read.delim("clipboard")
> 
> > str(x)
> 'data.frame':   6 obs. of  3 variables:
>  $ a: int  1 5 9 8 6 3
>  $ b: int  3 5 7 0 NA 6
>  $ r: Factor w/ 5 levels "#DIV/0!","0.333333333",..: 2 4 5 1 1 3
> 
> > y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> > str(y)
> 'data.frame':   6 obs. of  3 variables:
>  $ a: int  1 5 9 8 6 3
>  $ b: int  3 5 7 0 NA 6
>  $ r: num  0.333 1 1.286 NA NA ...
> >
> 
> Regards
> Petr
> 
> 
> > 
> > And finally, instead of using na.omits use option na.rm=T to get done 
> your
> > calculations:
> > 
> > > mean(c(12,23,24,45,67,NA), na.rm=T)[1] 34.2
> > 
> > 
> > 
> > Regards,
> > Carlos Ortega
> > www.qualityexcellence.es
> > 
> > On Thu, Sep 8, 2011 at 4:23 PM, Samir Benzerfa <benzerfa at gmx.ch> 
wrote:
> > 
> > > Hello everyone
> > >
> > >
> > >
> > > I have a couple of questions about the usage of the R function
> > > "read.table(.)". My point of departure is that I want to import a 
> matrix
> > > (consisting of time and daily stock returns of many stocks) in R. 
Most 
> of
> > > the data is numeric, however some values are missing (blanks) and in 

> other
> > > cases I have the character "#DIV/0!" (from excel). My goal is to do 
> some
> > > regression analysis with this matrix. My questions now are the 
> following
> > > ones:
> > >
> > >
> > >
> > > 1.       How can I in general tell R to automatically replace some 
> specific
> > > numbers or characters in tables by others? (for example to replace 
all
> > > characters "#DIV/0!" by the number 0 or simply "NA")
> > >
> > > 2.       How can I tell R to fill blanks with a number 0 or "NA"?
> > >
> > > 3.       How can I tell R to omit the "NA" fields in the 
calculations 
> but
> > > not the whole row or column? (I realized that the function "na.omit" 

> omits
> > > the whole row)
> > >
> > >
> > >
> > > Many thanks for your help!
> > >
> > >
> > >
> > > Sincerely,
> > >
> > > Samir
> > >
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > 
> >    [[alternative HTML version deleted]]
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list