[R] regular expression for na.strings / read.table

jessica.gervais at tudor.lu jessica.gervais at tudor.lu
Tue Feb 12 15:30:30 CET 2008


Dear all,

I am working with a CSV file.
Some values in the file are not valid; they are marked with a star '*',
for example: *789.

I have attached an example file (test.txt) to this email; it looks like
the data I have to work with.


I see 2 possibilities, neither of which I can get to work in R:

1- First and easiest solution:
Read the data with read.csv in R, and define as NA strings all cells
containing a star (*).
Something which would look like this ...

> DATA<-read.csv("test.txt",na.strings=list(length(grep("\\*",DATA,value=T))==0))

> DATA
  X1 X.789 LNM. X78 X56  X89 X56.1 X100
1  2   700  AUW  78  56   89    56  100
2  3   400  TOC  78  56   89    56   10
3  4   389  RMN  78  56   89    56  *89
4  5   400  LNM  78  56 *452    56  100
5  6   200  UTC  78 *40   89    56  100
6  7   100  GAT  78  56    8    56 *100
7  8    79 *LNM  78  56    9    56  100
8  9    89  TCG  78  56  800    56 *100
9 10   78*  LNM  78  56   89    56  100


...but which would actually work (the stars are still there)! Does anyone
know how to do that?
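
For reference, a minimal sketch of one way the first idea could be approached.
The na.strings argument is compared against cell contents as literal strings,
not as a pattern, so it cannot express "any cell containing a star". The
sketch below instead reads every column as character, sets starred cells to
NA, and then lets R re-guess the column types. The inline sample and the
names csv_text, raw and cleaned are assumptions made for illustration; they
are not the contents of test.txt.

```r
# Hypothetical inline sample standing in for test.txt (not the real file).
csv_text <- "id,value,code
1,700,AUW
2,*89,TOC
3,78*,*LNM
4,400,LNM"

# Read every column as character so that starred cells are kept as text.
raw <- read.csv(text = csv_text, colClasses = "character")

# Set every cell containing a literal star to NA, column by column.
# fixed = TRUE makes grepl() treat "*" as a plain character, not a regex.
cleaned <- raw
cleaned[] <- lapply(raw, function(col) {
  col[grepl("*", col, fixed = TRUE)] <- NA
  col
})

# Re-guess the column types now that the stars are gone.
cleaned[] <- lapply(cleaned, type.convert, as.is = TRUE)

str(cleaned)
```

With a real file, the text = csv_text argument would be replaced by the file
name; the rest of the sketch should not need to change.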

2- Second solution:
- first read the file with DATA<-read.csv("test.txt")
- then replace all fields containing a * with NA by applying the following
function to the object DATA:
DATA_cleaned<-apply(DATA,c(1,2),function(x){if(length(grep("\\*",x,value=TRUE))==1){x<-NA}})
 DATA_cleaned
      X1   X.789 LNM. X78  X56  X89  X56.1 X100
 [1,] NULL NULL  NULL NULL NULL NULL NULL  NULL
 [2,] NULL NULL  NULL NULL NULL NULL NULL  NULL
 [3,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [4,] NULL NULL  NULL NULL NULL NA   NULL  NULL
 [5,] NULL NULL  NULL NULL NA   NULL NULL  NULL
 [6,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [7,] NULL NULL  NA   NULL NULL NULL NULL  NULL
 [8,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [9,] NULL NA    NULL NULL NULL NULL NULL  NULL

The stars have disappeared, but so has everything else!
The problem comes from the fact that if a field does not contain any *, the
command
if(length(grep("\\*",x,value=T))==1) returns NULL instead of FALSE!
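
A minimal sketch of the behaviour described above, and of two possible ways
around it. In R, an if with no else branch evaluates to NULL when the
condition is FALSE, which is why the unstarred cells come back as NULL. The
small data frame and the names no_else, fixed_apply, has_star and
DATA_cleaned are assumptions made for illustration only.

```r
# An `if` with no `else` evaluates to NULL when the condition is FALSE.
no_else <- if (FALSE) NA
is.null(no_else)

# Hypothetical small data frame standing in for the real DATA object.
DATA <- data.frame(a = c("1", "*2", "3"),
                   b = c("x", "y", "z*"),
                   stringsAsFactors = FALSE)

# Option A: keep the apply() form, but return the cell itself in the
# else branch so that every call yields a value.
fixed_apply <- apply(DATA, c(1, 2), function(x) {
  if (grepl("*", x, fixed = TRUE)) NA else x
})

# Option B: build a logical matrix marking the starred cells and assign
# NA to those positions in one step. The result stays a data frame.
has_star <- sapply(DATA, function(col) grepl("*", col, fixed = TRUE))
DATA_cleaned <- DATA
DATA_cleaned[has_star] <- NA

DATA_cleaned
```

Option A returns a character matrix rather than a data frame, because
apply() converts its input with as.matrix(); Option B avoids that conversion.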

If you have any ideas, please let me know!

Many thanks,

Jessica
____________________________________

Jessica Gervais
Mail: jessica.gervais at tudor.lu

Resource Centre for Environmental Technologies,
Public Research Centre Henri Tudor,
Technoport Schlassgoart,
66 rue de Luxembourg,
P.O. BOX 144,
L-4002 Esch-sur-Alzette, Luxembourg

(See attached file: test.txt)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080212/b67d1cbd/attachment.txt 

