[R] a simple problem

David Winsemius dwinsemius at comcast.net
Fri Mar 4 17:03:58 CET 2011


On Mar 4, 2011, at 9:50 AM, Asan Ramzan wrote:

> Hello R-help
>
> I am working with a large data table that has the occasional label
> marking a particular time point in an experiment. E.g.:
>
> "Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
> .909, 1.117, 1.225, 1.048, 1.258
> 3.942, 1.113, 1.230, 1.049, 1.262
> 3.976, 1.105, 1.226, 1.051, 1.259
> 4.009, 1.114, 1.231, 1.053, 1.259
> 4.042, 1.107, 1.230, 1.048, 1.262
> 4.076, 1.108, 1.226, 1.045, 1.257
> 4.109, 1.109, 1.227, 1.047, 1.259
> 4.142, 1.108, 1.225, 1.052, 1.260
> 4.176, 1.105, 1.222, 1.046, 1.260
> 4.209, 1.106, 1.226, 1.050, 1.258
> 4.242, 1.105, 1.224, 1.047, 1.258
> 4.276, 1.104, 1.223, 1.048, 1.259
> 4.309, 1.106, 1.228, 1.050, 1.260
> 4.342, 1.103, 1.219, 1.049, 1.260
> 4.376, 1.107, 1.225, 1.052, 1.259
> 4.409, 1.105, 1.222, 1.047, 1.258
> 4.442, 1.106, 1.227, 1.048, 1.262
> 4.476, 1.105, 1.222, 1.049, 1.261
> 4.509, 1.102, 1.222, 1.047, 1.259
> 4.555, "Gly sar"
> 4.555, 1.107, 1.224, 1.048, 1.261
> 4.576, 1.109, 1.228, 1.053, 1.259
> 4.609, 1.103, 1.218, 1.046, 1.258
> 4.642, 1.105, 1.223, 1.048, 1.256
> 4.676, 1.108, 1.217, 1.048, 1.260
> 4.709, 1.124, 1.222, 1.047, 1.258
> When I try to read in the table, I get:
>> try<-read.table("200810_01.R",header=T,sep=",")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 136 did not have 5 elements
>
> Is there any way to tell R to ignore these labels or better
> still interpret them as being label for particular time
> points, so when it comes to draw a line graph it is annotated
> with these labels.

Option 1:
Prepare your data properly with an editor.

Option 2:
You could read the file with readLines, identify the offending lines  
with grep or grepl, then separate the offenders and non-offenders.
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
.909, 1.117, 1.225, 1.048, 1.258
3.942, 1.113, 1.230, 1.049, 1.262
3.976, 1.105, 1.226, 1.051, 1.259
4.009, 1.114, 1.231, 1.053, 1.259
4.042, 1.107, 1.230, 1.048, 1.262
4.076, 1.108, 1.226, 1.045, 1.257
4.109, 1.109, 1.227, 1.047, 1.259
4.142, 1.108, 1.225, 1.052, 1.260
4.176, 1.105, 1.222, 1.046, 1.260
4.209, 1.106, 1.226, 1.050, 1.258
4.242, 1.105, 1.224, 1.047, 1.258
4.276, 1.104, 1.223, 1.048, 1.259
4.309, 1.106, 1.228, 1.050, 1.260
4.342, 1.103, 1.219, 1.049, 1.260
4.376, 1.107, 1.225, 1.052, 1.259
4.409, 1.105, 1.222, 1.047, 1.258
4.442, 1.106, 1.227, 1.048, 1.262
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259
4.609, 1.103, 1.218, 1.046, 1.258
4.642, 1.105, 1.223, 1.048, 1.256
4.676, 1.108, 1.217, 1.048, 1.260
4.709, 1.124, 1.222, 1.047, 1.258'))

  read.table(textConnection(
         lines[ c(TRUE, !grepl("[[:alpha:]]", lines)[-1]) ]),
              skip=1, sep=",")

  # skip=1 drops the header line: the quotes and spaces don't work well
  # with R column naming conventions.
  # sep="," makes the commas field separators, so the columns are numeric.

      V1    V2    V3    V4    V5
1  0.909 1.117 1.225 1.048 1.258
2  3.942 1.113 1.230 1.049 1.262
3  3.976 1.105 1.226 1.051 1.259

snipped
23 4.642 1.105 1.223 1.048 1.256
24 4.676 1.108 1.217 1.048 1.260
25 4.709 1.124 1.222 1.047 1.258

So even more compact would be:

read.table(textConnection(
         lines[  !grepl("[[:alpha:]]", lines) ] ), sep="," )
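
Since the header line is dropped along with the label lines, the columns come back as V1..V5. A minimal sketch of supplying names by hand, assuming a shortened copy of the data and column names ("Time", "R1", ...) that I made up for illustration:

```r
# Shortened copy of the example data: header, data rows and one label row
lines <- c('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"',
           '4.509, 1.102, 1.222, 1.047, 1.259',
           '4.555, "Gly sar"',
           '4.555, 1.107, 1.224, 1.048, 1.261',
           '4.576, 1.109, 1.228, 1.053, 1.259')

# Drop every line containing a letter (the header and the label rows),
# then read the rest as comma-separated numbers.
# The col.names values are hypothetical; choose whatever suits the data.
dat <- read.table(textConnection(lines[ !grepl("[[:alpha:]]", lines) ]),
                  sep = ",",
                  col.names = c("Time", "R1", "R2", "R3", "R4"))

str(dat)
```

Passing col.names avoids having R mangle the quoted, space-containing header into syntactic names.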

Using the non-negated grepl expression should get you all the "labels"
lines (plus the header line, which also contains letters).
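
A rough sketch of how the label lines might be pulled out and used to annotate a line graph, again on a shortened copy of the data. The object names (marks, dat) and the choice of plotting the first response column are assumptions for illustration, not something taken from the original file:

```r
# Shortened copy of the example data: header, data rows and one label row
lines <- c('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"',
           '4.509, 1.102, 1.222, 1.047, 1.259',
           '4.555, "Gly sar"',
           '4.555, 1.107, 1.224, 1.048, 1.261',
           '4.576, 1.109, 1.228, 1.053, 1.259')

has_alpha <- grepl("[[:alpha:]]", lines)

# Label rows: lines with letters, excluding the header (line 1)
label_lines <- lines[ has_alpha & seq_along(lines) > 1 ]

# Split each label row into its time (before the first comma) and its text
label_text <- sub("^[^,]*,[[:space:]]*", "", label_lines)  # drop the time field
label_text <- gsub('"', "", label_text)                    # drop the quotes
marks <- data.frame(Time  = as.numeric(sub(",.*$", "", label_lines)),
                    Label = label_text,
                    stringsAsFactors = FALSE)

# Numeric rows: everything without letters
dat <- read.table(textConnection(lines[ !has_alpha ]), sep = ",")

# Line graph of the first response column, with each label marked by a
# dashed vertical line and its text written next to it
plot(dat$V1, dat$V2, type = "l", xlab = "Time (min)", ylab = "R1")
abline(v = marks$Time, lty = 2)
text(marks$Time, max(dat$V2), marks$Label, pos = 4)
```

The same marks data frame can be reused for the other columns, since the label times do not depend on which response is plotted.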


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


