[R] 2 Seemingly Simple Problems

Fri May 31 16:07:34 CEST 2002

On Fri, 31 May 2002, MATT BORKOWSKI wrote:

> Alright...these two issues seem rather simple.  But I had trouble finding much
> about either of them in the archives.
>
> 1)  Using scan()
> I'm trying to use scan to read in a large data set since read.table() is taking
> quite a bit of time.  But when I try to do this I receive an error message along
> the lines of "Character where numeric expected."  Seems to me the problem is
> arising because my data is composed of both characters and numbers, but R
> is only expecting numerics.  I assume the key to this problem lies in the
> "what=" parameter.  But I'm not sure what to set this to so that R expects
> characters or numbers.

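See the help page for scan, especially the examples: 'what' can be a list
giving the type of each field, e.g. "" for character and 0 for numeric.
However, since read.table calls scan itself, you will gain little by
calling scan directly, provided you give read.table the column types via
colClasses.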
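For illustration, here is a minimal sketch of both routes on a small hypothetical file (three whitespace-separated fields per line: a character label and two numbers, written to a tempfile()). With scan, 'what' is a list whose elements act as prototypes for each field. With read.table, colClasses states the column types up front so they need not be worked out from the data.

```r
# Hypothetical input: a character field followed by two numeric fields.
tmp <- tempfile()
writeLines(c("a 1 2.5",
             "b 2 3.5",
             "c 3 4.5"), tmp)

# scan(): one prototype per field; "" means character, 0 means numeric.
# The result is a list with one component per field.
x <- scan(tmp, what = list(name = "", id = 0, value = 0), quiet = TRUE)
str(x)

# read.table() calls scan() itself; colClasses tells it the column types
# so it does not have to guess them from the data.
df <- read.table(tmp, colClasses = c("character", "numeric", "numeric"))
str(df)

unlink(tmp)
```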

> 2) Testing for 'NA' values
> In this problem I have read in a large data set.  Some of the lines of data are
> not as long and therefore the last few columns have been filled in with 'NA.'
> Now I'm trying to read through rows of data backwards because the parameter
> I'm trying to extract from the data.frame is not always in column 5 but is always
> the second real value after the NAs, if that makes any sense.  But I don't think

(No.  The NAs are at the end of the row, so do you mean the second value before them?)

> that's all that important anyway.  The point is...I'm trying to extract the second
> value after the 'NA' values by ignoring the 'NA' values and counting any real
> values.  I'm trying to accomplish this with:
>
> if(data[r,c] != NA)  count <- count +1
>
> However, I receive the error: "Value missing where logical expected".  I assume
> this is happening because I'm testing for 'NA' values.  Is there any way around
> this?  Is there a way to count the number of 'NA' numbers or a way to skip over
> them?

is.na(data[r,])  would be a good start. Something like

{xx <- is.na(data[r,]); n <- sum(!xx); data[r, n-1]}
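A small sketch on a hypothetical data frame whose short rows are padded with NA at the end, as the question describes. It shows three things. First, why the test in the question fails: any comparison with NA is itself NA, which if() cannot use. Second, the same counting loop rewritten with !is.na(). Third, the one-row extraction of the second-to-last non-NA value.

```r
# Hypothetical data: shorter rows padded with trailing NAs.
data <- data.frame(V1 = c(1, 4, 7),
                   V2 = c(2, 5, 8),
                   V3 = c(3, 6, NA),
                   V4 = c(10, NA, NA))

data[1, 1] != NA        # NA, not TRUE/FALSE, hence the error inside if()

# The question's counting loop, testing with is.na() instead of != NA.
r <- 2
count <- 0
for (j in seq_len(ncol(data))) {
  if (!is.na(data[r, j])) count <- count + 1
}
count                   # 3

# Without a loop: count the non-NA values, take the second-to-last one.
xx <- is.na(data[r, ])
n <- sum(!xx)           # 3
data[r, n - 1]          # 5
```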

for one row perhaps?  Or to vectorize

nn <- rowSums(!is.na(data))  # number of non-NA values in each row
data[cbind(seq_along(nn), nn-1)]
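A sketch of the vectorized approach on another hypothetical data frame. It counts the non-NA values per row with rowSums() and then indexes with a two-column (row, column) matrix. As in the question, it assumes all NAs are trailing. Rows with fewer than two values have no second-to-last entry, so this version returns NA for them instead of indexing column 0.

```r
# Hypothetical data: trailing NAs, last row has only one value.
data <- data.frame(V1 = c(1, 4, 7, 9),
                   V2 = c(2, 5, 8, NA),
                   V3 = c(3, 6, NA, NA),
                   V4 = c(10, NA, NA, NA))

nn <- rowSums(!is.na(data))           # non-NA count per row: 4 3 2 1
idx <- cbind(seq_along(nn), nn - 1)   # (row, column) of second-to-last value

# Guard rows with fewer than two values; they get NA.
ok <- nn >= 2
result <- rep(NA_real_, length(nn))
result[ok] <- as.matrix(data)[idx[ok, , drop = FALSE]]
result                                # 3 5 7 NA
```

as.matrix() is used because all columns here are numeric; with a mix of character and numeric columns it would produce a character matrix.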

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._