[R] repeated searching of no-missing values

Bert Gunter gunter.berton at gene.com
Thu Dec 11 00:39:13 CET 2008


Yes. Read the help pages **carefully**!

  e.g. ?tapply says that the first argument is an **atomic** vector. A
factor is stored as an integer vector of codes plus a "levels" attribute, so
tapply works on that underlying representation and returns the integer
codes rather than the labels.
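
A small sketch of the factor representation, using made-up data (the vector
f below is not from the thread), in case it helps to see the codes and the
labels side by side:

```r
# Hypothetical example data, not taken from the original question.
f <- factor(c("blue", "red", "green", "blue"))

# A factor is stored as integer codes ...
storage <- typeof(f)        # "integer"
codes   <- as.integer(f)    # the codes left once the class is dropped

# ... plus a "levels" attribute holding the labels (sorted by default).
labs <- levels(f)

# Indexing the levels by the codes recovers the original labels.
recovered <- labs[codes]
```

Here the levels are sorted alphabetically, so "red" gets code 3 even though
it appears second in the data.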

apply works on **arrays**, which must be of a single type. So it silently
converts the data frame (via as.matrix) to the simplest common type it
"can," which, with a factor column present, is a character matrix.
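
The coercion can be seen directly with as.matrix. This is only a sketch on
an invented three-column data frame (df and its column names are
assumptions, not the poster's data):

```r
# Hypothetical data frame with an integer, a factor and a numeric column.
df <- data.frame(
  id     = c(1L, 2L),
  colour = factor(c("blue", "red")),
  value  = c(0.3, 1.1)
)

# Each column keeps its own class inside the data frame.
col_types <- vapply(df, function(col) class(col)[1], character(1))

# as.matrix() must pick a single type for all cells; with a factor
# column present that type is character.
m <- as.matrix(df)
matrix_type <- typeof(m)
```

Working column by column with lapply or sapply avoids this particular
coercion, since each column is passed to the function as it is.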

etc.

I admit that these details are somewhat obscure and even annoying -- but
they **are** documented. I think that's all we can expect. Some have
lamented the language's lack of perfect consistency in these matters,
but I cannot understand how that would be possible given its nature,
intended, as it is, to be **easily** used for high level data manipulation,
graphics, statistical analysis etc. as well as programming. There are just
too many possible data structures to expect logical consistency in their
handling throughout (if one can even define what that means in specific
instances!).

All these little inconveniences can be worked around easily, of
course. For example, if your new vector of integer factor codes is f.new
and f.old is your original factor, levels(f.old)[f.new] converts f.new to
the appropriate character vector. And so forth. So the key is: pay
**careful** attention to the docs.
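
As a rough sketch of how the levels(f.old)[f.new] idiom might be applied to
the original question: the data are rebuilt from the post, while the helper
names last_pos and last_by_group are invented here for illustration. This is
one possible approach, not the only one.

```r
# Data from the original question, rebuilt with data.frame().
dat <- data.frame(
  V1 = rep(1:3, each = 3),
  V2 = factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA)),
  V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)
)

# Position of the last non-missing element, or NA if all are missing.
# (Hypothetical helper; avoids calling max() on an empty index vector.)
last_pos <- function(x) {
  idx <- which(!is.na(x))
  if (length(idx)) idx[length(idx)] else NA_integer_
}

# Last non-missing value of x within each group of g.
# Factors are processed as integer codes and then mapped back to their
# labels with levels(x)[codes].
last_by_group <- function(x, g) {
  pick <- function(v) v[last_pos(v)]
  if (is.factor(x)) {
    codes <- as.vector(tapply(as.integer(x), g, pick))
    factor(levels(x)[codes], levels = levels(x))
  } else {
    as.vector(tapply(x, g, pick))
  }
}

# Assemble the result column by column so that no column is coerced.
res <- data.frame(
  V1 = sort(unique(dat$V1)),
  V2 = last_by_group(dat$V2, dat$V1),
  V3 = last_by_group(dat$V3, dat$V1)
)
```

Building the result one column at a time keeps V2 a factor and V3 numeric,
which is what apply on the whole data frame cannot do.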


-- Bert Gunter

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Patrizio Frederic
Sent: Wednesday, December 10, 2008 2:09 PM
To: r-help at r-project.org
Subject: [R] repeated searching of no-missing values

hi all,
I have a data frame such as:

1 blue  0.3
1 NA    0.4
1 red   NA
2 blue  NA
2 green NA
2 blue  NA
3 red   0.5
3 blue  NA
3 NA    1.1

I wish to find the last non-missing value of each column within every
group of three rows, i.e. I want a 3 by 3 data.frame such as:

1 red   0.4
2 blue  NA
3 blue  1.1

I have written a little script

data = structure(list(V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), V2 = structure(c(1L, NA, 3L, 1L, 2L, 1L, 3L, 1L, NA), .Label = c("blue",
"green", "red"), class = "factor"), V3 = c(0.3, 0.4, NA, NA,
NA, NA, 0.5, NA, 1.1)), .Names = c("V1", "V2", "V3"), class =
"data.frame", row.names = c(NA,
-9L))

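# last non-missing element of x
cl          = function(x) x[max(which(!is.na(x)))]
# apply cl within the groups defined by the first column of data
choose.last = function(x) tapply(x, data[,1], cl)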

# now function choose.last works properly on numeric vectors:

> choose.last(data[,3])
  1   2   3
0.4  NA 1.1

# but not on factors (I lose the factor labels):

> choose.last(data[,2])
1 2 3
3 1 1

# moreover, if I apply this function to the whole data.frame
# the output is a character matrix

> apply(data,2,choose.last)
  V1  V2     V3
1 "1" "red"  "0.4"
2 "2" "blue" NA
3 "3" "blue" "1.1"

# and if I sapply, I lose the factor labels

> sapply(data,choose.last)
  V1 V2  V3
1  1  3 0.4
2  2  1  NA
3  3  1 1.1

any hint?

Thanks in advance,

Patrizio

+-------------------------------------------------
| Patrizio Frederic, PhD
| Research associate in Statistics,
| Department of Economics,
| University of Modena and Reggio Emilia,
| Via Berengario 51,
| 41100 Modena, Italy
|
| tel:  +39 059 205 6727
| fax:  +39 059 205 6947
| mail: patrizio.frederic at unimore.it
+-------------------------------------------------
