[R] ave(x, y, FUN=length) produces character output when x is character

Bert Gunter gunter.berton at gene.com
Wed Dec 24 20:49:13 CET 2014


You said:
"The elements of the first vector are irrelevant because they are only
counted, so we should get the same result if it were a character
vector, but we don't: "

You don't get to invent your own rules! ?ave -- always nice to read
the Help docs **before posting** -- clearly states that the x argument
must be __numeric__. So if you choose to ignore what you are told, you
do so at your own risk. Who knows what you'll get? It's a user
error, not a bug.
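
For what it's worth, the character output is consistent with how ave()
appears to be written: it assigns the results of FUN back into x through
the replacement form of split(), so the result keeps the type of x. A
minimal sketch of that coercion, assuming this is what happens internally
(the names x, g and counts are made up for illustration):

```r
# Illustrative only: mimics the assignment that ave() is assumed to perform.
x <- as.character(1:5)                  # character input, as in the post
g <- gl(2, 2, 5)                        # grouping factor: 1 1 2 2 1

counts <- lapply(split(x, g), length)   # integer group sizes, one per level
split(x, g) <- counts                   # assigning into x keeps typeof(x)

typeof(x)                               # "character"
x                                       # "3" "3" "2" "2" "3"
```

The integer lengths are coerced on assignment because x is a character
vector, which matches the output reported in the original message.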

And if I understand correctly what you want, this whole post is silly.
See ?table, which does exactly what you claim is wanted (index its
result by the original vector to get one count per element), without
trying to invent square wheels.
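
A short sketch of the table() route, using the charvec example from the
quoted message. The variable names tab, counts and counts_ave are
illustrative, and the code assumes charvec contains no NA values (table()
drops them by default):

```r
charvec <- c("A", "B", "C", "B", "C", "C")

tab <- table(charvec)                 # counts per distinct string
counts <- as.vector(tab[charvec])     # look up each element's count by name
counts                                # 1 2 3 2 3 3 (integer)

# If ave() is preferred, passing a numeric x keeps the result numeric:
counts_ave <- ave(seq_along(charvec), charvec, FUN = length)
counts_ave                            # 1 2 3 2 3 3 (integer)
```

Both forms return an integer vector aligned with charvec, so the result
can be used directly as a per-element count.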

Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Wed, Dec 24, 2014 at 11:30 AM, Mike Miller <mbmiller+l at gmail.com> wrote:
> R 3.0.1 on Linux 64...
>
> I was working with someone else's code.  They were using ave() in a way that
> I guess is nonstandard:  Isn't FUN always supposed to be a variant of
> mean()?  The idea was to count for every element of a factor vector how many
> times the level of that element occurs in the factor vector.
>
>
> gl() makes a factor:
>
>> gl(2,2,5)
>
> [1] 1 1 2 2 1
> Levels: 1 2
>
>
> ave() applies FUN to produce the desired count, and it works:
>
>> ave( 1:5, gl(2,2,5), FUN=length )
>
> [1] 3 3 2 2 3
>
>
> The elements of the first vector are irrelevant because they are only
> counted, so we should get the same result if it were a character vector, but
> we don't:
>
>> ave( as.character(1:5), gl(2,2,5), FUN=length )
>
> [1] "3" "3" "2" "2" "3"
>
> The output has character type, but it is supposed to be a collection of
> vector lengths.
>
>
> Two questions:
>
> (1) Is that a bug in ave()?  It certainly is unexpected.
>
> (2) What is the best way to do this sort of thing?
>
> The truth is that we start with a character vector and we want to create an
> integer vector that tells us for every element of the character vector how
> many times that string occurs.  Here are two vectors of length 6 that should
> give the same result:
>
>> intvec <- c(4,5,6,5,6,6)
>> charvec <- c("A","B","C","B","C","C")
>
>
> The code was used like this with integer vectors and it seemed to work:
>
>> ave( intvec, intvec, FUN=length )
>
> [1] 1 2 3 2 3 3
>
> When a character vector came along, it would fail by producing a character
> vector as output:
>
>> ave( charvec, charvec, FUN=length )
>
> [1] "1" "2" "3" "2" "3" "3"
>
> This seems more appropriate, and it might always work, but is it OK?:
>
>> ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )
>
> [1] 1 2 3 2 3 3
>
> I suspect that ave() isn't the best choice, but what is the best way to do
> this?
>
>
> Thanks in advance.
>
> Mike
>


