[R] hairy indexing problem
Agustin Lobo
alobo at ija.csic.es
Wed Jun 5 10:19:53 CEST 2002
I would use the function w.conti (defined below), which calculates a
weighted contingency matrix. That is, given two vectors of
categorical variables (e.g., species and
soil type) and a third vector holding a
quantitative variable (e.g., biomass), it calculates
the sum of the quantitative variable for each pair of categories (e.g.,
the total biomass for each species in each soil type).
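To make the two-factor case concrete, here is a small sketch with made-up data (the species, soil and biomass values are invented for illustration only). It calls xtabs() directly, which is all that w.conti wraps:

```r
# Hypothetical data: six plots, two species, two soil types
species <- c("oak", "oak", "pine", "pine", "oak", "pine")
soil    <- c("clay", "sand", "clay", "clay", "clay", "sand")
biomass <- c(10, 20, 5, 7, 30, 1)

# Sum of biomass for every species x soil combination
tab <- xtabs(biomass ~ species + soil)
print(tab)
```

Each cell of tab holds the total biomass for one species on one soil type; for instance the oak/clay cell adds up the first and fifth plots.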
With your data, since you have only one categorical
variable, just set the second one to a constant to
calculate the sum of foo for each subject:
> matriz<-cbind(sub,foo,bar)
> matriz
     sub foo bar
[1,]   2 1.7 3.2
[2,]   2 2.3 4.1
[3,]   3 7.6 2.3
[4,]   3 7.1 3.3
[5,]   3 7.3 2.3
[6,]   3 7.4 1.3
[7,]   5 6.2 6.1
[8,]   5 3.4 6.9
>
> a <- w.conti(matriz[,1],rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2  4.0
  3 29.4
  5  9.6
Then, using the counts from table(), you can calculate the mean from
the sum:
> a/as.vector(table(matriz[,1]))
   v2
v1     1
  2 2.00
  3 7.35
  5 4.80
From your question I understand that you want to group subjects according
to their number of rows, so
that subjects 2 and 5 together would become one new subject:
> new.sub <- as.vector(table(matriz[,1]))
> new.sub
[1] 2 4 2
> new.sub <- rep(new.sub,new.sub)
> new.sub
[1] 2 2 4 4 4 4 2 2
> a <- w.conti(new.sub,rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2 13.6
  4 29.4
> a/as.vector(table(new.sub))
   v2
v1     1
  2 3.40
  4 7.35
>
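Note that rep(new.sub,new.sub) only lines up with the data when the rows are already sorted by subject, as they are here. A sketch that should not depend on row order, using ave() and tapply() from base R (the names n.rows and grp.mean are my own, not part of the session above):

```r
sub <- c(2, 2, 3, 3, 3, 3, 5, 5)
foo <- c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4)

# For each row, the number of rows that its subject has
n.rows <- ave(foo, sub, FUN = length)

# Mean of foo within each row-count group
grp.mean <- tapply(foo, n.rows, mean)
print(grp.mean)
```

With the example data this gives the same group means as above (3.40 for the two-row subjects, 7.35 for the four-row subject).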
w.conti is simply:
w.conti <- function (v1, v2, z)
{
    # sum of z for each combination of v1 and v2
    xtabs(z ~ v1 + v2)
}
(I could use xtabs() directly, but I never remember that expression,
while w.conti is easier to remember)
Of course, if you always need the mean, just add
the second step to w.conti.
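For example, a version with the mean built in could look like the sketch below. The name w.conti.mean is made up for illustration; it divides the sums from xtabs() by the cell counts from table(), so any empty cell would come out as NaN:

```r
w.conti.mean <- function(v1, v2, z) {
  sums   <- xtabs(z ~ v1 + v2)   # sum of z per (v1, v2) cell
  counts <- table(v1, v2)        # number of rows per cell
  sums / counts                  # cell means (NaN where a cell is empty)
}

sub <- c(2, 2, 3, 3, 3, 3, 5, 5)
foo <- c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4)

# One categorical variable only, so the second one is a constant
m <- w.conti.mean(sub, rep(1, length(sub)), foo)
print(m)
```

On the example data this should reproduce the per-subject means 2.00, 7.35 and 4.80 obtained in two steps above.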
Agus
Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es
On 4 Jun 2002, Russell Senior wrote:
>
> I've got a data frame that looks like this:
>
> subject foo bar
> 2 1.7 3.2
> 2 2.3 4.1
> 3 7.6 2.3
> 3 7.1 3.3
> 3 7.3 2.3
> 3 7.4 1.3
> 5 6.2 6.1
> 5 3.4 6.9
> ...
>
> That is, I've got multiple rows per subject. I need to compute
> summaries within categories where the subject has the same number of
> rows. For example, subject 2 and 5 both have two rows. I need to
> compute mean for those four values of foo. This looks like a good
> candidate for index vectors, but I need some help. I've tried
> something like:
>
> table(data) -> tmp
>
> and:
>
> tmp[tmp == 2]
>
> and even:
>
> as.numeric(attr(tmp[tmp == 2],"names"))
>
> to get a vector of subject numbers that have two rows in the original
> data frame. But I am getting stuck there. I want some kind of
> "is.member" function to use in a subsequent index vector expression,
> like:
>
> i <- as.numeric(attr(tmp[tmp == 2],"names"))
> data[is.member($subject,i)]$foo
>
> but there isn't an is.member() function. Can someone please give me a
> pointer on the canonical way to do this?
>
> Thanks!
>
> --
> Russell Senior ``The two chiefs turned to each other.
> seniorr at aracnet.com Bellison uncorked a flood of horrible
> profanity, which, translated meant, `This is
> extremely unusual.' ''
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>