[R] How to reference or sort rownames in a data frame

Robert A. LaBudde ral at lcfltd.com
Sun May 27 22:55:41 CEST 2007


While working through some elementary examples, I was using the dataset
"plasma" from the package "HSAUR".

I performed a logistic regression on these data and made the
diagnostic plots (R-2.5.0):

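# Load the plasma data from package HSAUR (the package must be installed)
data(plasma, package = "HSAUR")
# Logistic regression of ESR on fibrinogen, globulin and their interaction
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data = plasma, family = binomial())
# Show the four standard diagnostic plots in a 2 x 2 grid
layout(matrix(1:4, nrow = 2))
plot(plasma_1)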

I find that the data points with rownames 17 and 23 are outliers
with high leverage.

I would then like to perform a fit without these two rows.

In principle this should be easy, using update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not in row order;
in fact, the relevant rows are row numbers 30 and 31, not 17 and 23.
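To make the mismatch concrete, here is a minimal sketch on a hypothetical toy data frame (the names toy and wanted are illustrative, not part of HSAUR) whose rownames are out of order. match() returns the row positions of the given rownames, which() combined with %in% returns the same positions in row order, and a quoted (character) subscript selects rows by name, whereas a numeric subscript such as toy[17, ] selects by position.

```r
# Hypothetical toy data frame whose rownames are NOT in row order,
# standing in for the plasma data
toy <- data.frame(
  x = c(2.1, 3.4, 1.8, 5.0, 4.2, 2.9),
  row.names = c("3", "17", "1", "23", "2", "4")
)

wanted <- c("17", "23")

# Row positions of the wanted rownames, in the order listed in 'wanted'
pos <- match(wanted, rownames(toy))
print(pos)

# The same positions via which(), returned in increasing row order
print(which(rownames(toy) %in% wanted))

# Character subscripts match against the rownames, so rows can be
# extracted by name without knowing their positions
print(toy[wanted, , drop = FALSE])
```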

This brings up the following (elementary?) questions:

1. How do you reference rows in "subset=" when you know their
rownames but not their row numbers?

2. How do you discover the row numbers corresponding to particular
rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
NOT the row number!) (Probably the same answer as in Q. 1 above.)

3. How do you sort (order) the rows of an existing data frame so that 
the rownames are in order?
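For the third question, a minimal sketch (again on a hypothetical toy data frame, not the plasma data) is to order the rows by the rownames converted to numbers. The conversion matters because rownames are character strings, and a character sort would place "17" before "2"; for non-numeric rownames, order(rownames(toy)) is the analogue. If the rownames are exactly 1 to n, the sorted frame's row numbers and rownames then coincide.

```r
# Hypothetical toy data frame with unordered, numeric-looking rownames
toy <- data.frame(
  x = c(2.1, 3.4, 1.8, 5.0, 4.2, 2.9),
  row.names = c("3", "17", "1", "23", "2", "4")
)

# Convert the character rownames to numbers so the sort is numeric,
# then reorder the rows; drop = FALSE keeps a one-column data frame
toy_sorted <- toy[order(as.numeric(rownames(toy))), , drop = FALSE]
print(rownames(toy_sorted))
```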

I don't seem to know the magic words to find the answers to these 
questions in the help systems.

Obviously this can be done by writing new brute-force functions that
scan the subscripts, but there must be a more elegant (obvious?) direct
way of doing this.
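One direct approach to the first question, sketched here on a hypothetical toy data frame (toy, fit, fit2 and drop_names are illustrative names), is to pass update() a logical subset built from the rownames, so rows are dropped by name whatever their position. For the plasma fit the analogous call would presumably be update(plasma_1, subset = !(rownames(plasma) %in% c("17", "23"))), though that is untested against the HSAUR data.

```r
# Hypothetical toy data with unordered rownames and a binary response
toy <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0),
  x = c(1.2, 3.1, 0.8, 2.9, 1.5, 2.6, 2.2, 1.9),
  row.names = c("3", "17", "1", "23", "2", "4", "6", "5")
)

fit <- glm(y ~ x, data = toy, family = binomial())

# Drop rows by NAME: the logical vector is TRUE for every row whose
# rowname is not in the exclusion list, regardless of its position
drop_names <- c("17", "23")
fit2 <- update(fit, subset = !(rownames(toy) %in% drop_names))

# The model frame keeps the original rownames of the rows actually used
print(nrow(model.frame(fit)))
print(nrow(model.frame(fit2)))
print(rownames(model.frame(fit2)))
```

Building the subset from rownames, rather than from positions found by eye, means the refit stays correct even if the data frame is later reordered.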

Thanks for any pointers.
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
