Robert A LaBudde
ral at lcfltd.com
Mon May 28 06:12:37 CEST 2007
Thanks, Gabor.
I have to say I wouldn't have figured this out easily.
I'd summarize your comments as follows:
1. Remember to use logical vectors as indices.
2. Remember %in% for matching against several values at once.
3. Remember which() to get row numbers from a logical vector.
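A minimal sketch of the three tips. It uses a small made-up data frame `df`, which is hypothetical and not the HSAUR data. Its row names are deliberately out of order, so a row's name and its position differ:

```r
# Hypothetical toy data frame; row names deliberately not in row order
df <- data.frame(x = c(10, 20, 30, 40),
                 row.names = c("31", "17", "5", "23"))

# Tip 2: %in% tests each row name against a set of values.
# The numbers 17 and 23 are coerced to "17" and "23" before matching.
hit <- rownames(df) %in% c(17, 23)
print(hit)                        # FALSE TRUE FALSE TRUE

# Tip 1: a logical vector used as an index selects rows;
# negating it with ! drops them instead
print(df[hit, , drop = FALSE])    # rows named "17" and "23"
print(df[!hit, , drop = FALSE])   # every other row

# Tip 3: which() converts the logical vector into row positions
print(which(hit))                 # 2 4
```

`drop = FALSE` only keeps the one-column result as a data frame rather than a bare vector.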
It is the small tasks which appear most difficult to figure out in R.
At 10:29 PM 5/27/2007, Gabor wrote:
>On 5/27/07, Robert A. LaBudde <ral at lcfltd.com> wrote:
>>As I was working through elementary examples, I was using the dataset
>>"plasma" from package "HSAUR".
>>
>>In performing a logistic regression on the data and making the
>>diagnostic plots (R-2.5.0)
>>
>>data(plasma,package='HSAUR')
>>plasma_1<- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
>>layout(matrix(1:4,nrow=2))
>>plot(plasma_1)
>>
>>I find that the data points with rownames 17 and 23 are
>>outliers with high leverage.
>>
>>I would then like to perform a fit without these two rows.
>>
>>In principle this should be easy, using an update() with subset=-c(17,23).
>>
>>The problem is that the rownames in this dataset are not ordered,
>>and, in fact, the relevant rows are 30 and 31, not 17 and 23.
>>
>>This brings up the following (elementary?) questions:
>>
>>1. How do you reference rows in "subset=" for which you know the
>>rownames, but not the row numbers?
>
>Use a logical vector:
>
> rownames(plasma) %in% c(17, 23)
>
>>
>>2. How do you discover the rows corresponding to particular
>>rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
>>NOT the row number!) (Probably the same answer as in Q. 1 above.)
>
> which(rownames(plasma) %in% c(17, 23)) # 30, 31
>
>>
>>3. How do you sort (order) the rows of an existing data frame so that
>>the rownames are in order?
>
>
> plasma[order(as.numeric(rownames(plasma))), ]
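The following sketch puts the replies together for the original goal: refitting without the rows named 17 and 23. HSAUR may not be installed, so the example uses a simulated stand-in data frame `d`. It is hypothetical; only the column names and the model formula mimic `plasma`. Its row names are shuffled. The point it illustrates is that `subset=` accepts the negated logical vector, so the rows are picked by name rather than by position:

```r
set.seed(1)
n <- 32

# Hypothetical stand-in for HSAUR's plasma data: simulated values only
d <- data.frame(fibrinogen = runif(n, 2, 5),
                globulin   = runif(n, 25, 45))
d$ESR <- factor(ifelse(d$fibrinogen + rnorm(n) > 3.5, "ESR > 20", "ESR < 20"))
rownames(d) <- as.character(sample(n))   # row names deliberately shuffled

fit1 <- glm(ESR ~ fibrinogen * globulin, data = d, family = binomial())

# Drop rows by *name*: negate the %in% test. Using subset = -c(17, 23)
# would instead drop the 17th and 23rd rows by position.
drop_rows <- rownames(d) %in% c(17, 23)
fit2 <- update(fit1, subset = !drop_rows)

print(which(drop_rows))   # positions of the rows named "17" and "23"

# Reorder the rows so that the row names run 1, 2, ..., n
d_sorted <- d[order(as.numeric(rownames(d))), ]
```

The fitted values of `fit2` keep the original row names. You can therefore check with `names(fitted(fit2))` that the two rows are really gone.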
