[R] Extract row from a dataframe by row names in another vector and factor. Need explanation
Mohammad Tanvir Ahamed
mashranga at yahoo.com
Thu Mar 3 12:57:42 CET 2016
Dear Dennis,
Thank you very much for your detailed reply. It really helped me understand the problem.
Tanvir Ahamed
Göteborg, Sweden | mashranga at yahoo.com
________________________________
From: Dennis Murphy <djmuser at gmail.com>
Sent: Thursday, 3 March 2016, 4:38
Subject: Re: [R] Extract row from a dataframe by row names in another vector and factor. Need explanation
Welcome to the wonderful world of factors. In your second case, v2,
the vector is character, so R matches the input character string to
the lookup table of row names. OTOH, v1 is a factor - it behaves
differently when used for subsetting, and this example illustrates why
you shouldn't use factors for this purpose. Let's look at it:
> v1
[1] f g h i j
Levels: f g h i j
> str(v1)
Factor w/ 5 levels "f","g","h","i",..: 1 2 3 4 5
> levels(v1)
[1] "f" "g" "h" "i" "j"
> as.integer(v1)
[1] 1 2 3 4 5
> str(levels(v1))
chr [1:5] "f" "g" "h" "i" "j"
When you use v1 to subset rows, R does not look up the factor's
labels. Indexing by a factor is equivalent to indexing by its
underlying integer codes (see ?Extract), not by the character values
that are printed. This is why res1 selected the first five
observations. These alternatives do what you want:
dat[levels(v1), ] # works here because each level occurs once, in level order
dat[as.character(v1), ] # behaves like v2 (a character vector)
# Another approach: define a factor whose levels are the row names of
# dat, so that its integer codes coincide with the row positions:
x <- as.character(dat1[, "BB"])
v3 <- factor(x, levels = rownames(dat))
dat[v3, ]
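A minimal sketch that reproduces the difference, assuming the data from the original post. stringsAsFactors = TRUE is passed explicitly here because the data.frame() default changed to FALSE in R 4.0.0; on the R version used in this thread it was the default.

```r
# Data from the original post
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]
colnames(dat) <- paste0("SA", 1:5)

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default behaviour
dat1 <- data.frame(matrix(letters[1:20], ncol = 4), stringsAsFactors = TRUE)
colnames(dat1) <- c("AA", "BB", "CC", "DD")

v1 <- dat1[, "BB"]        # factor: levels f..j, integer codes 1:5
v2 <- as.character(v1)    # character vector "f".."j"

res1 <- dat[v1, ]         # indexed by the integer codes -> rows a..e
res2 <- dat[v2, ]         # matched against the row names -> rows f..j

rownames(res1)
rownames(res2)
```

The only difference between the two calls is the type of the index, which is why converting with as.character() before subsetting is the safest habit.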
There are a couple of alternative avenues you could have chosen (e.g.,
match() or which()), but they are overkill for this simple case.
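For completeness, a rough sketch of what the match() and which() routes might look like. The object names keys, idx_match and idx_which are made up for illustration and do not appear in the original post.

```r
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]
colnames(dat) <- paste0("SA", 1:5)

# Hypothetical lookup keys stored as a factor
keys <- factor(c("f", "g", "h", "i", "j"))

# match() gives the position of each key among the row names,
# preserving the order (and any repeats) of the keys
idx_match <- match(as.character(keys), rownames(dat))

# which() with %in% gives the positions of the rows whose names
# occur among the keys, in row order
idx_which <- which(rownames(dat) %in% as.character(keys))

dat[idx_match, ]
dat[idx_which, ]
```

Both produce integer row positions, so the subsetting step no longer depends on whether the keys were stored as a factor or as character.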
Your real problem was converting a character matrix to a data frame in
the first place - this converted all of the columns to factors with
different sets of levels:
str(dat1)
This illustrates one of the important differences between data frames
and matrices. In a matrix, every element must be of the same class.
Specifically, a matrix is an atomic vector with a 'dim' attribute. In
contrast, each _column_ of a data frame must have elements of the same
class, but they do not have to be the same class from one column to
the next.
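A small sketch of that difference, using made-up objects (m, df, mixed) that are not part of the original post:

```r
# A matrix has one type for all of its elements and a 'dim' attribute
m <- matrix(letters[1:20], ncol = 4)
typeof(m)
attributes(m)

# A data frame can hold a different class in each column
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 flag = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
col_classes <- sapply(df, class)
col_classes

# Combining numbers and strings in a matrix coerces everything
# to a single common type
mixed <- cbind(1:3, c("a", "b", "c"))
typeof(mixed)
```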
One way to have avoided the conversion to factor would have been to
use the argument stringsAsFactors = FALSE in the data.frame() call -
by default, it is TRUE. More importantly, the conversion to data frame
for dat1 was unnecessary - observe:
> dat1<-matrix(letters[1:20],ncol=4)
> colnames(dat1)<-c("AA","BB","CC","DD")
> dat[dat1[, "BB"], ]
  SA1 SA2 SA3 SA4 SA5
f   6  16  26  36  46
g   7  17  27  37  47
h   8  18  28  38  48
i   9  19  29  39  49
j  10  20  30  40  50
For the same reason, it was unnecessary to convert dat to a data
frame. Let's look at a matrix version instead:
dat2 <- matrix(seq(50), nrow = 10)
rownames(dat2) <- letters[1:10]
colnames(dat2) <- paste0("SA", 1:5)
dat2[dat1[, "BB"], ] # desired result
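As a rough check, the stringsAsFactors = FALSE route and the all-matrix route can be compared side by side. This is only a sketch under the assumptions of the original example; dat1_df, res_df and res_mat are names introduced here for illustration.

```r
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]
colnames(dat) <- paste0("SA", 1:5)

# Route 1: keep the lookup table as a data frame with character columns
dat1_df <- data.frame(matrix(letters[1:20], ncol = 4),
                      stringsAsFactors = FALSE)
colnames(dat1_df) <- c("AA", "BB", "CC", "DD")
res_df <- dat[dat1_df[, "BB"], ]

# Route 2: matrices throughout
dat1 <- matrix(letters[1:20], ncol = 4)
colnames(dat1) <- c("AA", "BB", "CC", "DD")
dat2 <- matrix(seq(50), nrow = 10)
rownames(dat2) <- letters[1:10]
colnames(dat2) <- paste0("SA", 1:5)
res_mat <- dat2[dat1[, "BB"], ]

# Both routes select rows f..j with the same values
all.equal(as.matrix(res_df), res_mat)
```

In both routes the index is a character vector, so the rows are matched by name rather than by position.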
Hint: You might want to spend some time to carefully learn the
different major data types in R and the various modes of indexing. In
general, it is not a good default practice to convert matrices to data
frames.
Dennis
On Wed, Mar 2, 2016 at 6:05 PM, Mohammad Tanvir Ahamed via R-help
<r-help at r-project.org> wrote:
> Hi, here I have written an example to explain my problem.
> ## Data Generation
> dat<-data.frame(matrix(1:50,ncol=5))
> rownames(dat)<-letters[1:10]
> colnames(dat)<- c("SA1","SA2","SA3","SA4","SA5")
>
> dat1<-data.frame(matrix(letters[1:20],ncol=4))
> colnames(dat1)<-c("AA","BB","CC","DD")
>
> ## Row names
> v1<-dat1[,"BB"] # Factor
> v2<-as.vector(dat1[,"BB"]) # Vector
>
> is(v1) # Factor
> is(v2) # Vector
>
> # Result
> res1<-dat[v1,]
> res2<-dat[v2,]
> ##########################################################
> I assumed res1 and res2 would be the same, but they are not. Can anybody please explain why?
>
>
> Tanvir Ahamed