[R] new user question on dataframe comparisons and plots

Stephen Tucker brown_emu at yahoo.com
Thu Aug 2 07:55:14 CEST 2007


Hi Conor,

I hope I interpreted your question correctly. I think for the first one you
are looking for a conditioning plot? I am going to create and use some
nonsensical data - 'iris' comes with R so this should be reproducible on your
machine:

library(lattice)
data(iris)
x <- iris
# make some factors using cut(): bin the two sepal columns into 3 intervals each
x[,1:2] <- lapply(x[,1:2],cut,3)
# add column of TRUE FALSE
x <- cbind(x,TF=sample(c(TRUE,FALSE),nrow(x),replace=TRUE))
xyplot(Petal.Width~Petal.Length | ## these are numeric
       Sepal.Width*Sepal.Length,  ## these are factors
       groups=TF,            ## TRUE or FALSE
       panel=function(x,y,...) {
         panel.xyplot(x,y,...)
         panel.loess(x,y,...)
       },
       data=x,auto.key=TRUE)
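
If what you want is closer to the pairs() grid you mentioned, with the points colored by the logical column, something along these lines might do. This is only a sketch on the same made-up data: 'dat', 'TF' and the two colors are my own choices, not anything from your data. pairs() converts factor columns to numeric codes the way data.matrix() does, so it should also accept your factor columns.

```r
library(lattice)

# Made-up logical response on top of iris (TF is an assumption for illustration)
set.seed(1)
dat <- iris
dat$TF <- sample(c(TRUE, FALSE), nrow(dat), replace = TRUE)

# One color per row: blue where TF is TRUE, red where it is FALSE
cols <- ifelse(dat$TF, "blue", "red")

# Base graphics: all pairwise scatterplots of the four measurement columns
pairs(dat[, 1:4], col = cols, pch = 19)

# Lattice equivalent: a scatterplot matrix grouped by TF, with a legend
print(splom(~dat[, 1:4], groups = TF, data = dat, auto.key = TRUE))
```

The usual names for this kind of display are "scatterplot matrix" or "pairs plot".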


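For the second question, merge() should work even when the two count tables
do not contain the same factor levels, provided you specify all=TRUE. Levels
that are missing from one table then show up as NA in that table's column.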

## get counts for TRUE and FALSE
> y <- tapply(x$Species,INDEX=x$TF,
+            function(x) as.data.frame(table(x)))
## merge results
> (z <- `names<-`(merge(y$`TRUE`,y$`FALSE`,by="x",all=TRUE),
+           c("factor","true","false")))
      factor true false
1 versicolor   29    21
2  virginica   23    27
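
To show what all=TRUE does when the levels differ, here is a small sketch that uses only the counts you listed in your message (the rows you elided with "..." are left out). The suffixes and the 'ratio' column are names I made up for illustration.

```r
# Counts copied from the question; only the rows that were shown
pos <- data.frame(CAT = c("AAAA", "BASD", "ZAQM"),
                  Freq = c(10, 0, 4))
neg <- data.frame(CAT = c("AAAA", "BASD", "ZAQM", "PPWS"),
                  Freq = c(1000, 3, 9, 10))

# all = TRUE keeps levels that appear in only one of the two tables
both <- merge(pos, neg, by = "CAT", all = TRUE,
              suffixes = c(".true", ".false"))

# A level absent from 'pos' comes back as NA; treat it as a count of zero
both$Freq.true[is.na(both$Freq.true)] <- 0

# TRUE counts divided by FALSE counts, as described in the question
both$ratio <- both$Freq.true / both$Freq.false
both
```

PPWS appears only in 'neg', so it ends up with a TRUE count of 0 and a ratio of 0. A level with a FALSE count of 0 would give Inf or NaN, which you may want to handle separately.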

## reshape the data frame
> library(reshape)
> melt(z,id=1)
      factor variable value
1 versicolor     true    29
2  virginica     true    23
3 versicolor    false    21
4  virginica    false    27
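
Since what you are after in the end is the TRUE count relative to the FALSE count for each level, you might also skip the split-and-merge step and cross-tabulate the unsplit data frame. Again only a sketch on made-up data: 'dat', 'TF', 'ratio' and 'rate' are names I chose for illustration.

```r
# Made-up logical response on top of iris (TF is an assumption for illustration)
set.seed(1)
dat <- iris
dat$TF <- sample(c(TRUE, FALSE), nrow(dat), replace = TRUE)

# Two-way table: one row per factor level, columns "FALSE" and "TRUE".
# Fixing the levels guarantees both columns exist even if one value never occurs.
tab <- table(dat$Species, factor(dat$TF, levels = c(FALSE, TRUE)))

res <- data.frame(factor = rownames(tab),
                  true   = as.vector(tab[, "TRUE"]),
                  false  = as.vector(tab[, "FALSE"]))

# TRUE/FALSE ratio as described in the question, plus the share of TRUEs
res$ratio <- res$true / res$false
res$rate  <- res$true / (res$true + res$false)
res
```

Every level of the factor gets a row here, including levels with no TRUEs at all, so nothing has to be filled in afterwards.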

Hope this helps. If it doesn't you can post a small (reproducible) piece of
data and we can maybe help you out a little better...

Best regards,

ST


--- Conor Robinson <conor.robinson at gmail.com> wrote:

> I'm coming from the scipy community and have been using R on and off for
> the past week or so.  I'm still feeling out the language structure,
> but so far so good.  I apologize in advance if I pose any obvious
> questions, due to my current lack of diction when searching for my
> issue, or recognizing it if I did see it.
> 
> Question 1, plots:
> 
> I have a data frame with 4 type factor columns, also in the data frame
> I have one single, type logical column with the response data (T or
> F).  I would like to plot a 4*4 grid showing all the two way attribute
> interactions like with plot(data.frame) or pairs(data.frame,
> panel=panel.smooth), however show the response's True and False as
> different colors, or any other built in graphical analysis that might
> be relevant in this case.  I'm sure this is simple since this is a
> common procedure, thanks in advance for humoring me.  Also, what is
> the correct term for this type of plot?
> 
> 
> Question 2, data frame analysis:
> 
> I have two sub data frames split by whether my logical column is T or
> F.  I want to compare the same factor column between both of the two
> sub data frames (there are a few hundred different unique possibles
> for this factor column eg AAAA - ZZZZ enumerated).  I've used table()
> on the attribute columns from each sub frame to get counts.
> 
> pos <- data.frame(table(df.true$CAT))
> 
> AAAA  10
> BASD  0
> ZAQM 4
> ...
> 
> neg <- data.frame(table(df.false$CAT))
> 
> AAAA 1000
> BASD  3
> ZAQM  9
> PPWS 10
> ...
> 
> The TRUE sub frame has fewer unique factors than the sub frame FALSE, I
> would like an output data frame that is one column all the factors
> from the TRUE sub frame and the second column the counts from the TRUE
> attributes / counts from the corresponding FALSE attributes ie
> %response for each represented factor.  It's fine (better even) if all
> factors are included and there is just a zero for the attributes with
> no TRUEs.
> 
> I've been going off making my own function and running into trouble
> with the data frame not being a vector etc etc, but I have a feeling
> there is a *much* better way ie built in function, but I've hit my
> current level of R understanding.
> 
> Thank you,
> Conor
> 


