[R] 2 matrix scatter x [a lot]

Dennis Murphy djmuser at gmail.com
Tue Aug 16 00:53:56 CEST 2011


Hi:

Here's one way: reshape both data sets to long form and draw a
conditioning plot with one panel per column. A reproducible example follows.

# Method 1: the variable names are the same in each data frame
# Set a seed (the value is arbitrary) so the random example is repeatable
set.seed(2011)
# Create two separate data frames
ds1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ds2 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# Melt the data to create a factor whose levels are the variable
# names and a variable 'value' to contain the corresponding values
library('reshape')
dm1 <- melt(ds1)
# Since the values are different in each data frame, change the
# name of the value variable in each
names(dm1)[2] <- 'val1'
dm2 <- melt(ds2)
names(dm2)[2] <- 'val2'
# Since I know the two melted data frames have the same
# dimensions, I can cbind the value variable of the second
# to the first
dm <- cbind(dm1, val2 = dm2[['val2']])

# Conditioning plots:
library('lattice')
library('ggplot2')

# lattice version
xyplot(val2 ~ val1 | variable, data = dm)
# ggplot2 version
ggplot(dm, aes(x = val1, y = val2)) + geom_point() +
   facet_wrap( ~ variable)

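The long-format data frame in Method 1 can also be built without the reshape package. The following is only a sketch using base R's stack(), with a check that the two stacked frames line up row for row before they are combined. The seed and the names s1 and s2 are illustrative assumptions, not part of the example above.

```r
# Sketch: base-R alternative to melt() for Method 1 (no extra packages).
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
ds1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ds2 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# stack() returns a data frame with columns 'values' and 'ind',
# where 'ind' is a factor holding the original column names
s1 <- stack(ds1)
s2 <- stack(ds2)

# Combining by position is only safe if the labels agree row for row
stopifnot(identical(s1$ind, s2$ind))

dm <- data.frame(variable = s1$ind, val1 = s1$values, val2 = s2$values)
str(dm)
```

The resulting dm has the same column names as in Method 1, so it can be passed to the same xyplot() and ggplot() calls.
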
# Method 2: Variable names are different
ds1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ds2 <- data.frame(y1 = rnorm(10), y2 = rnorm(10), y3 = rnorm(10))

dm1 <- melt(ds1)
names(dm1)[2] <- 'val1'
dm2 <- melt(ds2)
names(dm2)[2] <- 'val2'
dm <- cbind(dm1, val2 = dm2[['val2']])
# Change the level labels of variable to represent the
# column numbers instead:
dm$Variable <- factor(dm$variable,
                 labels = seq_len(nlevels(dm$variable)))

xyplot(val2 ~ val1 | Variable, data = dm, xlab = 'x', ylab = 'y')
ggplot(dm, aes(x = val1, y = val2)) + geom_point() +
   facet_wrap( ~ Variable) + labs(x = 'x', y = 'y')

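The relabelling step in Method 2 relies on factor() assigning the new labels in level order. A small sketch of that behaviour, where the vector v is a made-up stand-in for dm$variable:

```r
# Sketch: how factor(..., labels = ) renames levels in level order.
v <- factor(rep(c("x1", "x2", "x3"), each = 2))  # stands in for dm$variable
Variable <- factor(v, labels = seq_len(nlevels(v)))

# Cross-tabulate old against new labels to confirm the mapping
table(old = v, new = Variable)
```

Pairing by position also assumes that column i of the first data frame belongs with column i of the second, which is worth checking on real data.
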
You've probably got something more complicated than this in terms of
variable names, but the outline above should be enough to get you
started.

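If a single overlaid plot is wanted rather than one panel per column, the same long-format idea can be applied directly to the two matrices from the question. This is a rough sketch under the assumption that x and y have identical dimensions; the data frame name long and its column names are made up for illustration.

```r
# Sketch: overlay every column pair of two matrices in one plot, no loop.
# Sample data from the original question
x <- matrix(1:4, 2, 2)
y <- matrix(4:1, 2, 2)
y[2, 2] <- NA
y[1, 1] <- NA
stopifnot(identical(dim(x), dim(y)))

# Matrices are stored column by column, so as.vector() keeps the pairing
# x[i, j] <-> y[i, j]; col() records which column each value came from
long <- data.frame(column = as.vector(col(x)),
                   xval   = as.vector(x),
                   yval   = as.vector(y))

# Drop pairs where either value is NA (all-NA columns disappear entirely)
long <- long[complete.cases(long), ]

# One plot call draws all remaining points on shared axes
plot(long$xval, long$yval, xlab = "x", ylab = "y")
```

The column variable is kept so the points can still be coloured or conditioned by column if needed. The axis limits here come from the plotted points only; pass xlim and ylim to plot() to fix them in advance.
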
HTH,
Dennis

On Mon, Aug 15, 2011 at 3:13 PM, Ben qant <ccquant at gmail.com> wrote:
> Hello,
>
> I'm pretty new to R. Basically, how do I speed up the for loop below? Or,
> better yet, get rid of the for loop altogether?
>
> objective: plot two data sets column against column by index. These data
> sets have a lot of NAs. Some columns are all NAs. I need the plots to overlay.
> I don't like the plots in matplot(). Needs to be much faster than the code
> below...
>
> #simple sample data.. my data sets have 61 rows and over 11k columns each.
> x = matrix(1:4,2,2)
> y = matrix(4:1,2,2)
> y[2,2] = NA
> y[1,1] = NA
>
> #calc'd here to save time on plotting
> xlim.v = c(min(x, na.rm = TRUE),max(x,na.rm = TRUE))
> ylim.v = c(min(y, na.rm = TRUE),max(y,na.rm = TRUE))
>
> for(i in 1:ncol(x)){
>  xy = na.omit(cbind(x[,i],y[,i]))
>  # skip columns where every pair contains an NA (nrow(xy) is then 0)
>  if(nrow(xy) > 0){
>    plot(xy[,1],xy[,2],xlim = xlim.v,ylim= ylim.v); par(new=TRUE);
>  }
> }
>
> Thanks!