[R] plot question

hadley wickham h.wickham at gmail.com
Tue Oct 2 21:38:26 CEST 2007


On 10/2/07, Tiandao Li <Tiandao.Li at usm.edu> wrote:
> Hello,
>
> I have a question about how to plot a series of data. The following is my
> data matrix of n
> > n
>              25p    5p  2.5p 0.5p
> 16B-E06.g 45379  4383  5123   45
> 16B-E06.g 45138  4028  6249   52
> 16B-E06.g 48457  4267  5470   54
> 16B-E06.g 47740  4676  6769   48
> 37B-B02.g 42860  6152 19276   72
> 35B-A02.g 48325 12863 38274  143
> 35B-A02.g 48410 12806 39013  175
> 35B-A02.g 48417  9057 40923  176
> 35B-A02.g 51403 13865 43338  161
> 45B-C12.g 50939  3656  5783   43
> 45B-C12.g 52356  5524  6041   55
> 45B-C12.g 49338  5141  5266   41
> 45B-C12.g 51567  3915  5677   43
> 35A-G04.g 40365  5513  6971   32
> 35B-D01.g 54217 12607 13067   93
> 35B-D01.g 55283 11441 14964  101
> 35B-D01.g 55041  9626 14928   94
> 35B-D01.g 54058  9465 14912   88
> 35B-A04.g 42745 12080 34271  105
> 35B-A04.g 41055 12423 34874  126
>
> colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
> Intensity. I want to plot the data this way.
> x-axis is colnames(n) in the order of 0.5p, 2.5p, 5p, and 25p.
> y-axis is Intensity
> Inside the plot are the intensity points over the 4 concentrations; points
> from different genes have different colors or shapes. A regression line for
> each gene crosses the different concentrations, and at the end of each line
> is the gene ID.

I might do it something like this:

df <- structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L,
3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c("16B-E06.g",
"35A-G04.g", "35B-A02.g", "35B-A04.g", "35B-D01.g", "37B-B02.g",
"45B-C12.g"), class = "factor"), X25p = c(45379L, 45138L, 48457L,
47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L,
49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L,
41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L,
9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L,
9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L,
6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L,
5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L,
34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L,
161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L
)), .Names = c("gene", "X25p", "X5p", "X2.5p", "X0.5p"),
class = "data.frame", row.names = c(NA, -20L))

library(reshape)
library(ggplot2)

dfm <- melt(df, id=1)
names(dfm) <- c("gene", "conc", "intensity")
dfm$conc <- as.numeric(gsub("[Xp]", "", as.character(dfm$conc)))
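If the reshape package is not available, the same long-format data frame can be built with base R directly from the matrix in the question. This is only a sketch: the matrix n is retyped here from the printed values, and the helper names vals and genes are mine, not part of the original post.

```r
# Intensities retyped from the question, one row per line,
# columns in the printed order: 25p, 5p, 2.5p, 0.5p.
vals <- c(
  45379,  4383,  5123,  45,
  45138,  4028,  6249,  52,
  48457,  4267,  5470,  54,
  47740,  4676,  6769,  48,
  42860,  6152, 19276,  72,
  48325, 12863, 38274, 143,
  48410, 12806, 39013, 175,
  48417,  9057, 40923, 176,
  51403, 13865, 43338, 161,
  50939,  3656,  5783,  43,
  52356,  5524,  6041,  55,
  49338,  5141,  5266,  41,
  51567,  3915,  5677,  43,
  40365,  5513,  6971,  32,
  54217, 12607, 13067,  93,
  55283, 11441, 14964, 101,
  55041,  9626, 14928,  94,
  54058,  9465, 14912,  88,
  42745, 12080, 34271, 105,
  41055, 12423, 34874, 126
)
genes <- rep(
  c("16B-E06.g", "37B-B02.g", "35B-A02.g", "45B-C12.g",
    "35A-G04.g", "35B-D01.g", "35B-A04.g"),
  times = c(4, 1, 4, 4, 1, 4, 2)
)
n <- matrix(vals, ncol = 4, byrow = TRUE,
            dimnames = list(genes, c("25p", "5p", "2.5p", "0.5p")))

# Strip the trailing "p" to get numeric concentrations: 25, 5, 2.5, 0.5.
conc <- as.numeric(sub("p$", "", colnames(n)))

# One row per matrix cell; row() and col() give each cell's indices
# in the same column-major order that as.vector() uses.
dfm <- data.frame(
  gene      = factor(rownames(n)[as.vector(row(n))]),
  conc      = conc[as.vector(col(n))],
  intensity = as.vector(n)
)
```

The result has the same three columns (gene, conc, intensity) as the melted data frame, so the plotting calls work on it unchanged.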

qplot(conc, intensity, data=dfm, colour=gene, log="xy") + geom_smooth(method=lm)
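Recent ggplot2 releases deprecate qplot(), so here is a rough equivalent of the call above written with ggplot() and explicit log scales. It is a sketch, not a drop-in replacement: to keep it short it uses only two of the genes from the question, and the object names p and fits are mine. Because ggplot2 applies scale transformations before computing statistics, the smoother is fit to log10(intensity) against log10(conc); the lm() calls at the end show the corresponding per-gene fits.

```r
library(ggplot2)

# Two genes taken from the question's data, already in long format.
dfm <- data.frame(
  gene = rep(c("16B-E06.g", "35B-A02.g"), each = 16),
  conc = rep(rep(c(25, 5, 2.5, 0.5), each = 4), times = 2),
  intensity = c(
    45379, 45138, 48457, 47740,   4383,  4028,  4267,  4676,
     5123,  6249,  5470,  6769,     45,    52,    54,    48,
    48325, 48410, 48417, 51403,  12863, 12806,  9057, 13865,
    38274, 39013, 40923, 43338,    143,   175,   176,   161
  )
)

# Points plus one straight-line fit per gene, both axes on a log10 scale.
p <- ggplot(dfm, aes(conc, intensity, colour = gene)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10() +
  scale_y_log10()
# Call print(p), or type p at the console, to draw the plot.

# The lines drawn by geom_smooth() correspond to these per-gene models
# on the transformed values.
fits <- lapply(split(dfm, dfm$gene), function(d) {
  lm(log10(intensity) ~ log10(conc), data = d)
})
sapply(fits, coef)
```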

Note that I've converted the concentrations to numeric values and
plotted them on a log scale.  If you want to treat concentration as a
factor, then you'll need the following code:

dfm$conc <- factor(dfm$conc)
qplot(conc, intensity, data=dfm, colour=gene, group=gene, log="y") +
  geom_smooth(method=lm, xseq=levels(dfm$conc))

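But in that case, fitting a linear model seems a bit dubious, since the
factor levels are treated as equally spaced and the actual
concentration values are ignored.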

Note that you can also use this format of data with lattice:

library(lattice)
xyplot(intensity ~ conc, data=dfm, type=c("p","r"), groups=gene, auto.key=TRUE)
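To get log axes in lattice as well, the scales argument can be added to the xyplot call. The sketch below again uses only two genes from the question to stay short, and the object name p is mine. With log scales set this way, lattice transforms the data before the panel function runs, so the type "r" regression lines are fit on the log10 values.

```r
library(lattice)

# Two genes taken from the question's data, already in long format.
dfm <- data.frame(
  gene = factor(rep(c("16B-E06.g", "35B-A02.g"), each = 16)),
  conc = rep(rep(c(25, 5, 2.5, 0.5), each = 4), times = 2),
  intensity = c(
    45379, 45138, 48457, 47740,   4383,  4028,  4267,  4676,
     5123,  6249,  5470,  6769,     45,    52,    54,    48,
    48325, 48410, 48417, 51403,  12863, 12806,  9057, 13865,
    38274, 39013, 40923, 43338,    143,   175,   176,   161
  )
)

# Points ("p") and a regression line ("r") per gene, log10 on both axes.
p <- xyplot(intensity ~ conc, data = dfm, groups = gene,
            type = c("p", "r"), auto.key = TRUE,
            scales = list(log = 10))
# Call print(p), or type p at the console, to draw the plot.
```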

Hadley

-- 
http://had.co.nz/


