[R] Errors melt()ing data...

Neil Shephard nshephard at gmail.com
Thu Feb 28 12:42:14 CET 2008


Hi,

I'm trying to melt() some data for subsequent cast()ing and am
encountering errors.

The overall process requires a couple of cast()s and melt()s.

########Start Session 1##########
## I have the data in a (fully) melted format and can cast it fine...
> norm1[1:10,]
   Pool       SNP Sample.Name variable       value
1     1 rs1045485      CA0092 Height.1 0.003488853
2     1 rs1045485      CA0142 Height.2 0.333274200
3     1 rs1045485      CO0007 Height.2 0.396250961
4     1 rs1045485      CA0047 Height.2 0.535686831
5     1 rs1045485      CO0149 Height.2 0.296611673
6     1 rs1045485      CA0106 Height.2 0.786115546
7     1 rs1045485      CO0191 Height.1 0.669268523
8     1 rs1045485      CA0097 Height.2 0.609603217
9     1 rs1045485      CA0076 Height.1 0.004257584
10    1 rs1045485      CO0017 Height.2 0.589261427
## This gets the data
> t.norm1    <- cast(norm1, Sample.Name + SNP + Pool ~ variable, sum)
> t.norm1[1:10,]
   Sample.Name        SNP Pool    Height.1  Height.2
1       CA0001  rs1045485    1 0.003311454 0.4789782
2       CA0001  rs1045487    1 0.001818583 0.5089827
3       CA0001 rs11212570    1 0.006078444 0.4496129
4       CA0001 rs13010627    1 0.008753049 0.5424499
5       CA0001    rs13113    1 0.186821486 0.2294912
6       CA0001 rs13402616    1 0.012030235 0.4161610
7       CA0001   rs170548    1 0.002425579 0.3111907
8       CA0001 rs17503908    1 0.002179705 0.3063292
9       CA0001  rs1799794    1 0.003632984 0.5049848
10      CA0001  rs1799796    1 0.389774160 0.0000000
## I now melt it and cast again to the desired format
> t          <- melt(t.norm1, id = c("Sample.Name", "SNP"))
> cast.height.norm1 <- cast(t, SNP ~ Sample.Name + variable, sum)
> cast.height.norm1[1:10,1:5]
          SNP CA0001_Height.1 CA0001_Height.2 CA0002_Height.1 CA0002_Height.2
1   rs1045485     0.003311454       0.4789782     0.401218142     0.343031163
2   rs1045487     0.001818583       0.5089827     0.007329439     0.453102612
3  rs11212570     0.006078444       0.4496129     0.015164118     0.434320814
4  rs13010627     0.008753049       0.5424499     0.013440474     0.463863778
5     rs13113     0.186821486       0.2294912     0.224865477     0.272916077
6  rs13402616     0.012030235       0.4161610     0.191099755     0.285744704
7    rs170548     0.002425579       0.3111907     0.365986770     0.240187431
8  rs17503908     0.002179705       0.3063292     0.011100347     0.232259627
9   rs1799794     0.003632984       0.5049848     0.430635350     0.008364312
10  rs1799796     0.389774160       0.0000000     0.173564141     0.235928006
########Finish Session 1##########

This is the format that I'm aiming for and everything has worked fine.
However, I wish to derive two transformed variables (polar.1 and
polar.2) based on each row of t.norm1 and then melt() and cast() the
data into the same desired format.
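
For reference, the transformation described above can be tried on a couple of rows in isolation. This is only a sketch: toy is a hypothetical stand-in holding two rows copied from the t.norm1 output shown earlier, and atan2() is included purely as a possible alternative, on the assumption that a zero Height.1 may occur.

```r
# Hypothetical stand-in: two rows copied from the t.norm1 print-out
toy <- data.frame(Height.1 = c(0.003311454, 0.389774160),
                  Height.2 = c(0.4789782,   0.0000000))

# Radius on a log10 scale and angle in radians, as described above
toy$polar.1 <- log10(sqrt(toy$Height.1^2 + toy$Height.2^2))
toy$polar.2 <- atan(toy$Height.2 / toy$Height.1)

# atan2() gives the same angle when Height.1 > 0 and avoids the
# division, so a row with both heights at zero does not produce NaN
toy$polar.2.alt <- atan2(toy$Height.2, toy$Height.1)
```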

########Start Session 2##########
## Now generate polar co-ordinates
t.norm1$polar.1 <- log10(sqrt(t.norm1$Height.1^2 + t.norm1$Height.2^2))
t.norm1$polar.2 <- atan((t.norm1$Height.2 / t.norm1$Height.1))
## And melt the polar data
> t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")), id=c("Sample.Name", "SNP"))
Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
  missing value where TRUE/FALSE needed
> traceback()
4: melt_check(data, id.var, measure.var)
3: melt.data.frame(as.data.frame(data), id = attr(data, "idvars"))
2: melt.cast_df(subset(t.norm1, select = c("Sample.Name", "SNP",
       "Pool", "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))
1: melt(subset(t.norm1, select = c("Sample.Name", "SNP", "Pool",
       "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))
########Finish Session 2##########
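
The condition quoted in the error message above can be exercised on its own in base R. This is only a sketch: check_id() is a hypothetical stand-in, not the reshape source, and passing NULL assumes (going by frame 3 of the traceback) that attr(data, "idvars") might come back empty.

```r
# Hypothetical stand-in wrapping the condition from the error message;
# this is not the actual melt_check() code
check_id <- function(id.var, varnames) {
  if (!missing(id.var) && !(id.var %in% varnames)) {
    stop("id variable not found in data")
  }
  "ok"
}

varnames <- c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")

# A single valid name passes the check
res.named <- check_id("SNP", varnames)

# NULL makes %in% return logical(0), so the condition is no longer
# TRUE or FALSE and if() signals an error; capture its message
res.null <- tryCatch(check_id(NULL, varnames),
                     error = function(e) conditionMessage(e))
```

If res.null carries the same "missing value where TRUE/FALSE needed" message, that would point at the id reaching the check rather than at the column names.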

As far as I can tell the error is occurring within melt_check(), where
there is a check on whether id.var has been supplied and whether it
exists within the data frame's names. Both should hold, since id.var is
given explicitly and the subset() call works fine on its own...

########Start Session 3##########
> test <- subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2"))
> names(test)
[1] "Sample.Name" "SNP"         "Pool"        "polar.1"     "polar.2"
########Finish Session 3##########
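
names() only shows the column names. Two further things that could be inspected are the class and any extra attributes of the subset. The sketch below uses a hand-built stand-in rather than real cast() output: the class "cast_df" and the attribute "idvars" are taken from the traceback, and everything else (the toy values, the object names) is an assumption.

```r
# Hypothetical stand-in for a cast() result: an ordinary data frame
# tagged by hand with the class and attribute named in the traceback
toy <- data.frame(Sample.Name = c("CA0001", "CA0002"),
                  SNP         = c("rs1045485", "rs1045485"),
                  polar.1     = c(-0.32, -0.28))
attr(toy, "idvars") <- c("Sample.Name", "SNP")
class(toy) <- c("cast_df", "data.frame")

sub <- subset(toy, select = c("Sample.Name", "SNP", "polar.1"))

class(sub)            # is the extra class still there?
attr(sub, "idvars")   # did the attribute survive subset()?
```

If the class is kept but the attribute is gone, that might explain why melt() dispatches to melt.cast_df() yet has no usable id to pass on; running the same two checks on the real test object would confirm or rule this out.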

What I find particularly strange is that there isn't really any
difference between
########Session 1
> t          <- melt(t.norm1, id = c("Sample.Name", "SNP"))

....and
########Session 2
t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool",
"polar.1", "polar.2")), id=c("Sample.Name", "SNP"))

...since I've done nothing to alter the "Sample.Name" and "SNP"
columns. All that's changing is the names of the two columns that are
the measure.var, which in this instance is everything that's not defined
as being an id.var in the call to melt().
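
The rule described above (measure variables are whatever is not an id variable) can be written out in base R for the two sets of column names. This is a sketch of that rule as stated here, not of what melt() does internally; the object names are made up for illustration.

```r
# Column names as printed in Session 1 and Session 3
names.s1 <- c("Sample.Name", "SNP", "Pool", "Height.1", "Height.2")
names.s2 <- c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")
id <- c("Sample.Name", "SNP")

# Assumed rule: measure variables are everything that is not an id
measure.s1 <- setdiff(names.s1, id)
measure.s2 <- setdiff(names.s2, id)
```

Under that rule Pool lands on the measure side in both calls, and the only difference between the two is Height.1/Height.2 versus polar.1/polar.2.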

If anyone can provide any insight to what I'm doing wrong I'd be very grateful.

Thanks,

Neil
--
Email - nshephard at gmail.com / n.shephard at sheffield.ac.uk


