[R] suggestion for optimal plotting to show significant differences

PIKAL Petr petr.pikal at precheza.cz
Fri Feb 13 14:35:55 CET 2015


Hi Rich

As usual in R one can come to same result by different means. Using lme from nlme

fit<- lme(value ~ day, data=test, random= ~1|item)
ranef(fit)
  (Intercept)
a  -4.8794330
b  -3.5063635
c  -1.7121614
d  -0.2002288
e   0.5118846
f   1.7805122
g   3.4489819
h   4.5568079
fixef(fit)
(Intercept)         day
6.933644464 0.009715795

which is close to the way how "value" was constructed (day coefficient shall be 1)

> mean(apply(ranef(fit), 2,diff))
[1] 1.348034

But AFAIK lme or nlme is applicable only if I can find some suitable model function.

I therefore like your suggestion to treat day as a factor which, if I understand it well, enables me to look at item differences each day and compare if they are or are not important.

Thanks for your insight
Best regards

Petr


> -----Original Message-----
> From: Richard M. Heiberger [mailto:rmh at temple.edu]
> Sent: Thursday, February 12, 2015 9:27 PM
> To: PIKAL Petr
> Cc: r-help at r-project.org
> Subject: Re: [R] suggestion for optimal plotting to show significant
> differences
>
> ## The next step is to think of this in the analysis of covariance
> setting.
>
> ## this example is based on
>
> library(HH)
>
> demo("ancova", package="HH", ask=FALSE)
>
> item<-rep(letters[1:8], each=18)
> day<-rep((0:5)*100, 24)
> set<-rep(rep(1:3, each=6), 8)
> test<-data.frame(item, day, set)
> set.seed(111)
> test$value<-(test$day/100+1)+rnorm(144)
> test$value<-test$value+(as.numeric(test$item)*1.3)
> test$dayf <- factor(test$day)
>
>
> Fitalphabeta <- aov(value ~ day, data=test)
> AB <- xyplot(value ~ day | item, data=test, layout=c(8,1),
> scales=list(alternating=1),
>              main="regression, ignoring item\ncommon slope, common
> intercept") +
>                layer(panel.abline(coef(Fitalphabeta)))
> AB
>
> Fitalphabeta0 <- aov(value ~ 0 + item, data=test)
> AB0 <- xyplot(value ~ day | item, data=test, layout=c(8,1),
> scales=list(alternating=1),
>               main="anova, ignoring day\nzero slope, variable
> intercept") +
>
> layer(panel.abline(a=coef(Fitalphabeta0)[panel.number()], b=0))
> AB0
>
> Fitalphaibeta <- aov(value ~ 0 + item + day, data=test)
> AiB <- xyplot(value ~ day | item, data=test, layout=c(8,1),
> scales=list(alternating=1),
>               main="ancova\ncommon slope, variable intercept") +
>
> layer(panel.abline(a=coef(Fitalphaibeta)[panel.number()],
> b=rev(coef(Fitalphaibeta))[1]))
> AiB
>
> Fitalphaibetaj <- aov(value ~ 0 + item / day, data=test)
> AiBj <- xyplot(value ~ day | item, data=test, layout=c(8,1),
> scales=list(alternating=1),
>                main="ancova with interaction\nvariable slope, variable
> intercept") +
>
> layer(panel.abline(a=coef(Fitalphaibetaj)[panel.number()],
> b=coef(Fitalphaibetaj)[8 + panel.number()]))
> AiBj
>
> ## this is very tight on the standard 7x7 plotting surface.
> ## it will look ok with 10in wide and 7in high
> ## pdf("AB2x2.pdf", height=7, width=10)
> print(AB,   split=c(1, 1, 2, 2), more=TRUE)
> print(AB0,  split=c(1, 2, 2, 2), more=TRUE)
> print(AiB,  split=c(2, 1, 2, 2), more=TRUE)
> print(AiBj, split=c(2, 2, 2, 2), more=FALSE)
> ## dev.off()
>
> ## this arrangement matches the arrangement in demo("ancova")
> ## this too is very tight on the standard 7x7 plotting surface.
> ## it will look ok with 10in wide and 10in high
> ## pdf("AB2x3.pdf", height=10, width=10)
> print(AB,   split=c(1, 2, 2, 3), more=TRUE)
> print(AB0,  split=c(2, 3, 2, 3), more=TRUE)
> print(AiB,  split=c(2, 2, 2, 3), more=TRUE)
> print(AiBj, split=c(2, 1, 2, 3), more=FALSE)
> ## dev.off()
>
>
> On Thu, Feb 12, 2015 at 9:48 AM, PIKAL Petr <petr.pikal at precheza.cz>
> wrote:
> > Hi Rich
> >
> >> -----Original Message-----
> >> From: Richard M. Heiberger [mailto:rmh at temple.edu]
> >> Sent: Wednesday, February 11, 2015 10:53 PM
> >> To: PIKAL Petr
> >> Cc: r-help at r-project.org
> >> Subject: Re: [R] suggestion for optimal plotting to show significant
> >> differences
> >>
> >> Petr,
> >>
> >> My first attempt is to use the simple=TRUE argument to
> interaction2wt.
> >>
> >> Then the bwplots in the item|item panel show the behavior of value
> over
> >> day for each item.  You get a plot similar to this panel with the
> >> growth curve plots from nlme, for example,
> >>     bwplot(value ~ day | item, data=test, horizontal=FALSE) I am
> >> treating set as a replication and each box is cumulated over the
> three
> >> sets.
> >
> > Yes, that is the point - set is actually replication of result.
> >
> >>
> >> My analysis question is about day.  You have it as numeric.  My
> >> inclination would be to make day factor.  Then you could model the
> >> interaction of day and item.
> >
> > Hm. I hoped I can do
> >
> > fit<-lm(value~day*item, data=test)
> > summary(fit)
> >
> > in which case I can compare differences in intercepts and/or slopes
> for each item.
> >
> > However I am rather lost in aov
> >
> > test$dayf <- factor(test$day)
> >
> > fit1 <- aov(value~item+dayf, data=test)
> > summary(fit)
> > fit2 <- aov(value~item/dayf, data=test)
> > summary(fit)
> > and
> > fit3 <- aov(value~item*dayf, data=test)
> > summary(fit)
> > which gives bascally the same result as fit2
> >
> >> anova(fit1, fit2)
> > Analysis of Variance Table
> >
> > Model 1: value ~ item + dayf
> > Model 2: value ~ item/dayf
> >   Res.Df    RSS Df Sum of Sq      F Pr(>F)
> > 1    131 160.59
> > 2     96 128.60 35    31.993 0.6824 0.8993
> >
> > TukeyHSD(fit1, which="item")
> > TukeyHSD(fit2, which="item")
> >
> > Both models seems to give quite similar results and I am not sure
> what actually differs in those models. I believe that model2 tests each
> item within particular day (but I am not sure about it).
> >
> > However this discussion is probably deviating more to the statistics
> issue than to R itself.
> >
> > I just thought that somebody helps me with a method for comparison of
> item performance in case that relation of value to day is not simple
> linear (as in my example) and cannot be expressed by some formula (like
> examples in nlme).
> >
> > So far the best options are either your bwplot or my ggplots.
> >
> > p<-ggplot(test, aes(x=day, y=value, colour=item))
> > p+geom_point()+stat_smooth(method="lm", formula= y~poly(x,2))
> >
> > p<-ggplot(test, aes(x=dayf, y=value, colour=item))
> > p+geom_boxplot()
> >
> > p<-ggplot(test, aes(x=item, y=value))
> > p+geom_boxplot()+facet_wrap(~day)
> >
> > Thank you for your suggestions which directed me to interaction plots
> which I was not aware of.
> >
> > Best regards
> > Petr
> >
> >
> >>
> >> Rich
> >>
> >> On Mon, Feb 9, 2015 at 6:01 AM, PIKAL Petr <petr.pikal at precheza.cz>
> >> wrote:
> >> > Hallo Richard.
> >> >
> >> > I tried your suggestion but it seems to be no better than simple
> >> ggplot. Let me extend the example a bit to 8 items which is more
> >> realistic.
> >> >
> >> > item<-rep(letters[1:8], each=18)
> >> > day<-rep((0:5)*100, 24)
> >> > set<-rep(rep(1:3, each=6), 8)
> >> > test<-data.frame(item, day, set)
> >> > set.seed(111)
> >> > test$value<-(test$day/100+1)+rnorm(144)
> >> > test$value<-test$value+(as.numeric(test$item)*1.3)
> >> >
> >> > Value is increasing during time (day) for each tested subject
> (item),
> >> each item is measured 3 times (set) each day.
> >> >
> >> > Here is some graph
> >> > p<-ggplot(test, aes(x=day, y=value, colour=item))
> >> > p+geom_point()+stat_smooth(method="lm", formula= y~poly(x,2))
> >> >
> >> > I can do lm or aov, however I am not sure about proper formula.
> >> >
> >> > fit<-lm(value~day, data=test)
> >> > summary(fit)
> >> > # this shows that value is increasing with day
> >> >
> >> > fit<-lm(value~day/item, data=test)
> >> > summary(fit)
> >> > # this suggests that value is decreasing with day (which is wrong)
> >> >
> >> > fit<-lm(value~day*item, data=test)
> >> > summary(fit)
> >> > # and this tells me that value is increasing with day and items
> have
> >> different intercepts but the same rate of growth (I hope I got it
> >> right).
> >> >
> >> > I do not have your book available but I went through help pages.
> >> >
> >> > Your interaction graph is not much better than ggplot.
> >> > I can do
> >> >
> >> > interaction2wt(value ~ item * day, data=test)
> >> >
> >> > which probably is closer to actual problem.
> >> >
> >> > The basic problem is that increase of value with days is in fact
> not
> >> linear and actually it can increase in the beginning and then
> stagnate
> >> or it can stagnate in beginning and then increase. I am not aware of
> >> any way how to compare time behaviour of different items in such
> >> situations if I cannot state some common formula in which case I
> would
> >> use probably nlme.
> >> >
> >> > Thank for your insight, I try to go through it more deeply.
> >> >
> >> > Best regards
> >> > Petr
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Richard M. Heiberger [mailto:rmh at temple.edu]
> >> >> Sent: Friday, February 06, 2015 6:14 PM
> >> >> To: PIKAL Petr
> >> >> Cc: r-help at r-project.org
> >> >> Subject: Re: [R] suggestion for optimal plotting to show
> significant
> >> >> differences
> >> >>
> >> >> I would try one of these illustrations for starts.
> >> >> interaction2wt (two-way tables) is designed to be used with aov()
> >> for
> >> >> testing.
> >> >> interaction2wt shows all main effects and all two-way
> interactions
> >> >> for many factors.
> >> >>
> >> >>
> >> >>
> >> >> test <-
> >> >> structure(list(item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> >> 1L,
> >> >> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L,
> >> >> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"),
> class
> >> >> = "factor"), day = c(0L, 100L, 200L, 300L, 400L, 500L, 0L, 100L,
> >> >> 200L, 300L, 400L, 500L, 0L, 100L, 200L, 300L, 400L, 500L, 0L,
> 100L,
> >> >> 200L, 300L, 400L, 500L, 0L, 100L, 200L, 300L, 400L, 500L, 0L,
> 100L,
> >> >> 200L, 300L, 400L, 500L), set = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
> 2L,
> >> >> 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L,
> >> >> 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), value =
> c(1.08163365169503,
> >> >> 2.61998412608805, 3.07820466606394, 4.44993419381934,
> >> >> 5.29163171545805, 6.29155990999293, -0.123163011367676,
> >> >> 2.07767236834003, 2.32537052874901, 3.09372794501084,
> >> >> 6.65273721166635, 5.92304962329131, 1.50504697705548,
> >> >> 2.66253728086866, 2.63420157418685, 2.78195098580416,
> >> >> 6.47578642973288, 5.89587443775143, 0.848864231485078,
> >> >> 1.27549677119713, 2.19573089053609, 2.45659926134292,
> >> >> 5.15424403414103, 5.4813151140983, 1.25731482647214,
> >> >> 2.09662105167973, 1.75954023316977, 4.81624002288939,
> >> >> 4.65029189325307, 6.39946904227214, 0.944996929887344,
> >> >> 1.74667265331284, 2.42956264345558, 5.17852980415141,
> >> >> 3.5453435965834, 6.9011238437191)), .Names = c("item", "day",
> "set",
> >> >> "value"), row.names = c(NA, -36L), class =
> >> >> "data.frame")
> >> >>
> >> >>
> >> >>
> >> >> library(HH)
> >> >>
> >> >> test$set <- factor(test$set)
> >> >> test$day <- factor(test$day)
> >> >> test$item <- factor(test$item)
> >> >>
> >> >> interaction2wt(value ~ item * day * set, data=test)
> >> >>
> >> >> test$item.day <- interaction(test$item, test$day)
> >> >> position(test$item.day) <- outer(c(-10,10),
> >> >> as.numeric(levels(test$day)), `+`)
> >> >>
> >> >> xyplot(value ~ as.position(item.day) | set, groups=item,
> >> >>         data=test, horizontal=FALSE, pch=c(17,16),
> >> >>         xlab="day",
> >> >>         scales=list(
> >> >>           x=list(
> >> >>             alternating=1,
> >> >>             at=levels(test$day), ## placement of tick labels and
> >> marks
> >> >>             tck=1)),
> >> >>         key=list(
> >> >>           text=list(c("A","B"), col=c("blue","red")),
> >> >>           points=list(pch=c(17, 16), col=c("blue","red")),
> >> >>        space="top", columns=2, border=TRUE),
> >> >>        layout=c(3,1))
> >> >>
> >> >>
> >> >> ## see also the examples in
> >> >> demo(package="HH", bwplot.examples)
> >> >>
> >> >> On Fri, Feb 6, 2015 at 6:09 AM, PIKAL Petr
> <petr.pikal at precheza.cz>
> >> >> wrote:
> >> >> > Dear all
> >> >> >
> >> >> > I would like to ask for your opinion about possible graphical
> >> >> representation of such data.
> >> >> >
> >> >> >> dput(test)
> >> >> > structure(list(item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L,
> >> >> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
> >> 2L,
> >> >> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
> "B"),
> >> >> > class
> >> >> =
> >> >> > "factor"), day = c(0L, 100L, 200L, 300L, 400L, 500L, 0L, 100L,
> >> >> > 200L, 300L, 400L, 500L, 0L, 100L, 200L, 300L, 400L, 500L, 0L,
> >> 100L,
> >> >> > 200L, 300L, 400L, 500L, 0L, 100L, 200L, 300L, 400L, 500L, 0L,
> >> 100L,
> >> >> > 200L, 300L, 400L, 500L), set = c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L,
> >> >> > 2L, 2L,
> >> >> 2L,
> >> >> > 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> >> 2L,
> >> >> > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), value = c(1.08163365169503,
> >> >> > 2.61998412608805, 3.07820466606394, 4.44993419381934,
> >> >> > 5.29163171545805, 6.29155990999293, -0.123163011367676,
> >> >> > 2.07767236834003, 2.32537052874901, 3.09372794501084,
> >> >> > 6.65273721166635, 5.92304962329131, 1.50504697705548,
> >> >> > 2.66253728086866, 2.63420157418685, 2.78195098580416,
> >> >> > 6.47578642973288, 5.89587443775143, 0.848864231485078,
> >> >> > 1.27549677119713, 2.19573089053609, 2.45659926134292,
> >> >> > 5.15424403414103, 5.4813151140983, 1.25731482647214,
> >> >> 2.09662105167973,
> >> >> > 1.75954023316977, 4.81624002288939, 4.65029189325307,
> >> >> > 6.39946904227214, 0.944996929887344, 1.74667265331284,
> >> >> > 2.42956264345558, 5.17852980415141, 3.5453435965834,
> >> >> > 6.9011238437191)), .Names = c("item", "day", "set", "value"),
> >> >> > row.names = c(NA, -36L), class = "data.frame")
> >> >> >>
> >> >> >
> >> >> > One option I came with is
> >> >> >
> >> >> > library(ggplot2)
> >> >> > p<-ggplot(test, aes(x=day, y=value, colour=item))
> >> >> > p+geom_point()+stat_smooth(method="lm", formula= y~poly(x,2))
> >> >> >
> >> >> > but -
> >> >> > I have more items (around 5-10), and I want to show if the
> >> >> > difference
> >> >> between items is or is not significant. The actual development of
> >> >> value with day is usually not linear nor growing steadily and
> >> >> actually I cannot usually evaluate some analytical equation for
> my
> >> >> data to compare equation parameters.
> >> >> >
> >> >> > I thought about boxplots, but there is not many repetitions and
> >> >> actually 5+ boxplots can be quite messy.
> >> >> >
> >> >> > I can plot only mean for each set and item but in that case I
> lose
> >> >> information if the difference is or is not significant.
> >> >> >
> >> >> > I appreciate any suggestion.
> >> >> >
> >> >> > Best regards
> >> >> > Petr
> >> >> >
> >

________________________________
Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a jsou určeny pouze jeho adresátům.
Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě neprodleně jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie vymažte ze svého systému.
Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento email jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat.
Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou modifikacemi či zpožděním přenosu e-mailu.

V případě, že je tento e-mail součástí obchodního jednání:
- vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření smlouvy, a to z jakéhokoliv důvodu i bez uvedení důvodu.
- a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně přijmout; Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze strany příjemce s dodatkem či odchylkou.
- trvá odesílatel na tom, že příslušná smlouva je uzavřena teprve výslovným dosažením shody na všech jejích náležitostech.
- odesílatel tohoto emailu informuje, že není oprávněn uzavírat za společnost žádné smlouvy s výjimkou případů, kdy k tomu byl písemně zmocněn nebo písemně pověřen a takové pověření nebo plná moc byly adresátovi tohoto emailu případně osobě, kterou adresát zastupuje, předloženy nebo jejich existence je adresátovi či osobě jím zastoupené známá.

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system.
If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.


More information about the R-help mailing list