[R] ggplot2: Get the regression line with 95% confidence bands

Rui Barradas
Wed Dec 13 07:28:26 CET 2023


At 00:36 on 13/12/2023, Robert Baer wrote:
> coord_cartesian also seems to work for y, together with setting breaks =
> on the x scale. How about:
> 
> df=data.frame(year= c(2012,2015,2018,2022),
>                score=c(495,493, 495, 474))
> 
> ggplot(df, aes(x = year, y = score)) +
>    geom_point() +
>    geom_smooth(method = "lm", formula = y ~ x) +
>    labs(title = "Standard linear regression for France", x = "Year",
>         y = "PISA score in mathematics") +
>    coord_cartesian(ylim=c(470,500)) +
>    scale_x_continuous(breaks = 2012:2022)
> 
> On 12/12/2023 3:19 PM, varin sacha via R-help wrote:
>> Dear Ben,
>> Dear Daniel,
>> Dear Rui,
>> Dear Bert,
>>
>> My R code is below.
>> I really appreciate all your comments. My R code works, but there is
>> still something I would like to improve. The x-axis shows
>> 2012.5 ; 2015.0 ; 2017.5 ; 2020.0
>> I would like to see only whole years on the x-axis (2012 ; 2015 ; 2017 ;
>> 2020). How can I do that?
>>
>>
>> #########
>> library(ggplot2)
>> df=data.frame(year= c(2012,2015,2018,2022), score=c(495,493, 495, 474))
>>
>> ggplot(df, aes(x = year, y = score)) + geom_point() +
>>   geom_smooth(method = "lm", formula = y ~ x) +
>>   labs(title = "Standard linear regression for France", x = "Year",
>>        y = "PISA score in mathematics") +
>>   scale_y_continuous(limits = c(470, 500), oob = scales::squish)
>> #########
>>
>> On Monday, 11 December 2023 at 23:38:06 UTC+1, Ben Bolker
>> <bbolker using gmail.com> wrote:
>>
>> On 2023-12-11 5:27 p.m., Daniel Nordlund wrote:
>>> On 12/10/2023 2:50 PM, Rui Barradas wrote:
>>>> At 22:35 on 10/12/2023, varin sacha via R-help wrote:
>>>>> Dear R-experts,
>>>>>
>>>>> My R code is below. As my x-axis is "year", I must be missing one
>>>>> or more steps! I am trying to get the regression line with the 95%
>>>>> confidence bands around the regression line. Any help would be
>>>>> appreciated.
>>>>>
>>>>> Best,
>>>>> S.
>>>>>
>>>>>
>>>>> #############################################
>>>>> library(ggplot2)
>>>>>    df=data.frame(year=factor(c("2012","2015","2018","2022")),
>>>>> score=c(495,493, 495, 474))
>>>>>    ggplot(df, aes(x=year, y=score)) + geom_point( ) +
>>>>> geom_smooth(method="lm", formula = score ~ factor(year), data = df) +
>>>>> labs(title="Standard linear regression for France", y="PISA score in
>>>>> mathematics") + ylim(470, 500)
>>>>> #############################################
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> Hello,
>>>>
>>>> I don't see a reason why year should be a factor, and the formula in
>>>> geom_smooth is wrong: it should be y ~ x, in terms of the aesthetics
>>>> involved. It still doesn't plot the CIs, though. There's a warning and
>>>> I don't understand where it comes from. But the regression line is plotted.
>>>>
>>>>
>>>>
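>>>> # as.numeric() on a factor returns the level codes (1, 2, ...),
>>>> # so convert the labels to character first
>>>> ggplot(df, aes(x = as.numeric(as.character(year)), y = score)) +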
>>>>    geom_point() +
>>>>    geom_smooth(method = "lm", formula = y ~ x) +
>>>>    labs(
>>>>      title = "Standard linear regression for France",
>>>>      x = "Year",
>>>>      y = "PISA score in mathematics"
>>>>    ) +
>>>>    ylim(470, 500)
>>>> #> Warning message:
>>>> #> In max(ids, na.rm = TRUE) : no non-missing arguments to max;
>>>> #> returning -Inf
>>>>
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>>
>>>>
>>> After playing with this for a little while, I realized that the problem
>>> with plotting the confidence limits is the addition of ylim(470, 500).
>>> The confidence values are outside the ylim values.  Remove the limits,
>>> or increase the range, and the confidence curves will plot.
>>>
>>> Hope this is helpful,
>>>
>>> Dan
>>>
>>    Or use + scale_y_continuous(limits = c(470, 500), oob = scales::squish)
>>
>>
Hello,

In the code below I don't use coord_cartesian because setting ylim there
cuts off part of the confidence band.

To have axis labels only at the years present in the data set, take the
breaks from the data.
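
As a rough check of why a fixed y window interferes with the band, the interval can be computed directly with lm() and predict(). This is only a sketch: geom_smooth(method = "lm") fits the same model but evaluates the band on a finer grid of x values, and the names fit, new_x and ci are mine, not from the thread.

```r
df <- data.frame(year = c(2012, 2015, 2018, 2022),
                 score = c(495, 493, 495, 474))

# same linear model that geom_smooth(method = "lm", formula = y ~ x) fits
fit <- lm(score ~ year, data = df)

# 95% confidence interval for the fitted mean at each observed year
new_x <- data.frame(year = df$year)
ci <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
cbind(new_x, ci)

# compare the band with the 470-500 window
range(ci[, "lwr"])
range(ci[, "upr"])
```

With only four points the band is wide: it goes below 470 and above 500, so a window of c(470, 500) cannot show all of it.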



library(ggplot2)

df <- data.frame(year= c(2012,2015,2018,2022),
                  score=c(495,493, 495, 474))

# in this case unique() is not needed; it's here
# because it might be needed with other data sets
brks_year <- df$year # |> unique()

ggplot(df, aes(x = year, y = score)) +
   geom_point() +
   geom_smooth(method = "lm", formula = y ~ x) +
   labs(title = "Standard linear regression for France",
        x = "Year", y = "PISA score in mathematics") +
   scale_x_continuous(breaks = brks_year)
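
If the data had several rows per year, the commented-out unique() would matter. A small sketch with made-up data (df2, its values and brks_year2 are hypothetical, only for illustration):

```r
# hypothetical data with repeated years
df2 <- data.frame(year = c(2012, 2012, 2015, 2018, 2018, 2022),
                  score = c(495, 497, 493, 495, 490, 474))

# one break per distinct year, in increasing order
brks_year2 <- sort(unique(df2$year))
brks_year2

# then, as in the plot above: scale_x_continuous(breaks = brks_year2)
```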



Hope this helps,

Rui Barradas

