The apa7 package provides functions to create APA-style tables, including correlation matrices and regression tables. The functions return flextable objects that can be processed further using the flextable package. Although there are other fantastic packages for creating tables (e.g., gt, tinytable, and kableExtra), the flextable package has the fullest support for the .docx format, which is essential for anyone working in APA style. Thanks to the tireless efforts of David Gohel, flextable can handle almost anything that can be done in a .docx table.
Suppose we have a table we want to format. As a raw tibble, it looks like this:
d <-tibble(Model =paste("Model", c(rep(1,2), rep(2, 3))),Predictor =c("Constant", "Socioeconomic status", "Constant", "Socioeconomic status", "Age"),b =c(-4.5, 1.23, -5.1, 1.45, -.23),beta =c(NA, .24, NA, .31, .031),t =c(-18.457, 2.345, -22.457, 2.114, .854),df =c(85,85, 84, 84, 84),p =c(.0001, .0245, .0001, .0341, .544))d#> # A tibble: 5 × 7#> Model Predictor b beta t df p#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 Model 1 Constant -4.5 NA -18.5 85 0.0001#> 2 Model 1 Socioeconomic status 1.23 0.24 2.35 85 0.0245#> 3 Model 2 Constant -5.1 NA -22.5 84 0.0001#> 4 Model 2 Socioeconomic status 1.45 0.31 2.11 84 0.0341#> 5 Model 2 Age -0.23 0.031 0.854 84 0.544
Initial Results with flextable
If we use flextable::flextable with the flextable::thema_apa function as the default theme, we get something close to what we want:
set_flextable_defaults(theme_fun = theme_apa, font.family ="Times New Roman", text.align ="center",table_align ="left")flextable(d)
Model
Predictor
b
beta
t
df
p
Model 1
Constant
-4.50
-18.46
85.00
0.00
Model 1
Socioeconomic status
1.23
0.24
2.35
85.00
0.02
Model 2
Constant
-5.10
-22.46
84.00
0.00
Model 2
Socioeconomic status
1.45
0.31
2.11
84.00
0.03
Model 2
Age
-0.23
0.03
0.85
84.00
0.54
Unfortunately, the journey from “close to what we want” to “exactly what we want” often requires a series of polishing moves that can take a long time and/or specialized knowledge to get right. Some steps along this path might include:
Convert Model to a row titles (i.e., Model 1, Model 2)
Left align the Predictor columns
Make Predictor column wider
Negative numbers should have real minus signs (−) instead of hyphens (-).
Align numeric columns on decimals.
Format p-values and align them on the decimals
Remove leading zeroes for beta and p
Italicize statistic headings (e.g., b → b)
Of course, there are flextable functions that can do most of these things, but applying them repeatedly is tedious. This is not a criticism of flextable. As general-purpose package for table creation, flextable should not be expected to anticipate the complex rules of specific style guides. The fact that it has the theme_apa function is more than generous already.
The apa_flextable tries to take care of all of these APA-Style polishing moves with minimal fuss.
Polished table with apa_flextable
apa_flextable(d, row_title_column = Model)
Predictor
b
β
t
df
p
Model1
Constant
−4.50
−18.46
85
<.001
Socioeconomicstatus
1.23
.24
2.35
85
.02
Model2
Constant
−5.10
−22.46
84
<.001
Socioeconomicstatus
1.45
.31
2.11
84
.03
Age
−0.23
.03
0.85
84
.54
The row titles can be aligned to the left, center, or right.
A common problem with table formatting functions is that they try to do too much in one function, making it difficult to customize the output. Like the entire flextable ecosystem, the apa_flextable function is designed to be flexible in terms of its inputs and allows for further customization afterwards.
The apa_flextable function returns a flextable object that can be further processed with flextable functions, if needed. For example, suppose we wanted to bold the beta coefficient for the first predictor:
Some flextable functions that take care of common formatting problems. All of these can be applied to specific column names/positions or row positions. The row positions can be selected conditionally based on data in each row.
It is also possible to modify the automatic formatting. For example, suppose we want any column called “Predictor” to be renamed to “Variable” and to make all variables to be upper case (i.e., capital letters).
The column_format function creates an object for a single column.
The column_formats function creates a default list of column_format objects.
We can also set the rounding accuracy of all columns to .001 instead of the default of .01.
# Make new formatter object with default accuracy of .001my_formats <-column_formats(accuracy = .001)# Add Predictor column formattermy_formats$Predictor <- cf_predictor# Remove formatter for beta columnmy_formats$beta <-NULLapa_flextable(d, row_title_column = Model, column_formats = my_formats)
Variable
b
beta
t
df
p
Model1
CONSTANT
−4.500
−18.457
85
<.001
SOCIOECONOMICSTATUS
1.230
0.24
2.345
85
.024
Model2
CONSTANT
−5.100
−22.457
84
<.001
SOCIOECONOMICSTATUS
1.450
0.31
2.114
84
.034
AGE
−0.230
0.03
0.854
84
.544
The my_formats object is a list of column_format objects. Each column_format object can specify the name, header, formatter, and other options for a column. The column_formats function creates a list of column_format objects with default settings that can be modified as needed.
The apa_flextable function performs a number of formatting operations on the data before and after the data are sent to flextable. Roughly speaking, apa_flextable, by default, performs these operations.
Add space between adjacent column spanners.
Apply as_grouped_data and restructure row titles, if row_title is specified.
Format data with apa_format_columns if auto_format_columns = TRUE
Separate headers into multiple rows if separate_headers = TRUE
Apply flextable
Apply surround to make borders to separate row groups, if any.
Apply apa_style To style table and convert markdown if apa_style = TRUE
Apply pretty_widths if pretty_widths = TRUE
For the intrepid, these steps can be applied sequentially without apa_flextable. Here is what that might look like (column spanners added for illustration, not because the table needs them).
d |># Create column spannersrename_with(.cols =c(b, beta), \(x) paste0("Coefficients_", x)) |>rename_with(.cols =c(t, df, p), .fn = \(x) paste0("Significance Test_", x)) |># Step 1: Space between column spannersadd_break_columns(ends_with("beta")) |># Step 2: Make row titles flextable::as_grouped_data("Model") |>mutate(row_title = Model, .before =1) |>fill(Model) |># Step 3: Format dataapa_format_columns() %>%# Step 4: Convert to flextableflextable(col_keys =colnames(select(., -Model, -row_title))) |>mk_par(i =~!is.na(row_title), value =as_paragraph(row_title)) |>merge_h(i =~!is.na(row_title)) |># Step 5: Separate headers into column spanners and deckered heads flextable::separate_header() |># Step 6: Make borders between row groupssurround(i =~!is.na(row_title),border.top =list(color ="gray20",style ="solid",width =1 ) ) |># Step 7: Style table and convert markdownapa_style() |>align(j =1, i =~is.na(row_title)) |>align(i =~!is.na(row_title), align ="center") |># Step 8: Pretty widthspretty_widths()
Predictor
Coefficients
SignificanceTest
b
β
t
df
p
Model1
Constant
−4.50
−18.46
85.00
<.001
Socioeconomicstatus
1.23
.24
2.35
85.00
.02
Model2
Constant
−5.10
−22.46
84.00
<.001
Socioeconomicstatus
1.45
.31
2.11
84.00
.03
Age
−0.23
.03
0.85
84.00
.54
Helper functions
Break columns
Groups of related variables can be separated by adding break columns. Here we take the diamonds data set and calculate the means and standard deviations of several variables, separated by Cut
The apa_flextable function, by default, assumes that separated headers are desired when column names have underscores. Under the hood, it calls flextable::separate_header and inserts interior borders. This feature can be turned off by setting separate_headers = FALSE.
apa_flextable(d_diamonds)
Cut
Carat
Depth
Table
M
SD
M
SD
M
SD
Fair
1.05
0.52
64.04
3.64
59.05
3.95
Good
0.85
0.45
62.37
2.17
58.69
2.85
VeryGood
0.81
0.46
61.82
1.38
57.96
2.12
Premium
0.89
0.52
61.26
1.16
58.75
1.48
Ideal
0.70
0.43
61.71
0.72
55.95
1.25
By default, small blank columns are inserted between column spanner groups. If you want to insert them yourself, the add_break_columns function insert break columns before or after any variable. As input, it can take any quoted or unquoted variable name, or any tidyselect function (e.g., starts_with, ends_with, contains,where, `everything).
We can insert breaks after Carat_SD and Depth_SD directly like so:
d_diamonds |>add_break_columns(Carat_SD, Depth_SD)#> # A tibble: 5 × 9#> Cut Carat_M Carat_SD apa7breakcolumn1 Depth_M Depth_SD apa7breakcolumn2#> <ord> <dbl> <dbl> <lgl> <dbl> <dbl> <lgl> #> 1 Fair 1.05 0.516 NA 64.0 3.64 NA #> 2 Good 0.849 0.454 NA 62.4 2.17 NA #> 3 Very Good 0.806 0.459 NA 61.8 1.38 NA #> 4 Premium 0.892 0.515 NA 61.3 1.16 NA #> 5 Ideal 0.703 0.433 NA 61.7 0.719 NA #> # ℹ 2 more variables: Table_M <dbl>, Table_SD <dbl>
Alternately, we can add a break column after each variable ending with “SD” except for the last one. The apa_flextable function knows to treat any column beginning with apa7breakcolumn as a break column.
The flextable package has functions like add_header_row and add_header for adding header rows after a flextable has been made. It also has the separate_headers function for creating header rows from the variable names. Both approaches are needed at times, but I like using separate_headers because it is usually easier to manipulate the column names before the table is made than it is afterwards.
By default, separate_headers converts names with underscores into column spanners (header labels that span multiple columns, usually at higher rows in the header) and deckered heads (single-column labels under the column spanners). Each underscore separates labels in separate header rows.
One can make column spanner labels by hand, with custom functions, or with the column_spanner function. It adds the same spanner label to multiple columns, using quoted or unquoted variable names (combined in a vector with c) or tidyselect functions like starts_with, ends_with, or contains. Any selected variables will be relocated after the first selected variable, in the order selected. The relocation can be prevented by setting relocate = FALSE.
d |>column_spanner_label("Significance test", c(t,df,p)) |>column_spanner_label("Coefficients", starts_with("b")) |>apa_flextable(row_title_column = Model)
Predictor
Coefficients
Significancetest
b
β
t
df
p
Model1
Constant
−4.50
−18.46
85
<.001
Socioeconomicstatus
1.23
.24
2.35
85
.02
Model2
Constant
−5.10
−22.46
84
<.001
Socioeconomicstatus
1.45
.31
2.11
84
.03
Age
−0.23
.03
0.85
84
.54
Decimal/Character alignment
The align_chr function does three things:
Rounds to a desired accuracy (default = .01) via scales:number
Replaces minus signs with a true text minus sign via signs::signs
Pads numbers (via apa7::num_pad) with figure spaces (\u2007) on both sides of the decimal (or any other character) so that all numbers in the column have the same width.
The hanging_indent function is a hack, but a necessary one. Sometimes we need paragraphs to be indented in one way or another, and flextable does not have exactly what we need. So hanging_indent splits the text (via stringr::str_wrap) and indents it as specified with figure spaces \u2007.
Note that align_chr aligns the text on the decimal, but pads the left side only so that the right side of the text can be of variable width.
d_quote <-tibble(Quote =c("Believe those who are seeking the truth. Doubt those who find it.","Resentment is like drinking poison and waiting for the other person to die.","What you read when you don’t have to, determines what you will be when you can’t help it.","Advice is what we ask for when we already know the answer but wish we didn’t.","Do not ask whether a statement is true until you know what it means.","Tact is the art of making a point without making an enemy.","Short cuts make long delays.","The price one pays for pursuing any profession or calling is an intimate knowledge of its ugly side.","There is a stubbornness about me that never can bear to be frightened at the will of others. My courage always rises at every attempt to intimidate me","There is a crack in everything, that’s how the light gets in.","If you choose to dig a rather deep hole, someday you will have no choice but to keep on digging, even with tears.","We long for self-confidence, till we look at the people who have it.","Writing is a way to end up thinking something you couldn’t have started out thinking.","A little inaccuracy sometimes saves tons of explanation.","Each snowflake in an avalanche pleads not guilty.","What I write is smarter than I am. Because I can rewrite it." ),Attribution =c("Andre Gide","Carrie Fisher","Charles Francis Potter","Erica Jong","Errett Bishop","Howard W. Newton","J.R.R. Tolkien","James Baldwin","Jane Austin","Leonard Cohen","Liyun Chen","Mignon McLaughlin","Peter Elbow","Saki","Stanislaw J. Lec","Susan Sontag" )) |>arrange(nchar(Quote))d_quote |>mutate(Quote =paste0(seq_along(Quote), ".\u2007", Quote) |>align_chr(side ="left") |>hanging_indent(width =55, indent =7)) |>apa_flextable() |>align(j ="Attribution", part ="all") |>width(width =c(4.5, 2))
It is also possible to make the list lettered (upper or lowercase) or with Roman numerals (upper or lowercase). Set the type argument to “A”, “a”, “I”, or “i”.
d_quote |>add_list_column(Quote, type ="A", sep =") ") |>apa_flextable() |>align(j ="Attribution", part ="all") |>width(width =c(.3, 4.2, 2))
When data is supplied to apa_flextable, any variable that ends with apa7starcolumn will be left aligned, and the variable to its immediate left will be right aligned. Here we convert the p column to stars, placing baba7starcolumn after column b.
d_star <-tibble(Predictor =c("Constant", "Socioeconomic status"),b =c(.45,.55),p =c(.02, .0002)) |>add_star_column(b, p = p) d_star#> # A tibble: 2 × 4#> Predictor b bapa7starcolumn p#> <chr> <dbl> <chr> <dbl>#> 1 Constant 0.45 "^\\*^" 0.02 #> 2 Socioeconomic status 0.55 "^\\*\\*\\*^" 0.0002apa_flextable(d_star)
Predictor
b
p
Constant
0.45
*
.02
Socioeconomicstatus
0.55
***
<.001
Suppose that the stars are already appended to some numbers. We can separate them into a new apa7starcolumn using separate_star_column.
d_star <-tibble(Predictor =c("Constant", "Socioeconomic status"),b =c("1.10***", "2.32*"),beta =c(NA, .34)) |>separate_star_column(b)d_star#> # A tibble: 2 × 4#> Predictor b bapa7starcolumn beta#> <chr> <chr> <chr> <dbl>#> 1 Constant 1.10 "^\\*\\*\\*^" NA #> 2 Socioeconomic status 2.32 "^\\*^" 0.34apa_flextable(d_star)
Predictor
b
β
Constant
1.10
***
Socioeconomicstatus
2.32
*
.34
APA format with full control
Sometimes you want a table to be particular way, and no package can anticipate the exact structure and formatting required. With a combination of tidyverse, flextable, and apa7 functions, it is possible to get flextable to output almost any kind of APA table you need.
Here I would like the unstandardized and standardized regression coefficients with the two models side by side. I want the p-values converted to stars and appended to the unstandardized coefficients.
Cross-tabulation with Chi-square Test of Independence
ggplot2::diamonds |>select(Cut = cut, Color = color ) |>apa_chisq()
Color
Fair
Good
VeryGood
Premium
Ideal
n
%
n
%
n
%
n
%
n
%
D
163
10.1%
662
13.5%
1,513
12.5%
1,603
11.6%
2,834
13.2%
E
224
13.9%
933
19.0%
2,400
19.9%
2,337
16.9%
3,903
18.1%
F
312
19.4%
909
18.5%
2,164
17.9%
2,331
16.9%
3,826
17.8%
G
314
19.5%
871
17.8%
2,299
19.0%
2,924
21.2%
4,884
22.7%
H
303
18.8%
702
14.3%
1,824
15.1%
2,360
17.1%
3,115
14.5%
I
175
10.9%
522
10.6%
1,204
10.0%
1,428
10.4%
2,093
9.7%
J
119
7.4%
307
6.3%
678
5.6%
808
5.9%
896
4.2%
Note.χ2 (24)=310.32,p < .001,Adj.Cramer’sV=.04
It is not a bad table for so little effort, but the pattern is not easily visible. A plot reveals the direction of the effect such that stones with better cuts tend to have less color.
library(ggplot2)#> #> Attaching package: 'ggplot2'#> The following objects are masked from 'package:psych':#> #> %+%, alphaggplot2::diamonds |>select(Cut = cut, Color = color) |>count(Cut, Color) |>mutate(p = scales::percent(n /sum(n), accuracy = .1), .by = Cut) |>ggplot(aes(Cut, n, fill = Color)) +geom_col(position =position_fill(),alpha = .6,width = .96) +geom_text(aes(label =paste0(p, " (", n, ")")),position =position_fill(vjust = .5),size.unit ="pt",size =14* .8,color ="gray10" ) +theme_minimal(base_family ="Roboto Condensed", base_size =14) +scale_y_continuous("Cumulative Proportion",expand =expansion(c(0, .025)),labels = \(x) scales::percent(x, accuracy =1) ) +scale_x_discrete(expand =expansion()) +theme(panel.grid.major.x =element_blank())
By default, apa_flextable calls ftExtra:col_format_md on the entire table. This makes formatting easy and consistent, but the process is a little slow. It is not so bad with only a few tables, but a document with many tables can take a while to render. If possible, setting markdown = FALSE will speed things up, if needed. It is possible to prevent markdown formatting selectively with markdown_body = FALSE or markdown_header. I usually just live with it or cache code chunks with finished tables so that I do not have to wait every time I render the document.