Automated reporting of statistical models in R

Automated reporting of statistical models in R

This is my notes for learning Formatted correlation output with effect sizes and Automated reporting of statistical models in R • report.

One of the most time-consuming part of data analysis in science is the copy-pasting of specific values of some R output to a manuscript or a report. This task is frustrating, prone to errors, and increases the variability of statistical reporting. At the same time, standardizing practices of what and how to report is crucial for reproducibility and clarity.

R
1
2
3
4
5
6
7
8
9
10
11
# devtools::install_github("easystats/report")

library(report)

# Do a correlation
df <- iris
cor_results <- cor.test(df$Sepal.Length,
df$Petal.Length)

# Formatted output
report(cor_results)

The Pearson’s product-moment correlation between df$Sepal.Length and df$Petal.Length is positive, significant and large (r = 0.87, p < .001).

From R to Manuscript

report‘s primary goal is to bridge the gap between R’s output and the formatted results contained in your manuscript. It automatically produced reports of models and dataframes according the best pratice guidelines (e.g. APA’s style guide), ensuring standardization and quality in resulting reporting.

R
1
2
3
library(report)
lm(Sepal.Length ~ Species, iris) %>%
report()

We fitted a linear model to predict Sepal.Length with Species.The model’s explanatory power is substantial (R2 = 0.62, adj. R2 = 0.61). The model’s intercept is at 5.01. Whithin this model:

  • The effect of Species (versicolor) is positive and can be considered as very large and significant (beta = 0.93, 95% CI [0.73, 1.13], std. beta = 1.12, p < .001).
  • The effect of Species (virginica) is positive and can be considered as very large and significant (beta = 1.58, 95% CI [1.38, 1.79], std. beta = 1.91, p < .001).

Report all the things

General Workflow

The report package works in a two step fashion. First, you create a report object with the report function (which takes different arguments depending on the type of object you are reporting). Then, this report object can be displayed either textually, using to_text(), or as a table, using to_table(). Moreover, you can access a more detailed (but less digested) version of the reporting using to_fulltext() and to_fulltable(). Finally, to_values() makes it easy to access all the internals of a model.

Supported Packages

Currently supported objects by report include cor.test, t.test, correlation, glm, lme4::merMod, rstanarm::stanreg.

Examples

R
1
report(iris)

The data contains 150 observations of the following variables:

  • Sepal.Length: Mean = 5.84, SD = 0.83, range = [4.30, 7.90], 0 missing
  • Sepal.Width: Mean = 3.06, SD = 0.44, range = [2, 4.40], 0 missing
  • Petal.Length: Mean = 3.76, SD = 1.77, range = [1, 6.90], 0 missing
  • Petal.Width: Mean = 1.20, SD = 0.76, range = [0.10, 2.50], 0 missing
  • Species: 3 levels: setosa (n = 50); versicolor (n = 50) and virginica (n = 50)
R
1
2
cor.test(iris$Sepal.Length, iris$Petal.Length) %>%
report()

The Pearson’s product-moment correlation between iris$Sepal.Length and iris$Petal.Length is positive, significant and large (r = 0.87, p < .001).

R
1
2
3
lm(Sepal.Length ~ Petal.Length + Species, data=iris) %>%
report() %>%
to_fulltable()
Parameter Coefficient SE 95% CI t df p Coefficient (std.) Fit
(Intercept) 3.68 0.11 [ 3.47, 3.89] 34.72 146 < .001 1.50
Petal.Length 0.90 0.06 [ 0.78, 1.03] 13.96 146 < .001 1.93
Species (versicolor) -1.60 0.19 [ -1.98, -1.22] -8.28 146 < .001 -1.93
Species (virginica) -2.12 0.27 [ -2.66, -1.58] -7.74 146 < .001 -2.56
AIC 106.23
BIC 121.29
R2 0.84
R2 (adj.) 0.83
RMSE 0.33

Get Started

Supported Objects

Dataframes

R
1
2
3
iris %>%
report() %>%
to_fulltext()

The data contains 150 observations of the following variables:

  • Sepal.Length: Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04, range: [4.30, 7.90]Skewness = 0.31, Kurtosis = -0.57, 0 missing
  • Sepal.Width: Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44, range: [2, 4.40]Skewness = 0.32, Kurtosis = 0.18, 0 missing
  • Petal.Length: Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85, range: [1, 6.90]Skewness = -0.27, Kurtosis = -1.40, 0 missing
  • Petal.Width: Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04, range: [0.10, 2.50]Skewness = -0.10, Kurtosis = -1.34, 0 missing
  • Species: 3 levels: setosa (n = 50, 33.33%); versicolor (n = 50, 33.33%) and virginica (n = 50, 33.33%)
R
1
2
3
iris %>%
report() %>%
to_table()
Variable Level n_Obs Mean SD Min Max n_Missing
Sepal.Length 150 5.84 0.83 4.30 7.90 0
Sepal.Width 150 3.06 0.44 2.00 4.40 0
Petal.Length 150 3.76 1.77 1.00 6.90 0
Petal.Width 150 1.20 0.76 0.10 2.50 0
Species setosa 50
Species versicolor 50
Species virginica 50

Grouped Dataframes

R
1
2
3
iris %>%
group_by(Species) %>%
report(median = T, range = F)

The data contains 150 observations, grouped by Species, of the following variables:

  • setosa (n = 50):
    • Sepal.Length: Median = 5.00, MAD = 0.30, 0 missing
    • Sepal.Width: Median = 3.40, MAD = 0.37, 0 missing
    • Petal.Length: Median = 1.50, MAD = 0.15, 0 missing
    • Petal.Width: Median = 0.20, MAD = 0.00, 0 missing
  • versicolor (n = 50):
    • Sepal.Length: Median = 5.90, MAD = 0.52, 0 missing
    • Sepal.Width: Median = 2.80, MAD = 0.30, 0 missing
    • Petal.Length: Median = 4.35, MAD = 0.52, 0 missing
    • Petal.Width: Median = 1.30, MAD = 0.22, 0 missing
  • virginica (n = 50):
    • Sepal.Length: Median = 6.50, MAD = 0.59, 0 missing
    • Sepal.Width: Median = 3.00, MAD = 0.30, 0 missing
    • Petal.Length: Median = 5.55, MAD = 0.67, 0 missing
    • Petal.Width: Median = 2.00, MAD = 0.30, 0 missing
R
1
2
3
4
iris %>%
group_by(Species) %>%
report(median = T, range = F) %>%
to_fulltable()
Group Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis n_Missing
setosa Sepal.Length 50 5.01 0.35 5.00 0.30 4.30 5.80 0.12 -0.35 0
setosa Sepal.Width 50 3.43 0.38 3.40 0.37 2.30 4.40 0.04 0.74 0
setosa Petal.Length 50 1.46 0.17 1.50 0.15 1.00 1.90 0.10 0.80 0
setosa Petal.Width 50 0.25 0.11 0.20 0.00 0.10 0.60 1.22 1.43 0
versicolor Sepal.Length 50 5.94 0.52 5.90 0.52 4.90 7.00 0.10 -0.60 0
versicolor Sepal.Width 50 2.77 0.31 2.80 0.30 2.00 3.40 -0.35 -0.45 0
versicolor Petal.Length 50 4.26 0.47 4.35 0.52 3.00 5.10 -0.59 -0.07 0
versicolor Petal.Width 50 1.33 0.20 1.30 0.22 1.00 1.80 -0.03 -0.49 0
virginica Sepal.Length 50 6.59 0.64 6.50 0.59 4.90 7.90 0.11 -0.09 0
virginica Sepal.Width 50 2.97 0.32 3.00 0.30 2.20 3.80 0.35 0.52 0
virginica Petal.Length 50 5.55 0.55 5.55 0.67 4.50 6.90 0.53 -0.26 0
virginica Petal.Width 50 2.03 0.27 2.00 0.30 1.40 2.50 -0.13 -0.66 0

Correlations and t-tests

R
1
2
report(cor.test(iris$Sepal.Length, iris$Petal.Length))
report(t.test(iris$Sepal.Length, iris$Petal.Length))

The Pearson’s product-moment correlation between iris$Sepal.Length and iris$Petal.Length is positive, significant and large (r = 0.87, p < .001).

The Welch Two Sample t-test suggests that the difference between iris$Sepal.Length and iris$Petal.Length (mean of x = 5.84, mean of y = 3.76, difference = 2.09) is significant (t(211.54) = 13.10, 95% CI [1.77, 2.40], p < .001).

Linear Models

R
1
2
3
lm(Sepal.Length ~ Petal.Length + Species, data=iris) %>%
report() %>%
to_text()

We fitted a linear model to predict Sepal.Length with Petal.Length and Species.The model’s explanatory power is substantial (R2 = 0.84, adj. R2 = 0.83). The model’s intercept is at 3.68. Whithin this model:

  • The effect of Petal.Length is positive and can be considered as very large and significant (beta = 0.90, 95% CI [0.78, 1.03], std. beta = 1.93, p < .001).
  • The effect of Species (versicolor) is negative and can be considered as very large and significant (beta = -1.60, 95% CI [-1.98, -1.22], std. beta = -1.93, p < .001).
  • The effect of Species (virginica) is negative and can be considered as very large and significant (beta = -2.12, 95% CI [-2.66, -1.58], std. beta = -2.56, p < .001).
R
1
2
3
glm(vs ~ mpg + cyl, mtcars,
family = "binomial") %>%
report()

We fitted a logistic model to predict vs with mpg and cyl.The model’s explanatory power is substantial (Tjur’s R2 = 0.67). The model’s intercept is at 15.97. Whithin this model:

  • The effect of mpg is negative and can be considered as small and not significant (beta = -0.16, 95% CI [-0.71, 0.34], std. beta = -0.98, p > .1).
  • The effect of cyl is negative and can be considered as large and significant (beta = -2.15, 95% CI [-5.19, -0.54], std. beta = -3.84, p < .05).

How to Cite Packages

R
1
cite_packages(sessionInfo())
  1. Austin Wehrwein (2019). awtools: awtools: misc tools and themes for austinwehrwein.com. R package version 0.2.1. http://github.com/awhstin/awtools
  2. Bob Rudis (2019). hrbrthemes: Additional Themes, Theme Components and Utilities for ‘ggplot2’. R package version 0.6.0. https://CRAN.R-project.org/package=hrbrthemes
  3. Dirk Eddelbuettel and Romain Francois (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. URL http://www.jstatsoft.org/v40/i08/
  4. Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
  5. Goodrich B, Gabry J, Ali I & Brilleman S. (2018). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.17.4. http://mc-stan.org/
  6. H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
    Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/
  7. Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2019). dplyr: A Grammar of Data Manipulation. R package version 0.8.3. https://CRAN.R-project.org/package=dplyr
  8. Makowski, D. & Lüdecke, D. (2019). The report package for R: Ensuring the use of best practices for results reporting. CRAN. Available from https://github.com/easystats/report.
# R

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×