Type: Package
Title: Tidy Statistical Summaries for Exploratory Data Analysis
Version: 0.1.0
Description: Provides a tidy set of functions for summarising data, including descriptive statistics, frequency tables with normality testing, and group-wise significance testing. Designed for fast, readable, and easy exploration of both numeric and categorical data.
Maintainer: Kleanthis Koupidis <kleanthis.koupidis@gmail.com>
URL: https://github.com/kleanthisk10/tidySummaries
BugReports: https://github.com/kleanthisk10/tidySummaries/issues
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: magrittr, tidyr, dplyr, tibble, purrr, stats, crayon, rlang
RoxygenNote: 7.3.2
Suggests: ggplot2, testthat, knitr, rmarkdown
NeedsCompilation: no
Packaged: 2025-05-01 19:58:17 UTC; Akis
Author: Kleanthis Koupidis [aut, cre], Nikolaos Koupidis [aut]
Repository: CRAN
Date/Publication: 2025-05-05 10:10:02 UTC

Select Non-Numeric Columns

Description

Returns a tibble with only the non-numeric columns of the input, and optionally drops rows with NAs.

Usage

select_non_numeric_cols(dataset, remove_na = FALSE)

Arguments

dataset

A vector, matrix, data frame, or tibble.

remove_na

Logical. If TRUE, rows with any NA values will be dropped. Default is FALSE.

Value

A tibble with only non-numeric columns.

Examples

select_non_numeric_cols(iris)

df <- tibble::tibble(a = 1:6, b = c("x", "y", NA, NA, "z", NA))
select_non_numeric_cols(df, remove_na = TRUE)
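
For readers without the package installed, the behaviour described above can be approximated in base R. This is a minimal sketch, not the package's implementation: the helper name keep_non_numeric is hypothetical, it returns a plain data frame rather than a tibble, and it assumes that NA rows are judged on the selected columns only.

```r
# Hypothetical helper (not part of tidySummaries): keep the non-numeric
# columns and, if requested, drop rows that contain any NA in those columns.
keep_non_numeric <- function(dataset, remove_na = FALSE) {
  df <- as.data.frame(dataset, stringsAsFactors = FALSE)
  is_num <- vapply(df, is.numeric, logical(1))
  out <- df[, !is_num, drop = FALSE]
  # Assumption: NA filtering is applied after the column selection.
  if (remove_na && ncol(out) > 0) {
    out <- out[stats::complete.cases(out), , drop = FALSE]
  }
  out
}

df <- data.frame(a = 1:6, b = c("x", "y", NA, NA, "z", NA))
res <- keep_non_numeric(df, remove_na = TRUE)
res$b
```

Flipping the `!is_num` condition to `is_num` gives the numeric-column counterpart.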

Select Numeric Columns

Description

Returns a tibble with only the numeric columns of the input, and optionally drops rows with NAs.

Usage

select_numeric_cols(dataset, remove_na = FALSE)

Arguments

dataset

A vector, matrix, data frame, or tibble.

remove_na

Logical. If TRUE, rows with any NA values will be dropped. Default is FALSE.

Value

A tibble with only numeric columns.

Examples

select_numeric_cols(iris)

Multiple Pattern-Replacement Substitutions

Description

Applies multiple regular expression substitutions to a character vector or a specific column of a data frame. Replacements are performed sequentially, in the order given.

Usage

str_replace_many(x, pattern, replacement, column = NULL, ...)

Arguments

x

A character vector or a data frame containing the text to modify.

pattern

A character vector of regular expressions to match.

replacement

A character vector of replacement strings, same length as 'pattern'.

column

Optional. If 'x' is a data frame, the name of the character column to apply the replacements to.

...

Additional arguments passed to 'gsub()', such as 'ignore.case = TRUE'.

Value

- If 'x' is a character vector, returns a modified character vector.
- If 'x' is a data frame, returns the data frame with the specified column modified.

Examples

# Example on a character vector
text <- c("The cat and the dog", "dog runs fast", "no animals")
str_replace_many(text, pattern = c("cat", "dog"), replacement = c("lion", "wolf"))

# Example on a data frame
library(tibble)
df <- tibble(id = 1:3, text = c("The cat sleeps", "dog runs fast", "no pets"))
str_replace_many(df, pattern = c("cat", "dog"), replacement = c("lion", "wolf"), column = "text")
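
Because the substitutions run sequentially, a later pattern can match text produced by an earlier replacement. The sketch below illustrates this with a loop over gsub(); the helper name replace_many is hypothetical and the loop is an assumption about how such a function could work, not the package's source.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn with gsub().
replace_many <- function(x, pattern, replacement, ...) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

text <- c("The cat and the dog", "dog runs fast", "no animals")
out <- replace_many(text, c("cat", "dog"), c("lion", "wolf"))

# Order matters: "cat" becomes "dog" first, and the second pair then
# rewrites that "dog" to "wolf".
chained <- replace_many("cat", c("cat", "dog"), c("dog", "wolf"))
```

If chained rewrites are not wanted, order the patterns so that no replacement string is matched by a later pattern.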


Summarise Boxplot Statistics with Outliers

Description

Computes the five-number summary (min, Q1, median, Q3, max), interquartile range (IQR), range, and outliers for each numeric variable in a data frame or a numeric vector.

Usage

summarise_boxplot_stats(x)

Arguments

x

A numeric vector, matrix, data frame, or tibble.

Value

A tibble with columns: 'variable', 'min', 'q1', 'median', 'q3', 'max', 'iqr', 'range', 'n_outliers', 'outliers'.

Examples

summarise_boxplot_stats(iris)
summarise_boxplot_stats(iris$Sepal.Width)
summarise_boxplot_stats(data.frame(a = c(rnorm(98), 10, NA)))
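
The help page does not state which outlier rule is used. A common convention is Tukey's rule, which flags values more than 1.5 times the IQR beyond the quartiles. The sketch below applies that rule to a single vector; the helper name box_stats, the use of quantile() with its default type, and the 1.5 multiplier are all assumptions.

```r
# Hypothetical helper: five-number summary plus Tukey-style outliers.
# Assumption: quartiles from stats::quantile() (default type 7), fences at
# 1.5 * IQR; the package may use a different quartile definition or rule.
box_stats <- function(v) {
  v <- v[!is.na(v)]
  q <- stats::quantile(v, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[3] + 1.5 * iqr
  out <- v[v < lower | v > upper]
  list(min = min(v), q1 = q[1], median = q[2], q3 = q[3], max = max(v),
       iqr = iqr, range = max(v) - min(v),
       n_outliers = length(out), outliers = out)
}

s <- box_stats(c(1, 2, 3, 4, 5, 100, NA))
s$outliers
```

Here the quartiles are 2.25 and 4.75, so the upper fence is 8.5 and only the value 100 is flagged.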


Summarise Coefficient of Variation

Description

Calculates the coefficient of variation (CV = sd / mean) for numeric vectors, matrices, data frames, or tibbles.

Usage

summarise_coef_of_variation(x)

Arguments

x

A numeric vector, matrix, data frame, or tibble.

Value

A tibble:
- If input has one numeric column or is a numeric vector: a tibble with a single value.
- If input has multiple numeric columns: a tibble with variable names and coefficient of variation values.

Examples

summarise_coef_of_variation(iris)
summarise_coef_of_variation(iris$Petal.Length)
summarise_coef_of_variation(data.frame(a = rnorm(100), b = runif(100)))
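
The formula CV = sd / mean can be checked by hand in base R. This is a sketch only: the helper name cv and the choice to ignore NAs are assumptions, and the result is a data frame rather than a tibble.

```r
# Hypothetical helper: coefficient of variation of one numeric vector.
# Assumption: missing values are ignored.
cv <- function(v) stats::sd(v, na.rm = TRUE) / mean(v, na.rm = TRUE)

# One row per numeric column of iris.
num <- iris[vapply(iris, is.numeric, logical(1))]
cv_tbl <- data.frame(variable = names(num),
                     cv = vapply(num, cv, numeric(1)),
                     row.names = NULL)

cv(c(2, 4, 6))  # sd = 2, mean = 4
```

The ratio becomes unstable when the mean is close to zero, so it is most meaningful for strictly positive measurements.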

Summarise Correlation Matrix with Optional Significance Tests

Description

Computes correlations between numeric variables of a data frame, or between two vectors. Optionally tests statistical significance and reports p-values.

Usage

summarise_correlation(
  x,
  y = NULL,
  method = c("pearson", "kendall", "spearman"),
  cor_test = FALSE
)

Arguments

x

A numeric vector, matrix, data frame, or tibble.

y

Optional. A second numeric vector, matrix, or data frame (same dimensions as 'x').

method

Character. One of "pearson" (default), "kendall", or "spearman".

cor_test

Logical. If TRUE, uses 'cor.test()' and includes p-values. If FALSE, uses 'cor()' only.

Value

A tibble with variables, correlations, and optionally p-values. Significant results (p < 0.05) are printed in red in the console.

Examples

summarise_correlation(iris)
summarise_correlation(iris$Sepal.Length, iris$Petal.Length, cor_test = TRUE)
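
The two base functions named in the 'cor_test' argument can be called directly to see what each mode contributes. The column names in the result below are illustrative assumptions, not the package's output format.

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

# cor() returns only the coefficient.
r <- stats::cor(x, y, method = "pearson")

# cor.test() returns the same coefficient plus a p-value.
ct <- stats::cor.test(x, y, method = "pearson")

# Illustrative one-row summary (column names are assumptions).
res <- data.frame(var1 = "Sepal.Length", var2 = "Petal.Length",
                  correlation = unname(ct$estimate),
                  p_value = ct$p.value)
significant <- res$p_value < 0.05
```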


Summarise Frequency Table

Description

Computes the frequency and relative frequency (or percentage) of factor or character variables in a data frame or vector.

Usage

summarise_frequency(
  data,
  select = NULL,
  as_percent = FALSE,
  sort_by = NULL,
  top_n = Inf
)

Arguments

data

A character/factor vector, or a data frame/tibble.

select

Optional. One or more variable names to compute frequencies for. If NULL, all factor/character columns are used.

as_percent

Logical. If TRUE, relative frequencies are returned as percentages (%). Default is FALSE (proportions).

sort_by

Optional. One of "N" (sort by frequency), "group" (sort alphabetically by group), or "%N" (sort by relative frequency; available when as_percent = TRUE). Default is NULL (no sorting).

top_n

Integer. Show only the top N values. Default is Inf (show all values).

Value

A tibble with the following columns:

variable

The name of the variable.

group

The group/category values of the variable.

N

The count (frequency) of each group.

%N

The proportion or percentage of each group.

Examples

summarise_frequency(iris, select = "Species")
summarise_frequency(iris, as_percent = TRUE, sort_by = "N", top_n = 2)
summarise_frequency(data.frame(group = c("A", "A", "B", "C", "A")), as_percent = TRUE)
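
The N and %N columns correspond to counts and relative frequencies, which base R's table() provides. The sketch below handles a single vector; the helper name freq_table and the descending sort are assumptions, and the 'variable' column and 'top_n' filtering are omitted.

```r
# Hypothetical helper: counts and relative frequencies for one vector.
freq_table <- function(v, as_percent = FALSE) {
  counts <- table(v)
  rel <- as.numeric(counts) / sum(counts)
  if (as_percent) rel <- 100 * rel
  data.frame(group = names(counts),
             N = as.integer(counts),
             `%N` = rel,
             check.names = FALSE)  # keep the literal column name "%N"
}

ft <- freq_table(c("A", "A", "B", "C", "A"), as_percent = TRUE)
# Assumption: sorting by "N" means most frequent first.
ft <- ft[order(-ft$N), ]
```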


Summarise Grouped Statistics

Description

Groups a data frame by one or more variables and summarizes the selected numeric columns using basic statistic functions. Handles missing values by replacement with zero or removal of rows.

Usage

summarise_group_stats(
  df,
  group_var,
  values,
  m_functions = c("mean", "sd", "length"),
  replace_na = FALSE,
  remove_na = FALSE
)

Arguments

df

A data frame or tibble containing the data.

group_var

A character vector of column names to group by.

values

A character vector of numeric column names to summarize.

m_functions

A character vector of functions to apply (e.g., "mean", "sd", "length"). Default is c("mean", "sd", "length").

replace_na

Logical. If TRUE, missing values in numeric columns are replaced with 0. Default is FALSE.

remove_na

Logical. If TRUE, rows with missing values in group or value columns are removed. Default is FALSE.

Value

A tibble with grouped and summarized results.

Examples

summarise_group_stats(iris, group_var = "Species",
                      values = c("Sepal.Length", "Petal.Width"))
summarise_group_stats(mtcars,
                      group_var = c("cyl", "gear"),
                      values = c("mpg", "hp"), remove_na = TRUE)
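
As a rough base R equivalent, each function named in 'm_functions' can be applied per group with aggregate() and the results merged. This is a sketch under stated assumptions: the helper name group_stats and the "column_function" naming of the output columns are hypothetical, and the NA handling options are not reproduced.

```r
# Hypothetical helper: apply each named function to each value column by group.
group_stats <- function(df, group_var, values,
                        m_functions = c("mean", "sd", "length")) {
  parts <- lapply(m_functions, function(fn) {
    res <- stats::aggregate(df[values], by = df[group_var],
                            FUN = match.fun(fn))
    # Assumed naming scheme: <column>_<function>.
    names(res)[-seq_along(group_var)] <- paste(values, fn, sep = "_")
    res
  })
  # Join the per-function results on the grouping columns.
  Reduce(function(a, b) merge(a, b, by = group_var), parts)
}

gs <- group_stats(iris, "Species", c("Sepal.Length", "Petal.Width"))
```

The result has one row per group and one column per combination of value column and function.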


Summarise Kurtosis

Description

Calculates the kurtosis (default: excess kurtosis) of numeric vectors, matrices, data frames, or tibbles. Supports both the "standard" and "unbiased" methods and optionally returns raw kurtosis.

Usage

summarise_kurtosis(x, method = c("standard", "unbiased"), excess = TRUE)

Arguments

x

A numeric vector, matrix, data frame, or tibble.

method

Character. Method for kurtosis calculation: '"standard"' (default) or '"unbiased"'.

excess

Logical. If TRUE (default), returns excess kurtosis (raw kurtosis minus 3); if FALSE, returns raw kurtosis.

Value

A tibble:
- If input has one numeric column (or is a vector), a single-row tibble.
- If input has multiple numeric columns, a tibble with variable names and kurtosis values.

Examples

summarise_kurtosis(iris)
summarise_kurtosis(iris, method = "unbiased")
summarise_kurtosis(iris, excess = FALSE)  # Raw kurtosis
summarise_kurtosis(iris$Sepal.Width)
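
The help page does not give the formulas behind the two methods. The sketch below assumes that "standard" is the moment-based estimator m4 / m2^2 and that "unbiased" is the usual small-sample correction of excess kurtosis (the one used by many spreadsheet programs); both are assumptions, and the helper name kurt is hypothetical.

```r
# Hypothetical helper: kurtosis of one numeric vector.
kurt <- function(v, method = c("standard", "unbiased"), excess = TRUE) {
  method <- match.arg(method)
  v <- v[!is.na(v)]
  n <- length(v)
  d <- v - mean(v)
  # Moment-based excess kurtosis: m4 / m2^2 - 3.
  g2 <- (sum(d^4) / n) / (sum(d^2) / n)^2 - 3
  if (method == "unbiased") {
    # Assumed bias-corrected estimator (requires n > 3).
    g2 <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
  }
  if (excess) g2 else g2 + 3
}

kurt(1:5)                       # moment-based, excess
kurt(1:5, excess = FALSE)       # moment-based, raw
kurt(1:5, method = "unbiased")  # bias-corrected, excess
```

For the values 1 to 5, the fourth central moment is 6.8 and the squared variance is 4, giving a raw kurtosis of 1.7 and an excess kurtosis of -1.3.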


Summarise Skewness

Description

Calculates skewness for numeric vectors, matrices, data frames, or tibbles using Pearson’s moment coefficient.

Usage

summarise_skewness(x)

Arguments

x

A numeric vector, matrix, data frame, or tibble.

Value

A tibble:
- If input has one numeric column or is a numeric vector: a tibble with a single value.
- If input has multiple numeric columns: a tibble with variable names and skewness values.

Examples

summarise_skewness(iris)
summarise_skewness(as.vector(iris$Sepal.Width))
summarise_skewness(data.frame(a = rnorm(100), b = rgamma(100, 2)))
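
Pearson's moment coefficient of skewness is the third central moment divided by the variance raised to the power 3/2. The sketch below uses the population form (dividing by n); whether the package applies a sample-size correction is not stated, so that choice and the helper name skew are assumptions.

```r
# Hypothetical helper: moment coefficient of skewness, m3 / m2^(3/2).
# Assumption: population moments (divide by n) and NAs ignored.
skew <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew(c(1, 2, 3))     # symmetric data
skew(c(0, 0, 0, 4))  # long right tail, positive skewness
```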

Summarise Descriptive Statistics with Optional Testing

Description

Computes descriptive statistics for numeric data. Optionally groups by a variable and includes Shapiro-Wilk and group significance testing. Can color console output for significant differences.

Usage

summarise_statistics(
  data,
  group_var = NULL,
  normality_test = FALSE,
  group_test = FALSE,
  show_colors = TRUE
)

Arguments

data

A numeric vector, matrix, or data frame.

group_var

Optional. A character name of a grouping variable.

normality_test

Logical. If TRUE, performs Shapiro-Wilk test for normality.

group_test

Logical. If TRUE and 'group_var' is set, performs group-wise significance tests (t-test, ANOVA, etc.).

show_colors

Logical. If TRUE and 'group_test' is TRUE, prints colored console output for significant group results. Default is TRUE.

Value

A tibble with descriptive statistics and optional test results per numeric variable.

Examples

summarise_statistics(iris, group_var = "Species", group_test = TRUE)
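
The tests named in the arguments are available in the stats package and can be run directly on one variable. This sketch shows a Shapiro-Wilk test and a one-way ANOVA across the three species; how the package chooses between a t-test, ANOVA, or another test is not documented here, so using aov() is an assumption.

```r
# Normality of one variable (Shapiro-Wilk).
sw <- stats::shapiro.test(iris$Sepal.Length)

# Group comparison for the same variable; with three groups a one-way
# ANOVA is a common choice (assumption, see lead-in).
fit <- stats::aov(Sepal.Length ~ Species, data = iris)
p_group <- summary(fit)[[1]][["Pr(>F)"]][1]

# Illustrative one-row summary (column names are assumptions).
res <- data.frame(variable = "Sepal.Length",
                  shapiro_p = sw$p.value,
                  group_p = p_group)
```

With only two groups, stats::t.test() would be the analogous call.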