Identify principal components (PCs) of normalized gene counts and correlate these PCs with covariates of interest. This function wraps `correlate_and_plot()` to visualize, as a heatmap, the relationship between PCs and covariates that meet a false discovery rate (FDR) threshold, and returns a list that includes the significant covariates.

run_pca_and_plot_correlations(
  normalized_counts,
  clean_metadata,
  scaled = TRUE,
  percent_p_value_cutoff = 1,
  correlation_type = "pearson",
  plot_covariates_vs_pca = TRUE,
  maximum_fdr = 0.1
)

Arguments

normalized_counts

A counts data frame normalized by CQN, TMM, or another preferred method, with genes as rownames.

clean_metadata

A data frame with sample identifiers as rownames and variables coded as factor or numeric, as determined by "sageseqr::clean_covariates()".

scaled

Defaults to TRUE. If TRUE, variables are scaled to have unit variance before the analysis takes place.

percent_p_value_cutoff

The p-value threshold, expressed as a percentage. Defaults to 1, which corresponds to a p-value of 0.01.

correlation_type

Allowed values are "pearson", "spearman", and "kendall". Defaults to "pearson". See the method argument of "psych::corr.test()".

plot_covariates_vs_pca

Defaults to TRUE. If FALSE, no plot is returned.

maximum_fdr

Maximum allowable false discovery rate (FDR). Defaults to 0.1.
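
As a rough illustration of how the two thresholds above might be read, the base R sketch below converts a percent cutoff to a proportion and applies a Benjamini-Hochberg adjustment with `stats::p.adjust()`. The p-values and covariate names are made up, and the choice of adjustment method is an assumption for illustration, not a statement about how this function computes FDR internally.

```r
# Hypothetical raw p-values from PC-covariate correlation tests.
p_values <- c(batch = 0.0004, sex = 0.03, age = 0.2, rin = 0.008)

# A percent cutoff of 1 corresponds to a proportion of 0.01.
percent_p_value_cutoff <- 1
p_cutoff <- percent_p_value_cutoff / 100

# Benjamini-Hochberg adjustment (assumed method, for illustration only).
fdr <- p.adjust(p_values, method = "BH")

# Keep the covariates whose adjusted p-value is within the FDR limit.
maximum_fdr <- 0.1
passes_fdr <- names(fdr)[fdr <= maximum_fdr]
print(passes_fdr)
```

Here `age` is the only covariate whose adjusted p-value exceeds `maximum_fdr`.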

Value

A list.

  • significant_covariates - A vector of covariates whose correlation p-value meets the FDR threshold.

  • pc_results - A customized heatmap of the correlations between significant covariates and PCs.

  • effects_significant_vars - A vector of correlation values.
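
The general idea behind this function (PCA on samples, then correlating PCs with covariates and filtering by FDR) can be sketched in base R alone. The example below is a minimal sketch on simulated data, using `stats::prcomp()`, `stats::cor.test()`, and `stats::p.adjust()`. It is not the sageseqr implementation: the data, the number of PCs kept, the numeric coding of factors, and the Benjamini-Hochberg adjustment are all assumptions made for illustration.

```r
set.seed(1)

# Simulated normalized counts: 50 genes (rows) by 12 samples (columns).
n_genes <- 50
n_samples <- 12
samples <- paste0("sample", seq_len(n_samples))
counts <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), samples)
)

# Simulated metadata with sample identifiers as rownames.
metadata <- data.frame(
  age = rnorm(n_samples, mean = 60, sd = 10),
  batch = factor(rep(c("a", "b"), each = n_samples / 2)),
  row.names = samples
)

# Inject a batch effect so that at least one covariate tracks a PC.
in_b <- metadata$batch == "b"
counts[, in_b] <- counts[, in_b] + 2

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(counts), center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:3]

# Code factors as numeric so a simple correlation test can be applied.
covariates <- data.frame(lapply(metadata, as.numeric), row.names = samples)

# Correlate every covariate with every PC and collect the p-values.
p_values <- sapply(colnames(pcs), function(pc) {
  sapply(covariates, function(x) {
    cor.test(x, pcs[, pc], method = "pearson")$p.value
  })
})

# Adjust across all tests (assumed Benjamini-Hochberg) and filter by FDR.
fdr <- matrix(
  p.adjust(p_values, method = "BH"),
  nrow = nrow(p_values),
  dimnames = dimnames(p_values)
)
significant <- rownames(fdr)[apply(fdr <= 0.1, 1, any)]
print(significant)
```

In this sketch `p_values` and `fdr` are covariate-by-PC matrices, which is the same shape of information a PC-covariate heatmap would display.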