Residuals of the best-fit linear regression model are computed for each observation. The returned counts matrix is adjusted for batch effects while preserving the effect of the predictor variable.

compute_residuals(
  clean_metadata,
  filtered_counts,
  dropped,
  cqn_counts = cqn_counts$E,
  primary_variable,
  random_effect = NULL,
  model_variables = NULL,
  is_num = NULL,
  num_var = NULL,
  cores = NULL
)

Arguments

clean_metadata

A data frame with sample identifiers as rownames and variables as factors or numeric as determined by "sageseqr::clean_covariates()".

filtered_counts

A counts data frame with genes removed that have low expression.

dropped

A vector of gene names to drop from filtered_counts because they were not CQN normalized.

cqn_counts

A counts data frame normalized by CQN.

primary_variable

Vector of variables that will be collapsed into a single fixed effect interaction term.

random_effect

A vector of variables to consider as random effects instead of fixed effects.

model_variables

Optional. Vector of variables to include in the linear (mixed) model. If not supplied, the model will include all variables in the metadata (clean_metadata).

is_num

Whether a numerical covariate should be used in an interaction with the primary variable(s). Default = NULL.

num_var

A numerical metadata column to use in an interaction with the primary variable(s). Default = NULL.

cores

An integer number of cores to use for the parallel backend (e.g., 4).
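
As an illustration of what "collapsed into a single fixed effect interaction term" can mean for primary_variable, the sketch below uses base R's interaction() on a small, made-up metadata table. The column names (diagnosis, sex), the "_" separator, and the new column name primary are assumptions for illustration only; the function's internal implementation may differ.

```r
# Hypothetical metadata: sample identifiers as rownames, variables as factors.
md <- data.frame(
  diagnosis = factor(c("AD", "AD", "CT", "CT")),
  sex       = factor(c("F", "M", "F", "M")),
  row.names = paste0("S", 1:4)
)

# Variables to collapse (illustrative choice).
primary_variable <- c("diagnosis", "sex")

# interaction() accepts a list of factors and returns one factor with a
# level for each combination; drop = TRUE removes unobserved combinations.
md$primary <- interaction(md[, primary_variable], sep = "_", drop = TRUE)

levels(md$primary)
```

Each sample now carries a single factor level such as AD_F, so the model can treat the combination of diagnosis and sex as one fixed effect.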

Details

Counts are normalized before linear modeling, and residuals are computed from the fitted model. A precision weight is assigned to each gene feature to estimate the mean-variance relationship. Counts normalized by conditional quantile normalization (CQN) are used in place of log2-normalized counts.
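
The idea of removing batch effects while preserving the effect of the predictor variable can be sketched with base R's lm() on one simulated gene. This is a minimal illustration, not the function's implementation: it ignores precision weights and random effects, and the variable names (diagnosis, batch) and simulated values are assumptions.

```r
set.seed(1)
n <- 12

# Hypothetical metadata: a primary variable and a batch covariate (balanced).
md <- data.frame(
  diagnosis = factor(rep(c("control", "case"), each = n / 2),
                     levels = c("control", "case")),
  batch     = factor(rep(c("b1", "b2", "b3"), times = n / 3))
)

# Simulated normalized expression for one gene:
# baseline + diagnosis effect + batch shift + noise.
batch_shift <- c(b1 = 0, b2 = 1.5, b3 = -1)
expr <- 5 + 2 * (md$diagnosis == "case") +
  batch_shift[as.character(md$batch)] + rnorm(n, sd = 0.2)

# Fit the full model, then take residuals; at this point every modeled
# effect (including diagnosis) has been removed.
fit <- lm(expr ~ diagnosis + batch, data = md)
res <- residuals(fit)

# Add the fitted diagnosis contribution back so only batch is adjusted out.
design <- model.matrix(fit)
keep <- grep("^diagnosis", colnames(design))
adjusted <- res + drop(design[, keep, drop = FALSE] %*% coef(fit)[keep])

# Group means: the diagnosis difference survives, batch means coincide.
tapply(adjusted, md$diagnosis, mean)
tapply(adjusted, md$batch, mean)
```

In this balanced toy design the difference between the case and control means of adjusted equals the fitted diagnosis coefficient, while the per-batch means are identical, which is the behavior the description above refers to.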