Filter genes with low expression. This function makes the filter more permissive by applying it within conditions that correspond to metadata variables. The gene matrix is split by condition, and the counts per million (CPM) for each condition are computed by "sageseqr::simple_filter()".

filter_genes(
  clean_metadata,
  count_df,
  cpm_threshold,
  conditions_threshold,
  conditions = NULL
)

Arguments

clean_metadata

A data frame with sample identifiers as row names and variables coded as factor or numeric, as determined by "sageseqr::clean_covariates()".

count_df

A counts data frame with sample identifiers as column names and gene IDs as row names.

cpm_threshold

The minimum number of CPM allowed.

conditions_threshold

Percentage of samples that must reach the minimum CPM set by `cpm_threshold`.

conditions

Optional. Conditions used to bin gene counts. Each must correspond to a variable in `clean_metadata`.
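
The condition-wise filter described above can be illustrated with a minimal base R sketch. This is not the sageseqr implementation: `compute_cpm()` and `sketch_filter()` are hypothetical helpers, the toy data are invented, and two points are assumptions made here for illustration, namely that `conditions_threshold` is given as a fraction between 0 and 1 and that a gene is kept when it passes in at least one condition.

```r
# Toy counts (hypothetical): genes as rows, samples as columns.
# Each library sums to 1e6, so CPM equals the raw count here.
count_df <- data.frame(
  S1 = c(500000, 0, 100, 499900),
  S2 = c(450000, 1, 150, 549849),
  S3 = c(600000, 0,   0, 400000),
  S4 = c(550000, 0,   0, 450000),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Toy metadata (hypothetical): sample identifiers as row names.
clean_metadata <- data.frame(
  diagnosis = factor(c("case", "case", "control", "control")),
  row.names = c("S1", "S2", "S3", "S4")
)

# Counts per million: scale each sample (column) by its library size.
compute_cpm <- function(counts) {
  sweep(as.matrix(counts), 2, colSums(counts), "/") * 1e6
}

# Hypothetical stand-in for the filter, not the package function.
sketch_filter <- function(metadata, counts, cpm_threshold,
                          conditions_threshold, conditions = NULL) {
  cpm_mat <- compute_cpm(counts)
  # Without conditions, treat all samples as a single group.
  groups <- if (is.null(conditions)) {
    factor(rep("all", nrow(metadata)))
  } else {
    interaction(metadata[, conditions, drop = FALSE], drop = TRUE)
  }
  samples_by_group <- split(rownames(metadata), groups)
  # Per group: does the fraction of samples at or above the CPM
  # threshold reach conditions_threshold?
  pass <- vapply(
    samples_by_group,
    function(ids) {
      rowMeans(cpm_mat[, ids, drop = FALSE] >= cpm_threshold) >=
        conditions_threshold
    },
    logical(nrow(cpm_mat))
  )
  # Assumption: keep a gene if it passes in at least one group.
  counts[rowSums(pass) > 0, , drop = FALSE]
}

by_condition <- sketch_filter(clean_metadata, count_df,
                              cpm_threshold = 10,
                              conditions_threshold = 0.75,
                              conditions = "diagnosis")
pooled <- sketch_filter(clean_metadata, count_df,
                        cpm_threshold = 10,
                        conditions_threshold = 0.75)

rownames(by_condition)  # "geneA" "geneC" "geneD"
rownames(pooled)        # "geneA" "geneD"
```

In this toy data, geneC is expressed only in the "case" samples. It survives when samples are binned by `diagnosis` but is dropped when all samples are pooled, which is the sense in which filtering by condition is more permissive.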