Get GC content, gene IDs, gene symbols, gene biotypes, gene lengths and other metadata from Ensembl BioMart. The returned object has gene IDs as rownames.

get_biomart(
  count_df,
  synid,
  version,
  host,
  filters,
  organism,
  custom = FALSE,
  gtfID = NULL,
  gtfVersion = NULL,
  fastaID = NULL,
  fastaVersion = NULL,
  cores = NULL,
  isexon = FALSE
)

Arguments

count_df

A counts data frame with sample identifiers as column names and gene IDs as rownames.
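As a minimal sketch of the expected shape, the base R snippet below builds a small counts data frame. The sample names and count values are made up for illustration; the rownames are Ensembl gene IDs.

```r
# Toy counts table: one column per sample, one row per gene.
# Sample names and counts are illustrative assumptions, not real data.
count_df <- data.frame(
  sample_A = c(10L, 0L, 250L),
  sample_B = c(12L, 3L, 198L),
  row.names = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618")
)

# Gene IDs live in the rownames, sample identifiers in the column names.
gene_ids <- rownames(count_df)
sample_ids <- colnames(count_df)
print(count_df)
```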

synid

A character vector with a Synapse ID.

version

Optional. A numeric vector with the Synapse file version number.

host

Optional. A character vector specifying the Ensembl release version. Setting this is highly recommended for a reproducible workflow. (For available releases see "biomaRt::listEnsemblArchives()")

filters

A character vector listing biomaRt query filters. (For a list of filters see "biomaRt::listFilters()")

organism

A character vector of the organism name. This argument takes partial strings. For example, "hsa" will match "hsapiens_gene_ensembl".
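The partial matching can be pictured with base R's grep(). This is only a sketch of the idea, not necessarily how get_biomart resolves the name internally. The helper match_organism() is hypothetical, and the dataset vector is a small hand-picked subset of the names that "biomaRt::listDatasets()" would return.

```r
# Hand-picked subset of Ensembl dataset names (assumption: illustration only).
datasets <- c(
  "hsapiens_gene_ensembl",
  "mmusculus_gene_ensembl",
  "rnorvegicus_gene_ensembl"
)

# Hypothetical helper: return every dataset name containing the partial string.
match_organism <- function(organism, datasets) {
  grep(organism, datasets, fixed = TRUE, value = TRUE)
}

matched <- match_organism("hsa", datasets)
print(matched)
```

A partial string that matches more than one dataset would be ambiguous, so a longer prefix is the safer choice.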

custom

Defaults to FALSE. If TRUE, GC content, gene length and gene name are calculated from user-specified FASTA and GTF files.

gtfID

Defaults to NULL. A character vector with a Synapse ID corresponding to a GTF-formatted gene annotation file.

gtfVersion

Optional. A numeric vector with the gene GTF Synapse file version number.

fastaID

Defaults to NULL. A character vector with a Synapse ID corresponding to a FASTA-formatted genome annotation file.

fastaVersion

Optional. A numeric vector with the genome FASTA Synapse file version number.

cores

An integer number of cores to use in the parallel backend (e.g., 4).

isexon

Defaults to FALSE. If TRUE, the GC content and gene length calculations consider only exonic regions and omit intronic regions.
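To make the difference concrete, the base R sketch below computes length and GC fraction for a toy sequence twice: once over the whole gene and once over exons only. The sequence, the exon coordinates and the helper gc_fraction() are all made up for illustration; this is not the package's implementation.

```r
# Toy gene sequence and exon coordinates (assumptions: made-up data,
# coordinates are 1-based and inclusive).
gene_seq <- "ATGCGCGTAAATTTGGCC"
exons <- data.frame(start = c(1, 13), end = c(6, 18))

# Hypothetical helper: fraction of G and C bases in a sequence string.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Whole gene: exons and introns together.
gene_length <- nchar(gene_seq)
gene_gc <- gc_fraction(gene_seq)

# Exons only: cut out each exon, join them, then measure.
exon_seq <- paste0(substring(gene_seq, exons$start, exons$end), collapse = "")
exon_length <- nchar(exon_seq)
exon_gc <- gc_fraction(exon_seq)

print(c(gene_length = gene_length, exon_length = exon_length))
print(c(gene_gc = gene_gc, exon_gc = exon_gc))
```

In this toy case the AT-rich intron lowers the whole-gene GC fraction, so the exon-only values are shorter and more GC-rich.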