Get GC content, gene IDs, gene symbols, gene biotypes, gene lengths and other metadata from Ensembl BioMart. The returned object has gene IDs as rownames.
get_biomart(
  count_df,
  synid,
  version,
  host,
  filters,
  organism,
  custom = FALSE,
  gtfID = NULL,
  gtfVersion = NULL,
  fastaID = NULL,
  fastaVersion = NULL,
  cores = NULL,
  isexon = FALSE
)
A counts data frame with sample identifiers as column names and gene IDs as rownames.
A character vector with a Synapse ID.
Optional. A numeric vector with the Synapse file version number.
An optional character vector specifying the Ensembl release version. Setting it is highly recommended for a reproducible workflow (see "biomaRt::listEnsemblArchives()").
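As a rough illustration of pinning a release, the sketch below looks up the archive URL for a given Ensembl version in the table returned by `biomaRt::listEnsemblArchives()`. The helper name `pick_archive_url` and the release number 99 are assumptions made for this example; they are not part of `get_biomart()`, and how the function consumes `host` internally is not shown here.

```r
# Hypothetical helper: return the archive URL for one Ensembl release.
# `archives` is expected to look like the output of
# biomaRt::listEnsemblArchives(), i.e. to have `version` and `url` columns.
pick_archive_url <- function(archives, release) {
  hit <- archives$url[archives$version == as.character(release)]
  if (length(hit) != 1) {
    stop("Expected exactly one archive for release ", release,
         ", found ", length(hit))
  }
  hit
}

# Live lookup (needs network access and the biomaRt package).
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  archives <- biomaRt::listEnsemblArchives()
  host <- pick_archive_url(archives, 99)  # 99 is an arbitrary example release
  print(host)
}
```

Fixing the host to one archive URL keeps the annotation stable when Ensembl publishes a new release.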
A character vector listing biomaRt query filters (for a list of filters, see "biomaRt::listFilters()").
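One way to catch a mistyped filter before a query is to compare the requested names with the table from `biomaRt::listFilters()`. The sketch below assumes a hypothetical helper, `check_filters`, and uses the human gene dataset only as an example; neither is taken from `get_biomart()` itself.

```r
# Hypothetical helper: stop early if a requested filter is not offered.
# `available` is expected to look like the output of biomaRt::listFilters(),
# i.e. a data frame with a `name` column.
check_filters <- function(requested, available) {
  unknown <- setdiff(requested, available$name)
  if (length(unknown) > 0) {
    stop("Unknown biomaRt filter(s): ", paste(unknown, collapse = ", "))
  }
  invisible(requested)
}

# Live check (needs network access and the biomaRt package).
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  check_filters("ensembl_gene_id", biomaRt::listFilters(mart))
}
```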
A character vector of the organism name. Partial strings are accepted. For example, "hsa" will match "hsapiens_gene_ensembl".
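The partial matching described above can be pictured as a substring search over Ensembl dataset names. The sketch below is an assumption about how such matching might work, not the code used inside `get_biomart()`; `match_dataset` is a hypothetical helper, and the dataset vector is a short hand-written sample rather than the full output of `biomaRt::listDatasets()`.

```r
# Hypothetical helper: resolve a partial organism string to one dataset name.
match_dataset <- function(organism, datasets) {
  # fixed = TRUE treats the string literally, not as a regular expression
  hits <- grep(organism, datasets, fixed = TRUE, value = TRUE)
  if (length(hits) != 1) {
    stop("'", organism, "' matched ", length(hits),
         " datasets; supply a more specific string")
  }
  hits
}

# Hand-written sample of Ensembl dataset names
datasets <- c("hsapiens_gene_ensembl",
              "mmusculus_gene_ensembl",
              "rnorvegicus_gene_ensembl")

match_dataset("hsa", datasets)  # "hsapiens_gene_ensembl"
```

Requiring exactly one hit guards against a string that is too short and would match several organisms.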
Defaults to FALSE. If TRUE, the GC content, gene length, and gene name are calculated from a user-specified FASTA and GTF file.
Defaults to NULL. A character vector with a Synapse ID corresponding to a GTF-formatted gene annotation file.
Optional. A numeric vector with the gene GTF Synapse file version number.
Defaults to NULL. A character vector with a Synapse ID corresponding to a FASTA-formatted genome annotation file.
Optional. A numeric vector with the genome FASTA Synapse file version number.
An integer number of cores to use for the parallel backend (e.g. 4).
Defaults to FALSE. If TRUE, the GC content and gene length calculations consider only exonic regions and omit intronic regions.
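To make the exon-only option concrete, here is a minimal base R sketch of how gene length and GC content differ when introns are left out. It is a toy under stated assumptions: the function names, the 1-based inclusive coordinates, and the short sequence are invented for illustration, and the package itself may well compute these values differently from the GTF and FASTA files.

```r
# Merge overlapping or adjacent exon intervals (1-based, inclusive)
# so that bases shared by several exons are counted once.
merge_intervals <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  out_s <- start[1]
  out_e <- end[1]
  for (i in seq_along(start)[-1]) {
    n <- length(out_e)
    if (start[i] <= out_e[n] + 1) {
      out_e[n] <- max(out_e[n], end[i])   # extend the current interval
    } else {
      out_s <- c(out_s, start[i])         # start a new interval
      out_e <- c(out_e, end[i])
    }
  }
  data.frame(start = out_s, end = out_e)
}

# Length and GC fraction over the merged exonic bases of one gene.
exonic_stats <- function(gene_seq, exon_start, exon_end) {
  exons <- merge_intervals(exon_start, exon_end)
  exon_seq <- paste0(substring(gene_seq, exons$start, exons$end), collapse = "")
  bases <- strsplit(toupper(exon_seq), "")[[1]]
  list(length = length(bases), gc = mean(bases %in% c("G", "C")))
}

gene_seq <- "GGCCAAAATTTTGCGC"   # toy gene: two GC-rich exons around an AT-rich intron

# Whole gene, introns included: length 16, GC 0.5
exonic_stats(gene_seq, 1, 16)

# Exons only (the second exon overlaps the first): length 8, GC 1
exonic_stats(gene_seq, c(1, 3, 13), c(4, 4, 16))
```

Which length is appropriate depends on the downstream normalization; exonic length is the usual choice when counts come from spliced transcripts.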