# Installation
remotes::install_github("Sage-Bionetworks/sageseqr")
The `sageseqr` package integrates the `targets` R package, the `config` package for R, and Synapse. `targets` tracks dependency relationships in the workflow and only updates data when it has changed. A `config` file allows inputs and parameters to be explicitly defined in one location. Synapse is a data repository that allows sensitive data to be stored and shared responsibly.
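To illustrate how a `config` file keeps inputs and parameters in one place, here is a minimal sketch that writes a small YAML file and reads it back with `config::get()`. The keys (`counts_id`, `metadata_id`, `outlier_sd_threshold`) and the Synapse IDs are hypothetical placeholders, not the fields that `sageseqr` actually expects.

```r
# Assumes the config package is installed. It is called with `config::`
# rather than attached, because attaching it masks base::get().

# Hypothetical config file: these keys are illustrative only and are not
# taken from the sageseqr documentation.
cfg_path <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  counts_id: syn0000000       # placeholder Synapse ID",
  "  metadata_id: syn0000001     # placeholder Synapse ID",
  "  outlier_sd_threshold: 3"
), cfg_path)

# config::get() returns the values of the active configuration ("default"
# unless the R_CONFIG_ACTIVE environment variable says otherwise).
cfg <- config::get(file = cfg_path)

cfg$counts_id
cfg$outlier_sd_threshold
```

Because every parameter is read from the same file, changing a threshold or an input ID means editing one line rather than hunting through the workflow code.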
The workflow takes RNA-seq gene counts and sample metadata as inputs, normalizes counts by conditional quantile normalization (CQN), removes outliers based on a user-defined threshold, empirically selects meaningful covariates, and returns differential expression analysis results. The data is also visualized in several ways to help you understand meaningful trends. The visualizations include a heatmap identifying highly correlated covariates, a sample-specific X and Y marker gene check, boxplots visualizing the distribution of continuous variables, and a principal component analysis (PCA) to visualize sample distribution.
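The outlier step can be pictured with a small base R sketch. It assumes that an outlier is a sample whose score on the first or second principal component lies more than a chosen number of standard deviations from the mean. That rule, the simulated data, and the name `z_threshold` are assumptions made for illustration, and the package's own criterion may differ.

```r
set.seed(1)

# Toy log-expression matrix: 200 genes (rows) x 20 samples (columns).
expr <- matrix(
  rnorm(200 * 20),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:20))
)
# Shift one sample so that it stands apart from the rest.
expr[, "sample20"] <- expr[, "sample20"] + 3

# PCA with samples as observations (hence the transpose).
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Hypothetical user-defined threshold, in standard deviations from the mean.
z_threshold <- 3

# Standardize each PC and flag samples that exceed the threshold on PC1 or PC2.
z <- scale(scores)
is_outlier <- apply(abs(z) > z_threshold, 1, any)

names(which(is_outlier))
```

Lowering `z_threshold` flags more samples. Which value is appropriate depends on the data set, which is why it is left to the user.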
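The series of steps that make up the workflow are called targets. The target objects are stored in a cache and can be either read or loaded into your environment with the `targets` functions `tar_read` or `tar_load`. Note that the `show_source = TRUE` option for visualizing a target's source code belongs to the `drake` functions `loadd` and `readd`; `tar_read` and `tar_load` do not accept it. With `targets`, the command behind each target can be listed with `tar_manifest()`.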
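The difference between reading and loading a target can be shown on a toy pipeline. This sketch assumes the `targets` package is installed. The targets `raw_counts` and `kept_counts` are stand-ins invented for the example and are not part of the `sageseqr` workflow.

```r
library(targets)  # assumes the targets package is installed

# tar_dir() runs the code in a temporary directory, so nothing is written
# to the current project.
tar_dir({
  # Toy pipeline; these targets are stand-ins, not the sageseqr targets.
  tar_script({
    list(
      tar_target(raw_counts, c(geneA = 10, geneB = 0, geneC = 25)),
      tar_target(kept_counts, raw_counts[raw_counts > 0])
    )
  }, ask = FALSE)

  tar_make()                     # build the targets (up-to-date ones are skipped)

  kept <- tar_read(kept_counts)  # returns the stored value
  tar_load(raw_counts)           # assigns `raw_counts` in the calling environment

  print(kept)
  print(raw_counts)
})
```

`tar_read()` is convenient when you want to assign the value to a name of your choice, while `tar_load()` recreates the object under its target name.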
Importantly, running `clean` will remove the data stored as targets (but the data is never completely gone!). You may remove specific targets by name by passing them to the `tar_delete()` function, whereas `tar_destroy()` removes the data store itself.
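As a hedged sketch of removing stored data with `targets`, the toy pipeline below deletes the stored value of one target with `tar_delete()` and then removes the whole data store with `tar_destroy()`. The targets `a` and `b` are invented for the example, and the file checks assume the default local store at `_targets/objects/`.

```r
library(targets)  # assumes the targets package is installed

tar_dir({
  # Toy pipeline with two stand-in targets.
  tar_script({
    list(
      tar_target(a, 1),
      tar_target(b, a + 1)
    )
  }, ask = FALSE)
  tar_make()

  # With the default settings, each value is stored under _targets/objects/.
  b_stored_before <- file.exists(file.path("_targets", "objects", "b"))

  # Delete the stored value of `b` only; `a` is left in place.
  tar_delete(any_of("b"))
  b_stored_after <- file.exists(file.path("_targets", "objects", "b"))
  a_stored <- file.exists(file.path("_targets", "objects", "a"))

  # Remove the entire data store without an interactive prompt.
  tar_destroy(ask = FALSE)
  store_left <- dir.exists("_targets")
})
```

A deleted target is rebuilt the next time `tar_make()` runs, so deleting is mainly a way to free space or to force a specific step to be recomputed.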
The targets are called by the `targets` function `tar_make()` and are:
Raw data:
- `import_metadata` - imports the raw metadata directly from Synapse.
- `import_counts` - imports the raw counts directly from Synapse.
- `biomart_results` - the complete list of genes with biomaRt annotations.

Exploratory data visualizations:
- `gene_coexpression` - the distribution of correlated gene counts.
- `boxplots` - the distribution of continuous variables.
- `sex_plot` - the distribution of samples by X and Y marker genes.
- `sex_plot_pca` - a PCA of sex-specific expression to visualize more dimensionality than `sex_plot`.
- `correlation_plot` - the correlation of covariates.
- `significant_covariates_plot` - the correlation of covariates to gene expression.
- `outliers` - the clustering of samples by PCA.
- `plot_de_volcano` - volcano plot of differentially expressed genes.

Transformed or normalized data:
- `clean_md` - metadata with factor and numeric types.
- `filtered_counts` - counts matrix with low gene expression removed.
- `biotypes` - gene proportions summarized by biotype.
- `cqn_counts` - CQN normalized counts.
- `model` - model selected by multivariate forward stepwise regression (evaluated by the Bayesian Information Criterion (BIC)).
- `de` - differential expression results including adjusted p-values and gene list.
- `report` - output markdown report rendered as HTML.

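The idea behind the `model` target, forward stepwise selection scored by BIC, can be sketched with base R. This is a deliberately simplified, single-response illustration using `stats::step()` on simulated data. The workflow itself is described as multivariate, and the covariate names (`age`, `rin`, `batch`) are hypothetical.

```r
set.seed(42)
n <- 100

# Simulated covariates; only `age` truly drives the response in this toy data.
covariates <- data.frame(
  age   = rnorm(n, mean = 70, sd = 8),
  rin   = rnorm(n, mean = 7, sd = 1),
  batch = factor(sample(c("b1", "b2"), n, replace = TRUE))
)
covariates$expression <- 0.5 * covariates$age + rnorm(n)

# Start from the intercept-only model and add covariates one at a time.
null_model <- lm(expression ~ 1, data = covariates)

# k = log(n) makes step() penalize by BIC instead of its default AIC (k = 2).
selected <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ age + rin + batch),
  direction = "forward",
  k = log(n),
  trace = 0
)

formula(selected)
```

At each step the covariate that lowers BIC the most is added, and selection stops when no remaining covariate improves the score, which tends to keep the model small.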
Anyone can create a Synapse account and access public data in a variety of disciplines, for example through the Alzheimer’s Disease Knowledge Portal and the CommonMind Consortium.