Write your own config file • sageseqr

A configuration file defines the inputs for the RNA-seq workflow. Update the default parameters to include the Synapse ID where your data is stored and to the factor and continuous variables you want to test in the covariate model selection. The full list of configurable options are:
counts:
  synID:          Required. Synapse ID to counts data frame with identifiers to 
                  the metadata as column names and gene ids in a column.
  version:        Optional. Include Synapse file version number (e.g. 3).
  gene id:        Required. Column name that corresponds to the gene ids (e.g. feature).
metadata:
  synID:          Required. Synapse ID to cleaned metadata file with sample 
                  identifiers in a column and variables of interest as column names. 
  version:        Optional. Include Synapse file version number (e.g. 3).
  sample id:      Required. Column name that corresponds to the sample ids (e.g. donorid).
biomart:
  synID:          Optional. If left blank, Ensembl will be queried with the gene 
                  ids provided in the counts. Otherwise, you may provide the
                  Synapse ID to gene metadata from Ensembl. This must include gene
                  length and GC content in order to implement Conditional Quantile
                  Normalization. 
  version:        Optional. Include Synapse file version number (e.g. 3).
  filters:        Required. Column name that corresponds to the gene ids (e.g. 
                  ensembl_gene_id).
  host:           Optional. A character vector specifying the BioMart database release
                  version. This specification is highly recommended for a reproducible           
                  workflow. Defaults to ensembl.org.
  organism:       Required. A character vector of the organism name. This argument 
                  takes partial strings. For example,"hsa" will match "hsapiens_gene_ensembl".
  exon only:      Optional. Set to TRUE if you want gene lengths and GC-content
                  to be calculated only for exons of gene features. Recomended 
                  depending on you experimental design paradigm. Default is FALSE,
                  which considers the entire transcript start to stop (ie. includes
                  intronic regions).
  custom build:   Optional. If you want to bulid the biomart object from a user specified
                  or custom GTF and genome FASTA file specify this value as TRUE.
                  Default is FALSE. This would be reccomended for users analyzing
                  data from a model system with a trans-gene inserted into the 
                  genome.
  gtfID:          Required IF custom build is set to TRUE. Synapse ID to the user
                  specified GTF file to build the biomart object from.
  gtfVersion:     Optional.Include Synapse file version number (e.g. 3).
  fastaID:        Required IF custom build is set to TRUE. Synapse ID to the user
                  specified genome FASTA file to build the biomart object from.
  fastaVersion:   Optional. Include Synapse file version number (e.g. 3).
factors:          Required. List of factor variables in brackets. Variables must be 
                  present in the metadata as column names (e.g. [ "donorid", "source"]).
random_effect:    Optional. List of factor variables (must also be included in `factors`)
                  that are to be treated as random effects in the linear regression model.
                  (eg. ["donorid"])
continuous:       Required. List of continuous variables in brackets. Variables must 
                  be present in the metadata as column names (e.g. [ "rin", "rin2"]).
x_var:            Required. This is your predictor or primary variable of interest. 
                  Additionally, a boxplot will visualize the distribution of 
                  continuous variables using the x_var as a dimension.
conditions:       Optional. Filtering low-expression genes is a common practice to improve
                  sensitivity in detection of differentially expressed genes. Low
                  count genes that have less than a user-defined threshold of counts 
                  per million (CPM) in a user-defined percentage of samples per the
                  conditions provided here will be removed. (e.g. ["diagnosis", "sex"]). 
cpm threshold:    Optional. The minium allowable CPM to keep a gene in the analysis.
percent threshold:Optional. The percentage of samples that should contain the minimum number
                  of CPM. If a condition is passed, the percentage will apply to 
                  the samples in that sub-population. 
sex check:        Optional. The exact variable name that corresponds to reported gender or sex
                  to check the distribution of x and y marker expression across
                  samples.
dimensions:       Required. Specify the PCA dimensions by variable name. 
  color:
  shape:
  size:
skip model:       Optional. If TRUE, the exploratory data report is run. Model selection
                  is not computed.
force null model: Optional. Variables to add to the model aprori eg. sex that users want
                  to account for.
force model with: Optional. Force differential expression with this user defined model
                  instead of the output of stepwise regression.
cores:            Optional. Specify an integer of cores to use with a BiocParallel 
                  parallel backend.  Null value results in the number of available
                  cores minus one being used. Parallel backend ccurrently only supports
                  BiocParallel::SnowParam(). BatchtoolsParam, MulticoreParam, 
                  BiocParallelDoparParam, and SerialParam are not currently supported.
de FC:            The fold-change (FC) of significant differentially expressed 
                  (de) genes must exceed this value. This value will be transformed
                  into log2 FC.
de p-value threshold: The adjusted p-value of significant differentially expressed 
                  (de) genes must exceed this value.
de contrasts:     Required. 
  primary:        Required. Variable(s) in the metadata to define comparisons between 
                  groups. Currently must be either one numeric variable, or one or
                  more catagorical variables.
  is_numeric_int: Optional. Specifies if there is a numeric interaction variable specified.
                  default (FALSE)
  numeric:        Optional. The numeric in variable which interacts with the 
                  primary variable(s). default (NULL)
  contrasts:      Optional. A list specifying contrasts of the primary variable(s) 
                  to consider for differential sequencing results if using factor(s)
                  as your primary variable. If not specified all combinations will
                  be tested. If specified this will speed up the pipeline. Specify 
                  the contrast with the factor values involved in the contrast 
                  seperated by a hyphen. (eg for diagnosis, `contrasts: ["AD-CT"]`
                  where AD is the value in diagnosis column for cases and CT is 
                  the value for controls. For multi-level contrasts, eg. `primary:
                  ["diagnosis", "Sex"] would have contrasts specified as; `contrasts:
                   ["ZZ_F-CT_F", "ZZ_M-CT_M"]` to look at cases vs controls in 
                  females and cases vs controls in males independently. While the
                  order before or after the hyphen doesn't matter, the order of values
                  before/after the underscore does matter. The value order must be
                  the same as the `primary:` specification. 
                  eg. `primary: ["diagnosis","sex"]` must be CT_M 
                  while `primary: ["sex","diagnosis"]` must be M_CT. 
  <any named list:> If there are multiple comparisons, set them up as nested lists.
visualization gene list: Label this list of genes in the volcano plot.
report:           Required. The name of your project. This will become the name of your 
                  output html file.
store output:     Required. Folder Synapse Id to store output on Synapse.