This package helps Sage Bionetworks data curators check the status of metadata and documentation files uploaded via the dccvalidator Shiny application. The dccmonitor package contains functions to gather validation information. Optionally, the included Shiny application can be used to view the results.

Requirements

dccmonitor uses the reticulate package with the Synapse Python Client. See the reticulate documentation for more information on setting up reticulate to work with your local Python environment. Additionally, see the Synapse Python Client for installation instructions. The Synapse Python Client should be installed in the same Python environment used by reticulate.
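Before launching the app, it can help to confirm that reticulate sees a Python environment containing the Synapse client. The following is a minimal sketch, not part of dccmonitor: `check_synapse_client()` is a hypothetical helper name, and it assumes the client is importable under the module name `synapseclient`.

```r
# Hypothetical helper (not part of dccmonitor): returns TRUE if the Python
# environment used by reticulate can import the Synapse Python Client.
check_synapse_client <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("reticulate is not installed; try install.packages('reticulate')")
    return(FALSE)
  }
  # py_module_available() reports whether the named Python module can be imported
  reticulate::py_module_available("synapseclient")
}

check_synapse_client()
```

If this returns FALSE, `reticulate::py_config()` shows which Python installation reticulate has selected, which is usually the first thing to check.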

Using the dccmonitor Shiny application requires a Synapse account with permission to access the necessary Synapse files. These include the specific Synapse folder being monitored (see Customization for details) and all dccvalidator Synapse dependencies (e.g., template and annotation files).

Installation and Use

dccmonitor can be installed and used in two ways: by installing the package or by cloning the repository.

Package Installation

dccmonitor can be installed via devtools:

devtools::install_github("Sage-Bionetworks/dccmonitor")

To run the Shiny application, a config.yml file with the required options must exist in the working directory (see Customization). After creating the config.yml file, the application can be run with:

library(dccmonitor)
Sys.setenv(R_CONFIG_ACTIVE = "default") # Replace "default" with the configuration name if not using default.
run_app()

Cloning Repository

Clone the repository to a local directory. Customize the config.yml file as needed (see Customization).

If you are not using the default configuration, edit the app.R file to set your configuration name. Then, with the application directory as your working directory, the following will start the app:

renv::restore() # Update project package library
shiny::runApp()

Customization

Customizing the application is done via the config.yml file and the specific file validation checks.

config.yml

A configuration file is required for the application to behave correctly. If you installed the package, create a file called config.yml in your working directory (see this repo’s config.yml for an example); if you cloned the repository, edit the existing config.yml file. If you are not using default as your configuration, set the active configuration as described in the instructions for your installation method (package or cloned repository). Alternatively, if you have forked the repository and added a new set of configuration options to the config.yml file, you can open a pull request proposing that they be added.

Note: a default configuration is required in the config.yml file. Additionally, if using a non-default (named) configuration, any settings in default that are not overridden by the named configuration will be inherited.
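The inheritance rule can be seen in a small sketch. It assumes the configuration is read with the config package (which is what the R_CONFIG_ACTIVE variable belongs to); the configuration name staging_test and all IDs are placeholders invented for illustration, not real Synapse entities.

```r
# Write a throwaway config file with a default and one named configuration.
# All values are placeholders; "staging_test" is a hypothetical name.
cfg_text <- '
default:
  production: TRUE
  teams: "0000000"
  consortium_fileview: "syn00000000"

staging_test:
  production: FALSE
'
cfg_file <- tempfile(fileext = ".yml")
writeLines(cfg_text, cfg_file)

if (requireNamespace("config", quietly = TRUE)) {
  # Read the named configuration; missing keys fall back to default
  named <- config::get(config = "staging_test", file = cfg_file)
  print(named$production)          # overridden by staging_test
  print(named$consortium_fileview) # inherited from default
}
```

Only production is set under staging_test, so every other option is taken from default.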

Of the configuration options, only five are specific to dccmonitor, while the rest are used to customize the dccvalidator checks. The dccmonitor-specific configurations are:

  • production: TRUE to use production Synapse endpoints; FALSE to use staging endpoints. The staging endpoints are used for testing new versions of the client before they are officially released. See the Synapse client documentation for information on endpoints.
  • teams: Synapse administration team ID. The app data should only be accessible to curators with admin privileges.
  • annotations_storage: Synapse folder ID for storing annotation CSV files. The folder should only be accessible to curators with admin privileges.
  • consortium_fileview: Synapse file view ID. A Synapse file view that shows all the files in the parent folder (this is the folder that dccvalidator uploads metadata and documentation files to), along with their annotations. The file view should include the columns: id, name, createdOn, createdBy, modifiedOn, currentVersion, study, metadataType, species, assay. Note that the following file-specific annotations are required for dccmonitor to function properly:
    • documentation: study
    • manifest: study, metadataType = manifest
    • individual metadata: study, species, metadataType = individual
    • biospecimen metadata: study, species, metadataType = biospecimen
    • assay metadata: study, species, assay, metadataType = assay
  • annotation_keys: List of annotation keys that should not be included in annotations. These keys would be any that could be considered PHI/PII.
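The per-file annotation requirements listed under consortium_fileview can be written as a lookup table, which makes it easy to spot files that dccmonitor would not handle properly. This is an illustrative sketch only: `required_annotations` and `missing_annotations()` are hypothetical names, not functions provided by dccmonitor or dccvalidator.

```r
# Required annotation keys per file type, transcribed from the list above.
required_annotations <- list(
  documentation = c("study"),
  manifest      = c("study", "metadataType"),
  individual    = c("study", "species", "metadataType"),
  biospecimen   = c("study", "species", "metadataType"),
  assay         = c("study", "species", "assay", "metadataType")
)

# Hypothetical helper: which required keys are absent from a file's annotations?
# `annotations` is a named list such as list(study = "...", species = "...").
missing_annotations <- function(file_type, annotations) {
  required <- required_annotations[[file_type]]
  if (is.null(required)) {
    stop("Unknown file type: ", file_type)
  }
  setdiff(required, names(annotations))
}

# An assay metadata file annotated with only study and metadataType
missing_annotations("assay", list(study = "ExampleStudy", metadataType = "assay"))
# returns c("species", "assay")
```

The helper only checks that the keys exist; it does not verify that metadataType carries the value expected for that file type.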

A brief overview of the dccvalidator-specific configurations is below; more information can be found in the dccvalidator documentation:

  • annotations_table: Synapse ID for the annotations master table.
  • annotations_link: Address for the annotations website.
  • templates: List of master templates for each type of metadata or manifest file.
  • species_list: List of species.
  • complete_columns: List of columns that are required to be complete for each type of metadata or manifest file.

Validation Checks

dccmonitor uses dccvalidator’s check_all() to validate the metadata and manifest files. Currently, there is no simple way to change which validation checks are run.