vignettes/using-the-dccvalidator-app-amp-ad.Rmd
As the AMP-AD Knowledge Portal has grown to 50+ studies and over 70,000 data files, we have recognized the need to standardize our approach to data curation. We therefore built an application that performs many of the routine data quality checks we previously conducted by hand, in the hope that it will help you, the data contributor, get your data checked, validated, and shared more easily and quickly.
The dccvalidator tool checks your metadata for, among other things:
To use this application you must:
Some portions of the app submit data to Synapse. This allows curators at Sage to troubleshoot issues if needed; no one outside the Sage curation team will be able to download the data.
This topic has a general overview of the data contribution process and detailed instructions for each step, including uploading documentation, metadata requirements, validating and reviewing the metadata, and uploading the dataset.
The dataset is uploaded using syncToSynapse (see the Synapse documentation for uploading data in bulk).
Each study in AMP-AD has accompanying documentation in the portal:
Each study should include metadata that would help a new researcher understand and reuse the data. In most cases, we will expect 4 files:
We provide templates for all of the metadata files within the portal: https://www.synapse.org/#!Synapse:syn18512044
You can download these files, fill out the first tab, and save it as a .csv or .tsv file. The other tabs exist to describe the variables and allowed values in the template. If you do not have any data for some of the columns, you can leave them blank (but do not remove the column header).
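Before uploading a filled-in template, it can help to confirm that no column header was dropped along the way. The following is a minimal sketch, not part of dccvalidator itself: the function name and the example column names are hypothetical and used only for illustration. It compares the header row of a saved .csv against the list of columns the template is expected to have, and treats blank values as acceptable.

```python
import csv
import io


def missing_template_columns(template_columns, filled_csv_text):
    """Return template columns that are absent from the filled file's header row."""
    reader = csv.reader(io.StringIO(filled_csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [col for col in template_columns if col not in present]


# Hypothetical template columns, for illustration only.
template_columns = ["individualID", "specimenID", "organ", "tissue"]

# "organ" has no data, but its header is kept, so nothing is reported missing.
filled = "individualID,specimenID,organ,tissue\nind1,spec1,,cortex\n"
print(missing_template_columns(template_columns, filled))

# Here the empty "organ" column was deleted, so it is reported.
trimmed = "individualID,specimenID,tissue\nind1,spec1,cortex\n"
print(missing_template_columns(template_columns, trimmed))
```

For a .tsv file, the same approach should work by passing delimiter="\t" to csv.reader.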
If you don’t see a template for the assay(s) in your study, or if not all of the metadata types above seem relevant to your study, please get in touch with us at AMPAD_SageAdmin@synapse.org.
The data validation portion of the app allows you to upload metadata files (as .csv) and the manifest (as .tsv or .txt) and view the results of a series of automated checks.
Examples of the types of checks we perform are:
Once your data has passed validation and the AMP-AD data curators have granted you edit permissions to the staging folder for your study, you will use your newly created manifest file to upload your data and metadata using syncToSynapse. You can run syncToSynapse in the Python client or the R client. For uploads with more than 100 files or with large files, the Python client or command line client will upload substantially faster than the R client. To get started with the Synapse programmatic clients, please visit our Synapse docs.
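As a rough illustration of the upload step in the Python client, the sketch below first runs a local pre-flight check on the manifest and then hands it to synapseutils.syncToSynapse. It assumes a tab-delimited manifest with the path and parent columns that the Synapse manifest format requires, and that your Synapse credentials are already configured locally. The helper names manifest_problems and upload are hypothetical, not part of any Synapse client.

```python
import csv

# Columns the Synapse manifest format requires for every file.
REQUIRED_COLUMNS = ("path", "parent")


def manifest_problems(manifest_path):
    """Return a list of human-readable problems found in a tab-delimited manifest."""
    problems = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
        if problems:
            return problems
        for row in reader:
            for column in REQUIRED_COLUMNS:
                # Short rows yield None for missing cells, so guard before stripping.
                if not (row[column] or "").strip():
                    problems.append(f"line {reader.line_num}: empty {column}")
    return problems


def upload(manifest_path, dry_run=True):
    """Validate the manifest (dry_run=True) or upload the files it lists."""
    # Imported here so the pre-flight check works without the client installed.
    import synapseclient
    import synapseutils

    syn = synapseclient.Synapse()
    syn.login()  # assumes credentials are already configured locally
    synapseutils.syncToSynapse(syn, manifest_path, dryRun=dry_run)
```

Calling upload("manifest.tsv") with the default dry_run=True asks the client to validate the manifest without uploading anything; once that succeeds, upload("manifest.tsv", dry_run=False) would perform the actual upload. The file name here is only a placeholder.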