Database to Staging

bin.database_to_staging

Workflow to trigger a consortium release

flowchart TD
    A["Start consortium release"] --> B["Remove old files in GENIE release dir"]
    %% B --> C["Login to Synapse"]
    %% C --> D{"Test / Staging / Prod Environment?"}
    %% D -->|Test| E["Set test databaseSynIdMappingId"]
    %% D -->|Staging| F["Set staging databaseSynIdMappingId, skip mutations in cis"]
    %% D -->|Production| G["Set production databaseSynIdMappingId"]
    B --> H["Prepare for processing by getting databaseSynIdMapping table, Oncotree link, cBioPortal folder, database Synapse IDs, etc."]
    %% H --> I{"Oncotree link provided?"}
    %% I -->|No| J["Extract Oncotree URL from database"]
    %% I -->|Yes| K["Use provided Oncotree URL"]
    %% J & K --> L["Check Oncotree URL accessibility"]
    %% L --> M["Validate cBioPortal path exists"]
    %% M --> N["Get Synapse IDs for consortium, process tracker, etc."]
    %% N --> O["Create or retrieve case_lists folder"]
    %% O --> P{"Staging?"}
    %% P -- No --> Q["Start process tracking"]
    H --> R["Query for all the centers for which data should be released"]

    %% Expanded stagingToCbio logic
    %% R --> S2["Create GENIE release directory if missing"]
    %% S1 --> S2["Extract Synapse Table IDs (patient, sample, maf, bed, seg, etc.)"]
    R --> S3["Create snapshots & pull patient, sample, bed data"]
    S3 --> S4["Merge patient + sample tables into clinicalDf"]
    S4 --> S5["Run GENIE filters"]
    S5 --> SD1["Variant Filters"]
    SD1 --> SD2["Germline filter"]
    SD1 --> SD3["MAF in BED"]
    SD2 & SD3 --> SD5["List of variants to remove"]
    S5 --> SE1["Sample Filters"]
    SE1 --> SE2["SEQ_DATE"]
    SE1 --> SE3["No Bed file"]
    SE1 --> SE4["Oncotree"]
    SE1 --> SE5["Mutation In Cis"]
    SE2 & SE3 & SE4 & SE5 --> SE6["List of samples to remove"]
    SE6 --> SS3["Merge/Filter/store clinical file"]
    %% SS3 --> S6["Merge/Filter/store clinical file"]
    SD5 & SS3 --> S7["Merge/Filter/store MAF file"]
    SS3 --> S8["Merge/Filter/store CNA file"]
    SS3 --> S9["Merge/Filter/store Assay information file"]
    SS3 --> S10["Merge/Filter/store SV file"]
    S8 & S7 & S9 & S10 --> S11["Merge/Filter/store Data Gene Matrix"]
    S11 --> S12["Download and upload gene panel files"]
    SS3 --> S13["Merge/Filter/store SEG file"]
    SS3 --> S14["Merge/Filter/store BED files"]
    %% SS3 --> S15["Return list of gene panel entities"]

    SS3 & S12 & S13 & S14 --> T["Remove old case list files"]
    T --> U["Generate new case lists and upload to Synapse"]
    U --> V["Revise metadata files with correct GENIE version"]
    V --> W["Run cBioPortal validation script"]
    %% W --> X{"Production?"}
    %% X -->|Yes| Y["Upload validation logs to Synapse"]
    W --> Z["Create release folder with links to files"]
    %% Z --> AA{"Production?"}
    %% AA -->|Yes| AB["End process tracking"]
    Z --> AC["Run dashboard updater"]
    AC --> AD["Generate dashboard HTML"]
    AD --> AF["End"]
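
The "Sample Filters" branch of the flowchart feeds several independent filters into a single "List of samples to remove", which is then applied to the clinical file. A minimal sketch of that union-then-filter pattern follows. The function names, the record layout, and the sample IDs are hypothetical illustrations, not the module's actual implementation.

```python
from typing import Dict, Iterable, List, Set


def collect_samples_to_remove(*filter_results: Iterable[str]) -> Set[str]:
    """Union the sample IDs flagged by each filter (hypothetical helper)."""
    to_remove: Set[str] = set()
    for flagged in filter_results:
        to_remove.update(flagged)
    return to_remove


def drop_samples(records: List[Dict[str, str]], to_remove: Set[str]) -> List[Dict[str, str]]:
    """Keep only the clinical records whose SAMPLE_ID was not flagged."""
    return [row for row in records if row["SAMPLE_ID"] not in to_remove]


# Made-up clinical rows and made-up filter outputs, for illustration only.
clinical = [
    {"SAMPLE_ID": "S1", "CENTER": "A"},
    {"SAMPLE_ID": "S2", "CENTER": "A"},
    {"SAMPLE_ID": "S3", "CENTER": "B"},
    {"SAMPLE_ID": "S4", "CENTER": "B"},
]
seq_date_flagged = ["S1"]
no_bed_flagged = ["S1", "S3"]
oncotree_flagged: List[str] = []
mutation_in_cis_flagged = ["S3"]

samples_to_remove = collect_samples_to_remove(
    seq_date_flagged, no_bed_flagged, oncotree_flagged, mutation_in_cis_flagged
)
filtered_clinical = drop_samples(clinical, samples_to_remove)
```

Using a set means a sample flagged by more than one filter is only recorded once, so the order in which the filters run does not matter.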

Attributes

logger = logging.getLogger(__name__) module-attribute

PWD = os.path.dirname(os.path.abspath(__file__)) module-attribute

parser = argparse.ArgumentParser(description='Release GENIE consortium files') module-attribute

test_group = parser.add_mutually_exclusive_group() module-attribute

args = parser.parse_args() module-attribute
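
The `parser` and `test_group` attributes indicate a command-line interface with a mutually exclusive option group. A minimal sketch of how such a parser could be assembled is shown below. The `--test` and `--staging` flag names and help strings are assumptions inferred from the `main` parameters, and `build_parser` is a hypothetical helper, so the real script may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser resembling the module-level one (flag names are assumed)."""
    parser = argparse.ArgumentParser(description="Release GENIE consortium files")
    # Options in a mutually exclusive group cannot be passed together.
    test_group = parser.add_mutually_exclusive_group()
    test_group.add_argument("--test", action="store_true", help="Use test databases")
    test_group.add_argument("--staging", action="store_true", help="Use staging databases")
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch runs anywhere.
example_args = build_parser().parse_args(["--staging"])
```

With this layout, passing both flags at once makes argparse print a usage error and exit instead of running a release against an ambiguous environment.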

Functions

generate_dashboard_html(genie_version, staging=False, testing=False)

Generates the dashboard HTML output that is uploaded to the release folder

PARAMETER DESCRIPTION
genie_version

GENIE release

TYPE: str

staging

Use staging files. Default is False

TYPE: bool DEFAULT: False

testing

Use testing files. Default is False

TYPE: bool DEFAULT: False

generate_data_guide(genie_version, oncotree_version=None, database_mapping=None)

Generates the GENIE data guide

main(genie_version, processing_date, cbioportal_path, oncotree_link=None, consortium_release_cutoff=184, test=False, staging=False, debug=False, skip_mutationsincis=False)

  • Performs parameter checks
  • Updates process tracking start
  • Initiates database to staging
  • Creates case lists
  • Revises meta files
  • Runs cBioPortal validation
  • Creates link versions
  • Updates process tracking end
  • Creates dashboard tables and plots

PARAMETER DESCRIPTION
genie_version

GENIE version

processing_date

Processing date

cbioportal_path

Path to the cBioPortal validator

oncotree_link

Link to Oncotree codes

DEFAULT: None

consortium_release_cutoff

Release cutoff value in days

DEFAULT: 184

test

Test flag, uses test databases

DEFAULT: False

staging

Staging flag, uses staging databases

DEFAULT: False

debug

Synapse debug flag

DEFAULT: False

skip_mutationsincis

Skip mutation in cis filter

DEFAULT: False
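
The parameter list gives `consortium_release_cutoff` in days but does not say how it is compared with `processing_date`. One plausible reading, shown here purely as an assumption, is that a sample's sequencing date is checked against a window of that many days ending at the processing date. The function name, the use of `datetime.date` values, and the direction of the comparison are all hypothetical.

```python
from datetime import date, timedelta


def is_within_cutoff(seq_date: date, processing_date: date, cutoff_days: int = 184) -> bool:
    """Return True if seq_date is no more than cutoff_days before processing_date.

    Assumed semantics only; the real filter may parse and compare dates differently.
    """
    return processing_date - seq_date <= timedelta(days=cutoff_days)


processing = date(2024, 7, 1)
recent_sample = is_within_cutoff(date(2024, 1, 1), processing)  # 182 days earlier
older_sample = is_within_cutoff(date(2023, 12, 1), processing)  # 213 days earlier
```

Whether samples inside the window are kept or removed is a policy decision that this page does not state, so the sketch only computes the comparison.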