extract

genie.extract

This module contains all the functions that extract data from Synapse

Attributes

logger = logging.getLogger(__name__) module-attribute

stdout_handler = logging.StreamHandler(stream=(sys.stdout)) module-attribute

Functions

get_center_input_files(syn, synid, center, process='main', downloadFile=True)

Walks through each center's input directory to get a list of tuples of center files

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

synid

Synapse Id of a folder

TYPE: str

center

GENIE center name

TYPE: str

process

Process type, such as "main" or "mutation". Defaults to "main".

TYPE: str DEFAULT: 'main'

downloadFile

Downloads the file. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
list

List of Synapse entities

TYPE: List[List[Entity]]

_map_name_to_filetype(name)

Maps file name to filetype

PARAMETER DESCRIPTION
name

File name

TYPE: str

RETURNS DESCRIPTION
str

filetype

TYPE: str

get_file_mapping(syn, synid)

Get mapping between Synapse entity name and Synapse ids of all entities in a folder

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

synid

Synapse Id of folder

TYPE: str

RETURNS DESCRIPTION
dict

mapping between Synapse Entity name and Id

TYPE: dict
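
As an illustration of the name-to-id mapping described above, here is a minimal sketch. It assumes the folder's children are listed with synapseclient's `getChildren`, which yields dicts carrying "name" and "id" keys. `build_file_mapping`, `FakeSynapse`, and the `syn...` ids are hypothetical names used only so the example runs without a Synapse login; this is not necessarily how `get_file_mapping` is implemented.

```python
from typing import Dict, Iterable


def build_file_mapping(syn, synid: str) -> Dict[str, str]:
    """Map each child entity's name to its Synapse id (illustrative sketch)."""
    # Assumption: syn.getChildren yields dicts with "name" and "id" keys.
    return {child["name"]: child["id"] for child in syn.getChildren(synid)}


class FakeSynapse:
    """Hypothetical stand-in for a Synapse connection."""

    def __init__(self, children: Dict[str, Iterable[dict]]):
        self._children = children

    def getChildren(self, parent):
        # Yield the children registered for this folder id, if any.
        yield from self._children.get(parent, [])


fake_syn = FakeSynapse(
    {
        "syn000": [
            {"name": "data_clinical.txt", "id": "syn111"},
            {"name": "data_mutations.txt", "id": "syn222"},
        ]
    }
)
file_mapping = build_file_mapping(fake_syn, "syn000")
print(file_mapping)
```

If two entities in a folder shared a name, a dict built this way would keep only the last one seen.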

get_public_to_consortium_synid_mapping(syn, release_synid)

Gets the mapping between potential public release names and the consortium release folder

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

release_synid

Release folder fileview

TYPE: str

RETURNS DESCRIPTION
dict

Mapping between potential public release and consortium release synapse id

TYPE: dict

get_syntabledf(syn, query_string)

Get dataframe from Synapse Table query

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

query_string

Table query

TYPE: str

RETURNS DESCRIPTION
DataFrame

Query results in a dataframe

TYPE: pd.DataFrame
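
A table query of this kind is typically a two-step call in synapseclient: `tableQuery` runs the query and `asDataFrame` converts the result. The sketch below shows that call chain under the assumption that `get_syntabledf` wraps something similar. `query_to_dataframe`, the stub classes, the table id, and the center name are hypothetical, and the stub returns a plain list where the real client would return a pandas DataFrame.

```python
class FakeQueryResult:
    """Hypothetical stand-in for the object returned by tableQuery."""

    def __init__(self, rows):
        self._rows = rows

    def asDataFrame(self):
        # The real client returns a pandas DataFrame; the stub returns rows as-is.
        return self._rows


class FakeTableSynapse:
    """Hypothetical stand-in for a Synapse connection; records queries it receives."""

    def __init__(self):
        self.queries = []

    def tableQuery(self, query):
        self.queries.append(query)
        return FakeQueryResult([{"center": "CENTER_A"}])


def query_to_dataframe(syn, query_string: str):
    """Run a table query and return its result in dataframe form (sketch)."""
    return syn.tableQuery(query_string).asDataFrame()


table_syn = FakeTableSynapse()
result = query_to_dataframe(table_syn, "select * from syn000")
print(result)
```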

_get_synid_database_mappingdf(syn, project_id)

Get database to synapse id mapping dataframe

PARAMETER DESCRIPTION
syn

Synapse object

project_id

Synapse Project ID with a 'dbMapping' annotation.

RETURNS DESCRIPTION

database to synapse id mapping dataframe

getDatabaseSynId(syn, tableName, project_id=None, databaseToSynIdMappingDf=None)

Get database synapse id from database to synapse id mapping table

PARAMETER DESCRIPTION
syn

Synapse object

project_id

Synapse Project ID with a database mapping table.

DEFAULT: None

tableName

Name of synapse table

databaseToSynIdMappingDf

Mapping table that has already been downloaded. Pass it to avoid a REST call that downloads the table again.

DEFAULT: None

RETURNS DESCRIPTION
str

Synapse id of wanted database
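
The optional `databaseToSynIdMappingDf` argument follows a common pattern: reuse a mapping that is already in hand and fetch it only when none is supplied. A minimal sketch of that pattern follows, with a plain dict standing in for the mapping dataframe. `lookup_database_synid`, `fake_fetch`, and the table names and ids are hypothetical and not taken from the GENIE code.

```python
from typing import Callable, Dict, Optional


def lookup_database_synid(
    table_name: str,
    fetch_mapping: Callable[[], Dict[str, str]],
    mapping: Optional[Dict[str, str]] = None,
) -> str:
    """Return the Synapse id for table_name, fetching the mapping only if needed."""
    if mapping is None:
        # Only pay for the remote call when no mapping was passed in.
        mapping = fetch_mapping()
    return mapping[table_name]


fetch_calls = []


def fake_fetch() -> Dict[str, str]:
    """Hypothetical downloader; records each call so reuse can be observed."""
    fetch_calls.append(1)
    return {"sample": "syn111", "patient": "syn222"}


cached = fake_fetch()  # one download up front
first = lookup_database_synid("sample", fake_fetch, cached)
second = lookup_database_synid("patient", fake_fetch, cached)
third = lookup_database_synid("sample", fake_fetch)  # no mapping given: fetches again
print(first, second, third, len(fetch_calls))
```

Passing the cached mapping keeps repeated lookups at a single download, which is the saving the parameter description refers to.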

_get_database_mapping_config(syn, synid)

Gets Synapse database to Table mapping in dict

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

synid

Synapse id of database mapping table

TYPE: str

RETURNS DESCRIPTION
dict

{'databasename': 'synid'}

TYPE: dict
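
To make the `{'databasename': 'synid'}` shape concrete, the sketch below collapses table rows into such a dict. The column names "Database" and "Id" are assumptions, as are `rows_to_mapping` and the example ids; the actual mapping table may use different headers.

```python
from typing import Dict, Iterable, Mapping


def rows_to_mapping(rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Collapse mapping-table rows into {database name: Synapse id} (sketch)."""
    # Assumption: each row has a "Database" column and an "Id" column.
    return {row["Database"]: row["Id"] for row in rows}


table_rows = [
    {"Database": "sample", "Id": "syn111"},
    {"Database": "patient", "Id": "syn222"},
]
db_mapping = rows_to_mapping(table_rows)
print(db_mapping)
```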

get_genie_config(syn, project_id)

Get configurations needed for the GENIE codebase

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

project_id

Synapse project id

TYPE: str

RETURNS DESCRIPTION
dict

GENIE table type/name to Synapse Id

TYPE: dict

Gets oncotree link unless a link is specified by the user

PARAMETER DESCRIPTION
syn

Synapse connection

TYPE: Synapse

genie_config

database name to synid mapping

TYPE: dict

oncotree_link

link to oncotree. Default is None

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
str

oncotree link

TYPE: str