
Cna

genie_registry.cna

Attributes

logger = logging.getLogger(__name__) module-attribute

Classes

cna

Bases: FileTypeFormat

Functions
validate_no_dup_symbols_after_remapping(cnvDF, skip_database_checks)

Validates that no Hugo_Symbol values are duplicated once the original Hugo_Symbol column has been remapped using the bed database table. See validateSymbol for more details on the remapping method.

PARAMETER DESCRIPTION
skip_database_checks

Whether to skip this validation check, since it requires access to the internal bed database.

TYPE: bool

RETURNS DESCRIPTION
str

error message

TYPE: str
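The check described above can be sketched roughly as follows. This is a minimal illustration, not the genie_registry implementation: the function name `find_dup_symbols_after_remap`, the explicit `bed_df` argument, the wording of the error message, and the choice to leave unmatched genes unchanged are all assumptions made for the example. Only the Hugo_Symbol and ID column names come from the documentation.

```python
import pandas as pd


def find_dup_symbols_after_remap(cna_df, bed_df, skip_database_checks=False):
    """Hypothetical sketch: return an error message if remapping creates duplicates."""
    # The real check needs the bed database table, so it is skipped on request.
    if skip_database_checks:
        return ""
    symbols = set(bed_df["Hugo_Symbol"])
    # Map each original bed ID to its remapped Hugo_Symbol.
    id_to_symbol = dict(zip(bed_df["ID"], bed_df["Hugo_Symbol"]))
    # Keep symbols that are already valid; otherwise remap via the ID column.
    # Assumption: genes matching neither column are left unchanged here.
    remapped = cna_df["Hugo_Symbol"].map(
        lambda g: g if g in symbols else id_to_symbol.get(g, g)
    )
    dups = sorted(remapped[remapped.duplicated(keep=False)].unique().tolist())
    if dups:
        return "Duplicated Hugo_Symbol after remapping: " + ", ".join(dups) + "\n"
    return ""


# Toy bed table: the bed file's original symbol "MLL" was remapped to "KMT2A".
bed = pd.DataFrame({"Hugo_Symbol": ["TP53", "KMT2A"], "ID": ["TP53", "MLL"]})
# Toy CNA file listing both "MLL" and "KMT2A", which collide after remapping.
cna = pd.DataFrame({"Hugo_Symbol": ["MLL", "KMT2A", "TP53"], "S1": [0, 1, 2]})
message = find_dup_symbols_after_remap(cna, bed)
```

In this toy input, "MLL" and "KMT2A" resolve to the same symbol, so the returned message is non-empty. A file without such a collision, or a call with `skip_database_checks=True`, yields an empty string.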

Functions

validateSymbol(gene, bedDf, returnMappedDf=True)

Validates the gene symbol against the gene symbol in the bed database. Note that gene symbols in the bed database have gone through processing and have been remapped to allowed actual genes if needed.

A gene is VALID if it meets either of the following two conditions:
  1. The gene exists in the bed database table's Hugo_Symbol column.

  2. The gene exists in the bed database table's ID column. Under this condition, the gene in the cna file is temporarily REMAPPED to the bed database table's Hugo_Symbol value for the purpose of validation. The ID column holds the original Hugo_Symbol values of the bed files, before the Hugo_Symbol column is mapped to valid gene values in the Actual Gene Positions (GRCh37) database table. See the bed fileformat module's remap_symbols function, and how it is used in processing, for more information.

The validation throws a WARNING if the gene satisfies neither of the two conditions above.

PARAMETER DESCRIPTION
gene

Gene name

TYPE: str

bedDf

The bed database table as a pandas dataframe

TYPE: DataFrame

returnMappedDf

Whether to return the mapped gene symbol instead of a validity boolean. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Union[str, float, bool]

If returnMappedDf is True, returns the gene symbol: a str if the gene is valid, float("nan") if it is invalid. If returnMappedDf is False, returns a boolean indicating whether the gene is valid.

TYPE: Union[str, float, bool]
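As a rough illustration of the lookup order and the two return modes described for validateSymbol, consider the sketch below. It is not the library's code: `check_symbol`, its argument names, and the warning text are hypothetical, and the sketch assumes the first matching row in the ID column determines the remapped symbol.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def check_symbol(gene, bed_df, return_mapped=True):
    """Hypothetical sketch of the two validity conditions."""
    if gene in bed_df["Hugo_Symbol"].values:
        # Condition 1: the gene is already a Hugo_Symbol in the bed table.
        mapped, valid = gene, True
    else:
        # Condition 2: the gene matches the ID column, so remap it.
        matches = bed_df.loc[bed_df["ID"] == gene, "Hugo_Symbol"]
        if not matches.empty:
            mapped, valid = matches.iloc[0], True
        else:
            # Neither condition holds: warn and mark the gene invalid.
            logger.warning("%s is not a recognized gene symbol", gene)
            mapped, valid = float("nan"), False
    return mapped if return_mapped else valid


# Toy bed table: the bed file's original symbol "MLL" was remapped to "KMT2A".
bed = pd.DataFrame({"Hugo_Symbol": ["TP53", "KMT2A"], "ID": ["TP53", "MLL"]})

kept = check_symbol("TP53", bed)
remapped = check_symbol("MLL", bed)
invalid = check_symbol("NOT_A_GENE", bed)
is_valid = check_symbol("NOT_A_GENE", bed, return_mapped=False)
```

Here "TP53" is returned unchanged, "MLL" is remapped to "KMT2A", and the unknown symbol yields NaN in mapped mode or False in boolean mode.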

makeCNARow(row, symbols)

Make CNA Row (Deprecated function)

CNA values are no longer stored in the database

PARAMETER DESCRIPTION
row

one row in the CNA file

symbols

list of Gene symbols

mergeCNAvalues(x)

Merge CNA values so that when two rows refer to the same gene, their values are merged.

checkIfOneZero(x)