Cna
genie_registry.cna
Attributes
logger = logging.getLogger(__name__)
module-attribute
Classes
cna
Bases: FileTypeFormat
Functions
validate_no_dup_symbols_after_remapping(cnvDF, skip_database_checks)
Validates that there are no duplicated Hugo_Symbol values after remapping the previous Hugo_Symbol column using the bed database table. See validateSymbol for more details on the remapping method.
| PARAMETER | DESCRIPTION |
|---|---|
| skip_database_checks | Whether to skip this validation check, since it requires access to the internal bed database |
| RETURNS | DESCRIPTION |
|---|---|
| str | Error message |
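The duplicate check described above can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: the helper name `find_dup_symbols_after_remap`, the `id_to_hugo` dictionary (standing in for the bed database table), the placeholder gene names, and the error message wording are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def find_dup_symbols_after_remap(
    symbols: List[str], id_to_hugo: Dict[str, str]
) -> List[str]:
    """Return the symbols that occur more than once after remapping.

    Hypothetical helper: `id_to_hugo` maps an original bed ID to its
    remapped Hugo_Symbol; symbols without a mapping are kept as-is.
    """
    remapped = [id_to_hugo.get(symbol, symbol) for symbol in symbols]
    counts = Counter(remapped)
    return sorted(symbol for symbol, count in counts.items() if count > 1)


# Placeholder data: "OLD1" remaps to "GENE1", which is already present.
id_to_hugo = {"OLD1": "GENE1"}
dups = find_dup_symbols_after_remap(["GENE1", "OLD1", "GENE2"], id_to_hugo)

# Assumed message format; an empty string means no duplicates were found.
error = (
    f"Duplicated Hugo_Symbol values after remapping: {', '.join(dups)}"
    if dups
    else ""
)
print(error)
```

Here two distinct input symbols collapse onto the same remapped symbol, which is the situation the check is meant to catch.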
Functions
validateSymbol(gene, bedDf, returnMappedDf=True)
Validates the gene symbol against the gene symbol in the bed database. Note that gene symbols in the bed database have gone through processing and have been remapped to allowed actual genes if needed.
A gene is VALID if it meets either of the following two conditions:
- The gene exists in the bed database table's Hugo_Symbol column.
- The gene exists in the bed database table's ID column. Under this condition, the gene in the cna file will be REMAPPED temporarily to the bed database table's Hugo_Symbol value for the purpose of validation. The ID column is the original Hugo_Symbol column of the bed files before the Hugo_Symbol column gets mapped to valid possible gene values in the Actual Gene Positions (GRCh37) database table. See the bed fileformat module's remap_symbols function and how it gets used in processing for more info on this.

The validation raises a WARNING if the gene satisfies neither of the above conditions.
| PARAMETER | DESCRIPTION |
|---|---|
| gene | Gene name |
| bedDf | The bed database table as a pandas dataframe |
| returnMappedDf | Whether to return a mapped gene. Defaults to True |
| RETURNS | DESCRIPTION |
|---|---|
| Union[str, float, bool] | The gene symbol (a str if valid, float("nan") if invalid) if returnMappedDf is True |
| Union[str, float, bool] | A boolean indicating whether the gene is valid if returnMappedDf is False |
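The validation logic and the two return modes described above might look something like the sketch below. It is an approximation under stated assumptions: the bed table is represented as a plain list of dictionaries rather than a pandas dataframe, the name `validate_symbol_sketch` and the placeholder gene names are hypothetical, the Hugo_Symbol column is checked before the ID column, and the WARNING is omitted.

```python
import math
from typing import Dict, List, Union


def validate_symbol_sketch(
    gene: str, bed_rows: List[Dict[str, str]], return_mapped: bool = True
) -> Union[str, float, bool]:
    """Hypothetical stand-in for the symbol validation described above."""
    hugo_symbols = {row["Hugo_Symbol"] for row in bed_rows}
    id_to_hugo = {row["ID"]: row["Hugo_Symbol"] for row in bed_rows}

    if gene in hugo_symbols:
        # Condition 1: the gene is already a Hugo_Symbol in the bed table.
        mapped: Union[str, float] = gene
    elif gene in id_to_hugo:
        # Condition 2: the gene matches an ID, so remap it to Hugo_Symbol.
        mapped = id_to_hugo[gene]
    else:
        # Neither condition holds: the gene is invalid.
        mapped = float("nan")

    if return_mapped:
        return mapped
    return isinstance(mapped, str)


# Placeholder bed table: "OLD1" was remapped to "GENE1" during processing.
bed_rows = [
    {"Hugo_Symbol": "GENE1", "ID": "OLD1"},
    {"Hugo_Symbol": "GENE2", "ID": "GENE2"},
]

print(validate_symbol_sketch("OLD1", bed_rows))
print(math.isnan(validate_symbol_sketch("UNKNOWN", bed_rows)))
print(validate_symbol_sketch("UNKNOWN", bed_rows, return_mapped=False))
```

Returning `float("nan")` for an invalid gene mirrors the documented return value, so callers need `math.isnan` (or a pandas null check) rather than an equality test to detect invalid symbols.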
makeCNARow(row, symbols)
Make CNA Row (Deprecated function)
CNA values are no longer stored in the database
| PARAMETER | DESCRIPTION |
|---|---|
| row | One row in the CNA file |
| symbols | List of gene symbols |
mergeCNAvalues(x)
Merge CNA values: if two rows refer to the same gene, their values are merged into one.
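The description above does not state how conflicting values are merged, so the following is only one plausible sketch. The rule it assumes (identical values are kept, a single non-zero value wins over zeros, and any other conflict yields NaN) and the name `merge_cna_values_sketch` are assumptions, not the documented behavior.

```python
import math
from typing import List


def merge_cna_values_sketch(values: List[float]) -> float:
    """Merge the CNA values of duplicate gene rows for one sample.

    Assumed rule (not confirmed by the documentation):
    - missing values (NaN) are ignored,
    - if all remaining values agree, that value is returned,
    - if the values are 0 plus exactly one non-zero value, the non-zero wins,
    - otherwise the values conflict and NaN is returned.
    """
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    unique = set(present)
    if len(unique) == 1:
        return present[0]
    unique.discard(0)
    if len(unique) == 1:
        return next(iter(unique))
    return float("nan")


print(merge_cna_values_sketch([0, 2]))
print(merge_cna_values_sketch([2, float("nan")]))
print(math.isnan(merge_cna_values_sketch([1, 2])))
```

In a dataframe setting, a function of this shape would typically be applied per sample column after grouping rows by gene symbol.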