Bed
genie_registry.bed
GENIE bed class and functions
Attributes
LOGGER = logging.getLogger(__name__)
module-attribute
Classes
bed
Bases: FileTypeFormat
GENIE bed format
Functions
create_gene_panel(beddf, seq_assay_id, gene_panel_path, parentid)
Create bed file and gene panel files from the bed file
| PARAMETER | DESCRIPTION |
|---|---|
| beddf | bed dataframe |
| seq_assay_id | GENIE SEQ_ASSAY_ID |
| gene_panel_path | Gene panel folder path |
| parentid | Synapse id of gene panel folder |

| RETURNS | DESCRIPTION |
|---|---|
| pd.DataFrame | configured bed dataframe |
preprocess(newpath)
Standardize and grab seq assay id from the bed file path
| PARAMETER | DESCRIPTION |
|---|---|
| newpath | bed file path |

| RETURNS | DESCRIPTION |
|---|---|
| dict | GENIE seq assay id |
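The "grab seq assay id from the bed file path" step could look roughly like the sketch below. It assumes, purely for illustration, that the bed file is named `<SEQ_ASSAY_ID>.bed` and that standardizing means upper-casing; the helper name `seq_assay_id_from_path` and the `"seq_assay_id"` dictionary key are hypothetical and not taken from `genie_registry`.

```python
import os


def seq_assay_id_from_path(newpath: str) -> dict:
    """Hypothetical helper: derive a seq assay id from a bed file path.

    Assumes the file is named ``<SEQ_ASSAY_ID>.bed``; this naming rule is an
    assumption of the sketch, not documented behaviour.
    """
    filename = os.path.basename(newpath)
    stem, _ext = os.path.splitext(filename)
    # Standardize to upper case (assumed convention for this sketch).
    return {"seq_assay_id": stem.upper()}
```

Returning a dict mirrors the documented return type, so the value can be passed on as a keyword argument to a later processing step.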
process_steps(beddf, newPath, parentId, databaseSynId, seq_assay_id)
Process bed file, update bed database, write bed file to path
| PARAMETER | DESCRIPTION |
|---|---|
| beddf | input bed data |
| newPath | Path to new bed file |
| parentId | Synapse id to store gene panel file |
| databaseSynId | Synapse id of bed database |
| seq_assay_id | GENIE seq assay id |

| RETURNS | DESCRIPTION |
|---|---|
| str | Path to new bed file |
Functions
create_gtf(dirname)
Create exon.gtf and gene.gtf from GRCh37 gtf
| PARAMETER | DESCRIPTION |
|---|---|
| dirname | Directory where these files should live |

| RETURNS | DESCRIPTION |
|---|---|
| exon_gtf_path | exon GTF |
| gene_gtf_path | gene GTF |
_add_feature_type_tobeddf(filepath, featuretype)
Add Feature_Type to dataframe
| PARAMETER | DESCRIPTION |
|---|---|
| filepath | path to bed |
| featuretype | exon, intron, or intergenic |

| RETURNS | DESCRIPTION |
|---|---|
| df | empty dataframe or dataframe with appended feature type |
add_feature_type(temp_bed_path, exon_gtf_path, gene_gtf_path)
Add Feature_Type to bed file (exon, intron, intergenic)
| PARAMETER | DESCRIPTION |
|---|---|
| temp_bed_path | BED file without feature type |
| exon_gtf_path | exon gtf |
| gene_gtf_path | gene gtf |

| RETURNS | DESCRIPTION |
|---|---|
| genie_combined_path | Path to final bed file |
_check_region_overlap(row, gene_positiondf)
Check if the submitted bed symbol + region overlaps with the actual gene's positions
| PARAMETER | DESCRIPTION |
|---|---|
| row | row in bed file (genomic region) |
| gene_positiondf | Reference gene position dataframe |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the region does overlap. |
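The overlap test described here can be illustrated with a small, dataframe-free sketch. It assumes closed intervals and a plain dictionary of reference positions; the function name `region_overlaps_gene`, the `GenePositions` alias, and the `GENE_A` entry are hypothetical stand-ins, not part of `genie_registry`.

```python
from typing import Dict, Tuple

# Hypothetical reference positions: symbol -> (chromosome, start, end).
GenePositions = Dict[str, Tuple[str, int, int]]


def region_overlaps_gene(
    symbol: str, chrom: str, start: int, end: int, gene_positions: GenePositions
) -> bool:
    """Return True if the submitted region overlaps the named gene's span."""
    if symbol not in gene_positions:
        return False
    gene_chrom, gene_start, gene_end = gene_positions[symbol]
    # Two closed intervals overlap when each one starts before the other ends.
    return chrom == gene_chrom and start <= gene_end and gene_start <= end


# Made-up reference entry used only to exercise the sketch.
reference = {"GENE_A": ("1", 1000, 2000)}
```

With this reference, a region on chromosome 1 spanning 1500 to 2500 overlaps `GENE_A`, while one starting at 2001 does not.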
_get_max_overlap_index(overlap, bed_length, boundary)
Calculate the ratio of overlap between the submitted bed region and each candidate region in the gene position dataframe, and return the index of the region with the maximum overlap
| PARAMETER | DESCRIPTION |
|---|---|
| overlap | Possible overlapping region |
| bed_length | Length of submitted region |
| boundary | specified ratio overlap |

| RETURNS | DESCRIPTION |
|---|---|
| | Index of regions with maximum overlap or None |
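As a rough illustration of the ratio calculation, the sketch below divides each candidate overlap length by the submitted region's length and keeps the best candidate only if its ratio reaches the boundary. The name `max_overlap_index`, the use of plain lists instead of a dataframe, and the `>=` comparison are assumptions of this sketch.

```python
from typing import Optional, Sequence


def max_overlap_index(
    overlaps: Sequence[int], bed_length: int, boundary: float
) -> Optional[int]:
    """Return the index of the largest overlap whose ratio meets the boundary."""
    if not overlaps or bed_length <= 0:
        return None
    ratios = [overlap / bed_length for overlap in overlaps]
    # Index of the highest ratio; ties resolve to the first candidate.
    best = max(range(len(ratios)), key=ratios.__getitem__)
    # Assumed rule: the best candidate only counts if it reaches the boundary.
    return best if ratios[best] >= boundary else None
```

For example, with overlap lengths of 50, 95, and 10 against a 100-base region and a boundary of 0.9, only the second candidate qualifies, so its index is returned.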
_map_position_within_boundary(row, positiondf, boundary=0.9)
Map positions and check whether the position is contained within the specified percentage boundary
| PARAMETER | DESCRIPTION |
|---|---|
| row | Row in bed file (genomic region) |
| positiondf | Reference bed position dataframe |
| boundary | Percent boundary defined. DEFAULT: 0.9 |

| RETURNS | DESCRIPTION |
|---|---|
| pd.Series | mapped position |
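To make the boundary check concrete, here is a minimal sketch that maps a submitted region to the reference region covering the largest share of it, provided that share reaches `boundary`. The `Region` tuple, the function name `map_symbol_within_boundary`, and the choice to return a symbol rather than a `pd.Series` are simplifications assumed for this example; only the `boundary=0.9` default comes from the signature above.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical stand-in for one row of a position dataframe."""

    chrom: str
    start: int
    end: int
    symbol: str


def map_symbol_within_boundary(
    row: Region, positions: List[Region], boundary: float = 0.9
) -> Optional[str]:
    """Return the symbol of the reference region covering most of `row`."""
    bed_length = row.end - row.start
    if bed_length <= 0:
        return None
    best_symbol, best_ratio = None, 0.0
    for ref in positions:
        if ref.chrom != row.chrom:
            continue
        # Length of the shared interval; zero or negative means no overlap.
        shared = min(row.end, ref.end) - max(row.start, ref.start)
        ratio = shared / bed_length
        if ratio > best_ratio:
            best_symbol, best_ratio = ref.symbol, ratio
    # Assumed rule: accept the best match only if it reaches the boundary.
    return best_symbol if best_ratio >= boundary else None


# Made-up reference regions used only to exercise the sketch.
positions = [Region("1", 100, 200, "GENE_A"), Region("1", 150, 400, "GENE_B")]
```

A region from 160 to 260 on chromosome 1 lies entirely inside the second reference region, so it maps to `GENE_B`; a region on a chromosome with no reference entries maps to nothing.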
remap_symbols(row, gene_positiondf)
Remap hugo symbols if there is no overlap between submitted bed region and gene positions.
| PARAMETER | DESCRIPTION |
|---|---|
| row | start and end position |
| gene_positiondf | Actual gene position dataframe |

| RETURNS | DESCRIPTION |
|---|---|
| bool or Series | Whether the gene passed in needs to be remapped, or the remapped gene |