Bed

genie_registry.bed

GENIE bed class and functions

Attributes

LOGGER = logging.getLogger(__name__) module-attribute

Classes

bed

Bases: FileTypeFormat

GENIE bed format

Functions
create_gene_panel(beddf, seq_assay_id, gene_panel_path, parentid)

Create bed file and gene panel files from the bed file

PARAMETER DESCRIPTION
beddf

bed dataframe

seq_assay_id

GENIE SEQ_ASSAY_ID

gene_panel_path

Gene panel folder path

parentid

Synapse id of gene panel folder

RETURNS DESCRIPTION

pd.DataFrame: configured bed dataframe

preprocess(newpath)

Standardize and grab seq assay id from the bed file path

PARAMETER DESCRIPTION
newpath

bed file path

RETURNS DESCRIPTION
dict

Dictionary containing the GENIE seq assay id
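
As an illustration of what "grab seq assay id from the bed file path" could look like, here is a minimal sketch. It assumes the id is encoded in the file name (for example `SAGE-PANEL-1.bed`) and that standardizing means upper-casing and replacing underscores with hyphens. The function name `seq_assay_id_from_path`, the `seq_assay_id` dictionary key, and those normalization rules are assumptions for illustration, not the documented behavior of `preprocess`.

```python
import os


def seq_assay_id_from_path(bed_path: str) -> dict:
    """Hypothetical helper: derive a seq assay id from a bed file path."""
    # Take the file name without its directory or extension.
    stem, _ext = os.path.splitext(os.path.basename(bed_path))
    # Assumed normalization: upper-case, hyphens instead of underscores.
    seq_assay_id = stem.upper().replace("_", "-")
    return {"seq_assay_id": seq_assay_id}


result = seq_assay_id_from_path("/tmp/sage-panel_1.bed")
print(result["seq_assay_id"])
```

Returning a dictionary rather than a bare string matches the documented `dict` return type and would let the value be passed on as a keyword argument to a later step.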

process_steps(beddf, newPath, parentId, databaseSynId, seq_assay_id)

Process bed file, update bed database, write bed file to path

PARAMETER DESCRIPTION
beddf

input bed data

TYPE: DataFrame

newPath

Path to new bed file

TYPE: str

parentId

Synapse id to store gene panel file

TYPE: str

databaseSynId

Synapse id of bed database

TYPE: str

seq_assay_id

GENIE seq assay id

TYPE: str

RETURNS DESCRIPTION
str

Path to new bed file

TYPE: str

Functions

create_gtf(dirname)

Create exon.gtf and gene.gtf from GRCh37 gtf

PARAMETER DESCRIPTION
dirname

Directory where these files should live

RETURNS DESCRIPTION
exon_gtf_path

exon GTF

gene_gtf_path

gene GTF

_add_feature_type_tobeddf(filepath, featuretype)

Add Feature_Type to dataframe

PARAMETER DESCRIPTION
filepath

path to bed

featuretype

exon, intron, or intergenic

RETURNS DESCRIPTION
df

empty dataframe or dataframe with appended feature type

add_feature_type(temp_bed_path, exon_gtf_path, gene_gtf_path)

Add Feature_Type to bed file (exon, intron, intergenic)

PARAMETER DESCRIPTION
temp_bed_path

BED file without feature type

exon_gtf_path

exon gtf

gene_gtf_path

gene gtf

RETURNS DESCRIPTION
genie_combined_path

Path to final bed file

_check_region_overlap(row, gene_positiondf)

Check if the submitted bed symbol + region overlaps with the actual gene's positions

PARAMETER DESCRIPTION
row

row in bed file (genomic region)

gene_positiondf

Reference gene position dataframe

RETURNS DESCRIPTION
bool

True if the region does overlap.
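
The overlap test described here is a standard interval intersection check. The sketch below is one way to express it, using plain dictionaries instead of dataframe rows so it runs without extra dependencies. The keys (`symbol`, `chrom`, `start`, `end`), the function name `region_overlaps_gene`, and the choice of closed intervals are assumptions for illustration. The real function operates on a bed row and a reference dataframe whose column names are not given here.

```python
from typing import Iterable, Mapping


def region_overlaps_gene(row: Mapping, gene_positions: Iterable[Mapping]) -> bool:
    """Hypothetical helper: does the submitted region overlap its own gene?"""
    for gene in gene_positions:
        # Only compare against the reference entry for the submitted symbol
        # on the same chromosome.
        if gene["symbol"] != row["symbol"] or gene["chrom"] != row["chrom"]:
            continue
        # Two closed intervals overlap when each starts before the other ends.
        if row["start"] <= gene["end"] and gene["start"] <= row["end"]:
            return True
    return False


reference = [{"symbol": "EGFR", "chrom": "7", "start": 55_000_000, "end": 55_300_000}]
inside = {"symbol": "EGFR", "chrom": "7", "start": 55_100_000, "end": 55_100_200}
outside = {"symbol": "EGFR", "chrom": "7", "start": 1_000, "end": 2_000}
print(region_overlaps_gene(inside, reference), region_overlaps_gene(outside, reference))
```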

_get_max_overlap_index(overlap, bed_length, boundary)

Calculate the ratio of overlap between the submitted bed region and the regions in the gene position dataframe, and return the index of the region with the maximum overlap

PARAMETER DESCRIPTION
overlap

Possible overlapping region

bed_length

Length of submitted region

boundary

Specified overlap ratio

RETURNS DESCRIPTION

Index of regions with maximum overlap or None
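
To make the ratio calculation concrete, here is a small sketch under stated assumptions: overlaps are given as a list of overlap lengths, the ratio is `overlap / bed_length`, and a candidate qualifies only when its ratio is at least `boundary`. The name `max_overlap_index`, the `>=` comparison, and the first-wins tie-breaking are illustrative choices and may differ from the actual implementation.

```python
from typing import Optional, Sequence


def max_overlap_index(
    overlaps: Sequence[float], bed_length: float, boundary: float
) -> Optional[int]:
    """Hypothetical helper: index of the best-overlapping candidate, or None."""
    if bed_length <= 0 or not overlaps:
        return None
    # Fraction of the submitted region covered by each candidate.
    ratios = [overlap / bed_length for overlap in overlaps]
    # Index of the largest ratio (the first one wins on ties).
    best = max(range(len(ratios)), key=ratios.__getitem__)
    # Accept the best candidate only if it meets the required ratio.
    return best if ratios[best] >= boundary else None


print(max_overlap_index([50, 95, 10], bed_length=100, boundary=0.9))
print(max_overlap_index([50, 80], bed_length=100, boundary=0.9))
```

In the first call the second candidate covers 95% of the region, so its index is returned. In the second call no candidate reaches 90%, so the result is `None`.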

_map_position_within_boundary(row, positiondf, boundary=0.9)

Map positions and check whether the position is contained within the specified percentage boundary

PARAMETER DESCRIPTION
row

Row in bed file (genomic region)

positiondf

Reference bed position dataframe

boundary

Percent boundary defined

DEFAULT: 0.9

RETURNS DESCRIPTION

pd.Series: mapped position
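
The following sketch shows one way a region could be mapped to a reference entry when at least `boundary` (0.9 by default) of the region lies within it. It works on plain dictionaries and returns a symbol, whereas the documented function works on dataframe rows and returns a `pd.Series`. The keys, the name `map_region_to_symbol`, and the half-open interval arithmetic are assumptions for illustration only.

```python
from typing import Iterable, Mapping, Optional


def map_region_to_symbol(
    row: Mapping, reference: Iterable[Mapping], boundary: float = 0.9
) -> Optional[str]:
    """Hypothetical helper: symbol of the reference region covering the row."""
    bed_length = row["end"] - row["start"]
    if bed_length <= 0:
        return None
    best_symbol, best_ratio = None, 0.0
    for ref in reference:
        if ref["chrom"] != row["chrom"]:
            continue
        # Length of the intersection; negative when the intervals are disjoint.
        overlap = min(row["end"], ref["end"]) - max(row["start"], ref["start"])
        ratio = overlap / bed_length
        if ratio > best_ratio:
            best_symbol, best_ratio = ref["symbol"], ratio
    # Map only when enough of the submitted region is covered.
    return best_symbol if best_ratio >= boundary else None


genes = [
    {"symbol": "EGFR", "chrom": "7", "start": 50, "end": 195},
    {"symbol": "BRAF", "chrom": "7", "start": 190, "end": 400},
]
region = {"chrom": "7", "start": 100, "end": 200}
print(map_region_to_symbol(region, genes))
```

Here 95 of the region's 100 bases fall inside the first entry, which clears the default 0.9 boundary, while the second entry covers only 10 bases.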

remap_symbols(row, gene_positiondf)

Remap hugo symbols if there is no overlap between submitted bed region and gene positions.

PARAMETER DESCRIPTION
row

start and end position

gene_positiondf

Actual gene position dataframe

RETURNS DESCRIPTION

bool or Series: whether the gene passed in needs to be remapped, or the remapped gene