greatpy.tl.get_nb_asso_per_region

greatpy.tl.get_nb_asso_per_region(test, regdom)

Determine the number of test peaks associated with each gene through its regulatory domain.

Parameters:
test : str or pd.DataFrame

Path to the file containing the test peaks, or a DataFrame of them; columns: ["chr", "chr_start", "chr_end"]

regdom : str or pd.DataFrame

Path to the file containing the regulatory domains, or a DataFrame of them; columns: ["chr", "chr_start", "chr_end", "name", "tss", "strand"].
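Both parameters accept either a path or a DataFrame. The sketch below shows one way to normalize such input with pandas. It assumes the file is tab-separated and headerless (BED-like), keeping only the leading columns; `load_regions`, `TEST_COLUMNS` and `REGDOM_COLUMNS` are hypothetical names, not part of greatpy.

```python
import io

import pandas as pd

# Column layouts described in the parameter list.
TEST_COLUMNS = ["chr", "chr_start", "chr_end"]
REGDOM_COLUMNS = ["chr", "chr_start", "chr_end", "name", "tss", "strand"]


def load_regions(source, columns):
    """Hypothetical helper: return `source` as a DataFrame with `columns`.

    A DataFrame is passed through, restricted to `columns`. Anything else is
    treated as a path or file-like object pointing to a tab-separated,
    headerless file (an assumption about the on-disk format).
    """
    if isinstance(source, pd.DataFrame):
        return source[columns].copy()
    df = pd.read_csv(source, sep="\t", header=None, comment="#")
    # Keep the leading columns only (e.g. ignore extra BED fields) and name them.
    df = df.iloc[:, : len(columns)]
    df.columns = columns
    return df


# Usage: an in-memory "file" holding the two regulatory domains.
regdom_file = io.StringIO(
    "chr1\t1034992\t1115089\tRNF223\t1074306\t-\n"
    "chr1\t1079306\t1132016\tC1orf159\t1116089\t-\n"
)
regdom = load_regions(regdom_file, REGDOM_COLUMNS)
print(regdom["name"].tolist())  # ['RNF223', 'C1orf159']
```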

Returns:

res – dict with the number of peaks associated with each gene:

  • key = associated gene

  • value = number of peaks associated with the gene

Return type:

dict
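The shape of the returned dict can be illustrated with a plain-pandas sketch. This is not greatpy's implementation: it assumes a peak is associated with a gene when, on the same chromosome, the peak overlaps the gene's regulatory domain, with half-open [start, end) coordinates as in BED. `count_peaks_per_gene` is a hypothetical name.

```python
import pandas as pd


def count_peaks_per_gene(test: pd.DataFrame, regdom: pd.DataFrame) -> dict:
    """Hypothetical helper: map each gene name to its number of associated peaks.

    Assumption: a peak is associated with a gene when, on the same chromosome,
    the peak interval overlaps the gene's regulatory domain. Coordinates are
    treated as half-open [start, end), as in BED.
    """
    # Pair every peak with every regulatory domain on the same chromosome.
    pairs = test.merge(regdom, on="chr", suffixes=("_peak", "_dom"))
    # Keep only the pairs whose intervals overlap.
    overlapping = pairs[
        (pairs["chr_start_peak"] < pairs["chr_end_dom"])
        & (pairs["chr_end_peak"] > pairs["chr_start_dom"])
    ]
    # One row per (peak, gene) pair, so the group size is the peak count.
    # Genes with no associated peak do not appear in the result.
    counts = overlapping.groupby("name").size()
    return {name: int(n) for name, n in counts.items()}


# The data used in the Examples section.
test = pd.DataFrame({"chr": ["chr1"], "chr_start": [1052028], "chr_end": [1052049]})
regdom = pd.DataFrame(
    {
        "chr": ["chr1", "chr1"],
        "chr_start": [1034992, 1079306],
        "chr_end": [1115089, 1132016],
        "name": ["RNF223", "C1orf159"],
        "tss": [1074306, 1116089],
        "strand": ["-", "-"],
    }
)
print(count_peaks_per_gene(test, regdom))  # {'RNF223': 1}
```

Under these assumptions a single peak can count toward several genes when regulatory domains overlap, as the domains of RNF223 and C1orf159 do here.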

Examples

>>> import pandas as pd
>>> test = pd.DataFrame(
...     {
...         "chr": ["chr1"],
...         "chr_start": [1052028],
...         "chr_end": [1052049],
...     }
... )
>>> regdom = pd.DataFrame(
...     {
...         "chr": ["chr1", "chr1"],
...         "chr_start": [1034992, 1079306],
...         "chr_end": [1115089, 1132016],
...         "name": ["RNF223", "C1orf159"],
...         "tss": [1074306, 1116089],
...         "strand": ["-", "-"],
...     }
... )
>>> from greatpy.tl import get_nb_asso_per_region
>>> get_nb_asso_per_region(test, regdom)
{'RNF223': 1}