greatpy vs GREAT

For each input file, this notebook draws scatterplots comparing the binomial and hypergeometric p-values computed by greatpy with those computed by GREAT, and reports the Pearson correlation coefficient for each comparison.
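As a rough illustration of what each reported coefficient measures, the sketch below computes a Pearson correlation between two sets of p-values in plain Python. The p-values are invented for the example, the helper name `pearson` is hypothetical, and the use of a -log10 scale is an assumption; this is not necessarily how `great.pl.get_all_comparison` computes its statistics. `scipy.stats.pearsonr`, which is imported later in this notebook, returns the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented p-values for four terms, for illustration only (not real output).
greatpy_p = [1e-8, 1e-5, 1e-3, 0.2]
great_p = [1e-7, 1e-6, 1e-2, 0.5]

# Assumption: compare on the -log10 scale so that very small p-values
# are not all squashed near zero.
greatpy_score = [-math.log10(p) for p in greatpy_p]
great_score = [-math.log10(p) for p in great_p]

r = pearson(greatpy_score, great_score)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 1 means the two tools rank and scale the terms similarly; values near 0 mean their p-values are essentially unrelated.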

[1]:

%load_ext autoreload
%autoreload 2

[2]:

import greatpy as great
import pandas as pd
from math import inf
from numpy import log, nan, int64, cov, corrcoef
from scipy.stats import pearsonr
import os
import re

import warnings
warnings.filterwarnings('ignore')

import time

Compute the results from multiple files.

[56]:

input_dir = "../data/tests/test_data/input/"
output_dir = "../data/tests/test_data/output/"

t = []  # input files whose GREAT reference output was computed on hg38

for path in os.listdir(input_dir):
    sp = path.split(".")
    file_id = sp[0][:2]  # two-character prefix shared with the matching output files
    name = sp[0][3:]
    great_out = ""
    great_asso = ""
    assembly = None  # reset so a file without outputs does not inherit the previous assembly

    for out_path in os.listdir(output_dir):
        if out_path.split("_")[0] == file_id:
            # The assembly is encoded in the output file name.
            assembly = "hg19" if "hg19" in out_path else "hg38"
            if "output" in out_path:
                great_out = output_dir + out_path
            else:
                great_asso = output_dir + out_path
    # Only the hg38 test files are kept for the comparison.
    if assembly == "hg38":
        t.append(input_dir + path)
        regdom = f"../data/human/{assembly}/regulatory_domain.bed"
        size = f"../data/human/{assembly}/chr_size.bed"

[ ]:

results = great.tl.enrichment_multiple(
    tests=t,
    regdom_file=regdom,
    chr_size_file=size,
    annotation_file="../data/human/ontologies.csv",
    annpath=None,
    binom=True,
    hypergeom=True
)

Make the plots

[55]:

pp,asso,stat = great.pl.get_all_comparison(results)

[Figures: seven plot outputs, notebooks_03_great_vs_greatpy_8_0.png through notebooks_03_great_vs_greatpy_8_6.png]

GREAT vs greatpy correlation

[58]:

pd.options.display.float_format = '{:.2f}'.format

[47]:

stat

[47]:

	name	pearson_binom	pearson_hypergeom
0	ERF	0.58	0.61
1	MAX	0.60	0.67
2	random	0.24	0.12
3	ultra	0.53	0.68
4	srf	0.63	0.60
5	FOXO3	0.44	0.55
6	height	0.49	0.63

The correlations are moderate at best (about 0.4 to 0.7, excluding the random set), because we could not use the same ontology file as GREAT: ours comes from a 2022 release, whereas GREAT uses a 2012 file.
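One way to gauge how much the ontology mismatch matters is to check how many terms the two result sets actually share. The sketch below is a minimal, hypothetical illustration: the term IDs and p-values are invented, the helper `shared_terms` is not part of greatpy, and the results are assumed to be available as plain dictionaries mapping a term ID to its p-value.

```python
def shared_terms(greatpy_pvalues, great_pvalues):
    """Return the term IDs present in both result sets, sorted."""
    return sorted(set(greatpy_pvalues) & set(great_pvalues))


# Invented term IDs and p-values, for illustration only.
greatpy_pvalues = {"TERM:A": 1e-6, "TERM:B": 1e-3, "TERM:D": 0.04}
great_pvalues = {"TERM:A": 1e-5, "TERM:B": 1e-2, "TERM:C": 0.2}

common = shared_terms(greatpy_pvalues, great_pvalues)

# Terms that exist in only one ontology release cannot be compared at all.
only_greatpy = sorted(set(greatpy_pvalues) - set(great_pvalues))
only_great = sorted(set(great_pvalues) - set(greatpy_pvalues))

# Paired p-values (greatpy, GREAT) restricted to the shared terms.
paired = [(greatpy_pvalues[term], great_pvalues[term]) for term in common]

print(f"shared: {len(common)}, only greatpy: {len(only_greatpy)}, only GREAT: {len(only_great)}")
```

Restricting a comparison to the shared terms removes terms that exist in only one release, but it does not correct for terms whose gene annotations changed between releases.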