greatpy.pl.get_all_comparison

greatpy.pl.get_all_comparison(results, out_dir='../data/tests/test_data/output/', information_folder='../data/human/', good_gene_associations=True, disp_scatterplot=True, stats=True)

Plot the comparison between greatpy and GREAT for a set of files processed by great.tl.enrichment_multiple.

Parameters:

results : dict

Dictionary of results returned by great.tl.enrichment_multiple.

out_dir : str

Path to the directory containing the results from the GREAT web server.

Default is ../data/tests/test_data/output/

information_folder : str

Path to the folder containing the information files for the tests.

Default is ../data/human/

The input folder should contain the following files:

information_folder/assembly_eg_hg38/regulatory_domain.bed
information_folder/assembly_eg_hg38/chr_size.bed

good_gene_associations : bool

If True, the function returns the number of good gene associations.

disp_scatterplot : bool

If True, the function displays the scatterplot of the comparison.

stats : bool

If True, the function returns the statistics of the comparison.

Returns:

pp (pd.DataFrame) – DataFrame of the number of rows lost between the data before and after preprocessing
asso (pd.DataFrame) – DataFrame of the number of good gene associations for each file
stats_df (pd.DataFrame) – DataFrame of the statistics of the comparison for each file
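
The stats_df frame reports a correlation per file. How greatpy derives it internally is not documented on this page. As a minimal sketch, assuming it is the ordinary Pearson coefficient between paired values reported by greatpy and GREAT for the same terms, the calculation could look like the following. The helper `pearson` and the sample values are hypothetical and are not part of greatpy.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores for the same ontology terms (not real greatpy output)
greatpy_scores = [0.01, 0.20, 0.35, 0.50]
great_scores = [0.02, 0.18, 0.40, 0.45]
r = pearson(greatpy_scores, great_scores)
print(f"pearson r = {r:.3f}")
```

A value close to 1 would indicate that the two tools rank the shared terms similarly; a value close to 0 would indicate little linear agreement.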

Example

>>> import greatpy as great
>>> test = [
...    '../data/tests/test_data/input/09_ERF.bed', '../data/tests/test_data/input/10_MAX.bed',
...    '../data/tests/test_data/input/01_random.bed', '../data/tests/test_data/input/04_ultra_hg38.bed',
...    '../data/tests/test_data/input/02_srf_hg38.bed', '../data/tests/test_data/input/08_FOXO3.bed',
...    '../data/tests/test_data/input/06_height_snps_hg38.bed'
...     ]
>>> # Assumed paths, following the default information_folder layout described above
>>> regdom = '../data/human/assembly_eg_hg38/regulatory_domain.bed'
>>> size = '../data/human/assembly_eg_hg38/chr_size.bed'
>>> results = great.tl.enrichment_multiple(
...    tests=test,
...    regdom_file=regdom,
...    chr_size_file=size,
...    annotation_file="../data/human/ontologies.csv",
...    annpath=None,
...    binom=True,
...    hypergeom=True
...    )
>>> pp, asso, stat = great.pl.get_all_comparison(results)

>>> pp
...    |    | name   |   before_pp_greatpy_size |   before_pp_great_size |   final_size |   %_of_GO_from_great_lost |
...    |---:|:-------|-------------------------:|-----------------------:|-------------:|--------------------------:|
...    |  0 | ERF    |                     6014 |                   2410 |         1833 |                     23.94 |
...    |  1 | MAX    |                     2996 |                   2395 |         1481 |                     38.16 |
...    |  2 | random |                      579 |                    197 |          117 |                     40.61 |
...    |  3 | ultra  |                     3265 |                   2175 |         1393 |                     35.95 |
...    |  4 | srf    |                     4810 |                   2681 |         1854 |                     30.85 |

>>> asso
...    |    | name   |   number_good_gene_asso |   number_genes_asso_lost |   number_gene_asso_excess |
...    |---:|:-------|------------------------:|-------------------------:|--------------------------:|
...    |  0 | ERF    |                    1456 |                        0 |                        36 |
...    |  1 | MAX    |                     428 |                        0 |                         4 |
...    |  2 | random |                      57 |                        0 |                         1 |
...    |  3 | ultra  |                     496 |                        0 |                         2 |
...    |  4 | srf    |                     923 |                        0 |                         7 |

>>> stat
...    |    | name   |   pearson_binom |   pearson_hypergeom |
...    |---:|:-------|----------------:|--------------------:|
...    |  0 | ERF    |        0.57644  |            0.609552 |
...    |  1 | MAX    |        0.601492 |            0.670499 |
...    |  2 | random |        0.240765 |            0.124707 |
...    |  3 | ultra  |        0.52949  |            0.675438 |
...    |  4 | srf    |        0.631909 |            0.597787 |
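
In the example output above, the `%_of_GO_from_great_lost` column matches the share of GREAT rows that do not survive preprocessing, that is `(before_pp_great_size - final_size) / before_pp_great_size * 100`. This relationship is inferred from the numbers shown and not from the greatpy source, so treat the sketch below as a consistency check only. The helper `percent_lost` is hypothetical.

```python
# Rows copied from the `pp` table above:
# (name, before_pp_great_size, final_size)
rows = [
    ("ERF", 2410, 1833),
    ("MAX", 2395, 1481),
    ("random", 197, 117),
    ("ultra", 2175, 1393),
    ("srf", 2681, 1854),
]


def percent_lost(before, final):
    """Percentage of rows present before preprocessing but absent afterwards."""
    return round(100 * (before - final) / before, 2)


lost = {name: percent_lost(before, final) for name, before, final in rows}
for name, value in lost.items():
    print(f"{name}: {value}")
```

The printed values can be compared line by line with the last column of the `pp` table.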