greatpy.tl.enrichment

greatpy.tl.enrichment(test_file, regdom_file, chr_size_file, annotation_file, binom=True, hypergeom=True)

Compute GO term enrichment for a set of test genomic regions.

Parameters:
test_file : str or pd.DataFrame

Genomic set of peaks to be tested

regdom_file : str or pd.DataFrame

Regulatory domains of all genes in the genome

chr_size_file : str or pd.DataFrame

Table with the size of each chromosome

annotation_file : str or pd.DataFrame

Table with the annotation of each gene in the genome

binom : bool (default True)

If True, the binomial test is computed and its columns are included in the result.

hypergeom : bool (default True)

If True, the hypergeometric test is computed and its columns are included in the result.

Returns:

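DataFrame indexed by GO ID, with one row for every GO term associated with at least one gene linked to the test regions. Depending on binom and hypergeom, each row holds the p-value and fold enrichment of the binomial test, of the hypergeometric test, or of both.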

Return type:

pd.DataFrame

Examples

>>> test,regdom,size,ann = loader(
    "../data/tests/test_data/input/03_srf_hg19.bed",
    "../data/human/hg19/regulatory_domain.bed",
    "../data/human/hg19/chr_size.bed",
    "../data/human/ontologies.csv"
    )
>>> enrichment_df = enrichment(
    test_file=test,
    regdom_file=regdom,
    chr_size_file=size,
    annotation_file=ann,
    binom=True,
    hypergeom=True
    )
>>> enrichment_df.head()
...    |            | go_term                                                   |   binom_p_value |   binom_fold_enrichment |   hypergeom_p_value |   hypergeometric_fold_enrichment |   intersection_size |   recall |
...    |:-----------|:----------------------------------------------------------|----------------:|------------------------:|--------------------:|---------------------------------:|--------------------:|---------:|
...    | GO:0072749 | cellular response to cytochalasin B                       |     2.21968e-12 |                227251   |          0.0428032  |                         23.3627  |                   5 |  5       |
...    | GO:0051623 | positive regulation of norepinephrine uptake              |     2.21968e-12 |                227251   |          0.0428032  |                         23.3627  |                   5 |  5       |
...    | GO:0098973 | structural constituent of postsynaptic actin cytoskeleton |     2.1174e-10  |                 91052.6 |          0.160543   |                          5.84068 |                   5 |  1.25    |
...    | GO:0097433 | dense body                                                |     6.40085e-10 |                 16061.8 |          0.00141783 |                         11.6814  |                   8 |  1.33333 |
...    | GO:0032796 | uropod organization                                       |     2.6988e-09  |                 54544.9 |          0.00182991 |                         23.3627  |                   5 |  2.5     |
>>> enrichment_df = enrichment(
    test_file=test,
    regdom_file=regdom,
    chr_size_file=size,
    annotation_file=ann,
    binom=True,
    hypergeom=False
    )
>>> enrichment_df.head()
...    |            | go_term                                                   |   binom_p_value |   binom_fold_enrichment |   intersection_size |   recall |
...    |:-----------|:----------------------------------------------------------|----------------:|------------------------:|--------------------:|---------:|
...    | GO:0072749 | cellular response to cytochalasin B                       |     2.21968e-12 |                227251   |                   5 |  5       |
...    | GO:0051623 | positive regulation of norepinephrine uptake              |     2.21968e-12 |                227251   |                   5 |  5       |
...    | GO:0098973 | structural constituent of postsynaptic actin cytoskeleton |     2.1174e-10  |                 91052.6 |                   5 |  1.25    |
...    | GO:0097433 | dense body                                                |     6.40085e-10 |                 16061.8 |                   8 |  1.33333 |
...    | GO:0032796 | uropod organization                                       |     2.6988e-09  |                 54544.9 |                   5 |  2.5     |
>>> enrichment_df = enrichment(
    test_file=test,
    regdom_file=regdom,
    chr_size_file=size,
    annotation_file=ann,
    binom=False,
    hypergeom=True
    )
>>> enrichment_df.head()
...    |            | go_term                                                                                    |   hypergeom_p_value |   hypergeometric_fold_enrichment |   intersection_size |    recall |
...    |:-----------|:-------------------------------------------------------------------------------------------|--------------------:|---------------------------------:|--------------------:|----------:|
...    | GO:0015629 | actin cytoskeleton                                                                         |         2.25347e-06 |                          2.73071 |                  27 | 0.116883  |
...    | GO:1903979 | negative regulation of microglial cell activation                                          |         0.000302551 |                         17.522   |                   3 | 0.75      |
...    | GO:1902626 | assembly of large subunit precursor of preribosome                                         |         0.000302551 |                         17.522   |                   3 | 0.75      |
...    | GO:0001077 | proximal promoter DNA-binding transcription activator activity, RNA polymerase II-specific |         0.000504006 |                          1.94689 |                  29 | 0.0833333 |
...    | GO:0000977 | RNA polymerase II regulatory region sequence-specific DNA binding                          |         0.000511704 |                          2.03154 |                  26 | 0.0869565 |
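
The binom and hypergeom options select two different statistical tests. The sketch below illustrates the kind of upper-tail probability each test reports, using only the standard library. It is not greatpy's implementation: the function names binom_sf and hypergeom_sf are hypothetical, the numbers are toy values, and the interpretation of the inputs (fraction of the genome covered by a term's regulatory domains for the binomial test, gene counts for the hypergeometric test) is an assumption based on the usual GREAT-style formulation.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper-tail binomial probability P(X >= k) for X ~ Binomial(n, p).

    Assumed reading: n test regions, k of them falling in the regulatory
    domains of genes annotated with the term, p the fraction of the genome
    covered by those domains.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_sf(k, total_genes, term_genes, picked_genes):
    """Upper-tail hypergeometric probability P(X >= k).

    Assumed reading: total_genes in the genome, term_genes annotated with
    the term, picked_genes associated with the test regions, k of which
    carry the term.
    """
    upper = min(term_genes, picked_genes)
    favourable = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, picked_genes - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total_genes, picked_genes)


# Toy values, not taken from any real dataset.
p_binom = binom_sf(k=3, n=10, p=0.1)
p_hyper = hypergeom_sf(k=2, total_genes=20, term_genes=5, picked_genes=4)
print(f"binomial p-value:       {p_binom:.6f}")
print(f"hypergeometric p-value: {p_hyper:.6f}")
```

The binomial test counts regions and depends on how much of the genome a term's regulatory domains cover, while the hypergeometric test counts genes, so the two p-values for the same GO term can differ widely, as the tables above show.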