greatpy.tl.enrichment
- greatpy.tl.enrichment(test_file, regdom_file, chr_size_file, annotation_file, binom=True, hypergeom=True)
Compute GO term enrichment for a set of test genomic regions.
- Parameters:
- test_file : str or pd.DataFrame
Genomic set of peaks to be tested
- regdom_file : str or pd.DataFrame
Regulatory domains of all genes in the genome
- chr_size_file : str or pd.DataFrame
Table with the size of each chromosome
- annotation_file : str or pd.DataFrame
Table with the annotation of each gene in the genome
- binom : bool (default True)
If True, the binomial test is used.
- hypergeom : bool (default True)
If True, the hypergeometric test is used.
- Returns:
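DataFrame indexed by GO ID. For every GO term annotated to at least one gene associated with the test regions, it holds the p-value and fold enrichment of each selected test (binomial and/or hypergeometric), together with the intersection size and recall.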
- Return type:
pd.DataFrame
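The two tests selected by binom and hypergeom can be illustrated with a small standard-library sketch. It assumes the usual GREAT-style formulation (a binomial test over genomic regions and a hypergeometric test over genes); the helper names binom_sf and hypergeom_sf and all numbers are hypothetical, and this is not greatpy's own implementation.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_sf(k, population, successes, draws):
    """Upper tail P(X >= k) when drawing `draws` items without replacement
    from `population` items, of which `successes` carry the annotation."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Toy binomial test (assumed numbers): 10 test regions, 3 of which fall in
# regulatory domains of genes annotated with the term; those domains cover
# 10% of the genome.
binom_p = binom_sf(k=3, n=10, p=0.1)

# Toy hypergeometric test (assumed numbers): 20 genes in the genome, 5 of
# them annotated with the term, 4 genes associated with the test regions,
# 2 of which are annotated.
hyper_p = hypergeom_sf(k=2, population=20, successes=5, draws=4)

print(f"binomial p-value:       {binom_p:.6f}")
print(f"hypergeometric p-value: {hyper_p:.6f}")
```

In this formulation, a small binomial p-value means the test regions hit the term's regulatory domains more often than their genome coverage predicts, while a small hypergeometric p-value means the associated genes are enriched for the term at the gene level.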
Examples
>>> test, regdom, size, ann = loader(
...     "../data/tests/test_data/input/03_srf_hg19.bed",
...     "../data/human/hg19/regulatory_domain.bed",
...     "../data/human/hg19/chr_size.bed",
...     "../data/human/ontologies.csv",
... )
>>> enrichment_df = enrichment(
...     test_file=test,
...     regdom_file=regdom,
...     chr_size_file=size,
...     annotation_file=ann,
...     binom=True,
...     hypergeom=True,
... )
>>> enrichment_df.head()
... | | go_term | binom_p_value | binom_fold_enrichment | hypergeom_p_value | hypergeometric_fold_enrichment | intersection_size | recall |
... |:-----------|:----------------------------------------------------------|----------------:|------------------------:|--------------------:|---------------------------------:|--------------------:|---------:|
... | GO:0072749 | cellular response to cytochalasin B | 2.21968e-12 | 227251 | 0.0428032 | 23.3627 | 5 | 5 |
... | GO:0051623 | positive regulation of norepinephrine uptake | 2.21968e-12 | 227251 | 0.0428032 | 23.3627 | 5 | 5 |
... | GO:0098973 | structural constituent of postsynaptic actin cytoskeleton | 2.1174e-10 | 91052.6 | 0.160543 | 5.84068 | 5 | 1.25 |
... | GO:0097433 | dense body | 6.40085e-10 | 16061.8 | 0.00141783 | 11.6814 | 8 | 1.33333 |
... | GO:0032796 | uropod organization | 2.6988e-09 | 54544.9 | 0.00182991 | 23.3627 | 5 | 2.5 |
>>> enrichment_df = enrichment(
...     test_file=test,
...     regdom_file=regdom,
...     chr_size_file=size,
...     annotation_file=ann,
...     binom=True,
...     hypergeom=False,
... )
>>> enrichment_df.head()
... | | go_term | binom_p_value | binom_fold_enrichment | intersection_size | recall |
... |:-----------|:----------------------------------------------------------|----------------:|------------------------:|--------------------:|---------:|
... | GO:0072749 | cellular response to cytochalasin B | 2.21968e-12 | 227251 | 5 | 5 |
... | GO:0051623 | positive regulation of norepinephrine uptake | 2.21968e-12 | 227251 | 5 | 5 |
... | GO:0098973 | structural constituent of postsynaptic actin cytoskeleton | 2.1174e-10 | 91052.6 | 5 | 1.25 |
... | GO:0097433 | dense body | 6.40085e-10 | 16061.8 | 8 | 1.33333 |
... | GO:0032796 | uropod organization | 2.6988e-09 | 54544.9 | 5 | 2.5 |
>>> enrichment_df = enrichment(
...     test_file=test,
...     regdom_file=regdom,
...     chr_size_file=size,
...     annotation_file=ann,
...     binom=False,
...     hypergeom=True,
... )
>>> enrichment_df.head()
... | | go_term | hypergeom_p_value | hypergeometric_fold_enrichment | intersection_size | recall |
... |:-----------|:-------------------------------------------------------------------------------------------|--------------------:|---------------------------------:|--------------------:|----------:|
... | GO:0015629 | actin cytoskeleton | 2.25347e-06 | 2.73071 | 27 | 0.116883 |
... | GO:1903979 | negative regulation of microglial cell activation | 0.000302551 | 17.522 | 3 | 0.75 |
... | GO:1902626 | assembly of large subunit precursor of preribosome | 0.000302551 | 17.522 | 3 | 0.75 |
... | GO:0001077 | proximal promoter DNA-binding transcription activator activity, RNA polymerase II-specific | 0.000504006 | 1.94689 | 29 | 0.0833333 |
... | GO:0000977 | RNA polymerase II regulatory region sequence-specific DNA binding | 0.000511704 | 2.03154 | 26 | 0.0869565 |
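Because the return value is a regular pd.DataFrame, it can be filtered and sorted with standard pandas operations. The sketch below rebuilds three rows from the hypergeometric-only example output above instead of calling greatpy; the thresholds are illustrative assumptions, not greatpy defaults.

```python
import pandas as pd

# Three rows copied from the hypergeometric-only example output above.
result = pd.DataFrame(
    {
        "go_term": [
            "actin cytoskeleton",
            "negative regulation of microglial cell activation",
            "RNA polymerase II regulatory region sequence-specific DNA binding",
        ],
        "hypergeom_p_value": [2.25347e-06, 0.000302551, 0.000511704],
        "hypergeometric_fold_enrichment": [2.73071, 17.522, 2.03154],
        "intersection_size": [27, 3, 26],
    },
    index=["GO:0015629", "GO:1903979", "GO:0000977"],
)

# Illustrative thresholds (assumptions): keep terms with a small p-value
# that are also supported by at least 10 genes in the intersection.
mask = (result["hypergeom_p_value"] < 1e-3) & (result["intersection_size"] >= 10)

# Rank the remaining terms by fold enrichment, strongest first.
significant = result[mask].sort_values(
    "hypergeometric_fold_enrichment", ascending=False
)

print(significant[["go_term", "hypergeom_p_value", "intersection_size"]])
```

Requiring a minimum intersection size alongside the p-value cutoff is one way to de-emphasize terms such as GO:1903979, whose high fold enrichment rests on only three genes.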