greatpy.tl.loader

greatpy.tl.loader(test_data, regdom_file, chr_size_file, annotation_file)

Load all the datasets needed for the enrichment calculation.

Parameters:
test_data : None or str or pd.DataFrame

Set of genomic peaks to be tested

regdom_file : None or str or pd.DataFrame

Regulatory domains of all genes in the genome

chr_size_file : None or str or pd.DataFrame

Table with the size of each chromosome

annotation_file : None or str or pd.DataFrame

Table with the annotation of each gene in the genome

Returns:

  • test_data (pd.DataFrame) – Set of genomic peaks to be tested, in the expected format

  • regdom (pd.DataFrame) – Regulatory domains of all genes in the genome, in the expected format

  • size (pd.DataFrame) – Table with the size of each chromosome, in the expected format

  • ann (pd.DataFrame) – Table with the annotation of each gene in the genome, in the expected format
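Each parameter accepts None, a str, or a pd.DataFrame, and each return value is a DataFrame "in the expected format". The sketch below illustrates one way such an input could be normalized. It is not greatpy's actual implementation: the helper name `normalize_peaks`, the tab-separated BED parsing, and the choice to keep only the first three columns are assumptions made for illustration.

```python
import io

import pandas as pd

# Column names taken from the test.head() output shown in the examples.
BED_COLUMNS = ["chr", "chr_start", "chr_end"]


def normalize_peaks(source):
    """Hypothetical helper: accept None, a path/file-like, or a DataFrame."""
    if source is None:
        # Nothing to load; pass the missing value through unchanged.
        return None
    if isinstance(source, pd.DataFrame):
        # Assumption: the first three columns hold chromosome, start, end.
        frame = source.iloc[:, :3].copy()
        frame.columns = BED_COLUMNS
        return frame
    # Otherwise treat the input as a path (or file-like) to a BED file.
    return pd.read_csv(
        source,
        sep="\t",
        comment="#",
        header=None,
        usecols=[0, 1, 2],
        names=BED_COLUMNS,
    )


# An in-memory stand-in for a BED file on disk.
bed_text = "chr1\t1052028\t1052049\nchr1\t1065512\t1065533\n"
from_file = normalize_peaks(io.StringIO(bed_text))
from_frame = normalize_peaks(pd.DataFrame([["chr1", 1052028, 1052049]]))
```

Both calls produce a DataFrame with the same three column names, so downstream code would not need to care which input form was used.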

Examples

>>> test, regdom, size, ann = loader(
    "../../data/tests/test_data/input/02_srf_hg38.bed",
    "../../data/human/hg38/regulatory_domain.bed",
    "../../data/human/hg38/chr_size.bed",
    "../data/human/ontologies.csv",
)
>>> test.head()
...    |    | chr   |   chr_start |   chr_end |
...    |---:|:------|------------:|----------:|
...    |  0 | chr1  |     1052028 |   1052049 |
...    |  1 | chr1  |     1065512 |   1065533 |
...    |  2 | chr1  |     1067375 |   1067397 |
...    |  3 | chr1  |     1068083 |   1068119 |
...    |  4 | chr1  |    10520283 |  10520490 |
>>> regdom.head()
...    |    | chr   |   chr_start |   chr_end | name      |   tss | strand   |
...    |---:|:------|------------:|----------:|:----------|------:|:---------|
...    |  0 | chr1  |           0 |     22436 | MIR6859-1 | 17436 | -        |
...    |  1 | chr1  |       16436 |     22436 | MIR6859-2 | 17436 | -        |
...    |  2 | chr1  |       16436 |     22436 | MIR6859-3 | 17436 | -        |
...    |  3 | chr1  |       16436 |     28370 | MIR6859-4 | 17436 | -        |
...    |  4 | chr1  |       22436 |     34370 | WASH7P    | 29370 | -        |
>>> size.head()
...    |    | chrom   |      size |
...    |---:|:--------|----------:|
...    |  0 | chr1    | 248956422 |
...    |  1 | chr2    | 242193529 |
...    |  2 | chr3    | 198295559 |
...    |  3 | chr4    | 190214555 |
...    |  4 | chr5    | 181538259 |
>>> ann.head()
...    |    | id         | name                                                   | symbol        |
...    |---:|:-----------|:-------------------------------------------------------|:--------------|
...    |  0 | GO:0003924 | GTPase activity                                        | DNAJC25-GNG10 |
...    |  1 | GO:0007186 | G protein-coupled receptor signaling pathway           | DNAJC25-GNG10 |
...    |  2 | GO:0003723 | RNA binding                                            | NUDT4B        |
...    |  3 | GO:0005829 | cytosol                                                | NUDT4B        |
...    |  4 | GO:0008486 | diphosphoinositol-polyphosphate diphosphatase activity | NUDT4B        |
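The outputs above show the column layout of each returned table. A quick sanity check after loading could look like the sketch below. The helper `missing_columns` and the `EXPECTED_COLUMNS` mapping are hypothetical and not part of greatpy; the column names are copied from the outputs above, and the small frames built here stand in for the real return values.

```python
import pandas as pd

# Column names as they appear in the example outputs above.
EXPECTED_COLUMNS = {
    "test": ["chr", "chr_start", "chr_end"],
    "regdom": ["chr", "chr_start", "chr_end", "name", "tss", "strand"],
    "size": ["chrom", "size"],
    "ann": ["id", "name", "symbol"],
}


def missing_columns(frames):
    """Hypothetical check: map each table name to the columns it lacks."""
    problems = {}
    for key, expected in EXPECTED_COLUMNS.items():
        absent = [col for col in expected if col not in frames[key].columns]
        if absent:
            problems[key] = absent
    return problems


# One-row stand-ins built from the first rows shown above.
frames = {
    "test": pd.DataFrame(
        {"chr": ["chr1"], "chr_start": [1052028], "chr_end": [1052049]}
    ),
    "regdom": pd.DataFrame(
        {
            "chr": ["chr1"],
            "chr_start": [0],
            "chr_end": [22436],
            "name": ["MIR6859-1"],
            "tss": [17436],
            "strand": ["-"],
        }
    ),
    "size": pd.DataFrame({"chrom": ["chr1"], "size": [248956422]}),
    "ann": pd.DataFrame(
        {"id": ["GO:0003924"], "name": ["GTPase activity"], "symbol": ["DNAJC25-GNG10"]}
    ),
}

problems = missing_columns(frames)
```

With the real return values, the same dictionary could be built as `{"test": test, "regdom": regdom, "size": size, "ann": ann}`; an empty result means every expected column is present.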