greatpy.tl.loader
- greatpy.tl.loader(test_data, regdom_file, chr_size_file, annotation_file)
Load all datasets needed for the enrichment calculation and return them as pandas DataFrames.
- Parameters:
- test_data : None or str or pd.DataFrame
Set of genomic peaks to be tested
- regdom_file : None or str or pd.DataFrame
Regulatory domains of all genes in the genome
- chr_size_file : None or str or pd.DataFrame
Table with the size of each chromosome
- annotation_file : None or str or pd.DataFrame
Table with the annotation of each gene in the genome
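Each parameter accepts None, a str or a pd.DataFrame, and the Examples below pass file paths. The following is a minimal sketch of how one such argument could be normalized to a DataFrame. It is not greatpy's implementation: the helper name `to_frame` is hypothetical, and it assumes that a str is a path to a headerless, tab-separated BED-like file (the annotation CSV in the example may need different parsing).

```python
from typing import Optional, Sequence, Union

import pandas as pd


def to_frame(
    source: Union[None, str, pd.DataFrame], names: Sequence[str]
) -> Optional[pd.DataFrame]:
    """Hypothetical helper: turn one loader argument into a DataFrame.

    Assumption: a str is a path to a headerless, tab-separated BED-like file
    whose first len(names) columns correspond to `names`.
    """
    if source is None:
        return None  # nothing supplied for this input
    if isinstance(source, pd.DataFrame):
        # Keep only the leading columns we expect, without mutating the input
        df = source.iloc[:, : len(names)].copy()
    elif isinstance(source, str):
        # Read only the first len(names) columns of the file
        df = pd.read_csv(
            source, sep="\t", header=None, usecols=list(range(len(names)))
        )
    else:
        raise TypeError(
            f"expected None, str or pd.DataFrame, got {type(source).__name__}"
        )
    df.columns = list(names)  # apply the expected column names
    return df
```

For instance, `to_frame("peaks.bed", ["chr", "chr_start", "chr_end"])` would yield a table shaped like `test.head()` in the Examples, assuming `peaks.bed` is a three-column BED file.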
- Returns:
test_data (pd.DataFrame) – Set of genomic peaks to be tested, in the expected format
regdom (pd.DataFrame) – Regulatory domains of all genes in the genome, in the expected format
size (pd.DataFrame) – Table with the size of each chromosome, in the expected format
ann (pd.DataFrame) – Table with the annotation of each gene in the genome, in the expected format
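The "expected format" of each returned table can be read off the column headers in the Examples below. As a hedged sketch, the check below encodes those observed layouts; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and the column lists come from the example output rather than from a documented schema.

```python
from typing import List

import pandas as pd

# Column layouts as they appear in the Examples output; an observation from
# those examples, not an official greatpy schema.
EXPECTED_COLUMNS = {
    "test_data": ["chr", "chr_start", "chr_end"],
    "regdom": ["chr", "chr_start", "chr_end", "name", "tss", "strand"],
    "size": ["chrom", "size"],
    "ann": ["id", "name", "symbol"],
}


def missing_columns(kind: str, df: pd.DataFrame) -> List[str]:
    """Return the expected columns for `kind` that `df` does not have."""
    return [col for col in EXPECTED_COLUMNS[kind] if col not in df.columns]
```

An empty list means the table has at least the columns shown in the examples; extra columns are not flagged.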
Examples
>>> test, regdom, size, ann = loader(
...     "../../data/tests/test_data/input/02_srf_hg38.bed",
...     "../../data/human/hg38/regulatory_domain.bed",
...     "../../data/human/hg38/chr_size.bed",
...     "../data/human/ontologies.csv",
... )
>>> test.head()
|    | chr   |   chr_start |   chr_end |
|---:|:------|------------:|----------:|
|  0 | chr1  |     1052028 |   1052049 |
|  1 | chr1  |     1065512 |   1065533 |
|  2 | chr1  |     1067375 |   1067397 |
|  3 | chr1  |     1068083 |   1068119 |
|  4 | chr1  |    10520283 |  10520490 |
>>> regdom.head()
|    | chr   |   chr_start |   chr_end | name      |   tss | strand   |
|---:|:------|------------:|----------:|:----------|------:|:---------|
|  0 | chr1  |           0 |     22436 | MIR6859-1 | 17436 | -        |
|  1 | chr1  |       16436 |     22436 | MIR6859-2 | 17436 | -        |
|  2 | chr1  |       16436 |     22436 | MIR6859-3 | 17436 | -        |
|  3 | chr1  |       16436 |     28370 | MIR6859-4 | 17436 | -        |
|  4 | chr1  |       22436 |     34370 | WASH7P    | 29370 | -        |
>>> size.head()
|    | chrom   |      size |
|---:|:--------|----------:|
|  0 | chr1    | 248956422 |
|  1 | chr2    | 242193529 |
|  2 | chr3    | 198295559 |
|  3 | chr4    | 190214555 |
|  4 | chr5    | 181538259 |
>>> ann.head()
|    | id         | name                                                   | symbol        |
|---:|:-----------|:-------------------------------------------------------|:--------------|
|  0 | GO:0003924 | GTPase activity                                        | DNAJC25-GNG10 |
|  1 | GO:0007186 | G protein-coupled receptor signaling pathway           | DNAJC25-GNG10 |
|  2 | GO:0003723 | RNA binding                                            | NUDT4B        |
|  3 | GO:0005829 | cytosol                                                | NUDT4B        |
|  4 | GO:0008486 | diphosphoinositol-polyphosphate diphosphatase activity | NUDT4B        |