greatpy.tl.create_regdom

greatpy.tl.create_regdom(tss_file, chr_sizes_file, association_rule, max_extension=1000000, basal_upstream=5000, basal_downstream=1000, out_path=None)

Create regulatory domains (regdoms) according to one of the three association rules. The result is returned as a pd.DataFrame and, if out_path is given, also written to a file.

Parameters:
tss_file : str

The path of the TSS file.

chr_sizes_file : str

The path of the chromosome size file.

association_rule : str

The association rule to use. One of “one_closet”, “two_closet”, or “basal_plus_extention”.

Documentation is available at https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655443/Association+Rules.
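
As a rough illustration of the single-nearest-gene idea behind “one_closet”, the sketch below extends each domain to the midpoint between neighbouring TSSs, capped at max_extension. It is not greatpy's implementation: the helper name one_closest_domains, the third TSS (30365), and the chromosome length are assumptions chosen for illustration, and genes sharing the same TSS are not handled.

```python
def one_closest_domains(tss_positions, chr_size, max_extension=1_000_000):
    """Return one (start, end) pair per TSS on a single chromosome.

    tss_positions must be sorted and distinct (an assumption of this sketch).
    """
    domains = []
    last = len(tss_positions) - 1
    for i, tss in enumerate(tss_positions):
        # Left boundary: midpoint to the previous TSS, or the chromosome start.
        start = 0 if i == 0 else (tss_positions[i - 1] + tss) // 2
        # Right boundary: midpoint to the next TSS, or the chromosome end.
        end = chr_size if i == last else (tss + tss_positions[i + 1]) // 2
        # Never extend further than max_extension from the TSS.
        start = max(start, tss - max_extension, 0)
        end = min(end, tss + max_extension, chr_size)
        domains.append((start, end))
    return domains


# 17436 and 29370 appear in the example output; 30365 is hypothetical.
domains = one_closest_domains([17436, 29370, 30365], chr_size=248_956_422)
print(domains)
```

With these inputs the middle gene receives the domain (23403, 29867), which is the midpoint-to-midpoint interval around its TSS.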

max_extension : int

The maximum extension of the regulatory domain, in base pairs.

Default is 1000000

basal_upstream : int

The extension of the basal regulatory domain upstream of the TSS, in base pairs.

Default is 5000

basal_downstream : int

The extension of the basal regulatory domain downstream of the TSS, in base pairs.

Default is 1000
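
To make the upstream and downstream parameters concrete, here is a minimal sketch of how a strand-aware basal domain could be computed for one gene. The helper basal_domain is hypothetical and not part of greatpy; it assumes that "upstream" means lower coordinates on the + strand and higher coordinates on the - strand, and that domains are clipped to the chromosome bounds. The chromosome length is an illustrative value.

```python
def basal_domain(tss, strand, chr_size, basal_upstream=5000, basal_downstream=1000):
    """Return (start, end) of the basal domain around a single TSS."""
    if strand == "+":
        # Upstream is toward lower coordinates on the forward strand.
        start, end = tss - basal_upstream, tss + basal_downstream
    elif strand == "-":
        # On the reverse strand the directions are mirrored.
        start, end = tss - basal_downstream, tss + basal_upstream
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Clip to the chromosome so the domain never leaves valid coordinates.
    return max(start, 0), min(end, chr_size)


plus = basal_domain(29370, "+", chr_size=248_956_422)
minus = basal_domain(29370, "-", chr_size=248_956_422)
near_start = basal_domain(3000, "+", chr_size=248_956_422)
print(plus, minus, near_start)
```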

out_path : str or NoneType

The path of the output file.

If None, the result is only returned as a pd.DataFrame.

Default is None

Returns:

out – The regulatory domains.

Return type:

pd.DataFrame

Examples

>>> from greatpy.tl import create_regdom
>>> regdom = create_regdom(
...     tss_file="../../data/human/tss.bed",
...     chr_sizes_file="../../data/human/chr_size.bed",
...     association_rule="one_closet",
... )
>>> regdom.head()
...    |    | chr   |   chr_start |   chr_end | name      |   tss | strand   |
...    |---:|:------|------------:|----------:|:----------|------:|:---------|
...    |  0 | chr1  |           0 |     17436 | MIR6859-1 | 17436 | -        |
...    |  1 | chr1  |       17436 |     17436 | MIR6859-2 | 17436 | -        |
...    |  2 | chr1  |       17436 |     17436 | MIR6859-3 | 17436 | -        |
...    |  3 | chr1  |       17436 |     23403 | MIR6859-4 | 17436 | -        |
...    |  4 | chr1  |       23403 |     29867 | WASH7P    | 29370 | -        |
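
Because the result is an ordinary pd.DataFrame, it can be inspected with standard pandas operations. The sketch below rebuilds the five rows shown above by hand (rather than calling create_regdom) and computes the width of each domain; the width column is an illustrative addition, not something greatpy produces.

```python
import pandas as pd

# Rows copied from the example output above.
regdom = pd.DataFrame(
    {
        "chr": ["chr1"] * 5,
        "chr_start": [0, 17436, 17436, 17436, 23403],
        "chr_end": [17436, 17436, 17436, 23403, 29867],
        "name": ["MIR6859-1", "MIR6859-2", "MIR6859-3", "MIR6859-4", "WASH7P"],
        "tss": [17436, 17436, 17436, 17436, 29370],
        "strand": ["-"] * 5,
    }
)

# Width of each regulatory domain in base pairs.
regdom["width"] = regdom["chr_end"] - regdom["chr_start"]

# Genes sharing a TSS can end up with zero-width domains.
empty = regdom[regdom["width"] == 0]
print(empty[["name", "chr_start", "chr_end"]])
```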