chromDCSCG¶
chromDCSCG performs Differential Chromatin State Cluster Gene (DCSCG) analysis on the chromatin state functional clustering results produced by the clusterCS or chromIDEAS_CSC command. The tool identifies genes that show significant chromatin remodeling between two conditions and does not require biological replicates. It extends each gene's analysis region to include upstream and downstream regulatory bins, captures CSC transitions, and classifies genes by their dominant transition pattern, which is prioritized by cumulative Euclidean distance and then by TSS proximity.
Usage: chromDCSCG [options] -i <input_CS> -c <CS_cluster> -r <region_file> -d <dist_mat> -o <out_prefix>
Content¶
Required arguments¶
-i <input_CS> Chromatin state segmentation file (space-delimited) generated by chromIDEAS or ideasCS [Default: None]. Example:
    #ID CHR POSst POSed cell1 cell2
    1 chr1 792600 792800 1 1
    2 chr1 792800 793000 0 0
    3 chr1 793000 793200 0 0
    4 chr1 793200 793400 0 0
-c <CS_cluster> Chromatin state clustering results. Comma-delimited file from clusterCS/chromIDEAS_CSC that maps each CS to a functional cluster (CSC) [Default: None]. Example:
    state,cluster
    S0,1
    S1,1
    S2,1
    S3,2
    ...
-r <region_file> Input genomic regions. The file format is specified with “-f”. [Default: None]
gtf: Differential analysis within gene‑defined regions. Gene classification integrates both cumulative Euclidean distances and TSS proximity to determine dominant transition patterns.
bed: Differential analysis within user‑specified BED regions. Region classification is based solely on cumulative Euclidean distances, without TSS consideration.
-d <dist_mat> Euclidean distance matrix. File in R qs format generated by clusterCS/chromIDEAS_CSC, containing the distances between all CS pairs within the WNN space. [Default: None]
-o <out_prefix> Output file path and prefix. Two primary output files are generated [Default: None]:
1. <out_prefix>.DCSCG.csv: Detailed bin-level DCSCG analysis results, including CS/CSC assignments, normalized distances, and transition types for each differential bin.
2. <out_prefix>.DCSCG_Label.csv: Final gene classification with dominant CSC transition labels determined by cumulative distance and TSS proximity.
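The relationship between the -i and -c inputs can be illustrated with a minimal Python sketch. It is not chromDCSCG's own code: the function names are hypothetical, the toy rows are invented for illustration, and it assumes that an integer state n in the segmentation file corresponds to the label Sn in the cluster file. It reports bins whose CSC differs between two cells, so a CS change that stays inside one CSC is not counted as a transition.

```python
import csv
import io

def parse_cs_table(text):
    """Parse a space-delimited CS segmentation table into a list of dicts."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # The header line starts with '#ID'; drop the leading '#'.
            header = [f.lstrip("#") for f in fields]
            continue
        rows.append(dict(zip(header, fields)))
    return rows

def parse_cluster_map(text):
    """Parse the comma-delimited state,cluster table into {state: cluster}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["state"]: int(row["cluster"]) for row in reader}

def differential_bins(rows, cluster_of, cell_a, cell_b):
    """Return (bin ID, CSC in cell_a, CSC in cell_b) for bins whose CSC differs."""
    out = []
    for row in rows:
        # Assumption: integer state n in the CS file maps to label 'Sn'.
        csc_a = cluster_of["S" + row[cell_a]]
        csc_b = cluster_of["S" + row[cell_b]]
        if csc_a != csc_b:
            out.append((row["ID"], csc_a, csc_b))
    return out

# Toy data (invented for illustration, not real output).
cs_text = """#ID CHR POSst POSed cell1 cell2
1 chr1 792600 792800 1 1
2 chr1 792800 793000 0 3
3 chr1 793000 793200 3 3
4 chr1 793200 793400 0 1
"""
cluster_text = "state,cluster\nS0,1\nS1,1\nS2,1\nS3,2\n"

rows = parse_cs_table(cs_text)
cluster_of = parse_cluster_map(cluster_text)
result = differential_bins(rows, cluster_of, "cell1", "cell2")
print(result)
```

In this toy input only bin 2 changes CSC (cluster 1 to cluster 2). Bin 4 changes state from 0 to 1, but both states belong to cluster 1, so it is not reported.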
Optional arguments¶
-f <file_type> Format of the region file, either “gtf” or “bed”. The standard GTF format is a 9-column, tab-delimited file. A BED file must contain 5 columns: chrom, chromStart, chromEnd, strand, and regionID. [Default: gtf]
-u <up_bin_num> Number of upstream bins to include. Extends the analysis upstream of each gene’s TSS to capture potential regulatory regions. [Default: 3]
-w <down_bin_num> Number of downstream bins to include. Extends the analysis downstream of each gene’s TES to capture potential regulatory regions. [Default: 3]
-p <nthreads> Number of parallel processes. [Default: 4]
-h Show this help message and exit.
-v Show program’s version number and exit.
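The effect of -u and -w on a gene's analysis window can be sketched as follows. This is an illustrative Python sketch, not the tool's implementation: the function name is hypothetical, and it assumes 0-based bin indices along a chromosome, strand-aware extension (upstream of a minus-strand gene lies at higher coordinates), and clipping at the chromosome ends.

```python
def extended_bin_range(first_bin, last_bin, strand, up_bins=3, down_bins=3, n_bins=None):
    """Return the inclusive (start, end) bin range after extension.

    first_bin/last_bin: bins covering the gene body, in coordinate order.
    up_bins/down_bins: counterparts of -u and -w.
    n_bins: total bins on the chromosome, used for clipping (optional).
    """
    if strand == "-":
        # Minus strand: the TSS is at the high-coordinate end of the gene.
        start = first_bin - down_bins
        end = last_bin + up_bins
    else:
        start = first_bin - up_bins
        end = last_bin + down_bins
    start = max(start, 0)
    if n_bins is not None:
        end = min(end, n_bins - 1)
    return start, end

plus_window = extended_bin_range(10, 14, "+", up_bins=3, down_bins=1)
minus_window = extended_bin_range(10, 14, "-", up_bins=3, down_bins=1)
clipped_window = extended_bin_range(1, 4, "+", n_bins=6)
print(plus_window, minus_window, clipped_window)
```

With the default values of 3 for both options the window is symmetric, so strand only matters when -u and -w differ.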
Gene Classification Logic¶
Primary Criterion: The cumulative Euclidean distance sum for each CSC transition type.
TSS‑based Tie‑breaker (requires a GTF region file, i.e. -f gtf): When multiple transition types share the maximal cumulative distance, priority is given to the type that occurs most frequently in TSS‑associated bins.
Confused Genes: Genes for which the tie‑breaker cannot resolve a single winner (i.e., the tied types also have identical TSS occurrence counts) are labeled “confused”.
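The three rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the function name, the "1->2" label format, and the (transition, distance, in_tss) tuple layout are all hypothetical, and ties are detected with a floating-point tolerance.

```python
import math
from collections import defaultdict

def classify_gene(bins):
    """Pick the dominant CSC transition for one gene.

    bins: iterable of (transition, distance, in_tss) tuples, one per
    differential bin. Returns a transition label, "confused", or None
    when the gene has no differential bins.
    """
    total = defaultdict(float)   # cumulative distance per transition type
    tss_hits = defaultdict(int)  # occurrences in TSS-associated bins
    for transition, distance, in_tss in bins:
        total[transition] += distance
        if in_tss:
            tss_hits[transition] += 1
    if not total:
        return None
    # Primary criterion: largest cumulative distance.
    best = max(total.values())
    tied = [t for t, d in total.items() if math.isclose(d, best)]
    if len(tied) == 1:
        return tied[0]
    # Tie-breaker: most occurrences in TSS-associated bins.
    top_hits = max(tss_hits[t] for t in tied)
    winners = [t for t in tied if tss_hits[t] == top_hits]
    return winners[0] if len(winners) == 1 else "confused"

clear_winner = classify_gene([("1->2", 0.6, False), ("2->1", 0.3, True)])
tss_winner = classify_gene([("1->2", 0.5, False), ("2->1", 0.5, True)])
unresolved = classify_gene([("1->2", 0.5, True), ("2->1", 0.5, True)])
print(clear_winner, tss_winner, unresolved)
```

In BED mode the documentation states that TSS proximity is not considered, so only the first step of this sketch would apply there.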