clusterCS
=========

clusterCS identifies functionally coherent chromatin state clusters (CSCs) by integrating two complementary biological perspectives:

1) the genomic spatial distribution patterns of chromatin states (CSs) across gene bodies, and
2) their epigenetic signal compositions.

The tool operates on the principle that CSs sharing similar genomic distributions and epigenetic profiles are likely to perform related biological functions. To implement this, clusterCS uses the Weighted Nearest Neighbor (WNN) algorithm to construct a multimodal similarity space that balances the contributions of both feature types. Within this integrated space, graph-based clustering partitions the CSs into functionally consistent groups. For methodological details of WNN, see https://doi.org/10.1016/j.cell.2021.04.048.

.. code-block:: sh

   Usage: clusterCS [options] -e -a -b -H -o [-m 1/2/3/4]
      or: clusterCS [options] -e -a -H -o -m 1
      or: clusterCS [options] -e -b -H -o -m 2

Content
=======

.. contents::
   :local:

Required arguments
^^^^^^^^^^^^^^^^^^

``-e``
   Chromatin state emission file (tab-delimited), generated by chromIDEAS or ideasCS, that defines each CS by the co-occurrence probabilities of epigenetic signals **[Default: None]**. Example::

      State  Percentage  ATAC   H3K27ac  ...  H3K79me2  H3K9me3
      S24    0.12        11.32  4.08     ...  0.41      0.22
      S30    0.05        10.80  0.63     ...  0.34      0.27
      S3     4.61        0.34   0.22     ...  0.42      0.20

``-a``
   Segment-wise CS occupancy matrix for cell type 1: an R ``qs``-format file generated by ``computeCSMat`` **[Default: None]**.

``-b``
   Segment-wise CS occupancy matrix for cell type 2: an R ``qs``-format file generated by ``computeCSMat`` **[Default: None]**.

``-H``
   Highly Informative Transcripts (HITs) list: an R ``qs``-format file generated by ``computeCSMat``. This list defines the feature space for clustering **[Default: None]**.

``-o``
   Output file path and prefix **[Default: None]**. Three primary output files are generated::

      1. ..cluster.csv
      2. ..clustree.pdf
      3. ..CS_Distance.qs

   - File 1: A CSV file containing the cluster membership of each CS at every tested resolution.
   - File 2: A clustering tree plot (generated by the clustree package) showing how the clusterings at different resolutions relate to one another. This plot is not produced when ``-p`` is set.
   - File 3: A ``qs``-format file containing the distance matrix of all CSs in the WNN space, used for downstream differential CSC gene analysis.

Optional arguments
^^^^^^^^^^^^^^^^^^

``-m``
   Analysis mode. Specifies which datasets to cluster **[Default: 3]**::

      1: Cluster CSs from cell type 1 only (independent analysis).
      2: Cluster CSs from cell type 2 only (independent analysis).
      3: Joint clustering of CSs from both cell types (merged analysis).
      4: Perform all three analyses (modes 1, 2, and 3) in a single run.

``-r``
   Clustering resolution(s). Specifies the graph clustering resolution parameter(s) to test. Accepts a comma-separated list of individual values and ranges. Ranges use the ``start-end-step`` format, and all values must lie within (0, 5]. For example, ``"0.9,1.3-1.8-0.1,2"`` tests 8 resolutions: 0.9, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, and 2 **[Default: "0.1-0.9-0.1,1-1.8-0.2,2-5-1"]**.

``-p``
   Disable the multi-resolution clustering tree plot. By default, this plot is generated automatically. Use this flag to suppress it when only the CSV results are needed **[Default: plots enabled]**.

``-E``
   Exclude specified CSs from functional clustering. When there is clear evidence for the functions of certain CSs, users can group those states manually and exclude them from chromIDEAS's unsupervised functional clustering. CSs are specified by their numerical labels, separated by commas. For example, ``"1,2,3"`` excludes states S1, S2, and S3 from the analysis **[Default: None]**.

``-h``
   Show this help message and exit.

``-v``
   Show the program's version number and exit.
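
Before a run, it can help to confirm that the emission file passed to ``-e`` is really tab-delimited. It is also useful to see which CSs and signal columns it contains. The function below is a hypothetical helper, not part of clusterCS. It assumes the layout shown in the ``-e`` example: a header row starting with ``State`` and ``Percentage``, followed by one column per epigenetic signal.

.. code-block:: sh

   # inspect_emission FILE  (hypothetical helper, not part of clusterCS)
   # Assumes a tab-delimited header: State<TAB>Percentage<TAB>signal1<TAB>...
   inspect_emission() {
     awk -F'\t' '
       NR == 1 {
         # A space-delimited file yields one field, so $1 is not "State".
         if ($1 != "State" || NF < 3) {
           print "unexpected header: is the file tab-delimited?" > "/dev/stderr"
           bad = 1; exit 1
         }
         ncol = NF
         signals = $3
         for (i = 4; i <= NF; i++) signals = signals "," $i
         next
       }
       NF != ncol {
         print "line " NR ": expected " ncol " fields, found " NF > "/dev/stderr"
         bad = 1; exit 1
       }
       { states = (states == "") ? $1 : states "," $1 }
       END {
         if (bad) exit 1   # exit inside a rule still runs END, so stop here
         print "states: " states
         print "signals: " signals
       }
     ' "$1"
   }

If the header check fails, the file was most likely saved with spaces instead of tabs.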
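
The usage lines at the top imply which cell-type matrices each ``-m`` mode needs: ``-a`` for mode 1, ``-b`` for mode 2, and both for modes 3 and 4. A wrapper script might check this before starting a long run. The function below is a hypothetical sketch of such a check. It does not reflect how clusterCS validates its own arguments.

.. code-block:: sh

   # check_mode_inputs MODE A_FILE B_FILE  (hypothetical wrapper check)
   # Pass an empty string for a matrix that is not supplied.
   check_mode_inputs() {
     mode=$1 a=$2 b=$3
     case $mode in
       1)   need_a=1 need_b=0 ;;   # cell type 1 only
       2)   need_a=0 need_b=1 ;;   # cell type 2 only
       3|4) need_a=1 need_b=1 ;;   # joint clustering (4 also runs modes 1 and 2)
       *)   echo "invalid mode: $mode" >&2; return 1 ;;
     esac
     if [ "$need_a" -eq 1 ] && [ -z "$a" ]; then
       echo "mode $mode requires -a" >&2; return 1
     fi
     if [ "$need_b" -eq 1 ] && [ -z "$b" ]; then
       echo "mode $mode requires -b" >&2; return 1
     fi
     return 0
   }

Because ``-m`` defaults to 3, a run that omits ``-m`` needs both ``-a`` and ``-b``.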
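
You can preview the resolution list that a ``-r`` specification expands to. The POSIX shell function below is a hypothetical re-implementation for illustration only, not clusterCS's own parser. It computes each range value as ``start + k*step`` with an integer counter and a small tolerance, so floating-point rounding does not drop the end point. It also rejects values outside (0, 5].

.. code-block:: sh

   # expand_resolutions SPEC  (hypothetical preview, not clusterCS's own parser)
   # Expands values and "start-end-step" ranges; rejects values outside (0, 5].
   expand_resolutions() {
     printf '%s\n' "$1" | awk -F',' '
     {
       out = ""
       for (i = 1; i <= NF; i++) {
         n = split($i, p, "-")
         m = 0
         if (n == 1) {
           vals[++m] = p[1] + 0
         } else if (n == 3) {
           start = p[1] + 0; stop = p[2] + 0; step = p[3] + 0
           if (step <= 0 || stop < start) {
             print "invalid range: " $i > "/dev/stderr"; exit 1
           }
           # start + k*step (not repeated addition) limits rounding drift;
           # the 1e-9 tolerance keeps end points such as 0.1 + 2*0.1, which
           # evaluates slightly above 0.3.
           for (k = 0; start + k * step <= stop + 1e-9; k++) vals[++m] = start + k * step
         } else {
           print "invalid token: " $i > "/dev/stderr"; exit 1
         }
         for (j = 1; j <= m; j++) {
           if (vals[j] <= 0 || vals[j] > 5) {
             print "out of range (0, 5]: " vals[j] > "/dev/stderr"; exit 1
           }
           sep = (out == "") ? "" : " "
           out = out sep sprintf("%g", vals[j])
         }
       }
       print out
     }'
   }

For instance, ``expand_resolutions "0.1-0.9-0.1,1-1.8-0.2,2-5-1"`` lists the 18 resolutions tested by default.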
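
To check which states ``-E`` will remove, map its numeric labels to the ``S``-prefixed names used in the emission file, so that ``1`` becomes ``S1``. The function below is a hypothetical preview helper, not part of clusterCS. It assumes the emission-file layout shown for ``-e`` and prints the header plus the rows that would remain.

.. code-block:: sh

   # preview_exclusion "1,2,3" EMISSION_FILE  (hypothetical helper)
   # Prints the header plus every state NOT named in the -E style list.
   preview_exclusion() {
     awk -F'\t' -v list="$1" '
       BEGIN {
         n = split(list, ids, ",")
         for (i = 1; i <= n; i++) {
           gsub(/[ \t]/, "", ids[i])          # tolerate "1, 2, 3"
           if (ids[i] != "") drop["S" ids[i]] = 1
         }
       }
       NR == 1 || !($1 in drop)              # keep header and non-excluded states
     ' "$2"
   }

Because whole labels are compared, ``3`` removes only ``S3`` and leaves ``S30`` in place.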