chromIDEAS
chromIDEAS is a suite of tools developed specifically for the efficient analysis of chromatin states. You can use chromIDEAS to run s3v2 normalization and chromatin state segmentation in a single step, or you can run these analyses one at a time using the dedicated subcommands.
Usage: chromIDEAS [options] -m <metadata> -o <output> -n <windows_name> [-b <bin_size>]
or: chromIDEAS [options] -m <metadata> -o <output> -b <bin_size> -s <species> [-n <windows_name>]
or: chromIDEAS [options] -m <metadata> -o <output> -b <bin_size> -g <genomesizes> -n <windows_name> [-B <blackList>]
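When a run is launched from a script, the three usage forms above can be assembled programmatically. The following is a minimal sketch, not part of chromIDEAS itself: the helper name `build_command` and the file names `metadata.txt` and `results` are hypothetical, and only flags shown in the usage lines are used. It assumes that a run needs at least `-s` or `-n`, as the usage lines suggest.

```python
import shlex


def build_command(metadata, output, bin_size=200, species=None,
                  genomesizes=None, windows_name=None, blacklist=None):
    """Assemble a chromIDEAS argument list from the flags in the usage lines."""
    # Assumption: every usage form carries either -s or -n.
    if species is None and windows_name is None:
        raise ValueError("provide species (-s) or windows_name (-n)")
    cmd = ["chromIDEAS", "-m", metadata, "-o", output, "-b", str(bin_size)]
    # Append only the flags that were actually supplied.
    for flag, value in (("-s", species), ("-g", genomesizes),
                        ("-n", windows_name), ("-B", blacklist)):
        if value is not None:
            cmd += [flag, value]
    return cmd


# Hypothetical file names, built-in species form (second usage line).
cmd = build_command("metadata.txt", "results", species="hg38")
print(shlex.join(cmd))  # chromIDEAS -m metadata.txt -o results -b 200 -s hg38
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where chromIDEAS is installed.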
Required arguments
-b <bin_size>  Bin size (in base pairs). [Default: 200]
-o <output>  Output directory path. [Default: ./]
-m <metadata>  Path to a tab-delimited metadata file containing the Cell, Marker, ID, Exp_bigwig file, and CT_bigwig file (optional) for each sample. [Default: None] Example:
  293T H3K4me3 rep1 /PATH/TO/H3K4me3_rep1.bw /PATH/TO/H3K4me3_rep1_CT.bw
  293T H3K27ac rep1 /PATH/TO/H3K27ac_rep1.bw
  ...
-s <species>  Supported species: hg38, hg19, or mm10. Selecting this option automatically loads the corresponding genomesizes file and blacklist file. If your species is not listed, provide these files manually via -g <genomesizes> -n <windows_name> [-B <blackList>]. [Default: None]
-g <genomesizes>  Required if -s is unspecified. Path to a tab-delimited genomesizes file listing chromosome lengths. [Default: None] Example:
  chr1 249250621
  ...
-n <windows_name>  Required if -s is unspecified. A unique name to identify the generated window bins for downstream processing. [Default: None]
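The metadata layout described under -m can be generated from a script rather than typed by hand. The sketch below is illustrative only: the helper name `metadata_text` is hypothetical, the paths are the placeholders from the example above, and the check that each row has four or five fields is an assumption drawn from CT_bigwig being the only optional column.

```python
import csv
import io

# Column order from the -m description: Cell, Marker, ID, Exp_bigwig, CT_bigwig (optional).
# The paths are placeholders copied from the example, not real files.
rows = [
    ["293T", "H3K4me3", "rep1", "/PATH/TO/H3K4me3_rep1.bw", "/PATH/TO/H3K4me3_rep1_CT.bw"],
    ["293T", "H3K27ac", "rep1", "/PATH/TO/H3K27ac_rep1.bw"],  # no control track
]


def metadata_text(rows):
    """Render rows as tab-delimited text, one sample per line."""
    for row in rows:
        # Assumption: only the trailing CT_bigwig column may be omitted.
        if len(row) not in (4, 5):
            raise ValueError(f"expected 4 or 5 fields, got {len(row)}: {row}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


text = metadata_text(rows)
print(text, end="")
```

Writing `text` to a file and passing that file's path to -m would give the tool the same content as the example.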
Optional arguments
-c  Continue running a partially finished process. Use this option to resume a process interrupted by an error. [Default: false]
-d <id_name>  Project name. [Default: chromIDEAS]
-B <blackList>  Path to the blacklist file (tab-delimited). If -s is set to hg38/hg19/mm10, the default blacklist is used. [Default: None] Example:
  chr1 200 3000
  ...
-p <nthreads>  Number of parallel processes. [Default: 4]
-z  Compress output files in gzip format. [Default: false]
-h  Show this help message and exit.
-v  Show program’s version number and exit.
Normalization Parameters
-l <local_bg_bin>  Number of local background bins on each side of the target bin. For example, with local_bg_bin=5 and bin10 as the target bin, the surrounding background bins are bin5-9 and bin11-15. [Default: 5]
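The bin10 example for -l can be reproduced with a few lines of index arithmetic. This is a sketch of the window described in the text, not the tool's own code: the function name `local_background_bins` is hypothetical, and behavior at chromosome ends is not described in the text, so it is not handled here.

```python
def local_background_bins(target, local_bg_bin=5):
    """Return the bin indices flanking `target`, excluding the target itself."""
    left = list(range(target - local_bg_bin, target))            # e.g. 5..9
    right = list(range(target + 1, target + local_bg_bin + 1))   # e.g. 11..15
    return left + right


print(local_background_bins(10, 5))  # [5, 6, 7, 8, 9, 11, 12, 13, 14, 15]
```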
Chromatin States Assignment Parameters
-f <otherpara>  Use the .para file from a previous run as priors for the Gaussian distribution parameters in the current job (Example: -f ${out_dir}/${prefix}.para). Currently, the set of marks and their order in the input are assumed to be the same in the previous and current data. [Default: None]
-I <impute>  Imputation for missing marker data. Set it to “None”, “All”, or a comma-separated list of the markers you want to impute (e.g. “H3K4me3,H3K9me3”). [Default: None]
-t <train>  The number of random starts used to select the states, which determines how many times the HMM is pre-trained. Higher values give a more stable model but take longer to compute. [Default: 100]
-S <trainsz>  The number of bins used to pre-train the model. Higher values give a more stable model but take longer to compute. CAUTION: <trainsz> must be less than the number of rows in the signal file specified in <metadata>. If <trainsz> exceeds the file's row count, it is set to 60% of the row count. [Default: 500000]
-C <num>  The number of states at the initialization stage. We recommend setting the initial number of states slightly larger than the number of states you expect or are willing to handle. [Default: 100]
-G <num>  The number of states to be inferred. The final number of inferred states may be smaller than the number you specify. 0: let the program determine the number. [Default: 0]
-e <minerr>  The minimum standard deviation for the emission Gaussian distribution, usually in the range (0, 1]. [Default: 0.5]
-N <burnin>  The number of burn-in steps. Increasing this number increases computation time and only slightly improves accuracy. [Default: 20]
-M <mcmc>  The number of steps for maximization. Increasing this number increases computation time and only slightly improves accuracy. [Default: 5]
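The caution under -S (a training size larger than the signal file's row count falls back to 60% of the rows) can be checked ahead of a run. The sketch below only mirrors that stated rule: the function name `effective_trainsz` is hypothetical, and rounding the 60% value down to a whole number of bins is an assumption, since the text does not say how the tool rounds.

```python
def effective_trainsz(trainsz, n_rows):
    """Return the training size implied by the -S rule for a file with n_rows bins."""
    if trainsz > n_rows:
        # Assumption: 60% of the row count, rounded down (integer arithmetic).
        return n_rows * 6 // 10
    return trainsz


# Default -S of 500000 against a small and a large signal file.
print(effective_trainsz(500000, 100000))   # 60000
print(effective_trainsz(500000, 2000000))  # 500000
```

Comparing the result with the requested value before launching makes it clear when a small genome or a large bin size will silently shrink the pre-training set.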