Scientific Papers

Identifying quantitatively differential chromosomal compartmentalization changes and their biological significance from Hi-C data using DARIC | BMC Genomics


Introduction of the DARIC framework

To quantify the compartmentalization strength for genomic regions, we first devised a metric named Preferential Interaction Score (PIS). PIS is defined as the log-transformed ratio of the average interactions with compartments A to B. Specifically, we first binned the genome at a selected size, usually at 100 kb or 50 kb, applied the PCA analysis, and obtained the compartment type information for each genomic bin. For each bin, the chromatin interactions with other A- and B-type bins in the same chromosome were calculated, and PIS was calculated subsequently (Fig. 1A). A positive/negative PIS indicates that the genomic region preferentially interacts with the active/repressive compartment, respectively. Moreover, a higher PIS represents stronger interaction with the active A compartment. Interestingly, we observed that PIS is highly correlated with the PC1 values (Fig. 1B, Figure S1A). Notwithstanding, we used PIS instead of PC1 values for further quantitatively differential analysis for mainly two reasons: (a) PIS has a clear biological meaning, enabling a straightforward interpretation of the quantitative differences; (b) Direct quantitative comparison of PC1 values for two Hi-C samples is not a legitimate approach, because PC1 values from two separate PCA analyses are not directly comparable.

Fig. 1
figure 1

Introduction of the DARIC framework. A Schematic of the Preferential Interaction Score (PIS) definition. B Snapshots of chromosomes 2 and 6 showing the high correlation between PIS and PC1 values. C Flowchart of the DARIC pipeline for identifying genomic regions with significantly differential compartment changes. D Emission probability of the four-state Hidden Markov Model (HMM) model trained from the differential PIS between H1ESC and K562 cells. Values in the heatmap represent the mean value of PIS differences for each state. Strongly increased/decreased states are denoted as “Strong ± ”. Weakly increased/decreased states are denoted as “Weak ± ”. E Genomic coverage of the four states as shown in D. F Exemplar genome snapshot showing the PIS comparison of H1ESC and K562 and the output of DARIC, including statistical significance track (-log10P-value) and the segmentation of genome into four states

DARIC includes the following four steps (Fig. 1C): (1) Calculation of the genome-wide PIS for the samples; (2) Smoothing of PIS in each sample to remove technical noises (see methods for details); (3) Normalization. Systematic differences, such as choices of restriction enzymes and heterogenous composition of cells in different cell cycle stages, are common confounding factors for Hi-C data comparison. To eliminate systematic biases, we borrowed a concept from MA-norm [28], a method originally developed for ChIP-seq data normalization, to normalize PIS tracks from two Hi-C samples. Similar to ChIP-seq data, we make two assumptions with the usage of MA-norm: (a) most genomic regions, by default defined as those ranked in the 15%-85% percentage in PIS residuals of the two cell-types, do not have significant changes in compartmentalization. This assumption is valid because chromatin conformation data across many tissue- and cell- types have suggested this. (b) the scaling relationship of PIS between two Hi-C samples is reflected in observed PIS differences in most genomic regions, which can be extrapolated to all genomic regions. Using the comparison between H1ESC and K562 as an illustrating example (Figure S1B and C), we showed that MA-norm can substantially eliminate the technical biases while still detecting biological differences between the two cell types; (4) Identifying differential domains and performing statistical analyses. Specifically, we used a Hidden Markov Model (HMM) to segment the genome into four states based on the residuals of two PIS tracks: Strongly decreased (“Strong-”), Weakly decreased (“Weak-”), Weakly increased (“Weak + ”) and Strongly increased (“Strong + ”), which are determined by the mean PIS changes of the four states (i.e. the emission matrix from the HMM model, Fig. 1D). We demonstrated that although the choice of a higher state number would reveal a finer resolution of compartmentalization changes, the regions showing strong PIS changes are largely overlapped across different models (Supplementary Fig. 1D-F). To facilitate the downstream analysis, we chose four as the state number to make it correspond to the four states in the conventional compartment switching analysis, i.e. the stable A or B compartments (“AA” or “BB”) and the switched compartments (“AB” and “BA”). Furthermore, if replicate data are available, DARIC enables the statistical significance analysis of PIS differences by using the variations within the replicates as an empirical background (see methods for details). In the comparison between H1ESC and K562, DARIC reveals that 40% of the genome (18% for decreased and 22% for increased PIS in K562) are identified with significant changes in compartmentalization (Fig. 1E). An example of the input and main outputs of DARIC are summarized in Fig. 1F.

Quantitative PIS differences are associated with concordant changes in transcription and chromatin state

To test whether the quantitatively differential compartmentalization analysis provided by DARIC increases the power for identifying functional relevant changes, we interrogated the transcriptomic and epigenetic data to see if concordant changes are observed in regions with significantly changed PIS. Using the comparison between H1ESC and K562 cells, we evaluated the changes in gene expression from RNA-seq, chromatin accessibility from DNase-seq, and H3K27ac distribution from ChIP-seq. Indeed, consistent changes were observed in all the three modalities, where genomic regions with strongly decreased PIS in K562 cells (i.e. “Strong-”) are associated with significantly lower gene expression, chromatin accessibility, and H3K27ac signals (Fig. 2A, B and C). Opposite trends were also observed in the “Strong + ” state, suggesting that higher PIS correlates with higher transcription and a more active chromatin state. To test whether PIS changes are associated with functionally important genome regions, we performed enrichment analysis of cell-type-specific genes, as well as super-enhancers, in the four types of domains. The rationale for using super-enhancers is that super-enhancers are hallmarks of cell fate and cell identity [38]. As shown in Figure S2A and B, cell-type-specific genes and super-enhancers are most significantly enriched in the genomic regions with strongly changed PIS and moderately enriched in domains with weakly changed PIS. Taken together, our analyses demonstrated that the quantitative compartmentalization changes are closely related to gene regulation and play a critical role in cell identity.

Fig. 2
figure 2

Integration of multi-modal genomics data demonstrates the functional association between gene regulation and differential compartments revealed by DARIC. A-C Gene expression (A), DNase-seq chromatin accessibility (B), and H3K27ac intensity. (C) comparison between H1ESC and K562 cells in the four states. ***, p < 1e-3, Mann–Whitney U test. TPM, Transcript Per Million. RPM, Reads Per Million mapped reads. D-E Illustration and comparison of the lamina B1 DamID signals (D) and TSA-seq signals. E between H1ESC and K562 cells in the PIS strongly changed regions. F Snapshots at the MYB and SOX2 gene loci showing the PIS comparison between H1ESC and K562 cells, as well as the concordant changes in DamID and TSA-seq signals

Many genome-wide techniques have emerged recently to probe the positioning of chromosomal regions with respect to specific subnuclear structures, such as speckles and laminas, and revealed that differential nuclear positioning adds an additional layer of regulation to modulate gene expression [11, 12, 16, 17]. To show if the PIS changes derived from Hi-C data correlate with the differences in nuclear positioning, we integrated the lamina B1 DamID and TSA-seq data in H1ESC and K562 cells from the 4D Nucleosome consortium [13]. Specifically, lamina B1 DamID is a technique to measure the physical interactions between chromosomal regions and the lamina B1 protein in nuclear membrane. Higher DamID signals represent higher interactions between chromatin and lamina [12] (Fig. 2D). TSA-seq estimates the cytological distance between chromosomal regions and nuclear speckles. A higher TSA-seq score means a closer distance to speckles and is usually associated with higher transcriptional activities and more active chromatin states [17] (Fig. 2E). After normalization, our data show that regions with strongly increased PIS (“Strong + ”) in K562 cells have lower interactions with the nuclear lamina and closer distance to nuclear speckles. A consistent trend was also observed for regions with decreased PIS (“Strong-”). Figure 2F summarizes the close relation between PIS changes and the other two techniques by using two cell-type marker genes, MYB and SOX2. MYB is a transcription factor that plays an essential role in hematopoiesis [39] and is highly expressed in K562 cells (Figure S2C), whereas SOX2 is a critical regulator related to pluripotency [40] and highly expressed in H1ESCs (Figure S2C). At the MYB locus, DARIC reveals that there is a significant increase in PIS in K562 cells as compared to H1ESC. In concordance, the interaction between the MYB locus and nuclear lamina is significantly reduced and the distance to nuclear speckles is closer. Trends with consistent changes were also seen for the SOX2 locus. Taken together, our analyses suggest that the quantitative differences in PIS may provide a useful guide for the investigation of nuclear repositioning in development and diseases.

Comparison between DARIC and existing methods

Unlike the conventional compartment switching analysis which focuses on the qualitative differences, DARIC adopts a quantitative approach. To demonstrate DARIC’s improvements over the conventional method, we compared these two approaches in the differential compartment analysis between H1ESC and K562 cells.

To this end, we first performed PCA in H1ESC and K562, each with two replicates. The PC1 values from the two replicates were averaged within each cell type and then compared between the two cell types, which revealed four types of genomic bins: AA and BB (bins without switching compartments), AB, and BA (bins switched compartments). As shown in Fig. 3A, 8.2% of the genome switch from compartment A to B, and 8.9% of the genome is characterized by the opposite switch, much less than the genomic coverage revealed by the quantitative analysis by DARIC (Fig. 1E). Furthermore, we overlapped the PC1-switched genomic bins with the quantitatively differential bins revealed by DARIC. Notably, most of the genomic bins switching from compartment B to A (4125 out of 4999, 83%) are identified by DARIC (Fig. 3B). A comparison of PIS differences reveals that the 874 switching-specific genomic bins (17% of the total “BA” genomic bins) have significantly lower differences than the other types of regions, explaining why they are not identified by DARIC (Fig. 3C). It was also observed that there are more genomic regions that are characterized by significant quantitative PIS increases without switching compartments (hereafter denoted as DARIC-specific regions) (Fig. 3B). To test whether these loci represent functionally relevant changes, we evaluated the TSA-seq, DamID, and gene expression changes among the three types of regions: DARIC-specific, overlapped, and switching-specific, with the random genomic regions used as background. Compared to the random background, DARIC-specific regions have higher TSA-seq signals (Fig. 3D), lower DamID signals (Fig. 3E), and higher gene expressions (Fig. 3F), suggesting that those regions unique to the quantitative analysis are functionally important and thus cannot be overlooked in the differential compartment analysis. Similarly, we performed the analysis between the quantitatively decreased regions (“Strong-”) with genomic bins switched from A to B (i.e. “AB”), and obtained results showing the same trend (Figure S3A-D). Collectively, by comparing DARIC with the prevalent compartment switching analysis, we showed that the conventional switching analysis misses a significant portion of quantitatively differential genomic regions which are closely associated with differential nuclear positioning and gene expression.

Fig. 3
figure 3

Comparison between DARIC and existing methods. A Genomic coverage of the four types of domains in terms of compartment changes. ‘AB’ represents a 50 kb genomic bin in compartment A in H1ESC that switch to compartment B in K562 cells. B Venn diagram representing the overlap between the ‘Strong + ’ state revealed by DARIC and the ‘BA’ state in PC1-based switching analysis. The numbers in the plot represent the numbers of 50 kb bins. C Violin plot showing the PIS differences for the three types of domains defined in (B). D-F Violin plots showing the comparisons of TSA-seq signal changes (D), lamina B1 DamID signal changes (E), and gene expression fold changes (F) in the three types of domains defined in (B). Gray distributions in each figure represent the signal changes for background regions that are randomly selected from the genome.***, p < 1e-3;n.s, not significant (p > 0.05), Mann-Whitney U test. G Venn diagram showing the overlap of genomic bins identified with increased PIS/PC1 values in K562 by DARIC and dcHiC. Numbers of 50 kb bins were shown in the diagram. H Enrichment of K562-specific genes for the three types of genomic regions defined in (G). I An exemplary region showing DARIC and dcHiC output with increased PIS in K562 cells

Next we further compared DARIC with dcHiC [41], a recent method using the quantitative PC1 values to find the differential compartmentalization domains. We applied dcHiC to H1ESC and K562 cells and identified the significantly differential genomic bins at a bin resolution of 50 kb and a p-value cut-off of 0.01 (same as the cutoff used in DARIC). Subsequently, the genomic bins identified with significantly increased PC1 values in K562 were compared with the PIS increased regions (i.e. “Strong + ”) from DARIC. As shown in Fig. 3G, most of the genomic bins identified by dcHiC (83%, 2644 out of 3194 bins) are also identified by DARIC. To check if the genomic regions specific to each method are biologically relevant, we looked at if K562-specific genes are enriched in those regions. A higher enrichment of K562-specific genes would suggest a tighter functional relevancy for the genomic regions. Figure 3H shows that the regions identified by DARIC are enriched with K562-specific genes, suggesting DARIC-specific genomic regions are biologically meaningful. As expected, the regions identified by both methods exhibit the highest enrichment of K562-specific genes. Nevertheless, dcHiC-specific regions did not demonstrate an enrichment of K562-specific genes. The same analyses were also repeated for PIS-decreased regions and similar conclusions were reached (Figure S3E, F). Notably, compared with dcHiC which tend to output discrete and discontinuous genomic bins as results (Fig. 3I and Figure S3G), DARIC can output the whole differential domain and keep the domain continuity nature of chromatin, largely due to the usage of HMM model which considers effects of neighboring chromatin regions. We also compared DARIC with HOMER, a method to find differential compartment domains based on the correlation between interaction profiles of two Hi-C matrices. Similar results were obtained (Figure S3H-K). Altogether, these results strongly suggest that DARIC achieves superior performance in the comparative analyses of compartmentalization.

DARIC is robust to protocol and technical variations of Hi-C

A common challenge in Hi-C data comparison relates to the protocol and technical variations involved in data generation, such as the choice of restriction enzymes and sequencing depth. Indeed, a systematic evaluation of chromosome conformation assays by Oksuz et al. [42] reveals that the usage of different restriction enzymes can affect compartment analysis. As such, it is critical to assess and remove these systematic biases before quantitative and statistical analyses.

To assess the effects of restriction enzymes on differential compartment analysis, we calculated and compared the PIS tracks of Hi-C samples from the same cell type and with three restriction enzymes that are commonly used in Hi-C protocols: HindIII, MboI, and DpnII. While MboI and DpnII produce fragments at kilobase resolution, HindIII cuts the genome at relatively large fragments of several kilobases. Specifically, the three Hi-C datasets generated in GM12878 cells were downloaded from the 4DN consortium. They were down-sampled to the same sequencing depth and PIS tracks were calculated. A comparison of PIS tracks reveals that they are highly correlated, while MboI and DpnII PIS tracks have a slightly higher correlation, likely because the two enzymes recognize and cut the same DNA sequences (Fig. 4A). However, visualization of PIS tracks in the genome browser (Figure S4A, upper panel) or examination of the PIS distribution (left panel of Fig. 4B) reveals that HindIII is distinct from the other two in data scaling, where HindIII dataset displays lower amplitude, suggesting the necessity to remove this systematic difference before further analysis. We used the MA-norm module from DARIC to normalize the MboI and DpnII PIS tracks with respect to HindIII and observed that the distinct differences in data scaling are eliminated, as shown in Fig. 4B (right panel) and Figure S4A (lower panel), suggesting that DARIC can robustly handle Hi-C datasets from different restriction enzymes.

Fig. 4
figure 4

DARIC is robust to technical variations in Hi-C data, such as choices of restriction enzymes and sequencing depth. A Heatmap showing the high correlation between PIS tracks generated from Hi-C data of different restriction enzymes, including HindIII, MboI, and DpnII. The three datasets are from GM12878. B Histograms showing the distributions of PIS calculated from Hi-C datasets with the three restriction enzymes, before and after normalization by DARIC. C Heatmap showing the high correlation of PIS calculated from Hi-C datasets at different sequencing depths. D Histograms showing the similar distributions of PIS from Hi-C data at different sequencing depths

Sequencing depth is another common yet crucial confounding factor in Hi-C data comparison. To assess the impacts of sequencing depth on DARIC, we compared the PIS tracks resulting from Hi-C datasets at various sequencing depths. Specifically, we downloaded a deeply sequenced Hi-C dataset of H1ESC from the 4DN consortium and down-sampled it into eight different depths, ranging from 150 million to 2 billion valid read pairs. PIS tracks were then obtained from DARIC at the resolution of 50 kb. Similarity analysis reveals that all the PIS tracks are highly correlated (Fig. 4C). Furthermore, PIS distributions (Fig. 4D), as well as genome browser snapshots (Figure S4B), show that these Hi-C datasets at different sequencing depths display no obvious differences in data scaling, suggesting that DARIC is not sensitive to the sequencing depth of the Hi-C data. Taken together, our analyses demonstrate that DARIC is robust to protocol and technical variations of the input Hi-C data.

Applying DARIC to time-course Hi-C data during cardiomyocyte differentiation reveals that activated genes involving compartmentalization changes correspond to more specific cellular functions

In order to systematically characterize dynamic compartmentalization landscapes during cell differentiation, we applied DARIC to a time-course Hi-C dataset [33] delineating the human cardiomyocyte differentiation from ESCs at day 0 to primitive cardiomyocytes at day 15, with three intermediate stages (top panel of Fig. 5A). Specifically, PIS tracks were calculated from the Hi-C data at each time point and residual PIS tracks were obtained for neighboring time points. DARIC was then applied to the four residual PIS tracks simultaneously to identify the genomic regions with significant compartmentalization changes. A single HMM model was trained for all the samples to facilitate cross-sample comparison (Figure S5A). Comparing DARIC’s results with PC1-based switching analysis, as expected, we found a much higher percentage of the genome associated with significant changes in compartmentalization (Fig. 5A). For example, during the transition from Day00 to Day02, DARIC identifies ~ 4 times more genomic regions with strongly increased compartmentalization with active compartment A than the conventional PC1-based switching analysis (15% vs. 4%). Furthermore, in the whole differentiation course, 68% of the genome is associated with compartmentalization changes, yet only 26% of the genome has switched compartments (Fig. 5B), suggesting that the plasticity of the compartmentalization landscape was greatly underestimated by the PC1-based switching analysis. Among the genomic regions with significant quantitative changes in compartmentalization, there are genes with critical roles in regulating cell identity, such as HAND2 and SOX2 (Fig. 5C). HAND2 is an essential transcription factor for cardiac morphogenesis [43] and is up-regulated during cardiomyocyte differentiation (Fig. 5D). Temporal PISs show that the HAND2 locus has increasingly higher interactions with the active compartment A and DARIC correctly annotates the locus with “Strong + ” when comparing Day15 with Day00. A similar example was for SOX2, which is a critical factor in pluripotency [44] and is associated with gradually decreasing interactions with the active compartment during cardiomyocyte differentiation (Fig. 5C and E). Collectively, these analyses indicate that the quantitative compartmentalization changes are important for cell identity and DARIC can accurately capture these changes.

Fig. 5
figure 5

Applying DARIC to delineating compartment changes during cardiomyocyte differentiation. A Flow charts and pie charts comparing the genomic coverage of significantly changed regions in compartmentation revealed by DARIC and PC1-based switching analysis. B Stacked bar plot showing the percentage of genomic regions undergoing significant compartment changes during the cardiomyocyte differentiation. C Genome browser snapshots at the HAND2 and SOX2 locus showing compartment changes during differentiation and concordant epigenetic changes. D, E Bar plots showing the concordant gene expression changes for HAND2 (D) and SOX2 (E) during the differentiation. ***, p < 0.001, Wald test from DESeq2. F Pie chart representing the numbers of cardiomyocyte-specific genes which are associated or not associated with significant PIS increases. Cardiomyocyte-specific genes are defined as those significantly up-regulated in Day15 compared to Day00. G Gene and enhancer density comparison for the two gene sets as defined in (F). H An illustration showing the two mechanisms involved in the activation of cardiomyocytespecific genes during differentiation

Out of the 1948 genes that are activated in Day15 comparing to Day00, 825 genes (42%) are associated with significant PIS increases (Fig. 5F). We speculated that there might be differences in the activation mechanisms of these two sets of genes. Indeed, we found that the genes involved in significant PIS increases tend to be less accessible to local functional elements like genes or enhancers because they tend to reside in less gene-dense or enhancer-dense regions (Fig. 5G). We hypothesize that the activity of these genes are more preferentially regulated via large-scale 3D genome reorganization (Fig. 5H). As additional support, we analyzed the corresponding Hi-C data and observed that the chromatin loops associated with the genes involved in PIS increases are significantly longer (Figure S5B). Gene Ontology enrichment analysis suggests that the genes associated with PIS increases are more enriched in cardiomyocyte-specific functions (Figure S5C), consistent with previous studies showing that more specific genes tend to use distant regulatory elements [45, 46]. Taken together, our analyses using DARIC reveal more biological insights about the dynamic compartmentalization during cell differentiation.

Evaluating compartmentalization variability landscape of human genome

A thorough understanding of the compartmentalization variability landscape of the human genome is still lacking, largely due to the limitations in the qualitative switching-based analysis. The compartmentalization quantification and normalization modules in DARIC provide an unbiased approach for this task. To this end, we leveraged the rich Hi-C resources housed in the 4DN consortium and evaluated the variability landscape of compartmentalization of the human genome (Fig. 6A). We first performed a comprehensive query and interrogation of available Hi-C datasets in 4DN, resulting in Hi-C data of 32 cell lines or primary cells without additional treatments. We analyzed the compartmentalization landscape at the resolution of 50 kb. Quality control filtered out 13 Hi-C samples likely due to limited sequencing depth, resulting in Hi-C data for 19 cell types, covering blood cells (B lymphocytes, lymphoblasts), various brain cells (astrocytes of the cerebellum/spinal cord, and microvascular endothelial cells), lung epithelial cells, skin malignant melanoma and others (see detailed information of all the 19 Hi-C datasets in Table S2). We then calculated PIS for each cell type and normalized it with respect to the PIS in H1ESC. The normalized PIS tracks were then used for the variability analysis (Fig. 6B).

Fig. 6
figure 6

Applying DARIC to a compendium of Hi-C datasets across many cell types. A Flowchart showing the pipeline of the analysis. B A snapshot of chromosome 1 showing the PIS tracks across 19 cell lines, the mean and standard deviation of PIS, and segmentation of the genome into five states according to the mean and variability of PIS. C Mean and standard deviation of PIS, as well as the genomic coverage, for the five HMM states. D, E Distributions of TSA-seq (D) and DamID signals (E) for the five types of states. F Heatmap depicting the enrichment of functional genomic elements in the five types of domains. G Heatmap showing the enrichment of tissue-specific genes in the five types of domains. H Stacked bar plot showing the sub-compartment composition for the five types of domains. I PIS variability, defined as the standard deviation of PIS across the 19 cell lines, of the five types of sub-compartments

We used the standard deviation of PIS as the metric to assess the compartmentalization variability. To identify the genomic domains at different variability levels, we trained an HMM model based on the mean and standard deviation of PIS to segment the genome into five types of domains: “conserved A” (con A), “conserved B” (con B), “variable A” (var A), “variable B” (var B), and “variable” (Fig. 6B and C), where “con A” and”con B” represent the domains in constitutive compartment A or B and show low variability, “variable A” and “variable B” represent domains with slightly higher variability but a clear compartmental type preference, and “variable” domains display the highest variability level and show frequent switching between the two compartment types (Fig. 6C). Nuclear positioning data in H1ESCs, such as TSA-seq and lamina B1 DamID data, reveal that these five states occupy distinct spaces in the lamina-to-speckle axis, where “conA” and “con B” regions are the most distant or closest to nuclear lamina respectively, and the other three states exhibit a gradual trend in the intermediate space (Fig. 6D and E). Same trends were observed in other cell types (Figure S6A and B). Furthermore, the five states display gradually differential enrichments in genomic elements (Fig. 6F), where the conA state is the most enriched in CpG-islands, promoters, and UTRs, and the conB state shows the strongest depletion in these states. Elements with a high level of sequence repeating, such as LINEs (Long Interspersed Nuclear Elements), LTRs (Long Terminal Repeats), and simple repeats, tend to show higher enrichment in non-A-type compartments (i.e. variable, varB and conB). To understand if there is a link between compartmentalization variability and gene expression variability, we looked at how cell-type-specific genes distribute in the five states with distinct compartment variability (Fig. 6G). To this end, we accessed the gene expression matrix spanning 30 tissues from GTEx [47] and examined the enrichment of tissue-specific genes in the five states. Specifically, we used Tau index [48], which ranges from 0 (uniformly expressed in all tissues) to 1 (uniquely expressed in one tissue, see methods for details), to measure each gene’s expression specificity. Genes with a Tau index higher than 0.3 were deemed as specific genes in one tissue. Interestingly, we found that tissue-specific genes are highly enriched in the non-A-type state (Fig. 6G), and the variable state exhibits an overall strong enrichment across the panel of tissues, highlighting a strong correlation between compartment variability and expression specificity.

Sub-compartments are finer chromatin structures that can be obtained from Hi-C data with high coverage. Analysis in GM12878 cells [6] revealed that compartment A can be divided into two sub-compartments, A1 and A2, and compartment B into three sub-compartments, B1-3. Sub-compartments exhibit refined associations with many functional features, such as histone modification marks, replication timing, and gene expression [6]. However, a thorough understanding of sub-compartments is still lacking. Our analysis in the compendium of Hi-C datasets enables a better understanding of the compartmentalization variability at the sub-compartment level. Notably, we found that the five states based on compartment variability show a remarkable correspondence to sub-compartments in GM12878 data, where the conA state is mostly composed of the active A1 state and the conB state is mostly made up of inactive B3 states (Fig. 6H). A direct comparison of the compartment variability of the five types of sub-compartments further supports that there is a differential pattern in the variability of the five sub-compartments (Fig. 6I). Similarly, results in other cell types reveal the same observation (Figure S6C and D). Taken together, our analyses strongly suggest that compartment variability corresponds well with the sub-compartments.



Source link