scHiCPRSiM: single-cell Hi-C Practical Rational SiMulator
ID:20
Submission ID:59 View Protection:ATTENDEE
Updated Time:2024-10-30 22:22:24
Poster Presentation
Abstract
Recent advances in single-cell Hi-C (scHi-C) techniques have made it possible to explore three-dimensional genome organization at the level of individual cells. Various computational and statistical methods have been developed for scHi-C data analysis. However, benchmarking these methods remains a significant challenge, primarily because high-quality scHi-C datasets are scarce. One solution is to use computational simulators, but existing bulk Hi-C simulators cannot be applied directly to single-cell datasets, and the two available scHi-C simulation methods are limited either in their ability to generate an arbitrary number of cells with varying sequencing depths or in the resolution of the scHi-C data they can produce.
To address this challenge, we propose scHiCPRSiM, a versatile and robust statistical simulator of scHi-C data. To accommodate various protocols, we moved away from replicating the experimental procedure and instead developed a count-based model. scHiCPRSiM captures important characteristics of single-cell Hi-C data, such as the dependence on genomic distance and sparsity, by learning empirical zero-inflated negative binomial (ZINB) distributions. As a comprehensive simulation method, scHiCPRSiM also takes into account both inter- and intra-chromosomal contacts, as well as genomic distance, since these features are crucial for accurately replicating gene enrichment and depletion in the contact matrix.
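As an illustration of the count-based idea, the following minimal Python sketch draws a symmetric intra-chromosomal contact map in which each genomic distance has its own ZINB parameters. It is not the scHiCPRSiM implementation: the function names (`sample_zinb`, `simulate_contact_map`), the shared dispersion, and the toy decay and zero-probability curves are all assumptions made for this example, whereas a real simulator would learn such parameters from data.

```python
import numpy as np


def sample_zinb(rng, mean, dispersion, zero_prob, size):
    """Draw zero-inflated negative binomial counts (hypothetical helper).

    mean       -- NB mean (> 0)
    dispersion -- NB size parameter r (> 0)
    zero_prob  -- probability of a structural zero, in [0, 1]
    """
    # NumPy parameterizes NB(n, p) with mean n * (1 - p) / p,
    # so p = r / (r + mean) gives the requested mean.
    p = dispersion / (dispersion + mean)
    counts = rng.negative_binomial(dispersion, p, size=size)
    # Overwrite a random subset with structural zeros.
    structural_zero = rng.random(size) < zero_prob
    counts[structural_zero] = 0
    return counts


def simulate_contact_map(n_bins, mean_by_dist, dispersion, zero_prob_by_dist, seed=0):
    """Fill each diagonal (genomic distance d) with ZINB draws (illustrative only)."""
    rng = np.random.default_rng(seed)
    mat = np.zeros((n_bins, n_bins), dtype=np.int64)
    for d in range(n_bins):
        vals = sample_zinb(rng, mean_by_dist[d], dispersion,
                           zero_prob_by_dist[d], n_bins - d)
        idx = np.arange(n_bins - d)
        mat[idx, idx + d] = vals  # upper triangle
        mat[idx + d, idx] = vals  # mirror to keep the map symmetric
    return mat


# Toy parameters (assumed, not estimated from any dataset):
# the mean decays with distance and sparsity grows with distance.
n_bins = 50
dist = np.arange(n_bins)
mean_by_dist = 5.0 / (1.0 + dist)
zero_prob_by_dist = np.clip(0.5 + 0.01 * dist, 0.0, 0.95)
contact_map = simulate_contact_map(n_bins, mean_by_dist, 2.0, zero_prob_by_dist)
```

Drawing per-distance parameters is one simple way to reproduce the distance decay and sparsity mentioned above; inter-chromosomal blocks could be handled with a separate parameter set in the same fashion.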
To demonstrate the superior performance of scHiCPRSiM, we compared it with existing simulation methods: scHi-CSim and the downsampling method described in scHiCluster. The comparison was based on two criteria: key statistics and embedding performance. Using two different sets of real scHi-C data, we demonstrate that scHiCPRSiM can generate data more flexibly than existing methods and is not restricted by the training set. In addition, low-dimensional representations of real and synthetic scHi-C data show that scHiCPRSiM better preserves the characteristics observed in real data. We also illustrate that scHiCPRSiM can emulate real scHi-C contact maps under a series of statistical assumptions, and that the synthetic count-based scHi-C data capture the key chromatin structure features. Finally, using mouse cell-cycle data, we show that synthetic data generated by scHiCPRSiM at different resolutions are not significantly different from each other, which indicates that scHiCPRSiM is consistent and reliable across resolutions.
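For context on the downsampling baseline, the sketch below shows one common way to downsample a symmetric contact matrix to a target number of contacts by sampling individual contacts without replacement. It is an assumption-laden illustration, not necessarily the procedure used in scHiCluster; the function name `downsample_contacts` and the toy matrix are hypothetical.

```python
import numpy as np


def downsample_contacts(matrix, target_total, seed=0):
    """Keep `target_total` contacts drawn without replacement (illustrative).

    `matrix` is assumed to be a symmetric integer contact matrix; only the
    upper triangle (including the diagonal) is counted, then mirrored back.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(matrix)
    total = int(upper.sum())
    if target_total > total:
        raise ValueError("target_total exceeds the number of observed contacts")
    rows, cols = np.nonzero(upper)
    counts = upper[rows, cols]
    # Each bin pair is a "color"; draw target_total contacts without replacement.
    kept = rng.multivariate_hypergeometric(counts, target_total)
    out = np.zeros_like(matrix)
    out[rows, cols] = kept
    # Mirror the strict upper triangle to restore symmetry.
    return out + np.triu(out, k=1).T


# Toy symmetric matrix with 15 contacts in its upper triangle (made up).
toy = np.array([[4, 2, 0],
                [2, 3, 1],
                [0, 1, 5]])
thinned = downsample_contacts(toy, target_total=10)
```

Because downsampling can only remove contacts from cells that were actually observed, it cannot produce more cells or deeper libraries than the training data contain, which is the restriction the comparison above refers to.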
The advent of scHi-C technologies has changed our understanding of chromatin conformation at the single-cell level. As a result, many computational methods have been developed to investigate the spatial organization of chromatin in single cells in order to reveal the underlying gene expression and differentiation. We therefore performed a comprehensive benchmark of downstream computational analyses using data generated by scHiCPRSiM, to show that scHiCPRSiM can serve as a guide for experimental design and support the development of computational analysis methods. Using real scHi-C data from different scHi-C protocols, we found that, as expected, clustering precision increases with library size and converges beyond an optimal point, which helps balance clustering quality against laboratory budget. Furthermore, for the main downstream analyses, namely batch correction and embedding, we used scHiCPRSiM to simulate data in order to evaluate and compare existing methods for these two tasks. In particular, our results suggest that BandNorm and scHiCluster coupled with Harmony perform better in embedding and clustering with batch correction, whereas for data without batch effects, scHiCluster and scHiCTools achieve better embedding and clustering.
Keywords
single-cell Hi-C, simulation, zero-inflated negative binomial
Submission Author
刘卉灵
South China University of Technology
Ma Wenxiu
University of California Riverside
Ma Rui
University of California Riverside