The 11th International Symposium on 3D Genomics

[Poster] scHiCPRSiM: single-cell Hi-C Practical Rational SiMulator

ID: 20 | Submission ID: 59 | Access: attendees only | Updated: 2024-10-30 22:22:24 | Views: 1643 | Poster presentation

Presentation start: November 1, 2024, 15:25 (Asia/Shanghai)

Duration: 15 min

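Session: [S3] Parallel Session 3: 3D Genomics Technologies (New Technologies, Analysis Tools, and AI) » [S3-1] Parallel Session 3: 3D Genomics Technologies (New Technologies, Analysis Tools, and AI)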

Abstract

Recent advances in single-cell Hi-C (scHi-C) techniques have made it possible to explore three-dimensional genome organization at the level of individual cells. Various computational and statistical methods have been developed for scHi-C data analysis. However, benchmarking these methods remains a significant challenge, primarily because high-quality scHi-C datasets are scarce. One solution is to use computational simulators, but existing bulk Hi-C simulators cannot be applied directly to single-cell datasets, and the two available scHi-C simulation methods are limited either in their ability to generate an arbitrary number of cells with varying sequencing depths or in the resolution of the scHi-C data they can produce.

To address this challenge, we propose scHiCPRSiM, a versatile and robust statistical simulator of scHi-C data. To accommodate various protocols, we moved away from replicating the experimental procedure and instead developed a count-based model. scHiCPRSiM captures important characteristics of single-cell Hi-C data, such as genomic distance effects and sparsity, by learning empirical zero-inflated negative binomial (ZINB) distributions. As a comprehensive simulation method, scHiCPRSiM also accounts for both inter- and intra-chromosomal contacts, as well as genomic distance, since these features are crucial for accurately replicating gene enrichment and depletion in the contact matrix.
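
As an illustration of what a count-based ZINB model stratified by genomic distance can look like, the following is a minimal sketch and not the scHiCPRSiM implementation. The function names, the mean/dispersion parameterization, and the decay and sparsity curves are all hypothetical; in practice such parameters would be estimated from real scHi-C data rather than set by hand.

```python
import numpy as np

def sample_zinb(rng, mean, dispersion, zero_prob, size):
    """Draw zero-inflated negative binomial counts (hypothetical helper).

    With probability zero_prob a structural zero is emitted; otherwise the
    count comes from a negative binomial with the given mean and dispersion.
    """
    # NumPy parameterizes the negative binomial by (n, p).
    # With n = dispersion and p = n / (n + mean), the expected value is mean.
    n = dispersion
    p = n / (n + mean)
    counts = rng.negative_binomial(n, p, size=size)
    structural_zero = rng.random(size) < zero_prob
    counts[structural_zero] = 0
    return counts

def simulate_contact_map(rng, n_bins, mean_by_dist, dispersion, zero_prob_by_dist):
    """Fill a symmetric intra-chromosomal contact matrix one diagonal at a time."""
    mat = np.zeros((n_bins, n_bins), dtype=np.int64)
    for d in range(n_bins):
        # All bin pairs at genomic distance d share one ZINB distribution.
        vals = sample_zinb(rng, mean_by_dist[d], dispersion,
                           zero_prob_by_dist[d], n_bins - d)
        idx = np.arange(n_bins - d)
        mat[idx, idx + d] = vals
        mat[idx + d, idx] = vals
    return mat

rng = np.random.default_rng(0)
n_bins = 50
dist = np.arange(n_bins)
# Assumed, hand-picked curves: contact mean decays with distance,
# and the share of structural zeros grows with distance.
mean_by_dist = 5.0 / (1.0 + dist)
zero_prob_by_dist = np.minimum(0.5 + 0.01 * dist, 0.99)
sim = simulate_contact_map(rng, n_bins, mean_by_dist, 2.0, zero_prob_by_dist)
```

Because the number of simulated cells and the per-distance means are free parameters in a model of this kind, cell count and sequencing depth can be varied independently of the data used for fitting.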

To demonstrate the superior performance of scHiCPRSiM, we compared it with existing simulation methods: scHi-CSim and the downsampling method described in scHiCluster. The comparison covered two aspects: key statistics and embedding performance. Using two different sets of real scHi-C data, we show that scHiCPRSiM can generate data more flexibly than existing methods and is not restricted by the training set. In addition, low-dimensional representations of real and synthetic scHi-C data show that scHiCPRSiM better preserves the characteristics observed in real data. We also show that scHiCPRSiM can emulate real scHi-C contact maps under a series of statistical assumptions and that the synthetic count-based scHi-C data capture key chromatin structure features. Finally, using mouse cell-cycle data, we show that synthetic data generated by scHiCPRSiM at different resolutions are not significantly different from each other, indicating that scHiCPRSiM is consistent and reliable across resolutions.
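
For context on the downsampling baseline, the sketch below shows one common way to downsample a contact matrix: binomial thinning, where each observed contact is kept independently with a fixed probability. This is an assumption about the general technique, not the scHiCluster code, and `downsample_contacts` and `keep_fraction` are hypothetical names.

```python
import numpy as np

def downsample_contacts(rng, mat, keep_fraction):
    """Thin a symmetric integer contact matrix (hypothetical helper).

    Each contact is kept independently with probability keep_fraction.
    """
    upper = np.triu(mat)                        # count each bin pair once
    thinned = rng.binomial(upper, keep_fraction)
    return thinned + np.triu(thinned, k=1).T    # mirror to restore symmetry

rng = np.random.default_rng(1)
# Toy symmetric "real" contact matrix built from Poisson counts.
a = rng.poisson(3.0, size=(20, 20))
real = np.triu(a) + np.triu(a, k=1).T
down = downsample_contacts(rng, real, 0.25)
```

Thinning of this kind can only remove contacts from an existing cell, so the output never exceeds the depth of the source data, which is one sense in which a downsampling approach is tied to its training set.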

The advent of scHi-C technologies has changed our understanding of chromatin conformation at the single-cell level. As a result, many computational methods have been developed to investigate the spatial organization of chromatin in single cells in order to reveal the underlying gene expression and differentiation. We therefore performed a comprehensive benchmark of downstream computational analyses using data generated by scHiCPRSiM, to show that scHiCPRSiM can serve as an experimental design guideline and support the development of computational analysis methods. Using real scHi-C data from different scHi-C protocols, we found that, as expected, clustering precision increases with library size and converges beyond an optimal point, which helps balance clustering quality against laboratory budget. Furthermore, for the main downstream analyses, batch correction and embedding, we used scHiCPRSiM to simulate data for evaluating and comparing existing methods for these two tasks. In particular, our results suggest that BandNorm and scHiCluster coupled with Harmony perform better in embedding and clustering with batch correction, while for data without batch effects, scHiCluster and scHiCTools achieve better embedding and clustering.

Keywords

single-cell Hi-C, simulation, zero-inflated negative binomial

Presenter

刘卉灵 (Huiling Liu)

South China University of Technology

Authors

刘卉灵 (Huiling Liu), South China University of Technology

Wenxiu Ma, University of California, Riverside

Rui Ma, University of California, Riverside
