Multi-contact Pore-C
Full title
Multi-contact Pore-C: Telomere-to-telomere genome assembly using ultra-long reads and Pore-C scaffolding.
Aim
This project will establish “Pore-C” at UQ: a method that combines multi-contact Chromatin Conformation Capture (3C) with nanopore sequencing of crosslinked DNA to produce complete, phased genome assemblies. Through a collaborative effort across two study organisms, cattle and mango (QAAFI), we will develop an end-to-end pipeline that includes Pore-C DNA extraction protocols, Pore-C scaffolding and a bioinformatics pipeline to reconstruct haplotype-level telomere-to-telomere assemblies using Oxford Nanopore Technologies (ONT) sequencing.
Throughout the project, we will engage with UQ and external collaborators (ONT) to communicate the technology and to identify opportunities to improve its development and efficiency in current and future projects across a range of organisms including, but not limited to, cattle, mangoes, agricultural and aquatic species/corals. The approach is being rapidly deployed in Europe and the USA (Ulahannan et al. 2019; Nanoporetech 2020; 2021), but has not been implemented anywhere in Australia to date.
Deploying “Pore-C” will 1) position UQ as a leader in the development and translation of this burgeoning technology and 2) help both UQ and the broader Australian scientific community to remain internationally competitive in a variety of genomic studies. The project will be implemented on the new QAAFI PromethION, with protocols and access available to all UQ researchers for collaborative use at no cost.
Brief project outline
This project will provide an end-to-end solution to the problem of identifying multiple genomic contact loci by deploying a ready-to-implement molecular and bioinformatics method that is optimised across agricultural species.
Briefly, this project consists of three aims:
- Establish and optimise the Pore-C scaffolding method developed by Nanoporetech (2020; 2021), covering crosslinking, proximity ligation, crosslink reversal and DNA purification.
- Develop a bioinformatics pipeline for revealing dynamic interactions in the genome from high-resolution, multi-contact ONT long-read 3C data.
- Test the biological impact of chromatin structure on gene expression metrics.
Genomics-based innovative aspect of proposal
Directly observing telomere-to-telomere, phased 3D genome organisation is an important step for many genomics approaches. Previous 3C technologies based on short-read sequencing (i.e. Hi-C) could only capture interactions between pairs of loci (or points) to bridge and order contigs during genome assembly, and thus lacked the ability to resolve higher-order interactions or to generate complete genome assemblies. State-of-the-art long-read sequencing technologies can directly capture multi-way contacts between genomic loci (chromatin), but their cost and accuracy have been prohibitive for large eukaryotic genomes. Here we will provide an end-to-end solution to the problem of identifying multiple genomic loci by deploying a ready-to-implement molecular and bioinformatics method that is optimised across agricultural species. Pore-C scaffolding both lowers the cost and increases the usability of genomics data, making large-scale genomics studies cheaper and statistically more powerful.
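To illustrate the difference between pairwise and multi-way contacts, the following is a minimal Python sketch rather than part of the proposed pipeline. It assumes a hypothetical representation in which each multi-contact read is a list of aligned fragments given as (chromosome, start, end); the function name and the coordinates are made up for illustration. A read with n fragments expands into n(n-1)/2 pairwise contacts, whereas a Hi-C read pair reports only one.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: one aligned fragment = (chromosome, start, end).
Fragment = Tuple[str, int, int]


def pairwise_contacts(read_fragments: List[Fragment]) -> List[Tuple[Fragment, Fragment]]:
    """Expand one multi-contact read into all unordered fragment pairs."""
    return list(combinations(read_fragments, 2))


# Illustrative (made-up) read whose four fragments align to four loci.
example_read = [
    ("chr1", 1_000, 1_800),
    ("chr1", 250_000, 250_900),
    ("chr1", 900_000, 900_600),
    ("chr2", 40_000, 41_200),
]

pairs = pairwise_contacts(example_read)
print(len(pairs))  # 4 fragments give 4 * 3 / 2 = 6 pairwise contacts
```

The pairwise expansion keeps multi-contact data usable with tools built for pairwise contact maps, while the original read still records which loci were observed together.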
Broad applicability of the technique
One of the capabilities of the PromethION is Pore-C, which is useful both for genome assembly and for understanding the biological interactions of chromatin regions. This project will build on that capacity to make Pore-C available to all UQ researchers by providing details on the optimisation and analysis of Pore-C data in the UQ setting.
Recovering 3D genome organisation from high-resolution 3C data in large eukaryotic genomes at low cost has wide appeal throughout the genetics/genomics community. This is due to the significant advantage of having a complete genome assembly in a diverse set of genetic/genomic analyses, including genome-wide association studies (GWAS), genomic prediction, genetic risk score calculations and population structure analyses. In addition to its extensive use in cattle and horticultural species, several other groups at UQ have expressed their interest in adopting Pore-C scaffolding immediately if it were implemented at UQ. The upfront cost of genomic sequencing is out of reach for most labs, but because only a small number of PromethION flow cells are required per project, the cost can be shared across many researchers working on smaller target genomes.