Full title

Multi-contact Pore-C: Telomere-to-telomere genome assembly using ultra-long reads and Pore-C scaffolding.

Aim

This project will establish “Pore-C” at UQ. Pore-C is a crosslinked-read sequencing method that uses multi-contact Chromatin Conformation Capture (3C) to produce complete, phased genome assemblies. Through a collaborative effort across two study organisms, cattle and mango (QAAFI), we will develop an end-to-end pipeline that includes Pore-C DNA extraction protocols, Pore-C scaffolding and bioinformatics tools to reconstruct haplotype-level telomere-to-telomere assemblies using Oxford Nanopore Technologies (ONT) sequencing.

Throughout the project, we will engage with UQ and external collaborators (ONT) to communicate the technology and identify opportunities to improve its development and efficiency in current and future projects across a range of organisms including, but not limited to, cattle, mangoes, other agricultural species, and aquatic species such as corals. The approach is being rapidly deployed in Europe and the USA (Ulahannan et al. 2019; Nanoporetech 2020; 2021), but has not yet been implemented anywhere in Australia.

Deploying “Pore-C” will 1) position UQ as a leader in the development and translation of this burgeoning technology and 2) help both UQ and the broader Australian scientific community to remain internationally competitive in a variety of genomic studies. The project will be implemented on the new QAAFI PromethION, with protocols made available for collaborative use by all UQ researchers free of charge.

Brief project outline

This project will provide an end-to-end solution to the problem of identifying multiple genomic contact loci by deploying a ready-to-implement molecular and bioinformatics method that is optimised across agricultural species.

Briefly, this project consists of three aims:

  1. Establish and optimise the Pore-C scaffolding method (crosslinking, proximity ligation, crosslink reversal and DNA purification) developed by Nanoporetech (2020; 2021).
  2. Develop a bioinformatics pipeline for revealing dynamic interactions in the genome from high-resolution, multi-contact ONT long-read 3C data.
  3. Test the biological impact of the chromatin structure on gene expression metrics.

Genomics-based innovative aspect of proposal

Directly observing telomere-to-telomere, phased 3D genome organisation is an important step for many genomics approaches. Previous 3C technologies based on short-read sequencing (i.e. Hi-C) could only capture interactions between pairs of loci (or points) to bridge and order contigs during genome assembly, and thus could neither resolve higher-order interactions nor generate complete genome assemblies. State-of-the-art long-read sequencing technologies can directly determine multi-way contacts among genomic loci (chromatin), but are cost- and accuracy-prohibitive for large eukaryotic genomes. Here we will provide an end-to-end solution to the problem of identifying multiple genomic loci by deploying a ready-to-implement molecular and bioinformatics method that is optimised across agricultural species. Pore-C scaffolding both lowers the cost and increases the usability of genomics data, making large-scale genomics studies cheaper and statistically more powerful.
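To illustrate the difference between pairwise and multi-way capture, the following minimal Python sketch expands a single multi-contact read into the pairwise contacts that a scaffolder typically consumes. It is an illustration only, not part of the proposed pipeline or of any ONT tool: the fragment tuples, contig names and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: one aligned fragment of a multi-contact read,
# given as (contig name, start, end).
Fragment = Tuple[str, int, int]


def pairwise_contacts(fragments: List[Fragment]) -> List[Tuple[Fragment, Fragment]]:
    """Expand one multi-contact read into all unordered fragment pairs."""
    return list(combinations(fragments, 2))


def inter_contig(pairs: List[Tuple[Fragment, Fragment]]) -> List[Tuple[Fragment, Fragment]]:
    """Keep only pairs that link two different contigs (useful for scaffolding)."""
    return [(a, b) for a, b in pairs if a[0] != b[0]]


# Hypothetical read with four aligned fragments spanning three contigs.
read = [
    ("ctg1", 100, 600),
    ("ctg2", 5000, 5400),
    ("ctg1", 90000, 90500),
    ("ctg3", 10, 700),
]

pairs = pairwise_contacts(read)
links = inter_contig(pairs)
print(len(pairs), len(links))
```

A read with n fragments yields n(n-1)/2 pairwise contacts, whereas a short-read Hi-C pair yields one; the original multi-way grouping is still available for analyses of higher-order interactions.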

Broad applicability of the technique

One of the capabilities of the PromethION is Pore-C, which has uses for both genome building and understanding the biological interactions of chromatin regions. This project will build on that capacity to make Pore-C available to all UQ researchers, by providing details on optimisation and analysis of Pore-C data in the UQ setting.

Retaining 3D genome organisation from high-resolution 3C data in large eukaryotic genomes at a low cost has wide appeal throughout the genetics/genomics community. This is due to the significant advantage of having a comprehensive genome in a diverse set of genetic/genomic analyses, including genome-wide association studies (GWAS), genomic prediction, genetic risk score calculations and population structure analyses. In addition to its extensive use in cattle and horticultural species, several other groups at UQ have expressed interest in adopting Pore-C scaffolding immediately if it were implemented at UQ. The upfront cost of genomic sequencing is out of reach for most labs, but because only a small number of PromethION flow cells is required per project, the cost can be shared across many researchers working on smaller target genomes.

Project members

Research collaborators

Dr Hyungtaek Jung

Research Fellow
Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation (QAAFI)

Dr Loan Nguyen

Research Fellow
Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation (QAAFI)

Dr Elizabeth Ross

Research Fellow
Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation

Professor Ben Hayes

Centre Director, Animal Science
Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation (QAAFI)

Associate Professor Craig Hardner

Principal Research Fellow
Centre for Horticultural Science, Queensland Alliance for Agriculture and Food Innovation

Dr Bradley Campbell

Research Fellow
Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation

Genome Innovation Hub

Dr Jun Ma

Research Specialist - Biochemistry
Genome Innovation Hub

Dr Subash Rai

Research Specialist - Long Read Sequencing
Genome Innovation Hub

Valentine Murigneux

Computational biologist
Genome Innovation Hub
Bioinformatician
QCIF Facility for Advanced Bioinformatics