Genome-Phaser: protocol for fully phasing whole genome variants using haplotagging
Aim
To establish at UQ “haplotagging”, a new low-cost linked-read sequencing technique that produces phased haplotypes. Through a collaborative effort across three study organisms (cattle, mango and an Australian wildflower), we will develop an end-to-end workflow comprising 1) high molecular weight DNA extraction, 2) haplotagging of these long DNA molecules and 3) a bioinformatics pipeline to reconstruct haplotypes after short-read sequencing.
Brief project outline
1. Generate high molecular weight DNA for cattle (Bos indicus), mango (Mangifera indica) and an Australian wildflower (Senecio lautus) at a competitive cost for high-throughput studies routinely using hundreds of samples.
2. Establish and optimise the haplotagging method of barcoding and reconstructing haplotypes in a diverse set of taxa – B. indicus, M. indica and S. lautus. We will build haplotagging Illumina libraries and sequence them using short-read Illumina technologies, producing barcode-tagged short reads from which genomic phase can be deduced.
3. Develop a bioinformatics pipeline for extracting haplotypes from tagged Illumina short reads. The pipeline will include demultiplexing of pooled samples based on both individual and DNA molecule barcodes, followed by reconstruction of haplotypes.
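The demultiplexing and phasing logic described in item 3 can be illustrated with a minimal Python sketch. This is not the project's pipeline. The read records, the field names (sample, molecule, alleles) and the simple pairwise co-occurrence count are hypothetical stand-ins, and a real implementation would read barcodes and alleles from sequencing and alignment files. The sketch only shows the underlying idea: reads sharing a molecule barcode derive from the same DNA molecule, so alleles seen together on one molecule lie on the same haplotype.

```python
from collections import defaultdict

def demultiplex(reads):
    """Group reads by sample barcode, then by DNA molecule barcode."""
    grouped = defaultdict(lambda: defaultdict(list))
    for read in reads:
        grouped[read["sample"]][read["molecule"]].append(read)
    return grouped

def molecule_alleles(molecule_reads):
    """Merge the alleles observed at variant sites across one molecule's reads."""
    alleles = {}
    for read in molecule_reads:
        alleles.update(read["alleles"])
    return alleles

def phase_pair(molecules, site_a, site_b):
    """Count allele combinations at two sites that co-occur on the same molecule."""
    counts = defaultdict(int)
    for reads in molecules.values():
        seen = molecule_alleles(reads)
        if site_a in seen and site_b in seen:
            counts[(seen[site_a], seen[site_b])] += 1
    return dict(counts)

# Hypothetical toy data: positions 100 and 5000 are heterozygous sites.
reads = [
    {"sample": "S1", "molecule": "M1", "alleles": {100: "A"}},
    {"sample": "S1", "molecule": "M1", "alleles": {5000: "T"}},
    {"sample": "S1", "molecule": "M2", "alleles": {100: "G", 5000: "C"}},
    {"sample": "S2", "molecule": "M1", "alleles": {100: "A"}},
]

grouped = demultiplex(reads)
pair_counts = phase_pair(grouped["S1"], 100, 5000)
print(pair_counts)  # {('A', 'T'): 1, ('G', 'C'): 1}
```

In this toy example, sample S1 carries the haplotypes A-T and G-C across the two sites, even though no single short read from molecule M1 spans both positions. The molecule barcode supplies the long-range link.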
Genomics-based innovative aspect of proposal
Directly observing haplotypes is an important step for many genomics approaches. Long-read sequencing can be used to determine haplotypes directly but is cost-prohibitive for large numbers of samples. Here we will provide an end-to-end solution to the problem of identifying haplotypes by deploying a ready-to-implement molecular and bioinformatics method that is optimised across both plants and animals. Haplotagging both lowers the cost and increases the usability of genomics data, making large-scale genomics studies cheaper and statistically more powerful.
Broad applicability of the technique
Retaining phased haplotypes in large populations at a low cost has wide appeal throughout the genetics community, because haplotypes offer a significant advantage in a diverse set of genetic analyses, including genome-wide association studies (GWAS), genomic prediction, genetic risk score calculations and population structure analyses.