High-throughput sequencing of protein variant libraries

Full Project Title

Building capacity to sample and explain protein sequence fitness landscapes: a case study of re-starting the evolution of antibiotic degradation in ribonucleases

Project Summary

Conventionally, protein fitness is explored and optimised by experimentally varying residues and then successively screening and selecting for the desired function. Sequencing is not required, and variant sequences are typically determined only afterwards, meaning experiments are not guided by knowledge of specific mutations. To understand the determinants of fitness (e.g., stability, activity, and function), many variants must be sampled post hoc and across levels of fitness, a laborious and costly process.

The ‘every variant sequencing’ (evSeq) protocol addresses these issues by drastically increasing the capacity to sequence protein variants and systematically pair each sequence with fitness data. Sequence knowledge can then guide the experimental process and support protein engineering at unprecedented scales. This project sets out to establish and evaluate the evSeq protocol, which is designed to be extremely cost-effective and simple to perform for researchers who already employ directed evolution.

Potential Outcomes

Firstly, this project will establish at UQ the capacity to perform evSeq, a new amplicon sequencing protocol that drastically reduces the cost of sequencing protein variants. It does so by generating the data on a multiplexed next-generation sequencing (NGS) run, at a cost of cents per variant. The knowledge gained and the workflows established during this project will make this type of experimentation accessible to all UQ researchers and further cement the benefits of using genomic technology in complementary research such as protein engineering.

Secondly, this project will develop machine learning methodology to address fundamental challenges in protein engineering and evolutionary studies. These challenges include how best to navigate highly complex combinatorial spaces, how the scale of known data influences predictions, and how alternative paths to gaining function emerge.