Pre-print: stLearn

6 Aug 2020

Full title

stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues

Duy Pham, Xiao Tan, Jun Xu, Laura F. Grice, Pui Yeng Lam, Arti Raghubar, Jana Vukovic, Marc J. Ruitenberg, Quan Nguyen


Spatial Transcriptomics is an emerging technology that adds spatial dimensionality and tissue morphology to the genome-wide transcriptional profile of cells in an undissociated tissue. Integrating these three types of data creates a vast potential for deciphering novel biology of cell types in their native morphological context. Here we developed innovative integrative analysis approaches to utilise all three data types to first find cell types, then reconstruct cell type evolution within a tissue, and search for tissue regions with high cell-to-cell interactions. First, for normalisation of gene expression, we compute a distance measure using morphological similarity and neighbourhood smoothing. The normalised data is then used to find clusters that represent transcriptional profiles of specific cell types and cellular phenotypes. Clusters are further sub-clustered if cells are spatially separated. Analysing anatomical regions in three mouse brain sections and 12 human brain datasets, we found the spatial clustering method more accurate and sensitive than other methods. Second, we introduce a method to calculate transcriptional states by pseudo-space-time (PST) distance. PST distance is a function of physical distance (spatial distance) and gene expression distance (pseudotime distance) to estimate the pairwise similarity between transcriptional profiles among cells within a tissue. We reconstruct spatial transition gradients within and between cell types that are connected locally within a cluster, or globally between clusters, by a directed minimum spanning tree optimisation approach for PST distance. The PST algorithm could model spatial transition from non-invasive to invasive cells within a breast cancer dataset. Third, we utilise spatial information and gene expression profiles to identify locations in the tissue where there is both high ligand-receptor interaction activity and diverse cell type co-localisation. 
These tissue locations are predicted to be hotspots where cell-cell interactions are more likely to occur. We detected tissue regions and ligand-receptor pairs significantly enriched compared to background distribution across a breast cancer tissue. Together, these three algorithms, implemented in a comprehensive Python software stLearn, allow for the elucidation of biological processes within healthy and diseased tissues.
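To make the PST idea concrete, here is a minimal sketch of one way to combine a physical distance and an expression-based distance into a single pairwise measure. It is not the stLearn implementation. The function names, the `weight` parameter, the max-scaling step and the weighted-sum form are all assumptions made for illustration. Plain Euclidean distance between expression vectors stands in for the pseudotime distance described in the abstract.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points (list of equal-length tuples)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def scale_to_unit(matrix):
    """Divide by the largest entry so both distance types share a 0..1 range."""
    largest = max(max(row) for row in matrix)
    if largest == 0:
        return [row[:] for row in matrix]
    return [[value / largest for value in row] for row in matrix]


def pst_distance(coords, expression, weight=0.5):
    """Hypothetical PST-style distance: a weighted sum of scaled spatial and
    scaled expression distances. `weight` is an illustrative parameter; the
    weighted-sum form itself is an assumption, not taken from the paper."""
    if len(coords) != len(expression):
        raise ValueError("coords and expression must describe the same spots")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    spatial = scale_to_unit(pairwise_distances(coords))
    expr = scale_to_unit(pairwise_distances(expression))
    n = len(coords)
    return [
        [weight * spatial[i][j] + (1.0 - weight) * expr[i][j] for j in range(n)]
        for i in range(n)
    ]
```

In this sketch, `weight=1.0` reduces the measure to scaled spatial distance alone and `weight=0.0` to scaled expression distance alone. A matrix of this kind could then be passed to a spanning-tree routine to order spots, which is the role the abstract assigns to the directed minimum spanning tree step.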

Availability and implementation

The stLearn package is open source and it is available at

GIH contribution

Jun Xu

As one of the successful outcomes of the GIH-funded collaborative project "Spatial genomics technologies to study cancer and genetic diseases in tissue contexts", GIH team member Jun Xu helped to conceive the experiments, develop the algorithms, write the software, conduct the experiments and analyse the data. He also helped to write the manuscript.

Dr Sohye Yoon

GIH team member Dr Sohye Yoon conducted the RNAscope Hiplex assays. Sohye also contributed to drafting the manuscript.