GTDB and MinHash - a little revolution
Being a bioinformatician at heart, I can’t help but feel that a little revolution is happening at the confluence of two recent developments in microbial genomics and bioinformatics.
The first is the advent of the GTDB, a phylogenetically consistent bacterial and archaeal taxonomy. Boldly going where we didn't go before, Parks et al. throw off the shackles of pre-genomic microbiology in favour of taxonomic assignment based on degrees of genomic similarity.
Update: I just pushed gtdb-taxo, a command-line search tool and browser for the GTDB taxonomy, to GitHub.
The second development is the rise of MinHash, a computational approach designed 20 years ago for massive-scale document comparison and introduced to genomics by Ondov et al. in the Mash tool. Mash and its various descendants (see the recent Dashing paper for an overview) have enabled efficient many-to-many comparison of large genome collections.
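To give a feel for why MinHash makes these comparisons cheap, here is a minimal Python sketch of a bottom-s MinHash sketch in the spirit of Mash. Each sequence is reduced to the smallest hashes of its k-mers, and the overlap between two such sketches estimates the Jaccard similarity of the full k-mer sets. This is an illustration, not Mash's actual implementation. The blake2b hash, k = 21, sketch size of 1000, and forward-strand-only k-mers (Mash uses canonical k-mers and a different hash function) are all assumptions chosen for simplicity.

```python
import hashlib
import heapq


def kmers(seq: str, k: int):
    """Yield all overlapping k-mers of a DNA sequence (forward strand only, an assumption)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice for this sketch)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq: str, k: int = 21, size: int = 1000) -> list[int]:
    """Keep the `size` smallest distinct k-mer hashes, returned in ascending order."""
    hashes = {hash64(km) for km in kmers(seq, k)}
    return heapq.nsmallest(size, hashes)


def jaccard_estimate(a: list[int], b: list[int], size: int = 1000) -> float:
    """Estimate Jaccard similarity of two genomes from their bottom-s sketches.

    Take the `size` smallest hashes of the union of both sketches and count the
    fraction that appear in both; this matches the bottom-s of the full union.
    """
    set_a, set_b = set(a), set(b)
    union_bottom = heapq.nsmallest(size, set_a | set_b)
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)
```

Usage might look like `jaccard_estimate(bottom_sketch(genome_a), bottom_sketch(genome_b))`, where `genome_a` and `genome_b` are hypothetical sequence strings. Once sketched, every genome is represented by a fixed number of hashes, so each pairwise comparison costs roughly the same regardless of genome length. That is what makes many-to-many comparison of large collections tractable.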
It’s exciting to see the possibilities opened up by these developments, and an absolute joy to work in a field where these little revolutions still happen.