Choosing the right algorithm for your sequence similarity searches
In a contest between genetic sequence search algorithms, two of the main contenders would be BLAST® and Smith-Waterman. Both have proven their value for genetic sequence searching. BLAST has long been considered an industry-standard workhorse, faithfully serving the biological and medical research community. The Smith-Waterman algorithm has been in use even longer (it was published in 1981) and is sometimes favored by life sciences intellectual property researchers. So how do the two compare?
Each algorithm has its own advantages. BLAST owes much of its popularity to its speed and efficiency: results are usually available shortly after a query is submitted. The speed comes from a heuristic that looks for short seed matches and extends only those. Smith-Waterman, by contrast, uses dynamic programming to score every possible local alignment. Searches run this way can be frustratingly slow, but they may produce more complete local alignment matches between the submitted query sequences and the existing database sequences, which typically represent prior art.
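To make the trade-off concrete, here is a minimal sketch of Smith-Waterman scoring in Python. It is an illustration only, not the SequenceBase implementation: the function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, and a simple linear gap penalty of -2) are assumptions chosen for readability, and production tools typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Scoring values are illustrative assumptions, not recommended settings.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # h[i][j] holds the best score of any local alignment that ends
    # at query[i-1] and subject[j-1]; row 0 and column 0 stay at zero.
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == subject[j - 1]
            diagonal = h[i - 1][j - 1] + (match if same else mismatch)
            gap_in_subject = h[i - 1][j] + gap
            gap_in_query = h[i][j - 1] + gap
            # Clamping at zero is what makes the alignment local:
            # a poor prefix is dropped rather than carried forward.
            h[i][j] = max(0, diagonal, gap_in_subject, gap_in_query)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Every cell of the table is filled in, so the work grows with the product of the query length and the database sequence length. That exhaustive comparison is why no local alignment is skipped under the chosen scoring scheme, and also why the method is slower than a seed-based heuristic.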
Thus, a researcher who is especially concerned with minimizing the chance of missing important matches would want to take advantage of the Smith-Waterman algorithm. This is especially true for short sequence searches, where Smith-Waterman results may prove significantly more accurate than BLAST results. But what about the downside of Smith-Waterman, where some queries have been known to take an inordinate amount of time to complete?
Luckily, advances in cloud computing, combined with further improvements to the specialized genetic alignment search technology developed by SequenceBase, have allowed our engineering team to reduce the time required to run Smith-Waterman queries to the point where they now approach the performance of BLAST searches. These innovations, optimized for the GENESEQ, USGENE and WOGENE databases, can make it unnecessary to choose between accuracy on the one hand and avoiding the delays previously associated with excessively long queries on the other.
Thus, the question may no longer be, “Should we use Smith-Waterman OR should we use BLAST?” because it is now quite feasible to run both algorithms on the same set of sequences. In fact, if you’d like to try running both algorithms as an experiment and compare the results, please contact us for a 30-day free trial. We’d love to hear from you, and any feedback after the trial would be most welcome.
Please feel free to register and get free-of-charge unrestricted access to USGENE® for 30 days!
Smith T F, Waterman M S: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-197.
Altschul S F, Madden T L, Schäffer A A, Zhang J, Zhang Z, Miller W, Lipman D J: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.