Using base R and testthat to calculate probabilities (CC271)

April 4, 2024 • PD Schloss • 1 min read

Watch and code along with Pat as he uses test-driven development and base R to count kmers and calculate probabilities for a naive Bayesian sequence classifier. Pat generates all possible kmers for a single sequence and then for every sequence in a collection. These kmer counts are then used to generate the word-specific priors and genus-specific conditional probabilities that are needed to train the naive Bayesian classifier for 16S rRNA gene sequences. Along the way, Pat continues to write tests first with the testthat R package and leans on a number of tools from base R. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
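To make the steps above concrete, here is a minimal base R sketch of the kmer counting and the two probability calculations. It is not the code from the episode: the function name `get_kmers`, the toy sequences, the genus labels, and the choice of `k = 3` are all made up for illustration. It also assumes the formulas commonly given for the RDP naive Bayesian classifier, where the word-specific prior is (n + 0.5) / (N + 1) and the genus-specific conditional probability is (m + prior) / (M + 1), with each kmer counted at most once per sequence.

```r
# Hypothetical helper: return every overlapping kmer of length k in a sequence
get_kmers <- function(sequence, k = 8) {
  n_kmers <- nchar(sequence) - k + 1
  if (n_kmers < 1) return(character(0))
  starts <- seq_len(n_kmers)
  substring(sequence, starts, starts + k - 1)
}

# Toy data, invented for this example
sequences <- c("ATGCGCTA", "ATGCGCTC", "TTGCGATA")
genera <- c("A", "A", "B")
k <- 3

# Unique kmers found in each sequence, and the set seen across the collection
kmers_by_seq <- lapply(sequences, function(s) unique(get_kmers(s, k)))
all_kmers <- sort(unique(unlist(kmers_by_seq)))

# Presence/absence matrix: one row per kmer, one column per sequence
detected <- vapply(kmers_by_seq,
                   function(x) all_kmers %in% x,
                   logical(length(all_kmers)))
rownames(detected) <- all_kmers

# Word-specific priors (assumed formula): (n(w) + 0.5) / (N + 1),
# where n(w) is the number of sequences containing kmer w
n_seqs <- length(sequences)
priors <- (rowSums(detected) + 0.5) / (n_seqs + 1)

# Per-genus counts m(w): number of sequences in each genus containing kmer w
genus_names <- sort(unique(genera))
genus_counts <- sapply(genus_names, function(g) {
  rowSums(detected[, genera == g, drop = FALSE])
})

# Genus-specific conditional probabilities (assumed formula):
# (m(w) + prior(w)) / (M + 1), where M is the number of sequences in the genus
genus_sizes <- as.vector(table(genera)[genus_names])
cond_probs <- sweep(genus_counts + priors, 2, genus_sizes + 1, "/")
```

In a test-driven workflow, each of these steps would typically be pinned down first with a testthat expectation such as `expect_equal(get_kmers("ATGCGCTA", 3)[1], "ATG")` before the function is written.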

Code

You can browse the state of the repository at the end of the episode.