Base conversion in R to represent DNA sequences in base 4 (CC270)

April 1, 2024 • PD Schloss • 1 min read

Because DNA sequences are built from four characters (A, C, G, and T), they can be treated as base 4 strings. Pat will show you how to carry out base conversion in R from base 4 to base 10. He explains why storing k-mers and other DNA fragments as base 10 numbers is useful in bioinformatics approaches. Of course, he’ll do it all with test-driven development using tools from base R, including strtoi. This is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
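To make the idea concrete, here is a minimal sketch of the conversion in base R. It is not the code from the episode: the function name seq_to_base10 is hypothetical, and the mapping of A, C, G, T to the digits 0, 1, 2, 3 is an assumption chosen for illustration.

```r
# Hypothetical helper (not taken from the episode's repository):
# convert a DNA k-mer to a base 10 integer by way of a base 4 string.
seq_to_base10 <- function(kmer) {
  # Assumed mapping: A = 0, C = 1, G = 2, T = 3
  base4 <- chartr("ACGT", "0123", toupper(kmer))

  # strtoi() parses a string of digits in the given base and returns
  # NA for strings it cannot interpret (for example, one containing "N")
  strtoi(base4, base = 4L)
}

seq_to_base10("AAAA") # 0
seq_to_base10("ACGT") # 0*64 + 1*16 + 2*4 + 3 = 27
seq_to_base10("TTTT") # 3*64 + 3*16 + 3*4 + 3 = 255
```

Under this mapping every k-mer of length k gets a distinct integer between 0 and 4^k - 1, so it could serve as an index into a vector or matrix. Because strtoi returns a 32-bit integer, this sketch should only be relied on for k-mers of up to 15 bases; longer inputs overflow and come back as NA.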

Code

You can browse the state of the repository at the end of the episode.