Benchmarking methods for reading text files in R (CC290)

June 17, 2024 • PD Schloss • 1 min read

Pat revisits his code for reading FASTA-formatted DNA sequence files into R. First, he takes on how to read in the sequence data. Then he removes a for loop. Finally, he revisits some of the functions from stringi to see whether they can improve the code's performance further. Together, these changes make the function 3 times faster than it was before! Along the way, he shows how to use scan, readLines, readr::read_lines, data.table::fread, and vroom::vroom_lines. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
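To give a feel for the comparison, here is a minimal sketch of timing the line readers named above against one another. It is not the code from the episode: the temporary FASTA file, its contents, the `readers` list, and the choice of `system.time` over a dedicated benchmarking package are all assumptions made for illustration. The package-based readers are added only if the package is installed, so the sketch still runs with base R alone.

```r
# Build a small, made-up FASTA file in a temporary location (illustrative data only)
fasta_file <- tempfile(fileext = ".fasta")
headers <- paste0(">seq", 1:1000)
sequences <- rep("ATGCATGCATGCATGC", 1000)
writeLines(c(rbind(headers, sequences)), fasta_file)  # interleave header/sequence lines

# Base R readers: each returns a character vector with one element per line
readers <- list(
  readLines = function(f) readLines(f),
  scan = function(f) scan(f, what = character(), sep = "\n", quiet = TRUE)
)

# Package-based readers, added only when the package is available
if (requireNamespace("readr", quietly = TRUE)) {
  readers$read_lines <- function(f) readr::read_lines(f)
}
if (requireNamespace("data.table", quietly = TRUE)) {
  # sep = NULL asks fread to treat each line as a single character column
  readers$fread <- function(f) data.table::fread(f, sep = NULL, header = FALSE)[[1]]
}
if (requireNamespace("vroom", quietly = TRUE)) {
  readers$vroom_lines <- function(f) vroom::vroom_lines(f)
}

# Time 10 repeated reads per method; elapsed seconds, smaller is faster
timings <- vapply(
  readers,
  function(reader) {
    system.time(for (i in seq_len(10)) reader(fasta_file))[["elapsed"]]
  },
  numeric(1)
)

print(sort(timings))
```

The timings depend heavily on file size and hardware, so a file this small mostly shows how to set up the comparison. For real measurements, a much larger FASTA file and a dedicated benchmarking package would give more stable numbers.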

Code

You can browse the state of the repository at the beginning of the episode and at the end of the episode.