Benchmarking R functions for joining data frames (CC292)

July 1, 2024 • PD Schloss • 1 min read

We often need to join two or more data frames to link different pieces of data together. What’s the most efficient way to do this in R? In this Code Club, Pat shows how to perform an inner join, full join, left join, and right join using base R’s merge function as well as functions from the dplyr and data.table packages. He then benchmarks the inner join options to see which is fastest. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
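As a rough illustration of the four join types, here is a minimal sketch using base R’s merge. The data frames `left_df` and `right_df`, their columns, and the `id` key are made up for this example and are not taken from the episode. The timing at the end uses base R’s system.time as a simple stand-in; it is not necessarily the benchmarking approach used in the video.

```r
# Two small, hypothetical data frames that share a key column named "id"
left_df  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_df <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Inner join: keep only ids present in both data frames
inner <- merge(left_df, right_df, by = "id")

# Full join: keep every id from either data frame, filling gaps with NA
full <- merge(left_df, right_df, by = "id", all = TRUE)

# Left join: keep every id from left_df
left_joined <- merge(left_df, right_df, by = "id", all.x = TRUE)

# Right join: keep every id from right_df
right_joined <- merge(left_df, right_df, by = "id", all.y = TRUE)

# The same inner join with dplyr and data.table, run only if they are installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  inner_dplyr <- dplyr::inner_join(left_df, right_df, by = "id")
}
if (requireNamespace("data.table", quietly = TRUE)) {
  inner_dt <- merge(
    data.table::as.data.table(left_df),
    data.table::as.data.table(right_df),
    by = "id"
  )
}

# Crude timing of repeated inner joins (assumption: 100 repetitions is enough
# to see a difference on realistic data; tiny inputs like these mostly
# measure function-call overhead)
timing <- system.time(
  for (i in 1:100) merge(left_df, right_df, by = "id")
)
```

On inputs this small the timings say little; a meaningful comparison needs data frames closer in size to the real workload.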

Code

You can browse the state of the repository at the