Predicting lamb futures
As I’ve mentioned over the past few code clubs, I have sheep on my farm. I like raising sheep because they have a pretty quick growing period of about six months, they’re small enough that I can have a decent number of them, they taste good, and in southeastern Michigan there’s a pretty strong Middle Eastern clientele that eat a lot of lamb. Last week, we created a figure showing the number of lambs being sold at my local livestock auction. That’s only part of the equation - lamb numbers could be dropping because prices are falling or with the decline in sheep, they could be getting more valuable. Also, you may have noticed this, but the total number of sheep sold in the figure from last week was the total - I don’t know how many animals were sold in each weight category. Although I’ve been recording the number of sheep sold and the price for each class for the past 5 years, I’ve never really plotted the data. Today we’ll plot the price for each weight class to see what has happened to the prices over the past 5 years. As a reminder, we saw that sheep sales began to fall about 2 or 3 years ago.
For today’s Code Club, we’ll build off of what we did last week using the googlesheets4
package to read that spreadsheet, we’ll remember how to use the rename
function to fix my bad column names, we’ll learn to use the separate
function to split my price ranges into two columns, and we’ll see how we can use the geom_ribbon
function from the ggplot2
package to plot the data. Along the way we’ll also see the mutate
, geom_line
, and geom_smooth
functions again. Finally, we’ll use ggplot2
’s labs
function to give our plots descriptive titles and axis labels. Don’t watch the video straight through without firing up RStudio and trying the code and exercises yourself! Please be sure to see the setup instructions before you get going.
Prompt
Today we’ll use the tidyverse
package, but we’ll also use the googlesheets4
package for reading my googlesheet. If you haven’t already, you’ll need to instal it as described last week. We’ll read in the spreadsheet and go ahead and clean up the column names. Things like Aged sheep
or >131
don’t work very well with R. Of course, I could change my spread sheet, but I’d like to use rename
to practice those skills and in case anyone else is using the spreadsheet I don’t want to break their code.
library(tidyverse)
library(googlesheets4)
auction_data <- read_sheet("https://docs.google.com/spreadsheets/d/1_quMjJRBHDLQSmWQouzzyi1DOejAtCZnAeesdVyRWiQ/edit#gid=1467293328",
sheet="numbers_and_prices",
range="A:J",
col_type = "Ddcccccccc",
na="NA") %>%
rename_all(tolower) %>%
rename("aged_sheep" = "aged sheep",
"feeder_lambs" = "feeder lambs",
"hair_lambs" = "hair lambs",
"new_crop" = "new crop",
"small" = "40-85",
"medium" = "85-105",
"large" = "106-130",
"extra_large" = ">131")
Separating columns
For today’s task, I’d like to take the large lambs (those weighing between 105 and 130 pounds) and see how the high and low prices have varied over the course of the time I’ve been tracking these data. In the spreadsheet I have a price stored as 150-170
. This is the range in prices from the low to high price per 100 lbs of lamb. In other words, for that week, the prices ranged from $1.50 to $1.70 per live pound of lamb. Since each row has the same <min>-<max>
format, it’s relatively straightforward to separate the data into two columns using dplyr
’s separate
function.
The syntax for separate
requires that you give it a column that you want to separate and the column names that you want for the new columns. The separate
function will guess what character you want to separate on, but it’s safest and more descriptive to provide a value to the sep
argument.
auction_data %>%
separate(large, sep="-", into=c("min", "max")) #try running this without the sep argument
You’ll notice a couple of things. First, we no longer have the large
column. We could use remove=FALSE
to keep that column if we wanted to keep it around. Second, my min
and max
columns are character types although they are actually numbers now. We can get separate
to convert these columns to numbers using convert=TRUE
. Let’s include these arguments along with a select
function call to make it easier to see what we’ve done
auction_data %>%
separate(large, sep="-", into=c("min", "max"), remove=FALSE, convert=TRUE) %>%
select(date, large, min, max)
Nice, eh? I don’t need the large
column, so let’s get rid of that from our output and save the output as large_prices
large_prices <- auction_data %>%
separate(large, sep="-", into=c("min", "max"), convert=TRUE) %>%
select(date, min, max)
Plotting the average price
We don’t have more fine scale price data for the weight class to know whether larger lambs had a lower price or what the average price was. The reality is that the true prices are kind of chunky. At the auction they may sell a dozen lambs together for the same price and other times they may sell one or two lambs per lot. Also, just because two lambs are 120 pounds, doesn’t mean that their carcass qualities are going to be the same - so weight isn’t everything. The range is really the best statistic to report. Let’s create a line plot of the midpoint between the max and min price using mutate
and geom_line
. We’ll start with creating the midpoint
large_prices %>%
mutate(mid_point = (min + max) / 2)
And we can plot these data like we did last week with geom_line
large_prices %>%
mutate(mid_point = (min + max) / 2) %>%
ggplot(aes(x=date, y=mid_point)) +
geom_line()
Wow, that’s pretty seasonal! Let’s fit a smoothed line to the data using the span
argument in geom_smooth
large_prices %>%
mutate(mid_point = (min + max) / 2) %>%
ggplot(aes(x=date, y=mid_point)) +
geom_line() +
geom_smooth(span=0.1)
This shows that the prices have fallen with the number of lambs sold. Interestingly, they seem to spike in April and May, but this year the spike fell off in March when the pandemic hit.
Plotting the range of prices
Again, the midpoint of the min and max price, really isn’t the mean or median price. We can use geom_ribbon
to plot the range. As the name suggests, it will plot the data to look like a ribbon. For this geom, we need to give it columns corresponding to ymin
and ymax
large_prices %>%
ggplot(aes(x=date, ymin=min, ymax=max)) +
geom_ribbon()
We can also provide the fill
argument to geom_ribbon
to set the color of the ribbon. If you want to set the color of the boundary line on the ribbon, you would use the color
argument. Normally, color
is set to NA
, which means it has no color.
large_prices %>%
ggplot(aes(x=date, ymin=min, ymax=max)) +
geom_ribbon(fill="dodgerblue", color="black")
Unfortunately, there isn’t an easy way to smooth the ribbon boundary line.
Adding titles and fixing axis labels
I’m going to go ahead and add the theme_light
to this plot to make it a little cleaner
large_prices %>%
ggplot(aes(x=date, ymin=min, ymax=max)) +
geom_ribbon(fill="dodgerblue", color="black") +
theme_light()
I’d also like to add some better labelling to the plot to make it clear what’s going on. For example, the plot doesn’t have a title and the y-axis isn’t labelled. We can fix this with the ggplot2
function, labs
. The labs
function will take arguments to set the main title
and subtitle
, a caption
, and the x
and y
labels. To see where these will go, let’s give the arguments their own name as their value
large_prices %>%
ggplot(aes(x=date, ymin=min, ymax=max)) +
geom_ribbon(fill="dodgerblue", color="black") +
labs(title="main", subtitle="sub", caption="caption", x="x", y="y") +
theme_light()
Let’s fill them in with more meaningful values
large_prices %>%
ggplot(aes(x=date, ymin=min, ymax=max)) +
geom_ribbon(fill="dodgerblue", color="black") +
labs(title="Large lambs have their highest value in April and May",
subtitle="Prices for lambs between 106 and 130 pounds",
caption="Data as reported by United Producers in Manchester, MI",
x="Date",
y="Price ($/100 lbs)") +
theme_light()
I have to admit that although I’ve been recording these data for the past 5 years, I did not realize that prices for these large lambs was so consistently seasonal. This is valuable information! If a lamb’s value peaks in April and May and it takes about 6 months to get a lamb to 110 pounds, then I likely want to have a group of lambs born in October or November. This plot also causes me to ask other questions - are other weight classes this seasonal? How much has the range in prices varied over this period? These are the types of questions we’ll take on now in the exercises.
Exercises
1. “Aged sheep” are those sheep that are older than 1 year old. Typically, they’re ewes that aren’t able to support a lamb or rams that are no longer needed for breeding. These animals are generally used to produce pet food (it’s unlikely that the “lamb” in fido’s kibble is technically lamb). In Southeastern Michigan, there are anecdotes that some buyers like these animals because they are larger, cheaper, and have a flavor that they like. Is there any seasonal variation to the price in aged sheep? Produce a fitted line plot using the mid-point and a ribbon plot showing the range. Be sure to include an informative title and labels.
2. How has the weekly range in price for large lambs varied over the past 5 years?
3. I can sell lambs at 95 pounds or at 125 pounds. If we assume that these lambs would get the midpoint price for their weight class, what is the total difference in price I would receive for each animal? How has this varied over the past 5 years. This difference would tell me the most it should cost me to have an animal gain the extra 30 pounds.