Trying to guess a person's age from their name
Some names seem to be more closely tied to a time period than others. I can’t tell you how many Sarahs I know between the ages of 25 and 45. Everyone seems to have a great-aunt Helen or Ruth who grew up during the Great Depression. Are these just anecdotes, or do the numbers back up these observations? The folks at FiveThirtyEight asked this question a few years ago: based on a person’s name, can we predict their age?
When I read that article, I was left with a few questions.
- Can I replicate the analysis that was done in the article?
- I like to think that my wife and I selected “old-fashioned” names for our kids: Mary, Patrick, Joseph, John, Ruth, Jacob, Peter, Martha (yes, we have a big N). How old-fashioned are they, really?
- If I wanted to do a better job of picking names that were unique to the 1910s, what would be good choices?
- Can I generalize their analysis for any person or for their entire household?
I’ve worked with this data previously in a tutorial to teach people how to use GNU Make to automate data analysis pipelines (see the video here). That analysis was customized to my family and was done using so-called “base R”. I’d like to revisit that analysis from the beginning to ask these broader questions.
Over the next few weeks, I’ll be streaming my work on this project on our Riffomonas YouTube channel. Each session will last about an hour, and while I’m streaming you can ask questions or make comments, which I’ll do my best to respond to. Even if you can’t watch the stream live, you’ll be able to catch up later on YouTube. I’ll be using git and GitHub for version control, so you can follow the code as it develops over on GitHub. Feel free to file an issue or make a pull request!
I still haven’t figured out when I’ll be running these sessions. If you are interested in watching the stream live and have a preference, please let me know in the comments below!