Recent news of a potential new arrival to my extended family caused me to think about choosing names for newborns. What names are popular? Are there patterns and trends over space and time? And given a set of criteria, how might one use data to choose a newborn name?
I am developing a Shiny app (currently somewhere between non-existent and very work-in-progress) that allows you to explore the newborn names data with an eye to some of the questions I present below. Baby-name Shiny apps are quite common, but none that I found went very deep into the analysis, so I thought I’d try to make something a little more complex.
The Social Security Administration keeps records of all newborn names in the US from 1910 to 2019. The data is reported at the state and national level, but only for names with 5 or more occurrences in a given state. Different spellings of similar names (Caitlin, Caitlyn, Kaitlyn, etc.) are all counted separately.
The dataset also contains a binary variable for sex assigned at birth. I’ll be using the terminology AFAB (assigned female at birth) and AMAB (assigned male at birth), and in this document I’ll generally present the version of a name with more AFAB or AMAB occurrences.
Name frequencies presented here are standardized: Each name is shown as a proportion of the total births in a given location in a given year for a given assigned sex at birth.
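As a rough illustration of that standardization, here is a minimal Python sketch. The record layout and the counts are made up for the example, and the denominator is an assumption: it sums the counts present in the records, which would leave out names below the 5-occurrence reporting threshold, so it may differ from the total-births figure used for the plots in this post.

```python
from collections import defaultdict

# Hypothetical rows: (state, year, sex, name, count). The counts are invented.
records = [
    ("VT", 2019, "F", "Ava", 30),
    ("VT", 2019, "F", "Emma", 20),
    ("VT", 2019, "M", "Liam", 50),
    ("IN", 2019, "F", "Ava", 300),
    ("IN", 2019, "F", "Emma", 100),
]


def standardize(rows):
    """Return {(state, year, sex, name): share of that group's recorded births}."""
    # Assumption: the denominator is the sum of recorded counts per
    # (state, year, sex) group, not an external total-births figure.
    totals = defaultdict(int)
    for state, year, sex, _name, count in rows:
        totals[(state, year, sex)] += count
    return {
        (state, year, sex, name): count / totals[(state, year, sex)]
        for state, year, sex, name, count in rows
    }


shares = standardize(records)
print(shares[("VT", 2019, "F", "Ava")])  # 30 / (30 + 20)
print(shares[("IN", 2019, "F", "Ava")])  # 300 / (300 + 100)
```

Standardizing within each state, year, and sex group is what makes a small state like Vermont comparable to a large one like Indiana.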
Most names follow a smooth boom-bust cycle (e.g., Susan, below). However, there are exceptions to that pattern (e.g., Anthony), especially as you go further back in time. The black line is data pooled across the US.
You probably want to see the data for your name. Go to the Shiny app!
No matter how you measure it, there is more variability in AFAB names.
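To make "how you measure it" concrete, here is a small Python sketch of two possible variability measures: the number of distinct names and the Shannon entropy of the name distribution. These are my illustrative choices and not necessarily the measures behind the plots in this post, and the counts are invented.

```python
import math


def distinct_names(counts):
    """Number of names with at least one recorded birth."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_entropy(counts):
    """Entropy in bits of the name distribution; higher means more spread out."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


# Hypothetical counts for one state and year (not real data).
afab = {"Ava": 25, "Emma": 25, "Mia": 25, "Zoe": 25}
amab = {"Liam": 70, "Noah": 30}

print(distinct_names(afab), distinct_names(amab))
print(shannon_entropy(afab) > shannon_entropy(amab))
```

Entropy rewards both having many names and having births spread evenly across them, so it can disagree with a simple count of distinct names when a few names dominate.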
Note in the plot below that there is a lack of overlap at the top and bottom of the scale.
Let’s look at this pattern at the state level, over time.
Take-home messages:
I would guess (but have not grabbed the data yet) that this is mostly, but not completely, explained by the racial/ethnic/cultural diversity characteristics of each state. Also interesting (to me, at least) is that a small number of states (like Vermont) appear not to obey the overall trend of AFAB names having more variability. Worth looking into more at some point.
Go with the flow. Pick a popular name that’s really zeitgeisty.
Possible options from among the most popular 2019 names, nationally:
Problems with this strategy: These names may be common, yet may also be past their peak and in decline. Or, they may be popular in some states, but not yours!
Let’s take a look. Black line is again the national average.
If you live in Vermont, Ava is way past its peak, but in Indiana it’s just starting to pop!
Get it while it’s hot. Pick a name that appears to be peaking across the country, even if it’s rare.
In our case, we’ll use the following definition of a ‘peaking’ name:
Problems with this strategy: Sensitive to definition of peak; the upturns and downturns might be statistical noise. Further, does not take into account state-level variation.
Let’s see what emerges from this method:
Nice to see that Dawson is back to its 90s-level popularity. Looks like Easton is dropping fast, get it while you can!
The future is unpredictable. Rare names allow a child a fresh start. Find the rarest names to avoid being part of the herd.
Rare names are generally rare in both states and nationwide, so you don’t need to worry too much about state-level differences. Also, because of the long tail of rare names, you have a LOT to choose from!
Possible options from among the least common 2019 names:
Problems with this strategy: None.
Set them up to be a pan-generational influencer, always ahead of the curve.
Problems with this strategy: May be sensitive to different definitions of leading edge states and explosive growth.
Example: We want to find the state that was first to peak with “Joshua,” and identify it just as it’s starting the explosive growth around 1970 - 1975.
First, we find the peak year for each name in each state (among names that appear in all 51 locations in at least 10 years, and which peaked after 1970, to remove old names whose peak was before the start of the data collection). Then:
Example: Tyler peaked in 1990 in South Dakota, and no other state peaked on or before that date. For Wyatt, two states (North Dakota and Minnesota) tied for the earliest peak, in 2008. Zoe peaked first in Ohio, in 2001, with no ties. If we allow ties, each state listed gets a point. If we discard ties, only South Dakota and Ohio get points for these three names.
## name year_of_first_peak state
## 1 Tyler 1990 SD
## 2 Wyatt 2008 ND
## 3 Wyatt 2008 MN
## 4 Zoe 2001 OH
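The scoring in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak years are taken as already computed, the first-peak rows mirror the table above, and the later peaks in other states are made up so that each name has something to compare against. The function name `first_peak_points` is hypothetical.

```python
from collections import Counter, defaultdict

# (name, state) -> peak year. The earliest peaks match the table above;
# every later peak is invented for illustration.
peak_year = {
    ("Tyler", "SD"): 1990, ("Tyler", "UT"): 1992, ("Tyler", "OH"): 1994,
    ("Wyatt", "ND"): 2008, ("Wyatt", "MN"): 2008, ("Wyatt", "UT"): 2010,
    ("Zoe", "OH"): 2001, ("Zoe", "SD"): 2005,
}


def first_peak_points(peaks, allow_ties=True):
    """Give one point to each state that peaked earliest for a name."""
    by_name = defaultdict(dict)
    for (name, state), year in peaks.items():
        by_name[name][state] = year

    points = Counter()
    for states in by_name.values():
        earliest = min(states.values())
        leaders = [s for s, y in states.items() if y == earliest]
        # With ties discarded, a name only scores if exactly one state led.
        if allow_ties or len(leaders) == 1:
            points.update(leaders)
    return points


with_ties = first_peak_points(peak_year, allow_ties=True)
without_ties = first_peak_points(peak_year, allow_ties=False)
print(dict(with_ties))
print(dict(without_ties))
```

Summing these points over every qualifying name gives a per-state tally, with and without ties.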
Leading states:
Lagging states:
Notably, there are many states that fall in the middle - for example, Florida and Missouri - which are rarely ahead of the curve but also rarely behind.
Utah leaps out as a clear outlier leading state. This holds up whether ties are included or not.
Allowing ties:
##
## UT WA CO OR MN IA ID NE LA SD HI KS MA MT AK ND NV WY CA DC
## 41 23 22 22 21 19 19 19 18 16 15 15 15 15 14 14 14 14 13 13
Discarding ties:
##
## UT SD IA MT OR RI HI ID LA ME WY AL AR CO DC MS ND NE NH NJ
## 20 6 4 4 4 4 3 3 3 3 3 2 2 2 2 2 2 2 2 2
Let’s look at a Utah-specific example. Utah appears to adopt and then discard names more quickly than any other state. Example:
What names are on the rise in Utah right now? Searched for names in Utah that:
Results below.
What are these special names doing in other states?
How does the prevalence of those names compare to the heavy hitters?
Looks like they still have a lot of headroom to gain popularity!
My sister-in-law suggested to me that the age distribution of mothers might have made Utah an outlier - perhaps there are lots of Mormons who, for cultural reasons, are more likely to be young when having children and hence more likely to have cutting-edge names. I thought that was a really compelling idea, so I looked into it using CDC natality data for 2019. The plot below also only looks at names that peaked in all 51 states since 1980, but includes a wider variety of uncommon names - they had to appear 6 or more times in each state in at least 2 years.
Even accounting for this redefinition of the data, Utah is still a huge outlier, and in general, most states aren’t affected much by the redefinition. I expect all of these nuances are due to racial/ethnic/cultural/class differences among states; however, I don’t have easily accessible data on each birth to do a more sophisticated analysis.
Here are the names from the data above where Utah was the first to peak (including ties):
## [1] "Addison" "Alec" "Alexis" "Ariana" "Ariel" "Ashlee"
## [7] "Aubree" "Aubrey" "Austin" "Avery" "Bella" "Braden"
## [13] "Brady" "Brayden" "Brenden" "Brittany" "Brooke" "Caden"
## [19] "Cameron" "Carson" "Carter" "Cayden" "Chelsey" "Colby"
## [25] "Colton" "Dalton" "Devin" "Easton" "Gabriel" "Hayden"
## [31] "Isabella" "Jackson" "Jaden" "Jase" "Jenna" "Jordan"
## [37] "Jordyn" "Kaden" "Kaitlyn" "Kaleb" "Kaylee" "Keira"
## [43] "Khloe" "Kristen" "Krystal" "Kylee" "Kylie" "Kyra"
## [49] "Lincoln" "Mackenzie" "Madelyn" "Madison" "Makenna" "Makenzie"
## [55] "Mariah" "Mason" "Mckenzie" "Natalie" "Nevaeh" "Paisley"
## [61] "Parker" "Payton" "Ryleigh" "Scarlett" "Shania" "Sierra"
## [67] "Trevor" "Tristan" "Whitney" "Zoey"
I’m tempted to say that none of this really matters, but I suspect that either:
… meaning that this probably does matter to you, especially if you’ve read this far.