Introduction

Recent news of a potential new arrival to my extended family caused me to think about choosing names for newborns. What names are popular? Are there patterns and trends over space and time? And given a set of criteria, how might one use data to choose a newborn name?

Shiny app

I am developing a Shiny app (currently somewhere between non-existent and very work-in-progress) that allows you to explore the newborn names data with an eye to some of the questions I present below. Baby-name Shiny apps are quite common, but none were getting too deep into the analysis, so I thought I’d try to make something a little more complex.

Definitions & Data

The Social Security Administration keeps records of all newborn names in the US from 1910 to 2019. The data is reported at the state and national level, but only for names with 5 or more occurrences in a given state. Different spellings of similar names (Caitlin, Caitlyn, Kaitlyn, etc.) are all counted separately.

The dataset also contains a binary variable for sex assigned at birth. I’ll be using the terminology AFAB (assigned female at birth) and AMAB (assigned male at birth). When a name appears in both categories, I’ll generally present it in the category (AFAB or AMAB) where it has more occurrences.

Name frequencies presented here are standardized: Each name is shown as a proportion of the total births in a given location in a given year for a given assigned sex at birth.
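The standardization can be sketched in a few lines of Python. This is only an illustration with made-up counts: the record layout and the `standardize` helper are my assumptions, and the denominator here is the sum of the recorded counts in each state/year/sex group, which stands in for total births (names with fewer than 5 occurrences are missing from the data).

```python
from collections import defaultdict

# Hypothetical records (made-up counts): (state, year, sex, name, count).
records = [
    ("VT", 2019, "F", "Ava", 30),
    ("VT", 2019, "F", "Emma", 50),
    ("VT", 2019, "F", "Olivia", 20),
    ("VT", 2019, "M", "Liam", 40),
    ("VT", 2019, "M", "Noah", 60),
]


def standardize(records):
    """Return {(state, year, sex, name): share of that group's recorded births}."""
    # First pass: total recorded births per (state, year, sex) group.
    totals = defaultdict(int)
    for state, year, sex, _name, count in records:
        totals[(state, year, sex)] += count
    # Second pass: divide each name's count by its group total.
    return {
        (state, year, sex, name): count / totals[(state, year, sex)]
        for state, year, sex, name, count in records
    }


shares = standardize(records)
print(shares[("VT", 2019, "F", "Ava")])
```

Because each share is computed within its own state, year, and sex, names can be compared across states of very different sizes.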

Basic name dynamics

Most names follow a smooth boom-bust cycle (e.g., Susan, below). However, there are exceptions to that pattern (e.g., Anthony), especially as you go further back in time. The black line shows data pooled across the US.

You probably want to see the data for your name. Go to the Shiny app!

Conformity for AFAB/AMAB names

No matter how you measure it, there is more variability in AFAB names.

  • The top 100 AMAB names account for 724,000 births in 2019
  • The top 100 AFAB names account for 549,000 births in 2019
  • Number of AMAB names with only 5 occurrences in 2019: 4,252
  • Number of AFAB names with only 5 occurrences in 2019: 5,280
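Both measures in the list above are simple to compute from a table of name counts. Here is a minimal sketch, assuming a dictionary of name counts for one year and one assigned sex; the function names and the toy counts are hypothetical.

```python
def top_n_births(counts, n=100):
    """Total births accounted for by the n most common names."""
    return sum(sorted(counts.values(), reverse=True)[:n])


def names_at_floor(counts, floor=5):
    """Number of names with exactly `floor` occurrences (the reporting minimum)."""
    return sum(1 for c in counts.values() if c == floor)


# Toy counts, not real data.
toy_counts = {"Liam": 200, "Noah": 150, "Oliver": 100, "Fenix": 5, "Quest": 5}

print(top_n_births(toy_counts, n=2))  # births covered by the 2 most common names
print(names_at_floor(toy_counts))     # names sitting at the 5-occurrence floor
```

A smaller top-100 total and a larger count of names at the floor both point to births being spread over more names.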

Note in the plot below that there is a lack of overlap at the top and bottom of the scale.

Let’s look at this pattern at the state level, over time.

Take-home messages:

  • More name variability in some states than others
  • All states gaining name diversity over time
  • Diversity of AFAB names outpacing diversity of AMAB names

I would guess (but have not grabbed the data yet) that this is mostly, but not completely, explained by the racial/ethnic/cultural diversity characteristics of each state. Also interesting (to me, at least) is that a small number of states (like Vermont) appear to not obey the overall trend of AFAB names having more variability. Worth looking into more at some point.

Picking a newborn name: Four strategies

Strategy 1: Current common names

Go with the flow. Pick a popular name that’s really zeitgeisty.

Possible options from among the most popular 2019 names, nationally:

  • Liam (20,502)
  • Noah (19,048)
  • Olivia (18,451)
  • Emma (17,102)
  • Ava (14,440)
  • Oliver (13,891)

Problems with this strategy: These names may be common, yet may also be past their peak and in decline. Or, they may be popular in some states, but not yours!

Let’s take a look. Black line is again the national average.

If you live in Vermont, Ava is way past its peak, but in Indiana it’s just starting to pop!

Strategy 2: Currently peaking names nationwide

Get it while it’s hot. Pick a name that appears to be peaking across the country, even if it’s rare.

In our case, we’ll use the following definition of a ‘peaking’ name:

  • Increased in popularity from 2015 to 2018
  • Decreased in popularity from 2018 to 2019
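The two conditions above can be written as a small filter. This is a sketch rather than the exact rule behind the plots: it assumes "increased from 2015 to 2018" means the 2018 share is higher than the 2015 share (the years in between are not checked), and the `is_peaking` name and the toy shares are hypothetical.

```python
def is_peaking(share_by_year, rise_start=2015, peak=2018, last=2019):
    """True if the share rose from rise_start to peak, then fell from peak to last."""
    try:
        start, top, end = (share_by_year[y] for y in (rise_start, peak, last))
    except KeyError:
        # A missing year means the name fell below the reporting threshold.
        return False
    return start < top and end < top


# Toy national shares (made up), keyed by year.
rising_then_dipping = {2015: 0.0010, 2016: 0.0012, 2017: 0.0013, 2018: 0.0015, 2019: 0.0014}
still_rising = {2015: 0.0010, 2016: 0.0012, 2017: 0.0013, 2018: 0.0015, 2019: 0.0016}

print(is_peaking(rising_then_dipping))
print(is_peaking(still_rising))
```

Comparing only three years keeps the rule simple, which is also why a single noisy year can flip the answer.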

Problems with this strategy: It is sensitive to the definition of a peak; the upturns and downturns might be statistical noise. Further, it does not take into account state-level variation.

Let’s see what emerges from this method:

Nice to see that Dawson is back to its 90s-level popularity. Looks like Easton is dropping fast, get it while you can!

Strategy 3: Currently uncommon names

The future is unpredictable. Rare names allow a child a fresh start. Find the rarest names to avoid being part of the herd.

Rare names are generally rare in both states and nationwide, so you don’t need to worry too much about state-level differences. Also, because of the long tail of rare names, you have a LOT to choose from!

Possible options from among the least common 2019 names:

  • Anakin (177)
  • Fenix (29)
  • Quest (26)
  • Khole (11)
  • Iam (5)

Problems with this strategy: None.

Strategy 4: Predicted future popularity

Set them up to be a pan-generational influencer, always ahead of the curve.

  • Look at states that are most often on the leading edge of popular names
  • Find the names in those states that are just starting their explosive growth

Problems with this strategy: May be sensitive to different definitions of leading edge states and explosive growth.

Example: We want to find the state that was first to peak with “Joshua,” and identify it just as it’s starting the explosive growth around 1970 - 1975.

Strategy 4 step 1: Identify leading / lagging indicator states

First, we find the peak year for each name in each state (among names that appear in all 51 locations in at least 10 years, and which peaked after 1970, to remove old names whose peak was before the start of the data collection). Then:

  • Find states whose peak year was earliest for each name, give that state a “point”
  • If states peak in the same year, can allow or discard ties
  • Count up which state has the most “points”
  • Repeat for states whose peak year was the latest for each name.

Example: Tyler peaked in 1990 in South Dakota, and no other state peaked in or before that year. For Wyatt, two states (North Dakota and Minnesota) tied for the earliest peak, in 2008. Zoe peaked first in Ohio, in 2001, with no ties. If we allow ties, each state listed gets a point. If we discard ties, only South Dakota and Ohio get points for these three names.

##    name year_of_first_peak state
## 1 Tyler               1990    SD
## 2 Wyatt               2008    ND
## 3 Wyatt               2008    MN
## 4   Zoe               2001    OH
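The point-counting procedure can be sketched as follows. The earliest peaks for Tyler, Wyatt, and Zoe match the table above; the later peak years for the other states are made up so that the example has something to compare against, and the `leading_points` helper is a hypothetical name.

```python
from collections import Counter, defaultdict

# {(name, state): peak year}. Earliest peaks come from the table above;
# the later years are invented for illustration.
peak_years = {
    ("Tyler", "SD"): 1990, ("Tyler", "OH"): 1993, ("Tyler", "MN"): 1994,
    ("Wyatt", "ND"): 2008, ("Wyatt", "MN"): 2008, ("Wyatt", "OH"): 2012,
    ("Zoe", "OH"): 2001, ("Zoe", "SD"): 2005, ("Zoe", "ND"): 2006,
}


def leading_points(peak_years, allow_ties=True, pick=min):
    """Give one point per name to the state(s) with the extreme peak year.

    pick=min scores leading states; pick=max scores lagging states.
    """
    by_name = defaultdict(dict)
    for (name, state), year in peak_years.items():
        by_name[name][state] = year

    points = Counter()
    for years in by_name.values():
        extreme = pick(years.values())
        winners = [s for s, y in years.items() if y == extreme]
        # With ties discarded, a name only scores if one state stands alone.
        if allow_ties or len(winners) == 1:
            points.update(winners)
    return points


print(leading_points(peak_years, allow_ties=True))
print(leading_points(peak_years, allow_ties=False))
```

Allowing ties gives SD, ND, MN, and OH one point each; discarding ties leaves only SD and OH with a point, as in the example above.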

Leading states:

Lagging states:

Notably, there are many states that fall in the middle - for example, Florida and Missouri - which are rarely ahead of the curve but also rarely behind.

Utah leaps out as a clear outlier leading state. This holds up whether ties are included or not.

Allowing ties:

## 
## UT WA CO OR MN IA ID NE LA SD HI KS MA MT AK ND NV WY CA DC 
## 41 23 22 22 21 19 19 19 18 16 15 15 15 15 14 14 14 14 13 13

Discarding ties:

## 
## UT SD IA MT OR RI HI ID LA ME WY AL AR CO DC MS ND NE NH NJ 
## 20  6  4  4  4  4  3  3  3  3  3  2  2  2  2  2  2  2  2  2

Let’s look at a Utah-specific example. Utah appears to adopt and then discard names more quickly than any other state. Example:

Strategy 4 step 2: Identify leading state names about to pop

What names are on the rise in Utah right now? I searched for names in Utah that:

  • Have 3 years of slow or negative growth before 2018
  • Have one large jump from 2018 to 2019
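One way to turn these two criteria into a filter is sketched below. The thresholds are placeholders of my own, not the values behind the results: "slow" is taken as at most 5% year-over-year growth, "large" as at least 50%, and the three slow years are read as the changes ending in 2016, 2017, and 2018. The `about_to_pop` name is hypothetical too.

```python
def about_to_pop(share_by_year, jump_year=2019, flat_years=3,
                 max_flat_growth=0.05, min_jump=0.5):
    """True if growth was slow or negative for `flat_years` years, then jumped."""
    # Years needed: 2015..2019 for the default arguments.
    years = range(jump_year - flat_years - 1, jump_year + 1)
    if any(share_by_year.get(y, 0) <= 0 for y in years):
        return False  # missing years cannot be evaluated

    def growth(year):
        # Relative change in share from the previous year.
        return share_by_year[year] / share_by_year[year - 1] - 1

    flat = all(growth(y) <= max_flat_growth
               for y in range(jump_year - flat_years, jump_year))
    return flat and growth(jump_year) >= min_jump


# Toy Utah shares (made up), keyed by year.
sudden_jump = {2015: 0.0010, 2016: 0.0010, 2017: 0.00095, 2018: 0.00096, 2019: 0.0020}
steady_climb = {2015: 0.0010, 2016: 0.0012, 2017: 0.0015, 2018: 0.0018, 2019: 0.0030}

print(about_to_pop(sudden_jump))
print(about_to_pop(steady_climb))
```

Changing `max_flat_growth` or `min_jump` will change which names come out, so the list of results should be read with those choices in mind.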

Results below.

What are these special names doing in other states?

How does the prevalence of those names compare to the heavy hitters?

Looks like they still have a lot of headroom to gain popularity!

What’s up with Utah?

My sister-in-law suggested to me that the age distribution of mothers might have made Utah an outlier - perhaps there are lots of Mormons who, for cultural reasons, are more likely to be young when having children and hence more likely to have cutting-edge names. I thought that was a really compelling idea, so I looked into it using CDC natality data for 2019. The plot below also only looks at names that peaked in all 51 states since 1980, but includes a wider variety of uncommon names - they had to appear 6 or more times in each state in at least 2 years.

Even accounting for this redefinition of the data, Utah is still a huge outlier, and in general, most states aren’t affected much by the redefinition. I expect all of these nuances are due to racial/ethnic/cultural/class differences in different states; however, I don’t have easily accessible data on each birth to do a more sophisticated analysis.

Here are the names from the data above where Utah was the first to peak (including ties):

##  [1] "Addison"   "Alec"      "Alexis"    "Ariana"    "Ariel"     "Ashlee"   
##  [7] "Aubree"    "Aubrey"    "Austin"    "Avery"     "Bella"     "Braden"   
## [13] "Brady"     "Brayden"   "Brenden"   "Brittany"  "Brooke"    "Caden"    
## [19] "Cameron"   "Carson"    "Carter"    "Cayden"    "Chelsey"   "Colby"    
## [25] "Colton"    "Dalton"    "Devin"     "Easton"    "Gabriel"   "Hayden"   
## [31] "Isabella"  "Jackson"   "Jaden"     "Jase"      "Jenna"     "Jordan"   
## [37] "Jordyn"    "Kaden"     "Kaitlyn"   "Kaleb"     "Kaylee"    "Keira"    
## [43] "Khloe"     "Kristen"   "Krystal"   "Kylee"     "Kylie"     "Kyra"     
## [49] "Lincoln"   "Mackenzie" "Madelyn"   "Madison"   "Makenna"   "Makenzie" 
## [55] "Mariah"    "Mason"     "Mckenzie"  "Natalie"   "Nevaeh"    "Paisley"  
## [61] "Parker"    "Payton"    "Ryleigh"   "Scarlett"  "Shania"    "Sierra"   
## [67] "Trevor"    "Tristan"   "Whitney"   "Zoey"

Take-home message

I’m tempted to say that none of this really matters, but I suspect that either:

  • Class signaling by name really does impact a child’s life course, or
  • People think that class signaling by name really does impact a child’s life course

… meaning that this probably does matter to you, especially if you’ve read this far.