The S&P 500 Historical Components & Changes - Analyzing Alpha (2024)

You can pay for the S&P 500 historical constituents or use one of several free sources. A backtest that only uses today’s members suffers from survivorship bias, so getting an accurate list of the S&P 500 components over time is critical when developing a trading strategy.

Getting the S&P500 Historical Constituents

Several paid and free data providers publish the S&P 500 constituents list. Finding the components of other indices can be more complex and generally requires a paid source. The table below shows the best free and paid resources that I’ve found for S&P 500 constituents. For convenience, I also added Analyzing Alpha’s files, which are created from Wikipedia:

Source                                                         Paid or Free
Siblis Research S&P500 Historical Components                   Paid
Norgate Data Historical Index Constituents                     Paid
iShares Core S&P500 ETF                                        Free
Wikipedia List of S&P500 Companies                             Free
Analyzing Alpha (Components without History from Wikipedia)    Free
Analyzing Alpha (Components with History from Wikipedia)       Free

Download the S&P 500 Historical Components

If you’re just here for the CSV data, please use the following files, which are offered freely under a Creative Commons license:

  1. CSV File of SP500 Constituents
  2. CSV File of SP500 Historical Changes

Creating your Own S&P500 Components List

I will show you how to create your own S&P 500 constituents dataset using Python by web scraping Wikipedia, which provides more S&P 500 history than the other free sources. If you’re following along with me, a Jupyter notebook is the best tool for this sort of data manipulation and cleanup.

You’ll notice that Wikipedia stays up-to-date and includes the S&P 500 additions and deletions for 2022.

Scraping The Constituents with Pandas

pandas.read_html enables us to scrape a web page for HTML tables and load each one into a dataframe. I import the required libraries and grab the data.

    import datetime as dt
    import pandas as pd

    url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
    data = pd.read_html(url)

If you check out the Wikipedia List of S&P 500 Companies, you’ll notice there is a table containing the current S&P 500 components and a table listing the historical changes.
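Because pandas.read_html returns a plain list of dataframes, picking tables by position depends on the page layout staying the same. A minimal sketch of a more defensive lookup, assuming a hypothetical helper named find_table (it is not part of pandas) and small stand-in tables in place of the scraped ones:

```python
import pandas as pd


def find_table(tables, required):
    """Return the first dataframe whose column labels mention every required word.

    find_table is a hypothetical helper for illustration, not a pandas API.
    """
    for table in tables:
        # Flatten labels to lowercase strings so multi-level headers also work.
        labels = [str(label).lower() for label in table.columns]
        if all(any(word in label for label in labels) for word in required):
            return table
    raise ValueError(f"no table with columns matching {required}")


# Stand-in tables with invented contents; the real list would come from
# pd.read_html(url).
tables = [
    pd.DataFrame({"Symbol": ["MMM"], "Security": ["3M"], "CIK": [66740]}),
    pd.DataFrame({"Date": ["September 20, 2021"], "Reason": ["example reason"]}),
]

constituents = find_table(tables, ["symbol", "cik"])
changes = find_table(tables, ["date", "reason"])
```

Matching on column names rather than list position means a reordered page raises a clear error instead of silently returning the wrong table.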

Let’s grab the first table and use pandas to manipulate it into the format we want. Iterative data tasks like these are usually best performed in a Jupyter notebook. I keep the date each company was first added to the S&P 500 and use a regular expression (RegEx) to flag rows where that date is missing or malformed.

    # Get the current S&P 500 table and set the header columns.
    sp500 = data[0].iloc[:, [0, 1, 6, 7]].copy()
    sp500.columns = ['ticker', 'name', 'date', 'cik']

    # Get rows where date is missing or not formatted correctly.
    # The raw string keeps \d from being treated as a string escape.
    mask = sp500['date'].str.strip().str.fullmatch(r'\d{4}-\d{2}-\d{2}')
    mask.loc[mask.isnull()] = False
    mask = mask == False
    sp500[mask].head()
         ticker  name                    date                     cik
    7    AMD     Advanced Micro Devices  NaN                      2488
    51   T       AT&T                    1983-11-30 (1957-03-04)  732717
    126  ED      Consolidated Edison     NaN                      1047862
    130  GLW     Corning                 NaN                      24741
    138  DHR     Danaher Corporation     NaN                      313616
    139  DRI     Darden Restaurants      NaN
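The date check can be tried in isolation on a few values. This is a small sketch on made-up sample strings that mirror the kinds of values in the output above; it uses the na parameter of str.fullmatch so that missing values count as non-matching:

```python
import pandas as pd

# Made-up sample mirroring the kinds of values seen in the date column.
dates = pd.Series(["1976-08-09", None, "1983-11-30 (1957-03-04)", " 1999-05-28 "])

# na=False treats missing values as "does not match", so the result is a
# plain boolean series that can be negated directly.
is_valid = dates.str.strip().str.fullmatch(r"\d{4}-\d{2}-\d{2}", na=False)
needs_fixing = ~is_valid

print(dates[needs_fixing])
```

Only the missing value and the entry with two dates are flagged; the padded date passes because of the strip step.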

Fill The Missing Data

Next, we’ll use zfill to left-pad the CIK code with zeros, since a CIK is a ten-digit string rather than an integer, and set all missing dates to the placeholder 1900-01-01. Hopefully, the community and others can help fill in these gaps!

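    current = sp500.copy()
    # Use 1900-01-01 as a placeholder for missing or malformed dates.
    current.loc[mask, 'date'] = '1900-01-01'
    current['date'] = pd.to_datetime(current['date'])
    # Zero-fill the CIK to a ten-character string.
    current['cik'] = current['cik'].apply(str).str.zfill(10)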
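The same two clean-up steps can be sketched on invented values (the CIKs and dates below are made up, not real filings). This variant is an assumption on my part rather than the method used above: it lets pd.to_datetime coerce bad values to NaT and then fills the placeholder, instead of relying on a precomputed mask.

```python
import pandas as pd

# Invented sample: CIKs as scraped integers, dates as raw strings.
sample = pd.DataFrame({
    "cik": [123, 45678, 1234567],
    "date": ["1976-08-09", "1983-11-30 (1957-03-04)", None],
})

# Convert to text first, then left-pad with zeros to ten characters.
sample["cik"] = sample["cik"].astype(str).str.zfill(10)

# errors="coerce" turns anything that is not exactly YYYY-MM-DD into NaT,
# which is then replaced by the 1900-01-01 placeholder.
parsed = pd.to_datetime(sample["date"], format="%Y-%m-%d", errors="coerce")
sample["date"] = parsed.fillna(pd.Timestamp("1900-01-01"))
```

Keeping the CIK as text matters because an integer column would drop the leading zeros again.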

With the current table organized, it’s time to wrangle the historical adjustments into the same shape with pandas. We’ll create one dataframe for additions and one for removals, then concatenate them.

    # Get the adjustments dataframe and rename columns.
    adjustments = data[1]
    columns = ['date', 'ticker_added', 'name_added',
               'ticker_removed', 'name_removed', 'reason']
    adjustments.columns = columns

    # Create additions dataframe.
    additions = adjustments[~adjustments['ticker_added'].isnull()][
        ['date', 'ticker_added', 'name_added']].copy()
    additions.columns = ['date', 'ticker', 'name']
    additions['action'] = 'added'

    # Create removals dataframe.
    removals = adjustments[~adjustments['ticker_removed'].isnull()][
        ['date', 'ticker_removed', 'name_removed']].copy()
    removals.columns = ['date', 'ticker', 'name']
    removals['action'] = 'removed'

    # Merge the additions and removals into one dataframe.
    historical = pd.concat([additions, removals])
    historical.head()
       date                ticker  name           action
    0  September 20, 2021  MTCH    Match Group    added
    1  September 20, 2021  CDAY    Ceridian       added
    2  September 20, 2021  BRO     Brown & Brown  added
    3  August 30, 2021     TECH    Bio-Techne     added
    4  July 21, 2021       MRNA    Moderna        added
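Note that the date column in this table holds text such as "September 20, 2021", while the current-constituents table uses parsed dates. A small sketch of converting that text to timestamps, assuming the "Month day, year" format shown above holds for every row (the three rows are taken from the outputs in this article):

```python
import pandas as pd

# A few rows in the same shape as the changes table above.
changes = pd.DataFrame({
    "date": ["September 20, 2021", "September 8, 2016", "August 30, 2021"],
    "ticker": ["MTCH", "CHTR", "TECH"],
})

# %B is the full month name, %d the day of the month, %Y the four-digit year.
changes["date"] = pd.to_datetime(changes["date"], format="%B %d, %Y")

# With real timestamps, sorting is chronological rather than alphabetical.
changes = changes.sort_values("date", ascending=False).reset_index(drop=True)
```

Sorting the raw strings instead would place "September 8, 2016" ahead of "August 30, 2021", since text compares letter by letter.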

Now that we have both the current and historical data, let’s add any tickers that are in the S&P 500 index today but do not appear in the Wikipedia change history.

    # Current members that never appear in the change history.
    missing = current[~current['ticker'].isin(historical['ticker'])].copy()
    missing['action'] = 'added'
    missing = missing[['date', 'ticker', 'name', 'action', 'cik']]
    missing.loc[:, 'cik'] = current['cik'].apply(str).str.zfill(10)
    missing.head()
        date        ticker  name                 action  cik
    0   1976-08-09  MMM     3M                   added   0000066740
    1   1964-03-31  ABT     Abbott Laboratories  added   0000001800
    6   1997-05-05  ADBE    Adobe                added   0000796343
    9   1998-10-02  AES     AES Corp             added   0000874761
    10  1999-05-28  AFL     Aflac                added   0000004977
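The filter above is an anti-join: keep the rows of one table whose key is absent from another. A toy sketch of the same isin pattern, using a handful of tickers from this article as stand-in data:

```python
import pandas as pd

# Stand-in tables; the real ones are the scraped current and historical frames.
current = pd.DataFrame({
    "ticker": ["MMM", "ABT", "MTCH"],
    "name": ["3M", "Abbott Laboratories", "Match Group"],
})
historical = pd.DataFrame({
    "ticker": ["MTCH", "CDAY"],
    "action": ["added", "added"],
})

# True where the current ticker already has an entry in the change history.
in_history = current["ticker"].isin(historical["ticker"])

# Keep only the tickers without history and mark them as additions.
missing = current[~in_history].copy()
missing["action"] = "added"
```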

Merge and Dedup the Data

We’ll now combine the historical changes with the current S&P 500 companies that had no history, then drop any duplicate rows.

    sp500_history = pd.concat([historical, missing])
    sp500_history = sp500_history.sort_values(
        by=['date', 'ticker'], ascending=[False, True])
    sp500_history = sp500_history.drop_duplicates(subset=['date', 'ticker'])
    sp500_history
         date                 ticker  name                    action   cik
    112  September 8, 2016    CHTR    Charter Communications  added    NaN
    112  September 8, 2016    EMC     EMC Corporation         removed  NaN
    113  September 6, 2016    MTD     Mettler Toledo          added    NaN
    113  September 6, 2016    TYC     Tyco International      removed  NaN
    208  September 5, 2012    LYB     LyondellBasell          added    NaN
    ...  ...                  ...     ...                     ...      ...
    484  1900-01-01 00:00:00  WAT     Waters Corporation      added    0001000697
    493  1900-01-01 00:00:00  WHR     Whirlpool Corporation   added    0000106640
    483  1900-01-01 00:00:00  WM      Waste Management        added    0000823768
    491  1900-01-01 00:00:00  WRK     WestRock                added    0001732845
    492  1900-01-01 00:00:00  WY      Weyerhaeuser            added    0000106535
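One way the change history might be used against survivorship bias is to rebuild the membership list for a past date by starting from today’s members and undoing every later change. The sketch below is an illustration under assumptions, not part of the original notebook: members_on is a hypothetical helper, the date column is assumed to be parsed into timestamps already, and the current list is a toy subset.

```python
import pandas as pd


def members_on(current_tickers, changes, as_of):
    """Rebuild index membership at the end of `as_of` by undoing later changes.

    Hypothetical helper: `changes` needs a datetime `date` column plus
    `ticker` and `action` ('added' or 'removed') columns.
    """
    members = set(current_tickers)
    later = changes[changes["date"] > pd.Timestamp(as_of)]
    # Walk backwards in time, newest change first.
    for row in later.sort_values("date", ascending=False).itertuples():
        if row.action == "added":
            members.discard(row.ticker)  # not yet in the index before this change
        else:
            members.add(row.ticker)      # still in the index before its removal
    return members


# Four changes taken from the output above, with dates already parsed.
changes = pd.DataFrame({
    "date": pd.to_datetime(["2016-09-08", "2016-09-08", "2016-09-06", "2016-09-06"]),
    "ticker": ["CHTR", "EMC", "MTD", "TYC"],
    "action": ["added", "removed", "added", "removed"],
})

# Toy "current" list; a real one would hold every current constituent.
current_tickers = ["CHTR", "MTD", "MMM"]

print(sorted(members_on(current_tickers, changes, "2016-09-07")))
# ['EMC', 'MMM', 'MTD']
```

A backtest would call such a function with each rebalance date so that companies later removed from the index still appear in the historical universe.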

Export Data to CSV

And finally, we’ll export both dataframes to CSV files for download. You can find the notebook and the associated files on the Analyzing Alpha GitHub.
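The export step itself is not shown above, so here is a minimal sketch of what it could look like with DataFrame.to_csv. The in-memory buffer stands in for a file path, and the name in the comment is hypothetical rather than the article’s actual file name. The two rows are taken from the earlier output.

```python
import io

import pandas as pd

history = pd.DataFrame({
    "date": pd.to_datetime(["1976-08-09", "1964-03-31"]),
    "ticker": ["MMM", "ABT"],
    "action": ["added", "added"],
    "cik": ["0000066740", "0000001800"],
})

# An in-memory buffer stands in for a path such as 'sp500_history.csv'
# (a hypothetical name). index=False leaves out the dataframe's row index.
buffer = io.StringIO()
history.to_csv(buffer, index=False)

# Read it back, forcing cik to stay text so the leading zeros survive.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"cik": str}, parse_dates=["date"])
```

Without the dtype argument, read_csv would infer the CIK column as integers and strip the zero padding added earlier.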


Author: Moshe Kshlerin
