### What is Estimated Plus-Minus (EPM)?

EPM is an all-in-one player metric that estimates a player's contribution to the team in points per 100 possessions for individual seasons. It is retrodictive, which means a sum of EPM values by team (weighted by possessions played) will approximate that team's net points per 100 possessions. In a typical season, EPM values range from about -8.0 to +8.0, with an average of around -1.0. The average is below zero because better players play more minutes (the average weighted by possessions played will approximate zero). EPM and other empirically based plus-minus metrics express player values this way because net team points per 100 possessions (adjusted for schedule) is a good indicator of team strength, more so than record, for example.
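A minimal sketch of the retrodictive property, assuming each player's EPM is weighted by possessions played and that five players are on the floor for every team possession (so team possessions equal total player possessions divided by five). The roster values and function names are hypothetical.

```python
# Hypothetical sketch: aggregate player EPM into an approximate team net rating.
# Assumes five players on the floor for every team possession, so
# team possessions = (sum of player possessions) / 5.

def team_net_rating(players):
    """players: list of (epm, possessions_played) for one team-season."""
    total_player_poss = sum(poss for _, poss in players)
    team_poss = total_player_poss / 5
    return sum(epm * poss for epm, poss in players) / team_poss


def possession_weighted_mean(players):
    """Average EPM weighted by possessions played."""
    total_player_poss = sum(poss for _, poss in players)
    return sum(epm * poss for epm, poss in players) / total_player_poss


def simple_mean(players):
    """Unweighted average EPM (every player counts equally)."""
    return sum(epm for epm, _ in players) / len(players)


# Hypothetical roster: (EPM, possessions played)
roster = [(4.0, 3000), (2.0, 2500), (1.0, 2400), (0.5, 2200),
          (-0.5, 2000), (-1.0, 1500), (-2.0, 1000), (-3.0, 400)]
print(round(team_net_rating(roster), 2))           # 4.93
print(round(possession_weighted_mean(roster), 2))  # 0.99
print(round(simple_mean(roster), 3))               # 0.125
```

In this toy roster the unweighted mean is lower than the possession-weighted mean because the better players play more, the same effect that pulls the league-wide simple average of EPM below zero.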

#### The problems with raw plus-minus

Raw plus-minus, which is net team points while the player is on the court, goes up and down based on many, many factors beyond what the player is actually contributing, and is thus extremely noisy even in large samples. For example, at the time of this writing, more than halfway through the 2022-23 NBA regular season, Kentavious Caldwell-Pope, Isaiah Joe, Derrick White, Mike Muscala, Rudy Gay, Cedi Osman, John Konchar, and Isaac Okoro all have a higher raw plus-minus per 100 possessions than Stephen Curry, LeBron James, and Giannis Antetokounmpo (source: Basketball-Reference play-by-play). They may be fine players, but none are close to the same level as Steph, LeBron, or Giannis.
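Raw plus-minus per 100 possessions is simple arithmetic over the stints a player was on the court. This sketch assumes a hypothetical stint record and toy numbers; note that the calculation never looks at who the teammates or opponents were, which is exactly why it absorbs their quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stint:
    """A stretch of play with no substitutions (hypothetical record layout)."""
    on_court: frozenset   # ids of the team's players on the floor
    points_for: int
    points_against: int
    possessions: int      # team offensive possessions during the stint


def raw_plus_minus_per_100(player, stints):
    """Net team points per 100 possessions while `player` is on the court.

    Nothing here looks at who the teammates or opponents were.
    """
    net = 0
    poss = 0
    for s in stints:
        if player in s.on_court:
            net += s.points_for - s.points_against
            poss += s.possessions
    if poss == 0:
        raise ValueError(f"no possessions for player {player!r}")
    return 100 * net / poss


stints = [
    Stint(frozenset({"A", "B"}), 10, 6, 8),
    Stint(frozenset({"A"}), 4, 9, 6),
    Stint(frozenset({"B"}), 12, 2, 10),
]
print(round(raw_plus_minus_per_100("A", stints), 1))  # -7.1
print(round(raw_plus_minus_per_100("B", stints), 1))  # 77.8
```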

You may have noticed a pattern among the players above: many with a high plus-minus play alongside really great players on good teams (Caldwell-Pope next to Nikola Jokic, Cedi Osman and Isaac Okoro on a great Cavs team, etc.). This illustrates one of the major problems with raw plus-minus: it is blind to context and does not account for bias due to who players are playing with (or against, for that matter). As discussed later, this is largely mitigated by a longstanding technique called *adjusted* plus-minus (APM), which EPM uses.

Another big issue with raw plus-minus is luck: all the random things happening outside of the player's control. Among the players listed above is Rudy Gay, who, while a great player in his day, is having the worst year of his career, at least in terms of shooting efficiency (24% from 3 and in the 4th percentile in overall shooting efficiency), yet he has the 11th-highest raw plus-minus per 100 possessions. This is largely due to the luck of how well *his teammates* happened to shoot from three while he was on the court (the team shot 37.1% *including* Rudy's 24%; source: http://www.pbpstats.com/), and how poorly opponents shot from three while he was on the court (31.7% vs. 37.8% when he was off). There may be a temptation to attribute the difference to the player, but it is almost certainly due to random chance, as it has been shown that defenses have little control over opponent 3-point shooting percentage. This is just one example of the many things that can happen outside of a player's control. While it is difficult to mitigate luck even with adjusted plus-minus, a more stable estimate of individual contribution can be gained by using player stats empirically, in a method called statistical plus-minus (SPM). EPM uses a proprietary statistical plus-minus model as part of its overall calculation. More information is below.
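How much can shooting percentages swing from chance alone? A simple binomial model, assuming independent attempts at a fixed true rate, gives a rough answer. The 35% rate and 500-attempt sample below are hypothetical and are not Rudy Gay's actual on-court numbers.

```python
import math
import random
import statistics


def chance_sd(true_pct, attempts):
    """Binomial standard deviation of an observed shooting percentage."""
    return math.sqrt(true_pct * (1 - true_pct) / attempts)


def simulate_observed_pcts(true_pct, attempts, seasons, seed=0):
    """Simulate observed 3P% for many samples with identical true skill."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_pct for _ in range(attempts)) / attempts
        for _ in range(seasons)
    ]


# Hypothetical: a lineup with a true 35% rate taking 500 threes.
sd = chance_sd(0.35, 500)
sims = simulate_observed_pcts(0.35, 500, seasons=1000)
print(f"analytic SD: {sd:.4f}")  # analytic SD: 0.0213
print(f"simulated SD: {statistics.pstdev(sims):.4f}")
print(f"observed range: {min(sims):.3f} to {max(sims):.3f}")
```

Under this model, a lineup with a true 35% rate lands outside roughly 33% to 37% about a third of the time over 500 attempts, so on/off gaps of several percentage points are not surprising.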

#### EPM Methodology

EPM has two main components: 1) a statistical plus-minus (SPM) model that uses some play-by-play and player-tracking-derived stats to estimate a player's contribution per 100 possessions, and 2) a regularized adjusted plus-minus (RAPM) calculation meant to capture some of the impact beyond the stats used in step 1, nudging player values to more closely match team points per 100 possessions (effectively filling some of the gap remaining between what we can know from player stats and team strength). Step 1 serves as a jumping-off point for the final calculation in step 2, making EPM an RAPM calculation with an SPM Bayesian prior. More information on each of these methods is below.

*Regularized Adjusted Plus-Minus (RAPM)*

RAPM is used in the final step to calculate EPM values for a given season, but it is also used to create the SPM model used in the first step. As mentioned, adjusted plus-minus (APM) is a technique to help with the contextual blindness of raw plus-minus. It was initially brought to basketball by Dan Rosenbaum to adjust a player's plus-minus by controlling for who they play with and against, as well as other variables such as home court advantage.

It is calculated with a huge regression model, but it can perhaps be more easily understood as the algebraic strategy of solving a system of equations in which the unknowns are player values and the equations are all of the distinct times that unique combinations of players played together, each with its associated point differential. There are two variables per player (offense and defense) and around 50,000 unique 10-man lineup equations in a given season.
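To make the "system of equations" framing concrete, here is a minimal sketch of turning stints into a design matrix and solving it with possession-weighted least squares. It assumes a hypothetical stint format (offensive five, defensive five, points scored, possessions), uses one offensive and one defensive column per player as described above, and omits home court advantage and other covariates for brevity; the function names are illustrative, not EPM's code.

```python
import numpy as np


def build_apm_system(stints, player_ids):
    """Turn stints into one linear equation per stint.

    Each stint is (offense_ids, defense_ids, points, possessions).
    Columns 0..n-1 hold offensive values and n..2n-1 defensive values.
    Offensive players get +1; defensive players get -1, so a positive
    defensive value means fewer points allowed.
    """
    index = {p: i for i, p in enumerate(player_ids)}
    n = len(player_ids)
    X = np.zeros((len(stints), 2 * n))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for r, (offense, defense, points, poss) in enumerate(stints):
        for p in offense:
            X[r, index[p]] = 1.0
        for p in defense:
            X[r, n + index[p]] = -1.0
        y[r] = 100.0 * points / poss  # points per 100 possessions
        w[r] = poss                   # longer stints count more
    return X, y, w


def solve_apm(X, y, w):
    """Possession-weighted least squares with an intercept (baseline efficiency).

    The system is usually rank-deficient or ill-conditioned; lstsq returns
    the minimum-norm solution in that case.
    """
    Xi = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Xi * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]
```

Because the same players appear together so often, the unregularized solution is unstable in real data, which is what the regularization described next addresses.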

*Regularized* adjusted plus-minus (RAPM) was a very important improvement upon APM and was first introduced by Joe Sill in 2010 at the Sloan Sports Analytics Conference (you can read the abstract here), where he reported nearly doubling the accuracy of traditional APM. RAPM uses a statistical technique called ridge regression that helps mitigate problems in coefficient estimation caused by multicollinearity (players frequently playing together).
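Ridge regression adds a penalty on the size of the coefficients. The sketch below is the textbook weighted closed form, not a reproduction of Sill's exact setup; it assumes the target is already centered on league average (so there is no intercept), and the toy matrix is hypothetical.

```python
import numpy as np


def ridge(X, y, w, lam):
    """Possession-weighted ridge regression.

    Minimizes sum_i w_i * (y_i - X_i . b)^2 + lam * ||b||^2, whose closed form is
    b = (X^T W X + lam * I)^-1 X^T W y. Assumes y is already centered
    (e.g. relative to league average), so there is no intercept.
    """
    XtW = X.T * w  # equivalent to X.T @ np.diag(w)
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)


# Toy example: players 0 and 1 always share the floor (perfect collinearity),
# so plain least squares cannot separate them; ridge splits the credit evenly.
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
y = np.array([6.0, 8.0, 2.0, 6.0])
w = np.ones(len(y))
print(ridge(X, y, w, lam=1.0))
print(ridge(X, y, w, lam=10.0))  # larger lambda pulls values closer to 0
```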

*Statistical Plus-Minus (SPM)*

EPM only uses stats from the given season, but those stats are fed into a statistical model that was trained on many years of RAPM data. This is the SPM model that is used in the first step of calculating EPM. The idea of SPM has been around since APM and was also initially used by Dan Rosenbaum in his article referenced in the previous section. RAPM is a massive upgrade over raw plus-minus but is still noisy. SPM uses player-level stats to help stabilize an estimate of a player's impact per 100 possessions.

To create an SPM model, modern metrics use large multi-season samples of RAPM (to minimize noise) along with player stats from the same time frame to empirically estimate how each stat predicts RAPM. These weights can then be applied to player stats in a given season to get a sense of what a player's RAPM might be without calculating it. Output from SPM is interesting on its own if modeled well, but it is limited to what we can measure at the player level. Quite a bit of what is important on offense is measured at the player level, but defensive stats are very much lacking. For more information about SPM, Neil Paine wrote a nice summary here. Daniel Myers is the creator of Box Plus/Minus (BPM), a fantastic SPM metric with a very thorough write-up; it is also very much part of the inspiration behind EPM.
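A minimal sketch of fitting and applying a linear SPM model, assuming possession-weighted least squares of long-window RAPM on stats already expressed relative to league average. EPM's actual SPM model is proprietary, so the stats, weights, and function names here are hypothetical; the model is purely linear, with no interaction terms.

```python
import numpy as np


def fit_spm(stats, rapm, possessions):
    """Fit linear SPM weights: multi-season RAPM ~ intercept + stats @ weights.

    stats: (n_players, k) array of per-100 stats, already relative to league average.
    rapm: (n_players,) long-window RAPM values for the same players.
    possessions: (n_players,) weights so high-minute players count more.
    """
    X = np.column_stack([np.ones(len(rapm)), stats])
    sw = np.sqrt(possessions)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], rapm * sw, rcond=None)
    return coef[0], coef[1:]


def apply_spm(intercept, weights, stats):
    """Estimate RAPM-like values for a new season from that season's stats alone."""
    return intercept + stats @ weights
```

Weighting by possessions keeps low-minute players, whose RAPM is noisiest, from dominating the fitted weights.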

*Combining SPM and RAPM*

EPM is also inspired by Jeremias Engelmann and Steve Ilardi's Real Plus-Minus (RPM), which combined these two methods to arrive at even more accurate player values. Engelmann and Ilardi have since moved on and the formula has changed, but RPM values can be found here. RPM used a hand-crafted SPM model as a Bayesian prior in a one-season RAPM calculation with great results. EPM uses the same methodology but with its own SPM model.

### Calculating EPM

*Creating the SPM model*

One of the most important parts of building the SPM model for EPM was variable selection. The goal for EPM was to use only player-level stats that also fit together mathematically to approximate team ratings for individual seasons *before* any force-fitting or RAPM was applied. This also means that maximizing the variance explained (r-squared) was not the goal. For example, EPM does not use team-level stats such as a player's "offensive rating" (which is really just the team's rating while the player is on the floor). Such a stat would increase r-squared and would help match what the team was doing (naturally, since it is a team-level stat), but it would introduce an overfitted team effect into the model that decreases the accuracy of measuring player impact.

EPM uses mostly possession-based stats derived from play-by-play data so that possessions can be counted rather than estimated when calculating the stats. This means the inputs to the model are as accurate as possible. Before play-by-play data, calculating possession-based stats required estimating possessions, so EPM is currently calculated only for the modern era.
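For contrast, this is what estimating possessions looks like. The sketch uses a widely used box-score approximation (FGA + 0.44 × FTA - ORB + TOV); it is shown only to illustrate the older approach and is not part of EPM, which counts possessions from play-by-play. The box-score numbers are hypothetical.

```python
def estimate_possessions(fga, fta, orb, tov, ft_factor=0.44):
    """Box-score possession estimate: FGA + 0.44 * FTA - ORB + TOV.

    ft_factor approximates possessions used per free-throw attempt
    (and-ones, technicals and three-shot fouls break a simple 1:2 ratio).
    """
    return fga + ft_factor * fta - orb + tov


# Hypothetical team box score
print(round(estimate_possessions(fga=88, fta=22, orb=10, tov=13), 2))  # 100.68
```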

All inputs to the model are expressed relative to league average, adjusting for the ever-evolving NBA game. For example, Stephen Curry's 68% True Shooting in 2016 meant more than his 68% (so far) in 2023 because the league as a whole is more efficient now (an evolution at least partly due to Curry's prodigious shooting from deep). All inputs are also gently regressed (padded) to handle small sample sizes and values that fall outside the ranges included in the training set.
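A minimal sketch of the two input adjustments described above: expressing a rate relative to league average and padding it toward average for small samples. Padding with phantom league-average attempts is one common way to regress a rate; the pad size, the clamp bounds standing in for the training range, and the league rates are all hypothetical, not EPM's actual values.

```python
def padded_rate(made, attempts, league_rate, pad_attempts):
    """Shrink a small-sample rate toward league average.

    Adds `pad_attempts` phantom attempts at the league rate (a common
    regression-to-the-mean trick; the pad size here is hypothetical).
    """
    return (made + league_rate * pad_attempts) / (attempts + pad_attempts)


def relative_input(made, attempts, league_rate, pad_attempts=100,
                   lo=-0.10, hi=0.10):
    """Padded rate expressed relative to league average, clamped to a
    hypothetical training range [lo, hi]."""
    rel = padded_rate(made, attempts, league_rate, pad_attempts) - league_rate
    return max(lo, min(hi, rel))


# Same raw 45% (9 of 20) in two hypothetical league environments
print(round(relative_input(9, 20, league_rate=0.34), 4))  # 0.0183
print(round(relative_input(9, 20, league_rate=0.36), 4))  # 0.015
```

The same raw percentage is worth more in the less efficient league, and a 20-attempt sample is pulled most of the way back toward average.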

All inputs for the model are linear (no higher-order or interaction terms) to keep things simple and to avoid overfitting. Player-tracking inputs are used only for defensive EPM (DEPM) and are mostly derived from publicly available matchup data.

To build the SPM model for EPM, a 10-year RAPM sample (2004 to 2013) was used for offense and a 4-year RAPM sample (2018 to 2021) for defense. The defensive sample was smaller and more recent so that player-tracking data could be used. DEPM values from 2014 to 2017 use a slightly different model based on the player-tracking data that were available at the time, and DEPM values before 2014 use no player-tracking data.

*Calculating the statistical prior and then RAPM*

The SPM model is used in the first step of calculating EPM for a given season. Only stats from the current season are passed into the model, which generates initial player values. Then a one-season RAPM calculation is performed using those initial values as the prior. The RAPM uses a fairly strong lambda value, so prior values do not change much (usually by less than 1 point per 100 possessions). The RAPM also serves as a sort of team fit that helps move player values to more closely match team ratings (when aggregated). There is no actual force-fitting, so aggregated EPM values only approximate team ratings.
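The prior-plus-RAPM step can be sketched as ridge regression that shrinks toward the SPM prior instead of toward zero, a standard way to use a prior in RAPM. This is a minimal sketch: the function name, array shapes, and lambda values are hypothetical, and EPM's actual lambda and design matrix are not published here. It minimizes sum(w * (y - Xb)^2) + lambda * ||b - prior||^2 by running ordinary ridge on the residual y - X @ prior.

```python
import numpy as np


def rapm_with_prior(X, y, w, prior, lam):
    """Ridge regression shrunk toward `prior` instead of toward zero.

    Minimizes sum_i w_i * (y_i - X_i . b)^2 + lam * ||b - prior||^2.
    Substituting b = prior + delta turns this into ordinary ridge on
    the residual target y - X @ prior.
    """
    XtW = X.T * w  # equivalent to X.T @ np.diag(w)
    A = XtW @ X + lam * np.eye(X.shape[1])
    delta = np.linalg.solve(A, XtW @ (y - X @ prior))
    return prior + delta
```

A large lambda keeps values close to the prior, while a small one lets the single season of lineup data move them further.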

*Game EPM*

On player pages you can find Game EPM, which is based on EPM's prior only, meaning it is based solely on player-level stats. It is also converted into a total production stat (as opposed to a rate stat like regular EPM). Since EPM is an estimate of points contributed per 100 possessions, *Game* EPM takes the EPM prior calculated from in-game stats and scales it by possessions played out of 100 (i.e., Game EPM = EPM prior × Possessions Played / 100). The result is an estimate of the total points contributed in the game. It does not account for everything that happens in a game and may not sum to team point totals; it is meant to provide a rough estimate of how much the player contributed and should be more useful than raw +/-.
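The Game EPM formula is a simple rescaling from a per-100 rate to a game total. A minimal sketch, with a hypothetical function name and example values:

```python
def game_epm(epm_prior, possessions_played):
    """Convert a per-100 rate into total points contributed in one game.

    Game EPM = EPM prior (from that game's stats) x possessions played / 100.
    """
    return epm_prior * possessions_played / 100


# Hypothetical: a +6.0 prior from one game's stats over 70 possessions
print(round(game_epm(6.0, 70), 2))  # 4.2
# The same rate over only 20 possessions contributes far less in total
print(round(game_epm(6.0, 20), 2))  # 1.2
```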

You can find this season's EPM values here. To see how EPM stacks up against other metrics, check out this metric comparison study.