# Introducing Infield Outs Above Average

Fielding evaluation has historically lagged behind hitting and pitching evaluation due to the lack of relevant data. Since Statcast was first fully implemented for the 2015 season, detailed batted ball and player tracking data has enabled us to quantify defense like never before, including shift counts and Outs Above Average for outfielders. Now, we have taken another step forward with the release of Infield Outs Above Average.

### History of Fielding Metrics

Why is it so difficult to measure fielding performance? To measure it, we need to count both the successes and the opportunities. In the official stats, we have putouts and assists as successes, but these are largely influenced by the other infielders or the pitcher, and they don’t provide any indication of the difficulty of the play. Counting opportunities as games or innings is also problematic.

This is why Zone Ratings based on batted ball location data became the next step in fielding metrics. Every batted ball was either assigned to a specific fielder, or that batted ball was assigned to NO fielder at all, a no-man’s-land batted ball. So, this gave us the denominator that we were looking for, or at least, a denominator that was better than using games or innings. In addition, the denominator was set to the fielder who either fielded the ball (the first touch fielder), or the fielder who was assigned responsibility for the base hit that went through his zone of responsibility. Subsequent models such as Ultimate Zone Rating (UZR, a nod to ZR) and Defensive Runs Saved (DRS) improved the process by weighting the opportunities based on a fielder’s typical starting location, among other variables.

### Enter Statcast

Notice the above phrasing: “*fielder’s typical starting location*”. Prior to 2015, we never tracked the actual location of the fielder. With Statcast starting in 2015, we have. And so now, we can take the next step and instead of treating every batted ball that went to a particular zone equally, we can now compare that batted ball to where the responsible fielder is actually standing.

In other words, if we have a ball that was hit halfway between the normal SS spot and the normal 3B spot, ZR would count that as a no-man’s-land ball (and remove it from the opportunity set of either fielder). But if we had a RHH at the plate and the shift was on, the SS might actually be standing right in line for that **particular** batted ball. What one system would consider a no-man’s-land batted ball (with a presumed out rate of 0%), the Statcast system would consider routine (with a presumed out rate of 95% or higher). Now we can separate two batted balls hit to the same spot based on where the fielder was standing when the play started (and we define the start of the play as pitch release).

### Intercept Model

Great, right? Yes, but still not enough. Why? Because there are two central things that affect the chance of a play being converted into an out: the proximity of the fielder’s starting location to the eventual path of the batted ball, and how much time the fielder has to reach that spot.

So, the core outfield Catch Probability calculation is based on these two parameters: distance and time. We create a smoothed-out model that adheres to a simple principle: the more distance you have to cover, the lower the catch rate. And the more time you have to cover it, the higher the catch rate. While the core of Catch Probability is a Distance/Time math model, there are two additional parameters that we use for outfielders: the direction of travel to the ball’s landing point and the potential impact of the wall. All these parameters taken together form the Catch Probability Distance-Time model. Now that we know how it works in the outfield, we can describe how it works in the infield, using the same core idea, but with additional considerations.

### From Outfield to Infield

The core Distance-Time model still applies. We have the Opportunity Distance, which is how much distance the fielder has to cover to reach the ball (whether he has to charge in, run back, or move laterally). And the Opportunity Time, which is how much time the fielder has to get there. The two taken together give us the fielder’s Opportunity Space, and that point when fielder and ball cross is called the Intercept Point.

However, infield plays have an additional layer of complexity to them. For example, a batted ball that goes halfway between the SS and 3B in 1 second will likely skip past the infielders, while one which gets there in 2 seconds has a great chance of becoming an out. But, if it gets there in 4 seconds (certainly enough time for the fielder to reach it), the batter will have almost reached first base.

So, in addition to establishing the Intercept Point, we need to establish how far the batter is from reaching first base. And since we know how fast each runner is, once we know how far away he is in *feet*, we can also estimate how far away he is in *seconds*. As for the fielder, we know the Intercept Point, and so, we also know how long the throw is in feet. And that means we can estimate how much time in seconds the throw will take from the Intercept Point.

With an estimate of the time for the throw and with an estimate of the time for the run, it comes down to a simple comparison: if the time for the throw is longer than the time for the run, the runner is safe; if the time for the throw is shorter than the time for the run, the runner will be out. Sometimes, the two times are very close. And in those cases, rather than estimating a pure safe/out, we’d have a sliding scale from 1% to 99%: if the two times are identical, we’d estimate the Outs Made at about 50%, and the more there is a gap in the estimated time, the more our estimate will deviate from that 50% point.

### Recap

So there you go, this is the basis for Infield Defense Outs Above Average: we apply an Intercept Model based on distance and time. And the reason this model will resonate is because it is *intuitive*. When you watch a play develop, you are inherently making those calculations already. You are taking a snapshot of the play, and you are taking an educated guess as to whether the runner will beat the play or not. All we are doing with the Intercept Model is applying math and probability at the Intercept Point and answering the questions that we are intuitively asking.

### Example Play

On July 28, 2019, Brian Anderson of the Marlins faced this Diamondbacks shifted fielding alignment, with all three infielders on the pull side playing somewhat deep.

Anderson hit a slow roller toward the third baseman Escobar, who had to charge in. The Intercept Point, the point at which the fielder and the ball crossed paths, took 3.2 seconds from pitch release. Escobar had to cover 44 feet. At the Intercept Point, Escobar was 107 feet from first base. With a speed of a bit over 100 feet per second, the thrown ball will be in the air for roughly 1 second. It will also take about ¾ of a second for the fielder to release the ball. All told, Escobar would have 1.7 to 1.8 seconds from the Intercept Point to get the ball to first base (¾ of a second to get rid of it, and 1 second for the ball traveling through the air).

The batter at the Intercept Point was 47 feet from reaching first base. At his Sprint Speed of 27.1 feet per second, we’d estimate he had 1.7 to 1.8 seconds to reach first base (47 feet divided by 27.1 feet/second is 1.73 seconds; though he was still not on a dead run by that point, so it’s an estimated 1.77 seconds).
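The arithmetic for this play can be sketched in a few lines of Python. This is only an illustration of the distance/time reasoning above, not the production model: the function names are made up for this example, and the throw speed of 105 ft/s is an assumed stand-in for "a bit over 100 feet per second". The run estimate here is the simple full-speed figure, before any adjustment for the runner not yet being at a dead run.

```python
def throw_time(distance_ft, throw_speed_fps, release_s):
    """Seconds from fielding the ball until the throw reaches first base."""
    return release_s + distance_ft / throw_speed_fps


def run_time(distance_ft, sprint_speed_fps):
    """Seconds for the runner to cover the remaining distance at full speed."""
    return distance_ft / sprint_speed_fps


# Distances, release time and Sprint Speed come from the play described above.
# 105 ft/s is an assumed value for "a bit over 100 feet per second".
fielder_s = throw_time(distance_ft=107, throw_speed_fps=105, release_s=0.75)
runner_s = run_time(distance_ft=47, sprint_speed_fps=27.1)

print(f"throw: {fielder_s:.2f} s, run: {runner_s:.2f} s")
```

Under those assumptions, both times land in the 1.7 to 1.8 second window quoted for this play.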

Here’s what the Intercept Point data looks like:

You will note that with the fielder having 1.7 to 1.8 seconds to make the play and the runner having 1.7 to 1.8 seconds to reach first base, both starting from the Intercept Point, the chances of the fielder getting the out are around 50/50. In the above chart, our estimate is 53.7%.

Fast forward 1.7 to 1.8 seconds, and this is what happened.

The runner was initially ruled out, and the call was overturned to safe on a replay challenge. In other words, an expected 50/50 play based on the *model* was in fact a 50/50 play based on *reality*.

And that’s all we are trying to do when we create a model: how can we represent reality? And we represent reality by modeling how we process the play. And we process the play by *implicitly* measuring everything in terms of distance and time. And so the model will *explicitly* measure or estimate all the touch points in distance and time. Welcome to the Distance/Time Intercept Model for Infield Defense.

### Probability Distributions

So, how do we arrive at 53.7%? It’s actually the product of three separate estimates. As mentioned, we have three touch points to consider, what you can consider Action Events:

- The first Action Event is the fielder arriving at the Intercept Point. In the above play, with the ball being hit in line with the fielder, there was essentially a 100% chance that the fielder would arrive at the Intercept Point before the ball passed him. This isn’t always the case, especially if you think of base hits that go through the SS/3B hole. The range is naturally from 0% to 100%. This play is an example of 100%.
- The second Action Event is the fielder retrieving the ball cleanly. In this play, it was fairly routine, and so, we estimate the ball being picked up at 97.5%. In tougher plays, such as a sharp liner directly at a fielder, he may have a near 100% chance of **arriving** at the Intercept Point, but he may have only an 80% chance of **retrieving** the ball (think of deflections or simply muffing the ball).
- The third Action Event is the race: given that the fielder has the ball in his hands for a throw, how often will he beat the runner to the bag?
  - In the conditions represented by this play, the typical fielder will get the ball to the first baseman in 1.72 seconds. It won’t be exactly 1.72 seconds. It’ll be a distribution centered around 1.72 seconds, which you can consider to be +/- 0.25 seconds. He might get off a very quick release and very strong throw, or he might double-pump or lob it.
  - A runner with the given speed (~ 27 ft/s for Brian Anderson) and distance (47 feet) to the bag will get there in 1.77 seconds. It won’t be exactly 1.77 seconds. It’ll be a distribution centered around 1.77 seconds, which you can consider to be +/- 0.25 seconds. He might kick it up an extra gear, or he may have a misstep.
  - So we have two distributions of races denoted in time, and we come up with a probability that the fielder will beat the runner given those two distributions. And in this case, we estimate the fielder will beat the runner, *given* that he has ball in hand, 55.1% of the time.
  - For the runner, we can see the “to go” distance and time, graphically for all speeds of runners at every point along the baseline. Noted in grey is Brian Anderson, the batter/runner on this play.
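One way to turn two timing distributions into a race probability is sketched below. It assumes, purely for illustration, that the throw time and the run time are independent and normally distributed, with the "+/- 0.25 seconds" treated as a standard deviation. The article does not state the actual distribution shapes, so the function name and these choices are assumptions rather than the model's real internals.

```python
from math import erf, sqrt


def race_win_probability(throw_mean_s, run_mean_s, throw_sd_s=0.25, run_sd_s=0.25):
    """P(throw arrives before the runner), assuming independent normal times."""
    # The difference (run time - throw time) is itself normal.
    diff_mean = run_mean_s - throw_mean_s
    diff_sd = sqrt(throw_sd_s ** 2 + run_sd_s ** 2)
    # Standard normal CDF evaluated at diff_mean / diff_sd.
    return 0.5 * (1.0 + erf(diff_mean / (diff_sd * sqrt(2.0))))


# Centers of the two distributions for the play above.
p_race = race_win_probability(throw_mean_s=1.72, run_mean_s=1.77)
print(f"{p_race:.3f}")
```

With these assumed inputs the result comes out in the mid-50s, in the same neighborhood as the 55.1% estimate for this play. An exact match is not expected, since the real distributions may differ from the normal shapes assumed here. When the two centers are identical, the function returns exactly 50%, matching the sliding-scale idea described earlier.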

With all three Action Events probabilities estimated — the chance of arriving at the ball, the chance of retrieving the ball given that he arrived at the ball, and the chance he’ll win the race to the bag given that he’s retrieved the ball — we simply multiply them, as all three actions have to be successful in order to make the out. In Probability Distribution parlance, that’s respectively: 100% x 97.5% x 55.1%, which gives us 53.7%. And that 53.7% gives us our estimated success rate, or the Out Probability, on this play.
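As a quick check of that multiplication, here is a minimal sketch. The function name is hypothetical; the three inputs are the values quoted for this play.

```python
def out_probability(p_arrive, p_retrieve, p_win_race):
    """Chain the three conditional Action Event probabilities into one."""
    for p in (p_arrive, p_retrieve, p_win_race):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return p_arrive * p_retrieve * p_win_race


# Arrive, retrieve and win-the-race estimates for the Escobar play.
p_out = out_probability(1.00, 0.975, 0.551)
print(f"{p_out:.1%}")  # 53.7%
```

Because each probability is conditional on the previous step succeeding, a single weak link (say, a tough retrieval) pulls the whole Out Probability down, even when the other two steps are near certain.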

There are additional wrinkles, such as a first baseman fielding a ball and being able to run to the bag. As well as sharp liners that don’t need to be caught for the out, since the fielder still has a chance for a force play after knocking the ball down. However, this establishes the core model upon which future iterations can be made.

### The Results

Every play tracked is assigned an Estimated Success Rate, or an Out Probability. This is how it looks when we group all the tracked plays into one of four ranges of Out Probability:

The second line reads: for plays where the Out Probability was between 20% and 79%, the average of those plays was estimated to become an out 61% of the time. And in reality, those plays were turned into an out 64% of the time.

In order to get an out *value* on an individual play, it’s straightforward:

- If you make the out, then your value-added is the difference between 100% and the out probability. So, if the out probability was 60%, and you made the out, you get 40%, or +0.40 outs.
- If you don’t make the out, then you have a negative value, and this time, it’s the difference between 0% and the out probability, or simply the negative of the out probability. Using the same above example of a 60% out probability, if you don’t make the out, then you get negative 60%, or -0.60 outs.

We add up all these partial pluses and minuses. And the total of these becomes your Outs Above Average. (This is the same process followed for Outfield Outs Above Average.)
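The bookkeeping described above can be sketched as follows. The function names are made up for this example, and the sample plays are hypothetical, not real tracking data.

```python
def play_value(out_probability, out_made):
    """Outs credited on one play: (1 - p) if the out is made, -p if not."""
    return (1.0 if out_made else 0.0) - out_probability


def outs_above_average(plays):
    """Sum play values over (out_probability, out_made) pairs."""
    return sum(play_value(p, made) for p, made in plays)


# Hypothetical plays for illustration only.
sample_plays = [(0.60, True), (0.60, False), (0.95, True), (0.20, True)]
oaa = outs_above_average(sample_plays)
print(round(oaa, 2))
```

Note how a routine play (95%) adds very little when converted, while converting a tough one (20%) adds most of an out. That asymmetry is what lets the total reward range rather than sheer volume of easy chances.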

### The Leaders

Naturally, what we care about is showing all this at the individual fielder level. Here’s how the top seven infielders look. Javier Baez has a total Outs Above Average (OAA) of +19 outs. This means that adding up all his partial pluses and subtracting all his partial minuses left him with an overall total of +19. And that was good enough to lead the league.

The chart below shows additional breakdowns. For Baez, for example, the Estimated Success Rate of his average play was only 83%, indicating that he faced a somewhat tougher set of plays than his peers. You can see the other shortstops on the leaderboard were all at 87% or 88%. As a result of his great fielding, he was able to convert 88% of his responsible plays into outs. While that number is lower than his peers Story, Ahmed and DeJong (90% to 91% for each), when you compare their actual results to their individual Opportunity Space, Baez is ahead. Note that Simmons converted an astounding 93% of his responsible plays into outs, compared to his Opportunity Space value of 87%. However, because he missed a lot of playing time, his overall OAA was “only” +16. Had he played as often as Baez, he might have led the league.

In addition, by having every play tagged, we can do additional breakdowns. Above you can see a breakdown based on direction: charging in, running back, or moving laterally toward the 3B or 1B sides.

### Fielder Roles

What helps make the system work is the focus on Roles rather than Positions. In this new world order of Shifting, we don’t want to compare SS Baez standing in short right field to other shortstops. Rather, we want to compare players based on where they are standing. If (Official) 3B Chapman stands in a (traditionally) SS slice, we compare Chapman to other fielders standing in the same slice. So as to not confuse *locations* with official *positions*, we’ve designated these locations or slices as Fielder Roles. The field is split into slices like so:

And when you go to Baseball Savant to slice and dice all the tracked plays, you will be able to do so based on where they are standing:

### Future Iterations

As you might expect, infield defense poses a more complex challenge than outfield defense; we will continue to improve and develop the model. For example, this iteration only looks at the “first out” of a play, so double plays are not considered (beyond the first out) but will be in a future iteration. We also have not adjusted credit for when a player (most often the pitcher) fails to cover a base to record a putout.

We’re excited to finally bring this to the public, and we can’t wait to bring even more. If you think of assists, putouts, and errors as the first inning of fielding metrics, Zone Rating as the second inning, and UZR/DRS as the third inning, we’re now in the fourth inning. We’ve come a long way, but we’ve got plenty of room to develop more.

### Further Reading

For more information about Infield Outs Above Average, please see Mike Petriello’s introductory article on MLB.com. You may also want to check out Baseball Savant’s Infield Outs Above Average leaderboard and infield defense visualizations.

Introducing Infield Outs Above Average was originally published in MLB Technology Blog on Medium.