The 2020 Phillies pitching staff retold the tale of Jekyll and Hyde. The competent, well-intentioned starting staff would nightly give way to a malicious, brutal bullpen. As evening turned to night, games became as watchable as the film adaptation of League of Extraordinary Gentlemen. Nevertheless, the cumulative performance placed the staff among the top half of teams in the league by the publicly available summary metrics (e.g., WARs).
A mystery lurks here. The Phillies had an above average offense. They were top ten by both wRC+ and OPS+. They were sixth in runs scored per game. The rotation, for its part, posted as much rWAR as the Indians, came third in fWAR, and sixth in FIP. Obviously, the bullpen dragged those sterling results down for the staff overall. But apparently it didn’t drag those results down enough to explain why the Phillies’ run differential ended the season underwater. How can the offense be this good, the pitching be above average cumulatively, and leave us with a negative run differential for the season?
We can locate the mystery in one stat: the team’s BABIP-against. Over the full 60 games, the Phillies staff allowed a .343 batting average on balls in play. If they had sustained that over 162 games, it would be a record for the post-integration era. It is 11 points higher than the mark of the 2020 Red Sox, which also would have been a post-integration record. The Red Sox fielded a replacement level staff. It’s not surprising that a quad-A staff gave up an elevated BABIP. But the Phillies staff as a whole was much better than that.
There are four sources of BABIP inflation: quality of contact, defense, chance, and, especially this season, an unbalanced schedule. I will set defense aside here because the defense requires an article unto itself. And since chance is just that for which we can’t find an explanation, I won’t say much about it. So, let’s dive into the other two sources, starting with the unbalanced schedule.
How elevated was the Phillies’ BABIP-against this season? Normally, we would compare theirs to the league average (.291) and conclude that the Phillies were a whopping 52 points above average. But even in normal seasons a team might face higher quality hitters due to its unbalanced schedule. As we all know, teams in the same division play each other more than teams outside it, and a team only plays select teams in the other league during interleague play. If that subset of opponents is better than average at getting hits on balls in play, we expect the team facing them to surrender an elevated BABIP-against.
And, boy, did the Phillies face good offenses this season. Only one opponent had a below average offense by wRC+ (Marlins), and only one opponent hit for a below average BABIP (Yankees). The Mets, Braves, and Yankees are all top 5 offenses. The Rays came in a tie with the Phillies for 9th to round out the top 10. It is safe to say that the pitchers of the East divisions had a harder task than any other pitchers this season.
This collection of nine opponents hit for a cumulative .307 BABIP, 16 points higher than the league average. Given that the Phillies did not play the rest of the league, we do better to compare their BABIP-against to this one. But here we run up against a methodological impasse. The .307 BABIP includes at-bats against the Phillies. We want to isolate the Phillies’ contribution to their elevated BABIP-against by finding a normalized BABIP. On the one hand, leaving the Phillies in the group potentially underestimates their own contribution. On the other hand, removing the Phillies potentially overestimates it because these teams might have produced the same results against an ersatz club. What to do?
I’m going to remove the results against the Phillies. The question I want to answer is how many extra hits were the Phillies’ responsibility as opposed to luck. Ultimately, I won’t use the BABIP numbers themselves to answer this question. The choice only affects how many hits we attribute to luck. So, the Phillies’ opponents hit for a .302 BABIP when not facing the Phillies. Yes, the Phillies increased their opponents’ collective BABIP by 5 points. That’s both small and massive. As always, the 2020 Phillies are an oxymoron.
(One could object that 60 team games is still a small sample, and we ought to regress their opponents’ BABIPs toward the mean somewhat. But I can’t do that research for this article. So, I’m taking the BABIP result as pure signal. Come at me in the comments.)
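For readers who want to reproduce the removal step described above, here is a minimal sketch. It assumes the standard BABIP formula, (H - HR) / (AB - SO - HR + SF), and a hypothetical `BattingLine` container holding each opponent's season totals and its totals in games against one team. The names are illustrative, and this is not necessarily how the figures in this article were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """Counting stats needed for BABIP (hypothetical container)."""
    ab: int  # at-bats
    h: int   # hits
    hr: int  # home runs
    so: int  # strikeouts
    sf: int  # sacrifice flies

    def __add__(self, other):
        return BattingLine(self.ab + other.ab, self.h + other.h,
                           self.hr + other.hr, self.so + other.so,
                           self.sf + other.sf)

    def __sub__(self, other):
        return BattingLine(self.ab - other.ab, self.h - other.h,
                           self.hr - other.hr, self.so - other.so,
                           self.sf - other.sf)


def babip(line):
    """Standard BABIP: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = line.ab - line.so - line.hr + line.sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (line.h - line.hr) / balls_in_play


def pooled_babip_excluding(season_lines, lines_vs_team):
    """Pooled BABIP of a group of opponents with their results
    against one team subtracted out.

    Both arguments map opponent name -> BattingLine.
    """
    pooled = BattingLine(0, 0, 0, 0, 0)
    for opponent, season in season_lines.items():
        pooled = pooled + (season - lines_vs_team[opponent])
    return babip(pooled)
```

Pooling the counts before dividing weights each opponent by its balls in play, which is one reasonable choice; averaging the nine team BABIPs directly would give a slightly different number.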
The Phillies, then, allowed a BABIP 41 points above what we’d expect all else equal. That’s still a lot. It amounts to 57 extra hits. Too much for bad luck on contact quality and direction to be a satisfying explanation without some investigation. So, let’s look at quality of contact against the Phillies staff. I’ll be honest. We aren’t going to get clarity. Things are going to get weird.
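The conversion from a BABIP gap to a hit count is just the gap multiplied by balls in play. The sketch below backs out the ball-in-play total implied by the 41-point gap and the 57 extra hits quoted above. That total is an inference from rounded figures, not a number taken from a stat page, so treat it as approximate.

```python
def extra_hits(babip_allowed, babip_expected, balls_in_play):
    """Hits allowed above expectation on a given number of balls in play."""
    return (babip_allowed - babip_expected) * balls_in_play


def implied_balls_in_play(hits_above_expected, babip_allowed, babip_expected):
    """Invert extra_hits to recover the ball-in-play total it implies."""
    return hits_above_expected / (babip_allowed - babip_expected)


# Figures quoted in the article: .343 allowed, .302 expected, 57 extra hits.
implied_bip = implied_balls_in_play(57, 0.343, 0.302)  # roughly 1,390

# Assumption: reuse that implied total against the raw league average (.291)
# to see how much of the gap the schedule adjustment absorbs.
vs_league = extra_hits(0.343, 0.291, implied_bip)  # roughly 72
```

Under those assumptions, measuring against the league average instead of the opponents' .302 would put the excess at about 72 hits, so the schedule adjustment accounts for something like 15 of them.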
When a team’s BABIP-against is high you expect to see a few indicators in their pitching peripherals. One is a high rate of hard contact. The harder a ball is hit the less likely a defender is to intercept it. The Phillies, however, fared well in hard contact. According to BIS they had the sixth lowest hard hit rate (31%). If we look at Statcast, they didn’t fare as well but still were much better than average with the 10th lowest average exit velocity (88.1 mph) and hard hit rate (36.8%).
Huh. Well, not all hard contact is the same. Perhaps, when they allow hard contact it is particularly good contact. Statcast gives us the barrel metric to try to capture this idea. By barrel rate, the Phillies look slightly worse. They allowed a league average rate (7.6%). That’s still much better than we’d anticipate from a team with such a high BABIP against.
Moreover, the Phillies had the lowest average launch angle (9.3 degrees) by more than a degree. The average ball-in-play against the Phillies was a groundball. Indeed, the Phillies induced the highest rate of groundballs and the lowest rate of flyballs.
Let’s take a step back and appreciate how weird this is. Forget for a moment that we’re talking about the Phillies pitching staff. In the abstract you would like this peripheral profile a lot. You might guess that this staff allowed a slightly higher BABIP because groundballs go for hits more often than flyballs and their line drive rate must be a bit high to account for the low flyball rate. Sure. But you’d also guess that they wouldn’t allow many dingers, would generally limit extra bases, and would collect double plays that stranded runners.
Nope. Nope. And, well, sorta but really nope. The Phillies allowed the second highest HR/FB rate (18.6%), second highest wOBA on contact (.421), and the fifth highest expected wOBA on contact (.378). And although they turned the fifth most DPs in the league, they were the 6th worst team at stranding runners (69.4%).
Now I know what you’re thinking. We all watched this staff throw meatballs. Even when the good pitchers seemed to be rolling, they often suddenly grooved a fastball or hung a breaking ball and damage was done. If you throw meatballs, you’re going to get hammered and give up runs, no matter how well you pitch outside of the meatballs. That’s true. But the Phillies also didn’t throw that many meatballs. Their meatball rate (7%) is a tick below league average (7.1%). Although opponents swung at those meatballs at a slightly more aggressive rate (76.2%) than most (75.5%), the overall effect of these meatballs certainly doesn’t account for much if any of their increased BABIP.
Let me pause to acknowledge a caveat. Metrics like meatball% are buckets. Like colonialist cartographers, buckets draw hard boundaries through continuous populations. They present as binary what the data paints as a spectrum. It is possible that the Phillies threw many more near-meatballs than their competition. And, presumably, near-meatballs go for hits nearly as often as meatballs. But we’ve already seen with other bucketing metrics such as quality of contact and batted-ball type that the Phillies staff limited dangerous contact. So, the Phillies’ meatball stat coheres with the larger picture.
Before I finish off this article by calculating how many extra hits we can account for with the staff’s peripherals, I want to draw one important conclusion. The Phillies staff, including the bullpen, got worse results than it deserved. None of the peripherals we’ve reviewed explain the staff’s extreme BABIP. As measured by ERA estimators, the overall impact was significant: at least .75 runs per 9 innings, at most 1.2. Granted, ERA estimators assume a league average BABIP, and we are rejecting that assumption for this investigation. What we’ve seen so far suggests that the difference between ERA and its estimators should be less. But the decrease certainly couldn’t be enough to span the gap. As bad as the bullpen was, it just couldn’t be responsible for an extra run per game over and above its awful performance by core metrics. If it were, that responsibility should appear in the staff’s contact peripherals.
Ok. So how many extra hits can we find in all the stats I’ve reviewed here? I will describe the math for the estimate at only the highest level of generality for fear of pedantry and boredom. First, using opponents’ BABIP on batted ball types, we can use the Phillies’ batted ball profile to get a first estimate of how many extra hits we should expect. As noted above, the Phillies’ low flyball rate indicates we should expect a higher than normal number of hits on balls in play. And that’s what we find: 4 extra hits, almost all the result of giving up more line drives than most. But still, just 4. 4 out of the 57 extra hits implied by their BABIP-against.
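One plausible reading of the batted-ball-type estimate described above can be sketched as follows. This is an assumption about the method, not the article's actual spreadsheet: value each batted-ball type at the opponents' BABIP for that type, then compare the hits expected from the staff's mix against the hits expected if the same number of balls in play followed the opponents' usual mix. All names are hypothetical.

```python
def extra_hits_from_profile(team_counts, baseline_share, baseline_babip):
    """Extra hits attributable to the mix of batted-ball types alone.

    team_counts:    type -> balls in play the staff allowed (HR excluded)
    baseline_share: type -> share of balls in play in the comparison group
    baseline_babip: type -> BABIP of the comparison group on that type
    """
    total = sum(team_counts.values())
    extra = 0.0
    for bb_type, count in team_counts.items():
        # Balls of this type we would expect under the baseline mix.
        baseline_count = baseline_share[bb_type] * total
        # Surplus (or deficit) of this type, valued at the baseline hit rate.
        extra += (count - baseline_count) * baseline_babip[bb_type]
    return extra
```

Because every type is valued at the baseline hit rate, the result isolates the effect of the distribution and says nothing about how hard the balls were hit. The same function works with finer buckets, such as pairs of batted-ball type and contact quality, as long as all three dictionaries share the same keys.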
Although it requires more fudging, let’s look at quality of contact on batted balls as well. By drilling down into the quality of contact on each batted ball type we can refine the above estimate. Maybe the Phillies tended to give up more hard hit flyballs, which would fall for hits much more often than soft or medium contact in the air.
For example, the Phillies limited hard contact on line drives. Sometimes that is a false friend. Medium contact on line drives goes for a hit more often than average because those balls turn into bloopers instead of glove-finders. In the Phillies’ case, however, they did such a good job getting soft contact on line drives (2nd in MLB) that I found they should have given up one fewer hit on line drives than I calculated above.
In fact, when I incorporate quality of contact into the batted ball profile the 4 extra hits I found disappear. The Phillies staff took the sting out of enough bats to overcome a slightly disadvantageous distribution of batted balls. As best as I can estimate the effect of the staff’s contact peripherals on their BIP outcomes, the effect is nil.
That’s not to say the staff at large was not at all responsible for the elevated BABIP. The bullpen was genuinely not up to the level of competition in MLB. It is possible that some causes of the elevated BABIP do not show up in the data we have. But those causes would certainly not explain much of the elevation. We’ve already seen how little the batted ball data might have explained: not even 10% of the extra hits. At most, perhaps it could explain 20% if the profile were extreme. Other, harder to measure aspects of performance will explain less.
If we want to pursue the mystery of the Phillies’ extraordinarily high BABIP any further, we have to turn to the Phillies’ defense. Otherwise, we only have recourse to luck, which is just another word for whatever we can’t explain. That recourse, for someone willing to write an article this long to come to no quantifiable conclusion, forebodes as much cosmic horror as anything depicted in Lovecraft Country.
UPDATE: Per my reply to lorecore, if we look at an xBABIP based on exit velocity and launch angle of individual batted balls, then we get a much more significant result. Using Statcast’s search function I found the Phillies’ xBABIP to be .328, which accounts for ~30 extra hits. So, that last sentence in the second-to-last paragraph is just wrong.