Trouble with Aces: Phillies' struggles against top-tier pitching

Phillies fans have become very familiar with aces. Over the last couple of years, the rest of the NL East has become accustomed to dealing with Halladay, Lee, and Hamels whenever they visit Citizens Bank Park. Yet while the Phillies have unleashed their aces upon MLB, other teams' aces have seemed to handle the Phillies' offense with ease. In the playoffs, Tim Lincecum, Matt Cain, and Chris Carpenter have "out-aced" the Phillies' trio on their way to World Series victories. While the Phillies have always had one of the better offenses in the regular season, in the playoffs, their inability to hit against aces has been exposed. Or has it? In this look at the Phillies' offense, I'll examine how the 2011 Phillies performed against these ace-type pitchers, and determine whether or not the Phillies' offense was doomed to fail come October.


Before we begin, let's consider a few questions.

1. Which pitchers can be considered aces?

2. What would differ between the postseason and the regular season?

3. Should only starting pitchers be considered, or should elite bullpen arms be added in as well?

Now, here's my take. To consider aces without shrinking the sample size to an extreme degree, I've defined an ace as "a pitcher whose pitching statistics are at least 10% better than a league-average starter's". However, given the various pitching statistics available, I chose to take three different samples. Thus, my definition of aces for this post includes any pitcher with an ERA-, a FIP-, or an xFIP- of 90 or lower. These "minus" statistics are league- and ballpark-adjusted percentages on a scale where 100 is average, so a FIP- of 90 indicates a FIP that is 10% lower than that of an average pitcher.

Going by this definition, three different groups of pitchers who started against the Phillies were considered aces. Only nine pitchers appeared in all three groups: Josh Johnson, C.J. Wilson, Josh Beckett, Tim Lincecum, Matt Garza, Madison Bumgarner, Tim Hudson, Felix Hernandez, and Jon Lester. Next, while ERA- had a very different idea of who is and isn't an ace, the two advanced statistics also agreed that seven other pitchers were aces: Chris Carpenter, Cory Luebke, Brandon Beachy, Anibal Sanchez, Jaime Garcia, Michael Pineda, and Jonathon Niese. As for the individual differences, ERA- considered Johnny Cueto, Ricky Romero, Ian Kennedy, Matt Cain, Jair Jurrjens, Ross Detwiler, Jorge de la Rosa, Hiroki Kuroda, Jordan Zimmermann, Josh Collmenter, Guillermo Moscoso, Jhoulys Chacin, Daniel Hudson, R.A. Dickey, and Jeff Karstens to be aces. Not exactly the list you'd expect, but that's what happens when you go by results and not by the skills that lead to those results. As for the other groups, FIP- thought that Matt Cain, Ian Kennedy, Jorge de la Rosa, Daniel Hudson, Jordan Zimmermann, Matt Harrison, Johnny Cueto, Mat Latos, and Ricky Nolasco were aces. xFIP- had a smaller group, adding only Yovani Gallardo and Tommy Hanson. Now, you may disagree that some of these pitchers deserve the moniker of "ace", and I'd probably agree with you. But, for this exercise, adding in some very good pitchers in addition to "aces" enables a much larger sample of games to be considered, which can help limit some of the randomness.
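For readers who want to reproduce the grouping, the 90-or-lower cutoff can be sketched in a few lines of Python. This is only an illustration of the definition above: the `Starter` class, the `ace_groups` helper, and the pitcher values are placeholders I made up, not the real 2011 numbers.

```python
from dataclasses import dataclass

ACE_CUTOFF = 90  # "minus" stats: 100 is league average, lower is better


@dataclass(frozen=True)
class Starter:
    name: str
    era_minus: float
    fip_minus: float
    xfip_minus: float


def ace_groups(starters, cutoff=ACE_CUTOFF):
    """Return the set of pitcher names that qualify as aces under each metric."""
    return {
        "ERA-": {s.name for s in starters if s.era_minus <= cutoff},
        "FIP-": {s.name for s in starters if s.fip_minus <= cutoff},
        "xFIP-": {s.name for s in starters if s.xfip_minus <= cutoff},
    }


# Placeholder values for illustration only, not real 2011 statistics.
starters = [
    Starter("Pitcher A", 80, 85, 88),
    Starter("Pitcher B", 85, 95, 101),
    Starter("Pitcher C", 104, 89, 87),
    Starter("Pitcher D", 110, 105, 99),
]

groups = ace_groups(starters)
# Pitchers that all three metrics agree on.
unanimous = groups["ERA-"] & groups["FIP-"] & groups["xFIP-"]
# Pitchers that at least one metric calls an ace.
any_metric = groups["ERA-"] | groups["FIP-"] | groups["xFIP-"]
```

Keeping the three groups as separate sets makes it easy to see where the metrics agree (set intersection) and where only one of them flags a pitcher.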

Moving on, the only major differences between the regular season and postseason are the quality of opponents and the greater number of days off. Fifth starters and bench players are, for the most part, relegated to the sidelines. For pitchers, this means that the bulk of the innings in the postseason go to the number one and two slots in the rotation, with the third and fourth starters usually only getting one start. While not every #1 starter is an ace, even for playoff teams, it necessarily follows that very good pitchers will be getting a much greater share of the innings in the postseason than they do in the regular season. Thus, if a team "pads its stats" against mediocre 4th and 5th starters but performs noticeably worse compared to league average against top-tier starters, it'll be a weaker team in October than it would appear at first glance. Now, for hitters, the only difference worth noting is that teams will utilize all of their starting position players without rest, so long as they're mostly healthy. Thus, I'll only be considering the Phillies' starting position players for 2011. However, as an additional consideration, I've limited this group to only players who are still with the team. Thus, this group consists of Ryan Howard, Chase Utley, Jimmy Rollins, Placido Polanco, Hunter Pence, Shane Victorino, John Mayberry Jr., and Carlos Ruiz. Mayberry didn't get much playing time for the Phillies in the playoffs, but with Ibanez heading to the Bronx, I decided to choose the starting LF for the 2012 Phillies in his stead.

As for the third question, I chose to lean on the side of simplicity. Starting pitchers' routines are the same throughout the regular season and the postseason, give or take an occasional extra day of rest. Relievers, however, can face a team having not pitched for a week, or having pitched for the fifth time in six games. Bullpen usage varies so significantly between the regular and postseason that I felt that adding in relievers would complicate the process unnecessarily without much gain. Also, while it may have expanded the sample size slightly, the work required to expand the data and include relievers would have increased by a far greater amount. As such, this post will not be addressing relievers at all.

Data Collection

Unlike many different types of batter splits, such as R/L, Home/Road, or even GB pitcher/ FB pitcher, the splits I'll be referring to are not available. To remedy that, I went through game by game, using box scores and play logs, in order to separate each of the eight batters' PAs against that game's starter from the PAs against relievers. Then, after aligning each game with that game's opposing starter, I collected most of the available statistics on these pitchers from Fangraphs and Baseball Prospectus. Next, I ordered the pitchers by each of the statistics chosen (ERA-, FIP-, xFIP-) and selected the games that were started by the aforementioned groups of starters. Utilizing these games, I created the splits for each player.
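The game-by-game separation described above could be automated roughly as follows. This is a minimal sketch, assuming each play-log entry has already been parsed into a record naming the batter, the pitcher, and the result. The `split_pas` function, the record keys, and the sample game are all hypothetical, since I did this step by hand.

```python
from collections import defaultdict


def split_pas(play_log, starter):
    """Split one game's plate appearances by whether the starter was pitching.

    play_log: list of dicts with the (hypothetical) keys
              "batter", "pitcher", and "result".
    Returns two dicts mapping batter -> list of results.
    """
    vs_starter = defaultdict(list)
    vs_relievers = defaultdict(list)
    for pa in play_log:
        bucket = vs_starter if pa["pitcher"] == starter else vs_relievers
        bucket[pa["batter"]].append(pa["result"])
    return dict(vs_starter), dict(vs_relievers)


# Made-up game log for illustration only.
game = [
    {"batter": "Batter 1", "pitcher": "Starter X", "result": "single"},
    {"batter": "Batter 2", "pitcher": "Starter X", "result": "strikeout"},
    {"batter": "Batter 1", "pitcher": "Starter X", "result": "walk"},
    {"batter": "Batter 2", "pitcher": "Reliever Y", "result": "groundout"},
]

vs_starter, vs_relievers = split_pas(game, "Starter X")
```

Only the `vs_starter` half feeds into the splits in this post. The reliever PAs are set aside.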

However, simply producing splits of this nature without any context would be a meaningless exercise. To provide this context, I also calculated the league average splits against each pitcher (simply using their total singles, doubles, walks, HBPs, PAs, etc.). Then, player by player, I weighted each pitcher's league-wide splits against by the number of PA that batter had facing him in each game. This produced the splits for what a league average batter would have done in the plate appearances of each individual player.
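As a rough illustration of that weighting, here is a short Python sketch for a single rate statistic. The `expected_rate` function and the numbers are assumptions for the example (the values are not real data). The idea is simply a PA-weighted average of what the league did against each pitcher a batter faced.

```python
def expected_rate(pa_by_pitcher, league_rate_by_pitcher):
    """PA-weighted league-average rate for the pitchers one batter faced.

    pa_by_pitcher: pitcher -> number of PA the batter had against him.
    league_rate_by_pitcher: pitcher -> the league's rate (e.g. OBP) against him.
    """
    total_pa = sum(pa_by_pitcher.values())
    if total_pa == 0:
        raise ValueError("batter has no plate appearances in this sample")
    weighted_sum = sum(
        pa * league_rate_by_pitcher[pitcher]
        for pitcher, pa in pa_by_pitcher.items()
    )
    return weighted_sum / total_pa


# Made-up numbers for illustration only.
pa_faced = {"Pitcher A": 6, "Pitcher B": 3}
league_obp_against = {"Pitcher A": 0.300, "Pitcher B": 0.330}

baseline_obp = expected_rate(pa_faced, league_obp_against)
# (6 * 0.300 + 3 * 0.330) / 9 = 0.310
```

A batter who saw more of the tougher pitcher gets a lower baseline, which is the whole point: each player is compared to an average hitter facing his exact mix of opponents.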

Now, with the context available, I determined the difference between each of the eight batters' performance against "aces" compared to league average and their performance overall against starters compared to league average. Rather than clutter up this post with thirty-two different sets of data, I felt it prudent to simply compute percentage differences. So, in the charts below, the percentages will all refer to the ratio of the specific player's statistic to the league average player's statistic against aces, divided by the ratio of the specific player's statistic to the league average player's statistic against all starters. Follow that? I doubt it, seeing as I'm pretty sure that I don't, so let me explain. The ratio of the specific player to the league average player against aces shows how far above or below an average player he is in that statistic against aces. This is nice, but, again, context is crucial. Therefore, by dividing that value by the ratio of the specific player to the league average player against all starters, what you end up with is the percentage difference between how that player actually performed against aces and how he would have performed if he had done exactly as well against aces, relative to league average, as he did against all starters. For example, if Raul Ibanez hits 20% above league average against all starters, but hits 10% below league average against aces, this percentage would calculate his difference from the expected to be (performance against aces vs. average)/(performance against all starters vs. average) - 1, or (90%)/(120%) - 1, which would be -.25, or -25%. In this case, Ibanez is hitting 25% worse compared to the average against aces than he does overall against starters.
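If the ratio-of-ratios is easier to read as code, here is the same calculation as a small Python sketch. The function names are my own shorthand for this post, and the inputs are the hypothetical Ibanez figures from the example above.

```python
def relative_to_league(player_stat, league_stat):
    """How a player's statistic compares to the league baseline (1.0 = average)."""
    return player_stat / league_stat


def ace_split_change(vs_aces_ratio, vs_all_starters_ratio):
    """Percentage change against aces, relative to performance against all starters.

    Both arguments are player-to-league ratios, as returned by relative_to_league.
    A result of -0.25 means the player was 25% worse than expected against aces.
    """
    return vs_aces_ratio / vs_all_starters_ratio - 1


# Hypothetical example from the text: 10% below average against aces,
# 20% above average against all starters.
change = ace_split_change(0.90, 1.20)
print(f"{change:.0%}")  # -25%
```

A value of zero means the player held his usual edge (or deficit) over the league when facing aces. Negative values mean he lost ground.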

Now that you've got an idea of what you're looking at, let's delve into the data.


So, what kind of picture does the ERA- data set paint? Well, it's not positive, that's for sure. Howard declines by nearly 40% against aces, with his triple slash falling almost uniformly. He hits far fewer line drives against aces than expected, and sees his walk rate against them fall to less than half of his expected walk rate. In fact, the whole team appears to be almost completely unable to draw a walk against aces, with Utley, Polanco, and Ruiz all seeing massive declines.

Now for the totals, I've provided an average total, weighting each player equally, and a weighted total with each player's numbers weighted by the number of PA they had. While the average total is what would actually occur in a game situation, the very small sample sizes for some of the players make me wary. I've provided it in the tables in case anyone is interested in seeing how it differs from the weighted total, but for any analysis, I'll stick with the weighted data.
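To make the difference between the two totals concrete, here is a minimal Python sketch. The function names and the two-batter sample are invented for illustration and do not come from the tables.

```python
def simple_average(changes):
    """Average total: every player counts equally."""
    return sum(changes.values()) / len(changes)


def pa_weighted_average(changes, pa):
    """Weighted total: each player's number counts in proportion to his PA."""
    total_pa = sum(pa[name] for name in changes)
    return sum(changes[name] * pa[name] for name in changes) / total_pa


# Made-up percentage changes and PA counts for illustration only.
changes = {"Batter 1": -0.40, "Batter 2": 0.10}
pa = {"Batter 1": 60, "Batter 2": 20}

avg_total = simple_average(changes)                # (-0.40 + 0.10) / 2 = -0.15
weighted_total = pa_weighted_average(changes, pa)  # (-24 + 2) / 80 = -0.275
```

In this toy case the batter with only 20 PA pulls the simple average up far more than his sample justifies, which is why I lean on the weighted total.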

The team's performance against these ERA- aces deviated by a fairly large amount from what one would expect, and not in a good way. The large declines in LD% and BB% and the uptick in K% indicate that the team is far worse against these aces than it is against the average starter when compared against a league average batter. The decline in LD% would at least partially explain the decline in BABIP, indicating that it may not just be random BABIP variation driving down the team's numbers. To clarify, the BABIP and HR/FB% numbers are not based only upon luck for batters. Batters have different true-talent HR/FB% (Ryan Howard vs. Placido Polanco), and the same goes for BABIP. Some of the difference may be random variation, but an unknown percentage of it could also be a change in the mean true-talent value. The Phillies' batters may actually have a lower true-talent BABIP against aces compared to league average than they do against all starters. The same goes for HR/FB. Of course, there's really no way to know whether or not this is the case without repeated testing, and that's impossible for something like this. The only thing that it's really fair to say is that the team's decline shouldn't simply be waved away as random variation. In the ERA- sample, the 2011 Phillies truly seem to be a "regular season" offensive club, beating up on weaker starters while failing against playoff pitchers. The ERA- aces were fairly different from the other two groups, however, so let's examine whether either of those sets of numbers has a different tale to tell.

Before we get into the FIP- data, I'd like to clarify the differences between these two types of "aces" with data that isn't shown in any of the tables posted here. The FIP- sample of pitchers here actually had a BB% similar to the ERA- sample, 7.1% and 7.0% respectively. However, the difference between these groups comes from K%, where the ERA- pitchers struck out 19.2% of the batters they faced while the FIP- pitchers struck out 21.5%. How, then, did the ERA- pitchers have lower ERAs? The answer is simple: BABIP. For ERA- pitchers, it was 0.274, while for FIP- pitchers it was 0.295. With that out of the way, here is the FIP- data.


The FIP- data is not quite as bad for the Phillies, but the decline is still evident. Walks are down, as are line drives. Ryan Howard once again sees almost across-the-board declines in his skills. His strikeout rate compared to league average goes up by 11%, and his walks and line drives decline by 21% and 16% respectively. He also sees major declines in BABIP and HR/FB. With all of that in tandem, he sees a 30% decline in his triple slash against aces compared to league average (okay, I hope this part is implied by now, as I'm going to stop repeating it from this point on). Jimmy Rollins also falls off, but his drop-off seems to be less a decline in skills than a decline in his BABIP and HR/FB. Unlike the team totals, the player PA totals aren't that large. Therefore, most of the variation in BABIP and HR/FB based statistics should be chalked up to random variation instead of a change in the true-talent mean value. Back to Rollins: his decline is especially odd, given that he actually has fewer strikeouts and more walks against aces. His decline, then, seems much less likely to be representative of his future performance against aces than Howard's. The only other notable data point among the players is that the FIP- ace data concurs with the ERA- ace data on one player's skill: Chase Utley's walk rate. Utley's walk rate compared to average declines by over 50% against aces. That's concerning, but for Utley, his skill at getting hit by pitches may not be accounted for here, which could explain why he doesn't see any decline in OBP. The lack of OBP decline also comes partly from his increased BABIP and HR/FB, but certainly not all of it.

Now, as for the team totals, the story is similar to the ERA- data. Declines in BB%, LD%, BABIP, and HR/FB% lead to a nearly 10% across the board decline in the triple slash statistics. The sample size here is larger, so this data is a bit more significant. It seems fairly conclusive that the Phillies are, in fact, faltering against aces. The decline isn't as large in this data set, but it's still there. There is still the xFIP- data set remaining, however, so let's find out whether or not it confirms this effect.

Again, let's first see what differences there are between these two data sets. For xFIP and FIP, the only major difference should be HR/FB%. And that turns out to be the case, as the FIP- group has a HR/FB% of 8.4%, while the xFIP- group has a HR/FB% of 9.1%. However, there are other differences. The xFIP- group has a K% of 22.6%, whereas the FIP- group has a K% of 21.5%. The xFIP- and FIP- groups' BB% vary slightly, at 7.2% and 7.0% respectively. The only other notable difference is GB%, where the xFIP- group has a GB% of 47.2% and the FIP- group has a GB% of 46.5%. With that in mind, let's see what the xFIP- data tells us about the Phillies' performance against aces.


The xFIP- group is mostly similar to the FIP- group, so it makes sense that the same type of trends emerge. Ryan Howard and Chase Utley struggle to draw walks, and Howard sees a decline in BABIP and HR/FB% that drags down his triple slash by nearly 25%. Rollins has an even odder set of numbers, with a massive increase in his BB% and decline in his K%, but with correspondingly large declines in BABIP and HR/FB%. Carlos Ruiz finally deserves a mention here, as his BB% increases even more than Rollins' does. Still, the team totals paint a similar picture here. The team is unable to draw nearly as many walks against aces, and has a lower BABIP against them as well. However, for the first time, the team's HR/FB% doesn't decline. The only other big change here is that the team actually strikes out at a lower rate than expected. Still, all of this in conjunction leads to about a 7% decline in the team's triple slash. It's not quite as strong a change, but even in the xFIP- data, the team is still classifiable as a "regular season" offensive club. A decline of 7% compared to league average against aces would be fairly significant. While some of it may still be random variation and not a real shifting of the mean, the 8% decline in OBP against aces would transform the 2011 Phillies into the 2011 Padres, going from a .323 OBP to a .305 OBP. If even a part of that is a real shift in the mean, it's a pretty big deal.

With all three data sets covered, let me begin with a disclaimer before I draw any conclusions. This exercise was anything but statistically rigorous. Even though I warned about sample size issues, I still may have drawn too much from the individual player data. Additionally, this exercise was far more descriptive than it was predictive. Any conclusions drawn about the production of the 2011 team against aces don't automatically indicate anything about the 2012 team. Even for the team totals, samples of 550-750 PA are still subject to random variation.

That said, if a conclusion can be drawn from this exercise, it's that the 2011 Phillies were indeed a team with a "regular season" offense, and a weaker playoff team than their 102 wins would indicate. Ryan Howard managed to hit like Michael Martinez against aces, and the entire team failed miserably at drawing walks, especially Chase Utley. This effect alone would not explain the loss to the Cardinals, but it may have been a factor. It's sad to say, but unlike most of the groupthink that has surrounded this team the past few years, the 2011 Phillies really do appear to be a team with an offense that was guaranteed to falter in the playoffs.

One final note: Although I did this work with this analysis in mind, I now have the capability to sort the pitchers' statistics in any way that Fangraphs or Baseball Prospectus offers. If any of you have an interesting idea about a way to sort pitchers, I'd consider it for a future article.