Jeff Sullivan is just the latest Fangraphs author to argue that Cole Hamels is not as good as he actually is, and is worth less in a trade than he actually is. Since the run-up to 2014's trade deadline we've been reading articles on Fangraphs minimizing Cole's success on the mound, the value of his current contract, and how much the Phillies should expect to receive for him in a trade. If you read FG and believe what they--mostly Dave Cameron--say about Cole, then you are probably angry that the Phillies might have Cole throw even a single pitch for them this season. The gall of the Phillies to ask for so much for so little! Doesn't RAJ know he owes the other teams, who are ready to compete now, an all-star caliber run-preventer?
Some of that is unfair. But, in my defense, the arguments for the standard FG position on Cole are bad enough that they invite hyperbolic, conspiratorial interpretation. Ok, fine, that's unfair too. Nevertheless, the arguments are frustratingly abbreviated and selective. It's great that we have lots of means to quantify on-field performances and compare them and even evaluate future value. But before we rush off to the quantification we have to make sure that the quantificational methods are better than the heuristic methods they are to replace. Otherwise, we risk being seduced by the Siren Quanta.
Let's get to the point of contention. Actually, points. Two, in fact. They are:
1. That it is a settled opinion among the statistically wise that Cole is not as good as Scherzer or Lester.
2. That the Phillies should receive non-top prospects for Cole in a trade.
Take 1. first because it is 1. If you look only at FG's metrics, 1. seems to be undeniably true. After all, since 2010 Cole has amassed around 20 fWAR whereas Scherzer and Lester have produced 22-23. At about half a win per season, that's a HUGE difference. VERY strong evidence for the conclusion. [Error bars, anyone?] If we look at the individual seasons, however, the evidence improves. Lester has 4 seasons in his career at or above 5 fWAR. Scherzer has 2. Cole has none. Maybe Cole is more consistently above average than the others, but he also doesn't have the peak talent they do. Maybe. If we are only looking at FG, sure.
But why should anyone limit their evidence set to a subset of what's available and reliable? While FG's metrics--their methods for deriving summative calculations of production like FIP and WAR--are solid and useful, they are not the only ones, and they are based on weighting certain aspects of a player's performance more than other aspects. Specifically, for pitchers, FG relies heavily on the premise that pitchers do not reliably influence their outcomes on balls in play, except that they can be better or worse at generating ground balls. Admittedly, FG writers will acknowledge that their pitching metrics are limited by this premise. But when they write up their conclusions, that epistemic humility evaporates.
The lack of humility is frustrating because there is a wealth of evidence that contradicts the settled FG opinion on Cole. Our own John Stolnis summarizes that evidence in an excellent article on numberFire. Here's the evidence in a nutshell:
Rk | Player | WAR | IP | BB | SO | ERA | FIP | ERA+ | OPS+ |
---|---|---|---|---|---|---|---|---|---|
1 | Clayton Kershaw | 33.6 | 1099.2 | 281 | 1160 | 2.26 | 2.54 | 163 | 60 |
2 | Cole Hamels | 27.8 | 1064.2 | 266 | 1021 | 3.00 | 3.27 | 129 | 83 |
3 | Felix Hernandez | 27.5 | 1155.2 | 285 | 1141 | 2.78 | 2.84 | 136 | 75 |
4 | Justin Verlander | 26.2 | 1138.0 | 328 | 1084 | 3.23 | 3.17 | 128 | 75 |
5 | Cliff Lee | 26.0 | 960.0 | 132 | 924 | 2.95 | 2.80 | 133 | 79 |
6 | Jered Weaver | 23.1 | 1016.1 | 257 | 859 | 2.99 | 3.57 | 127 | 79 |
7 | David Price | 21.9 | 1079.0 | 266 | 1033 | 3.08 | 3.11 | 124 | 80 |
8 | Max Scherzer | 21.5 | 1013.0 | 305 | 1081 | 3.52 | 3.32 | 118 | 86 |
9 | Adam Wainwright | 19.7 | 897.2 | 193 | 795 | 2.89 | 2.83 | 131 | 77 |
10 | Johnny Cueto | 19.6 | 863.0 | 235 | 705 | 2.73 | 3.50 | 143 | 78 |
11 | Gio Gonzalez | 18.5 | 956.1 | 391 | 929 | 3.22 | 3.35 | 123 | 78 |
12 | Hiroki Kuroda | 18.5 | 1018.1 | 226 | 783 | 3.36 | 3.62 | 117 | 89 |
13 | Doug Fister | 18.3 | 921.2 | 174 | 633 | 3.29 | 3.42 | 122 | 86 |
14 | Anibal Sanchez | 18.1 | 895.0 | 266 | 830 | 3.43 | 3.10 | 119 | 84 |
15 | Jon Lester | 17.8 | 1038.0 | 341 | 970 | 3.54 | 3.48 | 117 | 86 |
That's right. By Baseball Reference's rWAR, Cole has been by far the best pitcher among the three for the last 5 seasons. Whereas FG pegs Cole as an above average starter, BR sees Cole as a perennial all-star. Given the vast difference between Cole and Lester and Scherzer over the rather large sample, it is ridiculous to settle on the opinion that Cole is not as good as Lester or Scherzer. The evidence is too discordant. It would be more reasonable to say that all three are roughly equal and then go on looking for a reasonable way to draw any stronger conclusions.
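To put the two sets of numbers side by side, here is a rough back-of-the-envelope sketch. The rWAR totals are copied from the table above; the fWAR totals are only the approximate figures quoted earlier in this piece ("around 20" and "22-23"), with 22.5 as an assumed midpoint, so treat the fWAR line as illustrative rather than looked up.

```python
# rWAR totals copied from the table above (last five seasons).
RWAR = {"Hamels": 27.8, "Scherzer": 21.5, "Lester": 17.8}

# Approximate fWAR totals as quoted earlier in the article.
# 22.5 is an assumed midpoint of "22-23", not a sourced figure.
FWAR_APPROX = {"Hamels": 20.0, "Scherzer": 22.5, "Lester": 22.5}

SEASONS = 5


def per_season_gap(totals, player, other, seasons=SEASONS):
    """Average wins per season by which `player` leads `other` (negative if he trails)."""
    return (totals[player] - totals[other]) / seasons


for other in ("Scherzer", "Lester"):
    r_gap = per_season_gap(RWAR, "Hamels", other)
    f_gap = per_season_gap(FWAR_APPROX, "Hamels", other)
    print(f"Hamels vs {other}: rWAR {r_gap:+.2f}/season, fWAR {f_gap:+.2f}/season")
```

On these inputs the two systems do not just differ in size, they point in opposite directions, which is the discordance described above.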
So, why do FG and BR disagree so much on Cole? Well, each differs in how it calculates pitching production. As I explained above, FG derives pitching value primarily from K%, BB%, HR/FB%, and GB%. These, they say, are the aspects of a pitcher's performance over which he has the most control and which can be measured independently from how the defense behind the pitcher performs. True enough. But the pitcher might deserve credit for other aspects of his performance that are not measurable independently from how his defense performs. In other words, some pitchers might be able to prevent runs from scoring by means not represented by the four stats above. In fact, studies since Voros McCracken's revolutionary conclusion that pitchers do not have BABIP skill have shown that pitchers do have influence over their BABIPs, albeit to a much lesser extent than they have over Ks and BBs. The mythical weak contact pitcher is not really a myth at all. (Of course, prior to McCracken baseball's common wisdom overestimated the extent to which weak contact could be a skill.)
In light of the possibility that a pitcher deserves some credit for his batted ball outcomes, BR has derived its pitching WAR from the runs actually scored against a pitcher (including unearned runs). Rather than use ERA they use runs against per 9 innings (RA9) as a base because even when errors are made pitchers often have an opportunity to prevent the run from scoring. BR then makes adjustments to the RA9 based on the quality of opposition the pitcher faced, the parks in which he pitched, and the defense that played behind him. By adjusting the RA9 this way BR attempts to isolate the runs that are attributable to the pitcher's own performance rather than the hitters he faced, his defenders, or the assortment of parks he pitched in.
To sum up the difference between FG and BR on pitchers, FG judges them by a core of reliable performance measures but ignores factors (skills?) harder to measure, whereas BR starts with the holistic results of the pitcher's performance and then figures out what shouldn't be attributed to the pitcher himself. Which of these methods better approximates a pitcher's production? That's still an open question, which is just one more reason not to ignore the evidence that either method provides.
So much for point of contention 1. On to point of contention 2. For months now FG has been carping about RAJ's demands in return for Hamels. Three top prospects! Outlandish! Preposterous! He should get a middling prospect and be thankful he doesn't have to pay the best pitcher in franchise history anymore.
Look. I doubt the Phils can get a prospect haul for Hamels. Now that it has become impossible to build a winning team with players paid at free agent market value, teams would rather hold onto their chances at cheap production than move them for more certain and immediate production. That's just the way the MLB world is. But there is a big difference between a prospect haul (e.g., Bryant, Almora, Edwards from the Cubs) and Eduardo Rodriguez. First, the Phils should expect a better return than Rodriguez because recent trades have indicated that a pitcher as good as Hamels is worth more than that. Second, the Phils shouldn't make that trade. Hamels is worth more to them as a veteran fan-favorite on a rebuilding and perhaps rebuilt team than Rodriguez would likely be.
But this disagreement over what the Phillies "should" get for Hamels in a trade is not what I find galling about the standard FG opinion. I find the reasoning for it, ummm, simplistic. Sullivan and Cameron determine the value of Hamels in a trade by calculating the surplus value of Hamels's production over what his contract will pay him. They then find a prospect package that equals that surplus value and call it an even trade. Let's jump right to a witty retort to that line of reasoning (brought to you by phillsandthrills): "Hey, this below average player is being paid 500,000 a year, he's netting just as much WAR above his contract as Cole Hamels, let's call that an even trade!" Yep, that makes sense.
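For anyone who wants to see the retort in numbers, here is a minimal sketch of the surplus value arithmetic as described above: the market value of a player's production minus what his contract pays. The dollars-per-win price and both player lines are hypothetical, picked only so the arithmetic comes out clean; none of them is a real projection or a real contract.

```python
# Hypothetical market price of one win. An assumption for illustration only.
DOLLARS_PER_WAR = 7_000_000


def surplus_value(projected_war, salary, dollars_per_war=DOLLARS_PER_WAR):
    """Market value of projected production minus what the contract pays."""
    return projected_war * dollars_per_war - salary


# Made-up one-year inputs: an expensive ace and a cheap below-average regular.
ace = surplus_value(projected_war=4.0, salary=21_500_000)
cheap_regular = surplus_value(projected_war=1.0, salary=500_000)

print(f"ace surplus:           ${ace:,.0f}")
print(f"cheap regular surplus: ${cheap_regular:,.0f}")
print("even trade by surplus value?", ace == cheap_regular)
```

With these inputs the two surpluses are identical, so the framework calls a four-win ace and a one-win player an even swap. Nothing in the calculation knows that four wins from a single roster spot are harder to find than one.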
But let's not leave it at a quick demonstration of the absurdity. We could be allowing our biases to deter us from accepting the cold rationality that surplus value presents. We wouldn't want to be accused of fanaticism, would we? So, why is the surplus value analysis ridiculous? Start with this basic question: what does the analysis show? Is it a description of how the trade market works? Or is it a normative argument about what someone ought to pay for Hamels?
Sullivan at one point claims it is the former: "Whether you like arguments based on contract surplus value, that's the framework by which moves get made, even if unintentionally." Sullivan here clearly stakes out a descriptive claim about the trade market, but he doesn't present any evidence for it. So, I went looking briefly and found this article on DRays Bay that falsifies the descriptive claim, at least as applied to starting pitchers. Granted, I did not do a thorough search and perhaps there is evidence that the trade market works on contract surplus value. But I'm skeptical. First, the MLB trade market is not a stable entity. Many factors determine how and when moves get made and those factors change as the game changes. It seems unlikely they would coalesce around any particular conception of market rationality. Second, most trades do not involve commodities like Hamels. Even if you discover that the average trade market behaves a certain way it will not be clear whether Hamels is just another commodity or a market-bending one. I don't yet see a reason to treat the surplus value analysis as a descriptive analysis. But I'm open to hearing more evidence.
On the other hand, the analysis might be a normative one: surplus value is how teams ought to negotiate trades, even if they don't do it that way. Against this I give you the wise words of dajafi: "The delta on any prospect--ANY prospect--is much bigger than it is for any established MLB player, let alone a superstar, that the whole 'surplus value' argument is stupid. It's quite possible that even Boston's best prospects will fall way short of their hype." By delta, dajafi is referring to the distribution of outcomes for prospects. In order to find the surplus value of a prospect the analysis must average the outcomes for prospects of various levels. In order to distinguish levels of prospects it uses the rankings and buckets them into classes (very best, very good, good, middling, etc.). The average production outcomes for these classes are then used to calculate their surplus value based on what they earn on average over their lifetime of team control. What this average obliterates is the bust rate of even the very best prospects. When considering what Hamels is worth in a trade, we are not concerned only with the average outcome for the class of prospect involved. We are also concerned with the likelihood that the particular prospect will actually achieve that average outcome or something within a certain range of it. If Hamels's expected surplus is equal to a given prospect's surplus but Hamels is much more likely to achieve his surplus value or something close to it, then the prospect is not as valuable as Hamels. Given the rate at which prospects bust and the aging curves for pitchers like Hamels, Hamels is much more likely to provide the surplus value Sullivan determined for him than any one prospect no matter how good that prospect is, let alone Eduardo Rodriguez. The surplus value analysis overlooks the risk prospects present. At the very least, I've seen no evidence that the risks are equivalent.
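Here is a toy version of dajafi's point, sketched under loudly stated assumptions. Each asset is a short list of (probability, surplus) outcomes. Every number is invented so that the two averages match; none of it comes from an actual study of bust rates or aging curves.

```python
# Each asset is a list of (probability, surplus in $M) outcomes.
# All figures are made up for illustration; they are not real bust rates.
veteran = [(0.1, 5), (0.8, 20), (0.1, 35)]
prospect = [(0.6, 0), (0.3, 40), (0.1, 80)]


def expected_surplus(outcomes):
    """Probability-weighted average surplus, the only number the averaging step keeps."""
    return sum(p * value for p, value in outcomes)


def prob_at_least(outcomes, threshold):
    """Chance that the asset delivers at least `threshold` in surplus."""
    return sum(p for p, value in outcomes if value >= threshold)


for name, outcomes in (("veteran", veteran), ("prospect", prospect)):
    mean = expected_surplus(outcomes)
    # How often does the asset deliver at least 75% of its own average?
    hit_rate = prob_at_least(outcomes, 0.75 * mean)
    print(f"{name}: expected ${mean:.1f}M, P(at least 75% of that) = {hit_rate:.2f}")
```

Both assets average the same surplus, yet under these made-up distributions the veteran clears most of that figure far more often than the prospect does. A comparison that stops at the average cannot tell the two apart.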
So, why should we use surplus value analysis to devise rational trades?
From this criticism of the normative analysis we can generalize a criticism about the dangers of the march of quantification in baseball media. The sabermetric revolution was great at poking holes in the traditional wisdom about what is valuable in player performance and how to construct the best teams. Researchers posed questions, sought evidence, and drew limited conclusions. The further the research developed, the more reliable the quantificational methods seemed, and the more eager fans became to expand the methods of sabermetric research to other areas where heuristic judgments still reigned. Today, at times, it seems to this baseball fan that any quantification is preferred to any heuristic. But quantifications hastily developed can be just as distorting as the traditional wisdom locked away within particularistic judgments about how to value performances, players, prospects, etc. Any quantificational scheme presupposes that the thing quantified is fungible and fungibility is only possible where the factors contributing to the thing measured are satisfactorily accounted for. Perhaps before we leap to strong conclusions about, e.g., how to quantify the assets in a trade, we should work a little harder at developing the basis for that quantification. Let's put questions before conclusions.
tl;dr: