Using data to be uncertain

Kyle Kendrick's BABIP in his last 14 starts is .342, after a .272 BABIP in his first 13 starts. Over his career, Kendrick's BABIP is .282, and .279 since the start of the 2012 season (including the last 14 starts). Obviously the first 13 starts are in line with his career BABIP, but as the Phillies consider whether to tender Kendrick a contract, the question is whether the last 14 starts are an aberration from his career mean or a statistically meaningful trend.

BABIP may veer up or down, but over time it settles near the mean. There are bounds, though, on how far it should stray from the mean if luck is the only explanation. A BABIP that stays outside those bounds over a sustained stretch may reflect a trend, and luck becomes a less satisfying explanation.

In his career, Kendrick has pitched in 191 games, with a mean BABIP of .282 and a standard deviation of .138. With games as the unit of analysis, the 95% confidence interval around his mean BABIP runs from about .262 to .301 (the mean plus or minus 1.96 standard errors, where the standard error is .138/√191, or roughly .010). Because that interval describes his mean and not a single outing, we wouldn't necessarily expect a BABIP in those bounds for any particular game, but to see a BABIP repeatedly outside of those bounds over a period of time suggests that luck is a bad explanation for the variation.
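As a rough check on where the .262 and .301 bounds come from, here is a minimal sketch that assumes per-game BABIP is approximately normal, which is a simplification. The variable names are illustrative; the inputs are the figures quoted above. The bounds match the mean plus or minus 1.96 standard errors, so they describe the career mean. Under the same assumption, the band for a single game is far wider.

```python
import math

# Figures quoted in the post; variable names are illustrative.
career_games = 191
mean_babip = 0.282
sd_per_game = 0.138
z_95 = 1.96  # two-sided 95% critical value for a normal distribution

# Interval for the career MEAN: uses the standard error, which shrinks with sqrt(n).
std_error = sd_per_game / math.sqrt(career_games)
mean_low = mean_babip - z_95 * std_error
mean_high = mean_babip + z_95 * std_error

# Rough range for a SINGLE game under the same normal assumption: uses the raw SD.
game_low = mean_babip - z_95 * sd_per_game
game_high = mean_babip + z_95 * sd_per_game

print(f"95% interval for the mean: {mean_low:.3f} to {mean_high:.3f}")
print(f"95% range for one game:    {game_low:.3f} to {game_high:.3f}")
```

The contrast between the two ranges is the point: a single start well above .301 is ordinary, while a long run of them is what would need explaining.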

Kendrick's individual game BABIP bounced around in the first half of 2013, from a low of .105 to a high of .375. In the second half, he has exceeded that .375 in seven of 14 starts, and the .375 was itself well outside the expected range. The odds of this happening purely by chance are less than 5%. So something's going on, right?
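One way to sanity-check a "less than 5%" figure is a binomial tail calculation. The sketch below is not necessarily how the number above was derived. It assumes single-game BABIP is normally distributed around the career mean with the career standard deviation, and that starts are independent; both assumptions are crude for a statistic built from a handful of balls in play per game.

```python
from math import comb
from statistics import NormalDist

# Figures quoted in the post; variable names are illustrative.
mean_babip = 0.282
sd_per_game = 0.138
threshold = 0.375   # first-half high
starts = 14         # second-half starts
high_starts = 7     # starts with BABIP above the threshold

# Assumption: single-game BABIP ~ Normal(mean, sd), and games are independent.
p_exceed = 1 - NormalDist(mean_babip, sd_per_game).cdf(threshold)

# Binomial upper tail: probability of at least `high_starts` exceedances in `starts` games.
p_at_least = sum(
    comb(starts, k) * p_exceed**k * (1 - p_exceed) ** (starts - k)
    for k in range(high_starts, starts + 1)
)

print(f"P(one start above {threshold}): {p_exceed:.3f}")
print(f"P(at least {high_starts} of {starts} above it): {p_at_least:.3f}")
```

Under these assumptions the tail probability comes out a little under 5%. A small p-value on a window chosen after the fact is still weak evidence, since the cut-off was picked because the stretch looked unusual.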

Well, in 5 of the other second-half starts, Kendrick's BABIP was below the lower bound of the expected range, and there is little evidence of any trend apart from 3 mid-August starts with BABIPs all above .350. Otherwise, he has alternated between a good BABIP and a bad one.

So what do the last 14 games point to? A BABIP trend that is extremely unlikely and likely due to "something"? A streak of bad luck and nothing more?

This is the challenge for a GM who uses data. The data require interpretation, and could lead to two conclusions, one of which is wrong:

1. Kendrick's BABIP has been so high recently that chance is an unlikely explanation and something is wrong -- he's missing spots, he's hurt, etc.

2. Kendrick's BABIP has been high lately, but it's likely an anomaly and he'll still usually be around .282, which is valuable for a mid-rotation starter.

My two cents -- Selecting the last 14 games is cherry-picking a cut-off point. His BABIP this year is .307, and over the past two years it's .279. In other words, increasing the sample size to include last year's starts shows that his BABIP is right around the career mean. The last 14 starts represent about 7% of his career games, and at least that share of his career starts. The fact that his BABIPs in the past 14 starts have been so far outside his career confidence bounds isn't all that surprising -- with a per-game standard deviation of .138, single games regularly land well outside a band that narrow. It's odd that so many of his second-half starts have been so much higher than his norm, but there's no reason to think it's anything other than odd at this point. The most likely outcome is that his BABIP will settle in and regress to that career mean of .282. That said, it is entirely possible that his BABIP will remain generally high and settle at a new, higher mean level.

Data may not lie, but they also can't predict the future. Even a GM who is sophisticated with data deals with significant uncertainty. With his solid health record and age, I'd probably keep him around for another couple of years.

