Raul Ibanez may not be streaky (statistically)
I ran a homemade study this morning to test the notion that Raul Ibanez is "streaky." Using Baseball Reference game logs, I grouped his stats into analysis blocks and tested whether his OPS in one block of games correlates with his OPS in the next block. Each block contains k games started. Stats from games in which he appeared as a substitute are still included: for each game he did not start, I carried his hitting statistics over to the next game. I ran the study for k = 3, 5, 7, 10, and 15.
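Here is a minimal sketch of the blocking procedure described above. It assumes a hypothetical per-game record (`GameLine`) with standard counting stats; the field names and the choice to drop an incomplete final block are illustrative assumptions, not details of the original study. Adding a non-started game's stats to the block in progress is equivalent to carrying them over to the next start. The lag-1 Pearson correlation pairs each block's OPS with the following block's OPS.

```python
from dataclasses import dataclass
from math import sqrt


# Hypothetical per-game record; field names are illustrative only.
@dataclass
class GameLine:
    started: bool
    ab: int = 0
    h: int = 0
    doubles: int = 0
    triples: int = 0
    hr: int = 0
    bb: int = 0
    hbp: int = 0
    sf: int = 0


FIELDS = ("ab", "h", "doubles", "triples", "hr", "bb", "hbp", "sf")


def ops(t):
    """OPS = OBP + SLG, computed from a dict of summed counting stats."""
    obp_denom = t["ab"] + t["bb"] + t["hbp"] + t["sf"]
    obp = (t["h"] + t["bb"] + t["hbp"]) / obp_denom if obp_denom else 0.0
    total_bases = t["h"] + t["doubles"] + 2 * t["triples"] + 3 * t["hr"]
    slg = total_bases / t["ab"] if t["ab"] else 0.0
    return obp + slg


def block_ops(games, k):
    """OPS for each consecutive block of k started games.

    Stats from non-started games are added to the block in progress, so they
    ride along with the next start. A final block with fewer than k starts is
    dropped (an assumption).
    """
    blocks = []
    totals = dict.fromkeys(FIELDS, 0)
    starts = 0
    for g in games:
        for f in FIELDS:
            totals[f] += getattr(g, f)
        if g.started:
            starts += 1
            if starts == k:
                blocks.append(ops(totals))
                totals = dict.fromkeys(FIELDS, 0)
                starts = 0
    return blocks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lag1_correlation(values):
    """Correlate each block's OPS with the next block's OPS."""
    return pearson(values[:-1], values[1:])
```

For one season you would call `lag1_correlation(block_ops(season_games, k))` once per value of k, where `season_games` is a hypothetical list of that season's `GameLine` records in date order.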
Correlation in OPS between consecutive blocks of k starts

Season | k=3   | k=5   | k=7   | k=10  | k=15
2002   |  0.11 |  0.05 |  0.31 |  0.22 | -0.10
2003   | -0.22 | -0.09 | -0.17 | -0.44 | -0.22
2004   |  0.04 | -0.33 |  0.01 | -0.52 | -0.71
2005   |  0.11 |  0.11 | -0.15 | -0.65 | -0.20
2006   | -0.04 |  0.14 |  0.05 | -0.40 |  0.10
2007   |  0.21 |  0.22 |  0.11 |  0.18 | -0.17
2008   |  0.04 |  0.21 | -0.05 |  0.15 | -0.11
2009   |  0.08 |  0.09 |  0.19 |  0.32 |  0.34
2010   | -0.03 |  0.02 | -0.11 | -0.15 | -0.55
Avg    |  0.03 |  0.05 |  0.02 | -0.14 | -0.1
Comments
Interesting study. I don’t know if this would detect streakiness, since a streak would most likely begin somewhere in the middle of a block.
Two comments about methodology:
- I think you should include a bigger k (30?), as his streaks tend to be 3- to 8-week affairs rather than 1 or 2 weeks. Although I suppose a streak would then be even more likely to begin in the middle of a block.
- An average of the coefficients doesn’t seem useful to me, since the positive and negative correlations cancel each other out. What you could do instead is calculate an average of the absolute values of the coefficients.
Celebrating 50 years of slightly more Phils wins than losses: 1962-2011
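To make the cancellation point in the comment above concrete, this short sketch compares the two summaries on the k=10 column of the table (values copied from the post). It only illustrates the difference between a signed mean and a mean of absolute values; it is not a significance test.

```python
from statistics import mean

# k=10 column of the table, seasons 2002 through 2010
k10 = [0.22, -0.44, -0.52, -0.65, -0.40, 0.18, 0.15, 0.32, -0.15]

signed_avg = mean(k10)                 # positive and negative years offset each other
abs_avg = mean(abs(r) for r in k10)    # size of the dependence, in either direction

print(f"signed mean: {signed_avg:.2f}, mean |r|: {abs_avg:.2f}")
# prints "signed mean: -0.14, mean |r|: 0.34"
```

The signed mean keeps the sign information the author argues for in the reply (negative years as evidence of anti-streakiness), while the mean of |r| shows how large the block-to-block dependence is regardless of direction.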
Thanks for the comment. Regarding the smallish k values: yes, the idea was to keep k small enough that two adjacent data periods are, probabilistically, more likely than not to fall within the same regime.
In terms of averaging the years together, my thinking was that if some years show a negative correlation between periods, the hitter appears to have been anti-streaky in those years, and that should count as evidence against the hypothesis that he is streaky.
I have another idea that flips the study around: for each year, locate the worst n-game stretch and the best n-game stretch, and look at the numerical difference between the two. Unlike the study above, where a correlation either exists or doesn’t, the difference would have to be compared with other players' to determine whether it is unusually high or low. There may be other slight problems too. For example, a power hitter’s production might be inherently more volatile without his necessarily being streaky, since his output over any short period depends heavily on a few balls clearing the fence.
Anyhow, I ran into a technical difficulty with this study that was corrupting the results; hopefully I can fix it shortly.
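The best-versus-worst stretch idea can be sketched as a sliding window over consecutive games. This assumes a hypothetical `Game` tuple of per-game counting stats with total bases precomputed, and it pools the counting stats within each n-game window before computing OPS (rather than averaging per-game OPS). Both choices are illustrative assumptions, not the author's method.

```python
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical per-game counting stats; tb = total bases.
    ab: int
    h: int
    tb: int
    bb: int
    hbp: int
    sf: int


def ops(ab, h, tb, bb, hbp, sf):
    """OPS = OBP + SLG from pooled counting stats."""
    obp_denom = ab + bb + hbp + sf
    obp = (h + bb + hbp) / obp_denom if obp_denom else 0.0
    slg = tb / ab if ab else 0.0
    return obp + slg


def best_worst_spread(games, n):
    """Return (best, worst, best - worst) OPS over all n-game windows."""
    if not 1 <= n <= len(games):
        raise ValueError("n must be between 1 and the number of games")
    # Column totals for the first window of n consecutive games.
    totals = [sum(col) for col in zip(*games[:n])]
    best = worst = ops(*totals)
    for i in range(n, len(games)):
        # Slide the window forward: add game i, drop game i - n.
        totals = [t + new - old for t, new, old in zip(totals, games[i], games[i - n])]
        value = ops(*totals)
        best = max(best, value)
        worst = min(worst, value)
    return best, worst, best - worst
```

As the reply notes, the spread on its own says little; it only becomes meaningful when compared against the same figure for other players, ideally hitters with a similar power profile.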
