About six weeks ago, I posted a piece about homefield advantage, which can be found here:
http://www.thegoodphight.com/2008/7/3/564256/homefield-advantage
Today, I will continue with a few more interesting observations about homefield advantage, and a request for explanations about a few interesting phenomena.
The main discoveries a few weeks ago were that while homefield advantage had very little to do with the specific team or stadium, it seemed to affect nearly every aspect of a team’s game, and games within the same division had smaller homefield advantages.
I have played with the numbers a little more and found a very bizarre result: homefield advantage depends on where a game falls within a series. I checked further and found that the result held across ten years of data, from 1998 to 2007. Keep in mind that with ten years of data, the high homefield advantage in middle games of series and the low homefield advantage in last games of series are both statistically significant.
Homefield Advantage:
Overall: 53.8%
First games of series: 53.9%
Middle games of series: 54.7%
Last games of series: 52.7%
I have tried to figure out the cause of this peculiar result, and I do not have a solid explanation for it. I have several hypotheses and a lot more segmentations of the data, and I would appreciate it if anybody could offer an explanation. If you are interested, read on.
This is a very large difference. With approximately 8,000 games in each group, the odds of seeing results this different by luck are very small. The difference between middle games of series and all other games is statistically significant at the 95% level, as is the difference between last games of series and all other games. Even ten years of data is not enough to show that the difference between first games and middle games of series is statistically significant on its own, but the difference does seem persistent, and I doubt it is random.
This result could matter a great deal. If there is a simple reason for it, then a team that can control its schedule or travel patterns may be able to neutralize or capitalize on the cause and improve its record by about a game per year. One game in the standings is a meaningful gain, which makes this analysis valuable.
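To give a rough sense of how much luck alone could move these percentages, here is a minimal sketch of the normal-approximation margin of error for each group. It assumes exactly 8,000 games per group, which is only the rounded figure from the text, and the function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Home winning percentages from the list above.
groups = {"first": 0.539, "middle": 0.547, "last": 0.527}

# ASSUMPTION: 8,000 games per group is the rounded figure from the text,
# not an exact count.
N_GAMES = 8000

for name, p in groups.items():
    moe = margin_of_error(p, N_GAMES)
    print(f"{name:>6} games: {p:.1%} +/- {moe:.1%}")
```

Under that assumption, each percentage carries a margin of roughly one percentage point, so a gap of two points between middle games and last games is hard to attribute to chance.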
I tested a number of things. No explanation seems immediately obvious, but there are some interesting results here for sure.
The following table lists the percentage of games won by the home team on each day of the week. It also lists the home team’s winning percentage in the first, middle, and last games of series for every day on which a significant number of such games were played.
Day          | Total | 1st game | Middle game | Last game
All          | 53.76 | 53.87    | 54.68       | 52.70
Monday       | 53.81 | 54.77    |             | 48.12
Tuesday      | 53.78 | 54.12    | 54.00       | 49.54
Wednesday    | 53.91 |          | 55.64       | 51.65
Thursday     | 52.91 | 52.22    |             | 53.05
Friday       | 53.58 | 53.54    | 53.14       |
Saturday     | 54.52 |          | 54.74       |
Sunday       | 53.61 |          |             | 53.57
All weekdays | 53.62 |          |             |
All weekends | 53.90 |          |             |
It is clear that the day of the week does not have much of an effect on winning percentage. The similarity between the winning percentage for last games of series played on Sunday and those played on Wednesday or Thursday indicates that day games do not have a particularly large effect on homefield advantage. It is worth noting that when the last game of a series falls on a Monday, there is actually a road-field advantage! There are only 320 of these games, but such an extreme difference is still statistically significant.
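One way to check the Monday claim is a one-sample z-test of the 48.12% figure against the overall 53.76% home winning percentage. This is only a sketch: the choice of baseline is my reading of "extreme difference", the normal approximation is a simplification, and `one_sample_z` is a hypothetical helper name.

```python
import math

def one_sample_z(p_hat, p0, n):
    """z statistic and two-sided p-value for an observed proportion vs. a baseline."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 320 Monday last games at 48.12%, compared with the overall 53.76%.
# ASSUMPTION: the baseline is the all-games home winning percentage.
z, p_value = one_sample_z(p_hat=0.4812, p0=0.5376, n=320)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these inputs, z lands close to -2, which sits right at the edge of the usual 95% threshold. Measured against an even 50% instead, the same 320 games would not look unusual.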
As you may recall from my previous post on this topic, homefield advantage is larger between teams in different divisions. For non-interleague games, intra-division home teams won 53.3% of their games, compared with 54.1% of home teams in inter-division games. This is something to keep in mind when we consider the next factor—length of series.
As it turns out, three-game series have much larger homefield advantages than two-game and four-game series. Keep in mind that these numbers may be partially biased, because inter-division games may be more or less likely than intra-division games to be played in four-game series. Home team winning percentage by series length:
2-game series: 50.63% (715 series)
3-game series: 54.21% (5,730 series)
4-game series: 52.53% (1,372 series)
While there weren’t many other lengths for series, for the sake of completeness:
1-game series: 59.70% (67 series)
5-game series: 57.27% (22 series)
Clearly, there is something different about three-game series. Four-game series seem to have much less bias toward the home team, and two-game series seem to have even less. The differences between two-game and three-game series and between three-game and four-game series are both statistically significant.
Breaking down each series length by game:
1st of 2 game series: 51.47%
2nd of 2 game series: 49.79%
1st of 3 game series: 54.42%
2nd of 3 game series: 55.08%
3rd of 3 game series: 53.14%
1st of 4 game series: 52.62%
2nd of 4 game series: 52.19%
3rd of 4 game series: 53.28%
4th of 4 game series: 52.04%
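As a quick consistency check, the per-game percentages above should average out to the per-series-length percentages reported earlier. The sketch below assumes every series of a given length contributes exactly one game to each position, so a simple unweighted mean is appropriate.

```python
# Home winning percentage by game position, copied from the breakdown above.
by_position = {
    2: [51.47, 49.79],
    3: [54.42, 55.08, 53.14],
    4: [52.62, 52.19, 53.28, 52.04],
}

# Home winning percentage by series length, copied from the earlier list.
series_totals = {2: 50.63, 3: 54.21, 4: 52.53}

def mean(values):
    """Unweighted mean. ASSUMPTION: each position has the same number of games."""
    return sum(values) / len(values)

for length, pcts in by_position.items():
    print(
        f"{length}-game series: mean of positions = {mean(pcts):.2f}, "
        f"reported = {series_totals[length]:.2f}"
    )
```

The two sets of numbers agree to within rounding, which suggests the breakdown and the totals come from the same set of games.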
Clearly, the middle-game phenomenon occurs primarily within three-game series. With that in mind, I decided to check whether an off-day for either team had a noticeable effect on homefield advantage. After all, four-game series infrequently follow off-days. The results were surprising.
When the away team did not have an off-day before starting a series, the home team won the first game of the series 54.7% of the time. When the away team did have an off-day before starting a series, the home team won only 52.32% of the time. This difference is only weakly significant. Chances are that away teams really do better in the first game of a series after an off-day, but there are not enough games to be sure (only 5,141 first games without an off-day the day before and only 2,370 first games with one). Home teams did not do much better after an off-day.
Next, I checked how the away team did the following day. As it turns out, the opposite of what you might expect is true. When an away team played the first game of the series the day before, with no break between series, it lost 54.21% of middle games of series. When an away team played the first game of the series the day before, after a break between series, it lost 55.95% of middle games of series. This seemed counterintuitive until I tested how home teams did. Two days after an off-day, home teams won 56.72% of middle games of series. In middle games of series when the home team had not had an off-day before the series started, they won only 53.68% of games.
This difference is huge and statistically significant. For some reason, a home team that had a day off before starting its series did not perform much better in its first game back, but it did terrifically in the second game. In fact, home teams did so much better that the away team appears to do worse two days after an off-day. More likely, the explanation is that these games often fall on a Wednesday, when both the away and home teams had Monday off before the series started. This difference seems to be the key behind the results.
It seems to me that there is some effect of traveling here: away teams do worse if they had to travel without an off-day. There is also something that makes home teams do much better if they had an off-day two days before. The absence of these two factors makes the last game of a series more evenly split between home and road teams.
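The "weakly significant" description can be checked with a pooled two-proportion z-test, using the game counts and percentages quoted above. This is a sketch under the usual normal approximation, and `two_proportion_z` is a name made up for illustration.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Home team's winning percentage in first games of series:
#   away team had no off-day before the series: 54.7% of 5,141 games
#   away team had an off-day before the series: 52.32% of 2,370 games
z, p_value = two_proportion_z(0.547, 5141, 0.5232, 2370)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The z statistic comes out a little under 2, so the gap just misses the conventional 95% threshold, which fits the description of a weak but suggestive effect.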
So here are my questions:
(1) Why do home teams play so well in the second game of a series when they had an off-day before the series started, and why is that effect absent when they did not have an off-day before the series started?
(2) Why is homefield advantage so weak in two-game and four-game series?
(3) Is there something else that I can check to explain these results better?