Part 2: How to Predict Postseason Success in Baseball
Wouldn't it be nice to predict the next time your team will hoist the Commissioner's Trophy?
While Part 1 looked at driving in runs without hitting home runs, the second hypothesis has more to do with hitting the league's most elite pitchers in the postseason. Will this hypothesis lead to some statistically significant results?
Performance Against Top Pitchers
Hypothesis
Against top-line starters and relievers, it is very difficult to hit home runs, so my theory is that teams with a simpler batting approach will have a better chance against these pitchers. Because a team is very likely to face great pitching in the postseason, I also hypothesize that teams that face top pitchers more often and/or have more success against them (in terms of runs scored per nine innings) are more likely to have playoff success. I have categorized “top pitchers” as those who finish in the top 20 in ERA minus, or ERA-, as calculated by Fangraphs.
Results
By hand, I compiled the top 20 starting pitchers in terms of ERA- for every year from 2003 to 2012, and then used Baseball Almanac to record every game these pitchers played against teams that made the playoffs that year. For each team in each year, I compiled total innings, total games, total runs scored, and runs scored per nine innings. I looked at all runs, not just earned runs, because runs of any kind are so hard to come by in the postseason or when facing a top pitcher. Even when a run is unearned, the opposing team usually still needs to string together a couple of hits for that run to score.
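To make the "runs per nine innings" figure concrete, here is a minimal sketch of how it could be computed from a game log. The function names and the sample games are made up for illustration, and it assumes innings are recorded in box-score notation, where "6.2" means six innings plus two outs.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ('6.2' = 6 innings + 2 outs) to total outs."""
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + (int(frac) if frac else 0)


def runs_per_nine(games):
    """games: iterable of (innings_pitched, runs_scored) pairs.

    Counts all runs, earned or not, and scales them to a 9-inning (27-out) game.
    """
    games = list(games)
    total_outs = sum(innings_to_outs(ip) for ip, _ in games)
    total_runs = sum(runs for _, runs in games)
    if total_outs == 0:
        raise ValueError("no innings recorded")
    return 27 * total_runs / total_outs


# Hypothetical game log for one team against top-20 pitchers in one season.
sample_games = [("7.0", 2), ("6.2", 3), ("8.1", 1)]
team_r9 = runs_per_nine(sample_games)
```

Working in outs rather than decimal innings avoids treating "6.2" as 6.2 innings, which would slightly distort the rate.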
When I finished compiling data on team performance against top 20 pitchers, I ran individual regression analyses with PV as the outcome variable and each of these new statistics as the predictor. However, no single statistic was significantly correlated with PV. Even when I used multiple top 20 pitching statistics as predictors in the same model, there was still no significant relationship.
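For readers who want to try this kind of single-predictor test themselves, here is a rough sketch. It is not the software or the exact procedure used for the analysis above: it fits an ordinary least-squares line by hand and judges significance with a permutation test, which I swapped in because it needs only the Python standard library. The function names and the numbers are invented for illustration and are not real PV or team data.

```python
import random


def ols_fit(x, y):
    """Simple linear regression of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: how often shuffled outcomes give |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(ols_fit(x, y)[2])
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_fit(x, shuffled)[2]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented example: runs per nine vs. top pitchers (x) and an outcome score (y).
r9_vs_top = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.7, 2.9]
outcome = [3.0, 1.0, 6.0, 2.0, 0.0, 5.0, 4.0, 1.0]
slope, intercept, r = ols_fit(r9_vs_top, outcome)
p_value = permutation_p_value(r9_vs_top, outcome)
```

A small p-value (commonly below 0.05) would suggest the predictor is related to the outcome. With only a handful of playoff teams per year, a test like this has limited power either way.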
Conclusion
Based on the results of my tests of these two hypotheses, I unfortunately did not find any significant regression models that could predict PV from any of these statistics. I was not hugely surprised by this outcome, for a few reasons. Because I only looked at playoff teams from the past ten years (many of the statistics I used in these models were not compiled before then), my sample size was smaller than ideal to start with. There is also high multicollinearity among many of these statistics, which made it difficult to interpret the individual coefficients.
Also, having too many predictors, or controlling for too many variables, makes it extremely difficult to find a model that is both significant and sensible from a baseball perspective. There were a few interesting findings, such as how LDp is marginally correlated with playoff wins (but not correlated with playoff series wins), but for the most part, no major discoveries were made.
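One simple way to spot the multicollinearity problem mentioned above is to check how strongly the predictors correlate with each other before putting them in the same model. The sketch below is only an illustration: the statistic names and values are placeholders, the 0.8 cutoff is an arbitrary rule of thumb, and the variance inflation factor shown applies only to the two-predictor case.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def collinear_pairs(stats, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(stats.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)


# Placeholder predictors; stat_a and stat_b move almost in lockstep.
team_stats = {
    "stat_a": [1, 2, 3, 4],
    "stat_b": [2, 4, 6, 8.5],
    "stat_c": [4, 1, 3, 2],
}
flagged = collinear_pairs(team_stats)
```

Any pair that gets flagged is a candidate for dropping one of the two statistics, since a regression cannot cleanly separate their individual effects.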
Possible Improvements
One change I could have made is in how I calculated the top 20 pitcher statistics. I chose the number 20 arbitrarily, and I compiled the top 20 pitchers regardless of league. In hindsight, I probably should have compiled the top 20 pitchers from each of the American and National Leagues in each year. There may also be a better statistic than “runs per 9 innings” to gauge how well teams do against these top pitchers. Finally, when my second hypothesis failed, I started to compile 28 new statistics from Fangraphs’s “high leverage situations” split. I originally tried this because essentially all playoff batting situations can be considered “high leverage.”
However, these statistics were compiled from late and close game situations, rather than ability to drive in runs without hitting home runs, which is what my two hypotheses were related to. My time might have been better spent looking at statistics with runners in scoring position. Those kinds of statistics would have been more relevant to my hypotheses, as driving in runners in scoring position is not only the most effective way to score off top pitchers, but it is also a skill that requires the batter to shorten his swing, and have a more simplistic batting approach. As I continue this research in the future, I will take into account all of these factors in my quest to find a formula for postseason success in Major League Baseball.
Labels: ADistler, MLB, Moneyball, Opinion, Original Content, statsandfigures