Friday, October 25, 2013

Calling the 2013 MLB Most Valuable Player Award Races (Part 1-Process)



This is Part I of a two-part series explaining the author's process for forecasting Major League Baseball's Most Valuable Player Awards. Read here for the predictions themselves.

In this post, I'll detail the process I followed in my attempt to predict the 2013 Most Valuable Player award winners. This was not meant to be an exercise in deciding who the deserving recipients actually are, but an attempt to guess how the Baseball Writers' Association of America will vote. To do so, I looked back at the last few seasons of American League MVP voting outcomes and attempted to determine what sorts of traits (statistics) the writers hold in the highest esteem.

Compiling the data is time-consuming, and I hoped that I could gain the necessary insight by looking at the most recent years. I was not necessarily attempting to develop a formula to predict the MVP, as that has been tried by others a number of times, with mixed results. However, I did attempt to come up with a weighted value system that would award points for various categories and give me a framework to start my analysis with.

I started looking at past MVP races with three basic areas in mind that I wanted to determine weights for: sabermetric, traditional, and intangible.

Next, as I gained some insight into what influenced races of the past, I started plugging different weights into a framework that would assign a value based on where a player ranked in a given category at the end of the year. After multiple rounds of tweaking, I settled on a point system that did a nice job of reflecting how the writers had voted in recent years. The final stats that I ended up using in the system were: wins above replacement (WAR), runs, runs batted in (RBI), home runs (HR), stolen bases (SB), and batting average (AVG). I also assigned varying degrees of value for the success of a player's team, whether or not he played high-quality defense at a premium position, the perception of his base running, and certain accomplishments (like the Triple Crown) that can influence voters.

In assigning weights to each category, I used my own discretion based on observation and on whether the point system was reflecting the way the actual voting went. For instance, I gave the components of the Triple Crown a total point value that was just slightly greater than what I gave for WAR. Last year's race showed that the voters value these stats, in total, more highly than WAR: Miguel Cabrera won the Triple Crown and the MVP, while Mike Trout led the league in WAR and finished second. As I applied those values to previous years' MVP races, they seemed to be weighted properly.
This sort of comparison-observation tweaking, over multiple trials, led me to my final point system.

Under the point system, to receive points in a category, a player had to finish in the top five of said category, or qualify (e.g., lead his team to the playoffs, win the Triple Crown). I multiplied the weight of each category by 5, and then awarded points by where a player finished within the top five. For each place below first that a player finished in the category, he lost the original weight amount. When written out it may sound complex, but it is actually extremely simple. For example:

In 2012, Mike Trout led the league in WAR, so he received 10 points (original weight of 2 * 5) in that category. Robinson Cano finished second in WAR, so he received 8 points (10 minus the original weight of 2). Miguel Cabrera finished third, so he received 6 points (10-2-2=6)... Hopefully this paints the picture.
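The scoring rule above can be sketched in a few lines of Python. This is my own illustration of the rule as described, not the author's code; the function name is hypothetical.

```python
def category_points(weight, rank, max_rank=5):
    """Points for finishing at `rank` (1 through max_rank) in a category
    with the given weight: 1st earns weight * max_rank, and each place
    lower subtracts the weight once more."""
    if rank < 1 or rank > max_rank:
        return 0.0  # outside the top five earns nothing
    return weight * (max_rank - rank + 1)

# Reproducing the WAR example (weight 2): 1st = 10, 2nd = 8, 3rd = 6.
print(category_points(2, 1))  # Trout, 1st in WAR -> 10
print(category_points(2, 2))  # Cano, 2nd -> 8
print(category_points(2, 3))  # Cabrera, 3rd -> 6
```

The same function covers the lighter categories too; for example, a fifth-place finish in runs (weight 0.25) is worth 0.25 points.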

The intangible weights were added last as I reworked the system to accurately reflect recent years' voting outcomes.

I want to add that this system is far from perfect and that the weights I used were largely determined through my own interpretations. I think they could definitely be set more accurately, but it would require quite a bit more time. I'll continue to touch on areas where this system falls short as I go. In general, I wanted it to give me a rough outline of whom to expect at the top of the voting this year, and then be able to tweak the results based on other factors and observations to arrive at my predictions.

Here's the final template for the point system:



Category    Weight   Total Pts   1st    2nd   3rd    4th   5th
WAR         2        10          10     8     6      4     2
R           0.25     1.25        1.25   1     0.75   0.5   0.25
RBI         0.75     3.75        3.75   3     2.25   1.5   0.75
HR          0.75     3.75        3.75   3     2.25   1.5   0.75
SB          0.25     1.25        1.25   1     0.75   0.5   0.25
AVG         1        5           5      4     3      2     1

Div         1        5
WC          0.5      2.5
TriCrown    2        10
Exc/Exp     0.5      2.5
Def/Bsr     1        5

'Exc/Exp' stands for 'Exceeding Expectations'. To me, that category was used to reflect something like Mike Trout bursting onto the scene in 2012. As I worked on my predictions, I morphed 'Exc/Exp' into a category that could also represent a player reaching a milestone like 50 home runs. The 'TriCrown' category is self-explanatory, and also unnecessary for predicting the 2013 race, as no one won it. 'Def/Bsr' awards points for premium defense and high-end base running (assigned at my discretion based on how I believe a player is perceived by the writers and using actual statistics). 'Div' and 'WC' show the points awarded to a player for leading his team to the playoffs, either as a division champ or as a wild card.

So, after all the tweaking, the point system worked well in giving me an idea of where and why the top five finished where they did. Here's the 2012 race as an example:

Actual voting:


AL 2012
Player      Voting (1st)
Cabrera     362 (22)
Trout       281 (6)
Beltre      210
Cano        149
Hamilton    127

Saber and Traditional Points using the System:

Player      WAR   R      RBI    HR     SB     AVG
Cabrera     6     1      3.75   3.75   -      5
Trout       10    1.25   -      -      1.25   4
Beltre      4     -      -      -      -      3
Cano        8     0.75   -      -      -      -
Hamilton    -     -      3      3      -      -

(A dash means the player earned no points in that category.)

Intangibles Points using the System:

Player      Playoffs?   TriCrown   Exc/Exp   Def/Bsr
Cabrera     5           10         -         -
Trout       -           -          2.5       5
Beltre      2.5         -          -         5
Cano        5           -          -         -
Hamilton    2.5         -          -         5

Total System Points:


Player      Total
Cabrera     34.5
Trout       24
Beltre      14.5
Cano        13.75
Hamilton    13.5
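A player's total is simply the sum of his category points. As a quick consistency check, the component points from the tables above can be summed in a few lines (the grouping of each player's nonzero points into a list is mine):

```python
# Each list transcribes a player's nonzero category points from the
# 2012 saber/traditional and intangibles tables above.
points_2012 = {
    "Cabrera":  [6, 1, 3.75, 3.75, 5, 5, 10],   # incl. Div title + Triple Crown
    "Trout":    [10, 1.25, 1.25, 4, 2.5, 5],    # incl. Exc/Exp + Def/Bsr
    "Beltre":   [4, 3, 2.5, 5],
    "Cano":     [8, 0.75, 5],
    "Hamilton": [3, 3, 2.5, 5],
}

for player, pts in points_2012.items():
    print(player, sum(pts))
```

The sums reproduce the totals table, with the players in the same order the writers actually ranked them.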

For the sake of space, I won't post each year's results, just some of the takeaways that influenced my predictions for this year's race. Of the years I tested, the system always predicted the winner correctly, and usually had the top five finishers in an order that was close to their actual placing.

2012 was an easy win for Cabrera, and I believe the point system gives a fairly accurate representation as to why. Despite not ranking highly in the league's traditional power stats (RBI and HR), Trout finished near the top of the voting. I believe his WAR got some respect, as did his high batting average, well-rounded play, and the excitement factor he provided as a rookie superstar. Hamilton was hurt by his lower batting average and lack of high-caliber defense at a premium position.

Now, 2011 was an interesting year. A pitcher won. I assigned points for pitching stats in my system in a similar way to offensive stats, although for 2011 I don't think it matters much for the takeaway. Justin Verlander led the AL in ERA, wins, strikeouts, base runners allowed per inning (WHIP), and innings pitched. He also led the league in pitcher WAR. He clearly had an extremely dominant year. So, the takeaway is: pay attention to dominant pitching. Jacoby Ellsbury finished as the first non-pitcher in the voting, which was interesting because he beat out another center field candidate in Curtis Granderson, despite the fact that Granderson led his team to a division title. I think this supports the idea that higher batting average and defensive prowess are important to the voters. My point system actually sold the third-place finisher, Jose Bautista, short. This was a result of him finishing just outside the top five in a number of categories. I think it's obvious, but in assessing the 2013 class, players with strong offensive numbers across the board who don't necessarily finish very highly in multiple categories shouldn't be ignored.

In 2010, the system again accurately showed the winner, Josh Hamilton. He led the league in WAR and AVG; again, batting average bears significant importance. Hamilton also played very solid defense in 2010 and showed more speed in the form of stolen bases. While he didn't finish in the top five in a lot of categories, I believe he was perceived as a very well-rounded contributor. Also worth noting is that Hamilton had far fewer strikeouts that season than he did in 2012, when he finished further down in the voting. History shows that strikeouts can affect MVP voting.

In 2009, the system again showed the winner accurately (Joe Mauer by a large margin). Mauer led his team to the division title while leading the AL in average from behind the plate (offense as a catcher is an intangible to consider). He didn't finish among the AL's top five in traditional power stats, but he had respectable numbers nonetheless, and I'm sure the voters took notice. The voters really rewarded Mark Teixeira for leading the AL in RBIs and HRs while playing excellent defense. He beat out his high-average-hitting teammate, Derek Jeter. The Yankees' division title helped both of these guys. Lastly, Miguel Cabrera and Kendry Morales finished 4th and 5th respectively, but did not receive a lot of points in my system. This can be attributed to them finishing just outside the top five in a number of categories. This again reminds me not to overlook very good offensive campaigns that didn't necessarily land on category leaderboards.

Now, with these results in mind, plus other observations made throughout the process, I'm going to address 2013.
