On New Stats

I love baseball, and I love math.* It’s only natural, then, that I love baseball statistics. One of the amazing things about baseball is the way it lends itself to analysis; it’s a series of discrete events, most of which are one-on-one battles between a pitcher and a batter, and those battles are much easier to break down than the team-on-team clashes involved in, say, basketball or football.

*Generally. I make exceptions for complex analysis and differential equations, both of which were banes of my existence.

A few weeks ago, I wrote about my views on some traditional statistics, and I touched several times on the philosophies behind those stats. Most critics of newer stats, as far as I’ve seen, argue that things like WAR and FIP are not rooted in the realities of playing and observing baseball in the same way traditional stats like ERA and BA are. However, if you take a closer look at some of those newfangled stats, you’ll find a lot in common with the ways that players, coaches and fans have been talking about the game for generations.


Let’s take that closer look.


Quality Starts (QS): I tend to call less familiar statistics new rather than advanced because some of them aren’t terribly advanced. Quality starts aren’t advanced at all. In fact, they’re significantly simpler than the stat they’re designed to replace: pitcher wins.

A starting pitcher is awarded a Quality Start if he pitches six or more innings and gives up three or fewer earned runs.

That’s it. Start the game, pitch six innings, allow no more than three runs and you have a quality start. Anyone with a basic understanding of baseball can get that in about 30 seconds.
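To show just how simple the rule is, here is a minimal sketch in Python. The function name is my own invention, and I measure innings in outs recorded (18 outs is six innings) purely to sidestep the "5.2 innings" box-score notation; neither choice comes from any official source.

```python
def is_quality_start(outs_recorded: int, earned_runs: int) -> bool:
    """Return True if a start meets the definition above.

    outs_recorded: outs the starter recorded (18 outs = 6 innings).
    earned_runs:   earned runs charged to the starter.
    """
    return outs_recorded >= 18 and earned_runs <= 3


print(is_quality_start(18, 3))  # bare-minimum quality start
print(is_quality_start(17, 0))  # one out short of six innings
print(is_quality_start(27, 4))  # complete game, but four earned runs
```

Only the first of those three starts qualifies.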

When people inside the game talk about great pitchers, they won’t often talk about winning a lot of games; after all, a pitcher may pitch well and lose or pitch poorly and win. Rather, they’ll say that a great pitcher always stays in the game and gives his team a (good) chance to win.

If a pitcher gives his team a quality start, his team has a good chance to win.

The most common criticism I’ve heard of quality starts goes like so: If a pitcher pitches exactly 6 innings and gives up exactly 3 earned runs, his ERA for the game is 4.5. Granted, that’s not especially good, but:
  • That’s the bare minimum for a quality start. The average across all quality starts is less than half that.
  • A 4.5 ERA isn’t good, but it’s not terrible either. The league ERA for all starting pitchers is about 4.2, so we’re talking about a difference of three tenths of a run per nine innings.
  • Put another way, a hypothetical pitcher who pitched exactly six innings and gave up exactly three earned runs every time out would end up with 192 innings pitched (in 32 starts) and a 4.50 ERA. Most teams would take that from a fourth or fifth starter.
  • In fact, our Mr. (Just Barely) Quality Start would have outperformed at least one starter on eight of the last 10 World Series champions.
Quality starts are far from perfect, and there’s a reasonable case to be made that the bar should be raised to exclude the bare-minimum performance. Even as defined today, though, they’re a quick way to see whether a pitcher consistently keeps his team in the game. Isn’t that exactly what teams are looking for?
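The arithmetic behind the bare-minimum pitcher is easy to check. The sketch below uses the standard ERA formula (earned runs times nine, divided by innings pitched) along with the 32 starts and the roughly 4.2 league ERA quoted above; the variable names are just for illustration.

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


STARTS = 32                 # a full season's worth of starts, as assumed above
innings = 6 * STARTS        # exactly six innings every time out
earned_runs = 3 * STARTS    # exactly three earned runs every time out
LEAGUE_STARTER_ERA = 4.2    # approximate figure quoted in the text

season_era = era(earned_runs, innings)
print(innings, season_era)                        # 192 4.5
print(round(season_era - LEAGUE_STARTER_ERA, 1))  # 0.3
```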


Defensive Runs Saved (DRS): For many years, the only fielding stats anyone cared about were errors and fielding percentage. I’ve talked a bit about the definition of an error before, but let’s briefly revisit it now: A fielder is charged with an error when he misplays a ball that he should have turned into an out with ordinary effort.

As written, this definition makes some sense. If a fielder doesn’t make a play he should have been able to make, we charge him with an error. Fine. The problem is that the rules don’t really define what constitutes ordinary effort, leading to some very strange scoring decisions. For instance:
  • If an outfielder loses a ball in the lights and it falls out of his reach, that’s a hit.
  • If an outfielder loses a ball in the lights, catches sight of it at the last moment, reaches out to make a catch and has the ball bounce out of his glove, that’s an error.
  • If a shortstop fields a ground ball cleanly but hesitates before throwing, allowing the batter to reach safely, that’s a hit.
  • If a shortstop fields a ground ball cleanly but makes a high throw that pulls the first baseman off the bag, allowing the batter to reach safely, that’s an error.
  • If a third baseman reaches out to catch a ball that slips under his glove, that’s a hit.
  • If a third baseman reaches out to catch a ball and then bobbles it on the transfer to his throwing hand, that’s an error.
You get the idea. The problem with errors isn’t that the official scorers are making bad judgment calls, although that doesn’t help. The real problem is that the stat requires a judgment call in the first place. If we had a clear, objective definition of ordinary effort, errors would work just fine.

Now, let’s talk about Baseball Info Solutions. BIS uses video scouting to figure out exactly what the league-average (in other words, ordinary) defender does on every possible batted ball. They categorize every ball hit in play during the year based on its direction, distance, speed, and type (i.e. ground ball, fly ball, line drive, bunt and ‘fliner’ – somewhere between a fly ball and line drive). Based on this data, BIS determines the probability that a given fielder will make a play on each ball and assigns a plus/minus value.

For instance, the BIS data may indicate that the average shortstop makes a play on a hard-hit ground ball that’s heading right for the normal shortstop position about 90 percent of the time; in other words, it’s basically a sure thing. If Stephen Drew misplays a ground ball that fits that description, he’s docked .9 points in the plus/minus system.

The great thing about this methodology is that it rewards good plays as well as penalizing bad ones. The data may say, for instance, that a shortstop only makes a play on a soft ground ball three feet to his left 20 percent of the time. If Drew fields a ball that’s hit there and gets the out, that’s an excellent play, and he’s credited with .8 points in the plus/minus system.

Under these rules, it doesn’t matter whether Drew makes a spectacular diving catch to stop that ball or gets a good jump and makes it look easy. It doesn’t matter whether he charges and barehands the ball or throws with his feet planted. The only thing that matters is the only thing that should matter: making the out.
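The bookkeeping in those two Stephen Drew examples can be sketched in a few lines of Python. This is a simplified illustration that matches the numbers above, not BIS's actual implementation; the function name and the probabilities are stand-ins.

```python
def plus_minus(play_probability: float, made_play: bool) -> float:
    """Plus/minus credit for a single batted ball.

    play_probability: share of the time an average fielder makes this play.
    made_play:        whether the fielder actually got the out.
    A made play earns (1 - p); a missed play costs p.
    """
    if made_play:
        return 1.0 - play_probability
    return -play_probability


# Routine grounder (made 90 percent of the time) that gets misplayed
print(round(plus_minus(0.90, made_play=False), 2))  # -0.9
# Tough grounder (made 20 percent of the time) turned into an out
print(round(plus_minus(0.20, made_play=True), 2))   # 0.8
```

Notice that the function never asks how the play looked, only how often it gets made and whether it was.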

Now, this plus/minus system is in terms of plays made, not runs saved. It doesn’t quite get to the point of defense: run prevention. There’s a lot of math that goes into turning Plus/Minus into Defensive Runs Saved, including adjustments for extra-base hits, runs saved on bunts, double plays, outfield assists and more. Advanced defensive stats have come a long way, but they still have a long way to go, and right now we need three years’ worth of data to really understand how good (or not) a player is in the field.

My point, though, is that the core of the system is objective, thorough observation of actual plays. Coaches and managers say all the time that “you have to see him play” to understand his defense. Well, the folks behind DRS have seen him play.


Base-Out Runs Added (RE24): Okay, one reasonable strike against this stat: Its full name is pretty unwieldy. Fortunately, the abbreviation RE24 is much easier.

Anyway, let’s talk about base-out states. There are eight possible base states:
  • Bases empty
  • Runner on first
  • Runner on second
  • Runner on third
  • Runners on first and second
  • Runners on first and third
  • Runners on second and third
  • Bases loaded
Likewise, there are three out states: nobody out, one out and two out. Combine the two and we have the 24 base-out states: nobody on/nobody out, nobody on/one out, nobody on/two out, runner on first/nobody out, runner on first/one out and so on.
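If you'd rather count than take my word for it, the 24 states are just every pairing of a base state with an out state. A quick Python sketch (the labels are my own shorthand):

```python
from itertools import product

BASE_STATES = [
    "bases empty",
    "runner on first",
    "runner on second",
    "runner on third",
    "runners on first and second",
    "runners on first and third",
    "runners on second and third",
    "bases loaded",
]
OUT_STATES = [0, 1, 2]  # number of outs

# Every (base state, outs) pair is one base-out state.
BASE_OUT_STATES = list(product(BASE_STATES, OUT_STATES))

print(len(BASE_OUT_STATES))  # 24
print(BASE_OUT_STATES[:3])
```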

As baseball fans, we intuitively know that each base/out state has a certain run expectancy, even if we can’t attach a number to it. With nobody on and two out, we don’t expect our team to score at all; it could happen, but it’s unlikely. With the bases loaded and nobody out, we expect a big inning, and it’s actually pretty disappointing if only one run scores. The only difference between RE24 and these casual observations is that RE24 uses thousands of games’ worth of data to actually quantify those expectations.

Every time the base/out state changes, the batter gets (or loses) credit for the change. Suppose Dustin Pedroia comes up to bat leading off an inning; with nobody on and nobody out, the run expectancy is about half a run. If Pedroia belts a double, the run expectancy goes up to about 1.1, and he gets credit for the difference, .6 runs. If he makes an out, the run expectancy drops by about .25 runs, and he’s debited the difference.

When an actual run scores, the batter is awarded a full run in addition to the change in base-out states. For instance, if David Ortiz comes up and drives in Pedroia with a single, the run expectancy changes from about 1.1 (runner on second, nobody out) to about 0.9 (runner on first, nobody out). Ortiz is awarded .8 runs on the play, which is 1 for the run that scored minus .2 for the change in base-out states.
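The crediting rule boils down to one line: run expectancy after the play, minus run expectancy before it, plus any runs that scored. Here is a rough Python sketch using the approximate Pedroia numbers from above. The tiny lookup table and the function name are illustrative only; a real table would cover all 24 states with values drawn from actual play-by-play data.

```python
# Approximate run expectancies taken from the examples in the text.
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.50,
    ("bases empty", 1): 0.25,
    ("runner on second", 0): 1.10,
}


def re24_credit(re_before: float, re_after: float, runs_scored: int = 0) -> float:
    """Change in run expectancy on a play, plus any runs that actually scored."""
    return re_after - re_before + runs_scored


leadoff = RUN_EXPECTANCY[("bases empty", 0)]

# Leadoff double: bases empty/nobody out -> runner on second/nobody out
print(round(re24_credit(leadoff, RUN_EXPECTANCY[("runner on second", 0)]), 2))  # 0.6
# Leadoff out: bases empty/nobody out -> bases empty/one out
print(round(re24_credit(leadoff, RUN_EXPECTANCY[("bases empty", 1)]), 2))       # -0.25
# Leadoff solo homer: the base-out state is unchanged, but one run scores
print(round(re24_credit(leadoff, leadoff, runs_scored=1), 2))                   # 1.0
```

The solo homer is the cleanest illustration of the "full run" rule: the base-out state ends up exactly where it started, so the batter's entire credit is the run itself.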

All sorts of things that baseball people love are incorporated into RE24. If a player consistently makes productive outs, that shows up in his RE24; if he executes on a hit-and-run, that shows up in his RE24; if he hits the ball behind the runner to get an extra base, that shows up in his RE24. When a walk is as good as a hit (e.g. with the bases empty), it’s as good as a hit in RE24. When a walk is not as good as a hit (e.g. with a runner on third), RE24 reflects that reality, too.

If DRS is a better fielding percentage, RE24 is a better RBI. It’s rather like measuring three feet with a yardstick instead of estimating using your arm: The intent is the same, but the result is much more useful.

If you want to know how much a hitter contributes to scoring actual runs, look no further.


Wins Above Replacement (WAR): Right now, WAR is probably getting more attention than any other advanced stat, and that attention is only going to increase once the season ends and the Cy Young/MVP debates begin. Many fans dislike WAR, I think, because its calculations seem so obscure: Some math nerd just throws a bunch of stats together and gives you a single number that sums up a player’s entire value.

Let’s look at WAR from the most fundamental perspective. Baseball, like most sports, consists of offense and defense. Offense can be further broken down into hitting and baserunning, while defense consists of pitching and fielding.

One of the fundamental assumptions of advanced baseball analysis is that a run scored on offense is equivalent to a run saved on defense.* Like most things analytic, this isn’t a crazy assumption. Considering great pitchers and great hitters get comparable contracts, I think it’s fair to say that baseball teams hold pitching and hitting in roughly equal esteem. As far as fielding is concerned, how many times have you heard a coach or manager say that “even when he’s not hitting, he’s saving runs in the field” or the hyperbolic “he saves 100 runs a year** with his glove”? Incidentally, 100 runs is about the amount you’d expect a good hitter to score or drive in.

*This actually isn’t quite true – runs saved are very slightly more valuable than runs scored. That’s because runs saved hit an absolute lower bound: If you give up 0 runs, you’re guaranteed to win (or at least to not lose). Conversely, a great offense can’t completely ensure victory; it’s possible to score 20 runs and still lose.

**Taken extremely literally, this is true. If, say, Stephen Drew went out to play shortstop without a glove, the Red Sox would almost certainly give up 100 more runs over the course of the year.

WAR just takes all of a player’s contributions on offense and defense and puts them together. When Miguel Cabrera produces runs with his bat, they go into his WAR bucket; when he gives runs away in the field, they come out of the bucket. When Andrelton Simmons saves runs with his glove, they go into his bucket. When Michael Bourn produces runs with his legs, they go into his bucket. There’s also a positional adjustment: A first baseman who hits 30 home runs is good, but a shortstop who can do that is much, much better.

One of my favorite things about WAR is that it makes it easy to compare players with wildly disparate skill sets. According to Baseball Reference, Dave Parker and Dave Concepcion are perfectly tied at 40.0 WAR. Apart from their shared first name, those guys had almost nothing in common: Parker was a slugging outfielder who won two batting titles and belted over 500 doubles; Concepcion was a slick-fielding shortstop who hit .267 for his career. They produced their value in completely different ways, but when you add it all up, they were worth the same number of wins to their teams.

Now, WAR is not perfect by any means. The defensive component is especially suspect, at least in small samples, because it uses one year’s worth of data; as I mentioned above, we really need three years’ worth of information to accurately assess defensive value. WAR doesn’t account for clutch* performance the way RE24 does, and there’s reason to believe it significantly underrates catcher defense. As with all advanced stats, WAR has a margin of error; the difference between a 6.7 WAR player and a 7.1 WAR player is small enough that we can’t conclusively say which is the better player. Certainly, nobody in the analytic community is arguing that we should just blindly give the MVP award to the player with the highest WAR.

*Much has been made of the idea that people who like advanced stats think “clutch” doesn’t exist. The actual issue with clutch is that it’s not an especially repeatable skill; many hitters have great clutch performances one year and poor performances the next year. WAR ignores clutch performance by design because it’s intended to help us understand a player’s true talent level; it ignores things that are likely to fluctuate and focuses on things that the player can directly control.

Nevertheless, WAR is the best stat available to tell us, in the aggregate, how good (or not) a player is. It’s a stat that invites further conversation: Once you know that Mike Trout is worth 9 wins to his team, you’re likely going to wonder how he does that. How much of that is from his hitting? How much comes from his baserunning? How much is defense? How much is his positional adjustment?

There’s a lot more to understanding a player than his WAR (or, for that matter, any other statistic), but every one of these tools makes our picture a little more complete.