On Stats

One thing I’ve noticed about baseball fans is that we all love stats. Some are more in love than others, but I’ve never met a baseball fan who didn’t at least mention batting average or RBI or wins or something. (By comparison, I know several football fans who don’t care about the numbers at all, except the number of games their favorite team wins.) Even fans who say they don’t like stats usually use them to back up their arguments.

When I write about baseball, I write about stats often. I have strong opinions about many common (and not-so-common) stats and about how they should be used. Some stats are good as they are, some are OK but often misused, and a few are flat-out useless.

Let’s take a moment to define the two major types of baseball statistics.

A counting number stat is a record of how much or how many of something a player produced. Obviously, counting numbers depend on playing time. A lesser player who has a long career may accumulate higher counting numbers than a more talented player with a shorter career; Harold Baines, for instance, racked up over 200 more career hits than Ted Williams.

Counting numbers measure what has happened, which makes them useful for picking award winners at the end of the season or evaluating candidates for the Hall of Fame.

Common counting numbers for hitters: HR (home runs), H (hits), RBI (runs batted in)
Common counting numbers for fielders: E (errors), A (assists), PO (putouts)
Common counting numbers for pitchers: IP (innings pitched), W (wins), K (strikeouts)

A rate stat is any stat that’s normalized for a certain amount of playing time – for instance, per at-bat, per fielding chance or per nine innings. Rate stats are great for comparing players who have different amounts of playing time, but in small samples, they tend to vary a lot. It’s not at all uncommon for a hitter to maintain a .400 batting average over the course of a week or even a month. It’s much, much harder to bat .400 for a full season.

Once we have enough data for the stats to stabilize,* though, rates can be projected forward, which makes them very useful for predicting future performance.

*Just how much data is required depends heavily on the stat we’re talking about. Strikeout rates, for instance, tend to stabilize pretty quickly: A month’s worth of data is usually enough. Fielding stats, in contrast, are subject to huge variation: We need three years’ worth of information to really get a grasp of a player’s true talent level.
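To put rough numbers on the small-sample point, here is a minimal sketch in Python. It assumes a hypothetical hitter whose true talent is a .300 average, treats every at-bat as an independent trial (a simplification that real at-bats don't honor), and uses invented sample sizes of 25 at-bats for a week and 550 for a season.

```python
from math import comb


def prob_at_least(min_hits: int, at_bats: int, true_avg: float) -> float:
    """Chance of collecting at least min_hits in at_bats (binomial model)."""
    return sum(
        comb(at_bats, hits) * true_avg**hits * (1 - true_avg) ** (at_bats - hits)
        for hits in range(min_hits, at_bats + 1)
    )


# Hypothetical .300 hitter; both samples below work out to a .400 average.
hot_week = prob_at_least(10, 25, 0.300)      # 10-for-25
hot_season = prob_at_least(220, 550, 0.300)  # 220-for-550

print(f"Hits .400 or better over 25 AB:  {hot_week:.3f}")
print(f"Hits .400 or better over 550 AB: {hot_season:.2e}")
```

Under those assumptions, the hot week is a fairly ordinary event, while the hot season is vanishingly rare, even though the hitter's underlying skill is identical in both cases.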

Common rate stats for hitters: BA (batting average), OPS (on-base plus slugging)
Common rate stats for fielders: Fld% (fielding percentage), RF (range factor)
Common rate stats for pitchers: ERA (earned run average), OBA (opponents’ batting average), WHIP (walks plus hits per inning pitched)

Now that we’ve got some definitions down, on to the stats!

Batting Average (BA or AVG): Simply hits divided by at-bats. Batting average tells us one thing: how often the batter gets a hit of some kind in his official at-bats. In situations where the team really needs a hit, but any hit will do (say, a runner on third with two out), batting average is very important.

Batting average probably shouldn’t be the one stat that we’re shown when we look at the lineup at the start of a game – OPS, for instance, would work much better. Still, it plays a non-trivial role in our understanding of the game and the talents of individual hitters.

The problem with batting average is that analysts and commentators use it as shorthand for a hitter’s overall prowess, which is absurd. First of all, batting average doesn’t include walks, even though coaches have been saying that a walk is as good as a hit for generations. Second, batting average treats all hits the same; a home run counts just as much as a single. Clearly, a hitter’s batting average doesn’t give us a complete picture of how skilled (or not) he is.

My proposal: Instead of batting average, call it “hitting average.” That tells us exactly what the stat means: how good the batter is at getting hits. We already informally say that the league leader in batting average is “leading the league in hitting” and that a high BA player is a “.300 hitter,” so why not make it official?

On-Base Percentage (OBP): Here’s one of the core truths of baseball: Baserunners are precious. On defense, your goal is to not allow runs, and the most foolproof way to keep the other team from scoring is to keep your opponents from reaching base. On offense, your goal is to score runs, which requires getting on base. On-base percentage, then, is one of the most important stats in the world.

OBP is pretty simple to calculate: It’s just times on base divided by plate appearances.* It suffers from some of the same drawbacks as batting average in that all hits and walks (and hit-by-pitches) count the same, but at least the name of the stat tells us exactly what it’s supposed to describe.

*Technically, OBP doesn’t count a few things that are considered plate appearances, namely sacrifice bunts, catcher’s interference and fielder’s obstruction. Interference and obstruction calls are so rare at the major league level, though, that they make very little difference. As an aside, I tend to think OBP should count interference calls as times on base – Jacoby Ellsbury, who leads the Majors with four times reached on interference this year (no one else has more than one), would probably appreciate this.
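For the curious, the calculation can be sketched in a few lines of Python. The formula is the standard one, (H + BB + HBP) / (AB + BB + HBP + SF); the function name and the season line are invented for illustration.

```python
def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = h + bb + hbp
    chances = ab + bb + hbp + sf
    # Guard against a player with no qualifying plate appearances.
    return times_on_base / chances if chances else 0.0


# Hypothetical season line, not a real player.
obp = on_base_percentage(h=150, bb=60, hbp=5, ab=500, sf=5)
print(f"OBP: {obp:.3f}")
```

Note that a batter who reaches on an error is charged with an at-bat but no hit, so that trip to the plate lands in the denominator only.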

My proposal: The one thing that irks me about OBP is that it charges the batter with an out when he reaches on an error – though that’s more of an issue with errors (see below) than OBP itself. If no actual out is recorded, it shouldn’t count as an out on the batter’s record; if you’re not going to call it a time on base, at least remove it from the OBP equation.

Runs Batted In (RBI): There are many stats that I like, and many that I dislike. RBI are definitely in the latter column. In fact, I’d call them my third-least-favorite stat (see below for one and two).

The first strike against RBI is that they’re actually pretty complicated to count. The batter is awarded an RBI when:
  • A runner who is already on base scores on a hit, without the aid of an error.
  • The batter hits a home run, thereby batting himself in.
  • The batter hits a fly ball that’s caught for an out, and a runner on third base tags up and scores (this is called a sacrifice fly).
  • A runner scores on a ground ball that’s turned into a single out, but not a double play.
  • The batter draws a walk or is hit by a pitch with the bases loaded, forcing in a run.
So, okay, the definition of an RBI is convoluted. That’s not the main reason I don’t use the stat, though. RsBI* are a poor measure of hitting ability because they depend more on the skill of the batter’s teammates than the skill of the batter himself. Like all counting numbers, RsBI depend on playing time, but even if we adjust for games played or plate appearances, RBI chances are not distributed evenly. Every at-bat is a chance to get a hit, but (home runs aside) not every AB is an RBI opportunity.

*Another strike against RBI: I’m never sure what the plural is. RBIs sounds wrong. Technically the singular and plural could (and should) both be RBI (Run Batted In versus Runs Batted In), but if I want to make it clearly plural, that’s no good. RsBI it is.

Someone who bats behind players who are great at getting on base will have the opportunity to drive in more runs than someone whose teammates are bad at getting on base. Moreover, there’s baserunning skill to consider: If I hit a double* with Jacoby Ellsbury on first base, I’ll almost certainly get an RBI. If I hit a double with David Ortiz on first… yeah, probably not.

*Let’s leave aside for a moment the absurdity of saying that I could ever hit a double against an MLB pitcher… or an MiLB pitcher… or even a decent Little League pitcher…

The league leaders in RBI are almost always cleanup hitters on good teams. Now, often those guys are good hitters, but there are better ways to measure their hitting skill than counting runs batted in. As constructed, the RBI is a useless* stat.

*I draw a distinction between “useless” and “meaningless” here. No stat is truly meaningless. If you ask me to choose between Hitter A, who has 120 RBI, and Hitter B, who has only 80, giving me no other information, I’ll take Hitter A and I’ll be right more often than not. However, we live in a world where we have access to better stats than RsBI, which means they serve no real purpose.

My proposal: Really, I’d like to just stop counting RBI altogether. If you insist on keeping them, though, I’d institute a rule that a batter can be awarded no more than one RBI on a single hit (perhaps allowing for a second if the hit is a home run). There’s really no material difference, as far as the hitter is concerned, between a hit with runners on second and third and a hit with just a runner on second – they’re both base hits with runner(s) in scoring position. Why should one count as double the other?

Slugging Percentage (SLG): Remember how batting average counts all hits the same? Slugging percentage is the answer to that problem. It’s calculated the same way as batting average, but doubles count double, triples count triple and home runs count four times as much as singles. Alternatively, take a batter’s total bases and divide by his at-bats.

The problem with SLG, such as it is, is that it overcompensates for that flaw in BA. A home run is clearly more valuable than a single, but is it four times as valuable? Put another way, would you say that a batter who hit a home run and struck out three times had as good a game as someone who hit four singles in four at-bats? SLG says yes. I’m guessing all but the most homer-crazy fans would say no.
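The comparison above is easy to check. Here is a small Python sketch of both averages applied to those two games; the function names are mine, and each game is assumed to be exactly four at-bats.

```python
def batting_average(singles: int, doubles: int, triples: int,
                    home_runs: int, at_bats: int) -> float:
    """Hits divided by at-bats; every hit counts once."""
    hits = singles + doubles + triples + home_runs
    return hits / at_bats


def slugging(singles: int, doubles: int, triples: int,
             home_runs: int, at_bats: int) -> float:
    """Total bases divided by at-bats; each hit is weighted by its bases."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Game 1: one home run and three strikeouts in four at-bats.
homer_game_ba = batting_average(0, 0, 0, 1, 4)
homer_game_slg = slugging(0, 0, 0, 1, 4)

# Game 2: four singles in four at-bats.
singles_game_ba = batting_average(4, 0, 0, 0, 4)
singles_game_slg = slugging(4, 0, 0, 0, 4)

print(f"Homer game:   BA {homer_game_ba:.3f}, SLG {homer_game_slg:.3f}")
print(f"Singles game: BA {singles_game_ba:.3f}, SLG {singles_game_slg:.3f}")
```

Batting average sees a huge gap between the two games (.250 versus 1.000), while slugging scores them identically at 1.000.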

Still, a batter’s slugging percentage is a good measure of a valuable skill: his ability to drive in runs. Somewhat ironically, SLG does a much better job of describing a batter’s RBI prowess than his actual RBI totals.

My proposal: Call it slugging average – I realize some people already do, but as far as I’ve seen it’s called a percentage much more often. It’s not a percentage of anything, though; it’s an average.* More to the point, calling it an average clearly presents SLG as an alternative to BA.

*This would apply to OBP too, but at least OBP has a reasonable excuse. If we called it on-base average, we’d have to abbreviate it OBA, which would get confused with opponent’s batting average for pitchers.

On-Base Plus Slugging (OPS): Just add up a batter’s OBP and SLG and you have his OPS. There are multiple issues with OPS (the denominators are different, for one), but it gives you a much better at-a-glance look at a hitter’s overall skill than his batting average. I’ll cite OPS pretty often.

Earned Run Average (ERA): On the surface, a pitcher’s ERA is easy to calculate: just take his earned runs allowed, divide by his innings pitched and multiply by nine. The issue lies in the concept of an “earned” run.

An earned run, essentially, is a run that scores without the aid of an error or a passed ball. If a batter reaches on a fielding error (say the shortstop makes a high throw) and then comes around to score, ERA argues that that run isn’t really the pitcher’s fault. Thus, it doesn’t show up in his statistics.

The thing about ERA is that it tries to do something awesome. It acknowledges that the pitcher isn’t completely responsible for every run that scores; some of the blame falls on the defense behind him. Then it attempts to edit out the impact of the defense and focus solely on runs that are the pitcher’s own responsibility.

Unfortunately, ERA does so in an incredibly ham-fisted and ineffective way.

First of all, let’s look at the definition of an error. It’s defined in MLB rule 10.12, which is far too long to reproduce here, but the short version is that a fielder is charged with an error when he fails to make a play that could have been made with “ordinary effort.”

Central to the definition of an error is this concept of “ordinary effort.” I think the intent of this rule was to judge defensive players against the league average; if Robinson Cano fails to pick up a ball that the average second baseman would have reached, he’s charged with an error. In practice, fielders tend to be judged against themselves: If Cano isn’t quick enough to make a play at all, it’s scored a hit rather than an error.

In most cases, an error is scored when a fielder touches the ball (or comes very close to touching it) and then makes a physical misplay.* For instance, making a wild throw counts as an error. Dropping a ball after catching it on the fly counts as an error. Picking up a ground ball and then bobbling it counts as an error (unless the fielder recovers to get the out anyway). Failing to get to the ball, even if it’s a fairly routine play, doesn’t count as an error.

*Curiously, mental mistakes are not considered errors. If the shortstop throws to the wrong base, thereby allowing a run to score, that run counts against the pitcher’s ERA; if the shortstop makes a high throw to the correct base, thereby allowing a run to score, that run doesn’t count against the pitcher’s ERA. Likewise, if the first baseman is pulled off the bag to catch a wide throw, it’s an error; if he fails to cover the bag, it’s a hit.

Thus, if a pitcher plays in front of fielders with good range who make the occasional catching or throwing mistake, it’s all good: Runs that score on those errors don’t hurt his ERA. If a pitcher plays in front of a bunch of slow, rangeless liabilities* who don’t make many errors because they don’t get to the ball in the first place, tough luck: Runs that score on the hits that sneak past them do hurt his ERA.

*Oh hello there, Detroit Tigers! (Though the addition of Jose Iglesias remedies much of this issue.)

It’s bad enough that errors themselves are seriously flawed, but what the official scorer is asked to do with those errors is borderline insane. To determine whether a run is earned or unearned, the scorer must reconstruct the inning as it would have gone without the error or passed ball. Essentially, he takes out the errors, assumes that everything else would have gone the same way and goes from there.

In some cases, this makes a degree of sense. If a runner on third scores on a passed ball* and the next batter hits a single, it’s considered an earned run on the grounds that the passed ball made no difference; the runner would have scored on the single if he hadn’t come home already. In other cases, though, reconstructing the inning requires some assumptions that are flat-out stupid.

*Passed balls, by the way, are arguably even dumber than fielding errors. Quick definition: A pitch that isn’t caught and allows a runner to advance is called a passed ball if the catcher could have caught it with “ordinary effort” and a wild pitch otherwise. A wild pitch is charged to the pitcher (and thus any runs that score count against his ERA); a passed ball is similar to an error on the catcher (and thus runs that score don’t count against the pitcher’s ERA). I used to think this was a reasonable distinction; then I saw some rather sound evidence that pitchers have more influence on passed balls than catchers do – in fact, they have more control over passed balls than wild pitches. The official scorers may as well be flipping coins.

Consider the following scenario: With one out, Batter A hits a single and advances to second base on a throwing error. This brings up dangerous Batter B with a runner in scoring position, and he’s intentionally walked to bring up light-hitting Batter C. In this case, it backfires, as Batter C slugs a three-run homer. Batters D and E strike out to end the inning.

All three runs that scored on that play are earned. The official scorer assumes that Batter A would have held up at first without the error, that Batter B would have walked anyway, and that Batter C would have brought them all home with the big fly. In other words, according to the rules, the error didn’t matter.

Of course, this assumption is ludicrous: Batter B was intentionally walked precisely because there was a runner in scoring position and first base was open. Take the error out of the equation, and the other team would have pitched to B with first base occupied, possibly changing the entire course of the inning. He may have struck out or even hit into a double play. He may have also walked, but we can’t know for sure.

That exact situation is unusual, but it speaks to the larger problem with ‘reconstructing’ the inning as though the error didn’t happen. Individual at-bats are not independent events. Pitchers throw differently with runners on base. Hitters change their approaches depending on the game situation. Managers call for sacrifice bunts to advance runners who reached on errors. Assuming that everything that happened after the error would still have happened without the error is absurd.

Just as ERA treats individual at-bats as separate events, it also treats individual innings as separate events. If Stephen Drew misplays a ground ball that would have been the third out of the inning and Felix Doubront proceeds to give up five home runs in a row, none of those runs are considered earned because “the inning would have been over.” Really? Doubront isn’t even a little bit responsible for that?

Moreover, while ERA tries to filter out the effects of defensive miscues, it doesn’t do anything to adjust for the effects of excellent plays. If Jon Lester gives up a fly ball over the right field fence and Shane Victorino makes an awesome leaping grab to pull it back in, shouldn’t Lester, by the same logic, be charged the runs that would have scored on a homer? He’s responsible for the batted ball, and he had nothing at all to do with the catch.

I can’t quite hate ERA, even after all that, because I really do love the intent of the stat. A for effort. F for execution.

My proposal: Just drop the E and use Run Average (sometimes called RA9). It’s simple to understand and accurately describes what happened on the field. There are better stats like FIP (Fielding Independent Pitching) that actually do what ERA attempts to do.
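To see how far apart the two numbers can drift, here is a minimal Python sketch. It tracks innings as outs recorded (which avoids the awkward 6.1 and 6.2 box-score notation), and the pitching line is hypothetical: six innings, five runs allowed, all of them scored after a two-out error and therefore unearned.

```python
def per_nine(runs: int, outs: int) -> float:
    """Runs allowed per nine innings, with innings tracked as outs recorded."""
    return 27 * runs / outs


# Hypothetical outing: 6 innings (18 outs), 5 runs, 0 of them earned.
outs_recorded = 18
runs_allowed = 5
earned_runs = 0

era = per_nine(earned_runs, outs_recorded)
ra9 = per_nine(runs_allowed, outs_recorded)

print(f"ERA: {era:.2f}")
print(f"RA9: {ra9:.2f}")
```

ERA calls this outing spotless at 0.00; Run Average reports 7.50, which is a lot closer to what the scoreboard said.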

Saves (SV): Saves are the marquee stat for relief pitchers – specifically for closers. The definition of a save is tough to explain, so I’ll just quote the rules. A pitcher is awarded a save when:
  • He is the finishing pitcher in a game won by his team.
  • He is not the winning pitcher.
  • He pitches at least 1/3 of an inning.
  • One of the following:
    • He enters the game with a lead of no more than three runs and pitches at least one inning.
    • He enters the game with the tying run on base, at bat or on deck.
    • He pitches for at least three innings.
Strike one against the save is that the definition is so arbitrary and convoluted. It’s my second-least-favorite stat, though (and it’s a close thing with the one I’m about to get to), because of the impact it’s had on the game.
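To show just how many moving parts there are, here is the list above written out as a Python function. This is a sketch of the conditions as summarized in this post, not an official scoring tool, and the function and parameter names are invented.

```python
def earns_save(
    finished_game: bool,
    team_won: bool,
    is_winning_pitcher: bool,
    outs_recorded: int,
    lead_on_entry: int,
    tying_run_close: bool,
) -> bool:
    """Apply the save conditions as summarized in the list above.

    tying_run_close means the tying run was on base, at bat or on deck
    when the reliever entered the game.
    """
    # The first three conditions must all hold.
    if not (finished_game and team_won) or is_winning_pitcher:
        return False
    if outs_recorded < 1:  # at least 1/3 of an inning
        return False
    # On top of that, at least one situational condition must hold.
    small_lead_full_inning = 1 <= lead_on_entry <= 3 and outs_recorded >= 3
    long_outing = outs_recorded >= 9  # three full innings
    return small_lead_full_inning or tying_run_close or long_outing


# A standard ninth inning with a three-run lead qualifies...
print(earns_save(True, True, False, outs_recorded=3, lead_on_entry=3,
                 tying_run_close=False))
# ...but the same inning with a four-run lead does not.
print(earns_save(True, True, False, outs_recorded=3, lead_on_entry=4,
                 tying_run_close=False))
```

Six inputs and three escape hatches, all to decide whether one reliever gets a tally mark.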

While I tend to think that statistics are primarily for fans and front-office types, they also impact the way managers run their teams. In many cases, that impact is a good thing: Earl Weaver, for instance, rather famously understood the value of on-base percentage in constructing his lineups. The save, however, has led to some very… questionable bullpen usage patterns.

How many times have you seen a manager hold his best reliever out of a tie game, even well into extra frames,* because he’s waiting for an arbitrary save situation? How many times have you seen a lesser reliever cough up a lead in the seventh or eighth while the closer sits on the bench? It’s crazy.

*The most egregious example of this in recent memory, and possibly ever, came on April 17, 2010, when the Mets played the Cardinals in a game that lasted 20 innings. Mets manager Jerry Manuel had closer Francisco Rodriguez warming up in every inning from the 9th through the 18th, only to sit him back down because the game was still tied. By the time K-Rod finally took the mound in the 19th inning, he’d thrown more than 100 pitches in the bullpen; unsurprisingly, he gave up the tying run.

My proposal: I really like the way the Rolaids Relief Man Award distinguishes between a regular save and a “tough save,” although I think their criterion (the tying run must already be on base) is too restrictive. I’d propose the following addition: A reliever who records at least twice as many outs as the size of his team’s lead is awarded a tough save. Thus, getting the last two (or more) outs of a one-run game would count, as would the last four outs of a two-run game, the last two innings (six outs) of a three-run game and so on.
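By contrast, the tough-save test fits in one line. A quick Python sketch, assuming the reliever has already qualified for an ordinary save and that the lead is measured at the moment he enters the game (the proposal doesn't pin that down, so treat it as my reading):

```python
def is_tough_save(outs_recorded: int, lead_on_entry: int) -> bool:
    """Proposed rule: at least twice as many outs as the size of the lead."""
    return lead_on_entry >= 1 and outs_recorded >= 2 * lead_on_entry


# The examples from the proposal: (outs recorded, lead on entry).
for outs, lead in [(2, 1), (4, 2), (6, 3), (3, 2)]:
    print(f"{outs} outs, {lead}-run lead: {is_tough_save(outs, lead)}")
```

The last case, a clean inning with a two-run lead, is a regular save but falls one out short of a tough one.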

Wins (W): Here’s the first issue with pitcher wins: They’re defined even more arbitrarily than saves. On the surface, granted, it’s a pretty simple definition: The pitcher who’s in the game when his team takes a lead that holds up for the rest of the game is awarded the win. A starting pitcher must also go at least five innings to be eligible for a win; a relief pitcher, though, merely needs to record at least one out.

Wins made some sense back in the days when pitchers were expected to finish what they started. Today, though, each team uses multiple pitchers in almost every game, and the value of the win as a statistic has gone down accordingly. At least saves are always awarded to the guy who closes the door at the end of the game; often, the pitcher who gets the win contributed little to his team’s winning effort.* If the starter pitches seven shutout innings, but his teammates don’t score until after he’s relieved in the eighth, should a one-inning pitcher be awarded the win over the seven-inning pitcher? Really?

*A few years ago, Alan Embree actually earned a win without throwing a pitch. He entered the game in the eighth with two outs and immediately picked a runner off first base to end the inning. The Rockies took the lead in the next half-inning, and Embree was the pitcher of record. Now, I think that’s actually sort of cool, but it also speaks to the silliness of the win as an individual stat.

More to the point, if the starter leaves the game with a lead, a relief pitcher promptly coughs up that lead, and then the offense takes the lead back… should the win really be awarded to the reliever who blew it? It’s absurd, yet this actually happens on a pretty regular basis. We call it a “vulture win.”

My biggest issue with the win as a statistic, though, isn’t a quibble with particular wins – it has to do with the nature of baseball. Pitchers don’t win games, nor do they lose games. Even a guy who pitches a shutout relies on his offense to score at least one run and his defense to make some plays in the field. Even a hypothetical pitcher who strikes out all 27 batters he faces and hits a home run* needs someone to catch his pitches.

*This has never happened in real life – at least, not in organized baseball – but it did rather ludicrously happen at the end of the movie “The Scout.”

When we award a win to a pitcher, we’re assigning a false significance to his accomplishments – or more accurately, things that we perceive to be his accomplishments. Even people who’ve been following baseball for a long time can assign too much value to wins because, well, they’re called wins. Hey, Bartolo Colon won 21 games!* He must be the best pitcher in the league! Give that man a Cy Young!

*Sorry, Johan, your teammates weren’t good enough for you to be the best pitcher in baseball. Better luck next year.

However, people who understand baseball are starting to realize that wins aren’t so important. From the writers’ perspective, Felix Hernandez’ Cy Young a few years ago showed that W-L record is no longer the be-all, end-all of awards voting. More encouragingly, I saw an interview recently in which 10 current and recent MLB players shared their thoughts on pitching statistics. Not one of them cited wins as the most important stat, and several pointed out that individual wins are pretty flawed.

No, my concern about wins is for the new fan, the fan who’s just starting to understand the game. When we say that a pitcher won 15 games or lost 15 games, we’re implicitly saying that pitching is the be-all, end-all of baseball, rather than one of three critical components. What we choose to call the stat says more about us than it does about baseball itself.

My proposal: It pains me to say this, it really does. In case it’s not already clear, I love the history of the game, and so much of that history is bound up in individual stats. The 300-win club is one of the most exclusive in all of sports, and with the steroid taint* on some of the game’s other great milestones, it may be the most cherished achievement we have left.

*I’m aware, of course, that one very prominent member of the 300-win club is caught up in the PED scandal. However, I don’t get the impression that fans view the 300 win milestone as diminished in any way by his inclusion. Compare that to hitting 500 home runs, which doesn’t seem anywhere near as impressive today as it did a generation ago.

More so than any other individual stat (yes, including home runs), pitcher wins tell us how much baseball has changed from generation to generation. In the early decades of baseball, it wasn’t at all uncommon for great pitchers to rack up 30 or more wins per year. A generation later, 25 was the gold standard, then 20, and now it’s not unusual for the best pitchers in the league to end up in the mid-teens. Cy Young gave us one of the most storied records in all of sports with his 511 wins;* today, great pitchers struggle to even make it halfway to that total.

*It should be noted that while Young’s win record is his most famous, it’s probably not his most unbreakable. Cy Young pitched an unbelievable 749 complete games; the active leader, Roy Halladay, has fewer than one-tenth as many. The only other pitchers to even start that many games, ever, are Don Sutton and Nolan Ryan. If Greg Maddux, arguably the greatest pitcher of the last 30 years, had finished every game he started, he still would have been short of the record.

I hate pitcher wins, and I love pitcher wins.

My proposal, for real: Stop counting them. It’ll hurt, but it’s for the good of the game.
