Here’s one of those lines that The New Yorker would label “sentences we never read past”:
"I was skimming the program for the annual meeting of the American Statistical Association . . ."
But really, where else can you find not only research on “Modeling Sparse Generalized Longitudinal Observations with Latent Gaussian Processes” but also on managerial strategies in baseball, parity in the NFL and the accuracy of sports predictions?
It’s striking how many statisticians who study weighty matters—how to
tell if a cancer drug works or a compound is dangerous—got their start
studying sports statistics.
“A lot of us
really enjoyed baseball statistics when we were growing up, and that’s
how we got into the field,” biostatistician Michael Schell of the Moffitt Cancer Center in Tampa told me.
So I got in
touch with Jack O’Gara, who wrote the book on using statistical
techniques to spot chicanery in business (that would be the 2004 “Corporate Fraud: Case Studies in Detection and Analysis”). Now retired, O’Gara has put his statistical skills to use analyzing baseball, especially cheating.
In the
business world, he focused on what he calls inflection points: sudden
discontinuities in data. That is what he saw, in abundance, when he analyzed the career stats of pitcher Roger Clemens.
Clemens, of course, was named in the Mitchell Report,
which last December reported that an alarming number of baseball
players had taken performance-enhancing drugs such as steroids.
(Clemens’ section starts on p. 215.) Clemens and his camp deny the allegation.
O’Gara decided to see if stats could tell us anything.
One of the
most telling is ERA Margin, which compares a pitcher’s earned run
average in a given year to the league average. It’s more informative
than ERA alone because it controls for weird things like hitters
league-wide being in a slump (which would reduce every pitcher’s ERA
but not ERA Margin), or the use of a juiced ball that year, which would
raise pitchers’ ERAs but, again, not the margin. The ERA Margin tells
you how one hurler is doing compared to his peers.
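Here is a minimal sketch of that arithmetic in Python. The article does not spell out O’Gara’s exact formula, so the code assumes the margin is the league ERA minus the pitcher’s ERA, which makes a positive margin mean "better than the league." The function names and all of the numbers are invented for illustration; they are not real statistics.

```python
def era(earned_runs: float, innings: float) -> float:
    """Standard earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def era_margin(league_era: float, pitcher_era: float) -> float:
    """ASSUMED definition: league ERA minus pitcher ERA (positive = better than league)."""
    return league_era - pitcher_era


# Hypothetical numbers, for illustration only.
pitcher = era(earned_runs=70, innings=210)  # 3.00
league = 4.50
normal_year = era_margin(league, pitcher)

# A juiced ball that adds half a run to everyone's ERA, this pitcher's included,
# moves both ERAs but leaves the margin where it was.
juiced_year = era_margin(league + 0.5, pitcher + 0.5)

print(round(normal_year, 2), round(juiced_year, 2))
```

Both years produce the same margin, which is the point of the statistic: a league-wide shift cancels out, and only the pitcher’s standing relative to his peers remains.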
O’Gara
compared Clemens’ ERA Margins to those of the 20 post-World War II
pitchers with the most wins, turned in by legends such as Warren Spahn,
Tom Seaver and Bob Gibson. Through age 34, Clemens’ margin was 1.09,
notably better than the others’ 0.6. Fine, the guy was an ace.
But from age
35 to 40, when most pitchers fade, Clemens’ margin was 1.18, compared
to 0.43 for the other greats. Here's where it gets weird: from age 41
to 45, it was 1.30, while the others’ was a negative 0.01. That is, the
other great pitchers’ margin shrank as they got older, falling more in
line with the league average and normal aging patterns, but Clemens’
soared. As O’Gara put it, “Clemens is the only pitcher who gets
progressively better as he ages into the post-40 category.”
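The bracket comparison can be laid out as a small table. The sketch below uses only the figures quoted above; the variable and helper names are illustrative, and a widening gap shows only that the pattern is unusual, not what caused it.

```python
# ERA Margins by age bracket, as quoted in the article.
brackets = ["through 34", "35 to 40", "41 to 45"]
clemens = [1.09, 1.18, 1.30]
others = [0.60, 0.43, -0.01]  # the other post-WWII greats in O'Gara's comparison


def is_rising(values):
    """True if each value is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def is_falling(values):
    """True if each value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Gap between Clemens and the comparison group in each bracket.
gaps = [round(c - o, 2) for c, o in zip(clemens, others)]

for label, c, o, gap in zip(brackets, clemens, others, gaps):
    print(f"{label:>10}: Clemens {c:5.2f}  others {o:5.2f}  gap {gap:5.2f}")

print("Clemens rising with age:", is_rising(clemens))
print("Others falling with age:", is_falling(others))
```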
When the ERA
Margins for baseball’s top 10 or top 20 pitchers each year are graphed,
Clemens was better than the rest when he was 29 and 30, then twice
more—three performance peaks while none of the top 20 had more than
two. “More significantly, the second two peaks were higher than the
initial peak, which occurred in the presumed prime of his life,
contrary to normal aging patterns,” O’Gara says. “At age 43, Clemens
had the seventh-best season [measured by ERA Margin] since World War
II.”
Of the 20
best ERA Margins since 1945, all came when the pitcher was 34 or
younger (average age: 28), with the exception of Clemens, who did it
when he was 35 and again when he was 43. The best two-year average ERA
Margins cluster in pitchers’ late 20s (Sandy Koufax: 29-30; Greg Maddux:
28-29), so Clemens’ best, which came when he was 43-44, stands out again.
Clemens’ ERA Margin at age 43 was the best in the majors that year and
the best ever for a 43-year-old.
Testimony
taken for the Mitchell Report and given to Congress this spring
included accusations from a trainer that he injected Clemens, which the
pitcher denies. As it happens, the three periods when the trainer said
he administered shots “correspond to performance bursts by Clemens,”
says O’Gara. “The ERA for these three periods totaled 1.92 over 183
innings, significantly better than his career average ERA of 3.12.”
As has been
widely reported, in 1996 Clemens, then 34, was coming off a sub-par
1995 season and struggling through the first months of the '96 season,
his last of 13 with Boston. “Then he suddenly went from being mired in
the worst multiple year performance of his career (the preceding one
and 2/3 years) to his best two-year-plus performance of his career,”
says O’Gara. “He averaged a 2.91 ERA margin for the remainder of 1996,
better than for any single calendar year.”
One baseball
statistician I asked about this analysis warned me against “guilt by
graph”—that is, concluding that someone was juiced based on stats
alone. “Stats can tell you if someone’s performance is unusual, but by
definition a great player has an unusual performance,” he said. See,
for instance, this post by another stats guru.
So in Clemens’ case, do the stats lie—or expose a lie?