Major League Baseball recently released a report about pitcher injuries. It was the culmination of interviews with 200 subject-matter experts about the growing rash of arm troubles in the sport, and the word “stuff” was used 47 times. The report includes entire sections about the concept of stuff metrics — like Stuff+ — and how they may relate to pitcher health.
The study of the physical characteristics of a pitch, and how they relate to outcomes, has been improved immensely over the past few years by new technology and machine learning techniques. Now a number like Stuff+ can tell you how good a pitch is based only on its velocity, spin, and movement. The recent explosion in the use of pitch types like sweepers, hard sliders and cutters across the league can be tied back to these metrics, which pointed to these pitch types as underrated.
“It’s been an important tool for us as we evaluate and develop our pitchers,” said one major-league pitching coach, one of multiple team employees who were granted anonymity because they weren’t approved to talk about these metrics publicly.
“Stuff+ has really helped bridge the gap between how the public and front offices think about pitchers and pitch quality,” said an MLB team analyst. “Teams keep their own metrics internal, obviously, but given how similarly teams build these metrics and how similar Stuff+ is to what these teams have, Stuff+ helps the casual observer understand what teams are seeing in pitchers.”
But it’s not just the doctors, coaches and analysts who care about these metrics. A player helped inspire one of the first stuff metrics. Brandon Bailey, now a pitching coach in the Dodgers organization, asked the question that started it in 2018, when he was still pitching. He had a curveball and a slider, and the Astros wanted him either to throw the curveball harder or to throw the slider with more movement. He didn’t know which idea was better.
“He asked us: Which should I do?” said Kyle Boddy of Driveline. “We were like, ‘Oh, that’s a good question. Can we quantify this?’ That was the first question that led us to develop Stuff+.”
Clearly, these stuff metrics are here to stay. They’re in the bullpen when the coach is assessing his guys, they’re in the offseason plans when pitchers get homework assignments, they’re in the scouting reports hitters mull over before the game and they’re in the office when the analysts are trying to find undervalued players to acquire. They’re now up on many of the best statistical websites in baseball and in most teams’ lexicons when it comes to developing and acquiring players, and they’re increasingly part of the regular parlance of the sport.
But, before we get into the ramifications of these new numbers, it makes sense to understand them better.
What is Stuff+?
Aptly named, Stuff+ is a number that evaluates a pitcher by studying the movement, velocity, spin, and release points of their pitches. It’s generally trying to remove the context of how a specific pitch performed on the field by looking at how certain combinations of shapes, velocities, and spins usually perform across baseball and then assigning that value back to the pitch itself. What started with a revelation like “hard sliders that drop a lot are good” has become more complicated, but the analysis comes from the same place.
Pioneered by former Cubs research & development analyst Jeremy Greenhouse in 2009, the framework and concepts within were pushed forward by analysts like Harry Pavlidis at Baseball Prospectus and many others in the field, including Alex Chamberlain with FanGraphs and Tom Tango with Major League Baseball. Working with Ethan Moore, we debuted a Quality of Stuff metric here at The Athletic in 2020 before Max Bay (now with the Dodgers) brought Stuff+ here a year later and eventually on to FanGraphs, where it now lives in a sortable leaderboard. Driveline Baseball first posted about their model, built by now-Phillies R&D head Dan Aucoin, in late 2021 but had already been using it before they went public. Now there are many competing models available publicly, and most teams have their own private versions.
The most basic and powerful pillar of Stuff+ is that velocity is good. That’s no surprise, but fastball velocity isn’t only good in its own right. It also helps the secondary pitches, which we define off the fastball using velocity as the “anchor.” This is because hitters have to time the fastball: they have to be able to swing early and hard enough to hit the pitch that is still the most common in baseball. When they do so, they open themselves up to mistakes and swings and misses.
Here’s a look at Max Fried’s fastball and curveball, which sit a whopping 18 mph apart. Look at where the curveball is when the fastball crosses the plate.
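For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes both pitches are released at the same instant from 55 feet away and hold a constant speed the whole way (real pitches slow down in flight). The 94 and 76 mph figures are hypothetical values chosen only because they sit 18 mph apart; they are not Fried’s measured velocities.

```python
# Rough constant-speed sketch: where is the slower pitch when the faster one
# reaches the plate? All numbers below are illustrative assumptions.

MPH_TO_FTPS = 5280 / 3600  # feet per second in one mile per hour


def distance_short_of_plate(fast_mph, slow_mph, release_distance_ft=55.0):
    """Feet the slower pitch still has to travel when the faster one arrives.

    Assumes both pitches leave the hand at the same instant and keep their
    release speed all the way to the plate (no drag).
    """
    fast_time_s = release_distance_ft / (fast_mph * MPH_TO_FTPS)
    slow_travelled_ft = slow_mph * MPH_TO_FTPS * fast_time_s
    return release_distance_ft - slow_travelled_ft


# Hypothetical speeds, chosen only to be 18 mph apart.
gap_ft = distance_short_of_plate(fast_mph=94.0, slow_mph=76.0)
print(f"Slower pitch is about {gap_ft:.1f} ft from the plate")
```

Under those assumptions, the slower pitch still has roughly 10 and a half feet to travel when the faster one arrives.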
If you swing to time that fastball, you’ll miss the curveball by feet, so velocity is very important for whiffs. Movement is also key because it can influence the results of a ball in play. Movement can be difficult to talk about and understand in pitching terms because it’s defined theoretically. Here’s an example.
We know that “ride” is good on the fastball, and that Logan Gilbert has 16 inches of it. That means the spin on his four-seamer helps the pitch counteract the effect of gravity. The ball doesn’t rise, but it does drop less than the hitter would expect it to. Gilbert’s fastball has 16 inches more ride than a pitch that spins like a bullet and is only affected by gravity. It turns out that the Mariners’ starter actually throws a slider with one inch of horizontal movement and zero inches of vertical movement, so almost exactly this theoretical bullet pitch. If we overlay his fastball and slider, we can get a sense of what 16 inches of ride looks like in the real world.
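The arithmetic behind that comparison can be sketched in a few lines of Python. This is a simplified model, not how Statcast computes movement: it assumes a flight time of 0.40 seconds (an illustrative value), treats the bullet-spin pitch as affected by gravity alone, and gives both pitches the same flight time.

```python
G_IN_PER_S2 = 386.09  # gravitational acceleration in inches per second squared


def gravity_only_drop_in(flight_time_s):
    """Inches a pitch falls in flight if gravity is the only vertical force."""
    return 0.5 * G_IN_PER_S2 * flight_time_s ** 2


def drop_with_ride_in(flight_time_s, ride_in):
    """Inches a pitch falls when spin offsets `ride_in` inches of that fall."""
    return gravity_only_drop_in(flight_time_s) - ride_in


FLIGHT_TIME_S = 0.40  # assumed flight time, for illustration only

bullet_drop = gravity_only_drop_in(FLIGHT_TIME_S)
fastball_drop = drop_with_ride_in(FLIGHT_TIME_S, ride_in=16.0)

print(f"Gravity-only pitch drops {bullet_drop:.1f} in")
print(f"Fastball with 16 in of ride drops {fastball_drop:.1f} in")
```

With these assumed numbers, the gravity-only pitch falls about 31 inches on its way to the plate, while the fastball with 16 inches of ride falls about 15.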
Using machine learning, Stuff+ can test all sorts of different combinations of movement, velocity, spin and release points to find the best stuff. That makes it hard to produce top-line takeaways like “ride is good.” Even if ride is good, it’s more complicated than that because velocity, spin, and release still matter.
Here’s an example of some feature interactions within the model. In this case, you have slider velocity (x-axis) against slider drop (y-axis), where the colors indicate the Stuff+ of each combination of velocity and drop around the league. If you look for the red (good), then you’ll find that generally it’s good to throw your slider harder, but that drop still matters. All of the features have this sort of complicated interaction, and that adds up to a single number.
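To see what an interaction means in practice, here is a deliberately tiny Python toy. It is not the Stuff+ model, which is learned from tracking data; the function toy_slider_score and every coefficient in it are invented for illustration. The only point is the cross term, which makes the payoff from extra drop depend on how hard the slider is thrown.

```python
def toy_slider_score(velo_mph, drop_in):
    """Invented score for illustration only; not a real Stuff+ model."""
    velo_above_base = velo_mph - 85.0  # 85 mph is an arbitrary baseline
    velo_term = 2.0 * velo_above_base  # made-up: harder is better
    drop_term = 1.0 * drop_in          # made-up: more drop is better
    # Made-up cross term: the value of drop changes with velocity.
    interaction = 0.3 * velo_above_base * drop_in
    return 100.0 + velo_term + drop_term + interaction


def gain_from_drop(velo_mph, extra_drop_in=8.0):
    """Toy-score improvement when drop goes from 0 to `extra_drop_in` inches."""
    return toy_slider_score(velo_mph, extra_drop_in) - toy_slider_score(velo_mph, 0.0)


for velo in (82.0, 85.0, 88.0):
    print(f"{velo:.0f} mph: +{gain_from_drop(velo):.1f} from 8 inches of drop")
```

In this toy, the same eight inches of drop is worth far more at 88 mph than at 82. A real model learns effects like this from data rather than having them written in by hand, so the direction and size here should not be read as findings.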
One surprise from these models is that release point is incredibly important. What seems likely is that hitters see a release point, and then automatically expect a certain type of movement from that slot. Pitchers who can play with that expectation, as Josh Hader does with his unique fastball, do really well in stuff models.
In this next visual, we can see how Bryce Elder and Clay Holmes throw their sinkers from almost the same arm slot but with different movement. Elder’s sinker shape is more expected given its high release point, so his sinker has an 80 Stuff+ (a Stuff+ score of 100 represents the average for all pitchers). Holmes gets four more inches of drop on his sinker from the same slot, so he has a 112 Stuff+. And the results follow, as Elder has allowed a slugging percentage that’s more than 100 points higher on his sinker in his career.
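One way to picture “expected” movement is as a baseline fit against release height, with each pitcher judged by how far he sits from it. The Python sketch below is an assumption about how that could be framed, not the method behind any published Stuff+ model. The league sample is made-up toy data, and pitcher_a and pitcher_b are hypothetical stand-ins who share a release height and differ by four inches of drop.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


# Made-up toy sample: release height (ft) and sinker drop (in).
league_release_ft = [5.0, 5.4, 5.8, 6.2, 6.6]
league_drop_in = [27.0, 25.5, 23.0, 22.5, 20.0]

slope, intercept = fit_line(league_release_ft, league_drop_in)


def unexpected_drop_in(release_ft, drop_in):
    """Drop beyond what the toy baseline expects from this release height."""
    return drop_in - (slope * release_ft + intercept)


# Hypothetical pitchers: same slot, four inches apart in drop.
pitcher_a = unexpected_drop_in(release_ft=6.3, drop_in=22.0)
pitcher_b = unexpected_drop_in(release_ft=6.3, drop_in=26.0)
print(f"pitcher_a: {pitcher_a:+.1f} in, pitcher_b: {pitcher_b:+.1f} in")
```

In the toy sample, pitcher_a lands close to the baseline while pitcher_b sits well above it, which is the kind of gap between expectation and reality that the paragraph above describes.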
This finding has turned some of baseball’s traditional wisdom on its head, as a short pitcher with lots of ride (like Shota Imanaga) might receive preferential treatment from today’s teams over a taller pitcher with the same ride. Unexpected movement is huge.
“I wish I could be shorter, actually,” the 6-foot-3 Cal Quantrill once told me. “If I was shorter, it might improve the angle of some of my pitches.”
Unable to change their stature, pitchers have often turned to the baseball’s seams to produce unexpected movement. Clay Holmes has leveraged his knowledge of “seam-shifted wake” — a phenomenon in which seams can gather on one side of the ball and drag it in a certain direction — to make his sinker move like pitches thrown from lower arm slots. He gets tremendous drop from an over-the-top slot because of the seam effects on the fastball he throws.
These are the things that teams seem to value in today’s pitchers: velocity, spin, and unconventional combinations of movement and release points. That’s what you’ll see at the top of the Stuff+ leaderboards today, too.
What has Stuff+ brought to the game?
The research that produced Stuff+ contained discoveries that have changed how teams think about player acquisition, player development and in-game strategy.
The most obvious thing that came out in the first runs of the stuff models was that sliders performed so much better than any other pitch in the model. This led to the idea that they were being underutilized. Since Statcast was introduced, the league has thrown more sliders in every season than in the one before.
A closer inspection of the best sliders revealed that a certain type of sideways slider was particularly useful, especially against same-handed hitters. That pitch didn’t have a single name at first, going by the Dodger slider, or the whirly in the Yankees organization, and eventually turning into the sweeper in the collective consciousness. Some teams went all in, like the Mariners, who taught it wholesale in the minors, and others were more tentative, but there have been more sweepers with every season since Statcast was born.
These models have been able to incorporate seam-shifted wake since Statcast went to Hawk-Eye technology in 2020. Since then, we’ve seen an increase in sweepers, cutters, and sinkers, which can all use seam effects to increase unexpected movement. The last pitch listed is the most remarkable. Sinkers fell out of vogue during the first pitch-tracking era (2008-2015) when ride was first quantified, because a good four-seamer with ride gets more whiffs. Now that teams know how to produce seam-shifted movement better, they’re able to produce sinkers that reliably affect the way batted balls perform, and they’re coming back.
This itself may end up as the biggest legacy of the stuff movement among analysts. The fact that the batting average on balls in play (BABIP) was around .290 across the league year in and year out led Voros McCracken to create a theory of Defense Independent Pitching in 1999. Because pitchers demonstrated more year-to-year control over their strikeout and walk rates, he reasoned, it was better to home in on those when evaluating pitchers. Essentially, pitchers weren’t seen as having control over what happens on a ball in play, even if that’s not the most correct way to sum up his research.
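For reference, BABIP is conventionally computed as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. A small Python version follows, using a hypothetical stat line that is not any real pitcher’s.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season totals allowed by a pitcher.
example = babip(hits=150, home_runs=20, at_bats=600, strikeouts=150, sac_flies=5)
print(f"BABIP allowed: {example:.3f}")
```

Strikeouts and home runs are removed from both the numerator and the denominator, which is why the stat isolates what happens once the ball is put in play.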
In the most recent revamp of Stuff+ on FanGraphs, though, the link between pitch shapes and batted-ball outcomes became even clearer. Sometimes the statistics have to catch up to the common wisdom, and it turns out that having more sophisticated tracking data helped the model understand that certain physical characteristics of pitches were a reliable predictor of things like ground-ball rates, home-run rates, and, yes, more extreme BABIPs than McCracken might have projected in the past.
“I think that’s probably simply because BABIP does such a poor job predicting itself — it needs help,” said McCracken about these new findings. “Strikeouts already predict strikeouts really well.”
In a way then, Stuff+ doesn’t refute his research, it simply refines it. Now Stuff+ can help us project BABIP better and show just how much control a pitcher can have over a ball in play.
Analysts tend to like models like Stuff+ because they help them acquire pitchers who can do things (like suppress hits and home runs) that old models won’t pick up on. Pitching coaches value these models because, after evaluating only a handful of pitches, they can produce roadmaps for their pitchers who want to improve.
“Stuff+ has been an accurate indicator of how a particular guy’s pitches are performing at the big-league level — not only relative to the league but in relation to his arsenal,” said a major-league pitching coach. “If one is doing really well — this might impact how much we are throwing it, meaning we may bump up the usage. If one is doing poorly — it allows us to double-click on it and investigate why this might be the case: Is it the strikes? Is it the whiff? Is it the shape of the pitch?”
So, when a team picks up a pitcher with a funky release point, or coaches a pitcher to throw more sliders, pick up a sweeper, add a sinker or tweak a pitch shape, they are often acting in ways that Stuff+ would recommend. This has probably been a part of the rise of strikeouts across the league, because pitchers can now optimize their stuff with a precision that used to be left to intuition.
If this Pandora’s box has been opened, it doesn’t seem likely to be shut, but there are a few hopeful ways forward. One is for hitters to use the same sorts of scientific tools to help their process. This is underway now, with the most modern approaches to hitting development including technology and concepts that pitchers have long valued. As hitters understand their bat paths with bat path grades that now resemble early Stuff+ grades, they can better fight fire with fire.
And then there are rules changes that can help the hitter. We’ve seen things like sticky stuff enforcement, the pitch clock and shift restrictions that lean toward boosting offense. One team analyst thought that baseball could paint lines on the ball that would help hitters better see the spin and better react to pitches. That could be viable, given the other changes baseball has recently seen.
Of course, since Stuff+ values velocity, spin and funky movement, and helps pitchers see the way toward optimizing their arsenals, it becomes obvious that there might be a link between the rise of these metrics and the rise of injuries across the game. Putting these things on one table brings that into focus.
But the research linking specific aspects of stuff and injury rates is a little murkier. For certain, velocity has a huge role. But is it how close a pitcher throws to their own personal maximum, as Glenn Fleisig found in his peer-reviewed study? Then why does a bigger velocity gap not lead to better health outcomes? Or is velocity generally a stress on the elbow, as Driveline found? And if 80 mph sliders are fine, but 90 mph sliders are actually more stressful, as at least one study found, then maybe breaking ball velocity is one of the biggest strains on elbows? Despite Dr. Keith Meister sounding the alarm bells about sweepers, there is no research directly linking sweepers to more risk. Are pitchers throwing with too much intensity in their pitch design sessions? How would that be knowable across the sport when those sessions aren’t tracked by the league?
While the rate of Tommy John surgeries on torn elbow ligaments has plateaued, overall days on the injured list have not. The biggest problem facing baseball is probably not that stuff metrics have found a way to characterize excellent pitches, though; that kind of work has been going on for nearly 20 years and seems impossible to stop. The problem is that velocity is good and is also a stressor, and there’s no way to tell a young pitcher who might make the big leagues that he needs to throw softer. They’re capable of doing the math, and they’ve made a calculated choice, as Justin Verlander pointed out about his pitching style.
In other words, players are always going to try to be better, just like Bailey when he asked the question that begat one version of Stuff+. If the sport is serious about reducing injuries, funding a bilateral effort would be a start, and adding rules changes that incentivize teams to carry pitchers who can go further into games (like a reduction in injured list slots) would do more than simply asking players to stop trying to throw nastier pitches.
What’s next?
Not everyone likes Stuff+, of course, beyond those linking it to injury.
“You can never get pitching into one number,” said Max Scherzer about the stat. “Even if you are able to, you’re still missing something.”
The effort to quantify aspects of pitching that stuff metrics miss is well underway despite his skepticism. Driveline (with Mix+ and Match+) and Baseball Prospectus (with their recently released arsenal stats) have attempted to put a number on the value of having wide arsenals with different movement and velocity profiles. Over at FanGraphs, Michael Rosen did some work on release angles that might better quantify command. To improve as a pitcher, you have to understand what the best do. So analysts will continue to try to define the best processes for pitchers.
“If you cannot measure it, you cannot improve it,” as Lord Kelvin, the legendary physicist, once proclaimed.
“We posted leaderboards with the Reds — we posted Stuff+, Command+ and times to the plate, those were the things we cared about,” said Boddy of his time as pitching coordinator. “Our coaches were being evaluated on that, we were determining who our best coaches were based on it. We found coaches that helped pitchers outperform our Stuff+ projections, like Brian Garman, our pitching coach at Dayton, and Forrest Herrman, our pitching coach at Daytona. Big shock, both are coordinators now.”
That said, every time analysts make an advancement that spreads throughout the game, like Stuff+, it quickly ceases to be an advantage. Boddy thought that 28 of 30 teams had their own internal Stuff+ model, and other analysts agreed that he wasn’t far off. So maybe the future is more about the exciting research being done in biomechanics that could set your team apart. Over at NTangible, they feel they’ve built a better test of makeup — the attitude and energy that fuels the most successful players — which is notoriously difficult to define, scout, and measure. At the winter meetings, people from all parts of baseball emphasized soft skills as a way to successfully bridge the gap between data and play on the field.
Despite the urge to quantify everything, there’s also the truth that the unquantifiable will always be important, and will remain a possible edge for a team that understands it best (including finding a way to quantify it). These more nebulous aspects of the game will always be a source of chaos in the machine of any metric. And that’s a good thing — it’s a sport, not a simulation.
(Graphics: Drew Jordan and John Bradford / The Athletic; Illustration: Dan Goldfarb / The Athletic; Photo of Clay Holmes: Andrew Mordzynski / Icon Sportswire / Getty Images)