Saturday, April 13, 2013

Part 6: More thoughts

I thought I would post a little note explaining why I am approaching corrections in the manner I am currently pursuing.  Way back in 2008, when I originally started messing around with PITCHf/x data, I noticed a curiosity that I spoke a bit about then: measurements of the coefficient of drag (Cd) varied by quite a bit at some parks.  This absolutely should not happen.  While at least one variation in Cd was traced back to an error in the spherical distortion parameter on a camera at PETCO Park in 2007, my conversations with some of the people at Sportvision at the first PITCHf/x summit in 2008 led me to believe that, for the most part, they had a good handle on these spherical distortion parameters.  I tried a few things that used the error in Cd to estimate correction factors, but those were utter failures.  Not long after that, my attention turned away from PITCHf/x for a while.  At the time, Josh Kalk had derived some correction factors that simply added a constant to each pitch parameter based on the park it was thrown in.  Mike Fast had used a variation of this method to correct initial and final positions at the game level (but to my knowledge never applied this method to velocities or accelerations).  I was never too satisfied with this method, although on some level it did work.  The main reason I found it unsatisfactory was that it was very difficult to find a justification for adding a constant value to velocities and accelerations.

So when I decided to come back to PITCHf/x I noticed two things:  1) Josh's correction method never really gained widespread adoption, possibly because it was pulled down when he was hired by the Rays, and 2) the calibration issues still had not been completely eliminated.  It was Jon Roegele's article detailing issues with horizontal movement at Tampa Bay that got me thinking about this again.  And I thought that if we could describe the systematic issues as a transformation of the world coordinates, then we could potentially derive corrections that have a justification behind them.  So the simplest case to look at first was an affine transform.  This is the class of transformations built from rotations, translations, scaling and shearing.  Specifically, I thought there might be something to a shear transform.  It has a few properties that could fit the problem well.  First, if you had a miscalibration that produced a shear that was entirely due to a tilt of the z-axis, then a large fraction of the calibrations would come out looking just fine to the user.  It would only be when re-calibrating the z-axis that this would possibly get noticed at all.  Secondly, this same kind of shear could also conceivably be responsible for the differences I saw across parks in the coefficient of drag.  If one measured gravity in a coordinate system that had a z-axis shear, then gravity would have an x or y component to it.  And it turns out it wouldn't take much to make an impact, as the typical drag force on a fastball is somewhere in the ballpark of 1g.
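To put a rough number on the shear argument above, here is a minimal sketch in Python.  It is not the fitting code from this series; the function names, the 1-degree tilt, and the choice to tilt toward the y-axis are all assumptions made purely for illustration.  It applies a z-axis shear to a pure gravity vector and compares the spurious y component to a drag force of about 1g.

```python
import math

G_FT_S2 = 32.174  # standard gravity in ft/s^2


def shear_matrix(kx, ky):
    """z-axis shear: x' = x + kx*z, y' = y + ky*z, z' = z (hypothetical helper)."""
    return [[1.0, 0.0, kx],
            [0.0, 1.0, ky],
            [0.0, 0.0, 1.0]]


def apply(matrix, vec):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def apparent_gravity(tilt_x_deg, tilt_y_deg):
    """Gravity as it would appear in a frame whose z-axis is sheared.

    Accelerations transform with the linear part of an affine map,
    so any translation drops out and only the shear matters here.
    """
    kx = math.tan(math.radians(tilt_x_deg))
    ky = math.tan(math.radians(tilt_y_deg))
    return apply(shear_matrix(kx, ky), [0.0, 0.0, -G_FT_S2])


# Assumed example: a 1-degree tilt of the z-axis toward y (illustrative value only).
g_meas = apparent_gravity(0.0, 1.0)

# Spurious y acceleration as a fraction of a ~1 g drag force.
drag_bias_fraction = abs(g_meas[1]) / G_FT_S2

print(g_meas, drag_bias_fraction)
```

Under these assumptions the spurious y acceleration is tan(1°) times g, so with drag near 1g the apparent drag would be off by a couple of percent for each degree of tilt.  The z component is unchanged by the shear, which is why such a miscalibration could go unnoticed in most checks.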

So that was why I tried an affine transform first.  It almost worked.  But not quite.

But in trying, I was able to easily modify my code to use the method I mentioned in part 5.  In thinking more about what I did the other day, I keep coming back to basically where I was in 2008: it's really the accelerations that matter (those are the only matrix elements that differ from the identity matrix by more than 5%, although, actually, one velocity component was significant too).  Allow me to litter the screen a little bit with some plots now.  These are the results of this fit using only the left-handed pitcher, Chen.  Again, in these plots green=Tropicana, red=Camden, blue=Tropicana made to look like Camden:

Chen's Trop pitches look much more like his pitches at Camden, and the final locations don't move a whole lot.  Hellickson's pitches mostly look better, but the final locations seemed to move in the wrong direction.

Let's go the other way, using only Hellickson:

Now the situation is reversed:  The final locations of Chen's pitches look terrible, but Hellickson's don't move much.

For reference, then, here is the result from my previous post, where you basically average those two corrections (by including both pitchers in the fit):

I think this brings us back to that place I didn't want to go: spherical distortion.  I think that not only is it necessary here to correct the accelerations, but that the correction will actually be release-point dependent.  (Although we might be able to call the average of a righty/lefty correction a close enough approximation.)  Also, it's entirely possible that we could still limit ourselves to 9 parameters plus a small number of other parameters.  It's likely that there is one Y location that is close to being a fixed point.  If we can find that location and apply corrections to acceleration only, we likely don't need all the other terms.  I'm not sure if it's possible to find that, though.  Actually, it need not be spherical distortion, but some sort of effect which makes the PITCHf/x coordinate system a little bit non-uniform.  Spherical distortion is simply the first cause of that I can imagine.  There may be others.

How to implement something like that without exposing ourselves to smoothing over changes in performance is something I need to think about.  I had ideas before that depended on the correction factors not being release-point dependent.  This kind of blows some of them out of the water for the moment.

1 comment:

pobguy said...

I need to get back working on this problem. But I have some major knuckleball analysis that will keep me busy for a few weeks. I'll be closely watching this site for developments.