Monday, May 4, 2009

Why Retrosheet is bad for baseball

Let me start off by saying that I don't have a beef with Retrosheet whatsoever. I've been a part of Retrosheet for many, many years and have known Dave Smith, the founder, since I was back in high school. I think Retrosheet has been one of the biggest advancements in how we understand and appreciate the history of the game in some time.

For those not familiar, Retrosheet is a volunteer effort coordinated by the above-mentioned Dave Smith to computerize the play-by-play of as many pre-1984 baseball games as possible. Why 1984? Because that's when Project Scoresheet and then STATS began tracking play-by-play and making it available to the general public. Initially, in order to achieve this, Smith contacted the major league teams to see what they had in terms of scoresheets from games. The results were mixed, but once he got his foot in the door, it soon opened wide. Scoresheets came from media outlets and official scorers, and even fans donated sheets. A lot of information was found in play-by-play reports from newspapers, and eventually play-by-play existed for almost every game back to 1954.

All this information was made accessible by Sean Forman and his website Baseball Reference, and this is where my beef begins. So many people get their information from there and use it to look things up that it is becoming commonplace to act as if baseball history began in 1954.

Take this example from Kansas City Star sportswriter Joe Posnanski. He writes about cycles and just says "that's how far we are going back" meaning "that's all the data Retrosheet has and I'm being lazy enough to not do any research of my own besides using the Baseball Reference Play Index and also mention Retrosheet or Baseball Reference".

And I'm not picking on Joe, whose writings I enjoy immensely. This phenomenon is just becoming more and more rampant and Joe's is one of the more recent examples I've seen of it.

I also don't have a solution. Retrosheet has added boxscores for games from the 1920s, but that won't help, as people certainly aren't going to begin citing "That's the most times that has happened since 1954 or in the 1920s". It's just frustrating that the ease of Baseball Reference is creating this arbitrary starting point for baseball history.

As I mentioned before, I intend to increase the scope of my own cycle research, and so perhaps I can help in one small way. That doesn't do any good for other events, but maybe it will open some eyes to the fact that Retrosheet lacks data for more professional baseball than it has data for.

1 comment:

Sean Foran said...

Don't worry mad guru. We'll get all of the gamelogs at least in there at some point.