Pitch F/X MySQL database with pitch type


As always, please help me with my bandwidth costs for this data. Just buy me a beer. Thanks.

2007-2009 Pitch F/X database in MySQL

This year I went through the import scripts from Mike Fast and reworked them, and especially the MySQL database, to make the import much, much faster with indexes. It was taking over 5 minutes to import a game in 2009 with all the previous data in the database, and now it takes 30 seconds at most. I fixed games that had data errors and made sure they imported. A lot of time and effort goes into this import: from my brother bugging me, to scripting it, to my brother bugging me, to testing, to my brother bugging me. It really isn’t that bad, but sometimes it feels like it. The import is up to 151MB compressed, so we might have to look at splitting it up by year or something in the future. Ideas? From now on I will release only one file for the Pitch F/X MySQL database import, named pbp2.sql. Here is the reworked file, which gets updated daily.
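To illustrate why an index can make this kind of difference, here is a minimal sketch. It is not the real schema or the real import: SQLite (from the Python standard library) stands in for MySQL, and the `pitches` table, its columns, and the index name `idx_pitches_game_id` are all made up for the example. It only shows that a per-game lookup changes from a full table scan to an index search once the index exists.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL here, and the `pitches` table,
# its columns, and the index name are hypothetical, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pitches (game_id TEXT, pitch_num INTEGER, pitch_type TEXT, speed REAL)"
)
conn.executemany(
    "INSERT INTO pitches VALUES (?, ?, ?, ?)",
    [(f"game_{g}", p, "FF", 92.0) for g in range(200) for p in range(50)],
)


def query_plan(connection, game_id):
    """Return SQLite's plan text for looking up one game's pitches."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM pitches WHERE game_id = ?", (game_id,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn, "game_42")  # no index yet: table scan
conn.execute("CREATE INDEX idx_pitches_game_id ON pitches (game_id)")
plan_after = query_plan(conn, "game_42")   # index available: index search

print(plan_before)
print(plan_after)
```

In MySQL the equivalent check would be to run `EXPLAIN` on the lookup queries the import script issues, before and after adding the index.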

I would also like to know when the 2009 data from Retrosheet is out so I can import it, and maybe rework the output. I have heard from people who want it split into 10- or 20-year spans. I can do it; I would just like to know that people will use it before I do.

Download Pitch F/X Database here

Here is how it is all done. I have 4 scripts that run at night:
1. hack_4day.pl

  • Downloads files from MLB for only the last 4 days to speed up the import.
  • Deletes files older than 4 days.

2. hack_pbp2.pl

  • Downloads all files from MLB to make sure I have a full set of XML files to use if needed.

3. 2009.pl

  • Imports the XML files that are downloaded by the hack_4day.pl script into the Pitch F/X MySQL database.
  • Deletes each game's XML files after import to keep the process clean, so that there is only one set of XML files and one set of records in the Pitch F/X MySQL database.

4. update_db_with_count.pl

  • Updates the database with counts on the pitches. Great script.

All the scripts are available on the downloads page.
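The housekeeping part of hack_4day.pl (a 4-day download window, plus deleting anything older) can be sketched roughly as follows. This is not the actual Perl script: the function names `last_n_days` and `prune_old_files` are hypothetical, and it assumes that "the last 4 days" includes today and that file age is judged by modification time.

```python
import os
import time
from datetime import date, timedelta


def last_n_days(today, n=4):
    """Return the n most recent dates ending at `today`, newest first.

    Assumption: the window includes `today` itself.
    """
    return [today - timedelta(days=i) for i in range(n)]


def prune_old_files(directory, max_age_days=4, now=None):
    """Delete files under `directory` whose modification time is older
    than `max_age_days`, and return the list of removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    # Dates a nightly run would try to download under these assumptions.
    print(last_n_days(date.today()))
```

Keeping the window small is what keeps the nightly import short; the full archive pulled by hack_pbp2.pl remains available as a fallback if a game needs to be re-imported.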

    1. #1 by Nick Steiner at December 6th, 2009

      Darrell -

      Retrosheet for 2009 is out. I just wanted to let you know of that, and that I, along with several other people I’ve corresponded with, would be interested in having it broken up into groups of 10 years.

      Are you still going to be interested in doing this?

      • #2 by Darrell at December 22nd, 2009

        Nick,

        I have broken the database out into decades: 1950s, 1960s, …. I also have the big boy, and should be releasing this today. My brother slowed me down: he made me spend a week fixing his house, and he wants the baseball data on time. What a butt. Anyway, I will let you know when this is done.

        Thanks,

    2. #3 by Josh at January 3rd, 2010

      Darrell,

      Thanks very much for providing this data in mySQL format. A huge time saver.

      I d/l-ed the retrosheet database, but can’t get the pitch f/x link to work. I get an empty file.

      Any chance you could check the link to make sure it’s working?

      Thanks!

      P.S. I donated gladly.

      • #4 by Darrell at January 4th, 2010

        Josh,

        I have fixed the pitch f/x database download. It was probably me using the pitch f/x export script to make the Retrosheet sql.gz file. I must have let it run for a split second, which overwrote the data. Oh well, sorry for the mistake.

        Darrell

    3. #5 by josh2 at January 4th, 2010

      How do I get the .gz file into my SQLyog? It won’t execute. Sorry, I’m a rookie.

      Joshua

      • #6 by nick at February 2nd, 2010

        I think the file might still be empty, no?

        • #7 by Jeff Zimmerman at February 3rd, 2010

          It is a large file, so it is not blank.

          .gz files are compressed (gzip) files; you will need to uncompress it first.
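For anyone without a gunzip tool handy, here is a minimal sketch of uncompressing the download with Python's standard library. The function name `gunzip_file` is made up for the example, and the file names in the usage note are assumptions about what the download is called on your machine.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path.

    The data is streamed in chunks, so a large dump does not have to
    fit in memory all at once.
    """
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `gunzip_file("pbp2.sql.gz", "pbp2.sql")` would leave a plain .sql file that can then be run as a SQL script from whichever MySQL client you use.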
