As always, please help me with the bandwidth costs for this data: just buy me a beer. Thanks!
2007-2009 Pitch FX database in MySQL
This year I have gone through and reworked Mike Fast's import scripts, and especially the MySQL database, adding indexes to make the import much, much, much faster. Importing a 2009 game with all the previous data already in the database used to take over 5 minutes; now it takes 30 seconds at most. I also fixed games that had data errors and made sure they imported. A lot of time and effort goes into this import: from scripting it, to my brother bugging me, to testing, to my brother bugging me. It really isn't that bad, but sometimes it feels like it.

The import is now up to 151MB compressed, so we might have to look at splitting it up by year in the future. Ideas? From now on I will release only one file for the Pitch FX MySQL database import, named pbp2.sql. Here is the reworked file, which gets updated daily.
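To illustrate why indexes help so much, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (a swapped-in engine; the idea carries over, but the syntax of the plan output differs). The `pitches` table, its columns, and the index name are hypothetical simplifications, not the real pbp2.sql schema. It assumes the import repeatedly looks up rows by game, and shows the query plan switching from a full table scan to an index lookup once an index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified table (not the real Pitch FX schema).
conn.execute(
    "CREATE TABLE pitches (pitch_id INTEGER PRIMARY KEY, game_id TEXT, pitch_type TEXT)"
)
# 200 hypothetical games with 100 pitches each.
conn.executemany(
    "INSERT INTO pitches (game_id, pitch_type) VALUES (?, ?)",
    [(f"game_{g}", "FF") for g in range(200) for _ in range(100)],
)

# The kind of per-game lookup an import might do (assumption).
lookup = "SELECT * FROM pitches WHERE game_id = ?"

before = query_plan(conn, lookup, ("game_42",))  # full table scan
conn.execute("CREATE INDEX idx_pitches_game ON pitches (game_id)")
after = query_plan(conn, lookup, ("game_42",))   # search via the new index

print(before)
print(after)
```

Without the index, every lookup reads the whole table, so each new game gets slower to import as the database grows; with it, the lookup cost stays roughly flat.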
Please also let me know when the 2009 data from Retrosheet is out so I can import it, and maybe rework the output. I have heard people asking for it in 10- or 20-year spans. I can do it; I would just like to know that people will use it before I do.
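If the Retrosheet output does get split into 10- or 20-year spans, the grouping itself is simple. This is a minimal sketch, assuming spans start at the first season in the data rather than at calendar decades; the function name and the example years are hypothetical.

```python
def year_spans(first, last, width):
    """Split seasons first..last (inclusive) into consecutive spans of `width` years.

    The final span is shortened if the seasons do not divide evenly.
    """
    return [
        (start, min(start + width - 1, last))
        for start in range(first, last + 1, width)
    ]


# Example: hypothetical 1985-2009 data in 10-year files.
print(year_spans(1985, 2009, 10))
```

Each (start, end) pair would then map to one output file.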
Here is how it is all done. I have 4 scripts that run each night and handle these steps:
- Downloads files from MLB for only the last 4 days to speed up the import.
- Deletes files older than 4 days.
- Downloads all files from MLB to make sure I have a full set of xml files to use if needed.
- Imports the XML files downloaded by the hack_4day.pl script into the Pitch FX MySQL database.
- Deletes each game's XML files to keep the process clean, so that only one set of XML files and one set of records are stored in the Pitch FX MySQL database.
- Updates the database with counts on the pitches. Great script.
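The 4-day window used by the download and cleanup steps can be sketched like this. It is a hypothetical illustration, not the logic of hack_4day.pl: it assumes the window includes today's date and that downloaded files live in per-day directories named like `year_YYYY/month_MM/day_DD` (an assumed layout). The function names are made up for the example.

```python
from datetime import date, timedelta

WINDOW_DAYS = 4  # the "last 4 days" window described above


def dates_to_fetch(today, window=WINDOW_DAYS):
    """Return the `window` most recent dates, oldest first, ending with `today`."""
    return [today - timedelta(days=offset) for offset in range(window - 1, -1, -1)]


def gameday_path(day):
    """Hypothetical per-day directory name (assumed layout)."""
    return f"year_{day:%Y}/month_{day:%m}/day_{day:%d}"


def is_stale(day, today, window=WINDOW_DAYS):
    """True if `day` is outside the download window, so its files can be deleted."""
    return (today - day).days >= window


today = date(2009, 6, 15)
for day in dates_to_fetch(today):
    print(gameday_path(day))
```

Limiting the nightly download to this short window is what keeps the daily run fast, while the separate full download acts as a backstop.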
All the scripts are available on the downloads page.
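The pitch-count update step can be illustrated with a simplified sketch. It assumes that "counts" means the ball-strike count before each pitch and that each pitch in a plate appearance is given as a result code ("B" ball, "S" strike including fouls, "X" ball in play) plus a foul flag. That input format and the function name are hypothetical, and edge cases such as two-strike foul bunts are ignored.

```python
def counts_before_each_pitch(pitches):
    """Return the (balls, strikes) count before each pitch of one plate appearance.

    `pitches` is a list of (result, is_foul) tuples (hypothetical format).
    """
    balls = strikes = 0
    counts = []
    for result, is_foul in pitches:
        counts.append((balls, strikes))
        if result == "B":
            balls += 1
        elif result == "S" and not (is_foul and strikes == 2):
            # A foul ball with two strikes leaves the count unchanged.
            strikes += 1
        # "X" (ball in play) ends the plate appearance; nothing to update.
    return counts


# Ball, called strike, foul, two-strike foul, ball in play.
sequence = [("B", False), ("S", False), ("S", True), ("S", True), ("X", False)]
print(counts_before_each_pitch(sequence))
```

Walking each plate appearance in pitch order like this lets the count be written back to every pitch row in a single pass.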