I need a program that scrapes web pages from [url removed, login to view] to build a database of play-by-play data. I expect this will be done in Python or Perl, but I am happy to consider other languages as well.
The pages to be examined can be found by going to [url removed, login to view], navigating to "Schedules" under the "Scores and Stats" menu, selecting the correct year/week, and clicking "GameCenter" for the game in question. Inside GameCenter, click "Analyze" and then "Play by Play". That page contains the information that needs to be parsed.
The program should take four command-line parameters as input: year, an indicator for the season type (0 = preseason, 1 = regular season, 2 = postseason), first week, and last week. For example, running "myprogram 2008 1 4 12" should process every game in the 2008 regular season in weeks 4 through 12 inclusive (weeks 4, 5, 6, 7, 8, 9, 10, 11 and 12).
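A minimal sketch of the command-line handling, assuming Python's standard argparse module. The names parse_args, weeks_to_process and SEASON_TYPES are hypothetical; the point is that the four positional parameters are validated and the week range is treated as inclusive at both ends.

```python
import argparse

# Season-type codes as given in the spec (0 = preseason, 1 = regular, 2 = postseason).
SEASON_TYPES = {0: "preseason", 1: "regular", 2: "postseason"}


def parse_args(argv=None):
    """Parse the four positional parameters: year, season type, first week, last week."""
    parser = argparse.ArgumentParser(description="Scrape play-by-play data.")
    parser.add_argument("year", type=int, help="season year, e.g. 2008")
    parser.add_argument("season_type", type=int, choices=sorted(SEASON_TYPES),
                        help="0=preseason, 1=regular season, 2=postseason")
    parser.add_argument("first_week", type=int, help="first week to process")
    parser.add_argument("last_week", type=int, help="last week to process (inclusive)")
    args = parser.parse_args(argv)
    # Reject an empty range early instead of silently processing nothing.
    if args.first_week > args.last_week:
        parser.error("first_week must not be greater than last_week")
    return args


def weeks_to_process(args):
    """Return every week in the inclusive range [first_week, last_week]."""
    return list(range(args.first_week, args.last_week + 1))
```

For example, parse_args(["2008", "1", "4", "12"]) followed by weeks_to_process gives weeks 4 through 12; fetching each game for those weeks would then loop over that list.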
This will probably be significantly easier for someone who understands football and its rules (for example, how to update the score after touchdowns, safeties, etc.). If you lack that background, though, I am willing to work with you on situations where it may not be obvious how to parse the information. If ANYTHING is unclear, please do not hesitate to ask; I am happy to clarify what the output should look like in any particular situation.
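As an illustration of the score bookkeeping mentioned above, here is a minimal sketch using the standard point values (touchdown 6, extra point 1, two-point conversion 2, field goal 3, safety 2). The function name, event labels and team abbreviations are hypothetical, and deciding which team had possession on a given play is assumed to be the parser's job. A safety always goes to the team not in possession; other scores go to the team in possession unless the caller flags a return score (e.g. an interception or fumble returned for a touchdown).

```python
# Standard point values for each scoring event.
POINTS = {
    "touchdown": 6,
    "extra_point": 1,
    "two_point_conversion": 2,
    "field_goal": 3,
    "safety": 2,
}


def update_score(score, event, team_in_possession, other_team,
                 scored_by_other_team=False):
    """Return a new score dict after a scoring event.

    A safety is always credited to the team NOT in possession. Any other event
    is credited to the team in possession unless scored_by_other_team is True
    (e.g. an interception or fumble returned for a touchdown).
    """
    if event == "safety" or scored_by_other_team:
        scorer = other_team
    else:
        scorer = team_in_possession
    new_score = dict(score)  # leave the caller's dict untouched
    new_score[scorer] = new_score.get(scorer, 0) + POINTS[event]
    return new_score
```

Returning a fresh dict keeps each play's score snapshot independent, which is convenient when every CSV row needs the score as of that play.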
I have several other projects like this available for a candidate who does a good job here. Details are important.
The output should be a CSV file with one line per play listed on the Play by Play page. (An example web page and its output are below.) The fields are described in fields.txt.
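A minimal sketch of the CSV output, assuming Python's standard csv module. The column names in FIELDNAMES are placeholders only, since the real field list lives in fields.txt; write_plays and plays_to_csv_text are hypothetical helpers. The csv module quotes play descriptions that contain commas, and no header row is written by default so that the file has exactly one line per play.

```python
import csv
import io

# Placeholder column names: the real field list is defined in fields.txt.
FIELDNAMES = ["game_id", "quarter", "time", "offense", "defense",
              "down", "distance", "description", "offense_score", "defense_score"]


def write_plays(plays, out, header=False):
    """Write one CSV row per play; each play is a dict keyed by FIELDNAMES.

    Missing keys are written as empty fields; unknown keys raise ValueError.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    if header:
        writer.writeheader()
    for play in plays:
        writer.writerow(play)


def plays_to_csv_text(plays, header=False):
    """Render plays as CSV text, e.g. for comparing against the example file."""
    buf = io.StringIO()
    write_plays(plays, buf, header)
    return buf.getvalue()
```

When writing to disk, open the file with newline="" (for example, open("plays.csv", "w", newline="")) as the csv module documentation recommends, then pass the file object to write_plays.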
It is also possible that I made one or two small mistakes in the example CSV file. If, after double-checking, your output still does not match mine but agrees with the [url removed, login to view] file, it is probably my mistake, not yours. Feel free to double-check.
The example CSV file shows what I believe the output should be for the 2011 week 3 game, Carolina at Jacksonville.
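To check generated output against the example CSV for this game, a row-by-row comparison is usually enough to locate discrepancies. This is a minimal sketch using only the standard csv and itertools modules; diff_csv_rows and the file names in the usage note are hypothetical.

```python
import csv
from itertools import zip_longest


def diff_csv_rows(expected_lines, actual_lines):
    """Compare two CSV sources row by row.

    Returns a list of (row_number, expected_row, actual_row) for each row that
    differs. Row numbers are 1-based; a row missing from one side is None.
    """
    expected = list(csv.reader(expected_lines))
    actual = list(csv.reader(actual_lines))
    mismatches = []
    # zip_longest pads the shorter file with None so extra rows are reported too.
    for number, (exp, act) in enumerate(zip_longest(expected, actual), start=1):
        if exp != act:
            mismatches.append((number, exp, act))
    return mismatches
```

Usage, with hypothetical file names: open "example.csv" and "output.csv" with newline="", pass both file objects to diff_csv_rows, and print each (row_number, expected_row, actual_row) tuple it returns.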