I need a web scraper that retrieves the following pieces of information from the California Lottery sites listed below:
Drawing date (and time, if there are multiple draws on that date)
Drawing number
Jackpot value
Jackpot cash value
Ball values
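A minimal sketch of how the fields above might map onto the returned hash. Every key name except `errorcode`, the helpers `make_result` and `parse_dollars`, and the use of errorcode 0 for success are assumptions that are not part of the posting. The sample values are the posting's own fake example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert page text such as '$10,000' to a plain number (10000).
# Returns undef (a single value, so hash lists stay aligned) when the
# text is missing or is not a simple dollar amount.
sub parse_dollars {
    my ($text) = @_;
    return undef unless defined $text;
    (my $digits = $text) =~ s/[\$,\s]//g;
    return $digits =~ /^\d+(?:\.\d+)?$/ ? $digits + 0 : undef;
}

# Build the result hash. All key names are hypothetical; the posting only
# fixes 'errorcode'. errorcode 0 for success is also an assumption.
sub make_result {
    my (%raw) = @_;
    return (
        errorcode    => 0,
        draw_date    => $raw{draw_date},      # 'MM-DD-YYYY'
        draw_time    => $raw{draw_time},      # set only on multi-draw days
        draw_number  => $raw{draw_number},
        jackpot      => parse_dollars($raw{jackpot}),
        jackpot_cash => parse_dollars($raw{jackpot_cash}),
        balls        => [ @{ $raw{balls} || [] } ],   # ball values as listed on the page
    );
}

# The posting's own fake example: 03/04/09, $10,000 jackpot, numbers 1, 2, 3.
my %result = make_result(
    draw_date => '03-04-2009',
    jackpot   => '$10,000',
    balls     => [1, 2, 3],
);
printf "%s jackpot=%s balls=%s\n",
    $result{draw_date}, $result{jackpot}, join(',', @{ $result{balls} });
# prints "03-04-2009 jackpot=10000 balls=1,2,3"
```

Keeping dollar amounts numeric in the hash lets the sanity check compare values with `==` instead of matching page formatting.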
Before writing any code, describe the architecture of your scraper. The scraper itself must be well documented.
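One possible architecture, sketched under stated assumptions: a table of per-game settings (URL and regex) feeds a single generic `scrape` routine, and each public method such as `getCaliforniaDailyThree` is a thin wrapper around it. The `%GAMES` table, `scrape`, the placeholder URL, and the HTML layout the regex expects are all hypothetical. The fetcher is passed in as a code reference so the pipeline can be exercised without network access.

```perl
use strict;
use warnings;

# Hypothetical per-game table: one entry per lottery game. The real URLs
# are hidden in the posting, so a placeholder is used.
my %GAMES = (
    daily3 => {
        url     => 'URL-HIDDEN-IN-POSTING',
        # Named captures become keys of the result hash.
        pattern => qr/Draw\s*#\s*(?<draw_number>\d+).*?Numbers:\s*(?<balls>[\d\s,]+)/s,
    },
);

# Generic pipeline shared by every game method:
#   1. validate the 'MM-DD-YYYY' argument  -> errorcode 100 on failure
#   2. fetch the page via $fetch           -> errorcode 404 on failure
#   3. extract with the game's regex       -> errorcode 101 if nothing matches
# $fetch is injected as (url, date) -> html|undef so the pipeline can be
# tested offline; a sanity check against a known date would run before step 2.
sub scrape {
    my ($game, $date, $fetch) = @_;
    my $cfg = $GAMES{$game} or die "unknown game '$game'\n";
    return (errorcode => 100) unless defined $date && $date =~ /^\d\d-\d\d-\d{4}\z/;
    my $html = $fetch->($cfg->{url}, $date);
    return (errorcode => 404) unless defined $html;
    return (errorcode => 101) unless $html =~ $cfg->{pattern};
    my %result = (errorcode => 0, draw_date => $date, %+);
    $result{balls} = [ split /[\s,]+/, $result{balls} ];
    return %result;
}

# Thin per-game wrapper keeps the public API the posting asks for.
sub getCaliforniaDailyThree {
    my ($date, $fetch) = @_;
    return scrape('daily3', $date, $fetch);
}
```

With this layout, supporting another game means adding a `%GAMES` entry and a one-line wrapper, and every game shares the same error handling.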
The scraper should provide methods such as getCaliforniaDailyThree('MM-DD-YYYY'), each returning a hash with the retrieved data. Errors are reported through the same hash:
$result{'errorcode'} = 100 if validation failed.
$result{'errorcode'} = 101 if the data does not exist.
$result{'errorcode'} = 404 if the page cannot be fetched.
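A minimal sketch of the validation branch, assuming "validation" means checking that the argument is a real calendar date in MM-DD-YYYY form (the posting does not say what is validated). The constant names and `is_valid_date` are hypothetical; the codes 100, 101 and 404 come from the posting.

```perl
use strict;
use warnings;

# Error codes named in the posting.
use constant {
    ERR_VALIDATION => 100,   # input failed validation
    ERR_NO_DATA    => 101,   # no drawing data for that date
    ERR_FETCH      => 404,   # page could not be fetched
};

# True if $s is a real calendar date in 'MM-DD-YYYY' form.
sub is_valid_date {
    my ($s) = @_;
    return 0 unless defined $s && $s =~ /^(\d{2})-(\d{2})-(\d{4})\z/;
    my ($m, $d, $y) = ($1, $2, $3);
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @days_in = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $max  = $days_in[$m - 1] + (($m == 2 && $leap) ? 1 : 0);
    return $d <= $max ? 1 : 0;
}

# Hypothetical entry point showing only the validation branch; the
# fetch/extract steps (which would set ERR_FETCH / ERR_NO_DATA) are omitted.
sub getCaliforniaDailyThree {
    my ($date) = @_;
    my %result;
    unless (is_valid_date($date)) {
        $result{'errorcode'} = ERR_VALIDATION;
        return %result;
    }
    # ... fetch and extract here ...
    $result{'errorcode'} = ERR_NO_DATA;   # placeholder until extraction exists
    return %result;
}
```

Validating before any network call means a typo such as 13-01-2009 is rejected with code 100 instead of surfacing later as a confusing 101 or 404.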
Include a command-line wrapper so the scraper can be run from the shell; it should print the returned results.
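A minimal command-line wrapper might look like the following. The script layout, the `daily3` game name, the exit codes, and the stub scraper returning fake data are assumptions for illustration; in a real build the stub would be replaced by the actual scraper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stub standing in for the real scraper; returns fake data.
sub getCaliforniaDailyThree {
    my ($date) = @_;
    return (errorcode => 100) unless defined $date && $date =~ /^\d\d-\d\d-\d{4}\z/;
    return (errorcode => 0, draw_date => $date, jackpot => 10_000, balls => [1, 2, 3]);
}

# Command-line game names mapped to scraper methods (names are assumptions).
my %SCRAPERS = (daily3 => \&getCaliforniaDailyThree);

sub usage {
    return "usage: $0 <" . join('|', sort keys %SCRAPERS) . "> MM-DD-YYYY\n";
}

# Render a result hash as sorted "key: value" lines; array refs are comma-joined.
sub format_result {
    my (%result) = @_;
    my $out = '';
    for my $key (sort keys %result) {
        my $value = $result{$key};
        $value = join(',', @$value) if ref $value eq 'ARRAY';
        $value = '' unless defined $value;
        $out .= "$key: $value\n";
    }
    return $out;
}

# Returns the process exit status: 0 ok, 1 scraper error, 2 bad usage.
sub main {
    my ($game, $date) = @_;
    my $scraper = defined $game ? $SCRAPERS{$game} : undef;
    unless ($scraper && defined $date) {
        print STDERR usage();
        return 2;
    }
    my %result = $scraper->($date);
    print format_result(%result);
    return $result{errorcode} ? 1 : 0;
}

# Run only when executed directly, not when loaded with require.
unless (caller) {
    if (@ARGV) { exit main(@ARGV) }
    print STDERR usage();
}
```

Run it as `perl lotto.pl daily3 03-04-2009` (the file name is arbitrary). It prints one `key: value` line per field, including `errorcode`, and exits non-zero when an error code is set.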
You must include documentation (source code comments) explaining which page each piece of data comes from and which content it is extracted from. For example, if a page must be retrieved with a POST to a specific URL, with a POST variable holding the date in a specific format, this must be specified. The surrounding HTML segment that the data is extracted from must also be included, with comments explaining the minimal regular expression that makes it unique. Take care to keep each regular expression as general as possible to improve fault tolerance.
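The documentation style requested above could look like this sketch. The POST field name `drawDate`, the MM/DD/YYYY format, the placeholder URL, and the HTML segment are all hypothetical, since the real pages are hidden in the posting. HTTP::Tiny (core Perl since 5.14) performs the POST. The comments record the page, the request, the surrounding markup, and the minimal anchors each regex relies on.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# PAGE:   placeholder; the real URLs are hidden in the posting.
# METHOD: POST with a form field holding the date. The field name
#         'drawDate' and the MM/DD/YYYY format are assumptions; record the
#         real ones here once they are confirmed from the site.
my $RESULTS_URL = 'URL-HIDDEN-IN-POSTING';

# Fetch the results page for a 'MM-DD-YYYY' date.
# Returns the HTML, or undef on any failure (caller maps that to errorcode 404).
sub fetch_results_page {
    my ($date) = @_;
    (my $posted = $date) =~ tr{-}{/};    # 'MM-DD-YYYY' -> 'MM/DD/YYYY'
    my $res = HTTP::Tiny->new(timeout => 30)
                        ->post_form($RESULTS_URL, { drawDate => $posted });
    return $res->{success} ? $res->{content} : undef;
}

# SOURCE SEGMENT (illustrative only; paste the real markup here):
#   <div class="draw-info">
#     <span>Draw #12345</span>
#     <span class="winning-numbers">1 - 2 - 3</span>
#   </div>
#
# UNIQUE ANCHORS:
#   draw number: the literal "Draw #" followed by digits
#   balls:       the "winning-numbers" class; every integer in that
#                element's text is taken as a ball value (assumes the
#                numbers are plain text, not nested tags)
# The patterns ignore case, whitespace, quoting style, extra classes and
# separators, so small layout changes should not break them.
sub extract_draw {
    my ($html) = @_;
    my %data;
    ($data{draw_number}) = $html =~ /Draw\s*#\s*(\d+)/i;
    if ($html =~ /class\s*=\s*["']?[^"'>]*\bwinning-numbers\b[^>]*>(.*?)</is) {
        my $numbers = $1;
        $data{balls} = [ $numbers =~ /(\d+)/g ];
    }
    # An empty list means "no data" (errorcode 101).
    return () unless defined $data{draw_number} && $data{balls} && @{ $data{balls} };
    return %data;
}
```

Anchoring on a label and a class name rather than on the full tag structure is one way to follow the "as general as possible" requirement; the trade-off is a higher chance of matching the wrong element, which the sanity check is meant to catch.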
Each scraper must include a sanity check: before it extracts any new data, it must extract the data for a date whose values a human has already verified, and compare the results against those known values. For example (using fake data): I go to the website and see that for the Daily Three drawing on 03/04/09 the jackpot is $10,000 and the numbers are 1, 2, 3. My code then fetches that date and confirms that the known values are extracted correctly before fetching new, unknown values.
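A sketch of the sanity check, using the posting's own fake example values (03/04/09, $10,000, numbers 1, 2, 3) as the human-verified reference. `sanity_check` and `checked_scrape` are hypothetical names, and a "scraper" here is any code reference that takes a date and returns a result hash. Reporting a failed check as errorcode 100 is an assumption.

```perl
use strict;
use warnings;

# Human-verified reference draw. These are the posting's own FAKE example
# values; replace them with a draw you have checked by hand on the site.
my %KNOWN_DAILY3 = (
    date    => '03-04-2009',
    jackpot => 10_000,
    balls   => [1, 2, 3],
);

# Re-scrape the known date and compare every verified field.
# Returns an empty list when everything matches, else mismatch messages.
sub sanity_check {
    my ($scraper, $known) = @_;
    my %got = $scraper->($known->{date});
    return ("could not extract known date (errorcode $got{errorcode})") if $got{errorcode};
    my @problems;
    if (!defined $got{jackpot} || $got{jackpot} != $known->{jackpot}) {
        push @problems, "jackpot: expected $known->{jackpot}, got " . ($got{jackpot} // 'undef');
    }
    my $want = join ',', @{ $known->{balls} };
    my $have = join ',', @{ $got{balls} || [] };
    push @problems, "balls: expected $want, got $have" if $want ne $have;
    return @problems;
}

# Only fetch unknown data once extraction of the known data has been proven.
# Reporting a failed sanity check as errorcode 100 is an assumption; the
# posting does not assign a code to this case.
sub checked_scrape {
    my ($scraper, $known, $date) = @_;
    my @problems = sanity_check($scraper, $known);
    if (@problems) {
        warn "sanity check failed: $_\n" for @problems;
        return (errorcode => 100);
    }
    return $scraper->($date);
}
```

Returning a list of mismatch messages, rather than a single true/false, makes it easy to see which field broke when the site layout changes.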
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]