Hello, I need a web spider written in PHP 4.0. The spider must read a list of pending sites to be spidered from a MySQL database.

The spider must be able to access HTML pages (htm and html extensions), CGI, Perl, PHP, ColdFusion, and ASP pages, and each frame in a frameset. If a web page uses a drop-down list for links (a common JavaScript feature), the spider must be able to grab those links. The spider must recognize and ignore the following extensions: MP3, GIF, PNG, JPG, SWF, MPG, AVI, WAV, and any other binary or non-text files. The spider must also be able to pull information and links out of tables. All links that the spider collects must be made into complete URLs, not relative links, and must include any query string information.

For each of the pending URLs, the spider must do the following:

1. Get the title, the base href (Baseref), all meta tags, all links with their text (what the visitor sees as the link on the screen), all email addresses with their text (what the visitor sees as the link on the screen), and the text of the page, stripped of all HTML tags.
2. This information must be put into the MySQL database in 4 tables. All page information, except links and emails, will go into "SpideredSites". All links will go into "Pending URLs". All emails will go into "SpideredEmails". In addition, all links will be added to "ReferralLinks". This last table will also contain the unique ID from "SpideredSites" for the site that was spidered to get that link.
3. The spider must update "Pending URLs" to indicate that the URL was spidered (this is a Yes/No column that will be set to Yes).
4. The spider must output to the browser the ID from "Pending URLs", the ID from "SpideredSites", and the URL as a link, and on the following line the date and time. This is followed by two carriage return/line feed pairs.
5. The spider should repeat steps 1-4 until all pending URLs are spidered, or until a specific number of files have been spidered (a configuration file should be provided so that I can set the number of pending URLs to process at one time).
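To make two of the requirements above concrete (ignoring binary extensions, and turning relative links into complete URLs with their query strings), here is a minimal sketch. The function names `spider_should_skip` and `spider_absolute_url`, the `$SKIP_EXTENSIONS` variable, and the `example.com` URLs are hypothetical and used only for illustration. The code sticks to basic constructs (`parse_url`, `preg_match`, `explode`) with PHP 4.0 in mind, but it has not been verified on that version. It assumes the base URL is a well-formed `http://host/path` URL and does not handle `#fragment` parts.

```php
<?php
// Hypothetical helpers for illustration; the names are not part of the request.

// Extensions the request says to ignore (extend for other binary types).
$SKIP_EXTENSIONS = array('mp3', 'gif', 'png', 'jpg', 'swf', 'mpg', 'avi', 'wav');

// Returns true when the URL's path ends in one of the ignored extensions.
function spider_should_skip($url, $skip)
{
    $parts = parse_url($url);
    $path = isset($parts['path']) ? $parts['path'] : '';
    if (!preg_match('/\.([A-Za-z0-9]+)$/', $path, $m)) {
        return false;
    }
    return in_array(strtolower($m[1]), $skip);
}

// Resolves $link against $base (the page URL or its base href),
// keeping any query string on the link.
function spider_absolute_url($base, $link)
{
    $link = trim($link);
    // Anything that already has a scheme (http:, mailto:, ...) is returned as-is.
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $link)) {
        return $link;
    }
    $b = parse_url($base);
    // Protocol-relative link: //host/path
    if (substr($link, 0, 2) === '//') {
        return $b['scheme'] . ':' . $link;
    }
    $root = $b['scheme'] . '://' . $b['host'];
    if (isset($b['port'])) {
        $root .= ':' . $b['port'];
    }
    $basePath = isset($b['path']) ? $b['path'] : '/';
    // Query-only link: ?a=1 applies to the base page itself.
    if (substr($link, 0, 1) === '?') {
        return $root . $basePath . $link;
    }
    // Split off the query string so ".." handling only touches the path.
    $query = '';
    $pos = strpos($link, '?');
    if ($pos !== false) {
        $query = substr($link, $pos);
        $link = substr($link, 0, $pos);
    }
    // Relative path: append to the directory part of the base path.
    if (substr($link, 0, 1) !== '/') {
        $link = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }
    // Collapse "." and ".." segments.
    $out = array();
    foreach (explode('/', $link) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $path = '/' . ltrim(implode('/', $out), '/');
    return $root . $path . $query;
}
```

In a spider loop, each extracted link would presumably be passed through `spider_absolute_url` with the page URL (or its base href, when present) and then through `spider_should_skip` before being written to the pending list.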
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.
## Platform
PHP 4.0