I have a list of local company websites. I want to sell web design and web hosting services to these companies, so I'd like to collect some information to help identify good prospects. To save a lot of human effort, I need a spider program that can visit each of the websites on the list. The spider will need to explore the home page and some other pages on the site, but it must not follow links off to other sites.

I want the spider to create a record for each website it visits and extract the following data. Sometimes the data may not be on the home page but on a page linked from it.

Data required:
Postal address
Telephone number
Fax number
Email address (NB: may be text or an image link)
If there is no email address, start an SMTP session and establish whether one of admin@, info@, sales@, etc. is a valid address
Can the home page be found?
Any HTTP error codes (403, etc.)?
Any home page meta tags, including description, keywords, generator, content-type, etc.
Home page title
Frameset on the home page?
Any HTML forms on the site?
How many links off the home page?
Is there a [url removed, login to view] file?
Any broken links?
Google PR of the home page
Any scripting used on the site, and what type (PHP, ASP, etc.)
Any Flash movies on the site?
IP address of the server
Name of the hosting company (perhaps the Autonomous System Number)
Domain name and whois information:
  Domain type (.uk, .com, etc.)
  Registrant
  Registrant's agent
  Renewal date
  Registration status
  Name servers

What else do you suggest the script should check for?

Please detail what your bid includes and when you could complete the job. I don't mind whether the tool runs on a Windows PC or is a script to run on a Linux web server.
This can be done with all the features you have listed, and I can also add more SEO statistics from my ready-made SEO scripts.
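
To give a rough idea of the home page checks in the list (title, meta tags, frameset, forms, Flash, links that stay on the site versus links that leave it, scripting type and visible email addresses), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the finished tool: the names HomePageScanner, scan_home_page and SCRIPT_EXTENSIONS are made up for this example, the page is assumed to have been downloaded already, scripting type is guessed from link file extensions only, and email addresses inside images are not handled.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: scripting type is guessed from the file extension of on-site links.
SCRIPT_EXTENSIONS = {".php": "PHP", ".asp": "ASP", ".aspx": "ASP.NET", ".jsp": "JSP"}
# Deliberately simple pattern; it will miss obfuscated or image-based addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class HomePageScanner(HTMLParser):
    """Collects a few of the requested home page facts from raw HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.site_host = urlparse(base_url).netloc.lower()
        self._in_title = False
        self.record = {
            "title": "", "meta": {}, "has_frameset": False, "has_form": False,
            "has_flash": False, "internal_links": set(), "external_links": set(),
            "scripting": set(), "emails": set(),
        }

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or attrs.get("http-equiv")
            if name and attrs.get("content") is not None:
                self.record["meta"][name.lower()] = attrs["content"]
        elif tag in ("frameset", "frame"):
            self.record["has_frameset"] = True
        elif tag == "form":
            self.record["has_form"] = True
        elif tag in ("embed", "object", "param"):
            src = attrs.get("src") or attrs.get("data") or attrs.get("value") or ""
            if src.lower().split("?")[0].endswith(".swf"):
                self.record["has_flash"] = True
        elif tag == "a" and attrs.get("href"):
            self._handle_link(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.record["title"] += data.strip()
        self.record["emails"].update(EMAIL_RE.findall(data))

    def _handle_link(self, href):
        if href.lower().startswith("mailto:"):
            self.record["emails"].add(href[7:].split("?")[0])
            return
        url = urlparse(urljoin(self.base_url, href))
        if url.scheme not in ("http", "https"):
            return
        target = url._replace(fragment="").geturl()
        if url.netloc.lower() == self.site_host:
            # Same host: a page the spider would be allowed to follow.
            self.record["internal_links"].add(target)
            for ext, language in SCRIPT_EXTENSIONS.items():
                if url.path.lower().endswith(ext):
                    self.record["scripting"].add(language)
        else:
            # Different host: counted, but never followed.
            self.record["external_links"].add(target)


def scan_home_page(html, base_url):
    """Return a dict of findings for one already-downloaded home page."""
    scanner = HomePageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner.record
```

Calling scan_home_page(html_text, "http://www.example.com/") returns one dictionary per site, which could then be written out as the per-site record. The network-based checks (HTTP status codes, broken links, SMTP probing, IP address, whois) are left out of this sketch.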