We require a program that can download files and web pages from a list of websites.
We have a list of 65,000 websites.
Most of these websites contain files, mostly in PDF format, that we need to download.
These files or web pages will contain a specific word or phrase, either in the URL or in the page content.
The web pages may have many different file extensions, such as .asp, .aspx, .php, .htm, etc.
1. We need the ability to load a list of websites to be searched.
2. We need the ability to set the number of levels the spider will go into each website.
3. We need the ability to specify one or more file types to download.
4. We need the ability to download only files that match specific criteria, such as a word or phrase in the URL.
5. We need the option to create a folder named after each website address and download that site's files into it.
6. We need the option to stay within the base website, or to follow links 1, 2, 3, etc. websites away from the base URL.
7. We need a time-out option that stops spidering/downloading a website after xx seconds without finding any files and moves on to the next website on the list.
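To make requirements 2 through 7 concrete, here is a minimal Python sketch of the per-link decisions the program would need. It is not a specification of the final program: the function names, the folder-naming scheme, and the reading of "websites away" as the number of off-site link hops are assumptions for illustration only.

```python
from __future__ import annotations

import posixpath
import time
from urllib.parse import unquote, urlparse


def url_extension(url: str) -> str:
    """Lowercase extension of the URL path, ignoring any query string (e.g. '.pdf')."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def should_download(url: str, file_types: set[str], phrases: list[str]) -> bool:
    """Requirements 3 and 4: the file type must be selected and the decoded URL
    must contain at least one search phrase (case-insensitive)."""
    if url_extension(url) not in file_types:
        return False
    decoded = unquote(url).lower()  # so "%20" matches a space in the phrase
    return any(p.lower() in decoded for p in phrases)


def site_folder(base_url: str) -> str:
    """Requirement 5 (hypothetical naming scheme): folder named after the host.
    ':' is replaced because Windows folder names cannot contain it."""
    return urlparse(base_url).netloc.lower().replace(":", "_")


def same_host(a: str, b: str) -> bool:
    return urlparse(a).netloc.lower() == urlparse(b).netloc.lower()


def follow_link(link: str, current_url: str, depth: int, hops: int,
                max_depth: int, max_hops: int) -> tuple[int, int] | None:
    """Requirements 2 and 6 (assumed interpretation): return the new (depth, hops)
    if the link is within limits, else None. Leaving the current host counts as one hop."""
    new_depth = depth + 1
    new_hops = hops if same_host(link, current_url) else hops + 1
    if new_depth > max_depth or new_hops > max_hops:
        return None
    return new_depth, new_hops


class IdleTimeout:
    """Requirement 7: reports when no matching file has been found for `seconds` seconds."""

    def __init__(self, seconds: float, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self.last_found = clock()

    def file_found(self) -> None:
        self.last_found = self.clock()

    def expired(self) -> bool:
        return self.clock() - self.last_found >= self.seconds
```

With `max_hops=0` the spider stays within the base website; raising it to 1, 2, 3, etc. allows that many moves to other websites, and `IdleTimeout.expired()` would be checked in the crawl loop to decide when to move to the next website on the list.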
For reference, see the download/spider program Medusa by Candego.com; many of its options are required in our program.
The program must run on current Windows PCs and must NOT depend on a web browser.
The program must be completed within two weeks of the project award.