I would like to build an application that can search the Internet and extract phone numbers from web sites, directories, online classifieds, etc.
(1) The extracted telephone numbers should be output as plain text, one number per line.
(2) Must be able to remove duplicates automatically.
(3) Must have the ability to search by keywords and keyword phrases.
(4) Must be able to use basic Web Boolean expressions, including quotes, minus sign, brackets, etc.
(5) Must be able to search by specific site URL.
(6) Must have a checkbox *option* to limit searches on Google, MSN, Yahoo and other major search engines to 99 pages, to avoid annoying the engines. If the box is left unchecked, it should search all records, but limit searches of 100+ pages to once every 4 hours.
(7) The supported search engines must include most of the major search engines such as Google, Yahoo, Lycos, Ask, Excite etc.
(8) *Should* include a copy of the location URL of each number collected for easy results verification (i.e. to confirm that a result is truly a business phone number).
(9) Must do an auto-save every 20 pages or every 5 minutes to avoid data loss due to time-outs or lock-ups.
(10) Must be able to preset how many results I want to gather in a particular run (e.g. set it for 3,000 numbers).
(11) Must be able to save searches for easy reuse later (e.g. a search for "MLM home business" can be saved and retrieved, so it doesn't need to be typed in again).
(12) I'd like to be able to extract from free classified sites such as [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], etc.
(13) If it can collect an e-mail address, too, with a fair degree of accuracy, that'd be great.
(14) Must be able to search [url removed, login to view], [url removed, login to view] and Switchboard.com.
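To make requirements (1), (2), (8) and (13) more concrete, here is a rough sketch in Python of the extraction step I have in mind. It is only an illustration, not a required design. It assumes North American numbers (3-3-4 digits with an optional leading 1), and the function name `extract_contacts` and the example URL are made up for this sketch. Fetching pages from search engines is not shown.

```python
import re

# Assumption: North American numbers (3-3-4 digits), optionally prefixed by +1 or 1.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)
# Deliberately simple e-mail pattern; "fair degree of accuracy", not full RFC coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contacts(text, source_url, seen_phones, seen_emails):
    """Return new (kind, value, source_url) rows found in text, skipping duplicates.

    seen_phones and seen_emails are sets shared across pages so that
    duplicates are removed over the whole run, not just within one page.
    """
    rows = []
    for match in PHONE_RE.finditer(text):
        number = "-".join(match.groups())  # normalise to the form 555-123-4567
        if number not in seen_phones:
            seen_phones.add(number)
            rows.append(("phone", number, source_url))
    for match in EMAIL_RE.finditer(text):
        email = match.group(0).lower()
        if email not in seen_emails:
            seen_emails.add(email)
            rows.append(("email", email, source_url))
    return rows


if __name__ == "__main__":
    seen_phones, seen_emails = set(), set()
    page = "Call (555) 123-4567 or 555.123.4567, fax 1-555-987-6543. Mail info@Example.com"
    for kind, value, url in extract_contacts(
        page, "http://example.test/page", seen_phones, seen_emails
    ):
        # One result per line, followed by the page it came from.
        print(f"{value}\t{url}")
```

Normalising every number to one format before comparing is what lets the two spellings of the same number above count as a single result.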
The following two software products are similar to what I wish to have built, only I wish to do it better:
(a) Phone G0LD Miner [url removed, login to view]
(b) 1st Fax Extracter: [url removed, login to view] Fax- [url removed, login to view]
The languages I've selected may or may not be the best ones for this job so I'm open to different programming options.
Can this be done?
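In case it helps with estimating, this is roughly how I picture the page limit in requirement (6) and the auto-save in requirement (9) behaving. It is a Python sketch only; all names (`pages_allowed`, `long_run_permitted`, `AutoSaver`) are hypothetical, and how the save is actually written to disk is left to the developer.

```python
import time

PAGE_CAP = 99                 # requirement (6): cap when the box is checked
LONG_RUN_INTERVAL = 4 * 3600  # requirement (6): one 100+ page search per 4 hours
AUTOSAVE_PAGES = 20           # requirement (9): save every 20 pages...
AUTOSAVE_SECONDS = 5 * 60     # ...or every 5 minutes, whichever comes first


def pages_allowed(limit_checked, pages_available):
    """Number of result pages to fetch from one engine for one search."""
    return min(pages_available, PAGE_CAP) if limit_checked else pages_available


def long_run_permitted(pages_planned, last_long_run, now):
    """Searches of 100+ pages are allowed only once every 4 hours.

    last_long_run is the time (in seconds) of the previous 100+ page
    search, or None if there has not been one yet.
    """
    if pages_planned <= PAGE_CAP:
        return True
    return last_long_run is None or now - last_long_run >= LONG_RUN_INTERVAL


class AutoSaver:
    """Calls save() after every 20 pages or 5 minutes, whichever comes first."""

    def __init__(self, save, clock=time.monotonic):
        self._save = save
        self._clock = clock
        self._pages = 0
        self._last = clock()

    def page_done(self):
        """Record one processed page; return True if a save was triggered."""
        self._pages += 1
        elapsed = self._clock() - self._last
        if self._pages >= AUTOSAVE_PAGES or elapsed >= AUTOSAVE_SECONDS:
            self._save()
            self._pages = 0
            self._last = self._clock()
            return True
        return False
```

The clock is passed in as a parameter only so the timing rule can be checked without waiting five minutes.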