Closed

website scraper, spider and downloader

We require a program that can download files and web pages from a list of web sites.

Background:

We have a list of 65,000 websites.

Most of these websites contain files, mostly in PDF format, that we need to download.

These files or web pages will contain a specific word or phrase, either in the URL or the web page.

The web pages may have many different file extensions, such as .asp, .aspx, .php, .htm, etc.
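The matching rule described above (a wanted file type plus a word or phrase in the URL or the page) could be sketched as a small predicate. This is only an illustration in Python: the names `url_extension`, `matches_criteria`, `phrase` and `extensions` are hypothetical, and `page_text` is assumed to be the text of the page that links to the file (or of the page itself).

```python
import posixpath
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL path, e.g. '.pdf'."""
    path = urlparse(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lower()


def matches_criteria(url: str, page_text: str, phrase: str, extensions: set[str]) -> bool:
    """True if the URL has a wanted extension and the phrase occurs in the URL or the page text."""
    if url_extension(url) not in extensions:
        return False
    needle = phrase.lower()
    return needle in url.lower() or needle in page_text.lower()
```

The comparison is case-insensitive on purpose, since links such as `Report.PDF` are common on older sites.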

Requirements:

1. We need to have the ability to load in a list of websites to be searched

2. We need to have the ability to set the number of levels the spider will look into the website

3. We need to have the ability to specify one or more file types to download

4. We need to have the ability to download files that match specific criteria, such as a word or phrase in the URL

5. We need to have the option of creating and downloading files into a folder with the website address name

6. We need to have the option of staying within the base website, or of following links 1, 2, 3, etc. websites away from the base URL

7. We need to have a timeout option that stops spidering/downloading a website after xx seconds without finding any files and moves on to the next website on the list.
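Requirements 2, 3, 4, 6 and 7 could be combined in one breadth-first crawl loop. The following is a minimal sketch in Python, not a finished design: `CrawlOptions`, `crawl` and `get_links` are hypothetical names, `get_links` is assumed to be a caller-supplied function that returns the absolute links found on a page, and "website addresses away" is interpreted here as the number of times the host name changes along the link path.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.parse import urlparse


@dataclass
class CrawlOptions:
    max_depth: int = 2             # requirement 2: link levels to follow
    max_site_hops: int = 0         # requirement 6: 0 = stay on the base website
    idle_timeout: float = 60.0     # requirement 7: seconds without a find before giving up
    phrase: str = ""               # requirement 4: word or phrase required in the URL
    extensions: tuple = (".pdf",)  # requirement 3: wanted file types


def crawl(start_url: str, options: CrawlOptions,
          get_links: Callable[[str], Iterable[str]]) -> list:
    """Breadth-first crawl from start_url; returns the URLs that match the options."""
    matches = []
    seen = {start_url}
    queue = deque([(start_url, 0, 0)])  # (url, depth, site hops from the base)
    last_find = time.monotonic()
    while queue:
        if time.monotonic() - last_find > options.idle_timeout:
            break  # nothing found recently: move on to the next website
        url, depth, hops = queue.popleft()
        path = urlparse(url).path.lower()
        if path.endswith(options.extensions) and options.phrase.lower() in url.lower():
            matches.append(url)
            last_find = time.monotonic()
            continue  # treated as a file in this sketch, so it is not expanded
        if depth >= options.max_depth:
            continue
        for link in get_links(url):
            changed_site = urlparse(link).netloc != urlparse(url).netloc
            link_hops = hops + int(changed_site)
            if link in seen or link_hops > options.max_site_hops:
                continue
            seen.add(link)
            queue.append((link, depth + 1, link_hops))
    return matches
```

Note that the idle timeout is only checked between pages here; a real implementation would also need a per-request network timeout so that a single slow server cannot stall the run.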

Look at the download/spider program called Medusa, by Candego.com. Many of the options in that program are required in ours.

This program must run on a current Windows PC and must NOT depend on a web browser.
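One way to meet the browser-independence constraint, together with the per-website folder option from requirement 5, is to fetch files with a plain HTTP client. The sketch below uses only the Python standard library; `safe_name`, `target_path` and `download` are hypothetical helper names, and the set of replaced characters is an assumption based on the characters Windows does not allow in file names.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen

# Characters Windows rejects in file and folder names, plus control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(text: str) -> str:
    """Replace characters that are invalid in Windows file names."""
    return _INVALID.sub("_", text).strip(" .") or "_"


def target_path(root: Path, url: str) -> Path:
    """Build root/<website address>/<file name> for a download URL."""
    parts = urlparse(url)
    filename = unquote(parts.path.rsplit("/", 1)[-1]) or "index.html"
    return root / safe_name(parts.netloc) / safe_name(filename)


def download(url: str, root: Path, timeout: float = 30.0) -> Path:
    """Fetch url over HTTP without a browser and save it under root."""
    dest = target_path(root, url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=timeout) as response:
        dest.write_bytes(response.read())
    return dest
```

For example, `download("http://www.example.com/docs/report.pdf", Path("downloads"))` would save the file as `downloads/www.example.com/report.pdf`. Two files with the same name on one site would overwrite each other in this sketch, so a real tool would need a collision rule.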

Program must be completed within two weeks of project award.

Skills: Data Mining, Web Scraping

See more: program download websites, program download website, finding match, creating website url, asp spider, web browser download, list of website creating website, spider, scraping pdf, finding a pdf, browser current url, data scraping word, address name format, website scraping php, downloader data, data mining scraping website, number scraper, program downloader, program pdf scraper, name scraping website, web spider project, download project web browser, scraper site data, download options data, url spider

About the employer:
(76 reviews) Edina, United States

Project ID: #2388974

12 freelancers are bidding an average of $579 for this job

srinichal

I look forward to discussing this further.

$750 USD in 22 days
(63 reviews)
6.7
phpXpertbd

I specialize in similar projects. Please check PM for more details.

$750 USD in 10 days
(52 reviews)
6.8
diepbp

I am confident I can handle your project. Please check your inbox for details. Thank you.

$699 USD in 10 days
(8 reviews)
4.1
alrazon

Will write a bot which will crawl your site list, just need the site list to crawl them. Thanks.

$750 USD in 10 days
(3 reviews)
3.0
Waqas109

Hi! It will be a pleasure for me to do this work for you :)

$450 USD in 10 days
(1 review)
2.4
sszelag

Hey, I can write you a crawler in C++/Qt. It'll be cross-platform and fully browser independent.

$500 USD in 7 days
(1 review)
1.9
usha7770

Dear Sir, I can make a spider which exactly suits your needs. For more info, please check your private message box. Thanks and regards.

$1200 USD in 15 days
(1 review)
0.0
DorianMarie

I will satisfy your needs. The script will be exactly as you want. I am an expert in web scraping, so it will be really easy for me.

$500 USD in 3 days
(0 reviews)
0.0
roedmdbddmn

Please check the PMB

$250 USD in 1 day
(0 reviews)
0.0
domybestsl

Can be done.

$250 USD in 30 days
(0 reviews)
2.6
roemdskddkdk

Please check the PMB

$250 USD in 1 day
(0 reviews)
0.0
pablocosias

Hi, I have code that does what you're asking for in the project. Details in PM. Regards

$600 USD in 7 days
(0 reviews)
0.0