In progress

simple scraping script


I need a simple script:

step 0: you get a file with a list of URLs (hundreds or thousands) in all sorts of formats (subdomains, https, many SLDs/TLDs).

step 1: you extract the domain names from the URLs and generate a sorted list of unique domains. This is not as simple as it sounds: the function doing it must be able to tokenize any URL format as well as any form of TLD (for example .[url removed, login to view], .fr, .[url removed, login to view], ...).

step 2: clean the list to remove some domains such as free blogs or .gov.

step 3: scrape [url removed, login to view] to get one data point about some of the domains.

step 4: scrape [url removed, login to view] to get some data for a short list of domains (without getting banned for excessive usage).

step 5: scrape two data points from the [url removed, login to view] page for each domain in the list.

step 6: sort the list and output as a flat file.
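
Steps 1 and 2 could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the requested deliverable: the function names, the short list of multi-part suffixes and the exclusion rules are all hypothetical, and a real script would need the complete Public Suffix List rather than the handful of entries hard-coded here.

```python
import re
from urllib.parse import urlsplit

# Illustrative subset only (assumption); real data should come from the
# full Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "gov.uk", "co.nz", "com.au"}

# Hypothetical exclusion rules for step 2 (free blog hosts and .gov).
BLOCKED_DOMAINS = {"blogspot.com", "wordpress.com"}
BLOCKED_SUFFIXES = (".gov", ".gov.uk")

SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def registered_domain(url):
    """Return the registrable domain of a URL, or None if none is found."""
    url = url.strip()
    if not url:
        return None
    # urlsplit only recognises the host when a scheme or '//' is present.
    if not SCHEME_RE.match(url) and not url.startswith("//"):
        url = "//" + url
    try:
        host = urlsplit(url).hostname  # lower-cased, port stripped
    except ValueError:
        return None
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    # Skip single-label hosts and IPv4 addresses.
    if len(labels) < 2 or labels[-1].isdigit():
        return None
    # Keep three labels when the last two form a known multi-part suffix.
    multi = ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES
    keep = 3 if multi and len(labels) >= 3 else 2
    return ".".join(labels[-keep:])


def is_blocked(domain):
    """True if the domain matches one of the hypothetical exclusion rules."""
    return domain in BLOCKED_DOMAINS or domain.endswith(BLOCKED_SUFFIXES)


def clean_domains(urls):
    """Steps 1 and 2: unique registrable domains, filtered and sorted."""
    domains = {registered_domain(u) for u in urls}
    domains.discard(None)
    return sorted(d for d in domains if not is_blocked(d))


if __name__ == "__main__":
    sample = [
        "http://www.example.fr/page",
        "https://blog.example.co.uk",
        "example.fr",
        "foo.blogspot.com/post",
        "https://www.whitehouse.gov",
        "shop.example.com:8080/cart",
    ]
    print(clean_domains(sample))
    # prints ['example.co.uk', 'example.com', 'example.fr']
```

The sorted list returned by clean_domains could then feed the scraping steps and be written out one domain per line for step 6.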

Potential for long-term work with the right programmer(s).

Skills: HTML, MySQL, PHP

See more: scraping free, scraping com, url scraping, php script work for scraping 2, php programmer nz, namecheap, mysql flat file, scrape data list urls, script php extract data mysql, sorted list, script clean list, scrape data mysql, mysql clean data, page scraping mysql, sorted form, scrape sort, gov blogs, scrape domains, scraping php form, function scrape, simple list urls, scraping data url, php clean urls, page mysql output, simple url file

About the employer:
(9 reviews) London, United Kingdom

Project ID: #2369200

7 freelancers are bidding an average of $159 for this job


Hi, I have gone through the requirements and understand what you need. Please contact me to discuss it further.

USD in 5 days
(88 reviews)

Can be done. Please check PM for details.

USD in 7 days
(53 reviews)

I can build the script for you. Please refer to the PMB, thanks.

USD in 3 days
(4 reviews)

I clearly understood your project requirements. I will be able to deliver as per your specifications.

USD in 5 days
(0 reviews)

I'm interested in your job offer. I have extensive experience in web scraping. I live in Russia and am ready for remote work. We can get in touch via ICQ, MSN, Google Talk, Skype or any other messenger. I'm a high experienc...

USD in 5 days
(0 reviews)

Hi, I have 4 years of experience in Linux and shell scripting. As this is a small task of a few lines, I will deliver it in 2 days. Thanks & regards, Rishi Tiwari

USD in 3 days
(0 reviews)

Hello, please check the PM; I have some questions.

USD in 5 days
(0 reviews)

Hello: I would prefer to write the script in Python or as a shell script. Python's requests module and urlparse work beautifully for such tasks. After extracting and cleaning the domain names, I shall be making requests to the ...

USD in 2 days
(0 reviews)