In progress

simple scraping script

Hello,

I need a simple script:

step 0: you get a file with a list of URLs (hundreds or thousands); they come in all sorts of formats (subdomains, https, many SLDs/TLDs).

step 1: you extract the domain names from the URLs and generate a sorted list of unique domains; this is not as simple as it sounds, because the function doing it must be able to tokenize any URL format as well as any form of TLD (like .[url removed, login to view], .fr, .[url removed, login to view], ... for example).

step 2: clean the list to remove some domains such as free blogs or .gov.

step 3: scrape [url removed, login to view] to get one data point about some of the domains.

step 4: scrape [url removed, login to view] to get some data for a short list of domains (without getting banned for excessive usage).

step 5: scrape two data points from the [url removed, login to view] page for each domain in the list.

step 6: sort the list and output as a flat file.
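
Steps 1, 2 and 6 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the requested deliverable: it is written in Python rather than the PHP listed in the skills, the function names are hypothetical, and the suffix and block lists are tiny made-up samples. A real script would need the full Public Suffix List to handle every form of TLD, and this sketch does not treat IP addresses specially.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny sample of multi-label public suffixes.
# A real script would load the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "gov.uk", "com.au", "co.jp"}

# Hypothetical exclusion rules for step 2 (free blog hosts, government domains).
BLOCKED_DOMAINS = {"blogspot.com", "wordpress.com"}
BLOCKED_SUFFIXES = (".gov", ".gov.uk")

SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def registrable_domain(url):
    """Return the registrable domain of a URL, or None if no host is found."""
    url = url.strip()
    if not url:
        return None
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    if not SCHEME_RE.match(url) and not url.startswith("//"):
        url = "//" + url
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:  # e.g. malformed IPv6 brackets
        return None
    labels = host.rstrip(".").lower().split(".")
    if len(labels) < 2 or "" in labels:
        return None
    # Keep three labels when the last two form a known multi-label suffix.
    multi = ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES and len(labels) >= 3
    return ".".join(labels[-3:] if multi else labels[-2:])


def is_blocked(domain):
    """Step 2: True for domains that should be dropped from the list."""
    return domain in BLOCKED_DOMAINS or domain.endswith(BLOCKED_SUFFIXES)


def unique_domains(urls):
    """Steps 1 and 2: sorted list of unique, non-blocked registrable domains."""
    domains = {registrable_domain(u) for u in urls}
    domains.discard(None)
    return sorted(d for d in domains if not is_blocked(d))


def write_flat_file(domains, path):
    """Step 6: write one domain per line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(domains) + "\n")


if __name__ == "__main__":
    sample = [
        "https://www.example.co.uk/page",
        "http://blog.example.co.uk",
        "shop.example.fr/cart?id=1",
        "HTTP://Sub.Example.com:8080/",
        "http://myblog.blogspot.com/post",
        "https://www.whitehouse.gov/",
    ]
    for domain in unique_domains(sample):
        print(domain)
```

Collapsing each host to its registrable domain before deduplicating means that www.example.co.uk and blog.example.co.uk count as one entry, which seems to be what step 1 is asking for.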

Potential for long-term work with the right programmer(s).

Skills: HTML, MySQL, PHP


About the employer:
(9 reviews) London, United Kingdom

Project ID: #2369200

7 freelancers bid an average of $159 for this job

Soolved

Hi, I have gone through the requirements and understand what you need. Please contact me to discuss it further.

$200 USD in 5 days
(88 reviews)
6.2
lafor

Can be done. Please check PM for details.

$220 USD in 7 days
(53 reviews)
5.8
joeguo

I can build the script for you. Please refer to the PMB, thanks.

$150 USD in 3 days
(4 reviews)
3.8
Athumani

I clearly understand your project requirements. I will be able to deliver as per your specifications.

$250 USD in 5 days
(0 reviews)
0.0
Weylin

I'm interested in your job offer. I have extensive experience in web scraping. I live in Russia and am ready for remote work. We can communicate via ICQ, MSN, Google Talk, Skype or any other messenger. I'm a high experienc…

$91 USD in 5 days
(0 reviews)
0.0
rishi6451

Hi, I have 4 years of experience in Linux and shell scripting. As this is a small task of a few lines, I will deliver it in 2 days. Thanks and regards, Rishi Tiwari

$125 USD in 3 days
(0 reviews)
0.0
Cyrix02

Hello, please check the PM; I have some questions.

$200 USD in 5 days
(0 reviews)
0.0
bitmonk

Hello: I would prefer to write the script in Python or as a shell script. The Python requests module and urlparse work beautifully for such tasks. After extracting the domain names and cleaning, I shall be making requests to the…

$100 USD in 2 days
(0 reviews)
0.0