Closed

Simple web crawler

Create a database/PHP application that crawls a list of URLs in order of a priority number, using a master/slave system. The master and slaves will most likely run on Ubuntu/Debian EC2 instances with a LAMP stack and php5-curl installed (to make the requests). The code has to work with that setup; it can be developed on Windows, but it has to work on a Linux filesystem.

The main server/database (let's call it MAIN) will have a MySQL database with a few tables:

Urls - (Url, Priority, SlaveId)

Slaves - (SlaveId, ServerIP, QueueSize, State)

State options: Online, Offline

Priorities will be 1-5.
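
To make the table layout concrete, here is a minimal sketch of the two tables with the State and Priority rules expressed as constraints. It is written in Python with an in-memory SQLite database standing in for MySQL on MAIN; the column types, the primary keys, the default values and the sample IP address are assumptions, not part of the spec.

```python
import sqlite3

# SQLite (in memory) stands in for the MySQL database on MAIN (assumption).
SCHEMA = """
CREATE TABLE Slaves (
    SlaveId   INTEGER PRIMARY KEY,
    ServerIP  TEXT NOT NULL,
    QueueSize INTEGER NOT NULL DEFAULT 0,
    State     TEXT NOT NULL DEFAULT 'Offline'
              CHECK (State IN ('Online', 'Offline'))
);
CREATE TABLE Urls (
    Url      TEXT PRIMARY KEY,
    Priority INTEGER NOT NULL CHECK (Priority BETWEEN 1 AND 5),
    SlaveId  INTEGER REFERENCES Slaves(SlaveId)
);
"""

def create_schema(conn):
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO Slaves (SlaveId, ServerIP, State) VALUES (1, '10.0.0.1', 'Online')")
conn.execute("INSERT INTO Urls (Url, Priority, SlaveId) VALUES ('http://example.com/', 5, 1)")
```

With these constraints, a row with a priority outside 1-5 or a state other than Online/Offline is rejected by the database itself rather than by application code.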

Each slave reports its state to MAIN every 5 minutes, confirming that it is 'Online'. If MAIN doesn't hear from a slave within 5 minutes, it marks that slave's state as 'Offline'.
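
The heartbeat rule could be sketched roughly as follows (Python for illustration; the real system would be PHP/MySQL). The description does not say where the last report time is stored, so the `last_seen` field, the in-memory dictionary and both function names are assumptions.

```python
HEARTBEAT_TIMEOUT = 5 * 60  # seconds; the 5-minute window from the description

def record_heartbeat(slaves, slave_id, now):
    """Mark a slave Online and remember when it last reported (now is in seconds)."""
    slaves[slave_id] = {"state": "Online", "last_seen": now}

def mark_stale_offline(slaves, now):
    """Flip every Online slave that has been silent too long to Offline.

    Returns the ids that changed, so the caller can trigger a rebalance.
    """
    changed = []
    for slave_id, info in slaves.items():
        if info["state"] == "Online" and now - info["last_seen"] > HEARTBEAT_TIMEOUT:
            info["state"] = "Offline"
            changed.append(slave_id)
    return changed
```

Because the reporting interval and the timeout are both 5 minutes, a report that arrives slightly late would briefly mark a healthy slave Offline; a small grace margin on the timeout may be worth considering.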

URLs will be removed once completed by the slave (the slave will issue a SQL DELETE to remove the URL from MAIN).

URLs will be added to the Urls table and can be assigned to slaves arbitrarily (this doesn't need to be balanced, but if there are 5 new URLs they should be spread across slave1, slave2, slave3, etc.).

The balance algorithm needs to run immediately when a slave goes offline or comes online, and also every minute.
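
One simple way to spread new URLs across slave1, slave2, slave3 and so on is a round-robin pass. This is a sketch in Python for illustration only; the function name and the plain lists it takes are assumptions.

```python
from itertools import cycle

def assign_new_urls(urls, online_slave_ids):
    """Hand new URLs to slaves in turn: slave1, slave2, slave3, ...

    Returns {url: slave_id}; empty if no slave is online.
    """
    if not online_slave_ids:
        return {}
    turn = cycle(online_slave_ids)
    return {url: next(turn) for url in urls}
```

For example, five new URLs and three online slaves give the first three slaves one URL each, then the first two slaves a second one.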

The MAIN server's job is to assign slaves to the URLs and balance the workload between all slaves as much as possible. If a slave gets marked as Offline, or a new slave comes online, all queued URLs are redistributed evenly, making sure that not only the number of URLs assigned to each slave is even but also that the average priority is about the same.
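
The description does not name a balancing algorithm. One possible approach, sketched here in Python for illustration, is a serpentine deal: sort the queued URLs by priority, then hand them out left to right and right to left in alternating rounds. This keeps queue lengths within one of each other and the priority totals close. The function name and the data shapes are assumptions.

```python
def rebalance(urls, online_slave_ids):
    """Redistribute all queued URLs across the online slaves.

    urls: list of (url, priority) tuples.
    Returns {slave_id: [(url, priority), ...]}.
    """
    queues = {sid: [] for sid in online_slave_ids}
    if not online_slave_ids:
        return queues
    ordered = sorted(urls, key=lambda u: u[1], reverse=True)
    n = len(online_slave_ids)
    for i, item in enumerate(ordered):
        round_no, pos = divmod(i, n)
        # Reverse direction on every other round so no slave always gets
        # the highest-priority URL of each round.
        if round_no % 2 == 1:
            pos = n - 1 - pos
        queues[online_slave_ids[pos]].append(item)
    return queues
```

The same function could be called from all three triggers in the description (slave goes offline, slave comes online, every minute), since it always recomputes the full assignment.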

The SLAVE's job is to process its assigned URLs in order of priority (5 is the highest). The slave will use php5-curl to make a request to the URL and save the contents of the response to a file on the hard drive. It will then report to MAIN that its queue is 1 shorter and delete the URL record it just processed.
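
The slave loop might look something like this sketch. It is in Python for illustration, with `urllib` standing in for php5-curl. The `on_done` hook (where the slave would delete the URL on MAIN and report the shorter queue), the hash-based file names and the `.html` extension are assumptions; the description does not specify a file naming scheme.

```python
import hashlib
import os
import urllib.request

def fetch(url):
    """Download a URL body; urllib stands in for php5-curl here."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def process_queue(queue, out_dir, fetch=fetch, on_done=lambda url: None):
    """Process (url, priority) pairs, highest priority first.

    Each response body is saved under out_dir, then on_done(url) is called.
    Returns the URLs in the order they were processed.
    """
    os.makedirs(out_dir, exist_ok=True)
    done = []
    for url, _priority in sorted(queue, key=lambda u: u[1], reverse=True):
        body = fetch(url)
        # Hash the URL for the file name (assumption) so it is always
        # a legal name on a Linux filesystem.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(body)
        on_done(url)
        done.append(url)
    return done
```

A failed request raises an exception and stops the loop in this sketch; what should happen to a URL that cannot be fetched is not covered by the description and would need to be agreed.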

Skills: MySQL, PHP, Software Architecture


About the employer:
(1 review) Adrian, United States

Project ID: #13075992

7 freelancers bid an average of $125 for this job

elbruninh

Hi, I'm interested. Could you give me more details, please? Do you need all the functionality worked out? Regards

$250 USD in 3 days
(46 Reviews)
6.0
ChiragLathiya

Message me before you gonna project to me ...

$15 USD in 1 day
(59 Reviews)
5.5
Humfi

Hello, I am available for your job, I can start right now. I will provide you good quality work with fast turnaround. Please hire me for this project. Waiting for your kind reply for more discussion. Humfi

$250 USD in 1 day
(8 Reviews)
3.3
trans21

Hello, hope you are doing well. I read your project description. Let's have a technical discussion so that we understand it, negotiate the cost and timeline, and then proceed further. Also I shall show my past work when we di...

$100 USD in 2 days
(10 Reviews)
3.4
deaswang

Hello, I have read your requirements. I can help you finish this work. Can you provide more information about this project? I can use Python to scrape. Thank you

$25 USD in 1 day
(1 Review)
1.5
DanWalkerUK

Hey, I've bid, but I'd also like to say that based upon what you are trying to achieve (central list of URLs, work servers visit them and store the contents somewhere) I would approach it differently. I'd need to cl...

$222 USD in 10 days
(0 Reviews)
0.0
pawarpankaj923

Hello, nice to see your post. I have 5+ years of experience in development. Just share your detailed requirements with me so we can discuss more. I am sure after discussion with me you are satisfy and we will wo...

$15 USD in 1 day
(0 Reviews)
0.0