Web Crawler & Search Engine

Hello. I am looking for a freelance programmer with the vision and interest to create something that matters. The idea is simple but ambitious: build a useful search engine/portal in which all advertising proceeds go to humanitarian projects (scholarships, health programs, etc.). Imagine if people could help others just by changing their homepage.


#1 - Parse/grab URLs from a HUGE (2 GB) XML file. This can be done with your own script or with a freeware script that you know of. The result should be a list of over 5 million URLs. (The list can be broken into smaller sections.)
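One way step #1 might be approached is a streaming parse, so the 2 GB file is never loaded into memory at once. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse`. The posting does not describe the XML layout, so the element name `url` and the function name `iter_urls` are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def iter_urls(source, tag="url"):
    """Yield the text of every <tag> element without loading the whole file.

    `tag` is a guess at the element name; adjust it to the real file layout.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so already-processed children can be discarded as we go.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            text = (elem.text or "").strip()
            if text:
                yield text
            # Drop everything parsed so far to keep memory use flat.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<list><site><url>http://a.example/</url></site>"
        b"<site><url>http://b.example/</url></site></list>"
    )
    for url in iter_urls(sample):
        print(url)
```

Because `iter_urls` is a generator, the output can be written to several smaller files as it is produced, which matches the note that the list can be broken into sections.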

#2 - Create a multi-threaded web crawler that will read the list of URLs and use it to begin its slow crawl of the Internet, spidering each homepage and gathering information (meta tags, content, etc.). The spider will harvest the first page and any secondary pages (future crawls may go deeper into the site). It will also use "stop words", meaning it will not gather certain common words (a, the, an, etc.). All content will be sent to a database (of your suggestion). The database will be large, so speed is a concern. Suggestions welcome.
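The crawler in step #2 could be sketched roughly as follows: a thread pool fetches pages in parallel, and a small HTML parser collects meta tags and page words while skipping stop words. This is only an illustration under stated assumptions. The stop-word list is a tiny sample, the names `PageParser`, `parse_page`, and `crawl` are hypothetical, and the `fetch` function is passed in by the caller so that the download method (and the database write, which is omitted here) stays open to suggestion.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Illustrative sample only; a real list would be much longer.
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in"})


class PageParser(HTMLParser):
    """Collect <meta name=... content=...> pairs and non-stop-word text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.words = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name") and attrs.get("content"):
                self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            if word not in STOP_WORDS:
                self.words.append(word)


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "words": parser.words}


def crawl(urls, fetch, workers=8):
    """Fetch and parse each URL on a thread pool.

    `fetch` maps a URL to an HTML string. Pages that fail map to None.
    """
    def work(url):
        try:
            return url, parse_page(fetch(url))
        except Exception:
            return url, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(work, urls))
```

In practice `fetch` might wrap `urllib.request.urlopen` with a timeout, and a polite crawler would also honor robots.txt and rate-limit requests per host. Threads suit this job because the work is dominated by waiting on the network.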

#3 - Create a search engine that will query the database and return results ranked by keyword relevance. Because the database will be huge, caching popular results may be required. Speed is a concern and your opinions are welcome.
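To make step #3 concrete, here is a minimal in-memory sketch of keyword search with result caching. It builds an inverted index (word to pages), ranks pages by how often the query words occur, and keeps popular queries in an LRU cache. The class name `SearchIndex` and the plain term-count scoring are assumptions chosen for brevity. A production system would keep the index in the database and would likely use a stronger relevance measure.

```python
from collections import Counter, defaultdict
from functools import lru_cache


class SearchIndex:
    def __init__(self, cache_size=1024):
        # word -> Counter({url: occurrences of word on that page})
        self._postings = defaultdict(Counter)
        # Per-instance cache of recent queries.
        self._cached = lru_cache(maxsize=cache_size)(self._search)

    def add(self, url, words):
        for word in words:
            self._postings[word][url] += 1
        # Cached results may now be stale.
        self._cached.cache_clear()

    def search(self, query, limit=10):
        # Normalize so "Fox dog" and "dog fox" share one cache entry.
        terms = tuple(sorted(set(query.lower().split())))
        return list(self._cached(terms, limit))

    def _search(self, terms, limit):
        scores = Counter()
        for term in terms:
            for url, count in self._postings.get(term, {}).items():
                scores[url] += count
        # Highest score first; ties broken by URL for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return tuple(url for url, _ in ranked[:limit])


if __name__ == "__main__":
    index = SearchIndex()
    index.add("http://a.example/", ["quick", "fox", "fox"])
    index.add("http://b.example/", ["fox", "dog"])
    print(index.search("fox"))
```

Clearing the cache on every `add` is the simplest way to stay correct. With a slow-moving crawl, expiring cached entries on a timer would probably be a better trade-off.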

Website Host: Currently using Globat. Please view their specs to make sure they can support your suggestions. Will change host if needed.

This is a large project but, once again, it's for a very good cause. Numerous media outlets have already shown an interest in this "little search engine that could." If you are interested in helping create something meaningful please contact me.

Thank you.

Skills: C Programming, Link Building, Perl, PHP, XML


About the employer:
(0 reviews) Austin, United States

Project ID: #30738

3 freelancers are bidding an average of $283 for this job


Please check your PM.

USD in 30 days
(116 reviews)

We are already working on a clone of [login to view URL]. We offer outstanding added value and a high degree of quality, with industry-specific design services at very competitive prices. We assure 100% satisfaction by qual...

USD in 30 days
(28 reviews)

We have a base script that covers all your requirements.

USD in 12 days
(1 review)