Webcrawler / Spider - Data Extraction

We need a webcrawler / spider that can collect the technical specifications of a particular product.

•In essence, we want to input the name and/or model number of a particular product, and the spider should extract the technical specifications from multiple websites (10-20). You may want to query Google first for the top 10-20 results and then crawl those sites. The number of products could range from 100 to thousands at a time, and we should be able to upload the list as a CSV or similar file.

•The next step in the process is some level of “fuzzy logic” that compares the specification names/fields across the different results and identifies those with a tolerable level of similarity; the matched name becomes the field label for that particular feature/specification. For example, certain key technical specifications are almost always mentioned for a particular type of product, such as megapixels for digital cameras.

•The next step is to apply similar fuzzy logic to the actual specification values, as webmasters often post data inaccurately or incompletely and leave some specs out.

•All the data should then be stored in a database that is searchable. The data should be presented in a tabular format.

•Where possible, the application should supply a URL to any PDFs containing the technical specifications and/or user manuals of the product. The source URLs of the data should be included as well.

•Our preference is for a web-based solution using open-source technologies such as PHP and MySQL. The application must be secure and scalable.

•We will require a web-based front end to display the results to users, so integration into a CMS such as WordPress or Joomla would be preferable.
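
As a rough illustration of the two "fuzzy logic" steps in the list above, the sketch below groups similar field labels and then picks the most frequently reported value for each field. It is only one possible reading of the requirement: it uses plain string similarity from Python's standard difflib module rather than fuzzy logic in the formal sense, and it is written in Python purely for illustration even though the posting prefers PHP. The function names, the 0.8 similarity threshold, and the sample camera data are all assumptions made for this example.

```python
import difflib
from collections import Counter, defaultdict


def normalise_label(label):
    """Lower-case a field label and replace punctuation with spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in label.lower())
    return " ".join(cleaned.split())


def normalise_value(value):
    """Lower-case a value and drop whitespace so '450 g' equals '450g'."""
    return "".join(value.lower().split())


def canonical_label(label, known, threshold=0.8):
    """Return the first known label similar enough to `label`, else register it."""
    norm = normalise_label(label)
    for candidate in known:
        if difflib.SequenceMatcher(None, norm, candidate).ratio() >= threshold:
            return candidate
    known.append(norm)
    return norm


def merge_specs(site_specs, threshold=0.8):
    """Merge per-site {label: value} dicts into {label: (value, votes, total)}."""
    known = []                    # canonical labels seen so far (hypothetical scheme)
    votes = defaultdict(Counter)  # canonical label -> counts of normalised values
    first_seen = {}               # (label, normalised value) -> original spelling
    for specs in site_specs:
        for label, value in specs.items():
            key = canonical_label(label, known, threshold)
            norm_value = normalise_value(value)
            votes[key][norm_value] += 1
            first_seen.setdefault((key, norm_value), value)
    merged = {}
    for key, counter in votes.items():
        winner, count = counter.most_common(1)[0]
        merged[key] = (first_seen[(key, winner)], count, sum(counter.values()))
    return merged


if __name__ == "__main__":
    # Made-up results from three sites for one digital camera.
    sites = [
        {"Megapixels": "24.2 MP", "Weight": "450 g"},
        {"Mega pixels": "24.2 MP", "Weight:": "450g"},
        {"Megapixel": "24 MP"},
    ]
    for label, (value, count, total) in merge_specs(sites).items():
        print(f"{label}: {value} ({count} of {total} sources agree)")
```

Keeping the vote count next to each value would let the front end show how many sources agree, which may help flag specifications that sites report inconsistently. A production version would likely need unit-aware value comparison and a tuned threshold per product category.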

We have many ideas about the logical flow for achieving the above, as well as the bigger picture of this entire project; however, these will be shared only with those shortlisted as potential suppliers. The code must belong to us and you must be prepared to sign an NDA.

This is the initial project and, based on its success, there will be ongoing enhancements and features required. Please make sure to read the above properly and send through any questions you have as well as constructive responses.

Skills: Data Mining, MySQL, PHP, Software Architecture, Web Scraping

About the employer:
(22 reviews) Zichron Yakov, Israel

Project ID: #1625679

8 freelancers are bidding an average of $1150 for this job


Hi, We have designed and built websites for various types of businesses very effectively. We work with all of our clients individually to easily coordinate and to keep track of the requirements and scope. We fulfil…

USD in 30 days
(224 reviews)

I specialize in similar projects. Please check PM for more details.

USD in 18 days
(27 reviews)

Hi, Kindly check PMB

USD in 14 days
(16 reviews)

Respected madam, I am ready to work on this project. Please give me the honor of working with you. Also check your inbox. Thanks

USD in 30 days
(35 reviews)

Hi, This is Jeni from bistech support. We would like to inform you that we are able to develop this WordPress project. Please find the attached document for your reference. Our ballpark quote is 750 U…

USD in 7 days
(17 reviews)

Can provide an expandable solution for crawling product descriptions. Please check your PMB for clarifications.

USD in 20 days
(1 review)

Dear Client, our Skype is [url removed]. We are not taking advance payments; you can pay as per work. We are ready to discuss the project with you and, based on that, move forward. Let's discuss…

USD in 5 days
(0 reviews)

Hello, Please check PM for further details.

USD in 15 days
(0 reviews)

Assurance of excellent job done with expert team having 5 years of experience in this field within allotted time with consideration of minute details.

USD in 10 days
(0 reviews)