In progress

Data crawler to log in and spider inventory data from a distributor website to a CSV file

We need an automated crawler that logs into a distributor's warehouse website and downloads inventory data from its tables to a delimited file.

The website we will be crawling is the login/search catalog section of www.electrograph.com.

I have saved copies of their site locally to demonstrate what needs to be done. After the project closes, we will provide actual login details for the live site so the job can be completed.

Walkthrough of what needs to be done:

Login Home Page

[url removed, login to view]

Go to the main website and log in using the form in the upper left-hand corner of the page. The user name and password should be configurable.
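A minimal sketch of how the login request could be prepared, assuming the login form is a regular HTML form whose hidden inputs must be posted back along with the credentials. The field names `txtUserName` and `txtPassword` are hypothetical placeholders; the real names have to be read from the live page. Note that this collects every `<input>` on the page, so it may need narrowing to the login form only.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects the name/value pair of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            self.fields[name] = attr.get("value") or ""


def build_login_payload(login_html, username, password,
                        user_field="txtUserName",   # hypothetical field name
                        pass_field="txtPassword"):  # hypothetical field name
    """Return the form data to POST: existing inputs plus the credentials."""
    collector = FormFieldCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload
```

The resulting dictionary would then be sent with whatever HTTP client the crawler uses, keeping the session cookie for all later requests.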

Successfully Logged In

[url removed, login to view]

After the login has been processed successfully, the page is refreshed and now includes a "My Account" section in the upper left-hand corner. Additionally, the "keyword / Item# search" form is now enabled for our specific account. When submitted, it displays the pricing and inventory quantities available for our account.

Currently their website lets you browse the inventory by category and then paginate through the results (it cannot show all products on one page). We need to follow each category in the select menu "ddlCategory" individually, download all the data on the page to the specified format, and continue to the next page of results if another page exists.
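As an illustration, the list of categories could be read from the select menu like this. It is only a sketch: it assumes the menu is a plain `<select>` whose `name` or `id` is `ddlCategory`, and that entries with an empty value are placeholders to skip. The helper name `list_categories` is made up for this example.

```python
from html.parser import HTMLParser


class CategoryOptionParser(HTMLParser):
    """Collects (value, label) pairs from the options of one <select>."""

    def __init__(self, select_name="ddlCategory"):
        super().__init__()
        self.select_name = select_name
        self.in_select = False
        self.value = None   # value of the <option> currently being read
        self.label = []
        self.categories = []

    def _flush(self):
        # Store the option read so far; skip placeholders with empty value.
        if self.value:
            self.categories.append((self.value, "".join(self.label).strip()))
        self.value, self.label = None, []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "select":
            self.in_select = self.select_name in (attr.get("name"), attr.get("id"))
        elif tag == "option" and self.in_select:
            self._flush()   # tolerate options without a closing tag
            self.value = attr.get("value") or ""

    def handle_data(self, data):
        if self.value is not None:
            self.label.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._flush()
        if tag == "select":
            self.in_select = False


def list_categories(html):
    parser = CategoryOptionParser()
    parser.feed(html)
    parser.close()
    return parser.categories
```

The crawler would then submit the search once per returned value.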

Crawling first result page of the first category searched "Accessories"

[url removed, login to view]

This page displays the information that we want to store in a delimited file. We need to trim and store the Model #, Manufacturer, Description, Availability, and Reseller Price columns. Each table row becomes a new line in the delimited file.

Take note of the Availability column: it provides the total quantity in stock followed by an "I" icon. When you hover over this "I" icon, it displays the breakdown of which warehouse locations the product is stored in. For example: 18 (the "I" says: 14 - NY, 4 - NV, meaning 14 units in stock in New York and 4 units in stock in Nevada). We need to store both the total quantity available and the individual location quantities, with one column for each warehouse location.
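A rough sketch of the row extraction, assuming the results are laid out as a simple, non-nested HTML table with one `<tr>` per product, `<td>` data cells, and the five columns in the order listed above. The real markup may differ, and the names `TableRowParser` and `extract_rows` are illustrative only.

```python
from html.parser import HTMLParser

# Assumed column order, taken from the description above.
COLUMNS = ["Model #", "Manufacturer", "Description", "Availability", "Reseller Price"]


class TableRowParser(HTMLParser):
    """Collects the trimmed text of every <td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None    # cells of the row being read
        self.cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td" and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            # Trim and collapse internal runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:
                self.rows.append(self.row)
            self.row = None


def extract_rows(html):
    """Return one dict per product row that has exactly the expected columns."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(COLUMNS, r)) for r in parser.rows if len(r) == len(COLUMNS)]
```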
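To illustrate the per-warehouse split, here is a small sketch that turns the hover text into quantities keyed by location. It assumes the text looks like "14 - NY, 4 - NV" (a quantity, a dash, then a location code); the exact format on the live site must be checked, and `parse_breakdown` is a hypothetical helper name.

```python
import re


def parse_breakdown(text):
    """Turn hover text such as '14 - NY, 4 - NV' into {'NY': 14, 'NV': 4}."""
    pairs = re.findall(r"(\d+)\s*-\s*([A-Za-z]+)", text)
    return {location.upper(): int(quantity) for quantity, location in pairs}
```

Text that contains no quantity/location pairs yields an empty dictionary, so every location can later default to zero.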

Crawling second/additional result page(s) of the first category searched "Accessories" (page 2+)

[url removed, login to view]

Perform the same process as for the first result page (Step 2): download and store all the inventory data, then continue to the next page if it exists.

(Note: on the saved version of this page that I provided, the JavaScript that shows the individual warehouse split is not working; it will of course work on the live site.)
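The page-by-page loop could be sketched as follows. This is an outline under assumptions: the functions `fetch`, `parse_rows`, and `find_next_url` are placeholders supplied by the caller, and each page is assumed to be identifiable by a URL or similar token (if the site pages through form postbacks, those callbacks would hide that detail). The `delay_ms` parameter is the configurable pause between page navigations.

```python
import time


def crawl_category(first_url, fetch, parse_rows, find_next_url,
                   delay_ms=0, sleep=time.sleep):
    """Collect rows from every result page of one category."""
    rows = []
    seen = set()            # guards against a "next" link that loops back
    url = first_url
    while url and url not in seen:
        seen.add(url)
        html = fetch(url)
        rows.extend(parse_rows(html))
        url = find_next_url(html)   # None or "" when there is no next page
        if url and delay_ms > 0:
            sleep(delay_ms / 1000.0)
    return rows
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.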

Crawling first result page of additional LARGE category searched "Plasma Displays"

[url removed, login to view] (interim refine page)

[url removed, login to view] (actual results page)

For some categories that contain a substantial number of products, clicking "SEARCH" does not display results right away. Instead it brings you to another form ("search plasma displays") where you can refine your results and search by attributes. We do not want to do this; we simply want to press the "GO" button, which displays all the products in that category in the same manner as Step 2.

Crawling second/additional result page(s) of additional LARGE category searched "Plasma Displays"

[url removed, login to view]

Perform the same process as for the first result page (Step 2): download and store all the inventory data, then continue to the next page if it exists.

The end result must be a comma-delimited file.

Example result of parsing the example link [url removed, login to view]:

Model Number, Manufacturer, Description, Reseller Price, Total Available Qty, Location NY Qty, Location NV Qty, Location XX Qty

ACE615, ADCOM, ACE-615 ILS SURGE (120V), [url removed, login to view], 12, 12, 0, 0

TRAVEL CS/42"PANASON, CALZONE CASE CO, TRAVEL CASE 42" PANASONIC, [url removed, login to view], 0, 0, 0, 0

FSD-4100, CHIEF MANUFACTURING, FSD-4100, [url removed, login to view], 0, 0, 0, 0

CMA-0608, CHIEF MANUFACTURING, 6'-8' ADJUSTABLE PLATE, [url removed, login to view], 0, 0, 0, 0

RC-1PXL, ELECTROGRAPH SYSTEMS, 24-BUTTON SWITCH PANEL FOR VS-1XL, [url removed, login to view], 0, 0, 0, 0

RC-1XL, ELECTROGRAPH SYSTEMS, NEW MODEL NUMBER (WAS VS-1XL) REMO, [url removed, login to view], 0, 0, 0, 0

FRAME-O, ELECTROGRAPH SYSTEMS, SINGLE GANG FRAME TO HOLD UP TO 3 W, [url removed, login to view], 0, 0, 0, 0

FRAME-W, ELECTROGRAPH SYSTEMS, SINGLE GANG FRAME TO HOLD UP TO 3 W, [url removed, login to view], 5, 5, 0, 0

Notice that for some products the website gives a quantity, while for others it says "call for availability". We need to be able to map whatever text is in that field to a text or numerical equivalent. For example, in this implementation we define "Call for availability" as 0.

Also, because they are always adding and changing warehouse locations, we need to leave room at the end of the delimited file for new locations. When text (rather than a number) is found in the quantity-available field, we look up its equivalent and apply that value to all the location columns as well. For example, "call for availability" will result in 0, 0, 0, 0 (Total Quantity Available, Location 1 Qty, Location 2 Qty, Location 3 Qty). We should make room for up to 10 warehouse locations (0, 0, 0, 0, 0, 0, 0, 0, 0, 0). When no quantity is defined for an indexed warehouse, we will replace it with zero.

In this example, "Call for availability" means the product is not in stock, so we mark it and all subsequent warehouse locations as 0.

I also need to be able to control the delimiter used in the output file (I have used a comma in this illustration for ease).

I also need to be able to control the delay between page navigations (in milliseconds).
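One possible reading of these rules, sketched in code: a numeric total is kept and each known location gets its own quantity (zero when missing), while a text value is replaced by its mapped equivalent for the total and for every known location; unused location slots are padded with zero up to 10 columns. The names `TEXT_EQUIVALENTS` and `availability_columns` are hypothetical, and the treatment of unused slots is an assumption.

```python
MAX_LOCATIONS = 10

# Text found in the Availability field mapped to its equivalent (configurable).
TEXT_EQUIVALENTS = {"call for availability": 0}


def availability_columns(raw_total, breakdown, locations,
                         text_equivalents=TEXT_EQUIVALENTS,
                         max_locations=MAX_LOCATIONS):
    """Return [total, loc1, ..., loc10] for one product row.

    raw_total -- text of the Availability cell, e.g. "18" or "Call for availability"
    breakdown -- per-location quantities, e.g. {"NY": 14, "NV": 4}
    locations -- ordered list of known location codes, e.g. ["NY", "NV"]
    """
    text = raw_total.strip()
    if text.isdigit():
        total = int(text)
        per_location = [breakdown.get(loc, 0) for loc in locations]
    else:
        # Raises KeyError for unmapped text so new wording is noticed.
        total = text_equivalents[text.lower()]
        per_location = [total] * len(locations)
    per_location = per_location[:max_locations]
    per_location += [0] * (max_locations - len(per_location))
    return [total] + per_location
```

For instance, a total of "18" split as 14 in NY and 4 in NV gives 18, 14, 4 followed by eight zeros.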
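For the configurable delimiter, a sketch using Python's standard `csv` module is shown here. The function name `write_rows` is illustrative. One point worth noting: descriptions such as TRAVEL CASE 42" PANASONIC contain quote characters, and some fields may contain the delimiter itself, so letting the writer quote such fields keeps the file parseable.

```python
import csv


def write_rows(rows, out, delimiter=","):
    """Write rows (lists of values) to an open text file using the delimiter."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends.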

A database should not be necessary; a simple config file is fine.
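Such a config file might look like the sample below, read here with Python's `configparser`. The section name `crawler` and the key names are assumptions made for this sketch; only the settings themselves (user name, password, delimiter, delay in milliseconds) come from the requirements above.

```python
import configparser

# Example file contents; all values are placeholders.
SAMPLE_CONFIG = """
[crawler]
username = your-user-name
password = your-password
delimiter = ,
delay_ms = 500
"""


def load_settings(text):
    """Parse config text and return the crawler settings as a dict."""
    parser = configparser.ConfigParser(interpolation=None)  # allow % in passwords
    parser.read_string(text)
    section = parser["crawler"]
    # Allow a tab delimiter to be written as the two characters \t.
    delimiter = section.get("delimiter", fallback=",").replace("\\t", "\t")
    return {
        "username": section.get("username", fallback=""),
        "password": section.get("password", fallback=""),
        "delimiter": delimiter,
        "delay_ms": section.getint("delay_ms", fallback=0),
    }
```

Calling `load_settings(SAMPLE_CONFIG)` returns a dictionary whose values can be passed straight to the login, paging, and output steps.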

We need to get this project completed ASAP. We have several data crawlers that need to be created; the winner of this project can expect future work developing similar crawlers.

Skills: .NET, ASP, C Programming, PHP

About the employer:
(19 reviews) Brooklyn, United States

Project ID: #29093

1 freelancer has bid an average of $95 for this job

gogetter

I have developed site crawlers in the past. These crawlers are able to handle cookie-based sessions, JavaScript URLs, and HTTP/HTML redirects. I can use my existing codebase to complete this project. This project can be impl…

$95 USD in 5 days
(2 reviews)
4.2