Closed

Developing a file scraper to scrape firmware files and their info from various vendors' websites

A Python-based CLI script that downloads every product's firmware (including all versions) from the web pages of a predefined list of vendors and stores the associated information (metadata) in SQLite. The mandatory metadata fields are Manufacturer, Model, Version, Type, Name, Release Date (if available), Download link, and the calculated SHA-2 hash of the file, e.g. ( Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, [login to view URL], 21/08/2015, "link" ). There is also an optional boolean field that indicates whether the device is discontinued, filled in only when the vendor states this on its website. The firmware files themselves will be stored in the file system and referenced by index ID in SQLite.
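As a rough illustration of the metadata store described above, the fields could map onto a single SQLite table. This is only a sketch: the table name `firmware`, the column names, the ISO date format, and the choice of SHA-256 as the SHA-2 variant are all assumptions, not part of the posting.

```python
import hashlib
import sqlite3

# Hypothetical table and column names; the posting lists only the fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firmware (
    id            INTEGER PRIMARY KEY,  -- index ID the stored file is referenced by
    manufacturer  TEXT NOT NULL,
    model         TEXT NOT NULL,
    version       TEXT NOT NULL,
    type          TEXT NOT NULL,
    name          TEXT NOT NULL,
    release_date  TEXT,                 -- optional; ISO 8601 text is an assumption
    download_link TEXT NOT NULL,
    sha256        TEXT NOT NULL,        -- SHA-256 assumed as the SHA-2 variant
    discontinued  INTEGER               -- optional: NULL = not stated, else 0 or 1
);
"""


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large firmware images are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_firmware(conn, record):
    """Insert one metadata row (a dict keyed by column name) and return its index ID."""
    cursor = conn.execute(
        """
        INSERT INTO firmware (manufacturer, model, version, type, name,
                              release_date, download_link, sha256, discontinued)
        VALUES (:manufacturer, :model, :version, :type, :name,
                :release_date, :download_link, :sha256, :discontinued)
        """,
        record,
    )
    conn.commit()
    return cursor.lastrowid
```

The ID returned by `insert_firmware` is what the stored file on disk would be referenced by; leaving `discontinued` as NULL keeps "vendor did not say" distinct from "not discontinued".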

The arguments to the script should be either a comma-separated list of vendor names or the location of a text file containing the vendor names.
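One possible way to accept either form of argument is sketched below with the standard `argparse` module. The positional argument name `vendors`, the helper `parse_vendors`, and the one-name-per-line (or comma-separated) file layout are assumptions; the posting does not specify them.

```python
import argparse
from pathlib import Path


def parse_vendors(value):
    """Return vendor names from a comma-separated string or from a text file path.

    If `value` names an existing file, its contents are read; names in the
    file may be separated by commas or newlines (an assumed layout).
    """
    path = Path(value)
    if path.is_file():
        raw = path.read_text(encoding="utf-8").replace("\n", ",")
    else:
        raw = value
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Firmware scraper (sketch)")
    parser.add_argument(
        "vendors",
        type=parse_vendors,
        help="comma-separated vendor names, or path to a text file listing them",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.vendors)
```

Saved under a hypothetical name such as `scraper.py`, this would run as a one-line command, for example `python scraper.py "Cisco,ExampleVendor"` or `python scraper.py vendors.txt`.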

The server where the script will run has no GUI components, so the script must run the browser in headless mode.

Solution Scope

1. A script will be written per vendor. This is required because each vendor website has its own implementation of the firmware download page. However, effort will be made to identify and implement reusable components, if any.

2. The script will only download new firmware that has been added by the vendor. Hence the first execution of the script will download all the firmware available, but subsequent runs will only download firmware added since then. This will be achieved by analysing the data available in SQLite and skipping the files that have already been downloaded and processed.

3. Each vendor provided will be analysed manually to identify the following, which is required to develop the script:

a. URL for the firmware download page

b. Credential Requirements (Simple Signups, Specific Signups, No Signups)

c. Any CAPTCHA on the page

d. Any honeypot traps

4. If credentials are required to download the firmware and only a simple sign-up is needed, the sign-up will be done manually as part of the manual analysis, using a Gmail account dedicated to this work.

5. The script will try to imitate human-like behaviour (within limits) while scraping the web page and will also use Tor, so that any scraper/crawler detection logic implemented on the vendor site can be evaded. This will be achieved by adding random delays and random view times, and by avoiding honeypot traps identified through manual analysis.
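Points 2 and 5 above could be combined roughly as follows: filter out links already recorded in SQLite, then pause for a random interval before each remaining download. This is a minimal sketch, assuming a hypothetical `firmware` table with a `download_link` column and a caller-supplied `download` function; the 2 to 8 second bounds are illustrative, and Tor routing and honeypot handling are not covered.

```python
import random
import time


def filter_new_links(conn, candidate_links):
    """Return only links that have no row yet in the (hypothetical) firmware table."""
    new_links = []
    for link in candidate_links:
        row = conn.execute(
            "SELECT 1 FROM firmware WHERE download_link = ? LIMIT 1", (link,)
        ).fetchone()
        if row is None:
            new_links.append(link)
    return new_links


def human_pause(min_seconds=2.0, max_seconds=8.0, sleep=time.sleep):
    """Sleep for a random duration within the given bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def download_new(conn, candidate_links, download, sleep=time.sleep):
    """Call download(link) for each unseen link, pausing before each request."""
    downloaded = []
    for link in filter_new_links(conn, candidate_links):
        human_pause(sleep=sleep)
        download(link)
        downloaded.append(link)
    return downloaded
```

Here `conn` is an open `sqlite3` connection. Passing `sleep` as a parameter lets the delay be stubbed out when testing, while production runs use the real `time.sleep`.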

Solution Brief

A solution based on Python, Selenium and SQLite will be developed, with the following features/components:

1. File Management Module: Responsible for storing and managing the downloaded files and metadata. Firmware and installer files will be stored on the filesystem in a structured folder hierarchy. Metadata for the files will be stored in SQLite and will refer to the stored files through their paths on the file system and the file index/name.

2. Vendor Scrapers: A Python Selenium based scraper will be written for each vendor, responsible for downloading the files and grabbing the metadata from the vendor's site. Each scraper will use the file management module to store the files and write the metadata to SQLite.

3. Configuration File: All the configuration for the framework (including vendor-specific settings such as credentials, URLs, etc.) will be stored in a JSON file which can be easily modified manually.

4. Execution Script: The configuration file can be set up to define the polling interval for each vendor scraper. When the execution script is run, it will schedule each of the vendor scripts individually according to the polling interval defined in the config.
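The configuration file and execution script in points 3 and 4 might fit together as sketched below. The JSON layout, the key `polling_interval_seconds`, the vendor name `ExampleVendor`, and the example URLs are all hypothetical; the posting only states that credentials, URLs and polling intervals live in a JSON file.

```python
import json
import time

# Hypothetical configuration layout, shown inline for illustration.
EXAMPLE_CONFIG = """
{
  "vendors": {
    "Cisco": {
      "url": "https://example.com/firmware",
      "credentials": {"username": "", "password": ""},
      "polling_interval_seconds": 86400
    },
    "ExampleVendor": {
      "url": "https://example.org/downloads",
      "polling_interval_seconds": 3600
    }
  }
}
"""


def load_intervals(config_text):
    """Map each vendor name to its polling interval in seconds."""
    config = json.loads(config_text)
    return {
        name: float(vendor["polling_interval_seconds"])
        for name, vendor in config["vendors"].items()
    }


def due_vendors(intervals, last_run, now):
    """Return vendors never run before or whose interval has elapsed, sorted by name."""
    due = []
    for vendor, interval in intervals.items():
        last = last_run.get(vendor)
        if last is None or now - last >= interval:
            due.append(vendor)
    return sorted(due)


def run_due(intervals, last_run, scrapers, now=None):
    """Run the scraper callable of every due vendor and record the run time."""
    now = time.time() if now is None else now
    ran = due_vendors(intervals, last_run, now)
    for vendor in ran:
        scrapers[vendor]()
        last_run[vendor] = now
    return ran
```

An execution script could call `run_due` in a loop with a short sleep between iterations, or be triggered by cron; either way each vendor is polled at its own interval.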

Deliverables:

1) Python source code, including comments that explain each function and its details. It must be possible to pass any required input as an argument and run the script as a one-line command in the Linux terminal.

2) Dependencies

3) Manual to install, configure and use the scraper

Skills: Python, Selenium, Web Scraping, SQLite, Linux

About the employer:
(2 reviews) Brussels, Belgium

Project ID: #23126816

24 freelancers are bidding an average of €15/hour for this job

phpXpertbd

Dear Sir, I'm very much delighted to let you know that I've been doing data scraping with PHP-cURL, PhantomJS, Node.js, Selenium from many sites. I just scraped the data from web site and then wrote the data in mysql ...

€15 EUR / hour
(68 reviews)
7.3
zekovicm

Hi there, I am Web Scraping expert from Bosnia & Herzegovina, Europe. I have carefully gone through with your requirements and I would like to help you with this project! I can start immediately and finish it within the ...

€20 EUR / hour
(117 reviews)
7.4
tangramua

Dear Sir, Our team has a huge experience in Python, Linux, Web Scraping, SQLite, Selenium as a result we can successfully complete this project. Having the required skills, we will be glad to help you. We have 20 y ...

€15 EUR / hour
(54 reviews)
6.7
adeelpirzada

Hi, I'll try to keep this short i have done scrapping almost on Half of Worldwide web including eCommerce giants (Amazon, eBay, craigslist) News Feed, Social media websites, API's. I develop my own scrapers and ...

€12 EUR / hour
(36 reviews)
6.5
schoudhary1553

Hello, I can help you with your project - Developing a file scraper to scrape firmware files and their info from various vendors websites I have gone through your job posting and become very much interested to work ...

€18 EUR / hour
(43 reviews)
6.3
abhilashtv

Hi, ➲ 10+ years of full-time experience in Python / Django with 50,000+ Upwork hours billed and 50+ successful Python projects ➲ Upwork Top 10 Certification for Python and Django ➲ Guaranteed Results Policy: Pay only i ...

€12 EUR / hour
(14 reviews)
5.6
Seeniea

Hell Sir. I'm a scraping exert, I have read your post detail I can make python scraper script and deploy on Linux server. I have confidence. Please give me this job thanks

€20 EUR / hour
(6 reviews)
5.4
oswaldoodavidal

Hello, I have experience in web scraping with Python. I can use Selenium, Scrapy, BeautifulSoup and Requests to make the best web scrapers! I'm confident that I can help you with your project, I have understood it co ...

€15 EUR / hour
(4 reviews)
4.1
wujin92

hi! I have read your project detail. The scrap data and storing them in database is not so much difficult. It would be bulk of code since you want scripts for individual vendor's site, but there would be not that diffi ...

€15 EUR / hour
(9 reviews)
4.1
nikhil929

Greetings, we have relevant experience in scraping website using python and scrapy and getting the output in the desired format as well Please share us the target website and the key fields to scrape fro ...

€15 EUR / hour
(6 reviews)
3.6
mobileappdevin

Hi, After reading over your application this looks like a perfect fit for my skill sets. I have more than 6+ years of experience in Python/Django/Flask & MEAN developers working on node.js & angularjs since 2009. here ...

€13 EUR / hour
(2 reviews)
3.9
devstart1234

Hi, We work in a team and providing services from last 5 years. We have checked your requirement and interested to work with you. It would be great if we could discuss project over chat to discuss it in detail. We ar ...

€12 EUR / hour
(5 reviews)
3.6
greenguru2018

Hello, How are you? My pleasure to bid your project. I've read carefully your project description. I have more than five years experience in development related with your project. Your satisfaction with the project is ...

€15 EUR / hour
(2 reviews)
2.9
JinTai

Hello I have just read your proposal carefully. I see you need some help for scrapping firmware information. I can help you with that and start right now. I have good expertise in building python script which can scrap ...

€15 EUR / hour
(1 review)
1.4
inhe121

Dear sir! ⭐I read your project details carefully and I thought that I am the best fit developer for your project. ⭐I have rich experience with such a project, so I have a clear way to complete the project. ⭐Your projec ...

€15 EUR / hour
(1 review)
0.8
dataspro

Hello!! We are DSPro, a software development agency specialized in providing services and products to companies, through cutting edge technologies such as Cognitive Computing, Backend systems, Data Pipelines, Machine ...

€14 EUR / hour
(0 reviews)
0.0
xeeshanziab

I may look inexperience by my profile but i really have some serious experience in field of automation. I have been a full time we Automation software engineer for a UK based company where we made a system to automate ...

€15 EUR / hour
(0 reviews)
0.0
matttai90

Highly qualified Data Engineer and Electrical Engineer based in the United States. Starting work in a few months and doing this to pass time and personally improve myself.

€17 EUR / hour
(0 reviews)
0.0
raviraj55055

Yes, i can do and my experience is 5+years.i have require more [login to view URL] with me. i am waiting for your message.

€13 EUR / hour
(0 reviews)
0.0
gheorghebalan

I have already build similar system to this one. Working experience with crawlers, python, Tor and SQLLite database storage.

€16 EUR / hour
(0 reviews)
0.0