Hello,
As usual, I use Python, Scrapy and Django for jobs like this.
I have written data scraping scripts that include file downloading (images, PDFs, etc.) and database storage through the Django ORM. I can also write file generator scripts that pull data from the database and save it in the required format (for example CSV, JSON or XLSX). I can show you some examples if needed.
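A minimal sketch of such a file generator, assuming a plain SQLite database stands in for the Django ORM and using a hypothetical `items` table; XLSX output would need an extra library (for example openpyxl) and is left out here:

```python
import csv
import json
import sqlite3
from pathlib import Path


def export_rows(conn: sqlite3.Connection, query: str, out_dir, basename: str):
    """Run `query` and write the result set to <basename>.csv and <basename>.json."""
    cur = conn.execute(query)
    # Column names come from the cursor, so the same function works for any query.
    columns = [col[0] for col in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / f"{basename}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    json_path = out_dir / f"{basename}.json"
    with json_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        json.dump(rows, fh, ensure_ascii=False, indent=2)

    return csv_path, json_path
```

With Django, the query step would typically be replaced by a `QuerySet.values(...)` call, which already yields one dict per row.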
Here are some of the sites I have scraped data from:
- [login to view URL]
- [login to view URL] (3 different jobs)
- [login to view URL]
- [login to view URL]
...and about 10 more. For some of them, I can show you the scripts or samples of the data.
Some sites are better protected (for example, AJAX pagination with a dynamic token). In those cases I write Chrome extension scripts using only jQuery, because it has all the functionality needed.
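For comparison, when the token for the next page comes back in each AJAX response, the pagination loop itself can be sketched in Python. This is not the Chrome extension approach described above; the response fields `items` and `next_token` are hypothetical, and `fetch_page` is an assumed callable that performs the actual request:

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator:
    """Yield items from every page, passing each response's token into the next request."""
    token: Optional[str] = None
    seen = set()
    while True:
        page = fetch_page(token)  # first call is made with token=None
        yield from page.get("items", [])
        token = page.get("next_token")
        # Stop on the last page, or if the site hands back a token we already used.
        if not token or token in seen:
            break
        seen.add(token)
```

Keeping the request in `fetch_page` means the loop can be tested with fake responses and reused with any HTTP client.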
In some cases, I ask the client to provide a list of proxy servers and credentials for a captcha-solving API, because site security varies. Whether or not it is hard, writing a captcha recognition script takes a lot of time, so it is better to pay for a captcha-solving API. The same goes for proxy servers: it is better to buy a proxy list than to use free lists, because about 95% of free proxies are useless.
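A minimal sketch of rotating through a purchased proxy list with only the standard library, assuming proxies are given as `http://host:port` strings (the addresses in any example are placeholders). In a Scrapy project the same idea would normally live in a downloader middleware that sets `request.meta["proxy"]`:

```python
import urllib.request


class ProxyPool:
    """Round-robin over a proxy list, skipping proxies that have been marked bad."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._proxies = list(proxies)
        self._bad = set()
        self._next = 0

    def get(self) -> str:
        """Return the next proxy that has not been marked bad."""
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._next]
            self._next = (self._next + 1) % len(self._proxies)
            if proxy not in self._bad:
                return proxy
        raise RuntimeError("all proxies are marked bad")

    def mark_bad(self, proxy: str) -> None:
        self._bad.add(proxy)

    def opener(self):
        """Build a urllib opener that routes HTTP and HTTPS through the next proxy."""
        proxy = self.get()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)


def fetch(pool: ProxyPool, url: str, attempts: int = 3, timeout: float = 10.0) -> bytes:
    """Try up to `attempts` proxies, dropping each one that fails."""
    last_error = None
    for _ in range(attempts):
        proxy, opener = pool.opener()
        try:
            with opener.open(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            pool.mark_bad(proxy)
            last_error = exc
    raise last_error
```

Marking failed proxies instead of retrying them blindly matters most with free lists, where most entries are dead.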
Regards,
Ivan