In progress

Scraping of Topsy, Google, Zacks (only today's EPS/SALES)

Dear Sir or Madam,

I would like to be able to scrape certain values from the web page [url removed, login to view] on demand from the Windows command prompt. The scraped values should then be stored in a CSV file.

I should be able to pass different input parameters to the script so that I can control the scraping.
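A minimal sketch of what such a command-line interface could look like. It is written in Python purely for illustration (the posting lists Perl as the skill, and the same structure carries over). All option names (`--site`, `--params`, `--output`) and default file names are hypothetical and not part of the request.

```python
import argparse

def build_parser():
    """Command-line options for the scraper; every option name is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Scrape values from a page and store them in a CSV file")
    parser.add_argument("--site", choices=["topsy", "google", "zacks"],
                        required=True,
                        help="which scraping job (A, B or C) to run")
    parser.add_argument("--params", default="params.txt",
                        help="input txt file with the scraping parameters")
    parser.add_argument("--output", default="results.csv",
                        help="CSV file the results are appended to")
    parser.add_argument("keywords", nargs="*", help="search keyword(s)")
    return parser

def parse_cli(argv):
    """Parse an explicit argument list, which keeps the function testable."""
    return build_parser().parse_args(argv)
```

From the Windows command prompt this could be called as `python scrape.py --site topsy --params topsy.txt --output topsy.csv test`, where `scrape.py` is an assumed script name that passes `sys.argv[1:]` to `parse_cli`.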

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A.) TOPSY SCRAPING

1. Scraping mode (an input txt file is probably the best way to pass the parameters)

The different input parameters are (see [url removed, login to view] for the attributes on the page):

- Past 1 Hour

- Past 1 Day

- Past 30 Days

- All Time

(Search)

- Everything

- Links

- Tweets

- Experts

(Network)

- Google Plus

- Twitter

(Language)

- All Languages

- Different languages
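The input txt file could be as simple as one `key = value` pair per line. The sketch below, in Python for illustration, parses such a file. The file layout and the key names (`keyword`, `time`, `search`, `network`, `language`) are assumptions; only the values come from the list above.

```python
def parse_params(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        params[key.strip().lower()] = value.strip()
    return params

# Hypothetical parameter file for the Topsy scraping.
EXAMPLE = """\
# one parameter per line
keyword  = test
time     = Past 1 Day
search   = Tweets
network  = Twitter
language = All Languages
"""
```

Calling `parse_params(EXAMPLE)` gives a dictionary with one entry per parameter, which the scraper can then translate into the request it sends.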

The attributes are passed as parameters in the URL of [url removed, login to view]

e.g. [url removed, login to view] corresponds to the attribute "Past 7 Days"
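Since the attributes end up in the URL, the script mainly needs a mapping from the attribute labels to query parameters. The following Python sketch shows the idea. The parameter names and values (`window`, `type`, `h`, `d`, `w`) and the base URL are hypothetical placeholders, because the real URLs are not visible in this posting; they must be read from the site itself.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the attribute labels on the page to query
# parameters; the real names and values must be taken from the site's URLs.
ATTRIBUTE_PARAMS = {
    "Past 1 Hour": {"window": "h"},
    "Past 1 Day": {"window": "d"},
    "Past 7 Days": {"window": "w"},
    "Tweets": {"type": "tweet"},
    "Links": {"type": "link"},
}

def build_url(base_url, keyword, labels):
    """Return base_url plus a query string for the keyword and chosen labels."""
    query = {"q": keyword}
    for label in labels:
        query.update(ATTRIBUTE_PARAMS[label])
    return f"{base_url}?{urlencode(query)}"
```

For example, `build_url("https://example.invalid/s", "test", ["Past 7 Days", "Tweets"])` builds one request URL for the keyword "test".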

The attributes to be scraped are:

- Number of hits

- 10 result details

These attributes are written to a CSV file together with the current date.

If the file already contains entries, the results are appended.
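The append behaviour can be sketched as follows, in Python for illustration. The column layout (`date`, `keyword`, `hits`, then ten `result_N` columns) is an assumption; the request only fixes that the number of hits, the 10 result details and the current date are stored and that existing entries are kept.

```python
import csv
import os
from datetime import date

def append_result(path, keyword, hits, details, today=None):
    """Append one row: date, keyword, hit count, then up to 10 result details."""
    today = today or date.today()
    # Write the header only when the file is new or empty.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            header = ["date", "keyword", "hits"]
            header += [f"result_{i}" for i in range(1, 11)]
            writer.writerow(header)
        row = [today.isoformat(), keyword, hits] + list(details[:10])
        row += [""] * (13 - len(row))  # pad so every row has 13 columns
        writer.writerow(row)
```

Opening the file in mode `"a"` is what keeps earlier entries intact; each run only adds rows at the end.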

2. Scraping mode

@Input (an input txt file is probably the best way to pass the parameters)

- It should be possible to use all the different attributes for scraping on this page

[url removed, login to view]

in these different ways:

scrapingPart

- 24 hour

- 12 hour

- 6 hour

- 2 hour

- 1 hour

Time Period: how far back in time the scraping should go, and up to which date

@Result

Retrieve these output results

The attribute to be scraped is:

- Number of hits

A scraping is done for every scrapingPart (24 hours, 12 hours, ...), and the @Result is saved as a record in the CSV file.

For example:

Selected Time Period: [url removed, login to view] (start) - [url removed, login to view]

If this period covers 365 days, there are 365 scrapings (one per day) with the specified keyword(s).

The output is written to a CSV file.
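The splitting of a Time Period into scrapingPart windows can be sketched like this, in Python for illustration. The function name and the example dates in the note below are assumptions (the dates in the posting are not visible); each window would produce one scraping and one CSV record.

```python
from datetime import datetime, timedelta

def scraping_windows(start, end, part_hours):
    """Yield (window_start, window_end) pairs of part_hours each over [start, end)."""
    if part_hours <= 0:
        raise ValueError("part_hours must be positive")
    step = timedelta(hours=part_hours)
    current = start
    while current < end:
        # The last window is cut short if the period is not a whole multiple.
        yield current, min(current + step, end)
        current += step
```

With an assumed period from `datetime(2011, 1, 1)` to `datetime(2012, 1, 1)`, which spans 365 days, a 24 hour scrapingPart gives 365 windows and a 12 hour scrapingPart gives 730.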

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

B.) Scraping of [url removed, login to view] / [url removed, login to view]

We would like to have the same scraping for [url removed, login to view]

or [url removed, login to view]

where the user can specify the last hour or the last 24 hours

and use keywords and scraping input like this

test site:.de

for scraping on a specific domain.
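A small Python sketch of how the keyword input and the time restriction could be combined into query parameters. It assumes the removed URLs in this section are Google search pages, as the title suggests. The `tbs` values `qdr:h` and `qdr:d` are commonly seen in Google result URLs for "past hour" and "past 24 hours", but they are not an officially documented interface, so treat them as assumptions to verify.

```python
from urllib.parse import urlencode

# Assumed values of the "tbs" URL parameter; verify against live result URLs.
TIME_FILTERS = {"last hour": "qdr:h", "last 24 hours": "qdr:d"}

def google_query_params(keywords, site=None, period=None):
    """Build the query parameters for keywords, an optional domain and period."""
    terms = list(keywords)
    if site:
        terms.append(f"site:{site}")  # e.g. site:.de restricts the domain
    params = {"q": " ".join(terms)}
    if period is not None:
        params["tbs"] = TIME_FILTERS[period]
    return params
```

`urlencode(google_query_params(["test"], site=".de", period="last hour"))` then yields the query string to append to the search URL.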

The attributes to be scraped are:

- Number of hits

- 10 result details

These attributes are written to a CSV file together with the current date.

If the file already contains entries, the results are appended.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

C.) Scraping Zacks Earnings

The page to be scraped is [url removed, login to view]

The tables

TODAY'S EPS SURPRISES

- Positive Surprises

- Negative Surprises

should be scraped with their column values and merged into one CSV file, with the current date as an additional column.

TODAY'S SALES SURPRISES

- Positive Surprises

- Negative Surprises

should be scraped with their column values and merged into one CSV file, with the current date as an additional column.

This gives two different output CSV files: [url removed, login to view] and SALES_suprises.csv.
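Merging the positive and negative tables into one set of rows with a date column can be sketched as below, in Python for illustration. It assumes each scraped table is already a list of dictionaries keyed by column name; the extra column names `Surprise` and `Date` are hypothetical.

```python
from datetime import date

def merge_surprises(positive, negative, today=None):
    """Merge both tables, tagging each row with its table and the current date."""
    stamp = (today or date.today()).isoformat()
    merged = []
    for kind, table in (("Positive", positive), ("Negative", negative)):
        for row in table:
            # Copy the scraped columns and add the two extra columns.
            merged.append({**row, "Surprise": kind, "Date": stamp})
    return merged
```

The same function would be called once for the EPS tables and once for the SALES tables, giving the rows for the two output files.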

It must be possible to run this script from the command line. Every result will be appended. If the same record (same Company & Time) already exists, no action should be performed.
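The "append unless the record already exists" rule can be sketched as follows, in Python for illustration. The key columns `Company` and `Time` come from the requirement above; the function name and the assumption that rows are dictionaries whose keys all appear in `fieldnames` are mine.

```python
import csv
import os

def append_new_records(path, rows, fieldnames, key_fields=("Company", "Time")):
    """Append rows whose key is not yet in the file; return how many were written."""
    seen = set()
    has_data = os.path.exists(path) and os.path.getsize(path) > 0
    if has_data:
        # Collect the keys of all records that are already stored.
        with open(path, newline="", encoding="utf-8") as fh:
            for old in csv.DictReader(fh):
                seen.add(tuple(old[k] for k in key_fields))
    written = 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if not has_data:
            writer.writeheader()
        for row in rows:
            key = tuple(str(row[k]) for k in key_fields)
            if key in seen:
                continue  # same Company & Time: no action
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written
```

Running the script twice on the same day therefore leaves the file unchanged the second time. Note that the key follows the requirement literally; if the Time value does not include the date, the date column may need to be added to `key_fields`.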

I should be able to run this script to create new records every day.

I will be available on Skype every day for project support.

The code will belong to the project requester and may not be distributed to third parties.

I am looking forward to fast, high-quality coding.

Regards,

Thomas

Skills: Perl, Script Install, Software Architecture, SQL

About the employer:
(0 reviews) Stockholm, Switzerland

Project ID: #1635439