The program must be written in Perl. It scrapes historic financial earnings announcement data from the website [login to view URL] and outputs the data into a series of comma-delimited text files.
Arguments can be supplied on the command line:
-c country.
This specifies the country and is “GB” by default, but could also be US, DE, JP, AU or BR. It sets the country code used within the search, e.g. [login to view URL]
-start
Start date for data to be scraped. Default is 20120101.
-end
End date for data to be scraped. Default is today's date.
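The three arguments above could be parsed roughly as follows. This is only a sketch: the function name parse_args and the YYYYMMDD validation are assumptions, not part of the requirements. It relies on the core Getopt::Long module, which accepts long option names with a single dash by default.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use POSIX qw(strftime);

# Parse -c, -start and -end from a list of arguments, applying the
# defaults given in the description. parse_args is a hypothetical name.
sub parse_args {
    my @argv = @_;
    my %opt = (
        c     => 'GB',
        start => '20120101',
        end   => strftime('%Y%m%d', localtime),    # today's date
    );
    GetOptionsFromArray(\@argv, \%opt, 'c=s', 'start=s', 'end=s')
        or die "Usage: $0 [-c COUNTRY] [-start YYYYMMDD] [-end YYYYMMDD]\n";

    # Assumption: dates are always supplied as eight digits (YYYYMMDD).
    for my $key (qw(start end)) {
        die "-$key must be in YYYYMMDD format\n"
            unless $opt{$key} =~ /\A\d{8}\z/;
    }
    $opt{c} = uc $opt{c};    # normalise the country code to upper case
    return \%opt;
}
```

In the script itself this would be called as parse_args(@ARGV), and the returned hash reference holds the keys c, start and end.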
The script then processes each date for the specified exchange, from the start date to the end date - e.g. starting with (by default):
[login to view URL]
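One possible way to generate the dates to visit is sketched below, using the core Time::Piece and Time::Seconds modules. The function name date_range is an assumption; how each date is turned into a page URL is left out because the URL format is not given here.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;

# Return every date from $start to $end inclusive as YYYYMMDD strings.
# Time::Piece->strptime yields UTC times, so adding ONE_DAY never skips
# or repeats a date because of daylight saving changes.
sub date_range {
    my ($start, $end) = @_;
    my $day  = Time::Piece->strptime($start, '%Y%m%d');
    my $last = Time::Piece->strptime($end,   '%Y%m%d');
    my @dates;
    while ($day <= $last) {
        push @dates, $day->strftime('%Y%m%d');
        $day = $day + ONE_DAY;
    }
    return @dates;
}
```

Weekends and holidays are not filtered out in this sketch; every calendar date in the range is returned.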
It then cycles through every date until the end date. For each date, the script loads the page and determines which companies released earnings on that date. E.g. on this page:
[login to view URL]
we can see two releases. The companies releasing earnings are stored according to their ticker rather than their name. On the above page, we thus have “TWL” and “SNCL” releasing earnings. The appropriate ticker can be found within the link for each company. The country code (e.g. “[login to view URL]”) is not necessary and should be removed.
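Stripping the country code from a company link might look like the sketch below. The actual link format is not shown here, so the example assumes, purely for illustration, that the symbol is the last component of the link and that the country code follows the ticker after a separator such as ":" or ".". The function name ticker_from_link and the sample links in the usage note are hypothetical.

```perl
use strict;
use warnings;

# Extract the bare ticker from a company link.
# Assumptions (hypothetical, adjust to the real site): the symbol is the
# last path or query component of the link, and the country code is
# appended to the ticker after one or more non-alphanumeric characters.
sub ticker_from_link {
    my ($href, $country) = @_;
    my ($symbol) = $href =~ m{([^/=?&]+)\z} or return;
    $symbol =~ s/[^A-Za-z0-9]+\Q$country\E\z//i;    # drop the country suffix
    return uc $symbol;
}
```

Under these assumptions, ticker_from_link('/quote/TWL:GB', 'GB') gives TWL, and a link with no country suffix is returned unchanged apart from upper-casing.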
Thus, the information to be stored for each date is
1) the ticker name, and
2) the associated release date.
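A simple way to hold these two pieces of information while scraping is a hash keyed by ticker, sketched below. The names record_release and dates_for are assumptions. Using the dates as hash keys means a release seen twice (for example on a re-run) is only stored once.

```perl
use strict;
use warnings;

# Store one release: $store is a hash reference of the form
# { TICKER => { YYYYMMDD => 1, ... }, ... }.
sub record_release {
    my ($store, $ticker, $date) = @_;
    $store->{$ticker}{$date} = 1;
    return;
}

# Return the release dates for one ticker in ascending order.
# YYYYMMDD strings sort chronologically under plain string comparison.
sub dates_for {
    my ($store, $ticker) = @_;
    my @dates = sort keys %{ $store->{$ticker} || {} };
    return @dates;
}
```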
Also, please note that when there are many earnings announcements on a single day, a few pages will need to be processed. For example, look at the page:
[login to view URL]
There are several more pages of releases on this date. These can be viewed manually by pressing the “next” button at the bottom of each page. All the earnings for every date are required.
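The multi-page case can be handled with a loop that keeps following the “next” link until there is none. The sketch below leaves the actual downloading and HTML parsing to a caller-supplied code reference, since the page layout is not described here. The function name collect_all_pages and the shape of the value returned by the callback are assumptions.

```perl
use strict;
use warnings;

# Collect the tickers from every page for one date.
# $fetch_page is a caller-supplied code reference (hypothetical interface):
# given a URL, it returns a hash reference
#   { tickers => [ ... ], next => $url_of_next_page_or_undef }.
sub collect_all_pages {
    my ($first_url, $fetch_page) = @_;
    my (@tickers, %seen);
    my $url = $first_url;

    # Stop when there is no "next" link, or when a link points back to a
    # page already visited (guards against an endless loop).
    while (defined $url && !$seen{$url}++) {
        my $page = $fetch_page->($url);
        push @tickers, @{ $page->{tickers} };
        $url = $page->{next};
    }
    return @tickers;
}
```

Keeping the fetch step separate also makes the loop easy to exercise with canned pages before pointing it at the live site.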
After processing all the required dates/pages, the data is written to a series of files. Each file represents a separate ticker and contains the earnings announcement dates for that ticker. They will be named “[login to view URL]” (e.g. for our example, it will be [login to view URL]). Each file is a comma-separated file with the following format. The first line is a header “Date, Time”, followed by one line for each earnings announcement. The time is always “0800” by default. E.g. for TWL, the first lines will be:
Date, Time
20120104, 0800
This will be followed by a line for each other date that this company released earnings on.
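The file format above could be produced along these lines. This is a sketch only: the function names are assumptions, and the output path is passed in by the caller rather than built here. Sorting and de-duplicating the dates is an added assumption that the description does not state.

```perl
use strict;
use warnings;

# Build the file contents for one ticker: a "Date, Time" header followed
# by one "YYYYMMDD, 0800" line per announcement date.
# Assumption: dates are de-duplicated and written in ascending order.
sub format_ticker_csv {
    my @dates = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @dates;
    @unique = sort @unique;

    my $out = "Date, Time\n";
    for my $date (@unique) {
        $out .= "$date, 0800\n";
    }
    return $out;
}

# Write the contents for one ticker to $path (chosen by the caller).
sub write_ticker_file {
    my ($path, @dates) = @_;
    open my $fh, '>', $path or die "Cannot open $path for writing: $!\n";
    print {$fh} format_ticker_csv(@dates);
    close $fh or die "Cannot close $path: $!\n";
    return;
}
```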
Hello, I'm an IT professional with decades of experience in different IT fields, including Perl scripts. I'm interested in your project, and have sent you some details in a private message. Regards, Solt