We are working on an exciting, data-driven project to better understand the weather patterns that lead to medium-term extreme weather events in the U.S.
We have access to a vast amount of digitized weather data (temperature, precipitation, wind speed, etc.) from thousands of individual weather stations around the world, covering the last 100 years and updated hourly.
At its core, we want to build a big data platform flexible and scalable enough to analyze (ultimately) petabytes of data across a wide and ever-growing range of weather variables, in order to predict the medium-term likelihood of extreme weather events, initially for our clients in the insurance / reinsurance industry.
To do this efficiently, we would like to use Apache Hadoop as the framework, running on Amazon EMR, the web service that runs on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
For the dashboard UI, we are considering Pentaho for visualization, possibly in conjunction with Jasper; alternatively, we may custom-build the UI and use ESRI ArcGIS for graphical representation.
For our statistical engine, which will power a self-learning predictive algorithm, we will likely use R (the R Project).
We therefore need to build a facility that downloads this data from our external sources into a sophisticated Hadoop / AWS data warehouse that we manage, structures it better, and continually updates the data set. We also need self-learning statistical algorithms for predictive analysis of future events.
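The predictive model is still to be designed, but a useful reference point for any "likelihood of extreme weather events" estimate is the naive empirical baseline: the fraction of past years in which a station's annual maximum exceeded a threshold. The sketch below assumes a hypothetical input of `(year, value)` pairs for one station and one variable (for example, daily peak wind speed); the function names are illustrative, and it is written in Python only for brevity, since the same logic is a few lines of R. It is not the self-learning algorithm described above, only a baseline such an algorithm would need to beat.

```python
from typing import Dict, Iterable, Tuple


def annual_maxima(observations: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Return the highest observed value per year from (year, value) pairs."""
    maxima: Dict[int, float] = {}
    for year, value in observations:
        # Keep only the largest value seen so far for each year.
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima


def empirical_exceedance_probability(
    observations: Iterable[Tuple[int, float]], threshold: float
) -> float:
    """Fraction of observed years whose annual maximum exceeds `threshold`.

    Hypothetical baseline only: it assumes each year present in the data is
    fully covered, and it ignores trends and seasonality.
    """
    maxima = annual_maxima(observations)
    if not maxima:
        raise ValueError("no observations supplied")
    exceeding_years = sum(1 for value in maxima.values() if value > threshold)
    return exceeding_years / len(maxima)
```

For observations `[(2000, 30.0), (2000, 41.0), (2001, 35.0), (2002, 44.5), (2003, 39.9)]` and a threshold of `40.0`, two of the four years exceed it, giving 0.5. A year with only partial coverage can understate its maximum, so station completeness should be checked before relying on such a figure.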
The project thus has three essential components:
1. Data analysis and Hadoop / AWS platform, DB construction & data import
2. Statistical analysis, self-learning algorithm construction
3. Weather event probability representation in GIS, Pentaho, Jasper, Palantir, etc. for internal and client use.
The current data set is available via FTP here:
ftp://[url removed, login to view]
Future daily updates are available through the API found here:
[url removed, login to view]
(we will provide our access token to the winning bidder)
Please see the readme files on the FTP site, as well as this link for weather station information:
[url removed, login to view]
and the attached document for a comprehensive list of available variables.
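A first step toward the download-and-update facility is an incremental sync that fetches only the files not already stored locally. This is a minimal sketch using Python's standard `ftplib`, assuming the FTP directory holds flat data files whose names never change once published; the host, credentials, and directory are not public, and `files_to_fetch` / `sync_ftp_directory` are hypothetical names. In a full pipeline, the downloaded files could then be copied to Amazon S3 for processing on EMR.

```python
import ftplib
import os
from typing import Iterable, List


def files_to_fetch(remote_names: Iterable[str], local_dir: str) -> List[str]:
    """Return the remote file names not yet present in local_dir, sorted."""
    existing = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return sorted(name for name in remote_names if name not in existing)


def sync_ftp_directory(ftp: ftplib.FTP, remote_dir: str, local_dir: str) -> List[str]:
    """Download files from remote_dir that are missing locally.

    Assumes a flat directory of data files (hypothetical layout).
    Returns the names of the files downloaded on this run.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    new_files = files_to_fetch(ftp.nlst(), local_dir)
    for name in new_files:
        final_path = os.path.join(local_dir, name)
        temp_path = final_path + ".part"
        # Write to a temporary file first so a failed transfer never
        # leaves a truncated file under the final name.
        with open(temp_path, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        os.replace(temp_path, final_path)
    return new_files
```

Typical use would be `with ftplib.FTP(host) as ftp: ftp.login(user, password)` followed by `sync_ftp_directory(ftp, remote_dir, "raw")`, where all three arguments are placeholders. Writing to a `.part` file and renaming it means an interrupted transfer is retried on the next run instead of being mistaken for a complete file.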
We are looking to work over the long term with a web application and big data design and development team that has substantial experience and expertise in statistical mathematics. The team will initially assist with our data extraction, storage and export needs, and then further develop a more fully featured online weather data prediction application.
The project may be bid on in its entirety, or for one or more of the three components, and there may be more than one winning bidder. It is unlikely that one team can manage all three components. Bidding teams must be able to manage the component(s) they bid on end to end. No subcontracting is allowed. The requirements for each component are as follows:
1. DATA ANALYSIS: Extensive, demonstrable Hadoop / AWS / Pentaho experience, with verifiable client references for completed enterprise-grade big data projects.
2. STATISTICAL ANALYSIS: Extensive experience building robust statistical predictive algorithms using a range of sophisticated, contemporary statistical models, tied into big data applications. Your team must have verifiable experience building and testing algorithms.
3. UI: Experience building complex dashboards with Pentaho, Jasper, etc., and rendering data in GIS format using providers such as ESRI.
After carefully reviewing this project summary and the attached documents, and sampling the initial data set, please submit a bid for the project component(s) described above, accompanied by a detailed timeline, if you and your team meet our conditions and have the capacity to take on substantial follow-on work.
5 freelancers are bidding an average of $54/hour for this job
I am keen to discuss this project further. I have more than 12 years of experience in data warehousing and look forward to connecting and discussing it further over IM.