We want to find a partner to help us get started building a big data / data warehousing system in Hadoop and Hive (or a suggested alternative) to run alongside our operational system. This new big data store will provide API, reporting and data warehousing functions, the latter to drive a tool such as Tableau. Over time, the store will be extended to receive both streamed and historical batch data and to generate metrics from MapReduce jobs.
We are a provider of real-time vehicle tracking solutions, collecting information about vehicle positions and their adherence to scheduled journeys.
We are new to the Hadoop world and want to get up to speed ourselves during this development. We therefore want a full end-to-end development, including initial system setup, a data loading/streaming method and a small set of data output methods (RESTful API calls, reports and an initial data warehouse structure).
We envisage the following tasks:
1. Deploy Hadoop/Hive (or a suggested alternative) on a virtual Debian server to which we will provide access. This should be done so that the system can easily be expanded with new nodes, and we would want to see data distributed across more than one system/node.
2. Develop a data load process to pull information from our transactional system and load it into Hadoop/Hive. The initial data set will be one block of data per day containing a couple of metrics but quite a few descriptive fields, such as vehicle code, location_code, timestamp and customer code. If this were a traditional star schema, there would be about … dimensions. The data has both a time-based and a geospatial aspect. We can provide it in CSV format or via an API call.
3. Develop some MapReduce functions to generate useful metrics and aggregations, to be agreed with us.
4. Make some of the metrics available through an API, using existing Hadoop/Hive/RESTful technologies.
5. It would be nice for us to access the data store from PHP, perhaps using a Hive ODBC driver. We are not sure whether this is possible, but it would be good to try.
6. Organize the data so that an OLAP tool can be used against it, for example using Pentaho or Tableau to generate queries that pivot and drill down. It is especially important to be able to show aggregate data for, say, a year and drill down into month, day, etc. It would also be good to be able to show geographical data.
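As a rough illustration of how tasks 3 and 6 fit together, the sketch below simulates a Hadoop Streaming-style map, shuffle and reduce in plain Python over a daily CSV extract, rolling a per-vehicle metric up by year, month and day so it can be drilled into. The column layout and the metric name `delay_seconds` are hypothetical assumptions for illustration only, not our real schema, and the sample rows are invented test data.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical daily CSV extract; column names and values are illustrative only.
SAMPLE_CSV = """vehicle_code,location_code,timestamp,customer_code,delay_seconds
V1,L10,2014-03-01T08:00:00,C1,30
V1,L11,2014-03-01T09:00:00,C1,90
V2,L10,2014-03-02T08:30:00,C2,0
V1,L12,2014-04-05T07:45:00,C1,60
"""


def mapper(row):
    """Emit one (key, value) pair per time-hierarchy level for a record."""
    ts = datetime.fromisoformat(row["timestamp"])
    delay = int(row["delay_seconds"])  # hypothetical metric column
    vehicle = row["vehicle_code"]
    # Keys for year, month and day levels so an OLAP tool can drill down.
    yield ("year", vehicle, f"{ts.year:04d}"), delay
    yield ("month", vehicle, f"{ts.year:04d}-{ts.month:02d}"), delay
    yield ("day", vehicle, ts.date().isoformat()), delay


def reducer(key, values):
    """Aggregate all values for one key into count, total and mean."""
    values = list(values)
    total = sum(values)
    return key, {"count": len(values), "total": total, "mean": total / len(values)}


def run_job(csv_text):
    # Shuffle phase: group mapper output by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key, value in mapper(row):
            grouped[key].append(value)
    # Reduce phase, with keys processed in sorted order as Hadoop would deliver them.
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for key, stats in run_job(SAMPLE_CSV).items():
        print(key, stats)
```

In a real Hadoop Streaming job the mapper and reducer would run as separate scripts reading stdin and writing tab-separated lines; in Hive, the same rollup could instead be expressed as a `GROUP BY` over year, month and day columns.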
We are interested in working with someone who can recommend the paths to take to make this system expandable, fast and easily accessible, and who can help us make the best choices. For example, help in deciding which database to use would be welcome.
Please bid only if you have previous experience of this kind of work. If you are interested in bidding for this work, we would like to hear about your experience in similar projects and your views on whether this is a sensible approach.
40 freelancers have placed bids averaging £2693 for this job.
Hi, Mark here. I would be interested in discussing this project with you. Thanks for the consideration; I hope to hear from you soon. Thanks, Mark
Hi, I am a senior developer with the skills you need for this job. I can partner with you on this project. I hope we can talk further soon. Best regards, Norberto