Hadoop MapReduce project

Closed. Posted 2 years ago. Payment on delivery.

Phase 2: Implement MapReduce (MR) programs that solve unstructured-data problems on the HDFS setup. In this phase you will implement the word co-occurrence MR algorithm discussed in Lin and Dyer's book (Data-Intensive Text Processing with MapReduce). Select a data set of publications from any subject area you are familiar with, and prepare co-occurrence (co-author) information from those publications. The stripes method for co-occurrence may be better suited to this application. The mapper must parse each publication and discard the extra text. Only the first author is needed as the key; the value holds the remaining authors together with the number of times each one co-occurs with the first author in the given corpus.
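A minimal sketch of a stripes-style mapper for Hadoop Streaming, assuming (hypothetically) that each input line is one publication record with tab-separated fields and a semicolon-separated author list in the second field; the real parsing depends on the chosen data set. The mapper drops every other field and emits the first author as the key with a stripe (a JSON-encoded dict of co-author counts) as the value. The names `parse_authors` and `map_record` are illustrative, not part of any library.

```python
import json
import sys
from collections import Counter


def parse_authors(line):
    """Extract the author list from one publication record.

    Hypothetical format: tab-separated fields, authors in the second field,
    separated by semicolons. Title, venue, abstract, etc. are dropped.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return []  # malformed record: nothing to emit
    return [a.strip() for a in fields[1].split(";") if a.strip()]


def map_record(line):
    """Return (first_author, stripe) for one record, or None if no co-authors.

    The stripe maps each co-author to its count within this single record.
    """
    authors = parse_authors(line)
    if len(authors) < 2:
        return None
    first, coauthors = authors[0], authors[1:]
    return first, dict(Counter(coauthors))


def main():
    # Hadoop Streaming feeds input lines on stdin and expects
    # tab-separated key/value pairs on stdout.
    for line in sys.stdin:
        result = map_record(line)
        if result is not None:
            key, stripe = result
            sys.stdout.write(f"{key}\t{json.dumps(stripe, sort_keys=True)}\n")


if __name__ == "__main__":
    main()
```

Because counts are taken per record, a co-author listed twice in one publication contributes 2; use `set(coauthors)` before counting if that is not the intended semantics.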

Input: many publications by an author.

Output: the author as the key; the value is an associative array (stripe) whose entries map each co-author to the number of co-occurrences.
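A matching reducer sketch under the same assumptions: Hadoop Streaming delivers mapper output sorted by key, so consecutive lines share an author, and their stripes can be summed element-wise into the final associative array. `merge_stripes` and `reduce_lines` are hypothetical names chosen for illustration.

```python
import json
import sys
from collections import Counter
from itertools import groupby


def merge_stripes(stripes):
    """Element-wise sum of stripes (dicts of co-author -> count)."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts rather than replacing
    return dict(total)


def reduce_lines(lines):
    """Yield (author, merged_stripe) from key-sorted 'author<TAB>json' lines."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # Input is sorted by key, so each author's stripes form one contiguous group.
    for author, group in groupby(parsed, key=lambda kv: kv[0]):
        yield author, merge_stripes(json.loads(value) for _, value in group)


def main():
    for author, stripe in reduce_lines(sys.stdin):
        sys.stdout.write(f"{author}\t{json.dumps(stripe, sort_keys=True)}\n")


if __name__ == "__main__":
    main()
```

Assuming the two sketches are saved as `mapper.py` and `reducer.py` (hypothetical file names), the pipeline can be checked locally before submitting it to Hadoop with `cat publications.tsv | python mapper.py | LC_ALL=C sort | python reducer.py`; the `C` locale keeps the sort byte-wise so that each author's lines stay contiguous.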

Mandatory requirement: every team must use its own data set; teams may not copy each other's data.

Hadoop, Python, MapReduce

Project ID: #29921518

About the project

Remote project. Active 2 years ago.