Hadoop / Cloudera Search / Web crawling / Twitter & Facebook API
$750-1500 USD
Payment on delivery
We have a Hadoop cluster running CDH with the open-source Cloudera Manager.
We need to use HDFS on our 6-node Hadoop cluster to build a web crawler that searches for words and sentences in posts on different web pages, mostly classified-ad pages (sites like eBay and other Spanish ad websites). The crawler should look for given patterns and work like a spider across more than 50 websites without stopping.
If a page matches the pattern, we need to store its HTML in the Hadoop cluster. The websites and the search words may change over time. Only pages in which the searched pattern occurs should be stored.
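The "store only matching pages" rule could be sketched roughly as follows. This is a minimal illustration, not a crawler: it assumes the pages have already been fetched, treats the patterns as regular expressions matched against the visible text, and leaves the actual write to HDFS out. The names `visible_text`, `matching_patterns` and `pages_to_store` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def matching_patterns(html, patterns):
    """Return the patterns (assumed to be regexes) found in the page text."""
    text = visible_text(html)
    return [p for p in patterns if re.search(p, text, flags=re.IGNORECASE)]


def pages_to_store(pages, patterns):
    """Keep only (url, html) pairs whose text matches at least one pattern.

    Each kept entry is (url, html, matched_patterns); writing the HTML to
    HDFS would happen downstream and is not shown here.
    """
    kept = []
    for url, html in pages:
        matched = matching_patterns(html, patterns)
        if matched:
            kept.append((url, html, matched))
    return kept
```

Matching against the extracted text rather than the raw HTML avoids false hits inside scripts or markup, while the full HTML is still what gets kept for storage.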
We also need to connect to the Twitter and Facebook APIs with Flume and correlate different users and hashtags (the data to download could come from different Twitter accounts, from Twitter or Facebook hashtags, or from user comments).
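One possible reading of "correlate users and hashtags" is to find which users share hashtags. The sketch below assumes the posts have already been collected as `(user, text)` pairs; the ingestion through Flume and the API calls themselves are not shown, and the function names are hypothetical.

```python
import re
from itertools import combinations

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags_by_user(posts):
    """Map each user to the set of lowercased hashtags they have used."""
    result = {}
    for user, text in posts:
        tags = {t.lower() for t in HASHTAG_RE.findall(text)}
        result.setdefault(user, set()).update(tags)
    return result


def users_sharing_hashtags(posts):
    """Return {(user_a, user_b): shared_hashtags} for pairs with overlap."""
    by_user = hashtags_by_user(posts)
    shared = {}
    for a, b in combinations(sorted(by_user), 2):
        common = by_user[a] & by_user[b]
        if common:
            shared[(a, b)] = common
    return shared
```

Comparing every pair of users is fine for a small sample; at cluster scale the same grouping would more likely be expressed as a Hive or Pig job.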
Once all the information is in the Hadoop cluster, we need to load it into Cloudera Search so that words can be searched and indexed, and also index all the information in Hive or Pig so it can be extracted with the [url removed, login to view] library.
The idea is to centralise the information, search it in Solr, and report in [url removed, login to view] whether certain words appear over time and how frequently.
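The "frequency of the words over time" report could be computed along these lines. This is a small in-memory sketch under the assumption that each stored document carries a timestamp and its text; in the cluster the counts would more likely come from Solr facets or a Hive query. `daily_word_counts` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime


def daily_word_counts(records, tracked_words):
    """Count occurrences of the tracked words per calendar day.

    records: iterable of (datetime, text) pairs.
    Returns {"YYYY-MM-DD": {word: count}}, including only days on which
    at least one tracked word appeared.
    """
    tracked = {w.lower() for w in tracked_words}
    counts = defaultdict(Counter)
    for timestamp, text in records:
        day = timestamp.date().isoformat()
        for token in re.findall(r"\w+", text.lower()):
            if token in tracked:
                counts[day][token] += 1
    return {day: dict(c) for day, c in sorted(counts.items())}
```

Grouping by day is an arbitrary choice for the example; the same loop works for hours or weeks by changing how the `day` key is derived.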
Project ID: #4916994
About the project
7 freelancers are bidding an average of $2366 for this job
I have 8+ years of experience in J2EE programming and am familiar with the Struts/Spring/Hibernate frameworks. Please kindly check your PMB.
We combine innovation, creativity, ability, knowledge, experience, convergent thinking, skills and professionalism. It'll be our pleasure to work with you. Check your PM for more.
Hello! I am an experienced Hadoop developer. I am sorry to bother you, but do you accept hourly payment? Thanks
I have been CTO of many well-known companies, and I am Chief Data Scientist for an up-and-coming company. I have most of the pieces for your project ready in my personal library of code that I have been building up ov...