In progress

Data Mining From Digg (by script)

Data Mining Task from Digg

You'll be supplied with a list of movie titles.

Your task is to gather the following data:

- List of Digg Submissions related to the movie based on the following search terms:

a. search for ||movie_name movie||

b. search for ||movie_name film||

c. search for ||movie_name trailer||

d. search for ||movie_name watch||

e. search for ||movie_name see||

For example, for the movie "The Eye" you will run the following separate searches:

"The Eye movie"

"The Eye film"

"The Eye Trailer"

"The Eye watch"

"The Eye see"

All *without* the double quotes!!
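As a rough illustration of the search terms above, the five queries for one title could be generated like this. The function name `build_queries` and the `SUFFIXES` constant are made up for this sketch; how the query is actually submitted to Digg is not covered here.

```python
# Minimal sketch: build the five search strings for one movie title.
# SUFFIXES and build_queries are illustrative names, not part of the brief.
SUFFIXES = ("movie", "film", "trailer", "watch", "see")


def build_queries(movie_name):
    """Return the search strings for one title, without surrounding quotes."""
    # Drop stray whitespace and any double quotes around the supplied title.
    name = movie_name.strip().strip('"').strip()
    return [f"{name} {suffix}" for suffix in SUFFIXES]


if __name__ == "__main__":
    for query in build_queries('"The Eye"'):
        print(query)
```

Each string is then used as one separate search, and the five result sets are merged afterwards.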

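The results of all searches should be combined and duplicates deleted. Delete only exact duplicates, i.e. results that lead to the same Digg submission; two different submissions that link to the same external URL are not duplicates and must both be kept.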
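One possible way to apply the duplicate rule is to key each result on the URL of the Digg item itself rather than on the external link. The dictionary keys `digg_url` and `linked_url` are assumptions made for this sketch.

```python
# Minimal sketch: merge several result lists and drop exact duplicates.
# A duplicate is a row with the same Digg item URL; rows that share only
# the external (linked) URL are kept. Key names are hypothetical.
def merge_results(result_lists):
    """Combine result lists, keeping the first row seen for each Digg URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for row in results:
            key = row["digg_url"]
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


if __name__ == "__main__":
    a = {"digg_url": "http://digg.example/item-a", "linked_url": "http://example.com/x"}
    b = {"digg_url": "http://digg.example/item-b", "linked_url": "http://example.com/x"}
    # 'a' appears in two searches; 'b' links to the same external page as 'a'.
    print(len(merge_results([[a, b], [a]])))
```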

Digg search settings: "Title, Description, and URL", "All Stories", "Including buried: NO".

The results should be saved in a table (preferably excel, CSV is also possible) with the following data

ID (auto increment Serial Number), Date Submitted (dd/mm/yyyy), Title, Full URL of DIGG Item, FULL URL ITEM IS LINKED TO, number of diggs, number of comments, Made Popular(YES/NO)
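For the CSV option, the table could be written roughly as follows. The column order follows the list above; the row keys (`date`, `title`, and so on) and the function name `write_table` are assumptions for this sketch, and the sample row is invented purely to show the layout.

```python
# Minimal sketch: write the merged results as CSV with an auto-increment ID.
# Row keys and the sample row are hypothetical.
import csv
import io

COLUMNS = [
    "ID", "Date Submitted", "Title", "Digg URL",
    "Linked URL", "Diggs", "Comments", "Made Popular",
]


def write_table(rows, fh):
    """Write rows to an open text file handle, numbering them from 1."""
    writer = csv.writer(fh)
    writer.writerow(COLUMNS)
    for serial, row in enumerate(rows, start=1):
        writer.writerow([
            serial, row["date"], row["title"], row["digg_url"],
            row["linked_url"], row["diggs"], row["comments"], row["made_popular"],
        ])


if __name__ == "__main__":
    sample = [{
        "date": "25/01/2008", "title": "Example title",
        "digg_url": "http://digg.example/item-a",
        "linked_url": "http://example.com/a",
        "diggs": 10, "comments": 2, "made_popular": "NO",
    }]
    buffer = io.StringIO()
    write_table(sample, buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.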

Please note that the date appears on Digg as a relative date (e.g. "2 years 34 days ago"). This should of course be converted to the exact date.
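The date conversion could be sketched as below. It assumes a year counts as 365 days, and that months (30 days) and weeks (7 days) may also appear; the brief only shows years and days, so those extra units and the function names are guesses. The result is therefore approximate across leap years.

```python
# Minimal sketch: turn a relative date such as "2 years 34 days ago" into
# dd/mm/yyyy. Unit lengths are approximations (assumption, not from the brief).
import re
from datetime import date, timedelta

UNIT_DAYS = {"year": 365, "month": 30, "week": 7, "day": 1}
PART_RE = re.compile(r"(\d+)\s*(year|month|week|day)s?")


def relative_to_date(text, today):
    """Subtract the relative offset found in text from the reference day."""
    days = 0
    for amount, unit in PART_RE.findall(text.lower()):
        days += int(amount) * UNIT_DAYS[unit]
    return today - timedelta(days=days)


def format_date(d):
    """Format a date as dd/mm/yyyy, as required for the output table."""
    return d.strftime("%d/%m/%Y")


if __name__ == "__main__":
    # Use the day the search was run as the reference date.
    print(format_date(relative_to_date("2 years 34 days ago", date.today())))
```

Passing the reference day explicitly keeps the conversion reproducible if the scrape and the export happen on different days.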

Made Popular: Regular diggs (not popular) show the following text in the search results: "username" submitted "342 days ago"

Popular items show the following text instead: "Username" made popular "342 days ago"
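Given the two wordings above, the Made Popular flag could be derived from the status text with a regular expression along these lines. The exact text and markup on the real page may differ, so the pattern and the name `parse_status` are assumptions.

```python
# Minimal sketch: read username, popularity flag and relative age from a
# status line such as 'someuser made popular 342 days ago'.
# The pattern assumes a username without spaces (assumption).
import re

STATUS_RE = re.compile(
    r"(?P<user>\S+)\s+(?P<verb>submitted|made popular)\s+(?P<age>.+?ago)",
    re.IGNORECASE,
)


def parse_status(line):
    """Return a dict with username, made_popular (YES/NO) and age, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    popular = match.group("verb").lower() == "made popular"
    return {
        "username": match.group("user"),
        "made_popular": "YES" if popular else "NO",
        "age": match.group("age"),
    }


if __name__ == "__main__":
    print(parse_status("someuser submitted 342 days ago"))
    print(parse_status("someuser made popular 342 days ago"))
```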

Sample data attached. Please make sure you understand the requirements before posting your bid.

I expect this to be done by script (automatically), as accurately as possible, within 2-3 days.

Skills: Data Processing, Python, Research, Ruby on Rails


About the employer:
(5 reviews) Netanya, United Kingdom

Project ID: #324462