This project's main aim is to build a website that shows users the positive and negative feedback on selected hawker centres and cafes in Singapore. The data needs to be presented in an interesting way, using visualisations such as graphs and bar charts. The website will help users choose where to eat by looking at the feedback.
To get the feedback data, we need to crawl tweets from Twitter using twitter4j and scrape data from TripAdvisor. I have written a Twitter crawler and a TripAdvisor crawler, but there are still things to be done, such as topic detection and semantic analysis, which I am having trouble completing.
Task 1 :
1. Build a Twitter crawler programme and crawl 200 hashtags. I chose to cover 10 hawker centres and 10 cafes in Singapore (20 eateries in total), with 10 hashtags for each eatery: 10 x 20 = 200 hashtags. (The hashtags are uploaded in a Word document.)
2. Build a web scraper for the TripAdvisor website. As with Twitter, I need to crawl as much information as I can for the 20 chosen eateries.
Task 1 Status: Done, but the code can definitely be improved. The source code has been uploaded.
Task 2 : Insert all the data into MongoDB
- Check for duplicates
- Set a combined (composite) primary key
- Each eatery is one collection. Each tweet is one document.
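A minimal sketch of the duplicate check and combined key, in plain Python with no database connection. The field names `source` and `item_id` and the helper names are assumptions for illustration, not part of the uploaded code. MongoDB enforces uniqueness on `_id`, and `_id` may be an embedded document, so a combined key could be stored there.

```python
def make_key(source, item_id):
    """Build a combined key from the site name and the item's id on that site.

    Both field names are hypothetical; use whatever the crawlers actually emit.
    """
    return {"source": source, "item_id": item_id}


def dedupe(records):
    """Drop records whose (source, item_id) pair was already seen.

    Returns new documents with the combined key stored under "_id".
    """
    seen = set()
    unique = []
    for rec in records:
        pair = (rec["source"], rec["item_id"])
        if pair in seen:
            continue  # duplicate: same item crawled twice
        seen.add(pair)
        unique.append({"_id": make_key(*pair), **rec})
    return unique
```

Each list returned by `dedupe` would then go into the collection for one eatery, one document per tweet or review.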
Task 3 : Semantic Analysis
- Do a semantic analysis of all the tweets and the data from TripAdvisor. I have uploaded the database containing the data crawled from last month until this week.
- Semantic analysis: determine whether a phrase is positive, neutral, negative-neutral or negative. These are indicated by 1, 2, 3 and 4 (4 being positive). It is recommended to get a lot of 3s and 4s to help with the website's visualisation afterwards.
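As a rough illustration of the labelling step, here is a tiny word-list scorer. It is only a sketch: the word lists are made up, and the mapping 1 = negative, 2 = negative-neutral, 3 = neutral, 4 = positive is an assumption (the brief only fixes 4 as positive). A real solution would likely use a proper sentiment library or trained model.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "delicious", "tasty", "love", "cheap", "friendly"}
NEGATIVE = {"bad", "awful", "bland", "rude", "dirty", "expensive", "slow"}


def label(text):
    """Return a sentiment label from 1 to 4 (assumed order, 4 = positive)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return 4  # positive
    if score == 0:
        return 3  # neutral (no sentiment words, or they cancel out)
    if score == -1:
        return 2  # negative-neutral (mildly negative)
    return 1      # negative
```

For example, `label("The laksa is delicious and cheap")` scores two positive words and returns 4.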
Task 4 : Topic Detection
- Detect the topic (subject) of each tweet/phrase to get a rough gauge of what the sentence is about.
- Remember that all this needs to be stored inside MongoDB as well (create another field for each collection or create a separate database, depending on which is better).
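One simple way to approach this is keyword matching, sketched below. The topic names, keyword sets and the `topics` field name are all assumptions for illustration; a statistical topic model could replace `detect_topics` without changing how the result is stored.

```python
import re

# Hypothetical topics and keywords, for illustration only.
TOPIC_KEYWORDS = {
    "food": {"laksa", "coffee", "rice", "noodles", "taste", "portion"},
    "price": {"price", "cheap", "expensive", "value"},
    "service": {"staff", "service", "queue", "waiter"},
    "ambience": {"clean", "dirty", "noisy", "crowded", "seats"},
}


def detect_topics(text):
    """Return the sorted list of topics whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(t for t, kws in TOPIC_KEYWORDS.items() if words & kws)


def annotate(doc):
    """Return a copy of the document with an extra "topics" field.

    Assumes the tweet or review text is stored under "text".
    """
    return {**doc, "topics": detect_topics(doc["text"])}
```

This follows the "another field" option: the annotated document can be written back to the same collection it came from.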
Task 5 : Create a website to visualise the results
- All the data needs to be visualised on the website
- Create meaningful stories to show why this eatery is good or bad
- I need help converting the table and displaying the data on the website. I will come up with possible stories later; you do not need to worry about the story part.
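To give an idea of the conversion step, the sketch below turns the labelled documents for one eatery into a JSON payload that a bar chart on the website could read. The `sentiment` field name, the label names and the output shape are assumptions, not a fixed format.

```python
import json
from collections import Counter

# Assumed meaning of the 1-4 labels (only "4 = positive" comes from the brief).
LABELS = {1: "negative", 2: "negative-neutral", 3: "neutral", 4: "positive"}


def chart_data(docs):
    """Count documents per sentiment label, in label order 1 to 4."""
    counts = Counter(d["sentiment"] for d in docs)
    keys = sorted(LABELS)
    return {
        "labels": [LABELS[k] for k in keys],
        "counts": [counts.get(k, 0) for k in keys],
    }


def chart_json(docs):
    """Serialise the chart data so the web page can fetch it."""
    return json.dumps(chart_data(docs))
```

The `labels` and `counts` lists line up by position, which is the shape most charting libraries expect for a bar chart.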