Sources and further steps
This data story is based on the CMU Movie Summary Corpus.
Additional datasets
Because we compute a success rating for each movie, we need three kinds of information: review scores, inflation-adjusted profits (so that profitability can be compared across release years), and Oscar nominations and awards. This data is not available in the provided datasets, so we use the following additional sources:
- Online database: IMDb free database
- Scraping of IMDb reviews (if we have time): IMDb scores
- Data for movie budgets: TMDB dataset
- Oscar nominations and wins dataset: Oscars
- Dataset used to compute inflation-adjusted profits: US inflation data
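As an illustration of the inflation adjustment mentioned above, here is a minimal sketch in Python. It assumes the common approach of rescaling nominal amounts by a ratio of consumer price index (CPI) values, which may differ from the exact method used in this project. The function name `inflation_adjusted_profit`, its parameters, and the CPI numbers are hypothetical placeholders, not values taken from the US inflation data.

```python
def inflation_adjusted_profit(revenue, budget, cpi_release_year, cpi_reference_year):
    """Return the profit (revenue - budget) expressed in reference-year dollars.

    Assumption: amounts are rescaled by the ratio of the reference-year CPI
    to the release-year CPI.
    """
    if cpi_release_year <= 0:
        raise ValueError("CPI of the release year must be positive")
    nominal_profit = revenue - budget
    return nominal_profit * cpi_reference_year / cpi_release_year


# Placeholder CPI values for illustration only (NOT real data).
cpi = {1980: 100.0, 2020: 300.0}

# A hypothetical 1980 movie with a 10M budget and 50M revenue,
# expressed in 2020 dollars.
adjusted = inflation_adjusted_profit(50_000_000, 10_000_000, cpi[1980], cpi[2020])
print(adjusted)
```

With the adjustment applied, the profits of movies released decades apart are expressed in the same reference-year dollars and can be compared directly.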
Additional ideas
These are some ideas that we wanted to implement but that would require more resources and time.
- Extract a sentiment score for each movie from its plot.
- Extract sentiment scores from movie reviews and tweets.
- Run multiple analyses on the Stanford CoreNLP dataset.