Sources and further steps

The dataset on which we based this datastory is the CMU Movie Summary Corpus.

Additional datasets

To compute a success rating for each movie, we need three kinds of data: reviews, inflation-adjusted profits (so that the profitability of movies released in different years can be compared), and Oscar nominations and awards. None of this is available in the provided datasets, so we use the following additional sources:

Online database: IMDb free database
Data scraping of IMDb reviews (if we have time): IMDb scores
Data for movie budgets: TMDB dataset
Oscar nominations and wins dataset: Oscars
Dataset used to compute inflation-adjusted profits: US inflation data
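
To make the inflation adjustment concrete, here is a minimal sketch of one common way to do it: rescaling a nominal profit by the ratio of a price index between the release year and a reference year. This is an illustration, not necessarily the exact computation behind our figures. The function names, the `toy_cpi` table and its values are all made up for the example; real index values would come from the US inflation data listed above.

```python
def adjust_for_inflation(amount, year, cpi_by_year, base_year):
    """Express `amount` (nominal dollars of `year`) in dollars of `base_year`.

    Assumes `cpi_by_year` maps a year to a price index value for that year.
    """
    return amount * cpi_by_year[base_year] / cpi_by_year[year]


def inflation_adjusted_profit(revenue, budget, year, cpi_by_year, base_year):
    """Nominal profit (revenue minus budget), rescaled to `base_year` dollars."""
    return adjust_for_inflation(revenue - budget, year, cpi_by_year, base_year)


# Made-up index values for illustration only, NOT real CPI figures.
toy_cpi = {1990: 100.0, 2000: 150.0, 2020: 200.0}

# Two hypothetical movies with different nominal profits.
profit_1990 = inflation_adjusted_profit(50_000_000, 10_000_000, 1990, toy_cpi, 2020)
profit_2000 = inflation_adjusted_profit(70_000_000, 10_000_000, 2000, toy_cpi, 2020)

print(profit_1990, profit_2000)
```

With these toy numbers, the 1990 movie earned less in nominal terms (40M against 60M), yet both come out equally profitable once expressed in 2020 dollars. That is the kind of comparison the adjustment is meant to enable.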

Additional ideas

These are some ideas that we wanted to implement but that would require more resources and time.

Extract a sentiment score for each movie from its plot.
Extract sentiment scores from movie reviews and tweets.
Run multiple analyses on the Stanford CoreNLP dataset.