Challenge: https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset
Answer the following question: "Can you predict a movie's IMDB rating?"
You will be expected to explain why you used a specific tool set, such as SparkML, scikit-learn, Keras or XGBoost, to name a few. The tools below are suggested, but feel free to use any method.
The submission will be graded on the following criteria:
- Ease of use: We should be able to run your script easily
- Communication: Explain in your readme.md file why your solution is performant and works well
- Feature Engineering: Deriving key insights from the sample data
- Creating a visual data narrative: charts and plots must be created from code, and not external standalone software like Tableau or Excel
- Use of best coding practices for the language and tools used
Code:
- Python: Pandas + scikit-learn
Visualization:
- Can use any Python libraries like plotly, matplotlib, etc.
You may submit via a GitHub link. Alternatively, email polymorph a single ZIP file (< 20 MB) containing:
- Working code with documentation and a README file
- Any generated graphs and graphics
- Documentation including any metadata for any data created
- Brief description of the key insights from the visualizations
- An explanation of tools used with a brief reason why you selected each one (in readme.md)
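
For orientation only, here is a minimal baseline sketch using the suggested Pandas + scikit-learn tool set. It is not a reference solution. The column names (`imdb_score`, `duration`, `budget`, `num_voted_users`, `title_year`), the helper name `baseline_mae` and the choice of a random forest are assumptions made for illustration; check the column names against the CSV you download.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumed column names; verify them against the downloaded dataset.
TARGET = "imdb_score"
NUMERIC_FEATURES = ["duration", "budget", "num_voted_users", "title_year"]


def baseline_mae(df: pd.DataFrame, seed: int = 0) -> float:
    """Fit a random forest on a few numeric columns and return the held-out MAE."""
    # Rows without a rating cannot be used for supervised training.
    data = df[NUMERIC_FEATURES + [TARGET]].dropna(subset=[TARGET])
    X_train, X_test, y_train, y_test = train_test_split(
        data[NUMERIC_FEATURES], data[TARGET], test_size=0.2, random_state=seed
    )
    # Impute missing values with medians computed on the training split only,
    # so no information from the test split leaks into the model.
    medians = X_train.median()
    X_train = X_train.fillna(medians)
    X_test = X_test.fillna(medians)

    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return float(mean_absolute_error(y_test, model.predict(X_test)))
```

A typical call would be `baseline_mae(pd.read_csv("movie_metadata.csv"))`, where the file name is also an assumption. A baseline like this gives later feature engineering a number to beat, which makes the readme explanation easier to support.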