This project investigates a curated dataset sourced from Lahman's Baseball Database to study how player-performance metrics relate to match-winning contributions and player salaries.
The database is analyzed with Python libraries such as NumPy, Pandas & Matplotlib, and the findings are reported in a Jupyter notebook.
Choose a dataset from the provided list.
Go through the dataset and brainstorm questions that can be answered using it.
Use NumPy, Pandas, and Matplotlib to answer these questions.
Summarize the findings.
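As a rough illustration of the steps above, the sketch below poses one question (does batting average move together with salary?) and answers it on a tiny made-up table. The column names and values are hypothetical; they are not taken from Lahman's Baseball Database or from the project notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the real Lahman tables have different columns and far more rows.
players = pd.DataFrame({
    "player": ["a", "b", "c", "d"],
    "hits": [150, 120, 90, 60],
    "at_bats": [500, 480, 450, 400],
    "salary": [4_000_000, 2_500_000, 1_200_000, 600_000],
})

# Derive a performance metric from the raw columns.
players["batting_avg"] = players["hits"] / players["at_bats"]

# Question: is a higher batting average associated with a higher salary?
# Pearson correlation is one simple way to summarize the relationship.
corr = players["batting_avg"].corr(players["salary"])
print(f"Correlation between batting average and salary: {corr:.2f}")

# Plot the two variables against each other to inspect the relationship visually.
fig, ax = plt.subplots()
ax.scatter(players["batting_avg"], players["salary"])
ax.set_xlabel("Batting average")
ax.set_ylabel("Salary (USD)")
ax.set_title("Batting average vs. salary (toy data)")
```

In a Jupyter notebook the backend line can be dropped so that the figure renders inline.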
The project helped me understand the steps involved in a typical data analysis process – mainly learning to pose questions that can be answered with a given dataset and then answering those questions.
On the technical front, I learned to use vectorized operations in NumPy and Pandas to speed up data analysis code, became familiar with Pandas' Series and DataFrame objects, and used Matplotlib to produce plots showing the findings.
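To make the idea of vectorized operations concrete, here is a minimal sketch that compares an element-by-element loop with the equivalent array expression, followed by a boolean-mask filter on a Pandas Series. The numbers and labels are invented for illustration and do not come from the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical per-player counts.
hits = np.array([150, 120, 90, 60])
at_bats = np.array([500, 480, 450, 400])

# Loop version: one Python-level division per element.
loop_avg = np.array([h / ab for h, ab in zip(hits, at_bats)])

# Vectorized version: a single array expression, evaluated element-wise by NumPy.
vec_avg = hits / at_bats

# Pandas Series support the same style; here a boolean mask keeps
# only the salaries above the mean (values in millions, hypothetical).
salaries = pd.Series([4.0, 2.5, 1.2, 0.6], index=["a", "b", "c", "d"])
above_mean = salaries[salaries > salaries.mean()]
```

Both versions give the same batting averages; the vectorized form avoids the Python-level loop, which is where the speed-up on large arrays comes from.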
data – Directory containing data.
Investigating_Lahman_Baseball_Database.ipynb – Main project file.
Investigating_Lahman_Baseball_Database.html – HTML export of the project notebook.
This project requires Python 3 with NumPy, Pandas, Matplotlib & Seaborn.
It is recommended to use Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
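If you are unsure whether an existing environment already has these libraries, a small check along the following lines may help. It is only a sketch: the helper name `missing_packages` is made up for this example, and the check confirms that a package can be found, not that its version is suitable.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```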