# P2: Investigating Lahman's Baseball Database

> A curated dataset from Lahman's Baseball Database is investigated to study how player-performance metrics relate to match-winning contributions and player salaries.

## About

In this project, Lahman's Baseball Database is analyzed with the help of Python libraries such as NumPy, Pandas, and Matplotlib. The findings are reported in a Jupyter notebook.

#### The activities implemented in this project are:

1. Choose a dataset from the [provided list](https://docs.google.com/document/d/e/2PACX-1vTlVmknRRnfy_4eTrjw5hYGaiQim5ctr9naaRd4V9du2B5bxpd8FEH3KtDgp8qVekw7Cj1GLk1IXdZi/).
2. Go through the dataset and brainstorm questions that can be answered using it.
3. Use NumPy, Pandas, and Matplotlib to answer these questions.
4. Summarize the findings.

## Learning Outcome

The project helped me understand the steps involved in a typical data analysis process, mainly learning to pose questions that can be answered with a given dataset and then answering those questions.

On the technical front, I learned to use vectorized operations in NumPy and Pandas to speed up data analysis code, became familiar with Pandas' Series and DataFrame objects, and used Matplotlib to produce plots.

## Files

- `data` – Directory containing data.
- `Investigating_Lahman_Baseball_Database.ipynb` – Main project file.
- `Investigating_Lahman_Baseball_Database.html` – HTML export of the project notebook.

## Requirements

This project requires **Python 3** with `NumPy`, `Pandas`, `Matplotlib`, and `Seaborn`.

It is recommended to use [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.

## License

[Modified MIT License © Pranav Suri](/License.txt)
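
## Example: Vectorized Operations

The Learning Outcome section mentions using vectorized operations in NumPy and Pandas to speed up analysis code. The sketch below illustrates the general idea and is not taken from the project notebook. The rows are made up for illustration, and the column names `playerID`, `AB` (at-bats), and `H` (hits) are assumptions chosen to resemble a batting table. The helper names `batting_average_loop` and `batting_average_vectorized` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from the project's data.
# Column names (playerID, AB, H) are assumptions, not confirmed by this README.
batting = pd.DataFrame({
    "playerID": ["a01", "b02", "c03"],
    "AB": [400, 0, 250],   # at-bats
    "H": [120, 0, 50],     # hits
})


def batting_average_loop(df):
    """Compute hits / at-bats one row at a time (slow on large tables)."""
    averages = []
    for hits, at_bats in zip(df["H"], df["AB"]):
        averages.append(hits / at_bats if at_bats > 0 else np.nan)
    return pd.Series(averages, index=df.index)


def batting_average_vectorized(df):
    """Compute hits / at-bats for the whole column in one operation."""
    # Replace zero at-bats with NaN so the division yields NaN, not inf.
    at_bats = df["AB"].where(df["AB"] > 0)
    return df["H"] / at_bats


loop_avg = batting_average_loop(batting)
vec_avg = batting_average_vectorized(batting)
print(vec_avg)
```

Both functions return the same values. The vectorized version hands the arithmetic to Pandas and NumPy for the whole column at once, which is typically much faster than a Python-level loop on large tables.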