# P4: Exploratory Data Analysis

> This project investigates a dataset of chemical properties and quality ratings of wine samples using exploratory data analysis techniques. The primary research goal was to find the chemical properties that affect the quality of red wines.

## About

Exploratory Data Analysis (EDA) is the numerical and graphical examination of data characteristics and relationships before formal, rigorous statistical analyses are applied. In this project, exploratory data analysis is used to explore the variables, structure, patterns, oddities, and underlying relationships among the factors that affect wine quality.

#### The activities implemented in this project are:

1. Choose a dataset from the [provided list](https://docs.google.com/document/d/e/2PACX-1vRmVtjQrgEPfE3VoiOrdeZ7vLPO_p3KRdb_o-z6E_YJ65tDOiXkwsDpLFKI3lUxbD6UlYtQHXvwiZKx/pub).
2. Explore the dataset and plan the analysis.
3. Perform univariate, bivariate, and multivariate analysis.
4. Document the analysis.

## Learning Outcome

This project helped me learn to use plots to understand the distribution of a variable, check it for patterns, and examine its relationships with other variables. I also learned to create a logical flow when building up from single-variable analysis to multivariate analysis.

## Files

- `wineQualityReds.csv` – The dataset, publicly available for research in the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine+quality).
- `Red_Wine_Quality.rmd` – Main RMD project file containing the analysis.
- `Red_Wine_Quality.html` – HTML file knitted from the project file.
- `Red_Wine_Quality.R` – R code extract (with documentation).
- `References.txt` – List of references.

## Requirements

This project was developed using **RStudio** Version 1.0.153 – © 2009-2017 RStudio, Inc (**R** Version 3.4.2). The required packages are `ggplot2`, `gridExtra`, `GGally`, `ggthemes`, `dplyr`, `knitr` and `memisc`.

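
The required packages can be checked from an R session before knitting the project file. The snippet below is a minimal sketch, not part of the project files: the helper name `find_missing_packages` is hypothetical, and it only reports which of the listed packages are not yet installed.

```r
# Packages named in the Requirements section of this README.
required_packages <- c("ggplot2", "gridExtra", "GGally", "ggthemes",
                       "dplyr", "knitr", "memisc")

# Hypothetical helper (not part of the project): returns the packages in
# `required` that do not appear in `installed`. By default, `installed`
# is the set of package names found in the current R library paths.
find_missing_packages <- function(required,
                                  installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Report anything that still needs to be installed.
missing_packages <- find_missing_packages(required_packages)
print(missing_packages)
```

If the result is non-empty, the missing packages can be installed with `install.packages(missing_packages)`, assuming a CRAN mirror is configured.
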
## License

[Modified MIT License © Pranav Suri](/License.txt)