Data Visualization

Phase 3: Data Visualization (Month 4)

Goal: Tell Stories with Data

Why?

Data science interviews routinely ask: "Walk me through your plot."

1 chart > 1000 rows

Strong data storytelling can strengthen your case for a higher salary

Week	Focus	Hours
1	Python Plotting (Matplotlib/Seaborn)	35
2	EDA + Storytelling	35
3	Tableau Public Mastery	35
4	Capstone: Executive Dashboard	30

Week 1: Python Plotting – Matplotlib & Seaborn

Core Libraries

pip install matplotlib seaborn plotly

Essential Plot Types

Plot	Use	Code
Line	Trends	`sns.lineplot(data=df, x=x, y=y)`
Bar	Compare categories	`sns.barplot(data=df, x=x, y=y)`
Histogram	Distribution	`sns.histplot(data)`
Box	Outliers, quartiles	`sns.boxplot(data=df, x=x, y=y)`
Scatter	Correlation	`sns.scatterplot(data=df, x=x, y=y)`
Heatmap	Correlation matrix	`sns.heatmap(corr)`

Pro Code Template

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load data
df = pd.read_csv("titanic.csv")

# Style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(data=df, x="Pclass", y="Survived", hue="Sex", ax=ax, errorbar=None)

# Labels
ax.set_title("Survival Rate by Class & Gender", fontsize=16, fontweight='bold')
ax.set_xlabel("Passenger Class", fontsize=12)
ax.set_ylabel("Survival Rate", fontsize=12)
ax.legend(title="Gender")

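# Annotate each bar with its rate (ax.containers holds only the real bars;
# ax.patches can also include zero-size legend patches in some seaborn versions)
for container in ax.containers:
    ax.bar_label(container, labels=[f"{v:.1%}" for v in container.datavalues], fontsize=10)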

plt.tight_layout()
plt.savefig("survival_by_class_gender.png", dpi=300)
plt.show()
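Because `Survived` is coded 0/1, each bar in the template is the mean of `Survived` for one class and sex, which is exactly a survival rate. A pandas cross-check of those bar heights, shown here on a tiny made-up sample with the same columns as `titanic.csv` rather than the real file:

```python
import pandas as pd

# Hypothetical mini-sample with the same columns as titanic.csv
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Sex":      ["female", "female", "male", "male", "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column = share of 1s = survival rate per (class, sex) group
rates = df.groupby(["Pclass", "Sex"])["Survived"].mean().unstack("Sex")
print(rates.round(3))
```

On the real data, these numbers should match the labels drawn on the bars.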

Resources:

Python Graph Gallery – python-graph-gallery.com
Seaborn Docs – seaborn.pydata.org

Week 2: EDA + Storytelling Framework

5-Second Rule: Can a busy exec understand in 5 sec?

Storytelling Framework (McKinsey Style)

graph TD
    A[Context] --> B[Insight]
    B --> C[Action]

Step	Example
Context	"Titanic carried about 2,224 people (passengers and crew)"
Insight	"Women in 1st class: 97% survived"
Action	"Prioritize women & children in evacuation"
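The Insight step should come straight from the data, while the Action stays a human judgment. A minimal sketch, assuming a Titanic-style DataFrame with `Pclass`, `Sex`, and `Survived` columns; the helper `best_group_insight` is hypothetical, and the toy rows stand in for the real CSV:

```python
import pandas as pd

def best_group_insight(df: pd.DataFrame) -> str:
    """Name the (Sex, Pclass) group with the highest survival rate."""
    rates = df.groupby(["Sex", "Pclass"])["Survived"].mean()
    sex, pclass = rates.idxmax()  # index label (a tuple) of the top group
    return f"{sex.capitalize()} passengers in class {pclass}: {rates.max():.0%} survived"

# Hypothetical toy rows; swap in pd.read_csv("titanic.csv") for the real story
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Sex":      ["female", "female", "male", "male", "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

story = {
    "Context": f"Sample of {len(toy)} passengers",
    "Insight": best_group_insight(toy),
    "Action": "Written by you: what should the audience do about it?",
}
for step, text in story.items():
    print(f"{step}: {text}")
```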

EDA Checklist

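import pandas as pd
import seaborn as sns

df = pd.read_csv("titanic.csv")
df.describe()                      # summary statistics
df.isnull().sum()                  # missing values per column
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")  # pandas 2.x needs numeric_only with text columns
sns.pairplot(df, hue="Survived")   # pairwise numeric plots, colored by outcome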
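The checklist can be folded into one reusable function that reports missing-value shares and the numeric columns most correlated with the target. A sketch; `eda_summary` is a hypothetical helper, it assumes a numeric target such as `Survived`, and the toy frame below only mimics a few Titanic columns:

```python
import numpy as np
import pandas as pd

def eda_summary(df: pd.DataFrame, target: str) -> dict:
    """Collect the EDA checklist numbers in one place."""
    missing = df.isnull().mean().sort_values(ascending=False)  # share of NaNs per column
    corr = df.corr(numeric_only=True)[target].drop(target)     # each numeric column vs target
    return {
        "shape": df.shape,
        "missing_pct": (missing[missing > 0] * 100).round(1),
        "top_correlations": corr.reindex(corr.abs().sort_values(ascending=False).index),
    }

# Hypothetical toy data
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Fare":     [80.0, 7.0, 70.0, 8.0, np.nan, 9.0],
    "Age":      [30, 40, 22, np.nan, 35, 28],
    "Sex":      ["f", "m", "f", "m", "f", "m"],
})
summary = eda_summary(toy, "Survived")
print(summary["missing_pct"])
print(summary["top_correlations"])
```

Sorting by absolute correlation keeps strong negative relationships near the top as well.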

Project: Titanic Survival Story

3 plots + 1 insight per plot → eda_titanic.ipynb

Week 3: Tableau Public – Drag, Drop, Wow

Install: Tableau Public (Free)

Core Skills

Skill	How
Connect	CSV, Google Sheets
Calculated Field	`IF [Pclass] = 1 THEN "Rich" ELSE "Poor" END`
Parameters	Dynamic filters
Dashboard	3+ sheets + actions
Story	Sequence of insights
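Before building a calculated field in Tableau, it can help to prototype the same logic in pandas and check the group counts. A sketch of the `IF [Pclass] = 1 THEN "Rich" ELSE "Poor" END` field from the table using `numpy.where`; the output column name `ClassGroup` is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Pclass": [1, 2, 3, 1, 3]})

# Same logic as the Tableau calculated field: IF [Pclass] = 1 THEN "Rich" ELSE "Poor" END
df["ClassGroup"] = np.where(df["Pclass"] == 1, "Rich", "Poor")
print(df["ClassGroup"].value_counts())
```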

Build 3 Dashboards

#	Dashboard	Dataset
1	Sales Performance	Sample Superstore
2	Customer Segmentation	RFM Analysis
3	Funnel Analysis	E-commerce funnel
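Dashboard 2 needs RFM (Recency, Frequency, Monetary) scores per customer before you can segment. A minimal pandas sketch, assuming a transactions table with hypothetical `customer_id`, `order_date`, and `amount` columns; it turns each metric into quantile scores with `pd.qcut`, and the 3 bins (rather than the common 5) only keep the toy example small:

```python
import pandas as pd

def rfm_scores(tx: pd.DataFrame, as_of: str, bins: int = 3) -> pd.DataFrame:
    """Compute Recency/Frequency/Monetary and 1..bins scores per customer."""
    as_of_ts = pd.Timestamp(as_of)
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (as_of_ts - d.max()).days),  # days since last order
        frequency=("order_date", "count"),                               # number of orders
        monetary=("amount", "sum"),                                      # total spend
    )
    labels = list(range(1, bins + 1))
    # rank(method="first") breaks ties so qcut always gets unique bin edges
    rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), bins, labels=labels[::-1]).astype(int)  # recent = high
    rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), bins, labels=labels).astype(int)
    rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), bins, labels=labels).astype(int)
    rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
    return rfm

# Hypothetical transactions for three customers
tx = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "order_date": pd.to_datetime(["2023-12-28", "2023-12-01", "2023-11-15",
                                  "2023-10-01", "2023-09-01", "2023-06-01"]),
    "amount": [120, 80, 100, 50, 40, 20],
})
rfm = rfm_scores(tx, "2024-01-01")
print(rfm)
```

Exporting `rfm` to CSV gives Tableau a ready-made segment column to color by.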

Publish: public.tableau.com → Share link

Week 4: Capstone – Executive Dashboard

Project: "World Happiness Report 2023"

Dataset: World Happiness Report

Deliverables (GitHub: `yourname/data-viz-capstone`)

data-viz-capstone/
├── data/
│   └── happiness.csv
├── python/
│   ├── eda_happiness.ipynb
│   └── plots/
│       ├── happiness_vs_gdp.png
│       └── top10_happiest.png
├── tableau/
│   ├── Happiness_Dashboard.twb
│   └── Happiness_Dashboard.png
├── streamlit/
│   └── app.py
└── README.md

1. Python: Key Insights

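import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("../data/happiness.csv")  # assumes cleaned columns 'Country' and 'Happiness Score'

# Top 10 happiest countries
top10 = df.nlargest(10, 'Happiness Score')
sns.barplot(data=top10, x='Happiness Score', y='Country', palette='viridis')
plt.title("Top 10 Happiest Countries (2023)")
plt.xlabel("Happiness Score")
plt.savefig("plots/top10_happiest.png", dpi=300, bbox_inches='tight')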
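The tree also lists `happiness_vs_gdp.png`. A sketch of that plot, assuming the same cleaned column names (`GDP per capita`, `Happiness Score`), with synthetic data standing in for the real CSV. It draws a least-squares line with `sns.regplot` and puts Pearson's r and r² in the title; r² is the share of variance a one-variable linear fit on GDP explains, which is the number a claim like "GDP explains X% of happiness" should rest on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for ../data/happiness.csv (column names are assumptions)
rng = np.random.default_rng(42)
gdp = rng.uniform(0.5, 2.0, 100)
df = pd.DataFrame({"GDP per capita": gdp,
                   "Happiness Score": 3 + 2 * gdp + rng.normal(0, 0.4, 100)})

r = np.corrcoef(df["GDP per capita"], df["Happiness Score"])[0, 1]  # Pearson correlation

fig, ax = plt.subplots(figsize=(8, 6))
sns.regplot(data=df, x="GDP per capita", y="Happiness Score", ci=None, ax=ax)  # scatter + OLS line
ax.set_title(f"Happiness vs GDP (r = {r:.2f}, r² = {r**2:.2f})")
fig.savefig("happiness_vs_gdp.png", dpi=300, bbox_inches="tight")
```

In the repo, save to `plots/happiness_vs_gdp.png` so the file matches the tree.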

2. Tableau: Interactive Dashboard

Sheets:

Map (Happiness by Country)
Scatter (GDP vs Happiness)
Bar (Top/Bottom 10)
Trend (Happiness over years)
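The Trend sheet needs every year in one long table. If your copy of the data comes as one file per year (as some Kaggle mirrors do), a sketch of stacking the yearly tables with a `Year` column before connecting Tableau; the file layout, `stack_years`, and the toy countries and scores are all hypothetical:

```python
import pandas as pd

def stack_years(frames: dict) -> pd.DataFrame:
    """Stack per-year tables into one long table with a Year column."""
    return pd.concat(
        [frame.assign(Year=year) for year, frame in sorted(frames.items())],
        ignore_index=True,
    )

# Real use (hypothetical layout): frames = {y: pd.read_csv(f"data/{y}.csv") for y in range(2015, 2024)}
frames = {
    2023: pd.DataFrame({"Country": ["Country A", "Country B"], "Happiness Score": [6.9, 5.1]}),
    2022: pd.DataFrame({"Country": ["Country A", "Country B"], "Happiness Score": [7.0, 5.0]}),
}
long_df = stack_years(frames)
long_df.to_csv("happiness_long.csv", index=False)  # connect Tableau to this file
print(long_df)
```

Column names can differ between yearly files, so rename them to one schema before stacking.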

Actions:

Filter: Region
Highlight: Click country

Publish: tableau.com/your-viz

3. Streamlit: Live App (Bonus)

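# streamlit/app.py
from pathlib import Path

import pandas as pd
import plotly.express as px
import streamlit as st

# Resolve the CSV relative to this file so `streamlit run streamlit/app.py` works from the repo root
DATA_PATH = Path(__file__).resolve().parent.parent / "data" / "happiness.csv"

@st.cache_data
def load_data():
    return pd.read_csv(DATA_PATH)

st.title("World Happiness Dashboard")
df = load_data()

region = st.selectbox("Select Region", sorted(df['Region'].dropna().unique()))
filtered = df[df['Region'] == region]

# Assumes the CSV has a "Population" column; drop size= if your data lacks it
fig = px.scatter(filtered, x="GDP per capita", y="Happiness Score",
                 size="Population", color="Country", hover_name="Country",
                 title=f"Happiness vs GDP in {region}")
st.plotly_chart(fig)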

streamlit run streamlit/app.py

`README.md` (Portfolio Gold)

# World Happiness Dashboard

**Live**: [yourname-happiness.streamlit.app](https://yourname-happiness.streamlit.app)  
**Tableau**: [public.tableau.com](https://public.tableau.com/views/WorldHappiness2023/Dashboard)  
**Python EDA**: [notebook](python/eda_happiness.ipynb)

## Key Insights
| Insight | Action |
|-------|--------|
| GDP per capita explains ~75% of the variance in happiness (R² ≈ 0.75) | Invest in economic growth |
| Social support > Freedom | Build community programs |
| Nordic countries dominate top 10 | Study their policies |

## Tech
- Python: Matplotlib, Seaborn, Plotly
- Tableau Public: Interactive dashboard
- Streamlit: Live web app
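Insights like "Social support > Freedom" should be backed by numbers from your own data. A sketch that ranks candidate factors by their Pearson correlation with the happiness score and reports r²; `rank_drivers` and the toy columns are hypothetical, and correlation alone does not show that any factor causes happiness:

```python
import numpy as np
import pandas as pd

def rank_drivers(df: pd.DataFrame, target: str, factors: list) -> pd.DataFrame:
    """Rank factors by r² (squared Pearson r) with the target."""
    r = df[factors].corrwith(df[target])
    out = pd.DataFrame({"r": r, "r_squared": r ** 2})
    return out.sort_values("r_squared", ascending=False)

# Hypothetical toy data where social support is built to matter more than freedom
rng = np.random.default_rng(1)
n = 200
toy = pd.DataFrame({
    "Social support": rng.normal(0, 1, n),
    "Freedom": rng.normal(0, 1, n),
})
toy["Happiness Score"] = (5 + 0.8 * toy["Social support"] + 0.2 * toy["Freedom"]
                          + rng.normal(0, 0.5, n))
drivers = rank_drivers(toy, "Happiness Score", ["Social support", "Freedom"])
print(drivers.round(2))
```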

Interview-Ready Plots

Question	Your Plot
"Show correlation"	`sns.heatmap(corr, annot=True)`
"Outliers?"	`sns.boxplot()`
"Trend over time?"	`sns.lineplot()`
"Compare groups?"	`sns.catplot()`
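For the "Outliers?" question, it helps to know what `sns.boxplot()` actually flags: with the default `whis=1.5`, points beyond 1.5 × IQR from the quartiles are drawn as individual fliers. A pandas sketch of the same rule, so you can quote the outlier count next to the plot; `iqr_outliers` is a hypothetical helper and the fares are made up:

```python
import pandas as pd

def iqr_outliers(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR], the default box-plot flier rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - whis * iqr) | (s > q3 + whis * iqr)]

fares = pd.Series([7, 8, 8, 9, 10, 11, 12, 13, 250])
print(iqr_outliers(fares))
```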

Assessment: Can You Build This?

Task	Yes/No
Python: 5-plot EDA	☐
Tableau: Interactive dashboard	☐
Streamlit: Live filter	☐
3 insights with actions	☐
Published + shared	☐

All Yes → You’re visualization-ready!

Free Resources Summary

Tool	Link
Python Graph Gallery	python-graph-gallery.com
Seaborn Examples	seaborn.pydata.org/examples
Tableau Public	public.tableau.com
Sample Superstore	tableau.com/sample-data
Streamlit Docs	docs.streamlit.io

Pro Tips

Use a colorblind-safe palette → sns.set_palette("colorblind")
Annotate the key numbers → %, n=, p<0.01
Export high-res → dpi=300
Tell a story → Context → Insight → Action
Add to resume:

"Built interactive Tableau dashboard with 10K+ views"
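One way to follow the "n=" tip is to put group sizes straight into the category labels before plotting. A sketch with a hypothetical helper `with_n`; plot with the new column (for example `x="Pclass_n"`) instead of the raw one:

```python
import pandas as pd

def with_n(s: pd.Series) -> pd.Series:
    """Relabel each category as 'value (n=count)' so plots show group sizes."""
    counts = s.value_counts()
    return s.map(lambda v: f"{v} (n={counts[v]})")

df = pd.DataFrame({"Pclass": [1, 1, 2, 3, 3, 3]})
df["Pclass_n"] = with_n(df["Pclass"])
print(df["Pclass_n"].unique())
```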

Next: Phase 4 – Machine Learning Core

You can show data → now predict it.

Start Now:

Download World Happiness Report
Open Jupyter:

import pandas as pd
import seaborn as sns

df = pd.read_csv("happiness.csv")
sns.scatterplot(data=df, x="GDP per capita", y="Happiness Score", hue="Region")

Save plot → Push to GitHub

Tag me when you publish your Tableau viz!
You now communicate like a senior analyst.