
Seaborn plot (Scatter, Hist, and Box)

by jhleeatl 2024. 5. 16.

 

Today, I plan to practice again by using Seaborn to recreate the plots I made last time.

 

 

Seaborn is a Python package for data visualization, built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics.

Seaborn is used to visualize statistical data, particularly to explore relationships between variables. Its plotting functions are simpler to use than Matplotlib's, so complex visualizations take only a few lines of code.

Key features of Seaborn include:

Providing various types of plots for visualizing datasets, such as scatter plots, histograms, box plots, heatmaps, and line plots.
Offering visualization functions for statistical models, such as regression lines and kernel density estimation (KDE) plots.
Integrating with Matplotlib and extending its functionality.
Offering aesthetically pleasing default color palettes and style settings, which make the resulting plots visually appealing.
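To see the default styling mentioned above in code, here is a minimal sketch using sns.set_theme and sns.color_palette. The style name 'whitegrid' is just an example choice; 'deep' is the palette set_theme uses by default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import seaborn as sns

# Apply Seaborn's look to every Matplotlib figure created afterwards:
# "whitegrid" = white background with light grid lines, "deep" = default palette
sns.set_theme(style="whitegrid", palette="deep")

# color_palette returns a list of RGB tuples (floats between 0 and 1)
deep = sns.color_palette("deep")
print(len(deep))  # number of colors in the palette
print(deep[:3])   # first three colors
```

Because set_theme changes global Matplotlib settings, it is usually called once near the top of a script or notebook.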


In short, Seaborn makes data easier to explore and understand by turning a few lines of code into informative, visually appealing statistical graphics.

 


 

I will create scatter plots, histograms, and box plots using the Iris dataset.
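The post never shows how 'iris' was loaded. Its species labels ('Iris-setosa', 'Iris-versicolor', 'Iris-virginica') match the UCI iris.data file, which has no header row, so this sketch assumes the file was read with explicit column names. The name 'Petal Width' is an assumption (it is not used in this post), and only nine real rows (three per species) are inlined so the example runs offline.

```python
import io

import pandas as pd

# Assumption: 'iris' was read from the UCI "iris.data" file, which has no header row.
# Nine real rows are inlined so this runs offline; with the real file, use
# pd.read_csv("path/to/iris.data", header=None, names=COLUMNS)  (hypothetical path)
RAW = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""

# Column names used throughout this post; "Petal Width" is assumed
COLUMNS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]

iris = pd.read_csv(io.StringIO(RAW), header=None, names=COLUMNS)
print(iris.head())
print(iris["Species"].unique())
```

Seaborn also provides the data through sns.load_dataset('iris'), but that copy is downloaded from the seaborn-data repository and uses different names (sepal_length, species, and labels such as 'setosa'), so the column names in the plotting calls would need to change.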

 

Scatter Plot

seaborn.scatterplot(data=None, *, x=None, y=None, hue=None, size=None, style=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, legend='auto', ax=None, **kwargs)

 

 

Make a scatter plot depicting the relationship between 'Sepal Length' and 'Sepal Width', using different colors for each species.

 

I created a scatter plot using Seaborn, but it looks too plain: every point has the same color, so I cannot tell the species apart.

import seaborn as sns

x = iris['Sepal Length']
y = iris['Sepal Width']


sns.scatterplot(x=x, y=y)

 

Checking the values in 'Species':

 

iris['Species'].unique()

# result
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

 

 

 

After making this scatter plot, I realized how much easier Seaborn is than Matplotlib. To show all three species in different colors, I only need to add the 'hue' parameter.

 

 

import seaborn as sns

sns.scatterplot(iris, x='Sepal Length', y='Sepal Width', hue='Species')

 

 

Setting 'style' to the same column also gives each species its own marker shape:

sns.scatterplot(iris, x='Sepal Length', y='Sepal Width', hue='Species', style='Species')
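A slightly more explicit version of the call above, as a sketch: the 'markers' dictionary and the 'Set2' palette are illustrative choices, not part of the original post. Giving each species both its own color and its own marker keeps the groups distinguishable even when printed in grayscale. The block inlines a nine-row excerpt of the UCI iris data (three real rows per species) so it runs on its own.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Nine-row excerpt of the UCI iris data (three rows per species)
RAW = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""
COLUMNS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]
iris = pd.read_csv(io.StringIO(RAW), header=None, names=COLUMNS)

fig, ax = plt.subplots()
sns.scatterplot(
    data=iris, x="Sepal Length", y="Sepal Width",
    hue="Species",    # color per species
    style="Species",  # marker shape per species
    # Illustrative choices: one filled marker per species and the "Set2" palette
    markers={"Iris-setosa": "o", "Iris-versicolor": "s", "Iris-virginica": "^"},
    palette="Set2",
    ax=ax,
)
ax.set_title("Sepal Length vs. Sepal Width by species")
# plt.show()  # uncomment when running with a display
```

Passing ax=ax draws onto an existing Matplotlib Axes, which makes it easy to combine several Seaborn plots in one figure.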

 

 


 

Histogram

seaborn.histplot(data=None, *, x=None, y=None, hue=None, weights=None, stat='count', bins='auto', binwidth=None, binrange=None, discrete=None, cumulative=False, common_bins=True, common_norm=True, multiple='layer', element='bars', fill=True, shrink=1, kde=False, kde_kws=None, line_kws=None, thresh=0, pthresh=None, pmax=None, cbar=False, cbar_ax=None, cbar_kws=None, palette=None, hue_order=None, hue_norm=None, color=None, log_scale=None, legend=True, ax=None, **kwargs)

 

Make a histogram representing the distribution of 'Sepal Length', using different colors for each species.

 

With 'hue', this was easy to create as well:

 

sns.histplot(iris, x='Sepal Length', hue='Species')

 

 

 

Passing the column as 'y' instead of 'x' draws the bars horizontally:

sns.histplot(iris, y='Sepal Length', hue='Species')

 

 

Setting kde=True also overlays a kernel density estimate (KDE) curve for each species:

sns.histplot(iris, x='Sepal Length', kde=True, hue='Species')

 

 

 

 


Box Plot

 

seaborn.boxplot(data=None, *, x=None, y=None, hue=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, fill=True, dodge='auto', width=0.8, gap=0, whis=1.5, linecolor='auto', linewidth=None, fliersize=None, hue_norm=None, native_scale=False, log_scale=None, formatter=None, legend='auto', ax=None, **kwargs)

 

 

Make box plots representing the distribution of 'Petal Length' for each species.

 

sns.boxplot(iris, x='Species', y='Petal Length')
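To make the box plot easier to read, this sketch computes with pandas the numbers each box encodes for 'Petal Length': the quartiles, the whisker ends, and the outliers. It assumes the 1.5 * IQR rule that matches whis=1.5 in the signature above: whiskers reach the most extreme points within 1.5 * IQR of the box, and anything beyond is drawn as an outlier. The data are the first five rows of each species from the UCI file, not the full dataset, and box_stats is a hypothetical helper written for illustration.

```python
import io

import pandas as pd

# First five rows of each species from the UCI iris data (an excerpt, not all 150 rows)
RAW = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
"""
COLUMNS = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]
iris = pd.read_csv(io.StringIO(RAW), header=None, names=COLUMNS)


def box_stats(values: pd.Series, whis: float = 1.5) -> dict:
    """Hypothetical helper: quartiles, whisker ends and outliers (1.5 * IQR rule)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = sorted(values[(values < low_fence) | (values > high_fence)].tolist())
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(),   # lowest point still within the fence
        "whisker_high": inside.max(),  # highest point still within the fence
        "outliers": outliers,          # points drawn individually
    }


stats = {species: box_stats(group)
         for species, group in iris.groupby("Species")["Petal Length"]}
for species, s in stats.items():
    print(species, s)
```

On this small excerpt, setosa's IQR is zero, so 1.3 and 1.5 both fall outside the whiskers; with all 50 rows per species the boxes may look different.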

 

 

Adding hue='Species' gives each box its own color:

sns.boxplot(iris, x='Species', y='Petal Length', hue='Species')

 

 

 

Swapping x and y draws the boxes horizontally:

 

sns.boxplot(iris, x='Petal Length', y='Species', hue='Species')

 

 

I've now tried Seaborn in several ways. Personally, I find that Seaborn automates more than Matplotlib does (a single 'hue' argument handles the grouping, colors, and legend, and axis labels come from the column names), and its default colors look more appealing.