
Week 2 and Useful Pandas Techniques

Sean Turner
Aug 8, 2017

In week two of General Assembly’s Data Science Immersive course, the cohort covered basic statistical principles (such as the Central Limit Theorem, confidence intervals, and z-scores), along with pandas, data visualization, and sqlite3 / PostgreSQL.

Instead of doing a full recap of the week (which would run long), I think it would be best to go through some of the useful things I picked up with pandas. These were my solutions to problems I encountered while processing data using what I learned in class.

Print the head and tail of a DataFrame with one command using np.r_

Feeding np.r_ one slice for the first five positions and one for the last five generates a single array of integers. This array can be passed to iloc, the pandas integer-based indexer, which gives us the head and tail of our DataFrame.

I find this to be useful when looking at the data for the first time, or when chaining methods such as value_counts() and sort_values().
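A minimal sketch of the idea, using a small made-up DataFrame (the `df` and its `value` column are placeholders, not data from the course):

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 20 rows, so the head and tail do not overlap.
df = pd.DataFrame({"value": np.arange(20) ** 2})

# np.r_ concatenates the two slices into one integer array:
# [0, 1, 2, 3, 4, -5, -4, -3, -2, -1]
positions = np.r_[0:5, -5:0]

# iloc accepts negative positions, so the last five entries select the tail.
head_and_tail = df.iloc[positions]
print(head_and_tail)

# The same array works at the end of a method chain, here giving the
# five largest and five smallest values of the column.
extremes = df["value"].sort_values(ascending=False).iloc[positions]
print(extremes)
```

Note that the object needs at least five rows for this to work, and with fewer than ten rows some rows will show up twice.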

Generate multiple plots in one cell using loops and matplotlib’s subplot function

Here I loop through the columns in the iris dataset (excluding species) and plot each column as a violin plot by species.
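A rough sketch of that loop follows. Two assumptions to flag: the data is a synthetic stand-in that only borrows the iris column names (so the snippet runs without downloading anything), and it uses matplotlib's own `Axes.violinplot` rather than any particular plotting library from the course.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for iris: same column names, random values.
# Swap in the real dataset if you have it loaded.
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 150),
    "sepal_width": rng.normal(3.0, 0.4, 150),
    "petal_length": rng.normal(3.8, 1.7, 150),
    "petal_width": rng.normal(1.2, 0.7, 150),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Every column except the grouping column gets its own subplot.
feature_cols = [c for c in iris.columns if c != "species"]

fig = plt.figure(figsize=(10, 8))
for i, col in enumerate(feature_cols, start=1):
    ax = plt.subplot(2, 2, i)  # 2 x 2 grid, subplot positions are 1-based
    grouped = iris.groupby("species")[col]
    labels = [name for name, _ in grouped]
    data = [values.to_numpy() for _, values in grouped]
    ax.violinplot(data, showmedians=True)  # one violin per species
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_title(col)

plt.tight_layout()
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() here.
```

If seaborn is available, replacing the `ax.violinplot` call and the tick lines with `sns.violinplot(x="species", y=col, data=iris, ax=ax)` should give a similar grid with less manual labelling.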
