Week 2 and Useful Pandas Techniques
In week two of General Assembly’s Data Science Immersive course, the cohort covered basic statistical principles (such as the Central Limit Theorem, confidence intervals, and z-scores), along with pandas, data visualization, and sqlite3 / PostgreSQL.
Instead of writing a full recap of the week (which would be long), I think it is more useful to go through some of the pandas techniques I picked up. These were my solutions to problems I ran into while processing data with what I learned in class.
Print the head and tail of a DataFrame with one command using np.r_
Passing slices for the first five and last five positions to np.r_ produces a single array of integers. That array can then be passed to iloc, the pandas integer-based indexer, which returns the head and tail of the DataFrame together.
I find this useful when looking at data for the first time, or at the end of a method chain such as value_counts() or sort_values(), where it shows both extremes of the result at once.
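A minimal sketch of the idea, assuming a small made-up DataFrame (the name df and its value column are hypothetical and only there for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row DataFrame, used only for illustration.
df = pd.DataFrame({"value": np.arange(20) * 10})

# np.r_ expands each slice into integers and concatenates them,
# giving the positions 0..4 followed by -5..-1.
positions = np.r_[0:5, -5:0]

# iloc accepts an integer array, including negative positions
# counted from the end of the DataFrame.
head_and_tail = df.iloc[positions]
print(head_and_tail)
```

Note that this needs at least five rows, and a DataFrame with fewer than ten rows will show some rows twice because the two slices overlap.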
Generate multiple plots in one cell using loops and matplotlib’s subplot function
For example, I loop through the columns of the iris dataset (excluding species) and draw a violin plot of each column, grouped by species.
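The loop can be sketched roughly as follows. Two assumptions to flag: the DataFrame here is a synthetic stand-in with iris-like column names and random values (so the snippet runs without downloading anything), and the violins are drawn with matplotlib's own Axes.violinplot, which may not be the plotting call I used originally.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the iris dataset (assumption: same column
# layout, random values). Swap in the real data to get the real plots.
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 150),
    "sepal_width": rng.normal(3.0, 0.4, 150),
    "petal_length": rng.normal(3.8, 1.7, 150),
    "petal_width": rng.normal(1.2, 0.7, 150),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# Every column except the grouping column gets its own subplot.
feature_cols = [col for col in iris.columns if col != "species"]
species_names = list(iris["species"].unique())

fig = plt.figure(figsize=(10, 8))
for i, col in enumerate(feature_cols, start=1):
    ax = plt.subplot(2, 2, i)  # subplot positions are 1-based
    # One array of values per species, so each species gets one violin.
    groups = [
        iris.loc[iris["species"] == name, col].to_numpy()
        for name in species_names
    ]
    ax.violinplot(groups, showmedians=True)
    # Violins are placed at x = 1, 2, 3 by default; label them by species.
    ax.set_xticks(range(1, len(species_names) + 1))
    ax.set_xticklabels(species_names)
    ax.set_title(col)

fig.tight_layout()
```

In a notebook, the finished 2x2 grid renders once at the end of the cell, which is the point of building all four plots inside one loop.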