Data Wrangling
Importing Data Select, Drop & Rename Filter, Sort & Sample Add Columns Cleaning Data Dates & Time Join Data Aggregate & Transform
Data Analysis
Exploring Data Plotting Continuous Variables Plotting Discrete Variables
Machine Learning
Data Preparation Linear Models
Other Tutorials & Content
Learn Python for Data Science Learn Alteryx Blog



Aggregating & Transforming Data with Python Pandas

GroupBy and Count

Create a new DataFrame that groups the Orders DataFrame by the distinct values in the Order_Date and ProductID columns and counts the Order_No column:

orders_by_day = orders.groupby(['Order_Date', 'ProductID'],as_index=False) ['Order_No'].count()

GroupBy and Sum

Create a new DataFrame that groups the Orders DataFrame by the distinct values in the Order_Date and ProductID columns and sums the units column:

orders_by_day = orders.groupby(['Order_Date', 'ProductID'],as_index=False) ['units'].sum()

GroupBy and Agg

Create a new DataFrame that groups the Orders DataFrame by the distinct values in the Genre column and determines the mean, maximum, minimum and standard deviation of each value:

genre_aggregations = products.groupby(['Genre'], as_index=False)['Cost'].agg(['mean','max','min','std']).reset_index()

Running Total Column

Creates a column in the orders DataFrame that contains a running total for each ProductID over time:

orders.sort_values(by=['Order_Date'], inplace=True)
orders['running_total'] = orders.groupby(['ProductID'],as_index=False) ['units'].cumsum()

Pivot Data

Creates a new DataFrame that pivots the orders Dataframe so the rows are product IDs, the columns are order dates and the cell values are units sold. Any NAs are filled with 0s.

units_by_day = orders.groupby(['Order_Date','ProductID'],as_index=False)['units'].sum()
units_by_day_pivot = units_by_day.pivot(index='ProductID', columns='Order_Date', values='units').reset_index()
units_by_day_pivot.fillna(0, inplace=True)