Aggregating & Transforming Data with Python Pandas

GroupBy and Count

Create a new DataFrame that groups the Orders DataFrame by the distinct values in the Order_Date and ProductID columns and counts the Order_No column:

orders_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['Order_No'].count()
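
To see what the grouped count looks like, here is a minimal sketch. The sample rows are made up for illustration and are not from the original dataset; only the column names follow the cheat sheet.

```python
import pandas as pd

# Hypothetical sample data (assumption): four orders over two days
orders = pd.DataFrame({
    'Order_No': [1, 2, 3, 4],
    'Order_Date': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-02'],
    'ProductID': ['A', 'A', 'B', 'A'],
    'units': [2, 3, 1, 5],
})

# One row per (Order_Date, ProductID) pair; Order_No holds the number of orders
orders_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['Order_No'].count()
print(orders_by_day)
```

Note that count() only tallies non-null values, so rows with a missing Order_No would not be counted.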

GroupBy and Sum

Create a new DataFrame that groups the Orders DataFrame by the distinct values in the Order_Date and ProductID columns and sums the units column:

orders_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['units'].sum()
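
A quick way to sanity-check a grouped sum is to confirm that the group totals add back up to the overall total. The sketch below does this on made-up sample rows (an assumption for illustration only):

```python
import pandas as pd

# Hypothetical sample data (assumption)
orders = pd.DataFrame({
    'Order_No': [1, 2, 3, 4],
    'Order_Date': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-02'],
    'ProductID': ['A', 'A', 'B', 'A'],
    'units': [2, 3, 1, 5],
})

# Total units per product per day
orders_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['units'].sum()

# The grouped totals should account for every unit in the original data
totals_match = orders_by_day['units'].sum() == orders['units'].sum()
print(orders_by_day)
print(totals_match)
```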

GroupBy and Agg

Create a new DataFrame that groups the products DataFrame by the distinct values in the Genre column and determines the mean, maximum, minimum and standard deviation of the Cost column for each genre:

genre_aggregations = products.groupby(['Genre'])['Cost'].agg(['mean', 'max', 'min', 'std']).reset_index()
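
The following sketch runs the multi-statistic aggregation on a small, made-up products table (the rows are assumptions for illustration). It groups without as_index=False and relies on reset_index() alone to turn Genre back into an ordinary column, which avoids picking up an extra index column on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data (assumption)
products = pd.DataFrame({
    'ProductID': ['A', 'B', 'C', 'D', 'E'],
    'Genre': ['Fiction', 'Fiction', 'Science', 'Science', 'Science'],
    'Cost': [10.0, 20.0, 5.0, 10.0, 15.0],
})

# One row per genre, one column per statistic
genre_aggregations = (
    products.groupby('Genre')['Cost']
    .agg(['mean', 'max', 'min', 'std'])
    .reset_index()
)
print(genre_aggregations)
```

Keep in mind that pandas computes std as the sample standard deviation (ddof=1) by default.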

Running Total Column

Create a column in the orders DataFrame that contains a running total of units for each ProductID over time:

orders.sort_values(by=['Order_Date'], inplace=True)

orders['running_total'] = orders.groupby(['ProductID'])['units'].cumsum()
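
The sort step matters because cumsum() accumulates in row order within each group. A minimal sketch on made-up, deliberately unsorted rows (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data (assumption), intentionally out of date order
orders = pd.DataFrame({
    'Order_Date': pd.to_datetime(['2017-01-02', '2017-01-01', '2017-01-01', '2017-01-03']),
    'ProductID': ['A', 'A', 'B', 'A'],
    'units': [3, 2, 1, 5],
})

# Sort chronologically so the running total accumulates over time
orders.sort_values(by=['Order_Date'], inplace=True)

# cumsum() returns a Series aligned to the original index, so it can be assigned directly
orders['running_total'] = orders.groupby('ProductID')['units'].cumsum()
print(orders)
```

Each product's running total restarts from its own first order, so product B is unaffected by the units sold for product A.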

Pivot Data

Create a new DataFrame that pivots the orders DataFrame so that the rows are product IDs, the columns are order dates and the cell values are units sold. Any missing values are filled with 0:

units_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['units'].sum()

units_by_day_pivot = units_by_day.pivot(index='ProductID', columns='Order_Date', values='units').reset_index()

units_by_day_pivot.fillna(0, inplace=True)
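
The sketch below walks through the group, pivot and fill steps on made-up rows (the data is an assumption for illustration). Grouping first is what makes pivot() safe here, since pivot() raises an error when an index/column pair appears more than once. As an alternative that is not part of the original recipe, pivot_table() can aggregate and fill in a single call.

```python
import pandas as pd

# Hypothetical sample data (assumption); product A has two orders on the same day
orders = pd.DataFrame({
    'Order_Date': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-02'],
    'ProductID': ['A', 'A', 'B', 'A'],
    'units': [2, 3, 1, 5],
})

# Step 1: collapse duplicates to one row per (Order_Date, ProductID)
units_by_day = orders.groupby(['Order_Date', 'ProductID'], as_index=False)['units'].sum()

# Step 2: reshape to one row per product and one column per date
units_by_day_pivot = units_by_day.pivot(
    index='ProductID', columns='Order_Date', values='units'
).reset_index()

# Step 3: products with no sales on a given date get 0 instead of NaN
units_by_day_pivot = units_by_day_pivot.fillna(0)

# Alternative: aggregate, reshape and fill in one call
units_pivot_table = orders.pivot_table(
    index='ProductID', columns='Order_Date', values='units',
    aggfunc='sum', fill_value=0,
).reset_index()

print(units_by_day_pivot)
print(units_pivot_table)
```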
