Other Tutorials & Content
Learn Alteryx Learn Python for Data Science Python Data Science Reference

Join Types

How Joins Work

We use a join when we want to blend fields from one data source with another to create a new dataset with fields from both original data sources side by side. We can create joins when there is a relationship between one or more fields in both data sources. In practice, this means that at least one field in the first data source contains values that also appear in a field in the second source. For Excel users, a join serves a similar purpose to a lookup function but is far more expansive and flexible.

Let’s look at an example. In the two data sources below (order data and customer data) we can see that the Customer ID field appears in both sets of data and that they contain values that appear in both sources such as 4002. Therefore there is a relationship between the two sources through the “Customer_ID” fields that we can call the relational fields. In this case, joining the two data sources on the Customer ID field would result in matches between four out of the five records in each source.

Order_No Order_Date Customer_ID ProductID
01259 22/11/17 4002 376401
14716 01/04/17 3979 218178
23441 21/11/17 3382 218178
88155 19/11/17 3552 355904
42078 10/05/17 1654 227774

Customer_ID Age Gender Location
4002 60 M Non-US
1681 37 M US
3382 20 F US
3552 40 F Non-US
3979 52 F US

We now know that we can join these two sources to create a new data set but the records contained in this new data will depend on the type of join we use. Let’s look at join types now.

Inner Joins

An inner join returns a data set that includes ONLY records where the values in the relational field(s) match. In the example below we have used an inner join to blend our order and customer data sources. As we saw above, there are four records from each source with matching customer IDs and therefore an inner join between the two data sources creates the following data set:

Order_No Order_Date Customer_ID ProductID Customer_ID Age Gender Location
01259 22/11/17 4002 376401 4002 60 M Non-US
23441 21/11/17 3382 218178 3382 20 F US
88155 19/11/17 3552 355904 3552 40 F Non-US
14716 01/04/17 3979 218178 3979 52 F US

Outer Joins

Outer joins have three variation; left, right and outer.

A full outer join creates a data set that contains all records from both data sources but where there is aisn’t a matching value in the relational field(s) in either of the then fields from this record will return as Nulls. As we can see below customer_ID 1681 only appears in table 2 and as such table 1 fields for this record are Null. The same is true for customer_ID 1654 in table 1.

Order_No Order_Date Customer_ID ProductID Customer_ID Age Gender Location
01259 22/11/17 4002 376401 4002 60 M Non-US
23441 21/11/17 3382 218178 3382 20 F US
88155 19/11/17 3552 355904 3552 40 F Non-US
14716 01/04/17 3979 218178 3979 52 F US
42078 10/05/17 1654 227774 NULL NULL NULL NULL
NULL NULL NULL NULL 1681 37 M US

If we take the order data source as the left and join this to the customer data source as the right data source then performing a left outer join means that we take all the records from order data regardless of whether there is a matching record in the customer data source. Any records that appear in the order data without a match in the customer data will appear with NULLs for the customer data fields in the joined data set. In the below example customer_ID 1654 doesn’t appear in the customer data, therefore performing a left outer join creates the following table with NULLs in the customer data fields for the records that includes the non-matching value in the relational field.

Order_No Order_Date Customer_ID ProductID Customer_ID Age Gender Location
01259 22/11/17 4002 376401 4002 60 M Non-US
23441 21/11/17 3382 218178 3382 20 F US
88155 19/11/17 3552 355904 3552 40 F Non-US
14716 01/04/17 3979 218178 3979 52 F US
42078 10/05/17 1654 227774 NULL NULL NULL NULL

A right outer join does the same as the left outer join but with the logic reversed in that all records from the right data sources are retained and non-matching records from the left table are discarded as shown below.

Order_No Order_Date Customer_ID ProductID Customer_ID Age Gender Location
01259 22/11/17 4002 376401 4002 60 M Non-US
23441 21/11/17 3382 218178 3382 20 F US
88155 19/11/17 3552 355904 3552 40 F Non-US
14716 01/04/17 3979 218178 3979 52 F US
NULL NULL NULL NULL 1681 37 M US

Union

The joins we have seen so far create new data sets by joining row-wise (side-by-side). A union joins the data column-wise and basically appends any rows from one data set to another where there are common fields. In our example, if we performed a union the only common field is Customer_ID and we would create the following data set.

Order_No Order_Date Customer_ID ProductID Age Gender Location
01259 22/11/17 4002 376401 NULL NULL NULL
14716 01/04/17 3979 218178 NULL NULL NULL
23441 21/11/17 3382 218178 NULL NULL NULL
88155 19/11/17 3552 355904 NULL NULL NULL
42078 10/05/17 1654 227774 NULL NULL NULL
NULL NULL 4002 NULL 60 M Non-US
NULL NULL 1681 NULL 37 M US
NULL NULL 3382 NULL 20 F US
NULL NULL 3552 NULL 40 F Non-US
NULL NULL 3979 NULL 52 F US