Joins Types

Follow @AnalyseUp

One of Alteryx main strengths is its ability to quickly blend data from multiple data sources to build one complete data set. We can do this through the use of joins. If you come from a SQL background you should already be familiar with the theory behind joins and may wish to move on to the next section on implementing joins in Alteryx. If not, then let’s carry on and explore how joins work.

How Joins Work

We use a join when we want to blend fields from one data source with another to create a new dataset with fields from both original data sources side by side. We can create joins when there is a relationship between one or more fields in both data sources. In practice, this means that at least one field in the first data source contains values that also appear in a field in the second source. For Excel users, a join serves a similar purpose to a lookup function but is far more expansive and flexible.

Let’s look at an example. In the two data sources below (order data and customer data) we can see that the Customer ID field appears in both sets of data and that they contain values that appear in both sources such as 4002. Therefore there is a relationship between the two sources through the “Customer_ID” fields that we can call the relational fields. In this case, joining the two data sources on the Customer ID field would result in matches between four out of the five records in each source.

Order_No	Order_Date	Customer_ID	ProductID
01259	22/11/17	4002	376401
14716	01/04/17	3979	218178
23441	21/11/17	3382	218178
88155	19/11/17	3552	355904
42078	10/05/17	1654	227774

Customer_ID	Age	Gender	Location
4002	60	M	Non-US
1681	37	M	US
3382	20	F	US
3552	40	F	Non-US
3979	52	F	US

We now know that we can join these two sources to create a new data set but the records contained in this new data will depend on the type of join we use. Let’s look at join types now.

Inner Joins

An inner join returns a data set that includes ONLY records where the values in the relational field(s) match. In the example below we have used an inner join to blend our order and customer data sources. As we saw above, there are four records from each source with matching customer IDs and therefore an inner join between the two data sources creates the following data set:

Order_No	Order_Date	Customer_ID	ProductID	Customer_ID	Age	Gender	Location
01259	22/11/17	4002	376401	4002	60	M	Non-US
23441	21/11/17	3382	218178	3382	20	F	US
88155	19/11/17	3552	355904	3552	40	F	Non-US
14716	01/04/17	3979	218178	3979	52	F	US

Outer Joins

Outer joins have three variation; left, right and outer.

A full outer join creates a data set that contains all records from both data sources but where there is aisn’t a matching value in the relational field(s) in either of the then fields from this record will return as Nulls. As we can see below customer_ID 1681 only appears in table 2 and as such table 1 fields for this record are Null. The same is true for customer_ID 1654 in table 1.

Order_No	Order_Date	Customer_ID	ProductID	Customer_ID	Age	Gender	Location
01259	22/11/17	4002	376401	4002	60	M	Non-US
23441	21/11/17	3382	218178	3382	20	F	US
88155	19/11/17	3552	355904	3552	40	F	Non-US
14716	01/04/17	3979	218178	3979	52	F	US
42078	10/05/17	1654	227774	NULL	NULL	NULL	NULL
NULL	NULL	NULL	NULL	1681	37	M	US

If we take the order data source as the left (either written first in a SQL script or connected to the “L” input tool in Alteryx) and join this to the customer data source as the right data source then performing a left outer join means that we take all the records from order data regardless of whether there is a matching record in the customer data source. Any records that appear in the order data without a match in the customer data will appear with NULLs for the customer data fields in the joined data set. In the below example customer_ID 1654 doesn’t appear in the customer data, therefore performing a left outer join creates the following table with NULLs in the customer data fields for the records that includes the non-matching value in the relational field.

Order_No	Order_Date	Customer_ID	ProductID	Customer_ID	Age	Gender	Location
01259	22/11/17	4002	376401	4002	60	M	Non-US
23441	21/11/17	3382	218178	3382	20	F	US
88155	19/11/17	3552	355904	3552	40	F	Non-US
14716	01/04/17	3979	218178	3979	52	F	US
42078	10/05/17	1654	227774	NULL	NULL	NULL	NULL

A right outer join does the same as the left outer join but with the logic reversed in that all records from the right data sources are retained and non-matching records from the left table are discarded as shown below.

Order_No	Order_Date	Customer_ID	ProductID	Customer_ID	Age	Gender	Location
01259	22/11/17	4002	376401	4002	60	M	Non-US
23441	21/11/17	3382	218178	3382	20	F	US
88155	19/11/17	3552	355904	3552	40	F	Non-US
14716	01/04/17	3979	218178	3979	52	F	US
NULL	NULL	NULL	NULL	1681	37	M	US

Union

The joins we have seen so far create new data sets by joining row-wise (side-by-side). A union joins the data column-wise and basically appends any rows from one data set to another where there are common fields. In our example, if we performed a union the only common field is Customer_ID and we would create the following data set.

Order_No	Order_Date	Customer_ID	ProductID	Age	Gender	Location
01259	22/11/17	4002	376401	NULL	NULL	NULL
14716	01/04/17	3979	218178	NULL	NULL	NULL
23441	21/11/17	3382	218178	NULL	NULL	NULL
88155	19/11/17	3552	355904	NULL	NULL	NULL
42078	10/05/17	1654	227774	NULL	NULL	NULL
NULL	NULL	4002	NULL	60	M	Non-US
NULL	NULL	1681	NULL	37	M	US
NULL	NULL	3382	NULL	20	F	US
NULL	NULL	3552	NULL	40	F	Non-US
NULL	NULL	3979	NULL	52	F	US

Having discussed how joins work, let’s go ahead and utilise these to blend the data in our workflow.

Getting Started

Data Preparation

Filters & Joins

Formulas & Aggregations

Sort & Output

Other Tutorials & Content

Joins Types

How Joins Work

Inner Joins

Outer Joins

Union