OLAP Constructs - ROLLUP, CUBE, and GROUPING SETS

Before you Begin

Use the following CFN template to launch the resources needed in your AWS account.

Launch

The stack will deploy Redshift serverless endpoint(workgroup-xxxxxxx) and also a provisioned cluster(consumercluster-xxxxxxxxxx). This demo works on both serverless and provisioned cluster.

Client Tool

This demo utilizes the Redshift web-based Query Editor v2. Navigate to the Query editor v2.

Below credentials should be used for both the Serverless endpoint (workgroup-xxxxxxx) as well as the provisioned cluster (consumercluster-xxxxxxxxxx).

User Name: awsuser
Password: Awsuser123

Challenge

Miguel has recently been tasked to perform multi-dimensional analysis on supplier data. He needs to write complex aggregation logic for finding totals, sub total of supplier account balances at nation level, region level and for each of the combinations.

ROLLUP, CUBE and GROUPING SETS are SQL aggregation extensions that allow her to perform multiple GROUP BY operations in the same statement. With these new SQL constructs GROUPTING SETS, ROLLUP and CUBE, he can avoid complex data processing code in the applications and also help improve your performance.

He took a sample from the result of the following query run on TPC-H dataset.

select s_suppkey supp_id, r.r_name region_nm,n.n_name nation_nm, s.s_acctbal acct_balance
from supplier s, nation n, region r
where
s.s_nationkey = n.n_nationkey
and n.n_regionkey = r.r_regionkey

Create table and load data

SayDoShow

Miguel needs to create the table structures that will hold the sales data. Create a supplier sample table and insert sample data.

Create the tables as below:

DROP TABLE IF EXISTS supp_sample;

CREATE TABLE IF NOT EXISTS supp_sample (
        supp_id integer, 
        region_nm char(25), 
        nation_nm char(25), 
        acct_balance numeric(12,2)
        );

INSERT INTO public.supp_sample (supp_id,region_nm,nation_nm,acct_balance) 
VALUES  
(90470,'AFRICA                   ','KENYA                    ',1745.57),  
(99910,'AFRICA                   ','ALGERIA                  ',3659.98),  
(26398,'AMERICA                  ','UNITED STATES            ',2575.77),  
(43908,'AMERICA                  ','CANADA                   ',1428.27),  
(3882,'AMERICA                  ','UNITED STATES            ',7932.67),  
(42168,'ASIA                     ','JAPAN                    ',343.34),  
(68461,'ASIA                     ','CHINA                    ',2216.11),  
(89676,'ASIA                     ','INDIA                    ',4160.75),  
(52670,'EUROPE                   ','RUSSIA                   ',2469.40),  
(32190,'EUROPE                   ','RUSSIA                   ',1119.55),  
(19587,'EUROPE                   ','GERMANY                  ',9904.98),  
(1134,'MIDDLE EAST              ','EGYPT                    ',7977.48),  
(35213,'MIDDLE EAST              ','EGYPT                    ',737.28),  
(36132,'MIDDLE EAST              ','JORDAN                   ',5052.87);

Let us start by first reviewing the sample data before running the SQLs using GROUPING SETS, ROLLUP, and CUBE extensions. It has list of suppliers , region, nation and their account balances.

select * from supp_sample;

Analyze data

SayDoShow

GROUPING SETS Group data multiple different ways. GROUPING SETS can be used instead of performing multiple select queries with different GROUP BY keys and merge (i.e., Union) their results. Below query groups data by both region_nm and nation_nm.In this section, we will show how to find out

  • account balances aggregated for each region

  • account balances aggregated for each nation

  • merge results of both aggregations

by running single SQL statement using GROUPING SETS.

SELECT region_nm, nation_nm, sum(acct_balance) as total_balance
FROM supp_sample
GROUP BY GROUPING SETS (region_nm, nation_nm);

ROLLUP Group data by particular columns and get extra rows that represent the subtotals. ROLLUP assumes a hierarchy among the GROUP BY columns.

In this section, we will show how to find out

  • account balances for each combination of region_nm and nation_nm

  • rolled up account balances for each region_nm

  • rolled up account balances for all regionsin single SQL statement using ROLLUP.

The rows with a value for region_nm and NULL value for nation_nm represent the subtotals for the region (marked in green). The rows with NULL value for both region_nm and nation_nm has the grand total—the rolled-up account balances for all regions (marked in red).

SELECT region_nm, nation_nm, sum(acct_balance) as total_balanceFROM supp_sampleGROUP BY ROLLUP (region_nm, nation_nm)ORDER BY region_nm,nation_nm;

CUBE

Similar to ROLLUP, CUBE will group data by particular columns and get extra rows that represent the subtotals. Also, it will generate subtotals for all combinations of grouping columns.

In this section, we will show how to find out

  • account balance sub totals for each nation_nm

  • account balance sub totals for each region_nm

  • account balance sub totals for each group of region_nm and nation_nmcombination

  • Overall total account balance for all regions

in single SQL statement using CUBEYou can see the subtotals at region level (marked in green). These subtotal records are the same records generated by ROLLUP. Additionally, CUBE generated subtotals for each nation_nm (marked in yellow). Finally, you can also see the grand total for all three regions mentioned in the query (marked in red)

SELECT region_nm, nation_nm, sum(acct_balance) as total_balance
FROM supp_sample
WHERE region_nm in ('AFRICA','AMERICA','ASIA') 
GROUP BY CUBE(region_nm, nation_nm)ORDER BY region_nm, nation_nm;

NULL is a valid value in a column that participates in a GROUPING SET/ROLLUP/CUBE and it is not aggregated with the NULL values added explicitly to the result set to satisfy the schema of returning tuples. Below is an example with a ROLLUP, where the NULL value of an input column and the NULL values added explicitly to the output are highlighted in green and red, respectively.

– Create example orders table and insert sample records

CREATE TABLE orders(item_no int,description varchar,quantity int);

INSERT INTO orders(item_no,description,quantity) 
VALUES(101,'apples',10),(102,null,15),(103,'banana',20);

–View the data

SELECT * FROM orders;

Run rollup query on the data with NULLs. Observe that there are two output rows for item_no 102 .

SELECT item_no, description, sum(quantity) 
FROM orders 
GROUP BY ROLLUP(item_no, description) ORDER BY 1,2;

Before you Leave

Execute the following to clear out any changes you made during the demo to reset it.

DROP TABLE IF EXISTS supp_sample;

If you are done using your cluster, please think about deleting the CFN stack or to avoid having to pay for unused resources do these tasks:

  • pause your Redshift Cluster
  • stop the Oracle database
  • stop the DMS task