New Feature Demo - ALF-DataSharing

Amazon Redshift data sharing lets you securely share live, transactionally consistent data from one Amazon Redshift data warehouse with another Amazon Redshift data warehouse in the same AWS account, across accounts, and across Regions, without copying or moving data from one cluster to another. As organizations grow and democratize their data, administrators want to manage datashares centrally for governance and auditing, and to enforce fine-grained access control. The integration of Amazon Redshift data sharing with AWS Lake Formation lets Amazon Redshift customers centrally manage access to their Amazon Redshift datashares using Lake Formation. This demo guide shows how to centrally manage access and permissions for Amazon Redshift data sharing with Lake Formation.

Objective:

The data mesh is a decentralized, domain-oriented architecture that emphasizes separating data producers from data consumers through a centralized, federated Data Catalog. Typically, producers and consumers each run in their own AWS account.

Use-Case Description:

  • Data producers – Data producers own their data products and are responsible for building their data, maintaining its accuracy, and keeping their data products up to date. They determine which datasets can be published for consumption and share those datasets by registering them with the centralized Data Catalog in a central governance account. You might also have a producer steward or administrator persona who manages the data products together with the central governance stewards or administrators team.
  • Central governance account – Lake Formation enables fine-grained access management on the shared dataset. The centralized Data Catalog offers consumers the ability to quickly find shared datasets, allows administrators to centrally manage access permissions on shared datasets, and provides security teams the ability to audit and track data product usage across business units.
  • Data consumers – The data consumer obtains access to shared resources from the central governance account. These resources are available inside the consumer’s AWS Glue Data Catalog, allowing fine-grained access on the database and table that can be managed by the consumer’s data stewards and administrators.

Demo Steps:

The following steps provide an overview of how Amazon Redshift data sharing can be governed and managed by Lake Formation in the central governance pattern of a data mesh architecture:

  1. In the producer account, data objects are created and maintained in the Amazon Redshift producer cluster. A data warehouse admin creates the Amazon Redshift datashare and adds datasets (tables, views) to the share.
  2. The data warehouse admin grants and authorizes access on the datashare to the central governance account’s Data Catalog.
  3. In the central governance account, the data lake admin accepts the datashare and creates the AWS Glue database that points to the Amazon Redshift datashare so that Lake Formation can manage it.
  4. The data lake admin shares the AWS Glue database and tables to the consumer account using Lake Formation cross-account sharing.
  5. In the consumer account, the data lake admin accepts the resource share invitation via AWS Resource Access Manager (AWS RAM) and can view the database listed in the account.
  6. The data lake admin defines the fine-grained access control and grants permissions on databases and tables to IAM users (for this post, consumer1 and consumer2) in the account.
  7. In the consumer Amazon Redshift cluster, the data warehouse admin creates an Amazon Redshift database that points to the AWS Glue database and authorizes usage on the created Amazon Redshift database to the IAM users.
  8. The data analyst, signed in as an IAM user, can now use preferred tools such as the Amazon Redshift query editor to access the dataset according to the Lake Formation fine-grained permissions.

Create Amazon Redshift Data Producer Cluster

SayDoShow
Create a cluster subnet group in the data producer account

On the Amazon Redshift console, create a cluster subnet group. For more information, refer to Managing cluster subnet groups using the console.

Create a cluster in the data producer account

Create an Amazon Redshift cluster using the RA3 node type with encryption enabled.

  1. On the Amazon Redshift console, choose Create cluster.
  2. For Cluster identifier, enter a cluster name of your choice.
  3. For Preview track, choose preview_2022.
  4. For Node type, choose one of the RA3 node types; this feature is supported only on RA3 node types.
  5. For Number of nodes, enter the number of nodes that your cluster needs.
  6. Under Database configurations, enter the admin user name and admin user password.
  7. Under Cluster permissions, you can select an IAM role and set it as the default. The COPY commands later in this guide use iam_role default, so the default role needs read access to the source data in Amazon S3.
  8. Turn off the Use defaults option next to Additional configurations to modify the default settings.
  9. Under Network and security, specify the following:
       • For Virtual private cloud (VPC), choose the VPC in which you want to deploy the cluster.
       • For VPC security groups, either leave the default or add the security groups of your choice.
       • For Cluster subnet group, choose the cluster subnet group you created.
  10. Under Database configurations, in the Encryption section, select Use AWS Key Management Service (AWS KMS) or Use a hardware security module (HSM). Encryption is disabled by default.
  11. For Choose an AWS KMS key, either choose an existing AWS KMS key or choose Create an AWS KMS key to create a new key. For more information, refer to Creating keys.
  12. Choose Create cluster.
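
For readers who script their setup, the console choices above roughly map to the keyword arguments of boto3's Redshift create_cluster call. The sketch below only builds that request. The helper names, the single-node handling, and all example values are our own assumptions, and the Preview track setting is not modeled.

```python
# Sketch: kwargs for boto3's Redshift create_cluster call, mirroring the console
# choices above. Helper names and all values passed in are hypothetical.


def build_create_cluster_args(cluster_id, node_type, num_nodes, admin_user,
                              admin_password, subnet_group, security_group_ids,
                              kms_key_id=None, default_role_arn=None):
    """Return create_cluster kwargs for an encrypted RA3 producer cluster."""
    # This feature is supported only on RA3 node types (for example ra3.xlplus).
    if not node_type.startswith("ra3."):
        raise ValueError(f"RA3 node type required, got {node_type!r}")
    args = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
        "ClusterSubnetGroupName": subnet_group,
        "VpcSecurityGroupIds": list(security_group_ids),
        "Encrypted": True,  # the console leaves encryption off by default
    }
    if num_nodes == 1:
        args["ClusterType"] = "single-node"  # NumberOfNodes is not needed here
    else:
        args["ClusterType"] = "multi-node"
        args["NumberOfNodes"] = num_nodes
    if kms_key_id:
        args["KmsKeyId"] = kms_key_id  # when omitted, Redshift uses its default key
    if default_role_arn:
        # The default role is the one that "iam_role default" in COPY resolves to.
        args["IamRoles"] = [default_role_arn]
        args["DefaultIamRoleArn"] = default_role_arn
    return args


def create_cluster(args, region):
    """Submit the request; requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the builder works without boto3 installed
    return boto3.client("redshift", region_name=region).create_cluster(**args)
```

Building the request separately from sending it keeps the RA3 check and the single-node versus multi-node rule testable without touching AWS.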

Create tables in the producer Amazon Redshift cluster using Query Editor v2 on the Amazon Redshift console, signing in to the database with the admin credentials defined during cluster creation.

-- Customer address dimension from the TPC-DS schema.
-- Primary keys are informational in Amazon Redshift; they are not enforced.
create table customer_address
(
  ca_address_sk     int4 not null,
  ca_address_id     char(16) not null,
  ca_street_number  char(10),
  ca_street_name    varchar(60),
  ca_street_type    char(15),
  ca_suite_number   char(10),
  ca_city           varchar(60),
  ca_county         varchar(30),
  ca_state          char(2),
  ca_zip            char(10),
  ca_country        varchar(20),
  ca_gmt_offset     numeric(5,2),
  ca_location_type  char(20),
  primary key (ca_address_sk)
) distkey (ca_address_sk);

-- Customer demographics dimension.
create table customer_demographics
(
  cd_demo_sk             int4 not null,
  cd_gender              char(1),
  cd_marital_status      char(1),
  cd_education_status    char(20),
  cd_purchase_estimate   int4,
  cd_credit_rating       char(10),
  cd_dep_count           int4,
  cd_dep_employed_count  int4,
  cd_dep_college_count   int4,
  primary key (cd_demo_sk)
) distkey (cd_demo_sk);

-- Customer dimension; c_current_addr_sk and c_current_cdemo_sk join to the two tables above.
create table customer
(
  c_customer_sk           int4 not null,
  c_customer_id           char(16) not null,
  c_current_cdemo_sk      int4,
  c_current_hdemo_sk      int4,
  c_current_addr_sk       int4,
  c_first_shipto_date_sk  int4,
  c_first_sales_date_sk   int4,
  c_salutation            char(10),
  c_first_name            char(20),
  c_last_name             char(30),
  c_preferred_cust_flag   char(1),
  c_birth_day             int4,
  c_birth_month           int4,
  c_birth_year            int4,
  c_birth_country         varchar(20),
  c_login                 char(13),
  c_email_address         char(50),
  c_last_review_date_sk   int4,
  primary key (c_customer_sk)
) distkey (c_customer_sk);

-- View joining each customer to its current address and demographics.
create view customer_view
as
select
  c.c_customer_sk,
  c.c_customer_id,
  c.c_birth_year,
  c.c_birth_country,
  c.c_last_review_date_sk,
  ca.ca_city,
  ca.ca_state,
  ca.ca_zip,
  ca.ca_country,
  ca.ca_gmt_offset,
  cd.cd_gender,
  cd.cd_marital_status,
  cd.cd_education_status
from customer c
  join customer_address ca on c.c_current_addr_sk = ca.ca_address_sk
  join customer_demographics cd on c.c_current_cdemo_sk = cd.cd_demo_sk;

Load data into the producer Amazon Redshift cluster

copy customer_address from 's3://redshift-downloads/TPC-DS/2.13/3TB/customer_address/' iam_role default gzip delimiter '|' EMPTYASNULL region 'us-east-1';

copy customer_demographics from 's3://redshift-downloads/TPC-DS/2.13/3TB/customer_demographics/' iam_role default gzip delimiter '|' EMPTYASNULL region 'us-east-1';

copy customer from 's3://redshift-downloads/TPC-DS/2.13/3TB/customer/' iam_role default gzip delimiter '|' EMPTYASNULL region 'us-east-1';

Central Governance Account Login

SayDoShow

Sign in to the Central Governance account.

Set up Lake Formation permissions

  1. Sign in to the Lake Formation console as admin.
  2. In the navigation pane, under Data catalog, choose Settings.
  3. Deselect Use only IAM access control for new databases.
  4. Deselect Use only IAM access control for new tables in new databases.
  5. Choose Version 2 for Cross account version settings.
  6. Choose Save.
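
The same settings can likely be applied through the Lake Formation API. This sketch assumes that deselecting the two Use only IAM access control options corresponds to empty CreateDatabaseDefaultPermissions and CreateTableDefaultPermissions lists, and that Version 2 corresponds to the CROSS_ACCOUNT_VERSION parameter. apply_demo_settings and update_settings are hypothetical helper names.

```python
import copy


def apply_demo_settings(current):
    """Return a copy of a DataLakeSettings dict with this demo's settings applied."""
    settings = copy.deepcopy(current)  # leave the caller's dict untouched
    # Assumed equivalent of deselecting "Use only IAM access control" for
    # new databases and for new tables in new databases.
    settings["CreateDatabaseDefaultPermissions"] = []
    settings["CreateTableDefaultPermissions"] = []
    # Assumed equivalent of choosing Version 2 for cross-account version settings.
    params = settings.setdefault("Parameters", {})
    params["CROSS_ACCOUNT_VERSION"] = "2"
    return settings


def update_settings(region):
    """Read, modify, and write back the settings (requires boto3; not run here)."""
    import boto3
    lf = boto3.client("lakeformation", region_name=region)
    current = lf.get_data_lake_settings()["DataLakeSettings"]
    lf.put_data_lake_settings(DataLakeSettings=apply_demo_settings(current))
```

Reading the current settings first is deliberate: put_data_lake_settings writes the whole settings object, so starting from an empty dictionary could drop existing data lake administrators.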

Set up an IAM user as a data lake administrator:

If you’re using an existing data lake administrator user or role, add the following managed policies to it: AWSGlueServiceRole and AmazonRedshiftFullAccess.

If you’re not using an existing data lake administrator user or role, complete the following steps to set up an IAM user as a data lake administrator:

  1. On the IAM console, choose Users in the navigation pane.
  2. Select the IAM user who you want to designate as the data lake administrator.
  3. On the Permissions tab, choose Add an inline policy.
  4. Replace <AccountID> with your own account ID and add the following policy:

     {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Action": "iam:CreateServiceLinkedRole",
           "Resource": "*",
           "Condition": {
             "StringEquals": {
               "iam:AWSServiceName": "lakeformation.amazonaws.com"
             }
           }
         },
         {
           "Effect": "Allow",
           "Action": ["iam:PutRolePolicy"],
           "Resource": "arn:aws:iam::<AccountID>:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
         },
         {
           "Effect": "Allow",
           "Action": [
             "ram:AcceptResourceShareInvitation",
             "ram:RejectResourceShareInvitation",
             "ec2:DescribeAvailabilityZones",
             "ram:EnableSharingWithAwsOrganization"
           ],
           "Resource": "*"
         }
       ]
     }

  5. Provide a policy name.
  6. Review and save your settings.
  7. Choose Add permissions, and choose Attach existing policies directly.
  8. Add the following policies: AWSLakeFormationCrossAccountManager, AWSGlueConsoleFullAccess, AWSGlueServiceRole, AWSLakeFormationDataAdmin, AWSCloudShellFullAccess, and AmazonRedshiftFullAccess.
  9. Choose Next: Review, and then choose Add permissions.
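
Because the inline policy has to be edited by hand, a small generator can help avoid typos such as a malformed service-linked role path. This is a minimal sketch; build_lf_admin_inline_policy is a hypothetical name, and the 12-digit check is only a basic sanity check on the account ID.

```python
import json


def build_lf_admin_inline_policy(account_id):
    """Return the inline policy as JSON text with <AccountID> filled in."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("expected a 12-digit AWS account ID")
    # Note the hyphenated path segment: role/aws-service-role/...
    slr_arn = (f"arn:aws:iam::{account_id}:role/aws-service-role/"
               "lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {"StringEquals": {"iam:AWSServiceName": "lakeformation.amazonaws.com"}},
            },
            {"Effect": "Allow", "Action": ["iam:PutRolePolicy"], "Resource": slr_arn},
            {
                "Effect": "Allow",
                "Action": [
                    "ram:AcceptResourceShareInvitation",
                    "ram:RejectResourceShareInvitation",
                    "ec2:DescribeAvailabilityZones",
                    "ram:EnableSharingWithAwsOrganization",
                ],
                "Resource": "*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

The returned string could be pasted into the console or passed as PolicyDocument to the IAM put_user_policy API.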

Data Producer Account Login

SayDoShow

Sign in to the Data Producer account and create a datashare using the console:

  1. On the Amazon Redshift console, choose the cluster in which to create the datashare.
  2. On the cluster details page, navigate to the Datashares tab.
  3. Under Datashares created in my namespace, choose Connect to database.
  4. Choose Create datashare.
  5. For Datashare type, choose Datashare.
  6. For Datashare name, enter the name (for this post, demotahoeds).
  7. For Database name, choose the database from which to add datashare objects (for this post, dev).
  8. For Publicly accessible, choose Turn off (or choose Turn on to share the datashare with clusters that are publicly accessible).
  9. Under DataShare objects, choose Add to add the schema to the datashare (in this post, the public schema). Under Tables and views, choose Add to add the tables and views to the datashare (for this post, we add the table customer and the view customer_view).
  10. Under Data consumers, choose Publish to AWS Data Catalog.
  11. For Publish to the following accounts, choose Other AWS accounts. To share within the same account, choose Local account instead.
  12. Provide the AWS account ID of the consumer account. For this post, we provide the AWS account ID of the Lake Formation central governance account.
  13. Choose Create datashare.
  14. After the datashare is created, you can verify it by going back to the Datashares tab and entering the datashare name in the search bar under Datashares created in my namespace.
  15. Choose the datashare name to view its details.
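
The console steps above can also be expressed in SQL on the producer cluster. The sketch below only generates the statements. It assumes the CREATE DATASHARE, ALTER DATASHARE, and GRANT USAGE ON DATASHARE ... VIA DATA CATALOG syntax, and that views are added with ADD TABLE; check these against the Amazon Redshift SQL reference before running them in Query Editor v2. datashare_sql is a hypothetical helper, and the account ID in the example is a placeholder.

```python
def datashare_sql(share, schema, objects, catalog_account_id, publicly_accessible=False):
    """SQL roughly equivalent to the console steps (hypothetical helper).

    Identifiers are interpolated as-is, so only pass trusted, simple names.
    """
    flag = "TRUE" if publicly_accessible else "FALSE"
    statements = [
        f"CREATE DATASHARE {share} SET PUBLICACCESSIBLE {flag};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Assumption: ADD TABLE is used for the view as well as the table.
    statements += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{obj};" for obj in objects]
    # Publish the datashare to the AWS Data Catalog of the central governance account.
    statements.append(
        f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{catalog_account_id}' VIA DATA CATALOG;"
    )
    return statements


if __name__ == "__main__":
    for stmt in datashare_sql("demotahoeds", "public", ["customer", "customer_view"], "111122223333"):
        print(stmt)
```

The datashare still has to be authorized for the Data Catalog consumer afterwards, as described next.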

Authorize the datashare access to the consumer account data catalog

  1. Under Data consumers, the consumer status of the consumer Data Catalog account shows as Pending Authorization.
  2. Select the check box next to the consumer Data Catalog to enable the Authorize option.
  3. Choose Authorize to authorize the datashare access to the consumer account Data Catalog. The consumer status changes to Authorized.
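
Authorization can also be scripted. As we understand the Redshift API, a Data Catalog consumer is identified to authorize_data_share as DataCatalog/<account-id>. The helper functions here and the example values in the usage are hypothetical.

```python
def datashare_arn(region, account_id, namespace, share):
    """Build a datashare ARN in the form used by the CLI commands in this guide."""
    return f"arn:aws:redshift:{region}:{account_id}:datashare:{namespace}/{share}"


def authorize_args(share_arn, catalog_account_id):
    """kwargs for the Redshift authorize_data_share call."""
    # A Data Catalog consumer is identified as DataCatalog/<account-id>.
    return {"DataShareArn": share_arn,
            "ConsumerIdentifier": f"DataCatalog/{catalog_account_id}"}


def authorize(args, region):
    """Send the request (requires boto3 and producer-account credentials)."""
    import boto3
    return boto3.client("redshift", region_name=region).authorize_data_share(**args)
```

For example, authorize_args(datashare_arn("us-east-1", "111122223333", "<producer-cluster-namespace>", "demotahoeds"), "444455556666") would target the central governance account 444455556666 (a placeholder ID).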

Central Governance Account: Data lake admin accepts and registers the datashare with Lake Formation

SayDoShow

Sign in to the Central Governance account. The data lake admin accepts and registers the datashare with Lake Formation in the central governance account.

  1. Sign in to the console as the data lake administrator IAM user or role.
  2. If this is your first time signing in to the Lake Formation console, select Add myself and choose Get started.
  3. Under Data catalog in the navigation pane, choose Data sharing and view the Amazon Redshift datashare invitations on the Configuration tab.
  4. Select the datashare and choose Review invitation. A window opens with the details of the invitation.
  5. Choose Accept to register the Amazon Redshift datashare with the AWS Glue Data Catalog.
  6. Provide a name for the AWS Glue database (for this post, demotahoedb) and choose Skip to Review and create.
  7. Review the content and choose Create database.
  8. After the AWS Glue database is created on the Amazon Redshift datashare, you can view it under Shared databases.

You can also use the AWS CLI to register the datashare and create the database:

  1. Describe the Amazon Redshift datashare that is shared with the central account:

     aws redshift describe-data-shares

  2. Accept and associate the Amazon Redshift datashare with the Data Catalog:

     aws redshift associate-data-share-consumer \
       --data-share-arn 'arn:aws:redshift:<producer-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/demotahoeds' \
       --consumer-arn arn:aws:glue:<central-region>:<central-aws-account-id>:catalog

  3. Register the Amazon Redshift datashare in Lake Formation:

     aws lakeformation register-resource \
       --resource-arn arn:aws:redshift:<producer-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/demotahoeds

  4. Create the AWS Glue database that points to the accepted Amazon Redshift datashare:

     aws glue create-database --region <central-region> --cli-input-json '{
       "CatalogId": "<central-aws-account-id>",
       "DatabaseInput": {
         "Name": "demotahoedb",
         "FederatedDatabase": {
           "Identifier": "arn:aws:redshift:<producer-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/demotahoeds",
           "ConnectionName": "aws:redshift"
         }
       }
     }'
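
If you prefer Python to the AWS CLI, the three write calls (steps 2 to 4) map to boto3 operations with the same parameters. This sketch assumes those boto3 operation names and only assembles the request payloads; central_account_requests and run are hypothetical helpers.

```python
def central_account_requests(region, central_account_id, share_arn, glue_db_name):
    """Request payloads for CLI steps 2-4, keyed by "service.operation"."""
    return {
        # Step 2: accept the datashare and associate it with the Data Catalog.
        "redshift.associate_data_share_consumer": {
            "DataShareArn": share_arn,
            "ConsumerArn": f"arn:aws:glue:{region}:{central_account_id}:catalog",
        },
        # Step 3: register the datashare with Lake Formation.
        "lakeformation.register_resource": {"ResourceArn": share_arn},
        # Step 4: create the federated AWS Glue database over the datashare.
        "glue.create_database": {
            "CatalogId": central_account_id,
            "DatabaseInput": {
                "Name": glue_db_name,
                "FederatedDatabase": {"Identifier": share_arn, "ConnectionName": "aws:redshift"},
            },
        },
    }


def run(requests, region):
    """Send the requests in order (requires boto3 and central-account credentials)."""
    import boto3
    for key, kwargs in requests.items():  # dicts keep insertion order
        service, operation = key.split(".")
        getattr(boto3.client(service, region_name=region), operation)(**kwargs)
```

Keeping the requests as plain dictionaries makes them easy to review before anything is sent, and the dictionary order preserves the step order.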

Grant datashare access to the data consumer

Now the data lake administrator of the central governance account can view the database and tables and share access to both with the data consumer account using the Lake Formation cross-account sharing feature.

  1. On the Lake Formation console, under Permissions in the navigation pane, choose Data Lake permissions.
  2. Choose Grant.
  3. Under Principals, select External accounts.
  4. Provide the data consumer account ID (for this post, 665544332211).
  5. Under LF-Tags or catalog resources, select Named data catalog resources.
  6. For Databases, choose the database demotahoedb.
  7. Select Describe for both Database permissions and Grantable permissions.
  8. Choose Grant to apply the permissions.

Grant the data consumer account permissions on tables

  1. On the Lake Formation console, under Permissions in the navigation pane, choose Data Lake permissions.
  2. Choose Grant.
  3. Under Principals, select External accounts.
  4. Provide the data consumer account ID (for this post, 665544332211).
  5. Under LF-Tags or catalog resources, select Named data catalog resources.
  6. For Databases, choose the database demotahoedb.
  7. For Tables, choose All tables.
  8. Select Describe and Select for both Table permissions and Grantable permissions.
  9. Choose Grant to apply the changes.
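
Both grants above can be expressed as Lake Formation grant_permissions requests. In this sketch, passing the central governance account ID as catalog_id is an assumption about where the shared database lives, and the helper names are hypothetical.

```python
def external_account_grants(consumer_account_id, database, catalog_id):
    """grant_permissions kwargs for the two cross-account grants (hypothetical helper)."""
    db_grant = {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {"Database": {"CatalogId": catalog_id, "Name": database}},
        "Permissions": ["DESCRIBE"],
        "PermissionsWithGrantOption": ["DESCRIBE"],  # lets the consumer admin re-grant
    }
    table_grant = {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        # TableWildcard is the API counterpart of choosing All tables.
        "Resource": {"Table": {"CatalogId": catalog_id, "DatabaseName": database,
                               "TableWildcard": {}}},
        "Permissions": ["DESCRIBE", "SELECT"],
        "PermissionsWithGrantOption": ["DESCRIBE", "SELECT"],
    }
    return [db_grant, table_grant]


def apply(grants, region):
    """Send each grant (requires boto3 and central-account data lake admin credentials)."""
    import boto3
    lf = boto3.client("lakeformation", region_name=region)
    for grant in grants:
        lf.grant_permissions(**grant)
```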

Data Consumer Account: Consumer admin receives the shared resources and delegates access to other users

SayDoShow

Sign in to the Data Consumer account and accept the resources shared with it from the central governance account.

  1. Sign in to the console as the data lake administrator IAM user or role.
  2. If this is your first time signing in to the Lake Formation console, select Add myself and choose Get started.
  3. Open the AWS RAM console.
  4. In the navigation pane, under Shared with me, choose Resource shares to view the pending invitations. You should see two invitations.
  5. Choose each pending invitation and accept the resource share.
  6. On the Lake Formation console, under Data catalog in the navigation pane, choose Databases to view the cross-account shared database.
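
Accepting the invitations can also be scripted with AWS RAM. This minimal sketch picks out pending invitations, optionally only those sent by the central governance account, and ignores pagination; the helper names are hypothetical.

```python
def pending_invitation_arns(response, sender_account_id=None):
    """ARNs of PENDING invitations in a get_resource_share_invitations response."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in response.get("resourceShareInvitations", [])
        if inv.get("status") == "PENDING"
        and (sender_account_id is None or inv.get("senderAccountId") == sender_account_id)
    ]


def accept_pending(sender_account_id, region):
    """Accept pending invitations from one sender (requires boto3; not run here)."""
    import boto3
    ram = boto3.client("ram", region_name=region)
    response = ram.get_resource_share_invitations()  # first page only
    for arn in pending_invitation_arns(response, sender_account_id):
        ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
```

Filtering on the sender account keeps the script from accepting unrelated shares that happen to be pending in the consumer account.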

Grant access to the data analyst and IAM users using Lake Formation

Now the data lake admin in the data consumer account can delegate permissions on the shared database and tables to consumer1 and consumer2 users in the consumer account.

Grant database permissions to consumer1 and consumer2:

  1. On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
  2. Select the database demotahoedb and on the Actions menu, choose Grant.
  3. Under Principals, select IAM users and roles.
  4. Choose the IAM users consumer1 and consumer2.
  5. Under LF-Tags or catalog resources, demotahoedb is already selected for Databases.
  6. Select Describe for Database permissions.
  7. Choose Grant to apply the permissions.

Grant table permissions to consumer1

To grant the IAM user consumer1 permissions on table public.customer, follow these steps:

  1. Under Data catalog in the navigation pane, choose Databases.
  2. Select the database demotahoedb and on the Actions menu, choose Grant.
  3. Under Principals, select IAM users and roles.
  4. Choose IAM user consumer1.
  5. Under LF-Tags or catalog resources, demotahoedb is already selected for Databases.
  6. For Tables, choose public.customer.
  7. Select Describe and Select for Table permissions.
  8. Choose Grant to apply the permissions.

Grant column permissions to consumer2

To grant the IAM user consumer2 permissions on non-sensitive columns in public.customer_view, follow these steps:

  1. Under Data catalog in the navigation pane, choose Databases.
  2. Select the database demotahoedb and on the Actions menu, choose Grant.
  3. Under Principals, select IAM users and roles.
  4. Choose the IAM user consumer2.
  5. Under LF-Tags or catalog resources, demotahoedb is already selected for Databases.
  6. For Tables, choose public.customer_view.
  7. For Table permissions, select Select.
  8. Under Data Permissions, select Column-based access.
  9. Select Include columns and choose the non-sensitive columns (c_customer_id, c_birth_country, cd_gender, cd_marital_status, and cd_education_status).
  10. Choose Grant to apply the permissions.
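
The three consumer-side grants above can be written as grant_permissions requests too, with column-based access expressed through a TableWithColumns resource. This sketch assumes that the shared tables appear in the catalog as public.customer and public.customer_view (as in the console), that the grants target the database in the central governance account's catalog, and that consumer1 and consumer2 are IAM users in the consumer account. consumer_grants is a hypothetical helper.

```python
def consumer_grants(consumer_account_id, database, owner_catalog_id):
    """grant_permissions kwargs for consumer1 and consumer2 (hypothetical helper)."""

    def user(name):
        # Lake Formation identifies IAM users by their ARN.
        return {"DataLakePrincipalIdentifier": f"arn:aws:iam::{consumer_account_id}:user/{name}"}

    # Describe on the shared database for both users.
    grants = [
        {"Principal": user(name),
         "Resource": {"Database": {"CatalogId": owner_catalog_id, "Name": database}},
         "Permissions": ["DESCRIBE"]}
        for name in ("consumer1", "consumer2")
    ]
    # consumer1: Describe and Select on the whole public.customer table.
    grants.append({
        "Principal": user("consumer1"),
        "Resource": {"Table": {"CatalogId": owner_catalog_id, "DatabaseName": database,
                               "Name": "public.customer"}},
        "Permissions": ["DESCRIBE", "SELECT"],
    })
    # consumer2: Select on the non-sensitive columns of public.customer_view only.
    grants.append({
        "Principal": user("consumer2"),
        "Resource": {"TableWithColumns": {
            "CatalogId": owner_catalog_id, "DatabaseName": database,
            "Name": "public.customer_view",
            "ColumnNames": ["c_customer_id", "c_birth_country", "cd_gender",
                            "cd_marital_status", "cd_education_status"]}},
        "Permissions": ["SELECT"],
    })
    return grants


def apply(grants, region):
    """Send each grant (requires boto3 and consumer-account data lake admin credentials)."""
    import boto3
    lf = boto3.client("lakeformation", region_name=region)
    for grant in grants:
        lf.grant_permissions(**grant)
```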

Consume the datashare from the data consumer account in the Amazon Redshift cluster

In the Amazon Redshift consumer data warehouse, log in as the admin user using Query Editor V2 and complete the following steps:

  1. Create the Amazon Redshift database from the shared catalog database using the following SQL command:

     CREATE DATABASE demotahoedb FROM ARN 'arn:aws:glue:<central-region>:<central-aws-account-id>:database/demotahoedb' WITH DATA CATALOG SCHEMA demotahoedb;

  2. Run the following SQL commands to create the IAM users consumer1 and consumer2 as Amazon Redshift users and grant them usage on the Amazon Redshift database:

     CREATE USER IAM:consumer1 password disable;
     CREATE USER IAM:consumer2 password disable;
     GRANT USAGE ON DATABASE demotahoedb TO IAM:consumer1;
     GRANT USAGE ON DATABASE demotahoedb TO IAM:consumer2;

Use a federated identity to enforce Lake Formation permissions

Follow these steps to configure Query Editor v2 to use IAM credentials:

  1. Choose the settings icon in the bottom left corner of the Query Editor v2, then choose Account settings.
  2. Under Connection settings, select Authenticate with IAM credentials.
  3. Choose Save.

Query the shared datasets as a consumer user: validate that the IAM user consumer1 has datashare access from Amazon Redshift

  1. Sign in to the console as IAM user consumer1, using the ConsoleIAMLoginURL and LFUsersCredentials values saved when the consumer1 user was created with AWS CloudFormation.
  2. On the Amazon Redshift console, choose Query Editor V2 in the navigation pane.
  3. To connect to the consumer cluster, choose it in the tree-view pane.
  4. When prompted, for Authentication, select Temporary credentials using your IAM identity.
  5. For Database, enter the database name (for this post, dev).
  6. The user name is mapped to your current IAM identity (for this post, consumer1).
  7. Choose Save.
  8. Once you’re connected to the database, validate the current logged-in user with the following SQL command:

     select current_user;

  9. To list the federated databases in the central governance account's Data Catalog, run the following SQL command, replacing <central-aws-account-id> with the central governance account ID:

     SHOW DATABASES FROM DATA CATALOG ACCOUNT '<central-aws-account-id>';

  10. To validate the permissions for consumer1, run the following SQL command:

     select * from demotahoedb.public.customer limit 10;

Validate that the IAM user consumer2 doesn’t have access to the datashare table public.customer on the same consumer cluster.

  1. Log out of the console and sign in as IAM user consumer2.
  2. Follow the same steps to connect to the database using the query editor.
  3. Once connected, run the same query:
select * from demotahoedb.public.customer limit 10;

The user consumer2 should get a permission denied error.

Validate the column-level access permissions of consumer2 on the public.customer_view view

Connect to Query Editor v2 as consumer2 and run the following SQL command:

select c_customer_id,c_birth_country,cd_gender,cd_marital_status from demotahoedb.public.customer_view limit 10;

You can see that consumer2 is only able to access columns that were granted access by Lake Formation.
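
To keep the validation steps straight, the expected outcomes can be written down as a small local model of the grants made in this guide. This is a hypothetical checklist, not how Lake Formation evaluates permissions, and it only handles explicit column lists (not select *).

```python
# Hypothetical local model of this guide's grants; None means all columns.
EXPECTED_GRANTS = {
    "consumer1": {"public.customer": None},
    "consumer2": {"public.customer_view": {"c_customer_id", "c_birth_country", "cd_gender",
                                           "cd_marital_status", "cd_education_status"}},
}


def should_succeed(user, table, columns):
    """Predict whether selecting `columns` from `table` as `user` should succeed."""
    table_grants = EXPECTED_GRANTS.get(user, {})
    if table not in table_grants:
        return False  # no table grant: expect a permission denied error
    allowed = table_grants[table]
    return allowed is None or set(columns) <= allowed
```

With this model, consumer1 reading public.customer is expected to succeed, consumer2 reading public.customer is expected to fail, and consumer2 reading the four columns in the query above from public.customer_view is expected to succeed.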