Masking Sensitive Data with UDF

Demo Video

Before you Begin

Capture the following parameters from the launched CloudFormation template as you will use them in the demo.

https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false
  • LambdaUDF

Client Tool

For this demo we will use the Query Editor v2 to connect to the Redshift cluster. Be sure to complete the SQL Client Tool setup before proceeding.

Challenge

Marie, the data engineer, has been tasked to ensure that PII data in the DW is masked and that users who should not have access to that information should see an encrypted value and users who should have access to that data can execute a function to securely decrypt that data. Marie determines the specification for her use-case. She will create two functions kms_encrypt and kms_decrypt. These functions will allow users who have access to this function to encrypt and decrypt any data. To solve this challenge Marie will do the following:

  1. Create a Lambda Function
  2. Create Redshift UDFs

Create a lambda function

SayDoShow
First, Marie will determine a KMS key to use for the encryption process. Let’s identify a key using the KMS service. Navigate to: https://console.aws.amazon.com/kms/home?region=us-east-1#/kms/keys note the Key ID
Next, Marie will need to create lambda function which references the key and can encrypt and decrypt data that is passed to it by Redshift. Navigate to: https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions and click on the <LambdaUDF> function name identified earlier
Before we look at the code, notice the environment variable kmskeyid is set to the KMS key identified earlier. Scroll to the Environment Variables
Let’s inspect the code. Notice the reference to event['external_function'] this references the name of the Redshift UDF; either kms_encrypt or kms_decrypt. The code will parse this to determine if the action should be to ‘encrypt’ or ‘decrypt’ the data. Highlight lines 7-9 of the code.
Next the code will loop through the event['arguments'] and for each input encrypt or decrypt the data. The response will be appended to the res array. It is important that the res array has the same number of records as the event['arguments'] and is returned in the same order. Highlight lines 10-12 of the code. Then highlight lines 16-22 of the code.
Note, that if the input is a None that implies a DB NULL and a None is appended. Also, if any individual row fails with an exception a None is also appended. Highlight lines 10-12 of the code. Then highlight lines 16-22 fo the code.
When the loop is complete, prepare the ret object with the success attribute set to True and the results attribute set to the response array. If the function doesn’t contain the word encrypt or decrypt, then set the success to False and populate an error_msg. Highlight lines 26-31 of the code.

Create Redshift UDFs

SayDoShow
Now that the Lambda function is in place, Marie can create the UDF mapped to the Lambda function. She will use the default IAM role to execute the lambda function. This role can execute the lambda function because previously it had been ammended to include the lambda:InvokeFunction permission.
Execute the following code replacing the values for <LambdaUDF>.

Amazon Redshift now supports attaching the default IAM role. If you have enabled the default IAM role in your cluster, you can use the default IAM role as follows.

CREATE OR REPLACE EXTERNAL FUNCTION kms_encrypt (value varchar)
RETURNS varchar(max) STABLE
LAMBDA '<LambdaUDF>'
IAM_ROLE default;

CREATE OR REPLACE EXTERNAL FUNCTION kms_decrypt (value varchar(max))
RETURNS varchar STABLE
LAMBDA '<LambdaUDF>'
IAM_ROLE default;

Let’s prepare a new table which will hold the encrypted email address field and update that field using the newly created kms_encrypt function. Execute the following code.

drop table if exists public.customer_pii;
create table public.customer_pii as
  select *, NULL::varchar(2000) as c_email_address_enc
  from public.customer limit 1000;
update public.customer_pii set c_email_address_enc = kms_encrypt(c_email_address);

Finally, Marie can query and see the at-rest encrypted data and leverage the new kms_decrypt function to retrieve the un-encrypted email address data. Execute the following code.

select c_email_address, kms_decrypt(c_email_address_enc), c_email_address_enc
from public.customer_pii;

Before you Leave

Execute the following to clear out any changes you made during the demo.

DROP TABLE public.customer_pii;

If you are done using your cluster, please think about deleting the CFN stack or to avoid having to pay for unused resources do these tasks:

  • pause your Redshift Cluster
  • stop the Oracle database
  • stop the DMS task