Salesforce Data Ingestion using AppFlow

Demo Video

Before you Begin

Capture the following parameters from the launched CloudFormation template, as you will use them throughout the demo.

https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false
  • RedshiftClusterEndpoint
  • RedshiftClusterPort
  • RedshiftClusterDatabase
  • RedshiftClusterUser
  • RedshiftClusterPassword
  • RedshiftClusterRoleName
  • TaskDataBucketName
  • EC2HostName
  • EC2DeveloperPassword

If you don’t have a free Salesforce developer account, create one by following the steps below. Note that you will need a Developer account for this demo: a Developer account supports API calls, whereas a basic Salesforce account does not.

  1. Go to the link: https://developer.salesforce.com/form/signup/freetrial.jsp
  2. Enter the details as shown in the screenshot and click on Sign me Up
  3. You will receive an email to verify your account; click on Verify account
  4. Change your password in the next screen
  5. Enter your mobile phone number when prompted
  6. You will receive a verification code on your mobile phone; enter it to complete verification
  7. Capture the below details related to your Salesforce developer account:

    • Salesforce Developer Account Username
    • Salesforce Developer Account Password

Client Tool

The screenshots for this demo use SQLWorkbench/J as the SQL client tool. Be sure to complete the SQL Client Tool setup before proceeding.

Challenge

Marie is Red Import’s data engineer. She has been assigned the task of ingesting Salesforce data into the Amazon Redshift data warehouse in order to run analytics on top of it. While ingesting data into Amazon Redshift, Marie has to perform basic validations to ensure data quality. In addition, she needs to mask PII data and apply filters to the Salesforce data so that only current-year data is ingested.

The architecture that Marie would follow is given below:

To solve this challenge, Marie will primarily use the Amazon AppFlow service to ingest data into Redshift. She follows the steps below:

  1. Create AppFlow connectors to Salesforce and Redshift
  2. Create tables in Redshift
  3. Create AppFlow data flows from Salesforce to Redshift
  4. Execute the AppFlow data flows
  5. Validate data in Redshift

Create Connectors

Marie will go to AppFlow to create a connection to Salesforce. Select AppFlow from the All Services menu

Click on the expand menu icon on the left

From the menu, click on Connections

From the Connectors drop-down, choose Salesforce

Click on Create Connection

While creating the Salesforce connection, Marie will be redirected to Salesforce’s login page, where she enters her Salesforce developer account credentials. She then allows Salesforce to access AppFlow and creates the connection. Make the selections/entries below and click Continue:

    SalesforceEnvironment: Production
    PrivateLink: Disabled
    ConnectionName: salesforce-free-trial-connection
    

You will be redirected to the Salesforce login URL. Enter the details below when prompted:

    <<Salesforce Developer Account Username>> 
    and <<Salesforce Developer Account Password>>
    

On the next page, choose Allow Access

You will then see the message “The connection salesforce-free-trial-connection was successfully created”

Next, Marie will create a connection to Redshift. From the Connectors drop-down, choose Redshift and click on Create Connection

Marie enters her Redshift cluster details when prompted and creates the connection. Enter the details below and click Connect:

    JDBCURL: jdbc:redshift://<<RedshiftClusterEndpoint>>:<<RedshiftClusterPort>>/<<RedshiftClusterDatabase>>
    BucketDetails: <<TaskDataBucketName>>
    Role: <<RedshiftClusterRoleName>>
    Username: <<RedshiftClusterUser>>
    Password: <<RedshiftClusterPassword>>
    ConnectionName: redshift-connection
    

Create Tables

Marie will now create the target tables in Redshift. To create tables, she needs SQL Workbench/J, which she has already installed on her workstation to connect to Redshift. The main CloudFormation template deploys a Windows EC2 instance with the required drivers and software for AWS SCT and SQL Workbench. Before the demo, capture all relevant parameters from the CloudFormation output that you’ll need in the subsequent steps. Connect to that EC2 instance using a Remote Desktop client:

    RemoteDesktopHost: <<EC2HostName>>
    Username: developer
    Password: <<EC2DeveloperPassword>>
    

Marie opens the SQL Workbench IDE and connects to the database. Open SQL Workbench from the taskbar shortcut, which opens the new connection window. There, click Manage Drivers in the bottom-left corner and select Redshift. Correct the driver path as shown below:

    C:\DMS Workshop\JDBC\RedshiftJDBC42-1.2.43.1067.jar
    

Marie then provides the connectivity settings and connects to the database. On the connectivity screen, enter the following:

    Driver: Amazon Redshift
    URL: jdbc:redshift://<<RedshiftClusterEndpoint>>:<<RedshiftClusterPort>>/<<RedshiftClusterDatabase>>
    Username: <<RedshiftClusterUser>>
    Password: <<RedshiftClusterPassword>>
    Autocommit: checked
    

Click Test, which should be successful, then click OK

Marie creates the table by executing the table DDL. Copy and paste the DDL below into SQL Workbench and execute it:
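
If you don’t have the demo’s DDL file handy, the following minimal sketch matches the columns used by the validation query at the end of this demo; the data types are assumptions and may differ from the original DDL:

    -- Target schema and table for the Salesforce Account object.
    -- Column list mirrors the validation query later in this demo;
    -- data types are assumptions, not the original demo DDL.
    create schema if not exists salesforce;

    create table if not exists salesforce.account
    (
        id             varchar(18),   -- Salesforce record ID; NULL ids are rejected by the flow validation
        type           varchar(255),
        industry       varchar(255),
        annualrevenue  numeric(18,2),
        createddate    timestamp,
        phone          varchar(40),   -- masked by AppFlow
        billingstreet  varchar(255),  -- masked by AppFlow
        shippingstreet varchar(255)   -- masked by AppFlow
    );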

Create Data Flow

Next, Marie will create an AppFlow data flow to move data from Salesforce to Redshift. From the menu on the left, click on Flows, then click Create Flow

Enter the Flow name and Flow description as given below:

    FlowName: salesforce_to_redshift_account_flow
    FlowDescription: Dataflow to move data from Salesforce to Redshift
    

Click Next

Now, AppFlow displays the Salesforce objects and Redshift tables to choose from. Marie makes the necessary selections for the Account object in Salesforce. Enter the source details as shown below:

    SourceName: Salesforce
    ChooseSalesforceConnection: salesforce-free-trial-connection
    Select SalesforceObjects radio button
    ChooseSalesforceObject: Account
    

Enter the destination details as shown below:

    DestinationName: Redshift
    ChooseAmazonRedshiftConnection: redshift-connection
    ChooseAmazonRedshiftObject: salesforce
    ChooseAmazonRedshiftTable: account
    

Leave all other options at their defaults and click Next

On the next screen, Marie can see all the available fields from the Account object in Salesforce and from the salesforce.account table in Redshift. Marie can either do the mapping using the UI or upload a mapping file. In this demo, we will upload a mapping file. Choose the Upload a .csv file with mapped fields option for Mapping method

Click on the Choose file button and select the mapping file below:
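
If you don’t have the demo’s mapping file handy, note that AppFlow accepts a .csv in which each row maps a source field name to a destination field name. A hypothetical example for this demo (the exact Salesforce field labels below are assumptions) might look like:

    Account ID,id
    Account Type,type
    Industry,industry
    Annual Revenue,annualrevenue
    Created Date,createddate
    Account Phone,phone
    Billing Street,billingstreet
    Shipping Street,shippingstreet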

Next, Marie wants to add validations to ensure that bad data doesn’t flow into Redshift. Specifically, she wants to ignore records with a NULL Account ID. Alternatively, she could choose to terminate the flow if a validation fails. Expand the Validations section at the bottom of the page and click on Add Validation

Enter the below details:

    FieldName: Account ID
    Condition: Values are missing or null
    Action: Ignore record
    

Now, Marie wants to mask PII data. Red Import has defined Street Address and Phone Number as PII. Marie will find those fields and mask them. She can choose to mask all values, the first # characters, or the last # characters. In the Mapped Fields search box, search for Street

Select the Billing Street and Shipping Street fields

Similarly, search for Phone

Select the Account Phone field

Click on Modify Values

In the window that pops up, choose Mask for Modification

Select the Mask all values radio button

Click Save

Click Next

As a next step, Marie wants to apply a filter to the source data in order to fetch current-year data only. Click on the Add Filter button

Choose Create Date from the Field name drop-down

Choose is on or after as the condition

Choose the current year’s Jan 1 date in Criteria 1

Click Next
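
For intuition, the filter Marie configures is roughly equivalent to the SQL predicate below. This is only a sketch of the semantics; AppFlow applies the filter on the Salesforce side, before the data reaches Redshift:

    -- Keep only records created on or after Jan 1 of the current year
    select *
    from salesforce.account
    where createddate >= date_trunc('year', current_date);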

Marie does a final review and creates the data flow. Review and click on Create Flow

Execute Data Flow

Marie now executes the data flow she created to load data from Salesforce to Redshift. Once the flow runs successfully, Marie can see the number of records processed and the size of the data transferred. Go to Flows > salesforce_to_redshift_account_flow

Click Run Flow

Navigate to the Run History tab

Validate Data

Finally, Marie connects to Redshift and runs queries to validate the data loaded into Redshift. She validates that the billingstreet, shippingstreet, and phone fields are masked, that the id field doesn’t contain any NULL values, and that createddate falls within the current year. Using SQL Workbench, run the query below:

    select id
    ,type
    ,industry
    ,annualrevenue
    ,createddate
    ,phone
    ,billingstreet
    ,shippingstreet
     from salesforce.account;
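
To make each check explicit, Marie can also run follow-up queries along these lines (a sketch; the masking check assumes AppFlow’s Mask all values option replaces field contents with * characters):

    -- Validation check: no records with a NULL id should have been loaded
    select count(*) from salesforce.account where id is null;

    -- Filter check: no records created before Jan 1 of the current year
    select count(*) from salesforce.account
    where createddate < date_trunc('year', current_date);

Both counts should return 0.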
    

Before you Leave

If you want to re-run the demo, make sure you run the following cleanup steps:

  1. Delete the Redshift tables/views:

    drop schema salesforce cascade;

  2. Navigate to S3 and empty your <<TaskDataBucketName>> bucket.
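
Alternatively, if you only want to re-run the flow without recreating the schema, truncating the target table avoids duplicate rows, assuming the flow appends rows on each run:

    -- Clear previously loaded rows before re-running the flow
    truncate table salesforce.account;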

If you are done using your cluster, please consider deleting the CFN stack or pausing your Redshift cluster to avoid paying for unused resources.