SQL Server Migration

Demo Video

Before you Begin

Capture the following outputs from the launched CloudFormation template as you will use them in the demo.

https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false
  • EC2DeveloperPassword
  • EC2HostName
  • RedshiftClusterDatabase
  • RedshiftClusterEndpoint
  • RedshiftClusterPassword
  • RedshiftClusterPort
  • RedshiftClusterUser
  • SourceSQLServerUser
  • SourceSQLServerPassword
  • SourceSqlServerPort

Client Tool

If you do not have a Remote Desktop client installed, you can use the instructions at this link to install one.

Challenge

Marie has been asked to migrate Red Imports’ on-premises SQL Server data warehouse to Amazon Redshift in their AWS account. The SQL Server data warehouse hosts business-critical applications written in T-SQL. She also needs to keep the SQL Server and Redshift data warehouses in sync until all legacy applications are migrated to Redshift. Marie will need to convert the DDL structures and T-SQL code to Redshift and perform a full data load from SQL Server to Redshift. She will also need to set up ongoing replication from SQL Server to Redshift to keep the data in sync. Marie will accomplish these tasks by doing the following:

Architecture

Convert the Schema

SayDoShow

Marie logs into her workstation, on which she has already installed the required tools: the AWS Schema Conversion Tool (AWS SCT) for the migration, SQL Workbench/J to connect to Redshift, and SQL Server Management Studio to connect to SQL Server.

Replace <EC2HostName> and <EC2DeveloperPassword> with the values previously captured.

The CloudFormation template deploys a Windows EC2 instance with the required drivers and software for AWS SCT, SQL Workbench/J, and SQL Server Management Studio already installed. Before the demo, note down all relevant parameters from the CloudFormation outputs that you’ll need in the subsequent steps.

Remote Desktop Host: <EC2HostName>  

Username: developer
Password: <EC2DeveloperPassword>

Once logged in, she opens the AWS Schema Conversion Tool and creates a new project, selecting OLAP followed by Microsoft SQL Server DW for this migration. Open AWS SCT using the taskbar shortcut and click:

File > New project > Data Warehouse (OLAP) > OK

Now, she needs to configure the connectivity for the SQL Server source system.

Replace the values for <SourceSQLServerPort>, <SourceSQLServerUser>, and <SourceSQLServerPassword> with the values previously captured.

Click Connect to Microsoft SQL Server DW and input the parameters below:

Server Name: localhost  
Server port: <SourceSQLServerPort>  
Instance name: <leave this blank>
Authentication: SQL Server Authentication
Username: <SourceSQLServerUser>  
Password: <SourceSQLServerPassword>  
DriverPath: C:\DMS Workshop\JDBC\sqljdbc_7.4\enu\mssql-jdbc-7.4.1.jre8.jar

Click Test. It should show a successful connection. Then click OK.

Now she needs to configure the connectivity for the target system, Redshift.

Replace the values for <RedshiftClusterEndpoint>, <RedshiftClusterPort>, <RedshiftClusterDatabase>, <RedshiftClusterUser>, and <RedshiftClusterPassword> with the values previously captured.

Input the parameters below in the Connection tab:

Servername: <RedshiftClusterEndpoint>
Port: <RedshiftClusterPort>
Database: <RedshiftClusterDatabase>
Username: <RedshiftClusterUser>
Password: <RedshiftClusterPassword>

Click Test. It should show connection successful. Then click OK.

On the left-hand side, she can see all SQL Server objects, and on the right-hand side, all Redshift objects. She can now convert the dms_sample schema from SQL Server to Redshift.

Right-click the dms_sample schema in the list of SQL Server database objects (on the left side of your screen) and click Convert schema.

Click the default options Yes and Continue when prompted on the next screens about already existing objects and source database statistics.

AWS SCT has now converted all the SQL Server objects it could to Redshift syntax. Some objects are marked with a red exclamation point, which means SCT couldn’t fully migrate them. Marie can view an assessment summary to get more information on the SQL Server to Redshift conversion. Click View > Assessment Report view

AWS SCT produces an executive summary describing the overall migration effort. The summary describes which objects could and couldn’t be migrated successfully using SCT. If she clicks on the Action Items page, it also shows what actions Marie needs to take to convert the objects SCT could not handle automatically. Click the Action Items tab to show the action items.
Marie would like to see how AWS SCT handles converting a stored procedure. She clicks on the ‘generate_tickets’ procedure. AWS SCT shows SQL Server/Redshift code side by side and highlights all code that could not be migrated. On the top action items screen, she can also see the automated recommendations on how to resolve these issues.
For example, she can see that the SQL Server CHECKSUM function is used in this stored procedure. Redshift does not support the CHECKSUM function so AWS SCT could not convert that procedure. Marie would need to manually fix this code to convert this SQL Server specific function to Redshift.

Select the first stored procedure generate_tickets and explain various tabs shown on the screen.
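To see what a manual fix could look like, one common approach is to replace the scalar CHECKSUM call with a Redshift hash function such as FNV_HASH or MD5. The snippet below is only an illustrative sketch, not the actual change needed in generate_tickets; the column names are assumptions, and the hash values will not match SQL Server’s CHECKSUM output.

-- SQL Server (scalar CHECKSUM over columns):
--   SELECT CHECKSUM(first_name, last_name) FROM dbo.person;
-- A possible Redshift rewrite using FNV_HASH (column names are illustrative):
select fnv_hash(first_name || '|' || last_name) as row_checksum
from dms_sample_dbo.person;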
Now, to actually create the successfully converted objects in Redshift, she’ll navigate to the Redshift object list (right side of her screen). She notices that SCT automatically adds the database name to the target schema: the generated Redshift schema name combines the source database name with the source schema name. She then right-clicks dms_sample_dbo in the Redshift object list and clicks Apply to database, which creates these objects in Redshift. Right-click dms_sample_dbo in the Redshift objects list on the right-side tab and click Apply to database.

When prompted with "Are you sure?" on the following screen, click the default option Yes.

Now, Marie would like to verify that the objects were created in Redshift using a SQL client such as SQL Workbench/J. She opens SQL Workbench/J and creates a new profile. If needed, she can modify the driver path settings for Redshift prior to connecting. She then enters the connectivity settings.

Replace the values for <RedshiftClusterEndpoint>, <RedshiftClusterPort>, <RedshiftClusterDatabase>, <RedshiftClusterUser>, and <RedshiftClusterPassword> with the values previously captured.

Make sure you’re in the ‘Redshift Database’ default group.
On the connectivity screen, enter the following inputs:

Driver: Amazon Redshift
URL: jdbc:redshift://<RedshiftClusterEndpoint>:<RedshiftClusterPort>/<RedshiftClusterDatabase>
Username: <RedshiftClusterUser>
Password: <RedshiftClusterPassword>
Autocommit: checked

Click Test, which should be successful; then click OK.
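Optionally, before listing the tables, a quick sanity check (not part of the original steps) confirms that the schema created by SCT exists in Redshift:

select nspname
from pg_namespace
where nspname = 'dms_sample_dbo';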

Marie would now like to see what tables are in the dms_sample_dbo schema.

All these tables were successfully created in the Redshift schema, but they don’t have any data in them yet. AWS SCT took care of converting the code from SQL Server syntax to Redshift syntax but did not migrate the data. Next, she’ll migrate data from SQL Server to Redshift using a service called AWS Database Migration Service (DMS).

Open SQL Workbench/J and execute the query below to get a list of all tables in the dms_sample_dbo schema along with the row count for each table:

select t.table_schema, t.table_name, i.tbl_rows
from svv_tables t
left join svv_table_info i
  on t.table_schema = i."schema" and t.table_name = i."table"
where t.table_type = 'BASE TABLE' and t.table_schema = 'dms_sample_dbo'
order by 1, 2;

Before she migrates the data, she connects to SQL Server and verifies the dataset. Open SQL Server Management Studio from the taskbar shortcut and click the new connection button at the top left. Input the parameters below:

Server type: Database Engine  
Server name: localhost  
Authentication: Windows Authentication  

Click Connect and then click the New Query option.

She can see a decent number of records in the ticketing dataset. To migrate these records, she’ll use AWS Database Migration Service. In SQL Server Management Studio, click New Query and execute this query:

use dms_sample;

SELECT SCHEMA_NAME(t.[schema_id]) AS [table_schema],
OBJECT_NAME(p.[object_id]) AS [table_name], SUM(p.[rows]) AS [row_count]
FROM [sys].[partitions] p
INNER JOIN [sys].[tables] t ON p.[object_id] = t.[object_id]
WHERE p.[index_id] < 2
GROUP BY p.[object_id],t.[schema_id]
ORDER BY 3

Before going to AWS DMS, she needs to provide access for AWS Database Migration Service (DMS) to use MS Replication so that it can perform ongoing synchronization using change data capture (CDC). She does not need this extra step if she only needs a one-time full load of the dataset.

In SQL Server Management Studio, open the script from C:\DMS Workshop\Scripts\execute_sqlserver.sql and click Execute to enable MS Replication.

If you execute the script more than once, you may see an error that you can safely ignore: The server is already defined as a Distributor/Publisher.
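For reference, the sketch below shows the kind of commands such a script typically runs to configure distribution and publishing. It is illustrative only; the shipped script in C:\DMS Workshop\Scripts is the authoritative version and may differ.

use master;
-- Configure this instance as its own Distributor (may report it is already defined as a Distributor)
exec sp_adddistributor @distributor = @@servername, @password = N'<StrongPassword>'; -- placeholder password
exec sp_adddistributiondb @database = N'distribution';
-- Enable the source database for publication
exec sp_replicationdboption @dbname = N'dms_sample', @optname = N'publish', @value = N'true';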

Migrate the data

SayDoShow
To prepare for the following steps with AWS DMS, a transformation rule was applied to add the prefix dms_sample to the schema name. Now, Marie opens her browser, logs into the AWS console, and navigates to the DMS page. Here she has already created a task to replicate all existing data as well as ongoing changes from SQL Server to Redshift. Open your browser and log into the AWS console. Navigate to the AWS DMS page and click Database migration tasks. You should see a sqlserver-replication task in the Ready state.
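The schema-prefix transformation mentioned above is defined in the task’s table mappings. The JSON below is only a sketch of what such a rule typically looks like; the actual task in the account is already configured, and the rule id/name and the trailing underscore in the prefix value are assumptions made so that dbo becomes dms_sample_dbo.

{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "1",
      "rule-name": "add-schema-prefix",
      "rule-target": "schema",
      "object-locator": { "schema-name": "dbo" },
      "rule-action": "add-prefix",
      "value": "dms_sample_"
    }
  ]
}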

Now, she will run the migration, which will migrate all existing data from SQL Server to Redshift. It will also use change data capture (CDC) to continuously monitor SQL Server for ongoing changes and apply them to Redshift. This should take about 10 minutes to complete. Select the replication task, then click Actions > Restart/Resume.
If Marie wants to view statistics on what data is being transferred, she can go to the task summary page to view some details. The Table statistics tab allows her to view how many records are being transferred via DMS. Click the full load task and click Table statistics.

You may see some errors for tables that have an identity column; this is a known DMS issue that is being addressed in release 3.4.1. Please ignore those errors for now. Alternatively, you may change all IDENTITY columns to GENERATED BY DEFAULT AS IDENTITY in Redshift to avoid this error.
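If she chooses that alternative, the change is to declare the identity column as GENERATED BY DEFAULT AS IDENTITY in the Redshift table DDL. A minimal, hypothetical example (table and column names are for illustration only):

create table dms_sample_dbo.example_with_identity (
    id bigint generated by default as identity(1,1),
    name varchar(100)
);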

She can also verify that the data is getting migrated to Redshift using SQL Workbench/J. Using this query, she can see all these tables are now loaded with data in Redshift.

Execute the same command executed earlier in SQL Workbench/J to view the record count.

select t.table_schema, t.table_name, i.tbl_rows
from svv_tables t
left join svv_table_info i
  on t.table_schema = i."schema" and t.table_name = i."table"
where t.table_type = 'BASE TABLE' and t.table_schema = 'dms_sample_dbo'
order by 1, 2;

Keep the data in sync

SayDoShow
Now, Marie wants to verify the change data capture functionality.

She has a table called sport_type that has 2 rows in it. She will insert a few rows into this table in the source data warehouse (SQL Server) and see if they get migrated to Redshift.

Execute the command below in SQL Workbench/J to view the data in the sport_type table.


select * from dms_sample_dbo.sport_type order by 1;

She can view this table in SQL Server as well. It also shows only 2 records.

Execute the command below in SQL Server Management Studio to view the data in the sport_type table.

use dms_sample;
select * from dms_sample.dbo.sport_type order by 1;

Marie wants to confirm that the CDC process is operational. To do that, she will insert some records into the sport_type table in SQL Server. Execute the command below in SQL Server Management Studio to insert five additional records into the sport_type table.

use dms_sample;

insert into dbo.sport_type(name,description)
values
('hockey','A sport in which two teams play against each other by trying to move a puck into the opponents goal using a hockey stick'),
('basketball','A sport in which two teams of five players each that oppose one another shoot a basketball through the defenders hoop'),
('soccer','A sport played with a spherical ball between two teams of eleven players'),
('volleyball','two teams of six players are separated by a net and each team tries to score by grounding a ball on the others court'),
('cricket','A bat-and-ball game between two teams of eleven players on a field with a wicket at each end');

When she views the table again, she should see seven records in total. Execute the command below in SQL Server Management Studio to view the data in the sport_type table.

use dms_sample;
select * from dms_sample.dbo.sport_type order by 1;

When she goes back to Redshift, she should see the five newly inserted records there as well, for a total of seven. Execute the command below in SQL Workbench/J to view the data in the sport_type table.

select * from dms_sample_dbo.sport_type order by 1;
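Optionally, a quick row count on each side confirms both systems are in sync; both queries should now return 7.

-- In SQL Workbench/J (Redshift):
select count(*) from dms_sample_dbo.sport_type;
-- In SQL Server Management Studio:
select count(*) from dms_sample.dbo.sport_type;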

If she wants to view the logs of the CDC process, she can use the DMS console to see a table of metrics that shows a number of key statistics for each table.
Navigate to the DMS page in the AWS console, select the ongoing replication task, and click Table statistics.

Before you Leave

To reset your demo, drop the created tables by executing the following SQL statements in Redshift:

drop table if exists dms_sample_dbo.person cascade;
drop table if exists dms_sample_dbo.ticket_purchase_hist cascade;
drop table if exists dms_sample_dbo.sporting_event_ticket cascade;
drop table if exists dms_sample_dbo.sporting_event cascade;
drop table if exists dms_sample_dbo.seat cascade;
drop table if exists dms_sample_dbo.sport_type cascade;
drop table if exists dms_sample_dbo.sport_location cascade;
drop table if exists dms_sample_dbo.sport_team cascade;
drop table if exists dms_sample_dbo.sport_league cascade;
drop table if exists dms_sample_dbo.sport_division cascade;
drop table if exists dms_sample_dbo.seat_type cascade;
drop table if exists dms_sample_dbo.player cascade;
drop table if exists dms_sample_dbo.name_data cascade;
drop table if exists dms_sample_dbo.nfl_stadium_data cascade;
drop table if exists dms_sample_dbo.nfl_data cascade;
drop table if exists dms_sample_dbo.mlb_data cascade;

Execute the following SQL on SQL Server to delete the inserted records:

use dms_sample;
delete from dbo.sport_type where name in ('hockey', 'basketball', 'soccer', 'volleyball', 'cricket');

If you are done with your demo, please consider deleting the CloudFormation stack. If you would like to keep it for future use, you may do the following to avoid paying for unused resources:

  • Pause your Redshift Cluster
  • Stop the DMS task
  • Stop the EC2 instance