Machine Learning in Redshift SQL using XGBoost with SageMaker Notebook

Client Tool

This demo uses an Amazon SageMaker notebook.

Amazon SageMaker is a fully managed service that lets developers and data scientists build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process, making it easier to develop high-quality models.

Please complete Steps 1 and 2 from Demo preparation first, then open the notebook xgboost-model.ipynb.

Challenge

Kim is a data scientist at High Street Bank. She is tasked with using ML to distinguish original banknotes from forged ones, based on spectral features of banknote images (variance, skewness, kurtosis, and entropy) obtained from specimens using image recognition and the Wavelet Transform tool. Kim needs to use these features as the input parameters to train her model, which then predicts whether a note is original or forged, to help fight banknote counterfeiting.

To accomplish this, Kim will follow the steps below.

Data Preparation

Say: A Redshift cluster has already been created. Kim first connects to it from the Jupyter notebook. She also sets up a helper that runs SQL through the Redshift Data API and returns the query output directly as a pandas DataFrame.
Do: In the xgboost-model notebook, enter the correct Redshift endpoint and execute steps 1 and 2.
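The helper set up in steps 1 and 2 might look roughly like the sketch below. This is not the notebook's actual code: the function names `run_sql` and `records_to_rows` and the cluster, database, and user arguments are hypothetical. It relies on the boto3 `redshift-data` client calls `execute_statement`, `describe_statement`, and `get_statement_result`, and flattens the typed result cells into plain dictionaries.

```python
import time


def records_to_rows(result):
    """Flatten a Data API get_statement_result response into a list of dicts."""
    columns = [col["name"] for col in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        values = []
        for cell in record:
            if cell.get("isNull"):
                values.append(None)
            else:
                # Each non-null cell is a one-entry dict such as {"longValue": 3}.
                values.append(next(iter(cell.values())))
        rows.append(dict(zip(columns, values)))
    return rows


def run_sql(sql, cluster_id, database, db_user, poll_seconds=2):
    """Run one statement through the Redshift Data API and return its rows.

    Sketch only: reads just the first page of results (no NextToken handling).
    """
    import boto3  # imported lazily so records_to_rows works without AWS access

    client = boto3.client("redshift-data")
    statement = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    while True:
        description = client.describe_statement(Id=statement["Id"])
        if description["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(poll_seconds)
    if description["Status"] != "FINISHED":
        raise RuntimeError(description.get("Error", description["Status"]))
    if not description.get("HasResultSet"):
        return []
    return records_to_rows(client.get_statement_result(Id=statement["Id"]))


# Hand-written sample shaped like a Data API response (illustrative values only).
sample_result = {
    "ColumnMetadata": [{"name": "variance"}, {"name": "class"}],
    "Records": [
        [{"doubleValue": 3.5}, {"longValue": 0}],
        [{"isNull": True}, {"longValue": 1}],
    ],
}
sample_rows = records_to_rows(sample_result)
print(sample_rows)
```

With pandas installed, passing the returned list of dictionaries to `pandas.DataFrame(...)` yields the DataFrame the notebook works with.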
Say: Kim has data from banknote specimens, obtained using image recognition and the Wavelet Transform tool, stored on S3. She creates a data preparation script and runs it to load the data from S3 into the Redshift tables banknoteauthentication_train and banknoteauthentication_test.
Do: In the xgboost-model notebook, update the IAM roles attached to the Redshift cluster and execute steps 3 and 4.
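A load like this is typically done with the Redshift COPY command. The sketch below only builds the statements; it assumes the files are CSV with one header row, and the bucket path, file names, and IAM role ARN are placeholders rather than values from the demo.

```python
def build_copy_sql(table, s3_uri, iam_role_arn):
    """Build a COPY statement that loads a CSV file with one header row."""
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


# Placeholder bucket and role (assumptions): replace with your own values.
S3_PREFIX = "s3://example-bucket/banknote"
IAM_ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

copy_statements = [
    build_copy_sql("banknoteauthentication_train", f"{S3_PREFIX}/train.csv", IAM_ROLE),
    build_copy_sql("banknoteauthentication_test", f"{S3_PREFIX}/test.csv", IAM_ROLE),
]
for statement in copy_statements:
    print(statement)
```

The IAM role named in the statement must be one of the roles attached to the cluster, which is why the notebook asks you to update it before running these steps.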
Say: Kim selects the data from banknoteauthentication_train and creates a new machine learning model. She monitors model creation by running step 6 intermittently and moves on once the Model State is ‘Ready’.
Do: In the xgboost-model notebook, execute steps 5 and 6, then rerun step 6 intermittently until the Model State is ‘Ready’.
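As a rough sketch of these two steps, the code below builds a Redshift ML `CREATE MODEL` statement for XGBoost and polls a status callback until the model is ready. The model name, function name, column names, objective, hyperparameter value, bucket, and role are all assumptions for illustration; the notebook may use different ones. The `get_state` callable is hypothetical and stands in for whatever reads the Model State value (for example from `SHOW MODEL` output).

```python
import time


def build_create_model_sql(model, function, table, features, target,
                           iam_role_arn, s3_bucket):
    """Build a CREATE MODEL statement that trains an XGBoost binary classifier."""
    feature_list = ", ".join(features)
    return (
        f"CREATE MODEL {model}\n"
        f"FROM (SELECT {feature_list}, {target} FROM {table})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "AUTO OFF\n"
        "MODEL_TYPE XGBOOST\n"
        "OBJECTIVE 'binary:logistic'\n"
        "PREPROCESSORS 'none'\n"
        "HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '100')\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def wait_until_ready(get_state, poll_seconds=60, max_checks=60):
    """Call get_state() repeatedly; return True once it reports 'Ready'."""
    for _ in range(max_checks):
        if get_state().strip().lower() == "ready":
            return True
        time.sleep(poll_seconds)
    return False


# All names below are placeholders, not values taken from the notebook.
create_sql = build_create_model_sql(
    model="model_banknote_xgboost",
    function="func_model_banknote_xgboost",
    table="banknoteauthentication_train",
    features=["variance", "skewness", "kurtosis", "entropy"],
    target="class",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    s3_bucket="example-bucket",
)
print(create_sql)

# Simulated status sequence so the polling loop can be tried without a cluster.
states = iter(["TRAINING", "TRAINING", "READY"])
became_ready = wait_until_ready(lambda: next(states), poll_seconds=0)
print(became_ready)
```

Polling through a callback keeps the loop independent of how the state is fetched, so the same function works with a real query or with the simulated sequence shown here.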

Model Evaluation

Say: Next, Kim checks the accuracy of the model she just created. She compares the class inferred by the model (original vs. counterfeit) against the known class in the test data (the banknoteauthentication_test table).
Do: In the xgboost-model notebook, execute step 7.
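The comparison amounts to counting how often the predicted class matches the actual class. Below is a minimal sketch of that calculation. The query text, the prediction function name `func_model_banknote_xgboost`, and the column names are assumptions, and the sample pairs are made up. It also assumes the prediction function returns a 0/1 class label; if it returns a score instead, threshold it before comparing.

```python
def accuracy(pairs):
    """Return the fraction of (actual, predicted) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    return correct / len(pairs)


# Hypothetical query that would produce the (actual, predicted) pairs.
EVALUATION_SQL = """
SELECT class AS actual,
       func_model_banknote_xgboost(variance, skewness, kurtosis, entropy) AS predicted
FROM banknoteauthentication_test;
"""

# Made-up pairs for illustration: three of the four predictions match.
sample_pairs = [(0, 0), (1, 1), (1, 0), (0, 0)]
sample_accuracy = accuracy(sample_pairs)
print(sample_accuracy)  # 0.75
```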

Model Inference

Say: Next, Kim runs the model for prediction. She wants to use it to predict the counts of original vs. counterfeit banknotes.
Do: In the xgboost-model notebook, execute step 8.
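Counting predictions per class can be sketched as below. The query text and function name are hypothetical, the sample predictions are made up, and the mapping of 0 to "original" and 1 to "counterfeit" is an assumption that should be checked against the dataset's own class encoding.

```python
from collections import Counter


def count_predictions(predictions, labels):
    """Count predictions per class and report them under readable names."""
    counts = Counter(predictions)
    return {name: counts.get(value, 0) for value, name in labels.items()}


# Hypothetical query that would return one predicted class per banknote.
INFERENCE_SQL = """
SELECT func_model_banknote_xgboost(variance, skewness, kurtosis, entropy) AS predicted,
       COUNT(*) AS banknotes
FROM banknoteauthentication_test
GROUP BY 1;
"""

# Assumed class encoding; verify it against the data before relying on it.
LABELS = {0: "original", 1: "counterfeit"}

sample_predictions = [0, 1, 0, 0, 1]
prediction_counts = count_predictions(sample_predictions, LABELS)
print(prediction_counts)  # {'original': 3, 'counterfeit': 2}
```

In practice the grouping is best left to the database, as in the query string, so only the two counts travel back to the notebook.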

Before You Leave

Please execute the Cleanup step to clear out any changes you made to the Redshift database during the demo.
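A cleanup of this kind usually drops the model and the two tables. The sketch below only builds the statements and is an assumption about what the Cleanup step does, not a copy of it; the model name is a placeholder.

```python
def build_cleanup_sql(model, tables):
    """Build statements that drop the model first, then the demo tables."""
    statements = [f"DROP MODEL IF EXISTS {model};"]
    statements += [f"DROP TABLE IF EXISTS {table};" for table in tables]
    return statements


cleanup_statements = build_cleanup_sql(
    "model_banknote_xgboost",  # placeholder model name
    ["banknoteauthentication_train", "banknoteauthentication_test"],
)
for statement in cleanup_statements:
    print(statement)
```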

If you are done using your cluster, consider deleting the CloudFormation (CFN) stack. Otherwise, to avoid paying for unused resources, do the following:

  • Pause your Redshift cluster