Energy forecasting using SageMaker

Demo Video

Client Tool

This demo will use Amazon SageMaker.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.

The screenshots in this demo use Amazon QuickSight as the BI tool. Be sure to complete the BI tool setup before proceeding.

Please go through the Demo preparation first, then open the notebook smartmeter.ipynb.

Challenge

Kim is a Data Scientist at Dextera Energy. She is tasked with forecasting energy consumption based on the smart-meter data collected over the past few years. Kim needs to use the historical meter data to train her model and forecast energy consumption so her company can intelligently schedule and plan electricity delivery and improve customer satisfaction.

To accomplish this, Kim will complete the following steps.

  1. Data Preparation
  2. Model Training
  3. Model Evaluation
  4. Build ML pipeline
  5. Transform and share results in Dashboards

SageMaker - ML Workflow

Data Preparation

Say / Do / Show
We have years of historical meter data stored on S3. First, Kim loads the data into the data warehouse table meter_data, then extracts the categorical data from it into the acorn_data table.
Go to the smartmeter notebook and execute steps 2, 3, and 4.
Kim randomly selects meter data from 50 households to build an ML model. The data is originally recorded at 30-minute intervals, which we could use directly. However, since we want to forecast a longer period (one week), we need to reduce the data granularity to hourly. So Kim uses Redshift to aggregate the meter data into hourly time series and loads them into a Pandas DataFrame, as sketched below. Go to the smartmeter notebook and execute step 5.
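If you want to show what step 5 is doing under the hood, a minimal sketch of the aggregation and DataFrame load might look like the following. The connection details are placeholders, and the 50-household sampling is handled separately in the notebook; this is not the notebook's exact code.

import pandas as pd
import psycopg2

# Placeholder connection details; the notebook uses its own Redshift connection.
conn = psycopg2.connect(
    host="<redshift-endpoint>", port=5439,
    dbname="dev", user="awsuser", password="<password>",
)

# Roll the 30-minute readings up to hourly totals per household.
sql = """
select meterid,
       date_trunc('hour', "datetime") as "datetime",
       sum(kwh) as kwh
from ml.meter_data
group by meterid, date_trunc('hour', "datetime")
order by meterid, "datetime";
"""
df = pd.read_sql(sql, conn)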
Next, Kim will split this data into train and test sets for training and evaluating the ML model.

In forecasting it is important to do this train/test split based on time rather than by time series. In this demo, we will use data from Jan 2013 to Nov 2013 for training, and data from Dec 2013 for testing and model evaluation. The data is stored in JSON format and uploaded to S3, as sketched below.
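A minimal sketch of this split and upload, assuming the hourly DataFrame df from step 5 and a placeholder bucket name. DeepAR expects one JSON object per time series, and by convention the test channel holds the full series including the forecast window.

import json
import boto3

TRAIN_END = "2013-12-01"  # train: Jan-Nov 2013; test adds Dec 2013

def to_deepar_json(series):
    # One JSON line per household: {"start": ..., "target": [...]}
    return json.dumps({
        "start": str(series["datetime"].iloc[0]),
        "target": series["kwh"].tolist(),
    })

train_lines = [to_deepar_json(g[g["datetime"] < TRAIN_END])
               for _, g in df.groupby("meterid")]
test_lines = [to_deepar_json(g) for _, g in df.groupby("meterid")]

s3 = boto3.client("s3")
s3.put_object(Bucket="<bucket>", Key="deepar/train/train.json",
              Body="\n".join(train_lines))
s3.put_object(Bucket="<bucket>", Key="deepar/test/test.json",
              Body="\n".join(test_lines))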

Go to the smartmeter notebook and execute steps 6 and 7.

Model Training

Say / Do / Show
With 5,000+ households' meter data, each one a time series, Kim doesn't want to build one ML model per individual time series, which would make the machine learning pipeline complicated. So she chooses the DeepAR model, which can train a single model jointly over all of the time series. With hundreds of related time series, DeepAR outperforms the standard ARIMA and ETS methods.

You can find more details about the DeepAR algorithm and its hyperparameters in the SageMaker documentation.

Go to the smartmeter notebook and execute step 8, and explain the marked code: the ML algorithm used ("forecasting-deepar"), the model output location, and the prediction length.
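For reference, the marked code in step 8 typically looks something like this sketch (SageMaker Python SDK v2). The bucket path, instance type, and hyperparameter values here are illustrative, not the notebook's exact settings.

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Resolve the built-in DeepAR container image for the current region.
image_uri = sagemaker.image_uris.retrieve("forecasting-deepar",
                                          session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path="s3://<bucket>/deepar/output",  # model output location
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    time_freq="H",             # hourly series
    prediction_length=24 * 7,  # forecast one week ahead
    context_length=24 * 7,
    epochs=20,
)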
We will use the DeepARPredictor class shown in this screenshot to encode requests and decode responses. Go to the smartmeter notebook and execute step 9.
Now that the dataset is ready, Kim can start training and deploying the model. Since this step can take 15-30 minutes, we suggest skipping it; a later step uses a pre-trained model, so it is safe to skip. If you do run this step, do it before the demo.
Go to the smartmeter notebook and execute step 10.
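Step 10 boils down to a fit followed by a deploy. A sketch, assuming the estimator and S3 paths above ("train" and "test" are the channel names DeepAR expects):

estimator.fit({
    "train": "s3://<bucket>/deepar/train/",
    "test": "s3://<bucket>/deepar/test/",
})
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.m5.large")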

Show the training result in the cell output and the completed training job in the SageMaker console.

Model Evaluation

Say / Do / Show
Now that the model is ready, let's deploy and evaluate it. We will use the pre-trained model. This step will take 6-8 minutes, so it is suggested to do it before the demo. Go to the smartmeter notebook and execute step 11.

Go to the SageMaker console to show the deployed endpoint.

Kim randomly picks one household's data to evaluate the model by comparing the forecast to the actual meter data.

Note: It’s safe to ignore the warning here.

Go to the smartmeter notebook and execute step 12.
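The DeepARPredictor class hides the wire format; roughly, it sends a request like the sketch below and reads the quantiles back. The endpoint name, the history list, and the quantile choices are placeholders, not the notebook's exact values.

import json
import boto3

request = {
    "instances": [{
        "start": "2013-11-24 00:00:00",  # placeholder start of the context window
        "target": history,               # placeholder: hourly readings leading into December
    }],
    "configuration": {
        "num_samples": 100,
        "output_types": ["quantiles"],
        "quantiles": ["0.1", "0.5", "0.9"],
    },
}

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="<endpoint-name>",
    ContentType="application/json",
    Body=json.dumps(request),
)
median = json.loads(response["Body"].read())["predictions"][0]["quantiles"]["0.5"]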

Build ML pipeline

Say / Do / Show
Satisfied with the model, Kim wants to build an ML pipeline that runs the forecast for all household data weekly. She uses a SageMaker batch transform job.

This step will take about 4 minutes.

Go to the smartmeter notebook and execute steps 13 and 14.
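For reference, a batch transform like the one in steps 13-14 might be set up as in this sketch, assuming the trained estimator from earlier; the S3 paths and instance type are placeholders.

transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://<bucket>/deepar/batch-output/",
    strategy="SingleRecord",
)

transformer.transform(
    data="s3://<bucket>/deepar/batch-input/",  # one JSON request per line
    content_type="application/jsonlines",
    split_type="Line",
)
transformer.wait()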

Once the job is done, go to the SageMaker console to show the transform job that was created.

The batch job output is stored on S3 in JSON format; you can visualize it or load it into the data warehouse for further analysis.

Note: It’s safe to ignore the warning here.

Go to the smartmeter notebook and execute step 15.

Transform and share results in Dashboards

Say / Do / Show
To analyze the forecast results further and build a dashboard on them, Kim stores the results back into the data warehouse. This can be done directly with a Pandas DataFrame if the result set is small: the DataFrame can insert the results into the target database in batches, as sketched below. In this demo, we will choose the 90% confidence interval for 20 households (meter_id). Go to the smartmeter notebook and execute step 16.
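A minimal sketch of such a batched insert, assuming a SQLAlchemy engine for Redshift and a forecast_df DataFrame whose columns match ml.meter_forecast; the connection string and DataFrame name are placeholders.

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://awsuser:<password>@<redshift-endpoint>:5439/dev"
)

# Append the forecast rows in batches rather than row by row.
forecast_df.to_sql(
    "meter_forecast", engine, schema="ml",
    if_exists="append", index=False,
    method="multi", chunksize=1000,
)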
Now Kim can visualize the meter forecast and share results with her team.

The first step in building a QuickSight dashboard is to set up the dataset.

Navigate to QuickSight and create a new dataset using the data source created earlier. To visualize the meter forecast, select the "ml" schema from the dropdown list and the table "meter_forecast" under the Tables section, then choose "Use custom SQL".

We will use custom SQL to select the required data and add weekly and monthly buckets. Name the query "DS MeterForecast SQL", enter the SQL code below into the "Custom SQL" section, and click "Confirm query". On the next screen, click the Visualize button.

select meterid, "datetime",
  DATE_PART(y, "datetime") as year,
  DATE_PART(w, "datetime") as week,
  to_date(DATE_PART(y, "datetime") || '/' || DATE_PART(mon, "datetime"), 'YYYY/MM') as month,
  DATE_PART(qtr, "datetime") as quarter,
  avg(kwh) as avg_kwh,
  sum(kwh) as kwh
from ml.meter_forecast
group by meterid, "datetime", year, week, month, quarter
order by "datetime"

Next, we will add a visual for the 7-day meter data forecast, "Energy Predictions by household".

Choose the following parameters for this visual:

  • Title: “Energy Predictions by household”
  • Dataset: DS MeterForecast SQL
  • x-axis = datetime
  • value = kwh
  • color = meterid
  • ChartType: Line Chart

Kim also wants to show historical energy consumption and the forecast in one view, so she needs to combine the history and forecast data. For this, we can create a view in the Redshift data warehouse. In the Redshift Query Editor, create a view using the SQL below:

create or replace view ml.v_meter_data_with_forecast as
select meterid, "datetime", kwh 
from ml.meter_data 
where meterid in (select distinct meterid from ml.meter_forecast)
and "datetime" < '2014-01-01'
union
select * from ml.meter_forecast;

We will add a second dataset called "meter_data_with_forecast".

Enter the SQL code below into the "Custom SQL" section, click Finish, then click "Save" and name the dataset "meter_data_with_forecast".

select meterid, DATE_TRUNC('day', "datetime") as day,  DATE_PART(y, "datetime") as year, 
  DATE_PART(w, "datetime") as week, 
  to_date (DATE_PART(y, "datetime") || '/' || DATE_PART(mon, "datetime"), 'YYYY/MM') as month, 
  DATE_PART(qtr, "datetime") as quarter, 
  sum(kwh) as Kwh
from ml.v_meter_data_with_forecast
group by meterid, day, year, week, month, quarter
;

Add a visualization to analyze by month, "Total energy consumption + prediction by month".

Choose the following parameters for this visual:

  • Title: “Total energy consumption + prediction by month”
  • Dataset: meter_data_with_forecast
  • x-axis = month
  • value = kwh
  • color=meterid
  • ChartType: Vertical Bar Chart

Before you Leave

To clear out any QuickSight changes you made during the demo, delete the Dashboard, Analysis & Data Set you created.

Please execute the Cleanup step to clear out the SageMaker endpoint and any changes you made to the Redshift database during the demo.

If you are done using your cluster, consider deleting the CFN stack; or, to avoid paying for unused resources, do these tasks:

  • pause your Redshift Cluster
  • stop the Oracle database
  • stop the DMS task