AWS-Azure Site-to-Site VPN with Synapse Analytics

Multi-cloud is a fact of life at many enterprises that have adopted cloud computing. However workloads are divided across cloud providers, there will always be a need for cross-cloud integration.
Some workloads, such as an Enterprise Data Warehouse, are deployed only once. Hence the need for reliable and performant connectivity in such scenarios.

To this end, I created a demo setup at geekzter/synapse-performance: Testing Synapse Analytics Network Performance (github.com). This repo can be used to demonstrate the performance of connectivity between AWS and Azure regions. The workload used in this demo is Synapse Analytics (formerly known as SQL Data Warehouse), populated with the New York Taxicab dataset. Connectivity is provided by a Site-to-Site VPN (the aws-azure-vpn module), which implements the AWS-Azure S2S VPN described in this excellent blog post by Jonatas Baldin.

Pre-requisites

  • To get started you need Git and Terraform (to install Terraform I use tfenv on Linux & macOS, Homebrew on macOS, or Chocolatey on Windows)
  • An SSH public key (default location is ~/.ssh/id_rsa.pub). This key is also used to create secrets for EC2 instances, which requires the private key to be in PEM format
  • There are some scripts to make life easier; you’ll need PowerShell to execute those

If you create a GitHub Codespace for this repository, you’ll get the above set up - including a generated SSH key pair.

AWS

You need an AWS account. There are multiple ways to configure the AWS Terraform provider; I tested with static credentials set as environment variables:

AWS_ACCESS_KEY_ID="AAAAAAAAAAAAAAAAAAAA" 
AWS_DEFAULT_REGION="eu-west-1"
AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
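
Terraform only sees these values if they are exported into the environment of the shell that runs it. As a minimal sketch, assuming a POSIX shell, the helper below reports any variable that is unset or empty before Terraform is started. The name require_env is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): report every named
# environment variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that also works in plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (placeholder values, as in the listing above):
# export AWS_ACCESS_KEY_ID="AAAAAAAAAAAAAAAAAAAA"
# export AWS_DEFAULT_REGION="eu-west-1"
# export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# require_env AWS_ACCESS_KEY_ID AWS_DEFAULT_REGION AWS_SECRET_ACCESS_KEY
```

Failing early like this tends to be easier to diagnose than a provider authentication error halfway through a run.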

Azure

You need an Azure subscription. The identity used needs the Contributor role on the subscription in order to create resource groups.
Authenticate using Azure CLI:

az login

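or use a Service Principal (the azurerm Terraform provider also expects the tenant ID in this case):

ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"
ARM_CLIENT_SECRET="00000000-0000-0000-0000-000000000000"
ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"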

Make sure you work with the right subscription:

ARM_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
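
When authenticating through az login, it can help to confirm which subscription the Azure CLI has selected before provisioning. The following is a sketch under that assumption. The function name select_subscription is made up for this example, while az account set and az account show are regular Azure CLI commands.

```shell
# Sketch: point the Azure CLI at the subscription Terraform should use,
# then print the id of the active subscription so it can be checked by eye.
select_subscription() {
  subscription_id="$1"
  az account set --subscription "$subscription_id" || return 1
  az account show --query id --output tsv
}

# Usage:
# select_subscription "$ARM_SUBSCRIPTION_ID"
```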

An SSH public key (default location is ~/.ssh/id_rsa.pub) is required. This key is also used to create secrets for EC2 instances, which requires an RSA private key in PEM format. Create a key pair if you don’t have one set up (recent OpenSSH releases default to Ed25519, so the RSA type is requested explicitly):

ssh-keygen -t rsa -m PEM -f ~/.ssh/id_rsa
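
If you already have a key pair, you may want to check whether the private key is in PEM format before provisioning. The sketch below only inspects the first line of the file. The helper name is_pem_rsa_key is hypothetical, and the check assumes an RSA key.

```shell
# Hypothetical helper: inspect the header line of a private key file.
# RSA keys written in PEM format start with "BEGIN RSA PRIVATE KEY";
# OpenSSH's newer default format starts with "BEGIN OPENSSH PRIVATE KEY".
# Returns 0 for a PEM RSA key, 1 for anything else, 2 if the file is unreadable.
is_pem_rsa_key() {
  [ -r "$1" ] || return 2
  head -n 1 "$1" | grep -q -e '-----BEGIN RSA PRIVATE KEY-----'
}

# Usage:
# if ! is_pem_rsa_key ~/.ssh/id_rsa; then
#   echo "private key is missing or not in PEM format" >&2
# fi
```

An existing RSA key in OpenSSH format can usually be rewritten in place with ssh-keygen -p -m PEM -f ~/.ssh/id_rsa; back up the file first.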

You can then provision resources by first initializing Terraform:

terraform init

And then running:

terraform apply

Take note of configuration data generated by Terraform.
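
If you prefer to review changes before they are made, the same sequence can be run with a saved plan. This is a sketch rather than what the repository prescribes. The function name provision and the plan file name tfplan are arbitrary choices for this example.

```shell
# Sketch of the provisioning sequence with an explicit, reviewable plan.
# "tfplan" is an arbitrary file name chosen for this example.
provision() {
  terraform init || return 1
  terraform plan -out=tfplan || return 1
  terraform apply tfplan || return 1
  # Print the outputs (the configuration data) once more for reference
  terraform output
}

# Usage:
# provision
```

Applying a saved plan means Terraform executes exactly what was shown by the plan step, without asking for confirmation again.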

To populate Synapse Analytics, run this script:

./scripts/load_data.ps1

If the script fails, you can simply run it again: it only loads tables that are not populated yet. Alternatively, follow the manual steps documented here.
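
Because the load script can safely be re-run, a small retry wrapper is one way to automate this. The sketch below is a generic helper, not something shipped with the repository, and it assumes PowerShell is available as pwsh.

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
retry() {
  attempts="$1"
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}

# Usage (assumes PowerShell is installed as `pwsh`):
# retry 3 pwsh -File ./scripts/load_data.ps1
```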

Now you can log on to the AWS VM with the generated client.rdp file. The username is Administrator. Use configuration data from Terraform to get the password:

terraform output aws_windows_vm_password 

Connect to Synapse Analytics using SQL Server Management Studio. You can use the desktop shortcut to connect directly to the Synapse pool:

Desktop shortcuts are created automatically

The Synapse Analytics password can be fetched using:

terraform output user_password 
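
When a password needs to be pasted or piped into another tool, the quotes that terraform output prints around string values can get in the way. A minimal sketch, assuming a Terraform version that supports the -raw flag, with tf_output as a hypothetical wrapper name:

```shell
# Sketch: read a single Terraform output as plain text.
# `-raw` prints a string value without surrounding quotes.
tf_output() {
  terraform output -raw "$1"
}

# Usage, with the output names used in this article:
# tf_output aws_windows_vm_password
# tf_output user_password
```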

The VM should already have SQL Server Management Studio installed, and the hosts file edited with a line to resolve Synapse Analytics to the Private Endpoint in the Azure Virtual Network.

Within SQL Server Management Studio, run a query, e.g.:

select top 100000000 * from dbo.Trip

This query simulates an ETL of 100M rows and completes in about 30 minutes when executed from AWS Ireland against Synapse Analytics at DW100c in Azure West Europe (Amsterdam). Using the public endpoint instead of the S2S VPN and private endpoint yields the same result, as both paths take a direct route.
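
To put that number in perspective, the rough throughput can be worked out from the row count and the elapsed time. The calculation below is only as precise as the "about 30 minutes" it is based on; rows_per_second is a helper name made up for this example.

```shell
# Back-of-the-envelope throughput: rows divided by elapsed seconds.
# Integer arithmetic, so the result is rounded down.
rows_per_second() {
  rows="$1"
  minutes="$2"
  echo $(( rows / (minutes * 60) ))
}

rows_per_second 100000000 30   # prints 55555
```

So the query streams roughly 55,000 rows per second over this path.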

When you want to destroy resources, run:

terraform destroy

I’m a Cloud Solution Architect at Microsoft, focusing on Azure. You can find me on GitHub here: https://github.com/geekzter. Opinions are my own.