AWS CloudFormation
The deployment consists of a few Docker containers, a database for storing metadata, and persistent file storage for storing files.
This document contains instructions to deploy Datagrok using CloudFormation on an AWS ECS cluster with AWS RDS and AWS S3.
We accounted for many typical security concerns while developing the CloudFormation template. As a result, you will create a Datagrok infrastructure in AWS that complies with standard security policies.
More information about Datagrok design and components:
Prerequisites
-
Check that you have the required permissions on the AWS account to perform a CloudFormation deployment to ECS.
-
Create a secret in AWS Secrets Manager with Docker Hub credentials:
  - Create a secret in Secrets Manager with the Docker Hub username as 'Username' and the access token as 'Password'.
  - Copy the AWS ARN of the created secret. It should look like this: arn:aws:secretsmanager:<region>:<account-id>:secret:<secret_name>-<random_id>
  - To get the ARN from the command line using AWS CLI, run:
    aws secretsmanager list-secrets --filters Key=name,Values=<secret_name> --output text --query 'SecretList[].ARN'
-
Come up with two endpoints: DATAGROK_DNS and CVM_DNS. Users will use DATAGROK_DNS to access the Datagrok Web UI, and the Datagrok client will automatically send requests to CVM_DNS.
-
Create RSA SSL certificate(s) for DATAGROK_DNS and CVM_DNS.
  - If you use the AWS ACM service for SSL certificates:
    - Generate an ACM certificate in AWS that is valid for both endpoints: DATAGROK_DNS and CVM_DNS.
    - Copy the AWS ARN of the created certificate. It should look like this: arn:aws:acm:<region>:<account_id>:certificate/<certificate_id>
  - If you do not use the AWS ACM service for SSL certificates, create the certificate(s) for the DATAGROK_DNS and CVM_DNS endpoints the way you already do:
    - Upload the certificate(s) to AWS ACM.
    - Copy the AWS ARN of the created certificate(s). It should look like this: arn:aws:acm:<region>:<account_id>:certificate/<certificate_id>
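The secret and certificate steps above can also be scripted with AWS CLI. The following is a minimal sketch, not part of the official instructions: the function names are hypothetical, every argument is a placeholder you supply, and the payload builder assumes the Docker Hub username and token contain no double quotes or backslashes. It relies on ACM issuing RSA 2048 certificates by default.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, arguments are placeholders.

# Build the secret payload with the 'Username' and 'Password' keys.
# Assumption: values contain no double quote or backslash (no JSON escaping is done).
dockerhub_secret_json() {
  local user="$1" token="$2"
  case "$user$token" in
    *\"*|*\\*) echo "value needs JSON escaping" >&2; return 1 ;;
  esac
  printf '{"Username":"%s","Password":"%s"}' "$user" "$token"
}

# Create the Secrets Manager secret and print its ARN.
create_dockerhub_secret() {
  local name="$1" user="$2" token="$3" payload
  payload="$(dockerhub_secret_json "$user" "$token")" || return 1
  aws secretsmanager create-secret \
    --name "$name" \
    --secret-string "$payload" \
    --query ARN --output text
}

# Request one ACM certificate valid for both endpoints and print its ARN.
# DNS validation still has to be completed before the certificate is issued.
request_datagrok_certificate() {
  local datagrok_dns="$1" cvm_dns="$2"
  aws acm request-certificate \
    --domain-name "$datagrok_dns" \
    --subject-alternative-names "$cvm_dns" \
    --validation-method DNS \
    --query CertificateArn --output text
}

# Alternative: import an existing certificate into ACM and print its ARN.
import_datagrok_certificate() {
  local cert="$1" key="$2" chain="$3"
  aws acm import-certificate \
    --certificate "fileb://$cert" \
    --private-key "fileb://$key" \
    --certificate-chain "fileb://$chain" \
    --query CertificateArn --output text
}
```

For example, create_dockerhub_secret my-dockerhub my-user my-token prints the ARN to pass later as ArnDockerHubCredential; all three values here are made up.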
Deploy Datagrok components
-
Download the CloudFormation template in YAML or JSON format, as you prefer.
-
Create a CloudFormation stack using the AWS Console or AWS CLI.
  - Use the CloudFormation template downloaded in the first step as the stack template.
  - Specify the stack name. It must be shorter than ten characters to meet AWS naming requirements.
  - Specify the parameters for the stack:
    - ArnCvmCertificate: the AWS ACM ARN for CVM_DNS from the certificate step of the prerequisites. It can be the same as ArnDatagrokCertificate.
    - ArnDatagrokCertificate: the AWS ACM ARN for DATAGROK_DNS from the certificate step of the prerequisites. It can be the same as ArnCvmCertificate.
    - ArnDockerHubCredential: the AWS Secrets Manager ARN for Docker Hub authorization from the Docker Hub secret step of the prerequisites.
    - CreateDemoData: Datagrok provides demo databases with demo data for the full experience. Choose true to create the demo databases alongside Datagrok.
    - LaunchType: optional. The default value, FARGATE, suits most cases. The template also supports the EC2 launch type, which uses EC2 instances and reduces the cost of the stand. The EC2 launch type is described below. We strongly recommend using the FARGATE launch type.
    - Ec2PublicKey: optional. It is required only for the EC2 LaunchType; otherwise, you can skip it. The EC2 launch type is described below.
    - All other parameters are Datagrok Docker image tags. The default value is latest.
  - You can skip the stack options; the default values suit the needs.
-
CloudFormation stack creation takes around 10 minutes. It creates RDS, S3, an ECS cluster, and other required resources. The CloudFormation template is large, so you need to upload it first, for example, to an S3 bucket, and launch it from there with AWS CLI:
aws cloudformation create-stack \
  --stack-name STACK_NAME \
  --template-url https://s3.amazonaws.com/BUCKET_NAME/TEMPLATE_NAME.yaml \
  --parameters ParameterKey=KeyName,ParameterValue=my-key
-
After the Datagrok container starts, the Datagrok server deploys the database. You can check the status in the running task's log in CloudWatch.
-
Create DATAGROK_DNS and CVM_DNS DNS records that route to the newly created internet-facing Application Load Balancers.
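The deployment steps above can be sketched end to end with AWS CLI. This is an illustration under assumptions, not the official procedure: the function names, the bucket, and the argument order are hypothetical, and only the parameters named in this document are passed. If the template creates IAM resources, CloudFormation also requires a --capabilities acknowledgment; whether this template needs one is not stated here, so it is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch only: function names and argument order are illustrative.

# The stack name must be shorter than ten characters.
check_stack_name() {
  local name="$1"
  if [ -z "$name" ] || [ "${#name}" -ge 10 ]; then
    echo "stack name must be 1-9 characters: '$name'" >&2
    return 1
  fi
}

# Upload the template to S3, create the stack, and wait until it is ready.
deploy_datagrok_stack() {
  local stack="$1" bucket="$2" template="$3"
  local datagrok_cert_arn="$4" cvm_cert_arn="$5" dockerhub_secret_arn="$6"
  local key
  key="$(basename "$template")"

  check_stack_name "$stack" || return 1
  aws s3 cp "$template" "s3://$bucket/$key" || return 1

  # Add --capabilities CAPABILITY_IAM here if CloudFormation asks for it.
  aws cloudformation create-stack \
    --stack-name "$stack" \
    --template-url "https://s3.amazonaws.com/$bucket/$key" \
    --parameters \
      "ParameterKey=ArnDatagrokCertificate,ParameterValue=$datagrok_cert_arn" \
      "ParameterKey=ArnCvmCertificate,ParameterValue=$cvm_cert_arn" \
      "ParameterKey=ArnDockerHubCredential,ParameterValue=$dockerhub_secret_arn" \
      "ParameterKey=CreateDemoData,ParameterValue=true" || return 1

  # Blocks until creation succeeds; returns non-zero if it fails.
  aws cloudformation wait stack-create-complete --stack-name "$stack"
}

# List load balancers with their DNS names, to use as targets for the
# DATAGROK_DNS and CVM_DNS records.
list_load_balancers() {
  aws elbv2 describe-load-balancers \
    --query 'LoadBalancers[].[LoadBalancerName,Scheme,DNSName]' \
    --output text
}
```

The name check runs before any AWS call, so a name such as datagrok-prod is rejected locally instead of partway through stack creation.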
Configure Datagrok settings
-
Open DATAGROK_DNS in a web browser and log in to Datagrok with the username admin and the password admin.
-
Edit the settings in the Datagrok platform (Tools -> Settings...). Remember to click Apply to save the new settings.
  - Scripting:
    - Api Url: https://<DATAGROK_DNS>
    - Cvm Url: https://<CVM_DNS>
    - H2o Url: https://<CVM_DNS>:54321
    - Cvm Url Client: https://<CVM_DNS>
  - Admin:
    - Web Root: https://<DATAGROK_DNS>
    - Api Root: https://<DATAGROK_DNS>/api
-
Reload the page and re-login. Now you are good to go.
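All of the setting values above derive from the two hostnames. As a convenience, a small hypothetical helper (not part of Datagrok) can print them for copying into the Settings dialog; the hostnames in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: prints the setting values for the given two hostnames.
print_datagrok_settings() {
  local datagrok_dns="$1" cvm_dns="$2"
  printf 'Scripting > Api Url:        https://%s\n' "$datagrok_dns"
  printf 'Scripting > Cvm Url:        https://%s\n' "$cvm_dns"
  printf 'Scripting > H2o Url:        https://%s:54321\n' "$cvm_dns"
  printf 'Scripting > Cvm Url Client: https://%s\n' "$cvm_dns"
  printf 'Admin > Web Root:           https://%s\n' "$datagrok_dns"
  printf 'Admin > Api Root:           https://%s/api\n' "$datagrok_dns"
}
```

For example, print_datagrok_settings datagrok.example.com cvm.example.com lists the six values for those two placeholder hostnames.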
Optional: Cost reduction stand
The AWS stack uses FARGATE for deployment by default. To reduce infrastructure costs, you can use EC2 instances instead. To do so, follow the instructions above with the additions below to the prerequisites and parameters.
EC2 Prerequisites
-
Before deploying the Datagrok stand, in addition to the Prerequisites, create an RSA key pair. An SSH key pair (a private key and a public key) is required to access the instances that will be created.
  - If you already have an RSA key pair, you can use the existing one.
  - If you have a Linux-based OS or macOS, type ssh-keygen in the terminal and hit Enter. You'll be asked to enter a passphrase. Hit Enter to skip this step. It will create the id_rsa and id_rsa.pub files in the ~/.ssh directory.
  - If you have Windows:
    - Open the Settings panel, then click Apps.
    - Under the Apps and Features heading, click Optional Features.
    - Scroll down the list to see if OpenSSH Client is listed. If it's not, click the plus sign next to Add a feature, scroll through the list to find and select OpenSSH Client, and click Install.
    - Press the Windows key, type cmd, and right-click Command Prompt under Best Match.
    - Click Run as Administrator. If prompted, click Yes in the "Do you want to allow this app to make changes to your device?" pop-up.
    - In the command prompt, type ssh-keygen and hit Enter. You'll be asked to enter a passphrase. Hit Enter to skip this step. By default, the system will save the keys to C:\Users\your_username/.ssh/id_rsa.
-
Copy the content of the public part of the key pair: id_rsa.pub. It will be placed on the EC2 instances through the Ec2PublicKey parameter so that you can access the machines.
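The key pair step can be sketched non-interactively as follows. The function names are hypothetical. The sketch passes -t rsa explicitly because recent OpenSSH releases may default to a different key type, and it refuses to overwrite an existing key file.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative.

# Generate an RSA key pair with an empty passphrase at the given path
# (default: ~/.ssh/id_rsa). Refuses to overwrite an existing key.
generate_ec2_keypair() {
  local key_path="${1:-$HOME/.ssh/id_rsa}"
  if [ -e "$key_path" ]; then
    echo "key already exists: $key_path" >&2
    return 1
  fi
  ssh-keygen -t rsa -b 4096 -N "" -f "$key_path"
}

# Check that a string looks like an OpenSSH RSA public key line,
# which is the content expected by the Ec2PublicKey parameter.
is_rsa_public_key() {
  case "$1" in
    "ssh-rsa "?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After generating the key, is_rsa_public_key "$(cat ~/.ssh/id_rsa.pub)" is a quick check that the public part, not the private key, is about to be pasted.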
EC2 Parameters
-
Change the LaunchType parameter to EC2.
-
Paste your public key content from the EC2 prerequisites into the Ec2PublicKey parameter.
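When creating the stack from AWS CLI, a public key is awkward to pass in the shorthand ParameterKey=...,ParameterValue=... form because it contains spaces. One option is a JSON parameters file, which aws cloudformation create-stack accepts through --parameters file://. The helper below is a hypothetical sketch that writes only the two EC2-specific entries and assumes the key contains no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch only: emits a CloudFormation parameters JSON array with the two
# EC2-specific parameters. Assumption: the key has no '"' or '\' characters.
write_ec2_parameters() {
  local public_key="$1"
  case "$public_key" in
    *\"*|*\\*) echo "public key needs JSON escaping" >&2; return 1 ;;
  esac
  printf '[\n'
  printf '  {"ParameterKey": "LaunchType", "ParameterValue": "EC2"},\n'
  printf '  {"ParameterKey": "Ec2PublicKey", "ParameterValue": "%s"}\n' "$public_key"
  printf ']\n'
}
```

For example, write_ec2_parameters "$(cat ~/.ssh/id_rsa.pub)" > ec2-params.json produces the file. The certificate and Docker Hub ARN parameters would need to be added to the same file, since --parameters file://ec2-params.json supplies the complete parameter list.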