Docker Swarm

The deployment consists of a few Docker containers, a database for storing metadata, and persistent file storage for storing files.

Datagrok requires a PostgreSQL database to store metadata. We recommend using a scalable, highly reliable solution such as AWS RDS.

For persistent file storage, Datagrok supports multiple options, including cloud solutions such as AWS S3, as well as Local File System storage.

This document contains instructions for deploying Datagrok with Docker Swarm on virtual machines or dedicated hosts, using the Local File System for persistent storage.

More information about Datagrok design and components:

Prerequisites

  1. We use native Docker commands with a Compose file to run applications on the hosts. This simplifies multi-container application development and deployment.
  2. The platform requires 2 hosts: 1 for Datagrok and 1 for CVM, with the following hardware parameters:
    • Minimal: 30 GB free disk space, 2 CPUs, 4 GB RAM.
    • Recommended: over 30 GB free disk space, 4 CPUs, 8 GB RAM, or higher.
  3. Download and install the latest version of Docker to your environment (virtual machines, dedicated hosts, etc.). Docker stack deployments read Compose files natively, so the standalone Docker Compose tool is not required.
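
To confirm the hosts are ready, you can verify the Docker installation on each of them (a minimal check; versions and output will vary by environment):

    # Run on both the Datagrok and CVM hosts
    docker --version
    docker info --format '{{.ServerVersion}}'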

Preparations

  1. Activate Docker Swarm mode on your manager node (it should be the Datagrok host):

    docker swarm init

    The shell prints a join token for adding worker nodes, similar to the following:

    Swarm initialized: current node (bvz81updecsj6wjz393c09vti) is now a manager.

    To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-testetsttestetst-qwewsdwqdwdqwddwqdwq XX.XX.XX.XX:2377

    To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
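
    If the manager host has more than one network interface, you can specify which address the worker node should use to reach it (the address below is a placeholder for the Datagrok host's IP; --advertise-addr is a standard Docker flag):

        docker swarm init --advertise-addr <DATAGROK_VM_IP_ADDRESS>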
  2. Log in to the worker node (it should be the CVM host) and join it to the swarm cluster:

        docker swarm join --token SWMTKN-1-testetsttestetst-qwewsdwqdwdqwddwqdwq XX.XX.XX.XX:2377
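
    If the join token is lost, you can print it again at any time from the manager node:

        docker swarm join-token worker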
  3. From the manager node (the Datagrok host), set labels on all nodes in the swarm cluster.

    1. List all nodes. We will need the IDs of the Datagrok and CVM nodes for the next step:

      docker node ls
    2. Add the label role=datagrok to the Datagrok node (the manager) and role=cvm to the CVM node (the worker in the cluster):

      docker node update --label-add role=datagrok <datagrok_node_id>
      docker node update --label-add role=cvm <cvm_node_id>
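
    You can verify that the labels were applied (the node ID is the one listed in the previous step):

      docker node inspect <datagrok_node_id> --format '{{ .Spec.Labels }}'

    The placement constraints in the Compose file are expected to use these labels (for example, node.labels.role == datagrok) to schedule each service on the correct host.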
  4. Download a Docker Compose YAML file: link

  5. Run the Docker stack:

    docker stack deploy -c ./swarm.docker-compose.yaml datagrok
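
    To watch the services start, you can use the standard stack commands:

      # List the stack's services and their replica counts
      docker stack services datagrok
      # Show task placement and state across the nodes
      docker stack ps datagrok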
  6. Check that Datagrok started successfully: open http://<DATAGROK_VM_IP_ADDRESS>:8080 and log in to Datagrok using the username "admin" and password "admin".
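
    The same check can be scripted from a shell (the address is a placeholder):

      curl -sSf http://<DATAGROK_VM_IP_ADDRESS>:8080 > /dev/null && echo "Datagrok is up"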

  7. Edit settings in the running Datagrok platform (Tools -> Settings...). Do not forget to click Apply to save the new settings.

    • Scripting:
      • CVM Url: http://<CVM_VM_DNS_ADDRESS>:8090
      • CVM URL Client: http://<CVM_VM_DNS_ADDRESS>:8090
      • H2o Url: http://<CVM_VM_DNS_ADDRESS>:54321
      • API Url: http://<DATAGROK_VM_DNS_ADDRESS>:8080/api
      • Cvm Split: true
    • Dev:
      • CVM Url: http://<CVM_VM_DNS_ADDRESS>:8090
      • Cvm Split: true
      • API Url: http://<DATAGROK_VM_DNS_ADDRESS>:8080/api

User access

Both the Datagrok and Compute (CVM) engines should be accessible to users. The easiest way is to create DNS endpoints pointing to public IPs or to load balancers in front of the services, for example datagrok.example and cvm.example.
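
Once the DNS records are in place, you can confirm that both endpoints respond (the hostnames are the examples above; the ports assume the services are exposed directly on the defaults used in this document, rather than behind a load balancer on 80/443):

    curl -I http://datagrok.example:8080
    curl -I http://cvm.example:8090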