Deployment
Datagrok runs as a set of Docker containers on top of a PostgreSQL metadata database and persistent file storage. The same containers ship across every deployment path; what changes is where they run and what manages their lifecycle.
Components
Every Datagrok stand runs the following services. The Helm chart is the canonical reference for service configuration, and Images and versions tracks the latest pinned tags.
| Service | Image | Role |
|---|---|---|
| Datagrok | datagrok/datagrok | Core REST API, web client, authn/authz, metadata persistence, Nginx reverse proxy. |
| PostgreSQL | pgvector/pgvector:pg17 | Metadata store (users, projects, packages, queries, scripts, file index). Runs on RDS, Cloud SQL, or in-cluster. |
| grok_pipe | datagrok/grok_pipe | WebSocket multiplexer for streaming DataFrames and script results between clients and Jupyter workers. |
| grok_spawner | datagrok/grok_spawner | Manages plugin container lifecycle on Docker / ECS / Kubernetes (selected per deployment). |
| grok_connect | datagrok/grok_connect | JDBC bridge for 30+ external databases. |
| JupyterKernelGateway | datagrok/jupyter_kernel_gateway | Server-side script execution (Python, R, Julia, JavaScript, Octave). |
| RabbitMQ | rabbitmq | AMQP broker for the call queue (script and function execution). Versioned independently of Datagrok releases. |
| grok_registry_proxy | datagrok/grok_registry_proxy | Optional. Proxies plugin image pulls from a backing registry (ECR, Docker Hub) using Datagrok JWT auth, so users never see registry credentials. |
For object storage, use AWS S3, Google Cloud Storage, Azure Blob, or a local volume. See the chart's `storage.type` value or File storage for details.
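To confirm that a single-host stand (for example, Local Docker Compose) is running every required service, one option is to compare `docker ps` output against the image names in the table. The Python sketch below assumes each service runs as a container on the same Docker host and that images keep the repository names listed above; tags and registry prefixes are ignored, and the optional grok_registry_proxy is not checked. The function names are hypothetical and not part of Datagrok.

```python
from typing import List

# Repository names from the Components table. grok_registry_proxy is optional,
# so it is deliberately left out of the required set.
REQUIRED_IMAGES = {
    "Datagrok": "datagrok/datagrok",
    "PostgreSQL": "pgvector/pgvector",
    "grok_pipe": "datagrok/grok_pipe",
    "grok_spawner": "datagrok/grok_spawner",
    "grok_connect": "datagrok/grok_connect",
    "JupyterKernelGateway": "datagrok/jupyter_kernel_gateway",
    "RabbitMQ": "rabbitmq",
}


def repository(image_ref: str) -> str:
    """Strip the digest and tag from an image reference, keeping registry and path."""
    ref = image_ref.split("@", 1)[0]      # drop "@sha256:..." if present
    head, _, last = ref.rpartition("/")   # a tag can only appear in the last segment
    last = last.split(":", 1)[0]          # drop ":tag"
    return f"{head}/{last}" if head else last


def missing_services(docker_ps_output: str, external_db: bool = False) -> List[str]:
    """Return the services whose image has no running container.

    `docker_ps_output` is the text printed by `docker ps --format '{{.Image}}'`
    (one image reference per line). Set `external_db=True` when PostgreSQL
    runs outside Docker, for example on RDS or Cloud SQL.
    """
    running = {
        repository(line.strip())
        for line in docker_ps_output.splitlines()
        if line.strip()
    }

    def is_running(expected: str) -> bool:
        # Accept both "datagrok/grok_pipe" and "my.registry:5000/datagrok/grok_pipe".
        return any(repo == expected or repo.endswith("/" + expected) for repo in running)

    return [
        service
        for service, expected in REQUIRED_IMAGES.items()
        if not (external_db and service == "PostgreSQL") and not is_running(expected)
    ]
```

An empty result means every required image has a running container; any names returned point at the services to inspect first.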
Deployment paths
Five paths are supported. They share images and configuration parameters; choose one based on where the Datagrok stand will run.
| Path | Use when |
|---|---|
| Local Docker Compose | Single machine — laptop or VM — for evaluation, demos, or development. Self-contained PostgreSQL inside Compose. |
| Advanced Docker Compose | Single-machine deployments that need separate data volumes, the JS-API debug stack, or other custom topology. |
| Kubernetes Helm chart | Any Kubernetes cluster: on-prem, GKE, AKS, kind, k3s, MicroK8s, or a pre-existing EKS. The EKS CFN template uses the same chart. |
| AWS CloudFormation (EKS) | Recommended for new AWS stands. Provisions EKS, RDS, S3, IAM with IRSA, and installs the Helm chart automatically. |
| AWS CloudFormation (ECS) | Existing ECS stacks. Same RDS / S3 logical IDs as the EKS template, so an in-place stack-template swap migrates without re-creating data. Targeted for deprecation. |
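The "Use when" column can be read as a small decision function. This Python sketch is illustrative only: the flag names are hypothetical, and it assumes that any stand not confined to a single machine will run on a Kubernetes cluster (the bare-metal/VM path described below is out of scope).

```python
from enum import Enum


class DeploymentPath(Enum):
    LOCAL_COMPOSE = "Local Docker Compose"
    ADVANCED_COMPOSE = "Advanced Docker Compose"
    HELM = "Kubernetes Helm chart"
    CFN_EKS = "AWS CloudFormation (EKS)"
    CFN_ECS = "AWS CloudFormation (ECS)"


def choose_path(*, single_machine: bool, custom_topology: bool = False,
                on_aws: bool = False, existing_ecs_stack: bool = False,
                existing_cluster: bool = False) -> DeploymentPath:
    """Map the table's 'Use when' criteria to a deployment path (hypothetical flags)."""
    if existing_ecs_stack:
        # Existing ECS stacks stay on the ECS template, which is targeted for deprecation.
        return DeploymentPath.CFN_ECS
    if single_machine:
        # Separate data volumes, the JS-API debug stack, etc. need the advanced variant.
        return DeploymentPath.ADVANCED_COMPOSE if custom_topology else DeploymentPath.LOCAL_COMPOSE
    if on_aws and not existing_cluster:
        # New AWS stand: one stack provisions EKS, RDS, S3, IAM, and the chart.
        return DeploymentPath.CFN_EKS
    # Any other Kubernetes cluster: on-prem, GKE, AKS, kind, k3s, MicroK8s, or pre-existing EKS.
    return DeploymentPath.HELM
```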
The AWS Marketplace listing wraps the EKS template for one-click, infrastructure-isolated installs.
Terraform on AWS and Terraform on GCP are available for teams that integrate Datagrok into existing infrastructure-as-code pipelines.
Bare-metal / VM is the manual Docker-on-host path for environments without container orchestration.
EKS or Helm directly?
The EKS CloudFormation template calls the Helm chart — it is not a separate deployment. Pick based on what infrastructure you already manage:
- Use CloudFormation (EKS) if you want one stack to provision the cluster, RDS, S3, IAM, and the application together. The template can also target a pre-existing EKS cluster (`UseExistingCluster=true`).
- Use the Helm chart directly if your cluster, database, and object storage already exist, or if the cluster is not on AWS (GKE, AKS, on-prem). The chart ships ready-made overlays for EKS (`values-eks.yaml`) and GKE (`values-gke.yaml`).
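Each route ends in a single install command. The Python sketch below builds both as argument lists, using only the standard `aws cloudformation create-stack` and `helm upgrade --install` flags. The stack name, template URL, chart reference, release name, namespace, and values-file directory are hypothetical placeholders; only `UseExistingCluster`, `values-eks.yaml`, and `values-gke.yaml` come from this page. It also assumes the template needs an IAM capability acknowledgement because it creates IAM roles, and it does not show any other template parameters an existing-cluster install may require.

```python
from typing import List, Optional

# Overlay files shipped with the chart (names from this page).
OVERLAYS = {"eks": "values-eks.yaml", "gke": "values-gke.yaml"}


def cloudformation_eks_command(stack_name: str, template_url: str,
                               use_existing_cluster: bool = False) -> List[str]:
    """Build an `aws cloudformation create-stack` call for the EKS template."""
    cmd = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        # Assumption: the template creates IAM roles (IRSA), so CloudFormation
        # requires an explicit IAM capability acknowledgement.
        "--capabilities", "CAPABILITY_NAMED_IAM",
    ]
    if use_existing_cluster:
        # Only the parameter documented on this page; other parameters that
        # identify the existing cluster are not shown here.
        cmd += ["--parameters", "ParameterKey=UseExistingCluster,ParameterValue=true"]
    return cmd


def helm_command(release: str, chart: str, namespace: str,
                 platform: Optional[str] = None, values_dir: str = ".") -> List[str]:
    """Build a `helm upgrade --install` call, adding the EKS or GKE overlay if requested."""
    cmd = ["helm", "upgrade", "--install", release, chart,
           "--namespace", namespace, "--create-namespace"]
    if platform is not None:
        # -f layers the overlay on top of the chart's default values.
        cmd += ["-f", f"{values_dir}/{OVERLAYS[platform]}"]
    return cmd
```

Passing the result to `shlex.join` prints the shell form for review, and `subprocess.run(cmd, check=True)` executes it without shell-quoting concerns.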
Complete the setup
After the platform is reachable, configure cross-cutting concerns:
- Authentication (LDAP, OAuth, SAML, IAP, etc.)
- SMTP
- Install packages
- S3 backups (cloud deployments)
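Automation that configures these settings usually has to wait until the platform is reachable first. A minimal Python sketch of such a wait, assuming "reachable" means the stand's URL answers with any HTTP status below 500 (a 401 or login page still counts as up). The function names and the polling defaults are hypothetical; the probe is passed in so the loop can be exercised without a live stand.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # 4xx (for example, authentication required) still means the server is up.
        return err.code < 500
    except OSError:
        # URLError, refused connections, and timeouts all subclass OSError.
        return False


def wait_until_reachable(probe: Callable[[], bool], timeout: float = 600.0,
                         interval: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `probe` every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For a real stand, pass `lambda: http_probe(url)` with the stand's address; a `False` result means the platform never answered within the timeout.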