
Deployment

Datagrok runs as a set of Docker containers on top of a PostgreSQL metadata database and persistent file storage. The same containers ship across every deployment path; what changes is where they run and what manages their lifecycle.

Components

Every Datagrok stand runs the following services. The Helm chart is the canonical reference for service configuration, and Images and versions tracks the latest pinned tags.

| Service | Image | Role |
|---|---|---|
| Datagrok | datagrok/datagrok | Core REST API, web client, authn/authz, metadata persistence, Nginx reverse proxy. |
| PostgreSQL | pgvector/pgvector:pg17 | Metadata store (users, projects, packages, queries, scripts, file index). Runs on RDS, Cloud SQL, or in-cluster. |
| grok_pipe | datagrok/grok_pipe | WebSocket multiplexer for streaming DataFrames and script results between clients and Jupyter workers. |
| grok_spawner | datagrok/grok_spawner | Manages plugin container lifecycle on Docker, ECS, or Kubernetes (selected per deployment). |
| grok_connect | datagrok/grok_connect | JDBC bridge for 30+ external databases. |
| JupyterKernelGateway | datagrok/jupyter_kernel_gateway | Server-side script execution (Python, R, Julia, JavaScript, Octave). |
| RabbitMQ | rabbitmq | AMQP broker for the call queue (script and function execution). Released independently of Datagrok. |
| grok_registry_proxy | datagrok/grok_registry_proxy | Optional. Proxies plugin image pulls from a backing registry (ECR, Docker Hub) using Datagrok JWT auth, so users never see registry credentials. |
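The components table also works as a checklist for a running stand. The Python sketch below is not part of Datagrok; it compares the image references running on a host or cluster (for example, the output of docker ps --format '{{.Image}}', or kubectl get pods -A -o jsonpath='{..image}' split on whitespace) against the images listed above. It assumes only repository names matter, ignores tags and registry prefixes, and skips grok_registry_proxy because that service is optional. The function names and the sample image list are hypothetical.

```python
"""Sketch: report required Datagrok images that are not running on a stand."""

# Image repositories from the components table (tags ignored).
# grok_registry_proxy is optional, so it is deliberately not listed here.
REQUIRED_IMAGES = {
    "datagrok/datagrok",
    "pgvector/pgvector",
    "datagrok/grok_pipe",
    "datagrok/grok_spawner",
    "datagrok/grok_connect",
    "datagrok/jupyter_kernel_gateway",
    "rabbitmq",
}


def repository(image_ref: str) -> str:
    """Drop the digest and tag: 'rabbitmq:3' -> 'rabbitmq'."""
    name = image_ref.split("@", 1)[0]
    # A colon after the last slash starts a tag; one before it is a registry port.
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]
    return name


def matches(image_ref: str, required: str) -> bool:
    """True if the running image is the required repository, with any registry prefix."""
    repo = repository(image_ref)
    # Accept prefixes such as docker.io/library/ or an ECR registry host.
    return repo == required or repo.endswith("/" + required)


def missing_images(running: list[str]) -> set[str]:
    """Required repositories with no matching running image."""
    return {req for req in REQUIRED_IMAGES
            if not any(matches(ref, req) for ref in running)}


# Hypothetical sample, e.g. collected with: docker ps --format '{{.Image}}'
running = [
    "datagrok/datagrok:latest",
    "pgvector/pgvector:pg17",
    "docker.io/library/rabbitmq:3-management",
    "datagrok/grok_pipe:latest",
]
print(sorted(missing_images(running)))
```

With this sample, the sketch reports grok_connect, grok_spawner, and jupyter_kernel_gateway as missing. Suffix matching is deliberately loose, so a third-party image such as bitnami/rabbitmq would also satisfy the RabbitMQ check.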

Object storage can be AWS S3, Google Cloud Storage, Azure Blob Storage, or a local volume. Select it with the chart's storage.type value; see File storage for details.

Deployment paths

Five primary paths are supported. All of them use the same images and configuration parameters, so choose by where the Datagrok stand will live.

| Path | Use when |
|---|---|
| Local Docker Compose | A single machine (laptop or VM) for evaluation, demos, or development. Runs its own PostgreSQL inside Compose. |
| Advanced Docker Compose | Single-machine deployments that need separate data volumes, the JS-API debug stack, or another custom topology. |
| Kubernetes Helm chart | Any Kubernetes cluster: on-prem, GKE, AKS, kind, k3s, MicroK8s, or a pre-existing EKS. The EKS CloudFormation template uses the same chart. |
| AWS CloudFormation (EKS) | Recommended for new AWS stands. Provisions EKS, RDS, S3, and IAM with IRSA, then installs the Helm chart automatically. |
| AWS CloudFormation (ECS) | Existing ECS stacks only; targeted for deprecation. Uses the same RDS and S3 logical IDs as the EKS template, so replacing the stack template in place migrates to EKS without re-creating data. |

The AWS Marketplace listing wraps the EKS template for one-click, infrastructure-isolated installs.

Terraform on AWS and Terraform on GCP are available for teams that integrate Datagrok into existing infrastructure-as-code pipelines.

Bare-metal / VM is the manual Docker-on-host path for environments without container orchestration.
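The guidance in this section can be condensed into a rough decision rule. The Python sketch below is a simplification under stated assumptions, not an official selector: the Stand fields are hypothetical, the rule checks conditions in an order chosen for illustration, and the Marketplace listing is folded into the EKS template it wraps.

```python
from dataclasses import dataclass


@dataclass
class Stand:
    """Hypothetical summary of where a Datagrok stand will live."""
    single_machine: bool = False      # laptop or one VM (evaluation, demos, development)
    custom_topology: bool = False     # separate data volumes, JS-API debug stack, ...
    no_orchestration: bool = False    # plain Docker on a host, no orchestrator
    existing_ecs_stack: bool = False  # an ECS stack created earlier
    cloud: str = "on-prem"            # "aws", "gcp", "azure", or "on-prem"
    infra_exists: bool = False        # cluster, database, and object storage already exist
    terraform: bool = False           # Datagrok must fit an existing Terraform pipeline


def choose_path(stand: Stand) -> str:
    """Pick one deployment path; a simplification of the guidance above."""
    if stand.existing_ecs_stack:
        return "AWS CloudFormation (ECS)"  # still supported, targeted for deprecation
    if stand.single_machine:
        return "Advanced Docker Compose" if stand.custom_topology else "Local Docker Compose"
    if stand.no_orchestration:
        return "Bare-metal / VM"
    if stand.terraform and stand.cloud in ("aws", "gcp"):
        return f"Terraform on {stand.cloud.upper()}"
    if stand.cloud == "aws" and not stand.infra_exists:
        return "AWS CloudFormation (EKS)"  # the Marketplace listing wraps this template
    return "Kubernetes Helm chart"  # GKE, AKS, on-prem, or a pre-existing EKS


print(choose_path(Stand(cloud="aws")))
print(choose_path(Stand(cloud="azure")))
```

Checking the ECS case first reflects that an existing ECS stack is a starting point rather than a preference; a real decision would also weigh migrating that stack to the EKS template.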

EKS or Helm directly?

The EKS CloudFormation template installs the Helm chart; it is not a separate deployment mechanism. Choose based on the infrastructure you already manage:

  • Use CloudFormation (EKS) if you want one stack to provision the cluster, RDS, S3, IAM, and the application together. The template can also target a pre-existing EKS cluster (UseExistingCluster=true).
  • Use the Helm chart directly if your cluster, database, and object storage already exist, or if the cluster is not on AWS (GKE, AKS, on-prem). The chart ships ready-made overlays for EKS (values-eks.yaml) and GKE (values-gke.yaml).
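When the chart is used directly, the per-platform step named above is choosing an overlay to layer on top of the chart defaults. The Python sketch below only assembles the command line. It assumes the chart is checked out locally at a hypothetical ./datagrok-chart path with values-eks.yaml and values-gke.yaml at its root; the release name, namespace, and my-values.yaml file are placeholders as well. helm upgrade --install, --namespace, --create-namespace, and -f are standard Helm flags.

```python
"""Sketch: assemble a `helm upgrade --install` command for the Datagrok chart."""
import posixpath
import shlex
from typing import List, Optional

# Overlay file names as given above; their location at the chart root and the
# local chart path below are assumptions.
OVERLAYS = {"eks": "values-eks.yaml", "gke": "values-gke.yaml"}


def helm_install_args(platform: str,
                      chart_dir: str = "./datagrok-chart",  # hypothetical local checkout
                      release: str = "datagrok",            # placeholder release name
                      namespace: str = "datagrok",          # placeholder namespace
                      extra_values: Optional[List[str]] = None) -> List[str]:
    """Return argv for installing or upgrading the chart on a given platform."""
    args = ["helm", "upgrade", "--install", release, chart_dir,
            "--namespace", namespace, "--create-namespace"]
    overlay = OVERLAYS.get(platform)
    if overlay is not None:
        # Platform overlay first; AKS, on-prem, kind, etc. use the chart defaults.
        args += ["-f", posixpath.join(chart_dir, overlay)]
    # Helm gives the rightmost -f file precedence, so site-specific values go last.
    for path in extra_values or []:
        args += ["-f", path]
    return args


args = helm_install_args("eks", extra_values=["my-values.yaml"])
print(shlex.join(args))
```

Returning an argument list rather than a shell string keeps the sketch usable with subprocess.run without extra quoting; shlex.join is only for display.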

Complete the setup

After the platform is reachable, configure cross-cutting concerns:

  1. Authentication (LDAP, OAuth, SAML, IAP, etc.)
  2. SMTP
  3. Install packages
  4. S3 backups (cloud deployments)