
Regular machine

Datagrok runs as a set of Docker containers on top of a PostgreSQL metadata database and persistent file storage. This page covers manual on-host installs — bare-metal servers, on-prem VMs, single EC2 / GCE instances, or any other host you manage directly with Docker Compose. The same Datagrok services run on every deployment — see Components for the canonical list.

Use container orchestration when you can

For new AWS stands, prefer the CloudFormation (EKS) or CloudFormation (ECS) templates: they automate everything on this page and are the AWS path under active development. For Kubernetes (on-prem, GKE, AKS, existing clusters) use the Helm chart. This page is for hosts without an orchestrator.

Prerequisites

  • A Linux host (or a Linux VM on Windows / macOS) with at least 4 CPUs, 8 GB RAM, and 60 GB free disk for the full stack including server-side scripting.
  • Docker Engine and Docker Compose v2 installed, with the user that will run the stack added to the docker group (quick checks after this list).
  • A PostgreSQL 17 database. The bundled compose stack runs Postgres in-cluster; for production prefer a managed instance (AWS RDS, GCP Cloud SQL, Azure Database for PostgreSQL, or an on-prem cluster).
  • Object storage. The bundled compose stack uses a local volume; production stands typically use S3, GCS, Azure Blob, or an S3-compatible service.
  • A DNS record or load balancer pointing at the host on port 8080 (Datagrok). Direct port exposure works for evaluation, but production stands should sit behind a TLS-terminating reverse proxy.
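
To confirm the Docker prerequisites, the standard Docker commands are enough (note that the docker group change only takes effect at the next login):

    docker --version                 # Docker Engine is installed
    docker compose version           # must report Compose v2.x
    sudo usermod -aG docker $USER    # add the current user to the docker group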

Install

  1. Clone the public repository on the host (it ships the canonical compose file):

    git clone https://github.com/datagrok-ai/public.git
    cd public/docker
  2. Open localhost.docker-compose.yaml and edit the GROK_PARAMETERS JSON on the datagrok service. Replace the values inline with your database and storage details (drop the amazonStorage* block if you're using local file storage):

    {
      "dbServer": "<DATABASE_HOST>",
      "dbPort": "5432",
      "db": "datagrok",
      "dbLogin": "datagrok",
      "dbPassword": "<DB_PASSWORD>",
      "dbAdminLogin": "<POSTGRES_ADMIN_USER>",
      "dbAdminPassword": "<POSTGRES_ADMIN_PASSWORD>",
      "amazonStorageRegion": "us-east-2",
      "amazonStorageBucket": "<S3_BUCKET>"
    }

    See Server configuration for every supported key — including GCS, Azure, RDS IAM auth, and TLS options.
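
    A malformed GROK_PARAMETERS value (a trailing comma, a missing quote) is an easy way to keep the datagrok container from starting. One quick sanity check, assuming jq is installed and you copy the edited JSON to a scratch file (grok-parameters.json here is a made-up name, not part of the repository):

    # jq exits non-zero and reports the position of the first JSON syntax error.
    jq empty grok-parameters.json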

  3. Pull the images and start the full stack:

    docker compose -f localhost.docker-compose.yaml --project-name datagrok \
    --profile all up -d

    Use the --profile flags from Local machine: advanced to skip optional services (e.g., drop server-side scripting).
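
    To verify that the containers came up, the standard Compose status and log commands work with the same file and project name:

    docker compose -f localhost.docker-compose.yaml --project-name datagrok ps
    docker compose -f localhost.docker-compose.yaml --project-name datagrok \
        logs -f datagrok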

  4. After about a minute the server is ready at http://<HOST>:8080. Sign in as admin / admin and change the admin password on first login.
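
    If you script the install, a generic readiness poll against the web port is enough; no Datagrok-specific health endpoint is assumed here:

    # Retry until the web server answers on port 8080.
    until curl -fsS -o /dev/null http://<HOST>:8080; do
        echo "waiting for Datagrok..."
        sleep 5
    done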

Multi-host topologies

For multi-host installs (Datagrok services on one host, scripting / Jupyter Kernel Gateway on another, or larger topologies), use the Helm chart on Kubernetes. A single-node K8s distribution like k3s or kind is enough if you don't already run a cluster.
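
As a sketch of that path, assuming the chart installs the usual Helm way (the repository URL below is a placeholder; take the real repo URL and chart name from the Helm chart documentation):

    # <HELM_REPO_URL> is a placeholder, not the published URL.
    helm repo add datagrok <HELM_REPO_URL>
    helm repo update
    helm install datagrok datagrok/datagrok \
        --namespace datagrok --create-namespace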

On AWS EC2

For a single EC2 instance with RDS and S3 attached, follow this page and supply the RDS endpoint and S3 bucket details in GROK_PARAMETERS (see AWS EC2 specifics). For multi-AZ, autoscaling, or load-balanced production stands on AWS, use the CloudFormation (ECS) or CloudFormation (EKS) template instead; they provision the host fleet, RDS, S3, and ALBs end-to-end.
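
To pull those values from an existing AWS account, the standard CLI lookups suffice (the instance identifier and bucket name are yours):

    # RDS endpoint to use as the dbServer key.
    aws rds describe-db-instances \
        --db-instance-identifier <RDS_INSTANCE_ID> \
        --query 'DBInstances[0].Endpoint.Address' --output text

    # Bucket region to use as the amazonStorageRegion key.
    aws s3api get-bucket-location --bucket <S3_BUCKET>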

See also