Backup and restore
This guide shows you how to create comprehensive backups of your Infrahub deployment and restore them when needed. You'll learn to back up the Neo4j graph database, object storage, and task management data to ensure complete data recovery.
For Neo4j cluster deployments, see Cluster backup and restore.
Prerequisites
- Running Infrahub deployment (Docker Compose or Kubernetes)
- Administrative access to the Neo4j database
- Access to the object storage location (S3 or local filesystem)
- Sufficient storage space for backup files
- For cluster deployments: Understanding of your cluster topology
Create a full backup
Step 1: Install the backup tool
- infrahub-backup CLI (Recommended)
- Kubernetes Helm
- Docker Compose
- Remote Database
Install the infrahub-backup CLI tool:
curl https://infrahub.opsmill.io/ops/$(uname -s)/$(uname -m)/infrahub-backup -o infrahub-backup
chmod +x infrahub-backup
For Kubernetes deployments using Helm, see the dedicated backup guide.
If you prefer manual control, proceed to backup each component individually as described in the following steps.
Alternatively, you can use the legacy tool to back up a remote Neo4j database.
Step 2: Back up the databases
- infrahub-backup CLI
- Kubernetes Helm
- Docker Compose
- Remote Database
Create a backup of your running Infrahub instance:
./infrahub-backup create
The tool automatically:
- Checks for running tasks before starting (use --force to skip)
- Creates a timestamped backup archive (for example, infrahub_backup_20250129_153045.tar.gz)
- Backs up the Neo4j database with metadata (configurable with --neo4j-metadata)
- Backs up the Prefect/PostgreSQL task management database
- Calculates SHA256 checksums for integrity verification
We plan to add object storage backup in a future release. Handle object storage backups separately for now.
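As an illustration of the archive-plus-checksum scheme described above, here is a minimal sketch using plain coreutils. The file names and contents are placeholders, not the real tool's internal layout:

```shell
# Illustration only: mimic a timestamped archive with a SHA256 checksum,
# similar in spirit to what infrahub-backup produces (layout is hypothetical)
set -eu
workdir=$(mktemp -d)
cd "$workdir"

mkdir staging
echo "placeholder neo4j dump"   > staging/neo4j.backup
echo "placeholder prefect dump" > staging/prefect.dump

# Timestamped name matching the infrahub_backup_YYYYMMDD_HHMMSS pattern
name="infrahub_backup_$(date +%Y%m%d_%H%M%S).tar.gz"
tar -czf "$name" -C staging .

# Record and verify a SHA256 checksum for integrity
sha256sum "$name" > "$name.sha256"
sha256sum -c "$name.sha256"
```

Verifying the checksum immediately after creating the archive catches truncated or corrupted writes before you rely on the backup.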
For Kubernetes deployments using Helm, see the dedicated backup guide.
Connect to your Neo4j container and create a backup:
# Connect as neo4j user to avoid permission issues
docker exec -it -u neo4j infrahub-database-1 bash
# Create backup directory and run backup
mkdir -p backups
neo4j-admin database backup --to-path=backups/
# Verify backup creation
ls backups/
# Output: neo4j-2025-03-24T19-57-18.backup
# Leave the container, then copy the backup to the host for safekeeping
# (adjust the container path if your image's working directory differs)
exit
docker cp infrahub-database-1:/var/lib/neo4j/backups ./database-backup
Back up the Prefect PostgreSQL database containing task logs and execution history:
# Export Prefect database (using default credentials)
docker compose exec -T task-manager-db \
pg_dump -Fc -U postgres -d prefect > prefect.dump
For remote database backups using the Python utility:
# Clone the repository or use Docker image
python -m utilities.db_backup neo4j backup \
--database-url=172.28.64.1 \
/infrahub_backups
# If network access issues occur, use host network
python -m utilities.db_backup neo4j backup \
--host-network \
--database-url=172.28.64.1 \
/infrahub_backups
Step 3: Back up the object storage
The object storage layer holds all file content (file objects and artifacts) outside of the graph database. The graph database references this content through storage_id values, so both must be backed up together to maintain consistency.
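One way to honor that consistency requirement is to capture both stores under a single timestamped directory, so a given backup always pairs a graph snapshot with the object store it references. A hedged sketch with placeholder files (real paths depend on your deployment):

```shell
# Sketch: keep the graph database backup and the object storage copy
# under one timestamped directory so storage_id references stay consistent
set -eu
backup_root=$(mktemp -d)
stamp=$(date +%Y%m%d_%H%M%S)
dest="$backup_root/infrahub_$stamp"
mkdir -p "$dest/database" "$dest/object_store"

# Placeholders: in a real run these come from neo4j-admin backup
# and from your S3 bucket or local storage directory
echo "graph backup"     > "$dest/database/neo4j.backup"
echo "artifact content" > "$dest/object_store/artifact.bin"

ls "$dest"
```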
- S3 Storage
- Local Filesystem
If using S3 for object storage, use AWS CLI or your preferred S3 backup tool:
# Sync S3 bucket to local backup directory
aws s3 sync s3://your-infrahub-bucket /backup/object_store/
For local filesystem storage, copy the object storage directory:
# Copy object storage directory to backup location
docker compose cp infrahub-server:/opt/infrahub/storage/. /backup/object_store/
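If you prefer a single transportable artifact rather than a directory tree, the copied directory can be compressed into one dated archive. A small sketch with a placeholder directory standing in for /backup/object_store:

```shell
# Sketch: bundle a copied object store directory into one dated archive
set -eu
src=$(mktemp -d)
echo "placeholder blob" > "$src/blob.bin"

archive="$(mktemp -d)/object_store_$(date +%Y%m%d).tar.gz"
tar -czf "$archive" -C "$src" .
tar -tzf "$archive"
```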
Restore from backup
Step 1: Prepare the environment
Ensure Infrahub services are running before starting the restore process. You can start from a fresh, empty deployment.
- infrahub-backup CLI
- Kubernetes Helm
- Manual Process
Restore from a backup archive:
./infrahub-backup restore infrahub_backup_20250129_153045.tar.gz
The tool automatically:
- Validates backup integrity using checksums
- Wipes cache and message queue data
- Stops application containers
- Restores PostgreSQL database first
- Restores Neo4j database with metadata
- Restarts all services in correct order
For Kubernetes deployments using Helm, see the dedicated restore guide.
If restoring manually, follow the steps below for each component.
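When scripting a manual restore, the ordering that infrahub-backup follows (PostgreSQL before Neo4j, services last) is worth preserving. A stubbed sketch of that sequence; every step here is a placeholder, not the real implementation:

```shell
# Stubbed sketch of the restore ordering; replace each step with the
# real operation if you script a manual restore
set -eu
log=$(mktemp)
step() { echo "$1" >> "$log"; }

restore_all() {
  step "validate checksums"
  step "wipe cache and message queue"
  step "stop application containers"
  step "restore postgresql"   # PostgreSQL is restored first...
  step "restore neo4j"        # ...then Neo4j with its metadata
  step "restart services"
}

restore_all
cat "$log"
```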
Step 2: Restore the databases
- infrahub-backup CLI
- Kubernetes Helm
- Docker Compose
- Remote Database
This is automatically handled by infrahub-backup.
For Kubernetes deployments using Helm, see the dedicated restore guide.
# Stop app services
docker compose stop task-worker infrahub-server task-manager
# Copy backup directory to container
docker cp database-backup infrahub-database-1:/tmp/backup
# Connect to container as neo4j user
docker exec -it -u neo4j infrahub-database-1 bash
# Drop existing database
cypher-shell -d system -u neo4j
DROP DATABASE neo4j;
exit;
# Clean residual data
rm -rf /data/databases/neo4j
rm -rf /data/transactions/neo4j
# Restore from backup
neo4j-admin database restore \
--from-path=/tmp/backup neo4j \
--overwrite-destination=true
# Recreate database
cypher-shell -d system -u neo4j
CREATE DATABASE neo4j;
SHOW DATABASES;
Restore the task manager PostgreSQL database:
# Restore Prefect database from the dump created earlier
# (the dump lives on the host, so feed it through stdin)
docker compose exec -T task-manager-db \
pg_restore -d postgres -U postgres --clean --create < prefect.dump
# Restart task manager to apply changes
docker compose restart task-manager
# Restore using Python utility
python -m utilities.db_backup neo4j restore \
/infrahub_backups \
--database-cypher-port=7687
Step 3: Restore the object storage
- S3 Storage
- Local Filesystem
# Restore S3 bucket from backup
aws s3 sync /backup/object_store/ s3://your-infrahub-bucket
# Restore object storage directory into the container
docker compose cp /backup/object_store/. infrahub-server:/opt/infrahub/storage/
Step 4: Restart Infrahub services
- infrahub-backup CLI
- Kubernetes Helm
- Docker Compose
This is automatically handled by infrahub-backup.
For Kubernetes deployments using Helm, see the dedicated restore guide.
Restart services in the correct order to ensure proper initialization:
# Restart API servers first
docker compose restart infrahub-server
# Then restart task workers
docker compose restart task-worker
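Services can take a moment to become ready after a restart. A small hedged helper (the function name and retry counts are illustrative, not part of Infrahub) polls a command until it succeeds:

```shell
# Hypothetical helper: retry a command up to N times, pausing 1s between
# attempts; returns non-zero if the command never succeeds
wait_for() {
  retries=$1
  shift
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$retries" ]; then
      return 1
    fi
    sleep 1
  done
}

# Example: after restarting, wait until the API answers
# wait_for 30 curl -fsS http://localhost:8000/api/schema/summary
```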
Validation
Verify your restoration was successful:
- Check database status:
docker compose exec -T database cypher-shell -u neo4j \
-c "SHOW DATABASES;"
The Neo4j database should show as "online".
- Verify the Infrahub API:
curl http://localhost:8000/api/schema/summary
You should receive a valid schema response.
- Check the task manager:
docker compose logs task-manager --tail 50
Logs should show normal operation without errors.
- Test artifact retrieval: Access the Infrahub UI and verify that stored artifacts (Transformations, queries) are accessible.
Advanced usage
Using the Python-based backup utility
The Python-based utility (utilities/db_backup) is still available in the main Infrahub repository but is being replaced by infrahub-backup. Use it only if infrahub-backup doesn't meet your specific requirements.
Use non-default ports
If your deployment uses custom ports, specify them during backup and restore operations:
# Backup with custom backup port
python -m utilities.db_backup neo4j backup \
--database-backup-port=12345 \
/infrahub_backups
# Restore with custom Cypher port
python -m utilities.db_backup neo4j restore \
/infrahub_backups \
--database-cypher-port=9876
Run the backup tool via Docker
If you don't have the repository cloned locally, run the backup tool directly from the Infrahub Docker image:
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
registry.opsmill.io/opsmill/infrahub \
python -m utilities.db_backup
Related resources
- Database backup overview - Architecture and backup strategy concepts
- Cluster backup and restore - Neo4j cluster-specific backup and restore
- infrahub-backup CLI reference - Command-line reference for the infrahub-backup tool