Version: v24.1

Binary Backups

warning

Binary backups require a valid enterprise license.

Binary backups are full backups of Dgraph data that are written directly to cloud storage (such as Amazon S3 or MinIO) or to an on-premises network file system shared by all Alpha servers. Binary backups enable you to restore a Dgraph cluster to a previous state. Unlike exports, binary backups are Dgraph-specific and provide faster restore operations.

Create a Backup

To create a backup, send a GraphQL mutation to the /admin endpoint.

The BackupInput type defines the available options for backup operations:

input BackupInput {
  """
  Destination for the backup: e.g. MinIO or S3 bucket.
  """
  destination: String!

  """
  Access key credential for the destination.
  """
  accessKey: String

  """
  Secret key credential for the destination.
  """
  secretKey: String

  """
  AWS session token, if required.
  """
  sessionToken: String

  """
  Set to true to allow backing up to an S3 or MinIO bucket that requires no credentials.
  """
  anonymous: Boolean

  """
  Force a full backup instead of an incremental backup.
  """
  forceFull: Boolean
}

Execute the following mutation on the /admin endpoint using any GraphQL-compatible client (such as Insomnia, GraphQL Playground, or GraphiQL).
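Alternatively, you can post the mutation from the command line with curl. A minimal sketch, assuming the Alpha HTTP endpoint is at localhost:8080 and using an example destination path:

# Assumes the Alpha admin endpoint at localhost:8080; the destination path is an example.
curl -X POST localhost:8080/admin \
  -H "Content-Type: application/graphql" \
  --data 'mutation { backup(input: {destination: "/dgraph/backups"}) { response { message code } taskId } }'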

Backup to NFS

mutation {
  backup(input: {destination: "/path/to/local/directory"}) {
    response {
      message
      code
    }
    taskId
  }
}

A local filesystem works only if all Alpha servers have access to it (for example, when all Alpha servers run as normal processes on the same filesystem, not in Docker containers). Use an NFS mount to ensure backups work seamlessly across multiple machines and containers.
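For example, a minimal sketch of mounting a shared NFS export on each Alpha host (the server name and paths are assumptions; adjust for your environment):

# Assumed NFS server "nfs-server" exporting /dgraph-backups; both paths are examples.
sudo mkdir -p /mnt/dgraph-backups
sudo mount -t nfs nfs-server:/dgraph-backups /mnt/dgraph-backups

You can then pass /mnt/dgraph-backups (or a subdirectory) as the destination in the backup mutation.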

Backup to Amazon S3

mutation {
  backup(input: {destination: "s3://s3.us-west-2.amazonaws.com/<bucketname>"}) {
    response {
      message
      code
    }
    taskId
  }
}

Configure Amazon S3 Credentials

To back up to Amazon S3, configure the Alpha server with the following AWS credentials using environment variables:

  • AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY: AWS access key with write permissions to the destination bucket.
  • AWS_SECRET_ACCESS_KEY or AWS_SECRET_KEY: AWS secret access key with write permissions to the destination bucket.
  • AWS_SESSION_TOKEN: AWS session token (required for temporary credentials).
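For example (placeholder values; set these in each Alpha server's environment before starting it):

# Placeholder credentials; export in the environment of every Alpha server.
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>
dgraph alpha ...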

To configure IAM-based authentication:

  1. Create an IAM Role with an IAM policy that grants access to the S3 bucket (a minimal example policy appears after this list).
  2. Attach the IAM role to the infrastructure running the Alpha servers (for example, EC2 instance profiles or EKS service accounts).
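A minimal sketch of such an IAM policy (the bucket name is a placeholder; scope the actions and resources to your security requirements):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<bucketname>",
        "arn:aws:s3:::<bucketname>/*"
      ]
    }
  ]
}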

Backup to Minio

mutation {
  backup(input: {destination: "minio://127.0.0.1:9000/<bucketname>"}) {
    response {
      message
      code
    }
    taskId
  }
}

Configure MinIO Credentials

To back up to MinIO, configure the Alpha server with the following MinIO credentials using environment variables:

  • MINIO_ACCESS_KEY: MinIO access key with write permissions to the destination bucket.
  • MINIO_SECRET_KEY: MinIO secret key with write permissions to the destination bucket.
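For example (placeholder values):

# Placeholder credentials; export in the environment of every Alpha server.
export MINIO_ACCESS_KEY=<minio-access-key>
export MINIO_SECRET_KEY=<minio-secret-key>
dgraph alpha ...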

Directory Structures

A binary backup directory has the following structure:

backup
├── dgraph.20210102.204757.509
│   └── r9-g1.backup
├── dgraph.20210104.224757.707
│   └── r9-g1.backup
└── manifest.json

Backup using a MinIO Gateway

Azure Blob Storage

You can use Azure Blob Storage through the MinIO Azure Gateway. Configure a storage account and a container to organize the blobs.

For MinIO configuration, retrieve the storage account keys. The MinIO Azure Gateway maps MINIO_ACCESS_KEY to the Azure Storage Account AccountName and MINIO_SECRET_KEY to the AccountKey.

Once you have the AccountName and AccountKey, you can access Azure Blob Storage locally using one of these methods:

  • Run MinIO Azure Gateway using Docker
    docker run --publish 9000:9000 --name gateway \
    --env "MINIO_ACCESS_KEY=<AccountName>" \
    --env "MINIO_SECRET_KEY=<AccountKey>" \
    minio/minio gateway azure
  • Run MinIO Azure Gateway using the MinIO Binary
    export MINIO_ACCESS_KEY="<AccountName>"
    export MINIO_SECRET_KEY="<AccountKey>"
    minio gateway azure

Google Cloud Storage

You can use Google Cloud Storage through the MinIO GCS Gateway. Create storage buckets, create a Service Account key for GCS, and obtain a credentials file. See Create a Service Account key for detailed instructions.

Once you have a credentials.json, you can access GCS locally using one of these methods:

  • Run MinIO GCS Gateway using Docker
    docker run --publish 9000:9000 --name gateway \
    --volume /path/to/credentials.json:/credentials.json \
    --env "GOOGLE_APPLICATION_CREDENTIALS=/credentials.json" \
    --env "MINIO_ACCESS_KEY=minioaccountname" \
    --env "MINIO_SECRET_KEY=minioaccountkey" \
    minio/minio gateway gcs <project-id>
  • Run MinIO GCS Gateway using the MinIO Binary
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
    export MINIO_ACCESS_KEY=minioaccesskey
    export MINIO_SECRET_KEY=miniosecretkey
    minio gateway gcs <project-id>

Verify MinIO Gateway

MinIO Gateway includes an embedded web-based object browser. After starting the MinIO Gateway using one of the methods above, verify it is running by opening a web browser and navigating to http://127.0.0.1:9000. Confirm that the object browser is displayed and can access the remote object storage.
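Alternatively, a minimal sketch using the MinIO client (mc), assuming it is installed and the gateway is running on port 9000:

# Register the gateway under the alias "gateway", then list its buckets.
mc alias set gateway http://127.0.0.1:9000 <AccountName> <AccountKey>
mc ls gateway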

Disable HTTPS for S3 and MinIO Backups

By default, Dgraph assumes the destination bucket uses HTTPS. If the bucket uses HTTP, the backup operation fails. To back up to a bucket using HTTP (insecure), set the query parameter secure=false in the destination field:

mutation {
  backup(input: {destination: "minio://127.0.0.1:9000/<bucketname>?secure=false"}) {
    response {
      message
      code
    }
    taskId
  }
}

Override Credentials

The accessKey, secretKey, and sessionToken parameters override the default credentials.

Note: Unless HTTPS is used, credentials are transmitted in plain text. Use these parameters with caution. Prefer environment variables for credential management; these parameters provide additional flexibility when needed.

Set the anonymous parameter to true to back up to an S3 or MinIO bucket that requires no credentials (a public bucket).
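For example, a backup that passes credentials explicitly (placeholder values):

mutation {
  backup(input: {
    destination: "s3://s3.us-west-2.amazonaws.com/<bucketname>"
    accessKey: "<access-key>"
    secretKey: "<secret-key>"
  }) {
    response {
      message
      code
    }
    taskId
  }
}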

Force a Full Backup

By default, Dgraph creates an incremental backup if a full backup exists in the specified location. To create a full backup, set the forceFull field to true in the mutation. Each backup series is identified by a unique ID, and each backup in the series is assigned a monotonically increasing number. See the restore section for details on restoring a backup series.

mutation {
  backup(input: {destination: "/path/to/local/directory", forceFull: true}) {
    response {
      message
      code
    }
    taskId
  }
}

List Backups

The GraphQL admin interface provides the listBackups query that lists backups in a specified location along with information from the manifest.json file. The following example lists backups in the /data/backup location:

query backup {
  listBackups(input: {location: "/data/backup"}) {
    backupId
    backupNum
    encrypted
    groups {
      groupId
      predicates
    }
    path
    since
    type
  }
}

The ListBackupsInput type supports the following fields. Only the location field is required.

input ListBackupsInput {
  """
  Location of the backups to list: e.g. MinIO or S3 bucket.
  """
  location: String!

  """
  Access key credential for the destination.
  """
  accessKey: String

  """
  Secret key credential for the destination.
  """
  secretKey: String

  """
  AWS session token, if required.
  """
  sessionToken: String

  """
  Whether the destination doesn't require credentials (e.g. a public S3 bucket).
  """
  anonymous: Boolean
}

The query returns an array of Manifest objects. The fields in the Manifest type correspond to the fields in the manifest.json file.

type Manifest {
  """
  Unique ID for the backup series.
  """
  backupId: String

  """
  Number of this backup within the backup series. The full backup always has a value of one.
  """
  backupNum: Int

  """
  Whether this backup was encrypted.
  """
  encrypted: Boolean

  """
  List of groups and the predicates they store in this backup.
  """
  groups: [BackupGroup]

  """
  Path to the manifest file.
  """
  path: String

  """
  The timestamp at which this backup was taken. The next incremental backup will
  start from this timestamp.
  """
  since: Int

  """
  The type of backup, either full or incremental.
  """
  type: String
}

type BackupGroup {
  """
  The ID of the cluster group.
  """
  groupId: Int

  """
  List of predicates assigned to the group.
  """
  predicates: [String]
}
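For illustration only, a listBackups response might take this shape (all values below are made up; yours will differ):

{
  "data": {
    "listBackups": [
      {
        "backupId": "<backup-series-id>",
        "backupNum": 1,
        "encrypted": false,
        "groups": [{ "groupId": 1, "predicates": ["name", "dgraph.type"] }],
        "path": "<path-to-manifest>",
        "since": 10001,
        "type": "full"
      }
    ]
  }
}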

Convert Binary Backup to RDF Export Format

The export_backup tool converts a binary backup into an exported folder format.

Use this tool when upgrading between major Dgraph versions with incompatible changes. The tool enables you to apply changes to the exported .rdf file or schema file, then import the dataset into the new Dgraph version.

Ensure you have created a binary backup. A typical binary backup directory structure looks like this:

backup
├── dgraph.20210104.224757.709
│   └── r9-g1.backup
└── manifest.json

Then run the following command:

dgraph export_backup --location "<location-of-your-binary-backup>" --destination "<destination-of-the-export-dir>"

After completion, the export folder (in this example, dgraph.r9.u0108.1621) has the following structure:

dgraph.r9.u0108.1621
├── g01.gql_schema.gz
├── g01.rdf.gz
└── g01.schema.gz
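You can then load the exported files into the new cluster, for example with the live loader. A minimal sketch, assuming a running cluster with default ports and the export folder shown above:

# Paths match the example export folder; the loader connects to localhost by default.
dgraph live \
  --files dgraph.r9.u0108.1621/g01.rdf.gz \
  --schema dgraph.r9.u0108.1621/g01.schema.gz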

Encrypted Backups

For encrypted backups, configure the Dgraph Alpha server with the --encryption key-file=value flag. You can alternatively configure the Alpha server to interface with a HashiCorp Vault server to obtain encryption keys.

note

The --encryption key-file=value flag and the vault superflag are used for both encryption at rest and encrypted backups.

Important: All backups in a series (full and incremental) must use the same encryption setting. You cannot mix encrypted and unencrypted backups within the same backup series. The Encrypted flag enforces this restriction.

The key size (16, 24, or 32 bytes) determines the AES cipher: AES-128, AES-192, or AES-256. Dgraph uses AES in CTR mode. Binary backups are already compressed with gzip; encryption is applied to the gzipped data.

During backup, a 16-byte initialization vector (IV) is prepended to the ciphertext after encryption.
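For example, a minimal sketch of generating a 32-byte key (AES-256) and starting an Alpha with it (the file path is an example):

# Generate a random 32-byte key file; 16 or 24 bytes select AES-128 or AES-192 instead.
head -c 32 /dev/urandom > /path/to/enc_key_file
dgraph alpha --encryption key-file=/path/to/enc_key_file ...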

Online Restore

To restore from a backup to a live cluster, execute a mutation on the /admin endpoint:

mutation {
  restore(input: {
    location: "/path/to/backup/directory",
    backupId: "id_of_backup_to_restore"
  }) {
    message
    code
  }
}

Online restore operations return immediately after the request is sent. The restore process updates UID and timestamp leases automatically. The backup being restored must contain the same number of groups in its manifest.json file as the target cluster.

note

Backups taken from a cluster that uses encryption are themselves encrypted. Dgraph's encryption at rest uses a symmetric-key algorithm, in which the same key is used for both encryption and decryption, so you must supply the original cluster's encryption key when restoring.

Online restore can be performed from Amazon S3, MinIO, or a local directory. The RestoreInput type defines the available options:

input RestoreInput {
  """
  Location of the backup to restore from: e.g. MinIO or S3 bucket.
  """
  location: String!

  """
  Backup ID of the backup series to restore. This ID is included in the manifest.json file.
  If missing, it defaults to the latest series.
  """
  backupId: String

  """
  Number of the backup within the backup series to be restored. Backups with a greater value
  will be ignored. If the value is zero or is missing, the entire series will be restored.
  """
  backupNum: Int

  """
  Path to the key file needed to decrypt the backup. This file should be accessible
  by all Alpha servers in the group. The restored data will be encrypted using the key
  with which the cluster was started, which might be different from this key.
  """
  encryptionKeyFile: String

  """
  Vault server address where the key is stored. This server must be accessible
  by all Alpha servers in the group. Default "http://localhost:8200".
  """
  vaultAddr: String

  """
  Path to the Vault RoleID file.
  """
  vaultRoleIDFile: String

  """
  Path to the Vault SecretID file.
  """
  vaultSecretIDFile: String

  """
  Vault kv store path where the key lives. Default "secret/data/dgraph".
  """
  vaultPath: String

  """
  Vault kv store field whose value is the key. Default "enc_key".
  """
  vaultField: String

  """
  Vault kv store field's format. Must be "base64" or "raw". Default "base64".
  """
  vaultFormat: String

  """
  Access key credential for the destination.
  """
  accessKey: String

  """
  Secret key credential for the destination.
  """
  secretKey: String

  """
  AWS session token, if required.
  """
  sessionToken: String

  """
  Set to true to allow restoring from an S3 or MinIO bucket that requires no credentials.
  """
  anonymous: Boolean

  """
  All the backups with num >= incrementalFrom will be restored.
  """
  incrementalFrom: Int

  """
  If isPartial is set to true, the cluster is kept in draining mode after the
  restore to ensure that the database is not corrupted by mutations or tablet moves
  between two restores.
  """
  isPartial: Boolean
}

Restore requests return immediately without waiting for the operation to complete.

Incremental Restore

Use incremental restore to restore a set of incremental backups on a cluster that has already been partially restored. The cluster enters draining mode during this process, which prevents mutations. Only admin requests to return the cluster to normal mode are accepted while in draining mode.

important

Before starting an incremental restore, ensure you set isPartial to true in your initial restore operation.
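For example, an initial partial restore of just the full backup might look like this (a sketch; the location and ID are placeholders):

mutation {
  restore(input: {
    location: "/path/to/backup/directory",
    backupId: "id_of_backup_to_restore",
    backupNum: 1,
    isPartial: true
  }) {
    message
    code
  }
}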

To perform an incremental restore, execute a mutation on the /admin endpoint:

mutation {
  restore(input: {
    incrementalFrom: <incremental_backup_num>,
    location: "/path/to/backup/directory",
    backupId: "id_of_backup_to_restore"
  }) {
    message
    code
  }
}

Namespace-Aware Restore

Use namespace-aware restore to restore a single namespace from a backup that contains multiple namespaces. The restored data is available in the default namespace. For example, if you restore namespace 2 using the restoreTenant API, after the restore operation completes, the cluster contains only the default namespace with data from namespace 2. Namespace-aware restore supports incremental restore.

To perform a namespace-aware restore, execute a mutation on the /admin endpoint:

mutation {
  restoreTenant(
    input: {
      restoreInput: {
        incrementalFrom: <incremental_backup_num>
        location: "/path/to/backup/directory"
        backupId: "id_of_backup_to_restore"
      }
      fromNamespace: <namespace_to_be_restored>
    }
  ) {
    message
    code
  }
}

The RestoreTenantInput type defines the input parameters:

input RestoreTenantInput {
  """
  restoreInput contains fields that are required for the restore operation,
  i.e., location, backupId, and backupNum.
  """
  restoreInput: RestoreInput

  """
  fromNamespace is the namespace of the tenant that needs to be restored into
  namespace 0 of the new cluster.
  """
  fromNamespace: Int!
}

Offline Restore (Deprecated)

warning

Offline restore is deprecated. Use online restore instead.

The dgraph restore command is a standalone tool that restores the postings directory from a previously created backup to a directory in the local filesystem. This command restores a backup to a new Dgraph cluster and is not designed to restore to a currently running cluster. During a restore operation, a temporary Dgraph Zero server may run to fully restore the backup state.

Use the --encryption key-file=value flag to decrypt encrypted backups. The specified file must contain the same key used for encryption during backup. Starting with v20.07.0, you can use the vault superflag to restore encrypted backups.

Command Options

  • --location (-l): Specifies the source URI containing Dgraph backup objects. Supports all backup storage schemes.

  • --postings (-p): Sets the directory where restored posting directories are saved. This directory contains a posting directory for each group in the restored backup.

  • --zero (-z): Specifies a Dgraph Zero server address so the command can update the start timestamp and UID lease to the restored values. If not specified, the command requires --force_zero=false, and you must manually update the timestamp and UID lease through the Dgraph Zero server's HTTP assign endpoint, using the values printed at the end of the command output (see the sketch after this list).

  • --backup_id: Specifies the ID of the backup series to restore. A backup series consists of a full backup and all incremental backups built on top of it. Each new full backup starts a new backup series with a different ID. The backup series ID is stored in each manifest.json file in each backup folder.

  • --encryption key-file=value: Required when restoring a backup from an encrypted cluster. The file path must point to the same key file used to run the original cluster.

  • --vault superflag: Specifies the HashiCorp Vault server address (addr), role ID file (role-id-file), secret ID file (secret-id-file), and the field containing the encryption key (enc-field) used to encrypt the backup.
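A minimal sketch of the manual lease update mentioned above (assuming Zero's HTTP port is the default 6080; substitute the values printed by dgraph restore):

# Lease timestamps and UIDs up to the values printed by the restore command.
curl "localhost:6080/assign?what=timestamps&num=<value-from-restore-output>"
curl "localhost:6080/assign?what=uids&num=<value-from-restore-output>"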

The restore operation creates a cluster structure with as many groups as the original cluster had at the time of the last backup. For each group, dgraph restore creates a posting directory (p<N>) that corresponds to the backup group ID. For example, a backup for Dgraph Alpha group 2 (named .../r32-g2.backup) is loaded to posting directory p2.

After running the restore command, manually copy the directories from the postings directory to the machines or containers running the Dgraph Alpha servers before starting dgraph alpha. For example, in a cluster with two Dgraph Alpha groups and one replica each, copy p1 to the first Alpha node and p2 to the second Alpha node.

By default, Dgraph looks for a posting directory named p. Rename the directories after moving them, or use the -p option of the dgraph alpha command to specify a different path.
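For example, on the node that will serve group 1 (paths are examples):

# Option 1: rename the restored group-1 postings to the default directory name "p".
mv /var/db/dgraph/p1 /var/db/dgraph/p
dgraph alpha ...
# Option 2: keep the name and point Alpha at it explicitly.
dgraph alpha -p /var/db/dgraph/p1 ...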

Restore from Amazon S3

dgraph restore --postings "/var/db/dgraph" --location "s3://s3.<region>.amazonaws.com/<bucketname>"

Restore from MinIO

dgraph restore --postings "/var/db/dgraph" --location "minio://127.0.0.1:9000/<bucketname>"

Restore from Local Directory or NFS

dgraph restore --postings "/var/db/dgraph" --location "/var/backups/dgraph"

Restore and Update Timestamp

Specify the Zero server address and port for the new cluster with --zero/-z to update the timestamp.

dgraph restore --postings "/var/db/dgraph" --location "/var/backups/dgraph" --zero "localhost:5080"