Implementing idempotent migration scripts for Neo4j

You are writing a migration that copies relational rows or JSON documents into Neo4j, and you need it to be safe to run more than once — because CI/CD retries it, because a network partition kills it mid-load, or because you deliberately re-run the same batch to pick up corrected source data. An idempotent script is one where the second, third, and hundredth execution converge on exactly the same graph as the first: no duplicate nodes, no conflicting relationships, no constraint violations. This page shows the three properties that make a script idempotent — a deterministic merge key, a bounded transaction scope, and a checksum you can both audit and roll back by — and gives you a single focused Python driver implementation you can adapt directly. It is the write discipline that everything in Error Handling & Rollback Mechanisms depends on.

Prerequisites

Neo4j 5.x with the neo4j Python driver 5.x installed (pip install "neo4j>=5,<6").
A uniqueness constraint on your merge key created before the first load — an idempotent MERGE is only race-safe when the constraint’s backing index is ONLINE.
A stable business key per record drawn from your node label taxonomy — never the internal element id, which is not stable across a re-run.
Source records already shaped by Relational Schema Mapping Strategies or JSON Document Flattening & Graph Conversion, so every row carries a populated key.
Permission to run DDL (CREATE CONSTRAINT) and a pre-load snapshot taken with neo4j-admin database dump.
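
The "populated key" prerequisite can be enforced before any row reaches the driver. The following is a minimal sketch, not part of the migration script itself: split_by_key is a hypothetical helper name, and it assumes each record is a dict with the key under "id", the same shape the loader further down expects.

```python
from typing import Any, Iterable


def split_by_key(rows: Iterable[dict[str, Any]], key: str = "id"):
    """Partition rows into (loadable, rejected) by whether they carry a usable key.

    Hypothetical pre-flight helper: rows without a key are set aside for
    inspection instead of being sent to MERGE.
    """
    loadable, rejected = [], []
    for row in rows:
        value = row.get(key)
        # None, a missing key, and blank strings cannot anchor a MERGE.
        if value is None or (isinstance(value, str) and not value.strip()):
            rejected.append(row)
        else:
            loadable.append(row)
    return loadable, rejected
```

Routing the rejected rows to a log or dead-letter file, rather than silently dropping them, keeps the source-to-graph reconciliation honest.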

Why migrations drift on re-run

The most common failure mode is treating CREATE as an upsert, or applying MERGE without an explicit, indexed match key. In a typical relational-to-graph conversion, developers flatten a JSON payload or map foreign keys straight into CREATE statements. Suppose a batch of 50,000 records is written in auto-commit mode or in small sub-transactions and hits a transient network partition at record 42,311: the in-flight transaction rolls back, but everything committed before it stays in the graph. A naive retry reprocesses the whole chunk and duplicates every record that was already committed. If constraints were only applied after the load, those duplicates bypass validation until a final integrity check fails, by which point the graph is already corrupt.

Root-cause analysis of drifting migrations points to three architectural gaps, and the fix for each is a property your script must hold:

Missing deterministic merge keys. Anchoring MERGE on composite or mutable properties produces phantom duplicates. Idempotency requires matching on one stable key that never changes across re-runs.
Unbounded transaction scopes. Loading millions of rows in a single transaction exhausts heap and forces a full rollback on any failure. Recovery granularity can never be finer than your commit boundary, so scope must be bounded and aligned with your batch-processing workflow.
No idempotency guard at the driver layer. Application-level retries without a payload checksum or transactional boundary create race conditions; the retry can re-run work a sibling already committed.
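
The difference between append-style and key-anchored writes under retry can be seen without a database. This is an in-memory stand-in, not Neo4j: a list models CREATE, a dict keyed by id models MERGE on a stable key, and the scenario assumes part of the chunk was committed before the failure.

```python
def create_all(store: list, rows) -> None:
    # CREATE-like semantics: every call appends, whether or not the row exists.
    for row in rows:
        store.append(dict(row))


def merge_all(store: dict, rows) -> None:
    # MERGE-like semantics: find-or-create on the stable key, then set properties.
    for row in rows:
        store.setdefault(row["id"], {}).update(row)


rows = [{"id": i, "name": f"n{i}"} for i in range(5)]

created: list = []
create_all(created, rows[:3])  # three rows committed before the failure
create_all(created, rows)      # naive retry replays the whole chunk

merged: dict = {}
merge_all(merged, rows[:3])    # same partial progress
merge_all(merged, rows)        # same retry

print(len(created), len(merged))  # 8 5
```

The append-style store ends up with three duplicates; the keyed store converges on the five source rows regardless of how many times the chunk is replayed.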

Core implementation

The whole discipline fits in one script: create the constraint, tag every write with a deterministic checksum, and issue the load as a bounded UNWIND … MERGE through a driver-managed write transaction. The comments mark the three load-bearing decisions.

python

import hashlib
from neo4j import GraphDatabase

# 0) CONSTRAINT FIRST. The uniqueness constraint (and its backing RANGE index)
#    must be ONLINE before the first MERGE. It gives MERGE an index-backed write
#    lock, so a re-run seeks the existing node instead of racing into a CREATE.
CONSTRAINT = """
CREATE CONSTRAINT entity_id_unique IF NOT EXISTS
FOR (n:Entity) REQUIRE n.id IS UNIQUE
"""

# 1) DETERMINISTIC MERGE KEY. MERGE anchors on n.id ONLY — a stable business key,
#    never a mutable attribute. ON CREATE and ON MATCH both apply the same
#    property set, so the node's end state is identical whether this is the first
#    execution or the fifth. That convergence IS idempotency.
# 2) CHECKSUM TAG. Every node a chunk writes carries that chunk's payload
#    checksum (computed in load_chunk). It scopes validation and, critically,
#    rollback to exactly the rows that chunk touched, without a
#    MATCH (n) DELETE n sledgehammer.
BATCH = """
UNWIND $chunk AS row
MERGE (n:Entity {id: row.id})
ON CREATE SET n += row.properties, n.migration_checksum = $checksum
ON MATCH  SET n += row.properties, n.migration_checksum = $checksum
"""

def payload_checksum(chunk) -> str:
    # Deterministic over the chunk's content: the same rows always hash the same,
    # so a replayed chunk is recognisable and auditable after the fact.
    canonical = repr(sorted((r["id"], tuple(sorted(r["properties"].items())))
                            for r in chunk))
    return hashlib.sha256(canonical.encode()).hexdigest()

def load_chunk(session, chunk):
    checksum = payload_checksum(chunk)
    # 3) BOUNDED TRANSACTION + MANAGED RETRY. execute_write commits per chunk, so
    #    a failure rolls back only this chunk. It also retries transient errors by
    #    RE-RUNNING the whole function — which is safe precisely because MERGE is
    #    idempotent. Never put a bare CREATE inside a retried transaction function.
    session.execute_write(
        lambda tx: tx.run(BATCH, chunk=chunk, checksum=checksum).consume()
    )
    return checksum

def run_migration(uri, auth, chunks):
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session(database="neo4j") as session:
            session.run(CONSTRAINT).consume()
            session.run("CALL db.awaitIndexes(300)").consume()  # gate on ONLINE
            for chunk in chunks:               # each chunk == one commit boundary
                load_chunk(session, chunk)

Three details are easy to get wrong. execute_write re-runs the entire lambda on a transient error, so the only safe write inside it is an idempotent MERGE — a CREATE here duplicates on every retry. The db.awaitIndexes gate is not optional: CREATE CONSTRAINT can return before its backing index is ONLINE on a non-empty database, and a MERGE issued in that window falls back to a label scan and can still race. And chunk size is your recovery granularity — smaller chunks mean more commits and faster recovery, but more round-trips; size them the same way you would for a batch-processing workflow.
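
run_migration takes an iterable of chunks but the script does not show how to produce one. A minimal sketch of a chunking helper follows; chunked is a hypothetical name, and the chunk size is something you would tune, not a recommendation.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` rows from any iterable.

    Works on generators too, so the full source never has to sit in memory.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

With this in place a call might look like run_migration(uri, auth, chunked(rows, 5000)), where 5000 is only a placeholder. Because each yielded list becomes one commit boundary, the size argument is the recovery granularity discussed above.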

The diagram below shows the idempotent retry and checkpoint flow for a single chunk.

A note on CALL { … } IN TRANSACTIONS

For very large offline loads you may prefer server-side batching with CALL { … } IN TRANSACTIONS, which commits in inner batches and applies native backpressure without the legacy APOC periodic procedures. It is idempotent-friendly with the same MERGE body, but it is an auto-commit clause: it cannot run inside an open explicit transaction, so it must be issued through session.run(...) directly, never inside execute_write. Reach for it when a single managed transaction per chunk is too coarse; keep the per-chunk execute_write pattern above when you want managed retries and precise per-chunk rollback.
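
A sketch of the server-side variant, under stated assumptions: SERVER_SIDE_BATCH and load_server_side are hypothetical names, the batch size of 10000 is a placeholder, and the subquery uses the importing-WITH form, which should work across Neo4j 5.x. The MERGE body mirrors the per-chunk query.

```python
# Same MERGE body as the per-chunk query, batched on the server instead.
SERVER_SIDE_BATCH = """
UNWIND $rows AS row
CALL {
  WITH row
  MERGE (n:Entity {id: row.id})
  SET n += row.properties
} IN TRANSACTIONS OF 10000 ROWS
"""


def load_server_side(session, rows):
    # session.run opens an auto-commit (implicit) transaction, which is what
    # IN TRANSACTIONS requires. Do not wrap this call in execute_write.
    return session.run(SERVER_SIDE_BATCH, rows=rows).consume()
```

Two trade-offs to keep in mind: the whole $rows payload travels as one parameter, so it has to fit in a single request, and the driver does not retry auto-commit transactions, so any retry logic around this call is yours to write.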

Validation & verification

Idempotency is a claim you must prove, not assume. Enforce the merge key with a constraint up front, then verify the current run in isolation using its checksum.

First, confirm the constraint is live and index-backed — an absent row or null ownedIndex means MERGE was scanning, not seeking:

cypher

SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties, ownedIndex
WHERE type = 'UNIQUENESS';

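Second, scope a count check to just this execution with the checksum. The null-key column is a defensive cross-check: MERGE rejects a null merge key with an error rather than writing it, so a non-zero value here means something other than this script tagged those nodes: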

cypher

MATCH (n:Entity) WHERE n.migration_checksum = $checksum
RETURN count(n) AS ingested_count,
       count(CASE WHEN n.id IS NULL THEN 1 END) AS null_key_violations;

The definitive idempotency test is behavioural: run the migration, record count { (n:Entity) }, run the identical batch again, and re-count. A truly idempotent script leaves the total unchanged on the second pass. Finally, run EXPLAIN over the load query and confirm the plan shows a NodeUniqueIndexSeek rather than a NodeByLabelScan with a Filter — a scan is the fingerprint of a missing or not-yet-online constraint. Structural reconciliation against the source belongs to Data Validation & Integrity Checks.
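
The behavioural test can be scripted. The sketch below assumes a neo4j driver session and a zero-argument callable that performs one full load; check_second_pass_is_noop and entity_count are hypothetical names. It uses a plain MATCH count, which is equivalent to the count expression mentioned above for this purpose.

```python
COUNT_ENTITIES = "MATCH (n:Entity) RETURN count(n) AS total"


def entity_count(session) -> int:
    # single() returns the one record produced by the aggregate.
    return session.run(COUNT_ENTITIES).single()["total"]


def check_second_pass_is_noop(session, load_once) -> int:
    """Run the load twice and fail if the second pass changes the node count."""
    load_once()
    first = entity_count(session)
    load_once()
    second = entity_count(session)
    if first != second:
        raise AssertionError(f"re-run changed Entity count: {first} -> {second}")
    return first
```

A stable count rules out duplicate nodes but not property drift; comparing a sample of node properties between passes would tighten the check.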

Edge cases & gotchas

1. MERGE anchored on a mutable property. If you merge on a value that changes between runs — a display name, a status, a derived label — the second run cannot find the first run’s node and creates a new one. This is a classic property graph anti-pattern. Anchor on the immutable key and set everything else:

cypher

// WRONG — name can change, so a re-run duplicates
MERGE (n:Entity {id: row.id, name: row.name})
// RIGHT — identity is the stable key; mutable props are applied after
MERGE (n:Entity {id: row.id}) SET n += row.properties

2. Relationships that orphan or duplicate. Merging an edge before both endpoints exist creates phantom nodes, and merging on the pattern without anchoring endpoints duplicates the edge on re-run. Always MERGE both endpoints on their stable keys first, then MERGE the relationship between the bound variables:

cypher

UNWIND $edges AS e
MERGE (a:Entity {id: e.from})
MERGE (b:Entity {id: e.to})
MERGE (a)-[:RELATED_TO]->(b)   // idempotent: one edge no matter how many replays

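3. Rollback that cascades. When a chunk must be undone, a blanket MATCH (n) DETACH DELETE n removes nodes and relationships the failed chunk never touched. Roll back by the checksum instead, so you delete only the nodes this run wrote to. Be aware that the BATCH query stamps the checksum in ON MATCH as well as ON CREATE, so the tagged set includes pre-existing nodes the run merely updated; if those must survive a rollback, restore them from the pre-load snapshot instead of deleting by checksum.

cypher

// Detach-delete ONLY the nodes this execution touched (created or matched), identified by its checksum.
// The CALL (n) { … } scope clause was added in Neo4j 5.23; on earlier 5.x use CALL { WITH n DETACH DELETE n }.
MATCH (n:Entity {migration_checksum: $checksum})
CALL (n) { DETACH DELETE n } IN TRANSACTIONS OF 10000 ROWS;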
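
From the driver, a checksum rollback is worth gating behind a dry run that reports how many nodes would go. This is a minimal sketch: rollback_chunk is a hypothetical helper, and the delete uses the importing-WITH subquery form so that it does not depend on a recent 5.x release.

```python
COUNT_TAGGED = (
    "MATCH (n:Entity {migration_checksum: $checksum}) RETURN count(n) AS total"
)
DELETE_TAGGED = """
MATCH (n:Entity {migration_checksum: $checksum})
CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 10000 ROWS
"""


def rollback_chunk(session, checksum: str, dry_run: bool = True) -> int:
    """Return how many nodes carry the checksum; delete them unless dry_run."""
    tagged = session.run(COUNT_TAGGED, checksum=checksum).single()["total"]
    if not dry_run and tagged:
        # IN TRANSACTIONS needs an auto-commit transaction: session.run,
        # never execute_write.
        session.run(DELETE_TAGGED, checksum=checksum).consume()
    return tagged
```

Because the load query sets migration_checksum in both ON CREATE and ON MATCH, the dry-run count covers every node the chunk touched, not just the ones it created. Compare it against the chunk size before passing dry_run=False.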

Parent context

Idempotent scripting is the foundation the rest of Error Handling & Rollback Mechanisms is built on — retries, dead-letter replay, and compensating rollback are all only safe when every write converges on re-run.

Up: Error Handling & Rollback Mechanisms — the transaction-boundary and recovery patterns this write discipline plugs into.
Resolving Duplicate Nodes During Parallel Batch Loads — what happens when idempotent MERGE meets concurrency, and how to remediate.
Batch Processing & Chunking Workflows — sizing the commit boundaries that set your rollback granularity.
Data Validation & Integrity Checks — proving the loaded graph matches the source after the run converges.