How to model hierarchical data in Neo4j without cycles

You are loading a hierarchy — an organizational chart, a bill-of-materials, a product taxonomy, a directory tree — into Neo4j, and you need the result to be a provable tree or directed acyclic graph (DAG): every ancestor query terminates, shortestPath never loops, and no application recursion runs away. This page shows exactly how to guarantee that: model the hierarchy with a single-direction relationship, back node identity with a uniqueness constraint, run a pre-commit cycle check in the Python driver before each edge is written, and detect-and-break any cycles that a prior unsafe import already committed. Neo4j has no native “no cycles” or “single parent” constraint, so the acyclic invariant is enforced at the ingestion layer — deliberately, not by hope.

Prerequisites

Neo4j 5.x with the neo4j Python driver 5.x installed (pip install "neo4j>=5,<6").
A stable business key per hierarchy node — a code, sku, or uuid drawn from your node label taxonomy, never the internal element id, which is not stable across a re-run.
One relationship type reserved for the hierarchy with a fixed direction policy from your relationship cardinality and directionality rules — this page uses (child)-[:CHILD_OF]->(parent).
Permission to run DDL (CREATE CONSTRAINT) and a verified pre-load snapshot taken with neo4j-admin database dump.
Records already shaped so every row carries both its own key and its parent key (or null for a root).

Why cycles appear in the first place

A cycle is never authored on purpose; it arrives through the import. Three sources account for almost all of them:

Bidirectional edge writing. A script that writes both (a)-[:CHILD_OF]->(b) and (b)-[:CHILD_OF]->(a) — often because it “helpfully” mirrors every relationship — turns a tree into a graph with a 2-node loop. Storing hierarchy edges in exactly one direction is the single most effective prevention.
Legacy relational artifacts. A self-referencing parent_id column that was hand-corrected over the years frequently contains a row whose ancestor chain points back at itself. A naive extract carries that loop straight into the graph. Cleaning it belongs to your relational schema mapping step, but the graph load must still refuse to trust it.
Unvalidated bulk MERGE. MERGE with no acyclicity check writes whatever edge it is handed. The damage is silent until a variable-length [:CHILD_OF*] query or a Graph Data Science path algorithm walks the loop and consumes heap until the transaction times out.

The fix has two layers: a fixed-direction relationship plus a uniqueness constraint at the database, and a pre-commit path check in the driver that rejects any edge that would close a loop.
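
Because a legacy parent_id column can already contain a loop, it can help to screen the extract before any edge reaches the database. The following is a minimal sketch, assuming the rows have been reduced to a plain dict that maps each key to its parent key (None for a root); the function name find_parent_cycles is hypothetical and not part of any library.

```python
def find_parent_cycles(parent_of):
    """Return the loops found in a {key: parent_key_or_None} mapping.

    Each loop is reported once, as a list of keys in walk order.
    """
    cycles = []
    cleared = set()  # keys whose ancestor chain has already been walked
    for start in parent_of:
        trail = []     # keys visited on this walk, in order
        position = {}  # key -> index in trail, for constant-time loop detection
        node = start
        while node is not None and node not in cleared:
            if node in position:
                # Back on a key from the current walk: that suffix is a loop.
                cycles.append(trail[position[node]:])
                break
            position[node] = len(trail)
            trail.append(node)
            # A parent that is missing from the extract ends the walk.
            node = parent_of.get(node)
        cleared.update(trail)
    return cycles
```

Any non-empty result is a reason to fix the source rows first. The pre-commit check in the driver is still needed, since this screen sees only one extract and not what the graph already holds.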

Set up the constraint

Create the node-identity constraint before the first edge is written. It does not by itself forbid cycles — Neo4j has no relationship-cardinality or acyclicity constraint — but it makes MERGE on the key index-backed, guarantees one node per identity, and gives the cycle check below a fast, seekable anchor.

cypher

CREATE CONSTRAINT category_code_unique IF NOT EXISTS
FOR (n:Category) REQUIRE n.code IS UNIQUE;

For a strict tree, “each node has at most one parent” is also not expressible as a native constraint. Enforce it in the same ingestion layer that runs the cycle check — one MERGE on the CHILD_OF edge per node, plus a guard that a second, different parent is never written.
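
The single-parent guard can be sketched as a small transaction function. This is one possible shape, assuming the :Category label and code key used on this page; assert_single_parent is a hypothetical helper name, and it is meant to be called wherever the cycle check runs.

```python
def assert_single_parent(tx, child_code, parent_code):
    """Raise ValueError if child already has a CHILD_OF edge to another parent."""
    result = tx.run(
        """
        MATCH (c:Category {code: $child})-[:CHILD_OF]->(p:Category)
        WHERE p.code <> $parent
        RETURN collect(p.code) AS other_parents
        """,
        child=child_code, parent=parent_code,
    )
    # collect() aggregates to exactly one row, even when nothing matched.
    record = result.single()
    others = record["other_parents"] if record else []
    if others:
        raise ValueError(
            f"Second parent rejected: {child_code} already under {others}"
        )
```

Re-writing the edge to the parent a node already has passes the guard, so a replayed load stays idempotent. Like the cycle check, this is a check-then-write guard and is subject to the same concurrency caveats.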

Core implementation

The prevention strategy is one function: before writing (child)-[:CHILD_OF]->(parent), verify that parent is not already a descendant of child. If it is, the edge would close a cycle, so it is rejected before it can be committed. The comments mark the three decisions that make this correct.

python

from neo4j import GraphDatabase

# 1) DIRECTION IS FIXED. Every hierarchy edge is (child)-[:CHILD_OF]->(parent),
#    written once, never mirrored. A single direction is what lets an ancestor
#    walk and a descendant walk be two distinct, terminating traversals.

def would_create_cycle(tx, child_code, parent_code):
    # Adding (child)-[:CHILD_OF]->(parent) closes a loop IFF parent is already a
    # descendant of child — i.e. a path parent ->..-> child already exists. The
    # constraint-backed seek on :Category(code) makes this check index-fast.
    result = tx.run(
        """
        MATCH (p:Category {code: $parent})-[:CHILD_OF*]->(c:Category {code: $child})
        RETURN count(*) > 0 AS creates_cycle
        """,
        parent=parent_code, child=child_code,
    )
    record = result.single()
    return bool(record and record["creates_cycle"])

def link_child_to_parent(driver, child_code, parent_code):
    if child_code == parent_code:
        raise ValueError(f"Self-loop rejected: {child_code}")
    with driver.session(database="neo4j") as session:
        # 2) CHECK IN A READ TX, WRITE IN A WRITE TX. The read proves the edge is
        #    safe; the write commits it. Both run through the same session so the
        #    driver's retry logic wraps each managed transaction independently.
        if session.execute_read(would_create_cycle, child_code, parent_code):
            raise ValueError(f"Cycle rejected: {child_code} -> {parent_code}")

        # 3) MERGE, NOT CREATE, on the edge. A replayed load converges on one
        #    parent instead of fanning out extra edges; combined with the unique
        #    node key this keeps the whole operation idempotent on re-run.
        session.execute_write(lambda tx: tx.run(
            """
            MATCH (c:Category {code: $child}), (p:Category {code: $parent})
            MERGE (c)-[:CHILD_OF]->(p)
            """,
            child=child_code, parent=parent_code,
        ).consume())

Two details are easy to get wrong. First, the check runs against committed state in a read transaction that precedes the write, which leaves a check-then-write window. Folding the check into the write transaction with OPTIONAL MATCH does not close that window by itself: Neo4j's default isolation level is read committed and the check takes no lock on the paths it walks, so two concurrent writers can each pass the check and together close a loop. Either serialize edge writes per subtree or load the hierarchy top-down (parents before children); in a top-down load a new child has no descendants yet, so a path from the parent back to the child cannot exist. Second, if you fan the load out, route each root’s subtree to a single worker so no two writers touch the same ancestor chain. Keeping every write a MERGE on stable keys is the same idempotency contract described in implementing idempotent migration scripts for Neo4j.
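
The top-down ordering and the one-worker-per-root routing can both be derived from the input rows before the load starts. Below is a minimal sketch using graphlib from the standard library (Python 3.9+), assuming a strict tree held as a dict of key to parent key (None for a root); top_down_order and group_by_root are hypothetical helper names.

```python
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter


def top_down_order(parent_of):
    """Order keys so every parent precedes its children.

    Raises graphlib.CycleError if the rows contain a loop.
    """
    sorter = TopologicalSorter()
    for child, parent in parent_of.items():
        if parent is None:
            sorter.add(child)
        else:
            sorter.add(child, parent)  # the parent must be emitted first
    return list(sorter.static_order())


def group_by_root(parent_of):
    """Split the ordered keys into one batch per root, for one worker each."""
    root_of = {}
    batches = defaultdict(list)
    for key in top_down_order(parent_of):
        parent = parent_of.get(key)  # a key with no row of its own acts as a root
        root = key if parent is None else root_of[parent]
        root_of[key] = root
        batches[root].append(key)
    return dict(batches)
```

Each batch can then be passed, in order, through link_child_to_parent by a single worker. For a DAG with several parents per node the grouping no longer partitions cleanly, so this sketch applies only to the strict-tree case.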

The diagram below contrasts a valid acyclic tree with the forbidden edge that the check rejects.

Validation & verification

Never trust a hierarchy load blind. Confirm the constraint is live, then prove no cycle and no multi-parent node survived.

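First, prove the identity constraint exists and is index-backed. A uniqueness constraint always owns an index, so ownedIndex should name it. If no row comes back for :Category(code), the constraint was never created, MERGE was scanning rather than seeking, and the cycle check ran slow: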

cypher

SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties, ownedIndex
WHERE type = 'UNIQUENESS';
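
The same check can be scripted so a load refuses to start without the constraint. This is a rough sketch, assuming a neo4j 5.x driver session; uniqueness_constraint_index is a hypothetical helper name.

```python
def uniqueness_constraint_index(session, label="Category", prop="code"):
    """Return the backing index name of the uniqueness constraint, or None."""
    result = session.run(
        "SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties, ownedIndex "
        "WHERE type = 'UNIQUENESS'"
    )
    for record in result:
        # labelsOrTypes and properties are lists; match the single-key case only.
        if record["labelsOrTypes"] == [label] and record["properties"] == [prop]:
            return record["ownedIndex"]
    return None
```

A caller might raise before loading when the function returns None, rather than discovering the missing constraint from a slow cycle check.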

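Second, prove the graph is acyclic. A healthy load returns zero rows. The ORDER BY lists the shortest cycles first, but the sort runs only after every match has been found, so the unbounded *1.. expansion can be expensive on a large or badly corrupted hierarchy; if the query runs long, cap the depth at a bound no smaller than your deepest legitimate chain. Each cycle is reported once per node on it: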

cypher

MATCH path = (n:Category)-[:CHILD_OF*1..]->(n)
RETURN [x IN nodes(path) | x.code] AS loop, length(path) AS cycle_length
ORDER BY cycle_length ASC
LIMIT 10;

Third, prove the single-parent invariant if you modeled a strict tree. Any row here is a node with two or more parents that leaked past the guard:

cypher

MATCH (c:Category)-[:CHILD_OF]->(p:Category)
WITH c.code AS node, count(p) AS parents
WHERE parents > 1
RETURN node, parents ORDER BY parents DESC;
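
The two invariant queries can be wrapped into a post-load gate. The following is a sketch under the assumption of a neo4j 5.x driver session; verify_hierarchy and the query constants are hypothetical names, and the queries are trimmed versions of the ones above.

```python
CYCLES_QUERY = """
MATCH path = (n:Category)-[:CHILD_OF*1..]->(n)
RETURN [x IN nodes(path) | x.code] AS loop
LIMIT 10
"""

MULTI_PARENT_QUERY = """
MATCH (c:Category)-[:CHILD_OF]->(p:Category)
WITH c.code AS node, count(p) AS parents
WHERE parents > 1
RETURN node
"""


def _rows(tx, query, key):
    # Materialize inside the transaction; the result cannot be read after it closes.
    return [record[key] for record in tx.run(query)]


def verify_hierarchy(session, strict_tree=True):
    """Return the offending rows per invariant; empty lists mean a clean load."""
    report = {"cycles": session.execute_read(_rows, CYCLES_QUERY, "loop")}
    if strict_tree:
        report["multi_parent"] = session.execute_read(
            _rows, MULTI_PARENT_QUERY, "node"
        )
    return report
```

Pass strict_tree=False for a DAG, where several parents are legitimate and only the cycle list matters.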

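If cycles did slip through from an earlier unsafe run, break them one edge at a time. The query below finds an edge that lies on a loop (its parent can already reach its child by following CHILD_OF), deletes it, and returns the pair so every deletion can be logged for audit. LIMIT 1 picks an arbitrary edge on a cycle, not necessarily the one that was written last, so check each returned pair against the business rule: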

cypher

MATCH (child:Category)-[r:CHILD_OF]->(parent:Category)
WHERE (parent)-[:CHILD_OF*1..]->(child)
WITH r, child, parent LIMIT 1
DELETE r
RETURN child.code AS removed_child, parent.code AS from_parent;

Run it repeatedly (one edge per pass) until the acyclicity query returns empty, validating the business rule for each removed edge before committing it. Finally, EXPLAIN the cycle check and confirm the plan opens with a NodeUniqueIndexSeek on :Category(code), not a NodeByLabelScan — a scan is the fingerprint of a missing constraint.
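
The repeat-until-clean loop can be sketched in the driver as follows, assuming a neo4j 5.x session. The names break_all_cycles, EdgeRemovalRejected, and the approve callback are hypothetical; approve stands in for whatever business-rule validation applies, and raising inside the managed transaction rolls the deletion back.

```python
import logging

log = logging.getLogger("hierarchy.repair")

BREAK_ONE_CYCLE = """
MATCH (child:Category)-[r:CHILD_OF]->(parent:Category)
WHERE (parent)-[:CHILD_OF*1..]->(child)
WITH r, child, parent LIMIT 1
DELETE r
RETURN child.code AS removed_child, parent.code AS from_parent
"""


class EdgeRemovalRejected(Exception):
    """Raised when the approve callback vetoes a deletion."""


def _break_one(tx, approve):
    record = tx.run(BREAK_ONE_CYCLE).single()
    if record is None:
        return None  # no edge lies on a cycle any more
    edge = (record["removed_child"], record["from_parent"])
    if approve is not None and not approve(*edge):
        # Raising here makes the driver roll the transaction back.
        raise EdgeRemovalRejected(f"{edge[0]} -> {edge[1]}")
    return edge


def break_all_cycles(session, approve=None, max_passes=1000):
    """Delete one cycle-closing edge per transaction until none remain."""
    removed = []
    for _ in range(max_passes):
        edge = session.execute_write(_break_one, approve)
        if edge is None:
            return removed
        log.warning("removed CHILD_OF edge %s -> %s", *edge)
        removed.append(edge)
    raise RuntimeError(f"still cyclic after {max_passes} passes")
```

The returned list doubles as the audit trail. The max_passes bound is only a safety stop for the sketch, not a tuned value.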

Edge cases & gotchas

1. A legitimate DAG needs multiple parents — do not force a tree. A bill-of-materials where one bolt belongs to many assemblies is a DAG, not a tree: several parents are correct, cycles are still forbidden. Drop the single-parent guard but keep the exact same would_create_cycle check — acyclicity is the only invariant a DAG must hold. Storing a materialized depth or slash-delimited path property to “avoid” traversal here is a property graph anti-pattern: those values silently rot the moment a subtree is re-parented. Compute depth with length(path) on demand instead.

2. Self-loops bypass a variable-length check written as *1.. (or as a bare *). A one-node cycle (n)-[:CHILD_OF]->(n) is created when a row lists itself as its own parent, and because no path from the node back to itself exists before the write, the pre-commit check finds nothing to reject. The child_code == parent_code guard in the core function rejects it at write time; to sweep existing data, MATCH (n:Category)-[r:CHILD_OF]->(n) DELETE r.

3. Re-parenting under concurrent readers. Moving a subtree is delete-then-create, and a reader that traverses mid-move can momentarily see the node under both the old and new parent. Wrap the unlink and relink in one write transaction so the change is atomic, and version the move with valid_from / valid_to timestamps if consumers need point-in-time history — the temporal pattern covered under schema evolution and versioning.
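
An atomic re-parent for the strict-tree case might look like the sketch below, which runs the cycle check, the unlink, and the relink in one managed write transaction. It assumes a neo4j 5.x session and the :Category(code) model from this page; reparent and the query constants are hypothetical names, and the valid_from / valid_to versioning is left out.

```python
CYCLE_CHECK = """
MATCH (p:Category {code: $parent})-[:CHILD_OF*]->(c:Category {code: $child})
RETURN count(*) > 0 AS creates_cycle
"""

UNLINK = """
MATCH (c:Category {code: $child})-[old:CHILD_OF]->(:Category)
DELETE old
"""

RELINK = """
MATCH (c:Category {code: $child}), (p:Category {code: $parent})
MERGE (c)-[:CHILD_OF]->(p)
RETURN p.code AS parent
"""


def _reparent(tx, child_code, new_parent_code):
    if child_code == new_parent_code:
        raise ValueError(f"Self-loop rejected: {child_code}")
    check = tx.run(CYCLE_CHECK, parent=new_parent_code, child=child_code).single()
    if check and check["creates_cycle"]:
        # The new parent sits beneath the node being moved.
        raise ValueError(f"Cycle rejected: {child_code} -> {new_parent_code}")
    tx.run(UNLINK, child=child_code).consume()
    linked = tx.run(RELINK, child=child_code, parent=new_parent_code).single()
    if linked is None:
        # Either node is missing; raising rolls back the unlink as well.
        raise LookupError(f"Unknown node: {child_code} or {new_parent_code}")


def reparent(session, child_code, new_parent_code):
    """Move a node under a new parent in one write transaction."""
    session.execute_write(_reparent, child_code, new_parent_code)
```

Because all three statements share one transaction, a reader sees either the old parent or the new one. Any exception leaves the original edge in place.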

Parent context

This task is one modeling decision within Node Label Taxonomy Design under the Neo4j Graph Schema Design & Architecture pillar — reach for it whenever a hierarchy must be a provable tree or DAG rather than an arbitrary graph.

Up: Node Label Taxonomy Design — how stable labels and keys give the acyclic check a fast, seekable anchor.
Relationship Cardinality & Directionality — the single-direction edge policy that makes ancestor and descendant walks terminate.
Property Graph Anti-Patterns — why materialized depth and path-string properties rot and should be computed on demand.
Schema Evolution & Versioning — temporal valid_from / valid_to modeling for safe subtree re-parenting.
Implementing idempotent migration scripts for Neo4j — the upsert discipline that keeps a replayed hierarchy load convergent.