Converting ER diagrams to property graph models step by step

You have a relational Entity-Relationship (ER) diagram (tables, primary keys, foreign keys, and junction tables), and you want a repeatable procedure that turns it into a native Neo4j property graph without improvising the shape table by table. The two models pull in opposite directions. A relational schema optimizes for normalization, referential integrity, and set-based joins. Neo4j optimizes for traversal locality, relationship-first querying, and flexible property attachment. A mechanical one-to-one translation, in which every table becomes a node label and every foreign key becomes a lookup node, reproduces the join-heavy shape of the source and throws away the adjacency advantage you migrated for. This page gives you a deterministic conversion path: five ordered decisions, each with the Cypher or Python driver code that applies it. The same ER diagram therefore always yields the same graph, and the migration stays version-controlled and reviewable.

Prerequisites

Neo4j 5.x reachable over Bolt, with the neo4j Python driver v5+ installed (pip install "neo4j>=5").
The source ER diagram (or live schema) in hand: table names, primary keys, foreign keys, and every junction/associative table identified.
A first-pass node label taxonomy decided, so each retained table maps to a stable domain label rather than one invented mid-load.
A relationship cardinality and direction policy, since every foreign key carries both a direction and a multiplicity that must survive the crossing.
Permission to CREATE CONSTRAINT on the target database (the conversion anchors every node key on a uniqueness constraint before loading edges).

Step 1: Decompose entities into a label taxonomy

Map each relational table that models a real domain entity to a node label. Primary keys become immutable node properties (typically id or uuid). Foreign keys are removed from node properties because they are not data on the node but edges waiting to be materialized. Apply your node label taxonomy here rather than deferring it. A frequent root cause of degraded traversal performance is over-segmentation, where every minor subtype gets its own label. Use a base label for shared traversal patterns (for example Customer), and reserve subtype labels for cases where index routing, constraint enforcement, or a security boundary genuinely diverges.
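The decomposition rule can be sketched in Python. This is a minimal sketch that assumes a hypothetical `Table` record describing each table from the ER diagram. The label itself remains a human taxonomy decision passed in by the caller, and the `uuid` key name follows the convention above. Primary-key columns are renamed to the node key, foreign-key columns are set aside as pending edges for Step 2, and everything else stays a node property.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical, minimal description of one table from the ER diagram.
    name: str
    primary_key: str
    columns: list[str]
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table


def decompose(table: Table, label: str, key_property: str = "uuid") -> dict:
    """Split one entity table into a node key, node properties, and pending edges."""
    fk_columns = set(table.foreign_keys)
    properties = [c for c in table.columns
                  if c != table.primary_key and c not in fk_columns]
    return {
        "label": label,
        # The primary key is renamed to the immutable node key.
        "key": {table.primary_key: key_property},
        "properties": properties,
        # Foreign keys leave the node: each one is an edge to materialize later.
        "pending_edges": [(label, col, ref) for col, ref in sorted(table.foreign_keys.items())],
        # Every entity label is anchored on its own uniqueness constraint.
        "constraint": (
            f"CREATE CONSTRAINT {label.lower()}_{key_property}_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{key_property} IS UNIQUE"
        ),
    }
```

For example, `decompose(Table("orders", "order_id", ["order_id", "customer_id", "placed_at"], {"customer_id": "customers"}), "Order")` keeps only `placed_at` as a property and records `customer_id` as an edge to materialize in Step 2.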

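Audit the result before you commit to it. CALL db.schema.nodeTypeProperties() lists, for each label combination, the properties that appear, their types, and whether every node carries them (the mandatory column). Pair it with a per-label count such as MATCH (n:Customer) RETURN count(n) to see how the node population is distributed. If one label holds the bulk of the node population with widely divergent property sets, model the variation with a type property instead of extra labels. This keeps planner statistics accurate and avoids unnecessary label scans. Anchor every entity on a uniqueness constraint so that later MATCH steps resolve through the range index that backs it: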

```cypher
CREATE CONSTRAINT customer_uuid_unique IF NOT EXISTS
FOR (c:Customer) REQUIRE c.uuid IS UNIQUE;

CREATE INDEX customer_region_idx IF NOT EXISTS
FOR (c:Customer) ON (c.region);
```

Avoid attaching sparse, subtype-specific properties to a high-cardinality base label. When property density on a candidate node drops below roughly 30%, that is a signal to split it into a separate node or fold it into a relationship — the same judgement covered under property graph anti-patterns.
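A minimal sketch of the density check follows. It assumes you have pulled a sample of source rows as Python dicts, with SQL NULLs represented as `None`. The 0.30 threshold is the rough heuristic from the paragraph above, not a Neo4j setting, and the function names are illustrative.

```python
from collections import Counter


def property_density(nodes: list[dict]) -> dict[str, float]:
    """Fraction of candidate nodes that carry a non-null value for each property."""
    if not nodes:
        return {}
    counts = Counter(key for node in nodes for key, value in node.items() if value is not None)
    return {key: counts[key] / len(nodes) for key in sorted(counts)}


def sparse_properties(nodes: list[dict], threshold: float = 0.30) -> list[str]:
    """Properties below the threshold: candidates to split out or move onto a relationship."""
    return [key for key, density in property_density(nodes).items() if density < threshold]
```

Run it per candidate label. A subtype-specific column such as a VAT number that only a small share of customers carry would surface here as a split-or-relationship candidate.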

Step 2: Resolve cardinality and pick a direction

Relational cardinality (1:1, 1:N, M:N) does not translate directly to graph semantics. In Neo4j every relationship is stored with a direction. Traversal is equally cheap from either end, but the direction fixes what the relationship means, lets queries state a directed pattern that the planner can use to prune expansion, and keeps Cypher readable. The single most common conversion failure is preserving a junction table as an intermediate node instead of collapsing it into a direct edge. Unless other tables reference the junction's rows, an M:N junction table should become one relationship type that carries the junction's own columns as properties. Do not model it as a (:NodeA)-[:LINKS]->(:Junction)-[:LINKS]->(:NodeB) chain, which doubles the hop count of every query.

The diagram below shows a relational junction table collapsing into a single directed graph relationship.

Choose direction by the dominant read path, following your relationship cardinality and directionality policy. (:Employee)-[:MANAGES]->(:Department) reads naturally when queries originate from employee context; if department-centric aggregation dominates, point it the other way. Misaligned direction tends to push query authors toward undirected patterns, which expand relationships in both directions and can return each pair twice. Root-cause analysis of a slow MATCH (a)-[r]-(b) often reveals a missing directional hint or inverted relationship semantics. Collapse the junction with an idempotent MERGE so that re-runs converge instead of duplicating edges:

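```cypher
// Collapse a junction table into a direct, property-carrying relationship.
// Endpoints are MATCHed rather than MERGEd: they were loaded as entities, each
// label anchored on its own uniqueness constraint as in Step 1, so a dangling
// key drops the row instead of creating a bare placeholder node.
UNWIND $batch AS row
MATCH (a:Student {uuid: row.student_id})
MATCH (b:Course  {uuid: row.course_id})
MERGE (a)-[r:ENROLLED_IN]->(b)
  ON CREATE SET r.since  = date(row.enrolled_on),
                r.status = 'active'
// Mutable junction columns are refreshed on every run.
SET r.grade = row.grade;
```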
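Finding the junction tables can also be automated. The sketch below assumes that a junction table is one whose primary key consists of exactly two foreign-key columns. Surrogate-keyed junction tables with their own id column will not be detected. For each junction it finds, it generates a statement similar to the collapse above. The statement matches both endpoints by `uuid`, on the assumption that they were already loaded under their constraints, and merges only the relationship. The function names and input shape are hypothetical. The `tail` argument records the direction decision from the dominant read path.

```python
from typing import Dict, List, Optional


def is_junction(primary_key: List[str], foreign_keys: Dict[str, str]) -> bool:
    """Heuristic: a junction table's primary key is exactly two foreign-key columns."""
    return len(primary_key) == 2 and all(col in foreign_keys for col in primary_key)


def collapse_junction(primary_key: List[str], columns: List[str],
                      foreign_keys: Dict[str, str], rel_type: str,
                      labels: Dict[str, str], tail: str) -> Optional[str]:
    """Build Cypher that turns a junction table into one directed relationship.

    `tail` is the key column whose entity the edge points away from.
    Returns None when the table is not a junction table.
    """
    if not is_junction(primary_key, foreign_keys):
        return None
    if tail not in primary_key:
        raise ValueError(f"{tail!r} is not part of the primary key")
    head = next(col for col in primary_key if col != tail)
    # Every non-key column of the junction becomes a relationship property.
    props = [col for col in columns if col not in primary_key]
    lines = [
        "UNWIND $batch AS row",
        f"MATCH (a:{labels[foreign_keys[tail]]} {{uuid: row.{tail}}})",
        f"MATCH (b:{labels[foreign_keys[head]]} {{uuid: row.{head}}})",
        f"MERGE (a)-[r:{rel_type}]->(b)",
    ]
    if props:
        lines.append("SET " + ", ".join(f"r.{col} = row.{col}" for col in props))
    return "\n".join(lines)
```

Column values are copied through unchanged, so apply the type conversions from Step 3 to each row before batching.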

Step 3: Map relational types to native property types

Relational type systems (VARCHAR, DECIMAL, TIMESTAMP) need explicit mapping to Neo4j’s native property types, a decision covered in depth under graph data type selection. Graph storage rewards homogeneous, frequently accessed scalar properties on nodes and edges, and punishes large text blobs, binary payloads, and highly variable JSON crammed onto a node.

VARCHAR / TEXT → STRING (keep under ~10 KB; offload documents to object storage and store a reference).
INT / BIGINT → INTEGER (Neo4j integers are 64-bit signed natively).
DECIMAL / NUMERIC → INTEGER in minor units for currency, to avoid floating-point precision drift; FLOAT only for genuinely approximate quantities.
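TIMESTAMP WITH TIME ZONE → ZONED DATETIME (built with datetime()); TIMESTAMP without a time zone → LOCAL DATETIME (built with localdatetime()). See the temporal functions in the Neo4j Cypher Manual.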
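The mapping list above can be applied per column with a small lookup plus two value converters. This is a minimal sketch that assumes PostgreSQL-style type names (for example `TIMESTAMPTZ`) and treats every DECIMAL or NUMERIC column as currency. A non-currency decimal would map to FLOAT instead. The dictionary and function names are illustrative. The sketch also assumes that the neo4j Python driver sends a zone-aware `datetime` as ZONED DATETIME and a naive one as LOCAL DATETIME.

```python
from datetime import datetime
from decimal import ROUND_HALF_EVEN, Decimal

# Assumed mapping for PostgreSQL-style type names; DECIMAL/NUMERIC are
# treated as currency here and stored as INTEGER minor units.
SQL_TO_NEO4J = {
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "INT": "INTEGER",
    "BIGINT": "INTEGER",
    "DECIMAL": "INTEGER",
    "NUMERIC": "INTEGER",
    "TIMESTAMP": "LOCAL DATETIME",
    "TIMESTAMPTZ": "ZONED DATETIME",
}


def neo4j_type(sql_type: str) -> str:
    """Map a declared SQL type such as 'varchar(255)' to a Neo4j property type."""
    base = sql_type.split("(")[0].strip().upper()
    return SQL_TO_NEO4J[base]


def to_minor_units(amount: Decimal, exponent: int = 2) -> int:
    """Convert a currency amount to integer minor units (cents when exponent=2)."""
    # Shift the decimal point, then round half-to-even to a whole number.
    return int(amount.scaleb(exponent).quantize(Decimal(1), rounding=ROUND_HALF_EVEN))


def temporal_type(value: datetime) -> str:
    """Naive datetimes map to LOCAL DATETIME, zone-aware ones to ZONED DATETIME."""
    return "ZONED DATETIME" if value.utcoffset() is not None else "LOCAL DATETIME"
```

Working in `Decimal` until the final `int` conversion is what avoids the floating-point drift mentioned above. Converting through `float` first would reintroduce it.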

PROFILE your hottest read against the migrated shape. If a NodeByLabelScan or DirectedRelationshipTypeScan operator reports a high Rows count, back the filtered property with a range index. Separately, avoid storing arrays of primitives (such as a list of related IDs) on a node when a dedicated relationship type would let Cypher traverse and aggregate them natively.
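Scanning a profile for those operators can be scripted. The sketch below assumes that the plan is available as nested dicts with `operatorType`, `rows`, and `children` keys, which is how the neo4j Python driver is expected to expose it on `ResultSummary.profile` after `session.run("PROFILE ...").consume()`. Verify the exact key names against your driver version. The 10,000-row threshold is an arbitrary illustration.

```python
# Leaf operators that read a whole label or relationship type.
SCAN_OPERATORS = {
    "AllNodesScan",
    "NodeByLabelScan",
    "DirectedRelationshipTypeScan",
    "UndirectedRelationshipTypeScan",
}


def find_scans(plan: dict, min_rows: int = 10_000) -> list:
    """Return (operator, rows) for every scan operator in a PROFILE plan at or above min_rows."""
    found = []
    stack = [plan]
    while stack:
        op = stack.pop()
        # Operator names can carry a runtime suffix such as "@neo4j"; strip it.
        name = op.get("operatorType", "").split("@")[0]
        if name in SCAN_OPERATORS and op.get("rows", 0) >= min_rows:
            found.append((name, op["rows"]))
        stack.extend(op.get("children", []))
    return found
```

An empty result is the desired outcome for a hot read path. Any entry names the operator that needs a supporting index or a remodel.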

Step 4: Load through the Python driver in bounded batches

Drive the conversion with parameterized UNWIND batches through a managed write transaction. Consuming the result inside the transaction function matters: the cursor is invalid once the managed transaction commits. Sizing the commit unit is a batch-processing workflow concern — 500–2,000 rows for property-rich or multi-MERGE statements, up to 10,000 for simple scalar sets.

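```python
from neo4j import GraphDatabase

LOAD_CUSTOMERS = """
UNWIND $rows AS row
MERGE (n:Customer {uuid: row.uuid})
SET n += row.properties
RETURN count(n) AS loaded
"""


def _load(tx, rows):
    result = tx.run(LOAD_CUSTOMERS, rows=rows)
    # Read the record before the managed transaction commits;
    # the result cursor is invalid after that.
    return result.single()["loaded"]


def load_entities(uri, auth, rows, batch_size=1000):
    loaded = 0
    # Open the driver once for the whole load, not once per batch.
    # Both context managers close their resource on exit.
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session(database="graph_prod") as session:
            for start in range(0, len(rows), batch_size):
                # One managed write transaction per bounded batch; execute_write
                # retries transient failures, which is safe because MERGE is idempotent.
                loaded += session.execute_write(_load, rows[start:start + batch_size])
    return loaded
```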

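For enterprise-scale conversions, partition by tenant, domain boundary, or temporal window into separate databases in one cluster. Neo4j has no table partitioning, so graph partitioning strategies rely on multi-database routing and index locality, and multiple user databases require Enterprise Edition. Route each write with driver.session(database="finance_domain"). Map relational row-level security onto label-scoped and relationship-type-scoped privileges: GRANT TRAVERSE and GRANT READ for reads, and fine-grained write privileges such as GRANT CREATE, GRANT SET PROPERTY, and GRANT DELETE for writes, because a plain GRANT WRITE is not scoped to individual labels.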
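Routing a load across per-domain databases amounts to grouping rows by the partition column before batching. This is a minimal sketch that assumes each row carries a tenant-like column and that you maintain a map from partition value to database name by hand. The `tenant` column and the `retail_domain` database are hypothetical. An unmapped value raises an error rather than landing in an arbitrary database.

```python
from collections import defaultdict


def route_by_partition(rows, partition_key, databases, default=None):
    """Group rows into per-database lists using a partition column.

    `databases` maps partition values (for example tenant ids) to Neo4j
    database names; `default` catches unmapped values if given.
    """
    routed = defaultdict(list)
    for row in rows:
        db = databases.get(row[partition_key], default)
        if db is None:
            raise KeyError(f"no database mapped for {partition_key}={row[partition_key]!r}")
        routed[db].append(row)
    return dict(routed)
```

Each group can then be written through its own `driver.session(database=name)` with the same bounded, managed write transactions used for a single database.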

Step 5: Version the schema and track lineage

A relational migration is often a one-time event; a graph model needs iterative refinement, so treat the target schema as versioned from day one. This is where ongoing schema evolution and versioning begins: tag each deployment with a metadata node and record where converted data came from.

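```cypher
MERGE (v:SchemaVersion {version: '2.1.0'})
  ON CREATE SET v.deployed_at = datetime();

// Attach lineage so an ER-diagram change downstream has a traceable path.
// Merge the SourceTable node on its own first: a MERGE on the whole
// (c)-[:DERIVED_FROM]->(:SourceTable {...}) pattern would create a new
// SourceTable node for every Customer that does not yet have the edge.
MERGE (s:SourceTable {name: 'customers', system: 'postgres'})
WITH s
MATCH (c:Customer)
MERGE (c)-[:DERIVED_FROM]->(s);
```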

Neo4j has no ALTER CONSTRAINT or ALTER INDEX. To change an index type, create the new index, wait for it to reach ONLINE (SHOW INDEXES YIELD name, state), and only then drop the old one. Likewise, never drop a live constraint until every loader and query that relies on it has been updated.
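The create-wait-drop order can be enforced in code so that an old index is never dropped before its replacement is usable. In this minimal sketch, `run` and `fetch_state` are hypothetical caller-supplied functions. A driver-backed `fetch_state` could, for instance, run SHOW INDEXES YIELD name, state and return the state of the row whose name matches. The sleep and clock parameters are injectable so the polling logic can be tested without waiting.

```python
import time


def wait_until_online(fetch_state, index_name, timeout_s=300.0, poll_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll an index until it reports ONLINE; raise if it FAILED or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state(index_name)
        if state == "ONLINE":
            return
        if state == "FAILED":
            raise RuntimeError(f"index {index_name} failed to populate")
        if clock() >= deadline:
            raise TimeoutError(f"index {index_name} still {state} after {timeout_s}s")
        sleep(poll_s)


def swap_index(run, fetch_state, create_new, new_name, old_name, **wait_options):
    """Create the replacement index, wait until it is ONLINE, then drop the old index."""
    run(create_new)
    wait_until_online(fetch_state, new_name, **wait_options)
    # Reached only after the new index is ONLINE; a timeout or failure skips the drop.
    run(f"DROP INDEX {old_name} IF EXISTS")
```

Because any exception stops the sequence before the drop, a failed or slow population leaves the original index serving queries.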

Validation & verification

Confirm the conversion reconciled against the source before you cut traffic over. Three checks catch the common failures:

```cypher
// 1a. No duplicate keys: zero rows means no collisions slipped in.
MATCH (c:Customer)
WHERE c.uuid IS NOT NULL
WITH c.uuid AS id, count(*) AS n
WHERE n > 1
RETURN id, n;

// 1b. No missing keys: expect 0.
MATCH (c:Customer)
WHERE c.uuid IS NULL
RETURN count(c) AS missing_keys;

// 2. Per-type edge counts should match the source: distinct key pairs of the
//    junction table for ENROLLED_IN, non-null foreign keys for FK-derived types.
MATCH ()-[r:ENROLLED_IN]->()
RETURN count(r) AS enrolled_edges;

// 3. Confirm the read path uses an index seek, not a label scan.
PROFILE
MATCH (s:Student {uuid: $uuid})-[:ENROLLED_IN]->(course)
RETURN course.title;
```

A healthy PROFILE shows a NodeUniqueIndexSeek on Student(uuid) feeding a directed Expand(All) on ENROLLED_IN, with DbHits proportional to matched rows rather than to total label population. If you instead see NodeByLabelScan, the uniqueness constraint on Student(uuid) (created the same way as the Customer constraint in Step 1) is missing, or its backing index has not yet reached ONLINE.
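The count comparison in the checks above can be scripted once both sides are collected. For example, source counts could come from SQL, such as a count over the distinct key pairs of the junction table. Graph counts could come from MATCH ()-[r]->() RETURN type(r), count(r). This is a minimal sketch with hypothetical names. It reports only the keys that disagree, so an empty result means the conversion reconciled.

```python
def reconcile(source_counts: dict, graph_counts: dict) -> dict:
    """Return {key: (source, graph)} for every label or type whose counts disagree."""
    keys = sorted(set(source_counts) | set(graph_counts))
    return {
        key: (source_counts.get(key, 0), graph_counts.get(key, 0))
        for key in keys
        if source_counts.get(key, 0) != graph_counts.get(key, 0)
    }
```

A key missing from one side counts as zero. A relationship type that never loaded therefore shows up as a mismatch instead of being silently skipped.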

Edge cases & gotchas

Composite primary keys. A table keyed on two columns cannot anchor on a single scalar. Either synthesize a deterministic uuid from the tuple so that re-runs are stable, or create a node-key constraint over both properties. For example, an enrollment table kept as a node because other tables reference its rows could use CREATE CONSTRAINT enrollment_key IF NOT EXISTS FOR (n:Enrollment) REQUIRE (n.student_id, n.course_id) IS NODE KEY. Node key constraints require Enterprise Edition, so on Community Edition use the synthesized uuid. Deriving the key non-deterministically breaks idempotency and duplicates nodes on replay.
Self-referential foreign keys. A manager_id pointing back into the same employee table becomes a relationship between two nodes of the same label. Match both endpoints against that label, and skip any row where employee_id = manager_id, because it would otherwise create a self-loop the source never intended.
Nullable foreign keys. A null FK is the absence of an edge, not an edge to a null node. Filter with WHERE row.fk IS NOT NULL before the MERGE. A MERGE on a null key property fails outright and aborts the whole batch, while a MATCH on a null key silently drops the row.
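The three rules above can also be applied in Python before a batch is sent, as an alternative to filtering in Cypher. This sketch rests on two assumptions. First, `MIGRATION_NS` is a hypothetical fixed namespace; any UUID works as long as it never changes between runs. Second, key parts must keep the same Python types on every run, since `1` and `"1"` encode differently.

```python
import json
import uuid

# Hypothetical namespace for this migration; it must stay fixed across runs.
MIGRATION_NS = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:er-to-graph")


def composite_uuid(table: str, *key_parts) -> str:
    """Deterministic uuid for a composite primary key: same tuple, same uuid, every run."""
    # JSON-encode the tuple so (1, 23) and (12, 3) cannot collide the way
    # naive string concatenation would.
    payload = json.dumps([table, *key_parts], separators=(",", ":"))
    return str(uuid.uuid5(MIGRATION_NS, payload))


def edge_rows(rows, from_col, to_col, self_referential=False):
    """Yield only the rows that should become edges."""
    for row in rows:
        src, dst = row.get(from_col), row.get(to_col)
        if src is None or dst is None:
            continue  # a NULL foreign key means "no edge", not an edge to nothing
        if self_referential and src == dst:
            continue  # e.g. employee_id = manager_id: drop the unintended self-loop
        yield row
```

For example, `list(edge_rows(rows, "employee_id", "manager_id", self_referential=True))` keeps only the rows that describe a real reporting line.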

Parent context and next step

This conversion is one concrete task within Relationship Cardinality & Directionality, which sets the broader rules — direction as a planner pruning hint, 1:1/1:N/M:N enforcement, and dense-node fan-out limits — that Steps 1 and 2 apply to a whole ER diagram at once.

Up: Relationship Cardinality & Directionality — the direction and multiplicity rules this procedure enforces edge by edge.
Node Label Taxonomy Design — how to decide the labels Step 1 decomposes tables into.
Relational Schema Mapping Strategies — the full mapping contract for junction tables, composite keys, and nullable columns across an entire schema.
Property Graph Anti-Patterns — the modeling mistakes a mechanical table-to-node conversion tends to reproduce.