Migrating PostgreSQL foreign keys to Neo4j relationships automatically

You have a normalized PostgreSQL schema whose associations live in foreign-key columns, and you want a repeatable pipeline that turns every one of those keys into a typed, directed Neo4j relationship without hand-writing Cypher per table. The task sounds syntactic but is not: PostgreSQL enforces referential integrity at write time through B-tree indexes and constraint triggers, whereas Neo4j materializes associations as native adjacency between nodes that must already exist before an edge can attach. Point a naive row-by-row CREATE at this mismatch and you get full label scans, oversized transactions that exhaust memory, and silently dropped edges or invented endpoints. This page gives you a deterministic, driver-based conversion (introspect the foreign keys, anchor the endpoints, load edges in bounded batches, and reconcile the result) so the same run always produces the same graph. It is the referential-integrity half of Relational Schema Mapping Strategies.

Prerequisites

Neo4j 5.x reachable over Bolt, with the neo4j Python driver v5+ installed (pip install "neo4j>=5").
Read access to the source PostgreSQL information_schema for foreign-key discovery.
A settled node label taxonomy so each source table maps to a stable domain label rather than one invented at load time.
A relationship cardinality and direction policy, since a foreign key carries both direction and multiplicity that must survive the crossing.
Uniqueness constraints planned for every node key you will MATCH on (created in the anchoring step below).

Why direct FK translation fails

Foreign keys in PostgreSQL are declarative pointers; a relationship in Neo4j is a first-class entity that requires both endpoints to exist before it can be attached. The diagram below contrasts a PostgreSQL foreign key with the equivalent directed Neo4j relationship.

Four failure modes recur in automated conversions, and the implementation below is structured to neutralize each one:

Missing anchor constraints. Without CREATE CONSTRAINT ... REQUIRE n.property IS UNIQUE, every MATCH during edge creation performs an O(N) label scan, collapsing throughput from tens of thousands of edges per second to a few hundred.
Nullable and polymorphic FKs. PostgreSQL permits NULL foreign keys and CHECK-guarded polymorphic references. Mapping these blindly produces edges to a null endpoint or a single overloaded relationship type where distinct types belong — a property-graph anti-pattern that is expensive to unwind later.
Transaction-boundary bloat. Loading 10M+ relationships in one transaction keeps the whole transaction state in heap memory until commit, which can exhaust the heap and trigger a JVM OutOfMemoryError.
Orphaned references. Rows whose parent was deleted leave dangling FKs; an unguarded MERGE will silently invent the missing endpoint instead of failing loudly.

Step 1 — Introspect the foreign keys

Query the source catalog to extract every foreign key deterministically, so the mapping is generated from the database rather than hand-maintained:

sql

SELECT
    tc.table_name  AS source_table,
    kcu.column_name AS source_col,
    ccu.table_name AS target_table,
    ccu.column_name AS target_col,
    tc.constraint_name
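FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON tc.constraint_name = kcu.constraint_name
  AND tc.table_schema = kcu.table_schema
  AND tc.table_name = kcu.table_name  -- constraint names are unique per table only
JOIN information_schema.constraint_column_usage AS ccu
  ON ccu.constraint_name = tc.constraint_name
  AND ccu.constraint_schema = tc.constraint_schema  -- referenced table may live in another schema
WHERE tc.constraint_type = 'FOREIGN KEY'
-- Single-column keys assumed: a composite key yields a column cross product here.
-- Stable ordering so repeated runs emit the mapping in the same order.
ORDER BY tc.table_schema, tc.table_name, tc.constraint_name, kcu.ordinal_position;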

In Python, fold this result into a mapping keyed by source table, {source_table: (target_label, target_fk_property, relationship_type)}, where target_fk_property names the field in each source row that holds the foreign-key value, and collect the source rows per table as {source_table: [rows]}. The loader in Step 3 expects each row to carry its own primary key as source_id. Assign the relationship type from your cardinality policy, not from the column name: a user_id FK on orders becomes :PLACED_BY, not :USER_ID. When the export also carries semi-structured payloads, resolve them first with the JSON document flattening contract so nested arrays are normalized into flat rows before FK resolution runs.
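
A minimal sketch of that fold, assuming the introspection rows arrive as dictionaries keyed by the column aliases in the query above. LABELS, REL_TYPE_POLICY, and build_fk_mapping are hypothetical names introduced for illustration, and the two lookup tables stand in for your own label taxonomy and cardinality policy. Because a table can carry several foreign keys, this variant keeps a list of entries per table rather than a single tuple, so a loader that unpacks one tuple per table would need an inner loop over the list.

```python
from collections import defaultdict

# Hypothetical table -> label taxonomy (replace with your own).
LABELS = {"orders": "Order", "users": "User"}

# Hypothetical policy: (source_table, source_col) -> relationship type.
REL_TYPE_POLICY = {("orders", "user_id"): "PLACED_BY"}


def build_fk_mapping(fk_rows, labels=LABELS, policy=REL_TYPE_POLICY):
    """Fold introspection rows into
    {source_table: [(target_label, fk_property, rel_type), ...]}."""
    mapping = defaultdict(list)
    for row in fk_rows:
        key = (row["source_table"], row["source_col"])
        if key not in policy:
            # Fail loudly instead of deriving a type from the column name.
            raise KeyError(f"no relationship type assigned for {key}")
        mapping[row["source_table"]].append(
            (labels[row["target_table"]], row["source_col"], policy[key])
        )
    return dict(mapping)
```

Raising on an unmapped key is a deliberate choice in this sketch: a foreign key without an assigned type stops the run rather than producing a relationship named after its column.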

Step 2 — Anchor the endpoints

Before creating a single edge, give every label a uniqueness constraint. Neo4j’s planner needs the backing range index to resolve each MATCH in O(log N) instead of scanning the label:

cypher

CREATE CONSTRAINT user_id_unique IF NOT EXISTS
  FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT order_id_unique IF NOT EXISTS
  FOR (o:Order) REQUIRE o.id IS UNIQUE;

The IF NOT EXISTS guard keeps this step idempotent, so re-running the pipeline never errors on an existing constraint. Index population is asynchronous — confirm every index is live before loading:

cypher

SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE';

An empty result means the graph is ready. Skip this gate and the loader falls back to full label scans, silently erasing the throughput the whole pipeline depends on.
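
The constraint statements can be generated from the label set instead of typed by hand. The sketch below assumes every node is keyed on a property named id, as in the examples above; cypher_ident, constraint_statements, and pending_indexes are hypothetical helper names, and session is any object with a run(query) method that yields records, such as a neo4j driver session.

```python
def cypher_ident(name):
    """Backtick-quote an identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def constraint_statements(labels, key="id"):
    """Yield one idempotent uniqueness constraint per distinct label."""
    for label in sorted(set(labels)):
        name = f"{label.lower()}_{key}_unique"
        yield (
            f"CREATE CONSTRAINT {cypher_ident(name)} IF NOT EXISTS "
            f"FOR (n:{cypher_ident(label)}) REQUIRE n.{cypher_ident(key)} IS UNIQUE"
        )


def pending_indexes(session):
    """Return the names of indexes that are not ONLINE yet.

    An empty list corresponds to the empty result described above.
    """
    result = session.run(
        "SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name"
    )
    return [record["name"] for record in result]
```

Sorting the labels keeps the emitted statements in the same order on every run, which makes the generated script easy to diff between pipeline versions.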

Step 3 — Load edges with bounded batches

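This is the core of the conversion. Relationship ingestion must respect Neo4j’s transactional memory limits, so drive it with a parameterized UNWIND and strict chunking (5,000–10,000 rows per transaction). Only identifiers (the two labels, the relationship type, and the FK property name) are interpolated into the query text, because Cypher cannot take them as parameters; every data value travels through the $batch parameter, which keeps the plan cacheable per table and keeps row data out of the query string. Backtick-quoting alone does not neutralize a name that itself contains a backtick, so take these identifiers only from your own reviewed mapping, never from untrusted input.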

python

from neo4j import GraphDatabase

CHUNK_SIZE = 5000

def load_relationships(uri, user, password, fk_mapping, source_rows_by_table):
    # Context-managed driver: closes the pool cleanly even on exception.
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for table, (target_label, target_col, rel_type) in fk_mapping.items():
                rows = source_rows_by_table[table]
                # Split into bounded commit units so the tx log never overflows.
                chunks = [rows[i:i + CHUNK_SIZE]
                          for i in range(0, len(rows), CHUNK_SIZE)]

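                for chunk in chunks:
                    # Backtick-quote every interpolated identifier; keep data in $batch.
                    # The mapping key doubles as the source node label here, so key both
                    # dicts by label (e.g. "Order") if it differs from the table name.
                    query = f"""
                    UNWIND $batch AS row
                    MATCH (s:`{table}`        {{id: row.source_id}})
                    MATCH (t:`{target_label}` {{id: row.`{target_col}`}})
                    MERGE (s)-[r:`{rel_type}`]->(t)
                    ON CREATE SET r.created_at = timestamp()
                    """
                    # execute_write retries retryable errors (e.g. TransientError,
                    # ServiceUnavailable); consume() drains the result inside the tx.
                    session.execute_write(
                        lambda tx, q=query, c=chunk: tx.run(q, batch=c).consume()
                    )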

Two design choices make this safe to re-run. MATCH ... MATCH (rather than MERGE on the endpoints) means the loader never invents a missing node — a row pointing at a deleted parent produces zero edges instead of a phantom one. And MERGE on the relationship with an ON CREATE SET guard makes the write converge: replaying a chunk yields exactly one edge and never overwrites its original timestamp. This is the same idempotent MERGE discipline the rest of the pipeline relies on. For sizing the commit unit against your heap and edge cardinality, see batch-processing and chunking workflows.
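
The loader above slices a list that is already in memory. When the source rows come from a streaming cursor instead, the same bounded commit units can be produced lazily. This is a sketch under that assumption; chunked is a hypothetical helper name, and the default size simply mirrors CHUNK_SIZE.

```python
from itertools import islice


def chunked(rows, size=5000):
    """Yield lists of at most `size` rows from any iterable, one at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # source exhausted
        yield chunk
```

Only one chunk is held at a time, so peak memory on the Python side is bounded by the chunk size rather than by the table size.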

Validation & verification

Two checks bracket the load. Before migrating, find orphaned foreign keys in the source so you can quarantine them rather than discover them mid-run:

sql

SELECT child.id, child.fk_col
FROM child_table child
LEFT JOIN parent_table parent ON child.fk_col = parent.id
WHERE child.fk_col IS NOT NULL AND parent.id IS NULL;
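
The template above can be generated per foreign key from the Step 1 introspection result rather than rewritten for each table. A sketch, assuming single-column keys and the column aliases used in that query; pg_ident and orphan_query are hypothetical helper names, and the generated statement selects whole child rows so they can be written to a quarantine table.

```python
def pg_ident(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def orphan_query(source_table, source_col, target_table, target_col):
    """Build the orphan check for one single-column foreign key."""
    child, parent = pg_ident(source_table), pg_ident(target_table)
    fk, pk = pg_ident(source_col), pg_ident(target_col)
    return (
        f"SELECT child.* FROM {child} AS child "
        f"LEFT JOIN {parent} AS parent ON child.{fk} = parent.{pk} "
        f"WHERE child.{fk} IS NOT NULL AND parent.{pk} IS NULL"
    )
```

For example, orphan_query("orders", "user_id", "users", "id") produces the check for the orders example used throughout this page.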

After the load, reconcile the graph against the source. The relationship count per type must equal the count of non-null FK values that had a resolvable parent:

cypher

MATCH (:Order)-[r:PLACED_BY]->(:User)
RETURN count(r) AS placed_by_edges;

Compare that to SELECT count(*) FROM orders WHERE user_id IS NOT NULL minus the quarantined orphans. A PROFILE of the loader query should show a NodeUniqueIndexSeek for both MATCH clauses. PROFILE executes the statement, so run it against a small batch, or use EXPLAIN to inspect the plan without executing it. If you see NodeByLabelScan instead, the anchoring in Step 2 did not take, and you should stop and rebuild the constraints before continuing. Wiring these comparisons into an automated reconciliation job is covered by data validation and integrity checks.
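
Once both sides have been counted, the comparison itself is a few lines. The sketch below assumes you have already collected, per relationship type, the non-null FK count and the orphan count from PostgreSQL and the edge count from Neo4j; expected_edges and reconcile are hypothetical names, not part of any library.

```python
def expected_edges(non_null_fk_count, orphan_count):
    """Edges the load should have produced for one foreign key."""
    return non_null_fk_count - orphan_count


def reconcile(expected_by_type, actual_by_type):
    """Return {rel_type: (expected, actual)} for every type whose counts differ."""
    mismatches = {}
    for rel_type in sorted(set(expected_by_type) | set(actual_by_type)):
        expected = expected_by_type.get(rel_type, 0)
        actual = actual_by_type.get(rel_type, 0)
        if expected != actual:
            mismatches[rel_type] = (expected, actual)
    return mismatches
```

An empty dictionary means every relationship type reconciles. A type that appears on only one side is reported against a count of zero, which surfaces both missing loads and unexpected relationship types.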

Edge cases & gotchas

Nullable foreign keys. Filter them out in Python before the batch is sent — a null target_col makes the second MATCH fail to bind and quietly drops the row, so the absence should be explicit, not incidental. In a property graph, a missing association is the absence of an edge, never an edge to null.

python

chunk = [r for r in chunk if r.get(target_col) is not None]

Polymorphic references. A commentable_type / commentable_id pair must not collapse into one relationship type. Branch on the discriminator and emit concrete types — :COMMENTED_ON_POST, :COMMENTED_ON_PHOTO — each with its own anchored target label, so traversals stay type-selective.
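
Branching on the discriminator can be sketched as a grouping pass that runs before the loader. The discriminator values, the POLYMORPHIC_TARGETS table, the split_polymorphic name, and the source_id / target_id row keys are all assumptions made for illustration; substitute the values your schema actually stores.

```python
from collections import defaultdict

# Hypothetical policy: discriminator value -> (target label, relationship type).
POLYMORPHIC_TARGETS = {
    "Post": ("Post", "COMMENTED_ON_POST"),
    "Photo": ("Photo", "COMMENTED_ON_PHOTO"),
}


def split_polymorphic(rows, type_col="commentable_type",
                      id_col="commentable_id", targets=POLYMORPHIC_TARGETS):
    """Group rows by concrete (target_label, rel_type).

    Rows with an unknown discriminator or a null id are returned separately
    so they can be quarantined instead of silently dropped.
    """
    batches = defaultdict(list)
    rejected = []
    for row in rows:
        target = targets.get(row.get(type_col))
        if target is None or row.get(id_col) is None:
            rejected.append(row)
            continue
        batches[target].append({"source_id": row["id"], "target_id": row[id_col]})
    return dict(batches), rejected
```

Each key of the returned dictionary then drives one loader pass with its own target label and relationship type.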

Self-referencing FKs. An employees.manager_id → employees.id key maps both endpoints to the same label. It works with the loader as written, but guard against cycles where the domain forbids them (an org chart, a category tree) before you rely on variable-length traversal; the acyclicity rules live in Relationship Cardinality & Directionality.
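
One way to apply that guard is to check the source rows before loading. Because each row has at most one parent through a single self-referencing column, following parent pointers is enough to find a cycle. The sketch assumes the rows have been reduced to a {id: parent_id} dictionary; find_cycle is a hypothetical helper name.

```python
def find_cycle(parent_of):
    """Return a list of ids forming a cycle, or None if the data is acyclic.

    parent_of maps each id to its parent id (None for a root).
    """
    cleared = set()  # ids already proven to lead to a root
    for start in parent_of:
        path, position = [], {}
        node = start
        while node is not None and node not in cleared:
            if node in position:
                # Revisited an id on the current walk: the tail is the cycle.
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = parent_of.get(node)
        cleared.update(path)
    return None
```

Every id is walked at most once before it is marked as cleared, so the check stays linear in the number of rows.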

Parent context

Foreign-key-to-relationship conversion is one concrete task inside Relational Schema Mapping Strategies, which sets the broader rules — junction tables, composite keys, and nullable columns — for translating a whole relational schema into a Neo4j property graph, and which feeds every downstream stage of the automated migration pipeline.

Up: Relational Schema Mapping Strategies — the full mapping contract this FK conversion is one rule of.
Implementing Idempotent Migration Scripts for Neo4j — the deterministic-key and bounded-transaction discipline that keeps this loader replay-safe.
Batch Processing & Chunking Workflows — sizing the commit units the loader writes in.
Data Validation & Integrity Checks — proving edge counts match the source after cutover.