The methodology articles in this sub-series have used Neo4j as the working substrate. The choice is not accidental — Cypher is the most mature graph query language, the Neo4j Graph Data Science library has the broadest algorithm coverage, the APOC extension set covers most operational needs, and the developer-community resources are deeper than for any alternative. But for a mid-size DD practice committing to a multi-year graph build, the platform decision is more than a methodology preference. It is a total-cost-of-ownership (TCO) calculation across at least three operational tiers (small / medium / large), a build-versus-buy decision between self-hosted Neo4j and managed Aura Enterprise, an evaluation of when a graph workload outgrows what Neo4j can practically deliver, and a portability assessment for the inevitable case where one of those constraints shifts.

This article walks through the decision framework. It covers Neo4j Aura (the managed Neo4j cloud service) versus self-hosted Neo4j (Community or Enterprise), which is primarily an operational-burden trade-off. It covers the alternative graph databases: JanusGraph (Apache-licensed, distributed), TigerGraph (proprietary, with the GSQL query language), Amazon Neptune (AWS-native, supporting both property graphs and RDF), Memgraph (Cypher-compatible, in-memory), and NebulaGraph (open-source, distributed). And it covers the portability question: GQL, published as ISO/IEC 39075:2024, standardizes a query language that draws heavily on Cypher, which raises the question of how much of the methodology survives a platform switch. The article delivers a recommendation framework, not a single-vendor verdict.

The author’s relationship to the substrate is disclosed at the outset: the methodology articles in this sub-series use Neo4j, the author has worked extensively with Neo4j in DD contexts, and the author has no commercial relationship with Neo4j or any other vendor discussed in this article. Recommendations are methodological, not commercial.

The platform-commitment problem

A multi-year graph build is a substantial commitment of methodology, training, tooling, and analyst muscle memory. Switching platforms after Year 2 of a build means re-implementing the algorithm library, retraining the analysts, rebuilding the operational tooling, and re-establishing the institutional confidence that the prior platform earned. The cost of a forced switch is high enough that the initial platform choice deserves careful evaluation, not a default.

Three constraints shift over time and force platform reconsideration. Graph scale — what fits comfortably in Neo4j Aura Professional at the start of Year 1 may not fit at the start of Year 3 as the graph accumulates data. Workload pattern — read-heavy analyst queries shift toward write-heavy nightly ingestion as the firm builds automated data feeds. Compliance / regulatory environment — data-residency requirements (EU, UK, specific U.S. state-level rules) may foreclose certain cloud-managed options that were viable at the start of the build.

The audit-defensibility dimension of platform choice deserves explicit note. Engagement reviewers and PCAOB inspectors expect the audit firm to have evaluated and selected its analytical-procedures tooling with documented rigor. A platform decision that reads “we picked Neo4j because the lead analyst was familiar with it” is harder to defend in inspection than “we evaluated Neo4j Aura, self-hosted Neo4j, JanusGraph, TigerGraph, and Amazon Neptune against [these specific criteria] and chose Neo4j Aura because [these specific reasons].” The framework in this article is the documentable evaluation.

Neo4j Aura vs self-hosted Neo4j

For most mid-size DD practices, the first major decision is managed-service (Neo4j Aura) versus self-hosted (Neo4j Community or Enterprise on the firm’s own infrastructure).

Aura advantages. Operational burden is the vendor’s. Backups, version upgrades, security patches, high-availability topology, and monitoring infrastructure all come pre-built. Dedicated DevOps headcount is effectively zero for the basic deployment; the small-to-medium tier scales without operational re-architecture; and pricing is a predictable monthly subscription.

Aura disadvantages. Pricing is capacity-based (billed by provisioned instance memory), and because memory requirements grow with the graph, cost at the larger tiers becomes meaningful: a 200M-relationship Aura Enterprise instance in the 2026 pricing landscape runs into the high four figures per month, an amount that justifies a self-hosted comparison. Data leaves the firm’s controlled infrastructure (encrypted at rest and in transit, but managed by Neo4j Inc., not the firm). Customization is limited to what Aura exposes.

Self-hosted Neo4j advantages. Full control of the deployment environment. Data stays in the firm’s controlled infrastructure (on-premise or in the firm’s AWS / Azure / GCP account). At sufficient scale, self-hosted cost falls below Aura’s capacity-based pricing (typically around the 50M-100M relationship mark, workload-dependent). Customization is unconstrained: APOC procedures can be extended, and GDS algorithms can be supplemented with custom plugins.

Self-hosted Neo4j disadvantages. Operational burden is the firm’s. A DevOps allocation of at least 0.25 FTE for the basic self-hosted deployment is realistic, scaling to 1.0 FTE for a high-availability multi-region setup. Version upgrades require careful planning and testing. The Community edition lacks clustering, hot backups, role-based access control, and several other production-required features — practitioners running self-hosted at scale need the Enterprise license, which carries its own pricing.
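The crossover can be made explicit with a small cost model. The sketch below assumes Aura is billed per GB of instance memory per hour and that required memory scales linearly with relationship count; the class and function names (`AuraCost`, `SelfHostedCost`, `break_even_million_rels`) are hypothetical, and every rate is a placeholder to be replaced with current vendor quotes and the firm’s own sizing, not a published price.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_MONTH = 730  # common cloud-billing convention (about 8,760 hours / 12)


@dataclass
class AuraCost:
    """Capacity-based managed-service cost. All rates are placeholders, not vendor prices."""
    usd_per_gb_hour: float        # from the current Aura price list
    gb_per_million_rels: float    # sizing assumption: memory needed per 1M relationships
    min_gb: float = 1.0           # smallest instance size considered

    def monthly(self, million_rels: float) -> float:
        gb = max(self.min_gb, million_rels * self.gb_per_million_rels)
        return gb * self.usd_per_gb_hour * HOURS_PER_MONTH


@dataclass
class SelfHostedCost:
    """Self-hosted Neo4j Enterprise cost. All figures are placeholders."""
    annual_license_usd: float                # negotiated Enterprise licence
    devops_fte: float                        # e.g. 0.25 basic, up to 1.0 for HA multi-region
    loaded_usd_per_fte_year: float           # fully loaded DevOps cost per FTE
    infra_usd_per_month: float               # baseline VMs, storage, backups
    infra_usd_per_million_rels: float = 0.0  # incremental infrastructure per 1M relationships

    def monthly(self, million_rels: float) -> float:
        fixed = (self.annual_license_usd + self.devops_fte * self.loaded_usd_per_fte_year) / 12
        return fixed + self.infra_usd_per_month + million_rels * self.infra_usd_per_million_rels


def break_even_million_rels(aura: AuraCost, self_hosted: SelfHostedCost,
                            max_million_rels: int = 1_000) -> Optional[int]:
    """Smallest whole number of millions of relationships at which self-hosting is cheaper."""
    for m in range(1, max_million_rels + 1):
        if self_hosted.monthly(m) < aura.monthly(m):
            return m
    return None  # no crossover within the scanned range
```

Because the self-hosted side is dominated by fixed terms (licence and the DevOps allocation) while the Aura side grows with instance size, the break-even point moves sharply with the FTE assumption; running the model at both 0.25 and 1.0 FTE brackets the realistic range.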

Recommendation framework. Start with Neo4j Aura Professional or Enterprise Light unless one of three conditions applies: (a) data-residency requirements foreclose Aura’s regional availability, (b) the firm already has DevOps capacity allocated to graph infrastructure (which makes the self-hosted operational burden marginal), or (c) the graph is already at 100M+ relationships at deployment time and self-hosted Neo4j’s amortized cost has fallen below Aura’s. Most mid-size DD practices satisfy none of these conditions, and Aura is the right default.
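Because the platform decision needs to be documented for inspection, it helps if the rule that applies conditions (a) to (c) also returns its reasons. A minimal sketch; the function name and parameters are hypothetical, and the amortized self-hosted cost is an input the firm computes separately.

```python
from typing import List, Tuple


def aura_or_self_hosted(residency_forecloses_aura: bool,
                        has_graph_devops_capacity: bool,
                        million_rels_at_deployment: float,
                        aura_monthly_usd: float,
                        self_hosted_amortized_monthly_usd: float) -> Tuple[str, List[str]]:
    """Apply conditions (a)-(c); return the starting choice plus the documented reasons."""
    reasons: List[str] = []
    if residency_forecloses_aura:
        reasons.append("(a) data-residency requirements foreclose Aura's regional availability")
    if has_graph_devops_capacity:
        reasons.append("(b) DevOps capacity already allocated to graph infrastructure")
    if million_rels_at_deployment >= 100 and self_hosted_amortized_monthly_usd < aura_monthly_usd:
        reasons.append("(c) 100M+ relationships at deployment and self-hosted cost below Aura")
    if reasons:
        return "evaluate self-hosted Neo4j Enterprise", reasons
    return "start on Neo4j Aura", ["none of conditions (a)-(c) applies"]
```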

JanusGraph

JanusGraph is an Apache-licensed distributed graph database that runs over Cassandra, HBase, BerkeleyDB, or Google Bigtable storage backends. Its query language is TinkerPop Gremlin, not Cypher. The audience is a firm that has already committed to a Cassandra or HBase operational stack and prefers operational reuse over methodology reuse.

JanusGraph advantages. Distributed by design — scales to billions of relationships across many machines. Apache-licensed (no vendor lock-in). Storage-backend choice (different operational properties for Cassandra vs HBase vs BerkeleyDB).

JanusGraph disadvantages. TinkerPop Gremlin is a different query language from Cypher, so porting this sub-series’ methodology is substantial work, not a trivial translation. The algorithm library is less mature than Neo4j GDS. Operational complexity is much higher: the firm now operates JanusGraph plus its underlying storage backend.

When JanusGraph wins. Firms with existing Cassandra/HBase operations that need a graph layer, very large graph scales (1B+ relationships), or organizations with strong preferences for Apache-licensed software stacks. For most mid-size DD practices, the operational complexity and methodology-portability cost don’t justify the choice.

TigerGraph

TigerGraph is a proprietary graph database whose query language is GSQL. It is designed for very-large-scale graph analytics with strong distributed-query execution.

TigerGraph advantages. Designed for sustained performance at billion-relationship scale, where Neo4j deployments tend to show latency degradation. GSQL is procedural and expressive for complex multi-hop analytics. Enterprise vendor support is mature.

TigerGraph disadvantages. Proprietary platform with vendor-relationship implications. GSQL is incompatible with Cypher; methodology portability requires a substantial port. Licensing costs at production scale are meaningful and require negotiation.

When TigerGraph wins. Large financial institutions with billion-relationship-scale graphs and budget for the platform investment. Outside that profile, the vendor-relationship and language-port costs are usually higher than the performance gain justifies.

Amazon Neptune

Amazon Neptune is AWS’s managed graph database. It supports both property graphs (queried with openCypher or Gremlin) and RDF (queried with SPARQL). Neptune Analytics, a separate in-memory engine, targets analytical workloads and ships a set of built-in graph algorithms.

Neptune advantages. Native AWS integration: IAM authentication, VPC isolation, CloudWatch monitoring, AWS Backup, and the other operational primitives that AWS-shop firms already operate. openCypher support means a Cypher-based methodology ports with moderate effort (not trivial, but not a full rewrite). Multi-modal: a single cluster can hold both property-graph and RDF data, though each model is queried only through its own languages.

Neptune disadvantages. AWS-ecosystem lock-in. The Neo4j GDS algorithm library does not run on Neptune; algorithm portability requires using Neptune Analytics’ built-in algorithms where they cover the need, implementing the algorithm in openCypher, or pulling data out for external algorithm execution. Pricing at production scale is comparable to Aura Enterprise but follows AWS instance-hour, storage, and I/O pricing rather than Aura’s capacity tiers.
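Pulling data out for external execution can be as simple as exporting a `(source, target)` edge list and running the algorithm in the analysis environment. The sketch below is textbook power-iteration PageRank with uniform redistribution of dangling-node rank, written in plain Python as an illustrative stand-in; it is not the GDS or Neptune Analytics implementation, and scores may differ in detail (damping, tolerance, dangling-node handling) from either.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[str, str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a directed (source, target) edge list."""
    out_links: Dict[str, List[str]] = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_links[src].append(dst)  # parallel edges count as extra weight
        nodes.update((src, dst))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # L1 change between iterations
            break
    return rank
```

For audit purposes, the exported edge list and the parameters used should be retained with the results, since an externally computed score is only reproducible from both.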

When Neptune wins. Firms that have already committed to AWS as the platform substrate and want to minimize the platform-vendor count. For AWS-shop firms, Neptune is often the right choice. For multi-cloud or non-AWS firms, Neo4j Aura is usually the better choice.

Memgraph and NebulaGraph

Two open-source alternatives serve narrower niches.

Memgraph. Cypher-compatible (the queries from this sub-series run with minor adaptations, although GDS procedure calls must be replaced, for example with Memgraph’s MAGE algorithm library), in-memory architecture (often lower query latency than disk-backed Neo4j for the same workload), and real-time streaming features (native ingestion from Kafka and Pulsar streams, plus triggers that fire on graph changes). The narrow niche is firms whose workload includes hard real-time requirements (sub-100ms query latency on continuously updating graphs). For DD work, the real-time requirement rarely applies.

NebulaGraph. Open-source distributed graph database with nGQL query language (similar to Cypher but not identical). The narrow niche is firms with a strong preference for open-source-distributed architecture and the operational capacity to run the multi-component NebulaGraph deployment. For most mid-size DD practices, the operational complexity isn’t justified.

Methodology portability

GQL is now an international standard. ISO/IEC 39075:2024, the GQL standard, was published in 2024 as the first new ISO database language since SQL. GQL standardizes a property-graph query language that draws heavily on Cypher (alongside other inputs such as PGQL and GSQL), and several major property-graph vendors, Neo4j among them, have announced support for it or plans to align with it.

The portability implication: the methodology articles in this sub-series rely on Cypher syntax, but the substantive content (graph schema design, algorithm choice, query patterns, audit-defensibility framing) is platform-agnostic. A switch from Neo4j to Memgraph or Amazon Neptune requires re-validating Cypher syntax (most queries run unchanged or with minor edits) but doesn’t require re-thinking the methodology. A switch to JanusGraph (Gremlin) or TigerGraph (GSQL) requires translating the queries but the underlying analytical approach survives.
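In practice, much of the re-validation effort concentrates in plugin procedure calls rather than core clauses such as `MATCH` and `RETURN`. A rough triage pass can flag queries that call the Neo4j-specific `apoc.*` and `gds.*` namespaces, which should be assumed not to run as written on another platform until checked. This is a hypothetical regex heuristic, not a Cypher parser: it will also flag matching text inside string literals or comments.

```python
import re
from typing import Dict, List

# Neo4j plugin namespaces (APOC, Graph Data Science), as opposed to core Cypher clauses.
NEO4J_PLUGIN_CALL = re.compile(r"\b((?:apoc|gds)(?:\.\w+)+)\s*\(", re.IGNORECASE)


def portability_report(queries: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each query name to the plugin procedures/functions it calls; clean queries are omitted."""
    report: Dict[str, List[str]] = {}
    for name, text in queries.items():
        calls = sorted({m.group(1) for m in NEO4J_PLUGIN_CALL.finditer(text)})
        if calls:
            report[name] = calls
    return report
```

Queries absent from the report still need syntax re-validation on the target platform, but they are the ones most likely to run unchanged.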

Algorithm-library portability is the harder question. Neo4j GDS has the deepest algorithm coverage (PageRank, Louvain, label propagation, betweenness, closeness, weakly connected components, strongly connected components, embedding algorithms, and further community-detection variants). Memgraph supports a subset through its MAGE library; Neptune supports a smaller subset through Neptune Analytics; JanusGraph and TigerGraph have their own algorithm libraries that overlap with GDS but aren’t identical. For methodology that relies heavily on GDS-specific algorithms (the Random Walks on DD Graphs and Community Detection articles in this sub-series), the algorithm-portability cost of a platform switch is meaningful.
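Algorithm coverage can be checked per methodology component: list the algorithms each article depends on, list what the candidate platform documents, and diff the two after normalizing names. The alias table and the example inputs are hypothetical; the supported set must come from the candidate platform’s own documentation.

```python
from typing import Dict, Iterable, Set

# Hypothetical alias table: vendors name the same algorithm differently.
ALIASES = {
    "wcc": "weakly-connected-components",
    "scc": "strongly-connected-components",
    "lpa": "label-propagation",
}


def normalize(name: str) -> str:
    """Lower-case, hyphenate, and map known abbreviations to a canonical name."""
    key = name.strip().lower().replace("_", "-").replace(" ", "-")
    return ALIASES.get(key, key)


def algorithm_gaps(required: Dict[str, Iterable[str]],
                   supported: Iterable[str]) -> Dict[str, Set[str]]:
    """Per methodology component, the required algorithms the candidate platform lacks."""
    have = {normalize(a) for a in supported}
    gaps: Dict[str, Set[str]] = {}
    for component, algos in required.items():
        missing = {normalize(a) for a in algos} - have
        if missing:
            gaps[component] = missing
    return gaps
```

An empty result means every required algorithm has a named counterpart; it does not mean the implementations produce identical results, which still needs re-validation on engagement data.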

The decision framework

A practical decision flowchart for mid-size DD practices:

Step 1 — graph size estimation. Project the graph at 3 years out: counterparty count, relationship count per counterparty, time-series-edge accumulation rate. Most mid-size DD practices land in the 1M-50M relationship range at 3 years; some scale higher with aggressive automation.
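Step 1 is simple arithmetic once the three inputs are estimated. A minimal sketch, assuming linear counterparty growth and time-series edges that accumulate without pruning (both hypothetical simplifications), with bands taken from the thresholds used in this article: the 1M-50M typical range, the 50M-100M self-hosted crossover zone, and the 100M revisit trigger.

```python
def projected_relationships(counterparties_now: int,
                            new_counterparties_per_month: int,
                            rels_per_counterparty: float,
                            time_series_edges_per_month: int,
                            months: int = 36) -> float:
    """Linear projection: structural edges plus accumulated (unpruned) time-series edges."""
    counterparties = counterparties_now + new_counterparties_per_month * months
    structural = counterparties * rels_per_counterparty
    time_series = time_series_edges_per_month * months
    return structural + time_series


def size_band(relationships: float) -> str:
    """Classify a projected relationship count using this article's thresholds."""
    if relationships < 1_000_000:
        return "under 1M"
    if relationships <= 50_000_000:
        return "1M-50M: typical mid-size range"
    if relationships < 100_000_000:
        return "50M-100M: self-hosted crossover zone"
    return "100M+: revisit trigger"
```

In many DD graphs the time-series term dominates over three years, so the retention policy for time-series edges is often the most consequential sizing assumption.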

Step 2 — workload pattern. Read-heavy analyst queries dominate for most DD practices; write-heavy nightly ingestion appears when automation matures. Most practices have read:write ratios of 10:1 to 100:1.
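Where the firm already keeps a query log, the ratio can be estimated rather than guessed. The sketch below treats any statement containing a Cypher write clause (`CREATE`, `MERGE`, `SET`, `DELETE`, `REMOVE`) as a write. This is a heuristic assumption: keywords inside string literals are miscounted, and writes issued through procedures (for example GDS `write`-mode calls) are not detected.

```python
import re
from typing import Iterable, Optional

WRITE_CLAUSE = re.compile(r"\b(CREATE|MERGE|SET|DELETE|REMOVE)\b", re.IGNORECASE)


def read_write_ratio(statements: Iterable[str]) -> Optional[float]:
    """Reads per write across a query log; None if the log contains no writes."""
    reads = writes = 0
    for stmt in statements:
        if WRITE_CLAUSE.search(stmt):
            writes += 1
        else:
            reads += 1
    return reads / writes if writes else None
```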

Step 3 — operational capacity. Does the firm have allocated DevOps headcount for graph infrastructure, or is the methodology team also the operations team? Most mid-size DD practices do not have dedicated DevOps; the methodology lead and her team are also the operators.

Step 4 — cloud-vendor commitment. Is the firm already AWS-committed, multi-cloud, or non-cloud? AWS-committed firms benefit from Neptune’s integration; non-AWS firms get more from Aura.

Step 5 — methodology-portability priority. Does the firm anticipate needing to switch substrates in the next 3-5 years? Firms that value optionality lean toward Cypher-compatible options (Neo4j Aura, Memgraph, Neptune with openCypher); firms that are confident in their commitment can consider GSQL or Gremlin options.

Default recommendation for the mid-size DD practice (1M-50M relationships, no dedicated DevOps, multi-cloud or AWS-flexible, prefers methodology portability): Neo4j Aura Enterprise. The combination of methodology continuity (the sub-series’ Cypher and GDS code runs as written), low operational burden (managed service), and predictable pricing fits the profile. Revisit the choice when the graph crosses 100M relationships, when AWS commitment becomes firm, or when GQL implementations mature enough to make cross-vendor queries routine.
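The five steps and the default can be encoded as a small rules engine that also produces the reasons list an inspection file needs. The sketch below is hypothetical: the `Profile` fields, the reads-per-write threshold of 1.0 for "write-heavy", and the platform shortlists are illustrative readings of this article’s guidance, not a validated scoring model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    million_rels_at_3y: float        # Step 1: projected relationships, in millions
    read_write_ratio: float          # Step 2: reads per write
    has_graph_devops: bool           # Step 3
    cloud: str                       # Step 4: "aws", "multi", or "none"
    values_portability: bool = True  # Step 5
    real_time: bool = False          # hard sub-100ms latency on a continuously updating graph


@dataclass
class Recommendation:
    primary: str
    also_evaluate: List[str] = field(default_factory=list)
    reasons: List[str] = field(default_factory=list)

    def consider(self, reason: str, *platforms: str) -> None:
        """Record a reason and add platforms to the shortlist without duplicates."""
        self.reasons.append(reason)
        for platform in platforms:
            if platform != self.primary and platform not in self.also_evaluate:
                self.also_evaluate.append(platform)


def recommend(p: Profile) -> Recommendation:
    rec = Recommendation(primary="Neo4j Aura Enterprise",
                         reasons=["default: Cypher + GDS continuity, managed operations"])
    if p.cloud == "none":
        rec.primary = "Self-hosted Neo4j Enterprise"
        rec.reasons.append("Step 4: no cloud commitment rules out managed cloud services")
    elif p.cloud == "aws":
        rec.consider("Step 4: AWS-committed, so weigh Neptune's native integration",
                     "Amazon Neptune")
    if p.has_graph_devops:
        rec.consider("Step 3: existing DevOps capacity makes self-hosting marginal",
                     "Self-hosted Neo4j Enterprise")
    if p.million_rels_at_3y >= 100:
        rec.consider("Step 1: projected 100M+ relationships",
                     "Self-hosted Neo4j Enterprise", "TigerGraph", "JanusGraph")
    if p.real_time and p.read_write_ratio <= 1.0:
        rec.consider("Step 2: write-heavy with real-time requirements", "Memgraph")
    if not p.values_portability:
        rec.consider("Step 5: low portability priority admits GSQL/Gremlin options",
                     "TigerGraph", "JanusGraph")
    return rec
```

The `reasons` list is the part worth keeping: it turns "we picked Neo4j" into a record of which criteria were evaluated and which ones drove the shortlist.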

Conditions to revisit

Three signals indicate the platform choice should be re-evaluated.

Graph size crosses 100M relationships. Aura Enterprise pricing at this scale becomes substantial; self-hosted Neo4j Enterprise plus a fractional DevOps allocation may become cost-favorable; alternative platforms designed for larger scale (TigerGraph, JanusGraph) enter the comparison.

Workload becomes write-heavy with real-time requirements. Memgraph’s in-memory architecture and streaming features become relevant, and its Cypher compatibility minimizes the methodology-port cost.

Regulatory environment forecloses managed-service options. Data-protection and data-residency rules in specific jurisdictions (restrictions on international transfers under the EU GDPR and the post-Brexit UK regime, certain U.S. state-level rules) may rule out Aura in the regions the firm needs and force self-hosting in the firm’s controlled infrastructure.
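Between formal re-evaluations, the three signals can be checked against periodic snapshots of the deployment so that the first quarter each one fires is on record. The `Snapshot` fields and quarter labels are hypothetical, the history is assumed to be in chronological order, and the write-heavy threshold (at most one read per write) is an assumption rather than a figure from this article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snapshot:
    quarter: str                        # e.g. "2027Q1" (hypothetical label format)
    relationships: int
    reads_per_write: float
    real_time_required: bool
    residency_forecloses_managed: bool


def first_triggers(history: List[Snapshot]) -> Dict[str, str]:
    """Return the first quarter in which each re-evaluation signal fired."""
    fired: Dict[str, str] = {}
    for s in history:
        signals = {
            "graph size crossed 100M relationships": s.relationships >= 100_000_000,
            "write-heavy with real-time requirements":
                s.reads_per_write <= 1.0 and s.real_time_required,
            "regulation forecloses managed service": s.residency_forecloses_managed,
        }
        for signal, hit in signals.items():
            if hit and signal not in fired:
                fired[signal] = s.quarter
    return fired
```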

The article closes the Neo4j sub-series. Wave 2 extensions — multi-modal graphs combining property graph and RDF and vector, federated graph queries across multiple physical databases, graph-data-mesh architectures — are reserved for the next cohort of articles when the methodology demands and reader appetite justify the depth.


References

Vendor documentation:

  • Neo4j Aura Documentation — pricing tiers, SLA, regional availability, instance catalog.
  • Neo4j Enterprise Licensing Documentation.
  • JanusGraph Documentation — storage-backend trade-offs (Cassandra, HBase, BerkeleyDB).
  • TigerGraph Documentation — GSQL language, distributed query execution, vendor pricing tiers.
  • Amazon Neptune Documentation — instance-class catalog, openCypher and SPARQL support, Neptune Analytics in-memory variant.
  • Memgraph Documentation — Cypher compatibility, in-memory architecture, real-time streaming-graph features.
  • NebulaGraph Documentation — open-source distributed graph database, nGQL query language.

Standards:

  • ISO/IEC 39075:2024. Information technology — Database languages — GQL. International Organization for Standardization.
  • openCypher Resources — the open subset of Cypher that maps to GQL standardization.

Theory background:

  • Brewer, E. (2000). “Towards Robust Distributed Systems.” PODC keynote. (CAP theorem.)
  • Angles, R., Arenas, M., et al. (2017). “Foundations of Modern Query Languages for Graph Databases.” ACM Computing Surveys, 50(5), 1-40.

Reproducible code: Companion repository at github.com/noahrgreen/dd-tech-lab-companion ships the TCO calculator across the major architecture options (refreshed quarterly with current vendor pricing), the decision-framework flowchart as a runnable rules engine, vendor-comparison feature matrices across the major dimensions, and a portability-assessment checklist for evaluating methodology survival across platform switches.