Graph Databases

7 Data-driven Reasons Why You Should Use Graph Databases (Developer Survey)

I have written about graph database use cases and when to choose graph databases over relational databases in previous articles. This article looks at data on graph database use cases, drawn from several case studies and surveys, to answer why you should use graph databases.

As a succinct summary, here are 7 common reasons to consider moving to graph databases, according to industry data from IBM and TechValidate (a global survey of 1,365 entrepreneurs and developers):

  1. Faster batch processes
     37% of respondents across all business sizes and 37% of respondents from large enterprises in the IBM and TechValidate survey (chiefly developers and entrepreneurs) cited batch processes taking too long as their main reason for adopting graph databases.

  2. Faster application performance
     Another 37% of respondents across all business sizes and 40% of respondents from large enterprises cited concerns over slow application performance as the reason they turned to graph databases.

  3. Missed correlations between data
     33% of respondents across all business sizes and 36% of respondents from large enterprises cited missed correlations between data as the problem that made them consider graph databases.

  4. Cumbersome development processes
     32% of respondents across all business sizes and 30% of respondents from large enterprises said they were choosing graph databases because of the cumbersome development processes they were experiencing.

  5. Faster testing cycles
     18% of respondents across all business sizes and 20% of respondents from large enterprises said they needed graph databases because they wanted faster testing cycles than they currently had.

  6. Fraud detection capabilities
     16% of respondents across all business sizes and 15% of respondents from large enterprises cited the need to improve fraud detection capabilities as the reason they considered graph databases.

  7. New data integration
     32% of respondents across all business sizes and 35% of respondents from large enterprises indicated that problems bringing in new data sources were their main motivation for adopting graph databases.

Let's discuss some of these reasons.

Faster Batch Processes

Graph technology is often classified into two groups: technologies focused on OLTP-like persistence, typically the graph databases themselves, and technologies focused on OLAP-like analytics, typically graph computation algorithms.

Graph computation algorithms run batch processes against large datasets (especially in Big Data, Data Mining and Advanced Analytics), usually with an emphasis on global queries.

Many different types of graph compute engines exist. The most popular are distributed engines such as Pegasus and Giraph.
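To make the idea of a global, batch-style graph computation concrete, here is a minimal sketch of PageRank in plain Python. It is not the API of Pegasus, Giraph, or any particular engine; the function name `pagerank`, the edge-list input format, and the default parameters are assumptions chosen for illustration. The point is only that every iteration touches every node and every edge, which is what makes this kind of job a batch process.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Toy PageRank over a list of (source, target) edges.

    Illustrative only: each iteration sweeps the whole graph,
    which is typical of batch-style global graph computations.
    """
    nodes = set()
    out_links = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, []).append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Every node passes an equal share of its rank along each outgoing edge.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical three-node graph: a -> c, b -> c, c -> a
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
most_central = max(ranks, key=ranks.get)
```

In this small example `most_central` ends up as `"c"`, the node with the most incoming links. A distributed engine would run the same kind of sweep, partitioned across many machines.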

Batch Imports

Batch imports usually

Batching Writes

Faster Application Performance

For graph data, graph databases generally perform better than their relational and other NoSQL counterparts.

In relational databases, the performance of join-intensive queries deteriorates progressively as datasets grow. In a graph database, performance tends to stay relatively constant under similar growth.

This is because graph queries are typically localized to a specific portion of the graph, rather than sweeping broadly across the entire dataset by way of foreign keys. The execution time of a graph query is thus proportional only to the size of the portion of the graph traversed, not to the size of the overall graph (or of a relational database table).

Relationships are not uniform across domains, but they naturally form paths, and these paths are well suited to traversal.

Traversing a graph simply means following paths, and when those paths closely match how the data relates, traversal is extremely efficient, which improves query execution and application performance.
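The locality argument above can be sketched with a bounded breadth-first traversal over an in-memory adjacency list. This is an illustration in plain Python, not how any specific graph database implements traversal; the function `neighbors_within` and the sample data are made up for the example.

```python
from collections import deque


def neighbors_within(graph, start, max_hops):
    """Return the nodes reachable from `start` in at most `max_hops` hops.

    `graph` maps a node to the list of nodes it points to. Only nodes in
    the neighbourhood of `start` are ever looked at, so the work depends
    on the size of that neighbourhood, not on the size of the whole graph.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical social graph.
social = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["erin"],
    "erin": [],
}
nearby = neighbors_within(social, "alice", 2)

# Growing the graph with 10,000 unrelated nodes does not change which
# nodes the traversal from "alice" has to follow.
for i in range(10_000):
    social[f"stranger{i}"] = [f"stranger{i + 1}"]
nearby_in_bigger_graph = neighbors_within(social, "alice", 2)
```

Both calls return the same three nodes (`bob`, `carol`, `dave`) and follow the same handful of relationships, even though the second graph is thousands of times larger.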

Missed Data Relationships

In graph databases, relationships are first-class citizens. A graph database is more relational than a relational database.

In relational databases, relationships are inferred from foreign keys in tables, which results in complex abstractions. In a graph database, problems are mapped more closely (and expressed more simply) to how they present themselves in the real world.

With simple node and relationship structures, sophisticated connections can be inferred more exhaustively than is possible with, say, many-to-many relationships in SQL relational databases and other NoSQL stores.

Real-world data is often more intricately related than what relational databases model when joining tables, especially when those relationships involve semi-structured data or change over time.

Some intricacies of data relationships, such as the weight and strength of specific relationships, are easy to miss with relational and other NoSQL databases. They are even harder to represent, express accurately, or evolve.
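As a rough sketch of what it means for a relationship to carry its own weight or strength, the snippet below models relationships as small records with a type and a property map, in the spirit of a property graph. The `Relationship` class, the `strength` property, and the sample data are all hypothetical and not taken from any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A directed, typed relationship that carries its own properties."""
    start: str
    type: str
    end: str
    properties: dict = field(default_factory=dict)


# Hypothetical data: each relationship can have different properties.
relationships = [
    Relationship("alice", "KNOWS", "bob", {"strength": 0.9}),
    Relationship("alice", "KNOWS", "carol", {"strength": 0.2}),
    Relationship("bob", "WORKS_WITH", "carol", {"since": 2019}),
]


def strongest_connections(rels, person, rel_type, min_strength):
    """Names connected to `person` by `rel_type`, strongest first."""
    def strength(rel):
        return rel.properties.get("strength", 0.0)

    matches = [
        rel for rel in rels
        if rel.start == person
        and rel.type == rel_type
        and strength(rel) >= min_strength
    ]
    return [rel.end for rel in sorted(matches, key=strength, reverse=True)]


close_friends = strongest_connections(relationships, "alice", "KNOWS", 0.5)
```

Here `close_friends` contains only `bob`. Because the strength lives on the relationship itself, adding a new property to one relationship (such as `since` above) does not require changing the others.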

As data becomes more intricately connected, relational databases require expensive joins that add complexity to the schema. Reciprocal queries are even more expensive.
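To see where the joins come from, consider a friends-of-friends lookup against a many-to-many join table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `person` and `friendship` tables and their contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Many-to-many relationship modelled as a join table of foreign keys.
    CREATE TABLE friendship (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person (id, name)
        VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO friendship (person_id, friend_id)
        VALUES (1, 2), (2, 3), (3, 4);
""")


def friends_of_friends(connection, name):
    """Two hops through the join table: one extra join per hop."""
    rows = connection.execute(
        """
        SELECT DISTINCT p2.name
        FROM person AS p0
        JOIN friendship AS f1 ON f1.person_id = p0.id
        JOIN friendship AS f2 ON f2.person_id = f1.friend_id
        JOIN person AS p2 ON p2.id = f2.friend_id
        WHERE p0.name = ?
        ORDER BY p2.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


alice_fof = friends_of_friends(conn, "alice")
```

Each additional hop adds another join on `friendship`, and the reciprocal question (who reaches a given person in two hops) needs the same joins written in the opposite direction.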

Other NoSQL data stores are almost completely devoid of relationships. Fitting them to related data means introducing something like foreign keys, which in turn requires joins, at which point the queries are already expensive.

Better Development Processes

Graph databases make a good case for frictionless development. The flexible, schema-free structure of graph data models makes for a pleasant developer experience.

The result is controlled, incremental development with fewer rollbacks and migrations. Without a rigid, enforced schema, graph databases are easier to evolve and maintain.

Faster Testing Cycles

With graph databases, it is easier to write unit tests on small representative subgraphs (often a few nodes) that expose the features and relationships of the overall graph.

Graph correctness and understanding are typically tested in a series of very small tests, each of which explores a discrete graph feature and which collectively follow an exploratory path through a problem domain.
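A small test of this kind might look like the following sketch, which uses Python's standard `unittest` module against a three-node subgraph. The `recommend_friends` function and the test data are hypothetical stand-ins for a real query against a real graph database.

```python
import unittest


def recommend_friends(graph, person):
    """Suggest friends-of-friends that `person` is not already connected to."""
    direct = set(graph.get(person, ()))
    suggestions = set()
    for friend in direct:
        for candidate in graph.get(friend, ()):
            if candidate != person and candidate not in direct:
                suggestions.add(candidate)
    return suggestions


class RecommendFriendsTest(unittest.TestCase):
    def setUp(self):
        # A tiny representative subgraph: alice - bob - carol.
        self.graph = {
            "alice": ["bob"],
            "bob": ["alice", "carol"],
            "carol": ["bob"],
        }

    def test_suggests_friend_of_friend(self):
        self.assertEqual(recommend_friends(self.graph, "alice"), {"carol"})

    def test_skips_existing_friends(self):
        # bob already knows everyone in the subgraph.
        self.assertEqual(recommend_friends(self.graph, "bob"), set())
```

Saved to a file, the tests can be run with `python -m unittest`. Each test explores one discrete feature of the graph, and more can be added as the model grows.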

It is rare for graphs to have breaking changes, even as the models change to accommodate new data, and this is also evident in testing.

Graph queries are tested for performance to make sure they are a good fit for production data. The main focus here is on refining queries against simulations of production data.

Application performance tests are typically centered around simulations of production usage scenarios.

Effective Fraud Detection

New Data Integration

Graphs are naturally additive: they do not impose a structure upfront, when the least is known about the problem domain, and this flexibility allows the schema to emerge as the problem is explored.

Additionally, new nodes and relationships, even new subgraphs, can be integrated into an existing structure without affecting the application and existing queries. There is therefore no need to be exhaustive ahead of time.
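The additive property can be sketched with a toy in-memory graph. The `Graph` class below is a hypothetical illustration, not the API of any graph database; it only shows that adding a new kind of node and a new relationship type leaves an existing query's result unchanged.

```python
class Graph:
    """Toy in-memory graph: nodes with free-form properties, typed relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, relationship type, end id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))

    def related(self, start, rel_type):
        """Ids reachable from `start` over one relationship of `rel_type`."""
        return sorted(
            end for s, t, end in self.relationships
            if s == start and t == rel_type
        )


graph = Graph()
graph.add_node("alice", label="Person")
graph.add_node("bob", label="Person")
graph.relate("alice", "KNOWS", "bob")
friends_before = graph.related("alice", "KNOWS")

# A new data source arrives later: a new node label, new properties,
# and a new relationship type. Nothing existing has to be migrated.
graph.add_node("acme", label="Company", industry="retail")
graph.relate("alice", "WORKS_AT", "acme")
friends_after = graph.related("alice", "KNOWS")
employers = graph.related("alice", "WORKS_AT")
```

The existing `KNOWS` query returns the same result before and after the new subgraph is attached, while the new `WORKS_AT` relationship is immediately queryable.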

This is the complete reverse of the methodology used in relational databases, where this approach would be nearly catastrophic!