What is a Graph Database?
A graph database is a type of NoSQL database that uses graph theory (graph data models) to store, map and query relationships.
In graph theory, a graph comprises vertices (nodes) connected by edges (arcs).
A graph database is thus essentially a collection of vertices and edges. A vertex represents an entity, a discrete object such as a person, place or event, while an edge represents a relationship between vertices, such as one person knowing another or having been involved in an event at a certain place.
A vertex in a graph database has a unique identifier and a set of edges. Both vertices and edges can have an arbitrary number of key/value pairs, i.e. properties.
Properties typically express non-relational information about vertices and edges.
A graph as used in graph databases is often referred to as a property graph.
When a graph is undirected, the two vertices joined by an edge play the same role: the edge has no direction.
A graph database models both the vertices and the edges of the graph as first-class entities. This allows complex interactions to be modeled in a way that mimics a more natural form of data modeling and representation.
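The vertex, edge and property vocabulary above can be made concrete with a small sketch. This is a toy in-memory structure, not how any particular graph database stores data. The class names (`Vertex`, `Edge`, `PropertyGraph`) and the sample people and event are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A discrete entity with a unique identifier and free-form properties."""
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled relationship between two vertices, with its own properties."""
    label: str
    source: str  # id of the vertex the edge starts from
    target: str  # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not a real database)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vertex_id: str, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, label: str, source: str, target: str, **properties: Any) -> Edge:
        edge = Edge(label, source, target, dict(properties))
        self.edges.append(edge)
        return edge

    def out_edges(self, vertex_id: str) -> List[Edge]:
        """All edges that start at the given vertex."""
        return [e for e in self.edges if e.source == vertex_id]


# Hypothetical sample data: two people and one event.
graph = PropertyGraph()
graph.add_vertex("alice", kind="person", age=34)
graph.add_vertex("bob", kind="person", age=29)
graph.add_vertex("pycon", kind="event", city="Nairobi")
graph.add_edge("knows", "alice", "bob", since=2015)
graph.add_edge("attended", "alice", "pycon", role="speaker")

labels = [edge.label for edge in graph.out_edges("alice")]
print(labels)
```

Note that the edges carry properties of their own (`since`, `role`), which is what makes edges first-class entities rather than mere links.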
What is Graph Data?
Most data can be represented as graphs.
Data that is composed of heterogeneous sets of objects (which can be represented as vertices) that can be related to one another in complex ways (which can be represented as edges) is a perfect fit for a graph data model.
While data in tables can also be related, as in relational databases, those relationships are somewhat simplistic when contrasted with graph data. Data that lends itself to complex many-to-many relationships is better represented with graphs.
Gremlin, the graph traversal language of Apache TinkerPop, traverses property graphs using a sequence of map-steps, filter-steps or sideEffect-steps in queries.
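To give a feel for the three kinds of steps, here is a plain-Python imitation of a traversal pipeline. It is not Gremlin syntax and does not use any Gremlin library. The helper names (`filter_step`, `side_effect_step`, `map_step`) and the sample vertices are assumptions invented for this sketch.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Vertex = Dict[str, Any]

# Hypothetical sample vertices; each is a plain dict of properties.
vertices: List[Vertex] = [
    {"id": 1, "kind": "person", "name": "alice", "age": 34},
    {"id": 2, "kind": "person", "name": "bob", "age": 29},
    {"id": 3, "kind": "event", "name": "pycon"},
]


def filter_step(items: Iterable[Any], predicate: Callable[[Any], bool]) -> Iterator[Any]:
    """Filter-step: let an item through only if the predicate holds."""
    return (item for item in items if predicate(item))


def side_effect_step(items: Iterable[Any], action: Callable[[Any], Any]) -> Iterator[Any]:
    """SideEffect-step: run an action on each item, then pass it on unchanged."""
    for item in items:
        action(item)
        yield item


def map_step(items: Iterable[Any], transform: Callable[[Any], Any]) -> Iterator[Any]:
    """Map-step: turn each item into a new value."""
    return (transform(item) for item in items)


seen_ids: List[int] = []

# Chain the steps: keep people, record their ids, then project their names.
people = filter_step(vertices, lambda v: v["kind"] == "person")
logged = side_effect_step(people, lambda v: seen_ids.append(v["id"]))
names = list(map_step(logged, lambda v: v["name"]))

print(names)     # names of the vertices that passed the filter
print(seen_ids)  # ids recorded by the side effect
```

The steps are lazy generators, so each vertex flows through the whole chain one at a time, which is roughly how a traversal moves through a graph.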
Is My Data a Graph?
There really aren't a lot of true hierarchies in data. Strict hierarchies rarely exist in practice.
Graph data is a much better representation of how data actually works in the real world.
Here are three common pointers as to whether your data is better off in a graph database than in a relational or hierarchical database.
- If data is best represented by many-to-many relationships.
- If these complex relationships between data change often (highly flexible but important relationships).
- If data has unstructured relationships (complex but non-hierarchical, much closer to an unstructured network).
How do Graph Databases work?
At an abstract level, graph databases see data through a completely different model from relational databases. A graph database sees your data as vertices related by edges, while a relational database sees your data as a set of tables connected through primary and foreign keys.
At a lower level, a graph database is just a huge index of data vertices. A graph query targets clear, explicit vertices, never touching the others. There are no hidden assumptions. A relational query, by contrast, may sweep across the large dataset named in its FROM clause only to collect a single field.
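The difference between following a vertex's own edges and sweeping a table can be sketched in a few lines. This is a deliberately simplified comparison under stated assumptions: the relational side models an unindexed scan (real relational databases usually avoid this with indexes), and the names and data are hypothetical.

```python
from typing import Dict, List, Tuple

# Relational-style storage: one row per friendship in a "friendships" table.
# All names here are hypothetical sample data.
friendship_rows: List[Tuple[str, str]] = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("erin", "frank"),
]


def friends_by_scan(person: str) -> Tuple[List[str], int]:
    """Find friends by checking every row, as an unindexed table scan would."""
    rows_examined = 0
    friends = []
    for source, target in friendship_rows:
        rows_examined += 1
        if source == person:
            friends.append(target)
    return friends, rows_examined


# Graph-style storage: each vertex keeps direct references to its neighbours.
adjacency: Dict[str, List[str]] = {}
for source, target in friendship_rows:
    adjacency.setdefault(source, []).append(target)


def friends_by_adjacency(person: str) -> Tuple[List[str], int]:
    """Find friends by following the vertex's own edge list."""
    friends = adjacency.get(person, [])
    return list(friends), len(friends)  # only the matching edges are touched


scan_result, scan_cost = friends_by_scan("alice")
graph_result, graph_cost = friends_by_adjacency("alice")
print(scan_result, scan_cost)
print(graph_result, graph_cost)
```

Both approaches return the same friends, but the scan examines every row in the table while the adjacency lookup touches only the edges that belong to the starting vertex.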
When to Use Graph Databases instead of Relational Databases (The Pros)
Graph databases are a better fit for some problems than others. Generally, data that can be modeled on a graph database can also be modeled on a relational database. Even so, using graph databases offers the following advantages over relational databases.
- Low Latency at Large Scale
- Intricately Structured High Value Relationships
- Near Perfect Data Visualization
- Aggregating Queries
- Constantly Evolving Real-time Data
A unique value proposition of graph databases is superior performance when querying huge datasets.
Relational databases have a somewhat limited ability to handle multiple joins, especially on big datasets, without introducing an unnecessary level of complexity. The complex relational join query is a back-breaker.
Graph databases excel at querying huge volumes of related data. Graphs follow from a relationship-oriented data structure, which stores data in its natural relationships as opposed to adapting it to fit a tabular model. Data is accessed exactly as it was defined in the schema at load time. Query processing is faster because irrelevant data is easily bypassed.
A key niche carved out by graph databases is shaping up to be real-time big data, particularly because of the flexibility of their queries coupled with their efficiency.
Some of the performance bottlenecks of relational databases can be directly attributed to inefficient design concepts such as sequential scans.
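A multi-hop question such as "who is within two hops of this person?" illustrates the point. In a relational schema, each extra hop typically means another self-join on the relationship table. On an adjacency structure, it is one more round of the same loop. The sketch below is a minimal breadth-first traversal. The `knows` graph and the `within_hops` helper are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical "knows" graph stored as an adjacency list.
knows: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first traversal returning each reachable vertex and its hop count."""
    distances: Dict[str, int] = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in knows.get(current, []):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # the starting vertex is not part of the answer
    return distances


friends_of_friends = within_hops("alice", 2)
print(friends_of_friends)
```

Vertices outside the requested depth (here, "frank") are never visited, which is the sense in which irrelevant data is bypassed.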
Graph databases are an AI favorite because of their ability to model complex data relationships.
Data relationships are intimately structured to accommodate inference of things such as indirect facts and tangentially related information. The edges are just as important and detailed as the vertices.
The capability of graph databases to accommodate rich relationship data is virtually unmatched by any other database technology available today. Google cites one of the key advantages of its graph-based semi-supervised machine learning approach as the ability to model labeled and unlabeled data jointly during learning by leveraging graph data structures. This allows them to combine multiple signals into a single graph and use graph learning over it.
The capability for knowledge inference from the relationships in graph data structures has also been emphasized by DeepMind, especially as an optimization and configuration for neural networks.
Data visualization is a notable graph database forte. Graph data structures are the industry standard here.
Combining multiple dimensions to visualize large datasets, such as time series, demographics, etc., is one of the default use cases. Graph data structures are well suited to modeling natural, intuitive data relationships.
Aggregating queries in a tabular data structure is a pain because tables already dictate how data is grouped. A relational database does not easily group data from an arbitrary selection of data points.
Schema evolution in graph queries is a key advantage in this regard. You can aggregate and manipulate your data by simply dropping or adding vertices that extend or shrink your data.
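As a rough illustration of aggregating over a chosen selection of vertices, the sketch below sums a property across the edges of a hand-picked set of starting vertices. The product and customer data, the `bought` edge map and the `spend_by_category` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical product vertices with properties.
products: Dict[str, Dict[str, Any]] = {
    "laptop": {"category": "electronics", "price": 900},
    "phone": {"category": "electronics", "price": 600},
    "novel": {"category": "books", "price": 15},
    "atlas": {"category": "books", "price": 40},
}

# Hypothetical "bought" edges, stored per customer vertex.
bought: Dict[str, List[str]] = {
    "alice": ["laptop", "novel"],
    "bob": ["phone", "novel"],
    "carol": ["atlas"],
}


def spend_by_category(customers: List[str]) -> Dict[str, int]:
    """Total spend per category, following only the chosen customers' edges."""
    totals: Dict[str, int] = defaultdict(int)
    for customer in customers:
        for product_id in bought.get(customer, []):
            item = products[product_id]
            totals[item["category"]] += item["price"]
    return dict(totals)


print(spend_by_category(["alice", "bob"]))

# Extending the data is a matter of adding a vertex and an edge.
products["tablet"] = {"category": "electronics", "price": 300}
bought["carol"].append("tablet")
print(spend_by_category(["carol"]))
```

The selection of starting vertices decides what is grouped, and adding a vertex changes the result without touching any predefined table layout.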
Relational databases do not easily adapt to the constantly changing object types that are common in real-time and live-update applications.
Highly expressive graph query languages adapt well to querying a constantly changing underlying schema.
NoSQL databases are just as adaptive to constantly changing object types.
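A small sketch of what a changing set of object types looks like when every vertex carries its own properties. The vertex data and the `having` helper are hypothetical, and the list of dicts stands in for a real store.

```python
from typing import Any, Dict, List

# Hypothetical vertices of different kinds, each with its own properties.
vertices: List[Dict[str, Any]] = [
    {"id": "alice", "kind": "person", "age": 34},
    {"id": "pycon", "kind": "event", "city": "Nairobi"},
]

# A new object type arrives at runtime. No table needs altering because
# each vertex carries its own set of properties.
vertices.append({"id": "drone-7", "kind": "device", "battery": 0.82})


def having(property_name: str) -> List[str]:
    """Ids of vertices that define the given property, whatever their kind."""
    return [v["id"] for v in vertices if property_name in v]


print(having("city"))
print(having("battery"))
```

Queries written against properties keep working as new kinds of vertices appear, since they never depended on a fixed column list.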
When NOT to Use Graph Databases (The Cons)
As with any popular technology, there is a tendency towards solving every database problem with graph databases.
When paired with the right use case, graph databases are a viable solution.
Some cases are not a good fit for graph databases, especially those that have no need for advanced data structures such as graphs.
Let's look at some of these.
- Unrelated data
- No standard query language
- No proficient graph developers
If your data is not related, graphs might not be the best fit. Most data is naturally connected in some way, but sometimes a dataset has no connections via properties and no connections at runtime via queries.
It is conceivable that all of your data fits into, say, exactly one object type. This is not an ideal use case for a graph database.
NoSQL databases generally have no standard query language comparable to SQL. This can be both an advantage and a drawback. Graph databases are NoSQL, and there is a host of different query languages with no central authority, e.g. Cypher, Gremlin and so on.
As a result, few languages have support and tooling outside their immediate ecosystem. This often derails enterprise adoption because of the need for skilled in-house developer teams.
The graph data model is fairly new and not as mature or as popular as the relational data model. The ecosystem is still catching up, and it is still hard to find and keep talent.
Which is the Best Graph Database?
This is a bit hard to answer...
Mostly because Neo4j is pretty much what most people using graph data are familiar with.
For multi-model graph databases, DataStax seems to be a close second after Microsoft Azure Cosmos DB. OrientDB and ArangoDB are also popular alternatives to Neo4j.
The top 5 Graph Databases: Alternatives to Neo4j
As I said earlier, Neo4j is a front runner by a wide margin right now.
It still seems a little too early to tell what impact Amazon Neptune will have in the graph community.
How good is Neo4j as a graph database?
Andrew Nikishaev had an interesting take on Neo4j after using it for a year.