What is the Difference Between Tree and Graph Data Structures?

A Detailed Blow-by-blow Comparison of Trees and Graphs

The first time I realised GraphQL, unlike every other graph query language, returns a tree instead of a graph came as surprise. It was right there in the name!

So, naturally, this go me thinking... what the heck is the difference between tree and graph data structures anyway? Here's what I've found out.

In a nutshell, a tree is simply a hierachical graph with a root node. Tree data structures will not be as intricately connected as graphs, trees tend to have a single path between nodes and they never ever have loops or circuits. Let's see what all of this means.

Tree and graph data structures have a lot of overlaps and the two can get really confusing very fast. Lets go over the differences in a side-by-side chart - to keep things simple.

Tree Data Structures

Graph Data Structures

Trees data structures are hierarchical, non-linear collections of linked nodes.

Linear data structures such as linked lists and stacks have a logical start and logical end.

A good example of hierarchical data is a family tree where each level of depth would represent a hierarchy such as grand parents, parents etc.

The first node of a tree is the root node. Everything else flows hierarchically from this node.

Another great example or a tree is the HTML DOM where every other tag flows hierachically from the html doctype tag.

The root node is also called the parent node and all the other nodes connected to it child nodes.

A tree is thus a collection of linked nodes starting from the root or parent where each node is a data store with a value and references to it's child nodes.

Like real world trees, tree data structures have a root, branches and leaves. Leaves are the last nodes without children.

Data structures such as BST have several advantages over Hash tables.

Graphs data structures are are collections of linked nodes in non-linear network models.

Like tree data structures, graphs have nodes (that is, the vertices where data is stored) and and connections between nodes (that is, edges which carry the rich relationship data of nodes).

A good example of a network graph is a map of roads within a city.

Graph data structures do not have root nodes and there are no parent-child relationships.

A graph data structure therefore is a data structure that contains; (i) a finite set of nodes and; (ii) a finite set of edges (u, v). The edges are ordered because (u, v) (from vertex u to vertex v) is not the same as (v, u) in case of a di-graph.

Other examples where graph are used to represent a networks is a circuit network and social networks.

What are the Types of Tree and Graph Data Structures?

The most common types of trees data structures are:

  1. Binary Trees (BT)
  2. This is the most basic, and perhaps most familiar, type of a data tree data structure. Every parent node in a BT has two nodes or less.

    In a perfect binary tree, all inner nodes have exactly two children each and all leaves have similar depth.

    In a full binary, proper [15] or plane binary tree, all nodes have either no children or exactly 2 children.

    In a complete binary, every level is completely filled and all nodes in the last level are as far left as possible.

    In an infinite binary, every node has at least two children.

  3. Binary Search Trees (BST)
  4. A binary search tree is a binary tree (BT) which the following four properties:

    1. The left subtree of every node contains only nodes with keys lesser that the node's key.
    2. The right subtree of every node contains only nodes with keys greater that the node's key.
    3. Both left and right subtrees also have binary search trees (BST).
    4. There must be no duplicate nodes.

    The purpose of these properties is to provide ordering so that operations on BSTs such as search and minmax can be blazingly fast—there is no comparison of keys during during operations.

    In-Order traversal of a BST alway produces sorted output and we can construct a BST with only a Pre-Order or Post-Order or Level-Order traversal. See tree traversals.

  5. AVL Trees (Height Balanced Binary Tree)
  6. An Height balanced AVL tree is a binary tree (BT) where, you guessed it, the height difference between left and right sub tree is utmost 1.

    If the above property is violated, rebalancing is done to restore it.

  7. Heap Tree Structures (Heaps)
  8. Like BSTs, Heap Structures also have a specific ordering property.

    In min Heap tree structures, the parent node must be smaller than its children.

    In max Heap tree structures, the parent node must be larger than its children.

    In a Binary Heap, each parent node can have at most two children.

  9. Red-Black Trees
  10. Red-Black trees are self-balancing (like AVL trees) BSTs. Red and Black comes from the coloring of their nodes.

  11. Splay Trees
  12. Splay trees are self-adjusting (recently accessed elements are quicker to access again—splaying) BSTs.

  13. N-array or Forest Trees
  14. N-array tree are NOT BT or BSTs because nodes can have n children—hence the forest analogy. But, like, BTs, they can be complete or perfect.

  15. Trie Structure Trees
  16. A trie tree, prefix tree or radix tree is an ordered tree data structure used to store a dynamic set or associate array where the keys are strings.

    All child nodes with a parent node prefix. Child nodes of a common parent have a similar prefix while the root has an empty string.

  17. B Trees
  18. B trees are generalized binary search tree BST where a node can have more than two children.

    B tree are sorted self balancing tree data structure that allow operations such as searches, insertions, deletions and sequential access in logarithmic time.

  19. B+ Trees
  20. B+ trees are N-array trees where internal nodes have two or more children—n children per node. Unlike N-arrays each node contains only keys (never key-value pairs) with linked leaves.

  21. Suffix or PAT Trees
  22. A suffix tree or PAT tree is a compressed trie tree with the custom text suffixes in keys and position in the text as values.

    Suffixes are very fast many string operations.

  23. Huffman Trees
  24. Huffman trees are frequency sorted BTs. Huffman trees are used widely in data compression algorithms.

    Huffman tree is constructed to allocate a short code word to a long text based on its frequency of occurrences.

This list only lists the most common types and is by no means exhaustive, there are types of tree not covered here such as R Trees, Counted-B Trees, K-D Trees (K-Dimensional BSTs), Decision Trees (N-array Tree variants), Binary Index Trees (Fenwick) and Range Trees and Markel Trees.

Here's an even bigger list of types of tree data structures if you are looking into that sort of thing.

The most common types of graph data structures distinguished on the basis of edges, direction and weight are:

  1. Simple Graphs
  2. A simple graph is a graph in which:

    1. Each edge connects two different nodes, and;
    2. No two edges connect the same pair of nodes

    In a simple graph no node has a self-loop and no two vertices have more than one edge connecting them.

  3. Multigraphs
  4. In multigraphs are similar to simple graphs except, many edges can connect the same pair. The multiplicity of nodes indicates the number of edges between those nodes.

  5. Directed Graphs
  6. In directed graphs, the order of nodes and edges matters. Arrows are used are used for the edges between the nodes.

    They are directed if the relationship exists in only one direction (also called di-graphs).

  7. Undirected Graphs
  8. In undirected graphs, the order of nodes and edges does not matter. The edges are usually draw with straight lines with no directional bias.

    They are undirected if the relationship exists in both directions. Whether a relationship exists in one or both directions must be explicitly stated.

    The adjacency relationship of undirected graphs is symmetric i.e. u ~ v is the same as v ~ u.

  9. Cyclic Graphs
  10. A cyclic graph is a directed graph with at least one cycle. See graph paths and cycles down below.

  11. Directed Acyclic Graphs
  12. A DAG is a directed graph with no cycles. These are some of most widely used types of graphs.

  13. Vertex Labeled Graphs
  14. In labeled graphs, each node has some data in addition to that which identifies the node while there is no addtional data in the edges except that which identified them.

  15. Edge Labeled Graphs
  16. Edge labeled graphs are the opposite of vertex labeled graphs, there is additional data in the edges but not in the nodes.

    This means that edges are associated with labels e.g. in RDF triples.

  17. Weighted Graphs
  18. A weighted graph is an edge labeled graph where the labels can be operated on by operations such as comparison, min and max etc. The labels are usually integers or floats.

    A key concept behind weighted graphs is that the labels denote cost. In application, some edges may be more expensive than others.

  19. Disconnected Graphs
  20. An interesting concept in graph data structures is that nodes do not need to be connected. When lone nodes are part of a graph, it is called a disconnected graph.

This list only scratches the surface with the most common types and is by no means exhaustive because there are many—and I mean MANY—types of graph data structures. Just look at the huge number of simple graphs alone to get an idea of what I mean.

Tree vs. Graph Paths and Cycles

A path represents a sequence of connections (also called, edges) between two or more nodes. These connections are used to manage trees.

Trees have only one path between any two nodes.

The height of a tree is the longest path to the leaf.

The depth of a node is the length of the path to its roots.

Whereas trees in mathematics and graph theory are are assumed to be undirected, tree data structures they are assumed to be directed and rooted unless otherwise qualified.

Tree data structures are directed acyclic graphs (DAG). In graphs, cycles are paths of edges and nodes wherein a node is reachable from itself.

Graphs can be cyclic or acyclic. Cycles can be simple or closed-walk.

Closed-walks start and end at the same node and each two consecutive nodes are adjacent to each other

.

Simple cycles are closed-walks with no repetitions or nodes AND/OR edges except the starting and ending nodes, which makes them directed. They are therefore also known as directed cycles.

Circuits can be closed-walks allowing repetition of nodes BUT NOT edges, it can also be a simple cycle.

Tree vs. Graph Traversals and Search

Tree traversals are just a special type of graph traversal. Tree data structures can be traversed in In-Order, Pre-Order and Post-Order (DFS or BFS) traversal algorithms.

Take the following tree as an example:

tree data structure example

The DFT (depth first traversals) would be:

  1. In-Order - (Left, Root, Right): 4, 2, 5, 1, 3
  2. Pre-Order - (Root, Left, Right): 1, 2, 4, 5, 3
  3. Post-Order - (Left, Right, Root): 4, 5, 2, 3, 1

While, the breadth first traversal would be:

  1. BFT: 1, 2, 3, 4, 5

In binary search trees, In-Order traversals return nodes in non-decreasing order.

Pre-Order traversals are used to create a copy of the tree and to get the prefix expressions of an expression tree.

Post-Order traversals are used to delete the tree and to get prefix expressions of an expression tree.

Graphs data structures are traversed with depth first search (DFS) and breadth first search (BFS) traversal algorithms.

DFS for Graphs is similar as for trees except when there are cycles, where a boolean visited array is used.

In the following graph example:

graph data structure example

Assuming we start traversal from vertex 2, then head for 1, we need to mark 2 as a visited vertex otherwise that's already a non-terminating process.

The depth first traversal here would be:

  1. DFT: 2, 1, 3, 4

While the breadth first traversal would be:

  1. BFT: 2, 1, 4, 3

Loops and Complexity

Looking at the examples used in traversal above, it is easy to see that tree data structures have no loops and are not intricately connected and therefore not complex.

Trees are also simpler than graphs because they follow the parent-child concept - there is exactly one root node and every child has only a single parent.

Graph data structures structures are a bit more complex than trees because they can have loops, circuits and self-loops see the (1, 2, 3) loops in traversals.

Graphs therefore tend to be more connected and complex than trees. The bi-directional nature of some graphs also adds to the complexity.

Applications, Common Use Cases and Examples

The most handy use case for trees is storing and manipulating data that naturally forms a hierarchy. An examples of data that forms a hierarchy would be file system in a computer.

Tree data structures with some ordering (or sorted) such as binary search trees (BST) are used to search, inserting and deleting keys in moderate time (quicker than linked lists but slower than arrays). Trees generally make information easy to search.

Trees are often used to store and manipulate sorted lists.

Another common uses case for trees is in router algorithms.

Trees are also heavily used to create visual effects.

Breadth first traversal of unweighted graphs is used in finding shortest paths and minimum spanning trees.

BFS is also heavily used to find all neighbours nodes in peer-to-peer networks.

Crawlers build indexes using BFS, so does broadcasting in networks and GPS navigation systems.

Cycle detection in undirected graphs and identifying Bipartite graphs have several important applications

Both BFS and DFS traversals are used for path finding in applications such as maps.

Graph data structures have such wide ranging applications including job scheduling, operational research.

Graph coloring with Greedy Algorithms has applications such as mobile frequency assignment, CPU register allocations in compiler optimizations, identifying Bipartite graphs, map coloring and many others.

Graph Traversals in GraphQL

With these two data structure distinctions at hand, it's easy to see the truth in the statement Falcor data model is a graph, and the GraphQL data model is a tree.

Granted, the most popular use case for GraphQL is operating on graph data and besides,

There is technically nothing in the GraphQL spec that binds it to use with graph data structures. A perfect example of GraphQL querying a tree instead of a graph data stucture in the wild is Gatsby. The file system and DOM are both trees and GraphQL allows them to be queried as trees.

GraphQL defines a Root Query Type this is the tree data structure's root node and where the GraphQL result starts to build its tree. This is also where our queries start traversing a GraphQL API.

A GraphQL Query therefore always returns a tree data structure.