Graph Data Models and Query Languages
Explore graph databases and when to use them. Learn about vertices, edges, property graphs, and why graphs excel at modeling highly connected data like social networks and recommendation systems.
When Relationships Are Everything
We've explored document databases (great for hierarchical, tree-like data) and relational databases (solid for structured data with some relationships). But what happens when your data is all about connections? When the relationships between things matter as much as, or more than, the things themselves?
Think about a social network. The interesting part isn't just that Alice exists and Bob exists. It's that Alice knows Bob, Bob works with Carol, Carol went to school with Alice, and they all live in the same city. The connections tell the story. When relationships become this central to your data, you need a data model designed specifically for them: the graph.
What Is a Graph?
A graph is surprisingly simple. It consists of just two things:
Vertices (also called nodes or entities): The things in your data. People, places, web pages, products, anything you want to represent.
Edges (also called relationships or arcs): The connections between things. Friendships, purchases, links, routes, any relationship you want to capture.
That's it. Vertices and edges. But from these simple building blocks, you can model incredibly complex real-world scenarios.
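As a minimal sketch, the social network from the introduction can be written down as nothing more than a set of vertices and a list of edges. The names `social_vertices`, `social_edges`, and `neighbors` are illustrative choices, not part of any library.

```python
# A graph is just vertices plus edges (all names here are illustrative).
social_vertices = {"Alice", "Bob", "Carol"}

# Each edge is a (from_vertex, relationship, to_vertex) triple.
social_edges = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "WORKS_WITH", "Carol"),
    ("Carol", "WENT_TO_SCHOOL_WITH", "Alice"),
]

def neighbors(vertex):
    """Return every vertex that shares an edge with `vertex`, in either direction."""
    found = set()
    for tail, _label, head in social_edges:
        if tail == vertex:
            found.add(head)
        elif head == vertex:
            found.add(tail)
    return found

print(sorted(neighbors("Alice")))  # ['Bob', 'Carol']
```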
Real-World Graphs Are Everywhere
Once you start looking, you'll see graphs everywhere:
Social networks represent people as vertices and friendships as edges. Facebook's entire data model is essentially one massive graph with billions of vertices.
The web itself is a graph. Each web page is a vertex, and every hyperlink is an edge pointing to another page. Google's PageRank algorithm, the foundation of their search engine, works by analyzing this web graph.
Transportation networks model junctions as vertices and roads or railway lines as edges. When your GPS finds the shortest route to your destination, it's running a graph algorithm.
Recommendation systems connect users to products they've bought, movies they've watched, or songs they've listened to. Finding what to recommend next means traversing these connections.
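To make the route-finding example concrete, here is a small sketch of breadth-first search, one of the simplest graph algorithms for finding a path with the fewest hops. The junction names and the `shortest_path` helper are made up for illustration; real navigation systems work with weighted edges (distance, travel time) and far larger graphs.

```python
from collections import deque

# Hypothetical road network: each junction maps to the junctions it connects to.
roads = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    queue = deque([[start]])  # each queue entry is a path taken so far
    visited = {start}
    while queue:
        path = queue.popleft()
        junction = path[-1]
        if junction == goal:
            return path
        for neighbor in graph[junction]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is not reachable from start

print(shortest_path(roads, "A", "E"))  # ['A', 'B', 'D', 'E']
```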
The Power of Heterogeneous Graphs
Here's where graphs get really interesting. Most of the examples above are homogeneous graphs: every vertex is the same type (all people, or all web pages). But graphs can mix completely different types of things in the same structure.
Consider Facebook's actual data model. Their graph contains vertices representing people, locations, events, check-ins, photos, comments, and pages. Edges represent friendships, event attendance, photo tags, comments on posts, and check-ins at locations. One unified graph captures all of it.
Let's model a concrete example. Imagine we're tracking two people: Lucy, who was born in Idaho and now lives in London, and Alain, who's from Beaune, France, and also lives in London. They're married to each other.
Notice something powerful here: we've mixed people, cities, countries, and continents in the same graph. The edges capture different kinds of relationships: personal (MARRIED_TO), biographical (BORN_IN), current status (LIVES_IN), and geographical hierarchy (WITHIN). This flexibility is a core strength of the graph model.
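The relationships described above can be written out as a plain edge list. This is only a sketch: the WITHIN chains (for example, England between London and the United Kingdom) are filled in from general geography as an assumption and may not match the author's diagram exactly.

```python
from collections import defaultdict

# (tail, label, head) triples for the Lucy and Alain example.
# The WITHIN chains are assumptions based on general geography.
example_edges = [
    ("Lucy", "BORN_IN", "Idaho"),
    ("Lucy", "LIVES_IN", "London"),
    ("Alain", "BORN_IN", "Beaune"),
    ("Alain", "LIVES_IN", "London"),
    ("Lucy", "MARRIED_TO", "Alain"),
    ("Idaho", "WITHIN", "United States"),
    ("United States", "WITHIN", "North America"),
    ("London", "WITHIN", "England"),
    ("England", "WITHIN", "United Kingdom"),
    ("United Kingdom", "WITHIN", "Europe"),
    ("Beaune", "WITHIN", "France"),
    ("France", "WITHIN", "Europe"),
]

# Group edges by label to see the different kinds of relationships side by side.
edges_by_label = defaultdict(list)
for tail, label, head in example_edges:
    edges_by_label[label].append((tail, head))

for label, pairs in edges_by_label.items():
    print(f"{label}: {pairs}")
```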
The Property Graph Model
The most popular way to implement graphs in databases is the property graph model, used by Neo4j, Amazon Neptune, and others. In this model, both vertices and edges can carry additional information as key-value properties.
Each vertex has:
- A unique identifier
- A set of outgoing edges (relationships pointing away from it)
- A set of incoming edges (relationships pointing toward it)
- A collection of properties (key-value pairs)
Each edge has:
- A unique identifier
- A tail vertex (where the edge starts)
- A head vertex (where the edge ends)
- A label describing the relationship type
- A collection of properties (key-value pairs)
# Python representation of a property graph
class Vertex:
    def __init__(self, vertex_id, properties=None):
        self.vertex_id = vertex_id
        self.properties = properties or {}
        self.outgoing_edges = []
        self.incoming_edges = []

    def __repr__(self):
        return f"Vertex({self.vertex_id}, {self.properties})"


class Edge:
    def __init__(self, edge_id, tail_vertex, head_vertex, label, properties=None):
        self.edge_id = edge_id
        self.tail_vertex = tail_vertex  # Where the edge starts
        self.head_vertex = head_vertex  # Where the edge ends
        self.label = label              # Type of relationship
        self.properties = properties or {}

    def __repr__(self):
        return f"Edge({self.tail_vertex.vertex_id} --{self.label}--> {self.head_vertex.vertex_id})"


# Build our example graph
lucy = Vertex(1, {"name": "Lucy", "type": "person"})
alain = Vertex(2, {"name": "Alain", "type": "person"})
idaho = Vertex(3, {"name": "Idaho", "type": "location"})
london = Vertex(4, {"name": "London", "type": "location"})
beaune = Vertex(5, {"name": "Beaune", "type": "location"})

edges = [
    Edge(1, lucy, idaho, "BORN_IN"),
    Edge(2, lucy, london, "LIVES_IN", {"since": 2015}),
    Edge(3, alain, beaune, "BORN_IN"),
    Edge(4, alain, london, "LIVES_IN", {"since": 2012}),
    Edge(5, lucy, alain, "MARRIED_TO", {"since": 2014}),
]

# Each edge connects its vertices
for edge in edges:
    edge.tail_vertex.outgoing_edges.append(edge)
    edge.head_vertex.incoming_edges.append(edge)

# Now we can traverse the graph
print(f"{lucy.properties['name']}'s relationships:")
for edge in lucy.outgoing_edges:
    print(f"  --{edge.label}--> {edge.head_vertex.properties['name']}")

Output:
Lucy's relationships:
  --BORN_IN--> Idaho
  --LIVES_IN--> London
  --MARRIED_TO--> Alain
Storing Graphs in a Relational Database
Here's an insight that might surprise you: you can store a property graph in a relational database using just two tables. This isn't always the best approach (specialized graph databases are optimized for graph traversal), but it illustrates the structure beautifully.
-- The vertices table: one row per node
CREATE TABLE vertices (
    vertex_id INTEGER PRIMARY KEY,
    properties JSON
);

-- The edges table: one row per relationship
CREATE TABLE edges (
    edge_id INTEGER PRIMARY KEY,
    tail_vertex INTEGER REFERENCES vertices(vertex_id),
    head_vertex INTEGER REFERENCES vertices(vertex_id),
    label TEXT,
    properties JSON
);

-- Indexes for efficient traversal in both directions
CREATE INDEX edges_tails ON edges(tail_vertex);
CREATE INDEX edges_heads ON edges(head_vertex);

-- Insert our example data
INSERT INTO vertices VALUES
    (1, '{"name": "Lucy", "type": "person"}'),
    (2, '{"name": "Alain", "type": "person"}'),
    (3, '{"name": "Idaho", "type": "location"}'),
    (4, '{"name": "London", "type": "location"}'),
    (5, '{"name": "Beaune", "type": "location"}');

INSERT INTO edges VALUES
    (1, 1, 3, 'BORN_IN', '{}'),
    (2, 1, 4, 'LIVES_IN', '{"since": 2015}'),
    (3, 2, 5, 'BORN_IN', '{}'),
    (4, 2, 4, 'LIVES_IN', '{"since": 2012}'),
    (5, 1, 2, 'MARRIED_TO', '{"since": 2014}');

The two indexes on tail_vertex and head_vertex are crucial. They let you efficiently find all outgoing edges from a vertex (query by tail_vertex) or all incoming edges to a vertex (query by head_vertex). This enables fast graph traversal in both directions.
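To see what querying these two tables looks like, here is a sketch that loads the same data into an in-memory SQLite database through Python's built-in `sqlite3` module. SQLite and the `outgoing` and `incoming` helpers are assumptions made for illustration; the properties are stored as JSON text and decoded in Python rather than with database-specific JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, properties TEXT);
CREATE TABLE edges (
    edge_id INTEGER PRIMARY KEY,
    tail_vertex INTEGER REFERENCES vertices(vertex_id),
    head_vertex INTEGER REFERENCES vertices(vertex_id),
    label TEXT,
    properties TEXT
);
CREATE INDEX edges_tails ON edges(tail_vertex);
CREATE INDEX edges_heads ON edges(head_vertex);
""")

conn.executemany("INSERT INTO vertices VALUES (?, ?)", [
    (1, json.dumps({"name": "Lucy", "type": "person"})),
    (2, json.dumps({"name": "Alain", "type": "person"})),
    (3, json.dumps({"name": "Idaho", "type": "location"})),
    (4, json.dumps({"name": "London", "type": "location"})),
    (5, json.dumps({"name": "Beaune", "type": "location"})),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 3, "BORN_IN", "{}"),
    (2, 1, 4, "LIVES_IN", json.dumps({"since": 2015})),
    (3, 2, 5, "BORN_IN", "{}"),
    (4, 2, 4, "LIVES_IN", json.dumps({"since": 2012})),
    (5, 1, 2, "MARRIED_TO", json.dumps({"since": 2014})),
])

def outgoing(vertex_id):
    """Edges that start at vertex_id: filter on tail_vertex, join to the head."""
    rows = conn.execute(
        "SELECT e.label, v.properties FROM edges e "
        "JOIN vertices v ON v.vertex_id = e.head_vertex "
        "WHERE e.tail_vertex = ? ORDER BY e.edge_id",
        (vertex_id,),
    )
    return [(label, json.loads(props)["name"]) for label, props in rows]

def incoming(vertex_id):
    """Edges that end at vertex_id: filter on head_vertex, join to the tail."""
    rows = conn.execute(
        "SELECT e.label, v.properties FROM edges e "
        "JOIN vertices v ON v.vertex_id = e.tail_vertex "
        "WHERE e.head_vertex = ? ORDER BY e.edge_id",
        (vertex_id,),
    )
    return [(label, json.loads(props)["name"]) for label, props in rows]

print(outgoing(1))  # [('BORN_IN', 'Idaho'), ('LIVES_IN', 'London'), ('MARRIED_TO', 'Alain')]
print(incoming(4))  # [('LIVES_IN', 'Lucy'), ('LIVES_IN', 'Alain')]
```

Each hop in a traversal becomes one more lookup or join against the edges table, which is why deep traversals tend to get verbose in SQL.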
Three Superpowers of Graphs
The property graph model gives you three remarkable capabilities that are awkward to achieve with other data models.
1. Any Vertex Can Connect to Any Other Vertex
There's no schema restricting which types of things can be related. A person can be connected to a city, a city to a country, a person to another person, a person to an event. This flexibility means you never have to ask "can I add this relationship?" You just add it.
2. Efficient Bidirectional Traversal
Given any vertex, you can quickly find both what it points to (outgoing edges) and what points to it (incoming edges). This lets you "walk" the graph in any direction, following chains of relationships naturally.
3. Multiple Relationship Types in One Graph
By using different labels for edges, you can represent many different kinds of relationships in the same structure. MARRIED_TO, BORN_IN, LIVES_IN, WORKS_AT, KNOWS, PURCHASED, REVIEWED, all coexist cleanly in one graph.
# Demonstrating bidirectional traversal
def find_all_connected(vertex, visited=None):
    """Find all vertices reachable from a starting vertex."""
    if visited is None:
        visited = set()
    if vertex.vertex_id in visited:
        return visited
    visited.add(vertex.vertex_id)
    print(f"Visiting: {vertex.properties.get('name', vertex.vertex_id)}")

    # Follow outgoing edges
    for edge in vertex.outgoing_edges:
        find_all_connected(edge.head_vertex, visited)

    # Follow incoming edges (bidirectional!)
    for edge in vertex.incoming_edges:
        find_all_connected(edge.tail_vertex, visited)

    return visited


print("All vertices connected to Lucy:")
find_all_connected(lucy)

Output:
All vertices connected to Lucy:
Visiting: Lucy
Visiting: Idaho
Visiting: London
Visiting: Alain
Visiting: Beaune
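The third capability, multiple relationship types in one graph, can be sketched in the same spirit: a traversal simply filters on the edge label. This standalone example uses plain tuples instead of the classes above, and the `follow` and `followed_by` helpers are hypothetical names chosen for illustration.

```python
# Self-contained sketch: edges as (tail, label, head) tuples.
labeled_edges = [
    ("Lucy", "BORN_IN", "Idaho"),
    ("Lucy", "LIVES_IN", "London"),
    ("Alain", "BORN_IN", "Beaune"),
    ("Alain", "LIVES_IN", "London"),
    ("Lucy", "MARRIED_TO", "Alain"),
]

def follow(vertex, label):
    """Vertices reached from `vertex` via outgoing edges with the given label."""
    return [head for tail, lbl, head in labeled_edges
            if tail == vertex and lbl == label]

def followed_by(vertex, label):
    """Vertices that have an edge with the given label pointing at `vertex`."""
    return [tail for tail, lbl, head in labeled_edges
            if head == vertex and lbl == label]

print(follow("Lucy", "LIVES_IN"))         # ['London']
print(followed_by("London", "LIVES_IN"))  # ['Lucy', 'Alain']
print(follow("Lucy", "BORN_IN"))          # ['Idaho']
```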
Graphs for Evolvability
One of the most practical benefits of graph databases is how gracefully they handle change. As your application evolves, your data model needs to grow with it.
Imagine starting with just people and locations. Later, you want to add:
- Food allergies (connect people to allergen vertices)
- Employment history (connect people to company vertices with date properties)
- Education (connect people to school vertices)
- Dietary restrictions based on allergies (query which foods are safe)
With a graph, each new feature is just new vertices and edges. You don't need to redesign your schema, create new tables, or migrate existing data. The structure naturally expands.
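Here is a rough sketch of that idea, using the food-allergy feature as the example. The vertex keys, the ALLERGIC_TO and CONTAINS labels, the sample foods, and the `safe_foods` helper are all invented for illustration; the point is only that the new feature arrives as extra vertices and edges, with the existing data left untouched.

```python
# Existing graph: vertices keyed by id, edges as (tail, label, head) triples.
graph_vertices = {
    "lucy": {"type": "person", "name": "Lucy"},
    "london": {"type": "location", "name": "London"},
}
graph_edges = [("lucy", "LIVES_IN", "london")]

# New feature: food allergies. Nothing existing is altered;
# we only add vertices and edges (all hypothetical sample data).
graph_vertices["peanuts"] = {"type": "allergen", "name": "Peanuts"}
graph_vertices["satay"] = {"type": "food", "name": "Satay"}
graph_vertices["salad"] = {"type": "food", "name": "Green salad"}
graph_edges += [
    ("lucy", "ALLERGIC_TO", "peanuts"),
    ("satay", "CONTAINS", "peanuts"),
]

def safe_foods(person):
    """Foods that contain none of the allergens the person is allergic to."""
    allergens = {head for tail, label, head in graph_edges
                 if tail == person and label == "ALLERGIC_TO"}
    unsafe = {tail for tail, label, head in graph_edges
              if label == "CONTAINS" and head in allergens}
    return sorted(key for key, props in graph_vertices.items()
                  if props["type"] == "food" and key not in unsafe)

print(safe_foods("lucy"))  # ['salad']
```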
Handling Complex Hierarchies
Graphs excel at representing real-world complexity that would be awkward in relational tables. Consider geographic hierarchies:
- In the USA: City → County → State → Country
- In France: City → Département → Région → Country
- In the UK: City → Country (no intermediate levels)
A relational schema would struggle here. Do you add a column for département even though départements don't exist in the USA? Do you make county nullable for French cities?
In a graph, you just use WITHIN edges, and each location has as many or as few levels as appropriate. The same WITHIN relationship works for any hierarchy depth.
# Different geographic hierarchies coexist naturally
# USA hierarchy
new_york = Vertex(10, {"name": "New York City", "type": "city"})
ny_state = Vertex(11, {"name": "New York State", "type": "state"})
usa = Vertex(12, {"name": "USA", "type": "country"})

# France hierarchy
paris = Vertex(20, {"name": "Paris", "type": "city"})
ile_de_france = Vertex(21, {"name": "Île-de-France", "type": "région"})
france = Vertex(22, {"name": "France", "type": "country"})

# UK hierarchy (simpler)
london = Vertex(30, {"name": "London", "type": "city"})
uk = Vertex(31, {"name": "UK", "type": "country"})

# All use the same WITHIN relationship type
within_edges = [
    Edge(101, new_york, ny_state, "WITHIN"),
    Edge(102, ny_state, usa, "WITHIN"),
    Edge(103, paris, ile_de_france, "WITHIN"),
    Edge(104, ile_de_france, france, "WITHIN"),
    Edge(105, london, uk, "WITHIN"),  # Direct, no intermediate level
]

# Register each edge on its endpoints so the traversal below can follow it
for edge in within_edges:
    edge.tail_vertex.outgoing_edges.append(edge)
    edge.head_vertex.incoming_edges.append(edge)


# Query: Find the country for any city by traversing WITHIN edges
def find_country(city_vertex):
    current = city_vertex
    while True:
        within_edge = next(
            (e for e in current.outgoing_edges if e.label == "WITHIN"),
            None
        )
        if within_edge is None:
            return None  # No more WITHIN edges
        current = within_edge.head_vertex
        if current.properties.get("type") == "country":
            return current.properties["name"]


# Works regardless of hierarchy depth!
print(f"NYC is in: {find_country(new_york)}")    # USA (2 hops)
print(f"Paris is in: {find_country(paris)}")     # France (2 hops)
print(f"London is in: {find_country(london)}")   # UK (1 hop)

When to Choose a Graph Database
Graphs aren't the right choice for every application. Here's when they shine:
Choose graphs when:
- Relationships are as important as the data itself
- You need to traverse many connections (friends of friends, paths through networks)
- Your data is highly interconnected (many-to-many relationships everywhere)
- You want to model heterogeneous data (different types of things all connected)
- Your schema needs to evolve frequently without migrations
Stick with relational/document when:
- Your queries don't involve traversing relationships
- Data is mostly hierarchical (one-to-many)
- You need complex aggregations and analytics (relational SQL engines are typically better optimized for these)
- Your relationships are simple and predictable
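The "friends of friends" traversal mentioned in the first list is a good litmus test for how relationship-heavy your queries are. A minimal sketch, assuming a made-up friendship graph stored as a dictionary of sets (the names and the `friends_of_friends` helper are hypothetical):

```python
# Hypothetical friendship graph: each person maps to a set of direct friends.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave", "Erin"},
    "Dave": {"Bob", "Carol"},
    "Erin": {"Carol"},
}

def friends_of_friends(person):
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = friends[person]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    # Exclude the person themselves and anyone they already know directly.
    return two_hops - direct - {person}

print(sorted(friends_of_friends("Alice")))  # ['Dave', 'Erin']
```

If most of your queries look like this, with each answer depending on following one relationship after another, a graph model is likely a good fit. If they rarely do, the second list applies.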
Key Takeaways
Let's summarize what makes graph databases special:
The Model:
- Vertices represent entities (things)
- Edges represent relationships (connections)
- Both can have properties (key-value pairs)
- Edges have labels describing the relationship type
The Strengths:
- Natural fit for highly connected data
- Flexible schema that evolves easily
- Efficient traversal in any direction
- Mix different types of data seamlessly
The Mental Shift:
- Instead of "what tables do I need?", ask "what things exist and how are they connected?"
- Relationships are first-class citizens, not afterthoughts
- Queries often explore paths, not just retrieve rows
The graph model isn't just another way to store data; it's a different way of thinking about data. When you start seeing the world as vertices and edges, you'll find that many complex real-world problems become surprisingly simple to model and query.