Monday, August 22, 2016

Thinking in Patterns in Neo4j with Cypher

Thinking in patterns is the key to interacting with a graph database like Neo4j. One of the main challenges I see among people with deep relational database experience who are transitioning to a graph database is that they carry a relational approach over to querying. To query a graph database efficiently, you need to update your mental model of how query interactions are approached. We’ll look at some examples of this and at how to make the transition to thinking in patterns.
The overuse of relational query techniques most often shows up as a tendency to rely exclusively on WHERE clauses, filtering and comparing across multiple complete sets of nodes, rather than letting Neo4j start ignoring nodes as it expands from the starting set in the MATCH clause. The goal when querying Neo4j should be to get to the smallest starting set as quickly as possible, to maximize the benefit of constant-time, index-free adjacency traversals within the local network around each starting node.
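To make the contrast concrete, here is a minimal sketch of the two styles side by side. The Person and Movie labels, the ACTED_IN relationship type, and the name and title properties are assumptions chosen for illustration, not a schema defined in this post. Depending on the Neo4j version, the planner may optimize the first form, so treat this as a picture of the two mindsets rather than a benchmark.

```cypher
// Relational style: match two complete sets of nodes,
// then filter the combinations in WHERE.
// (Labels, relationship type, and properties are hypothetical.)
MATCH (a:Person), (m:Movie)
WHERE a.name = 'Tom Hanks' AND (a)-[:ACTED_IN]->(m)
RETURN m.title;

// Pattern style: anchor on a small starting set (one Person),
// then expand along a single relationship type in the MATCH itself.
MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title;
```

Both queries describe the same result, but the second states the pattern up front, so the expansion only ever touches the ACTED_IN relationships around the starting node.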

Thinking in Patterns Starts at Data Modeling

In order to query Neo4j in a pattern-centric manner that is sympathetic to the data layout, the data model must account for the patterns that are important to your queries. One key in modeling the data is to know that each relationship off a node is effectively a direct pointer to another node, and that the relationships around a node are grouped by their type. This allows constant-time traversal from one node to the set of nodes connected to it by a single relationship type. Let’s look at an example…
Assume we want to find individuals from Wooster, Ohio who acted in a movie, and see whether any of them worked with the same directors. The non-normalized RDBMS approach to modeling this could be to put isActor, isDirector, city, state and movies properties on the Person node. Here’s a somewhat extreme example of how this could look:
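// Scan every Person node, keeping actors from Wooster, Ohio
MATCH (actor:Person) WHERE actor.isActor = true AND actor.state = "Ohio" AND actor.city = "Wooster"
// Turn each actor's movies array into one row per movie
WITH actor, actor.movies AS movies UNWIND movies AS movie
// Scan every Person node again, keeping directors whose movies array contains that movie
MATCH (director:Person) WHERE director.isDirector = true AND movie IN director.movies
RETURN director, collect(DISTINCT actor) AS actors;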
The issue with this approach is that it requires going through every node with the Person label to find the intersection of the values within the movies array for the Person nodes that have been determined to be 
