
Monday, September 26, 2016

Getting Acquainted with an Unknown Graph

Finding your way around an unknown graph can seem a bit ambiguous at first because Neo4j is schema-free, especially if you’re newer to graph databases and used to a relational database, where you would simply open the ERD and have a look through the tables. Just because Neo4j is schema-free doesn’t mean that schema-like elements are not present. The Neo4j graph database schema
elements are composed of Label Names, Relationship Types, Indexes and Constraints on Property Keys. Let’s look at some techniques for getting acquainted with an unknown graph.

Initial Unknown Graph Exploration

Here are a few quick tips to help build out the initial mental model of connections within the graph to get you started:
  • To observe the graph schema, the easiest place to look is the information panel in the Neo4j Browser. From there, you’ll be able to observe Label Names, Relationship Types and Property Keys. Each one can be clicked and will immediately load a maximum of 25 associated results. These results can provide a basic starting point to help you navigate your way through the graph.
  • To understand the Indexes and Constraints applied to the graph database, which begin to shape some of the business rules around the data in the graph, execute the “:schema” command within the browser to observe all the index and constraint rules for the Labels and Relationships with their respective Property Keys.
  • To gain a better understanding of the number of nodes in the graph as a whole or within a certain Label, a few simple count queries go a long way (see the sketch after this list).
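If you prefer to explore directly in Cypher, a few simple statements cover the same ground. This is a minimal sketch that assumes Neo4j 3.0 or later for the db.* procedures; MyLabel is just a placeholder for whichever Label Name you want to count:

// List all Label Names in the graph
CALL db.labels();
// List all Relationship Types
CALL db.relationshipTypes();
// List all Property Keys
CALL db.propertyKeys();
// Total number of nodes in the graph
MATCH (n) RETURN count(n);
// Number of nodes carrying a specific (placeholder) label
MATCH (n:MyLabel) RETURN count(n);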

When Your Data Is Not a Graph

I often get asked at Neo4j trainings and meetups about which types of data or use cases a graph database doesn’t handle. While the graph data structure models the world we live in exceptionally well, there are some use cases and scenarios where your data is not a graph, or more likely not ONLY a graph.

The Neo4j graph database is used for many use cases: it effectively represents how current world leaders are connected, allows fraud rings and networks to be surfaced through their common connections, enables business analysts to understand the relationships within their data for better business insights, and helps users increase their chances of finding pertinent documents within a network.
Any of these connected data examples benefits tremendously from a native graph database like Neo4j. At the same time, there are other scenarios where your data is not a graph.

Not Only a Graph Rather than Not a Graph

Here are a few examples to help you think through data and understand if you’re dealing with data that would benefit from being represented as a graph:
  • When data entities have no contextual importance via their connections with other data entities
    For instance, if you’re building some kind of calculator, the housing medium for your numbers, equations, and base data won’t likely be taking advantage of powerful contextual relationships.
    Another scenario could be if you’re tracking your personal budget each month and simply want to see whether you spent more than you made at the end, because in this case your goal isn’t to understand the relationships of where that money was spent.
    However, as soon as you want to ask questions about the items in that budget, such as which person within your household executed a transaction, where it occurred, and whom they were with at the time, you’ve got a graph problem (see the sketch after this list).
  • When data involves large text files, web pages, or JSON documents
    If your scenario requires bulk storage and direct lookup retrieval of large string data without any requirement on the connections between documents then you’ll be better off using a pure
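To make the budget example above concrete, here is a minimal sketch of how those questions could be modeled once the connections start to matter. The Transaction, Store and Category labels, the EXECUTED, OCCURRED_AT and IN_CATEGORY relationship types, and the property names are all hypothetical placeholders:

// A single expenditure with its context modeled as connections
CREATE (alice:Person {name: "Alice"})
CREATE (store:Store {name: "Corner Grocery"})
CREATE (groceries:Category {name: "Groceries"})
CREATE (tx:Transaction {amount: 54.20, month: "2016-09"})
CREATE (alice)-[:EXECUTED]->(tx)
CREATE (tx)-[:OCCURRED_AT]->(store)
CREATE (tx)-[:IN_CATEGORY]->(groceries);

// Who spent what, where, and on which category this month?
MATCH (p:Person)-[:EXECUTED]->(t:Transaction)-[:OCCURRED_AT]->(s:Store),
      (t)-[:IN_CATEGORY]->(c:Category)
WHERE t.month = "2016-09"
RETURN p.name, s.name, c.name, sum(t.amount) AS total;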

Thinking in Patterns in Neo4j with Cypher

Thinking in patterns is the key to interacting with a graph database like Neo4j. One of the main challenges I see with those with deep relational database experience when transitioning to a graph
database is the use of a relational approach for querying data. To query a graph database most efficiently, there is a need to update the mental model for how database query interactions are approached. We’ll look at some examples of this and of making the transition to thinking in patterns.
The overuse of relational query techniques most often manifests itself as a tendency to use WHERE clauses exclusively, filtering and comparing multiple complete sets of nodes, rather than letting Neo4j ignore nodes as it expands from the starting set identified in the MATCH clause. The goal of querying the Neo4j graph database should be to get to the smallest starting set as quickly as possible, to maximize the benefits of constant-time, index-free adjacency traversals within the local network around each starting node.
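To illustrate the difference, here is a minimal sketch against a hypothetical movie graph (Person and Movie nodes, ACTED_IN relationships, with an index on Person.name assumed):

// Relational habit: match broad sets of nodes, then filter everything in WHERE
MATCH (p:Person), (m:Movie)
WHERE p.name = "Tom Hanks" AND (p)-[:ACTED_IN]->(m)
RETURN m.title;

// Pattern thinking: anchor on the indexed node and expand outward from it
MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m:Movie)
RETURN m.title;

The second form lets Neo4j start from a single indexed Person node and traverse only its ACTED_IN relationships, rather than producing every Person and Movie combination and then filtering the result down.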

Thinking in Patterns Starts at Data Modeling

In order to query Neo4j in a pattern-centric manner that is sympathetic to the data layout, the data model must be designed with the important patterns in mind. One key in modeling the data is to know that each relationship off a node is literally a memory pointer to another node, and the relationships around a node are grouped by their type. This allows constant-time traversal and targeting from one node to a set of nodes all connected by a single type. Let’s look at an example…
Assume we want to see which individuals from Wooster, Ohio were actors in a movie and whether any of them worked with the same directors. The non-normalized RDBMS-style approach to modeling this could be putting isActor, isDirector, city, state and movies properties on the Person node. Here’s a somewhat extreme example of what this could look like:
MATCH (actor:Person) WHERE actor.isActor = true AND actor.state = "Ohio" AND actor.city = "Wooster"
WITH actor, actor.movies AS movies UNWIND movies AS movie
MATCH (director:Person) WHERE director.isDirector = true AND movie IN director.movies
RETURN director, collect(DISTINCT actor) AS actors;
The issue with such an approach is that it requires scanning every node carrying the Person label to find the matching actors, and then scanning them all again to find the matching directors.
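By contrast, here is a minimal sketch of the same question asked against a model where the city, movies and roles are first-class nodes and relationships. The City and Movie labels and the LIVES_IN, ACTED_IN and DIRECTED relationship types are illustrative placeholders rather than the only way to model this:

MATCH (c:City {name: "Wooster", state: "Ohio"})<-[:LIVES_IN]-(actor:Person),
      (actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(director:Person)
RETURN director, collect(DISTINCT actor) AS actors;

Starting from the single City node, Neo4j only ever touches the people who live there, the movies they acted in, and the directors of those movies, instead of scanning the entire Person label twice.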

Jack of All Data; Master of None

In building any technology there are always trade-offs when squeezing maximum performance out of the implementation, so knowing what that guiding light is for a technology becomes very important. If you’re trying to do everything, or even too many things, odds are none of it will be great because you can’t go all out for one primary objective. Your focus will be split.
This becomes especially important when evaluating the graph database landscape where you have implementations that range from
  • doing only edge caching or single hop queries by using focused indexing strategies
  • to graph layers on top of existing databases that will still suffer during complex graph traversals due to the underlying data storage implementation, which also restricts them from being able to guarantee the consistency of a relationship between two nodes on write
  • to various hybrids such as combining document with graph that try to do it all, but ultimately end up doing neither well
  • to fully native graph implementations that are designed for optimal graph traversal and navigation throughout the set of connected data
It’s key to understand these aspects of the underlying database implementation and its guiding light. Be wary of graph hybrids because they have a split focus on where they optimize. The saying, “Jack of all trades; master of none” isn’t just true for people. If you’ve been creating technology for any period of time you know the trade-offs you make to do everything will result in none of it being as excellent as it could be. I think this realization is why we’re seeing so many specialized data storage technologies that focus on doing that one thing really well.
Picking a fully native graph database offers granular control over operations, from transactional behavior to data organization, data consistency and reliability, and cluster and driver protocols. With maximum control over each graph database aspect, fine-tuned graph traversal optimizations can be made. The choices sympathetic to graph principles for reliability and ACID transactional support can be

Graph Advantage: Business Recommendation Engines

The most common interactions we have with recommendations today involve social (the people we may know) and retail (the products we may also like), but some of the more interesting recommendation engines are the ones that operate internally to an organization, providing business recommendations around strategy, direction and execution. Designing and building business recommendation engines that leverage a comprehensively connected data view within an enterprise can provide many competitive advantages. These advantages include increased efficiency in how subject matter experts on the business domain prioritize their daily efforts, as well as helping the organization become a more data-driven enterprise, with these insights guiding internal business use cases and offering business direction based on a holistic view of the data.

Business Recommendation Engines Guide Engagement

Whether it involves leveraging direct or indirect customer feedback through social media platforms, business supply chain details from the manufacturing plant to the logistics network, or inferring relationships from activity and using the network to determine the confidence in that assertion, the Neo4j graph database offers significant advantages when it comes to making an enterprise data driven through business recommendation engines.
A known strategy for business recommendations internal to a business is the design of pattern-based recommendation algorithms. Such recommendations are dynamic in nature as data flows through the system, and they aid business analysts who need to prioritize their time, letting them filter data by reviewing items in order of those that rank highly enough to be focused on first.
For instance, if your intent is finding insurance, medical, or financial fraud, there are a number of understood patterns associated with fraud within the data, which can be used to proactively pause transactions and rank them by priority for closer examination. Such detection is effective for
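As a hedged illustration of what such a pattern might look like in Cypher, consider a hypothetical insurance graph in which claims share identifiers such as phone numbers or addresses; the Claim and Identifier labels and the USES relationship type below are made up for the sketch:

// Surface identifiers shared by multiple claims and rank them by ring size
MATCH (claim:Claim)-[:USES]->(id:Identifier)
WITH id, collect(DISTINCT claim) AS claims
WHERE size(claims) > 1
RETURN id.value AS sharedIdentifier, size(claims) AS ringSize
ORDER BY ringSize DESC;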

Neo4j is Designed to be Your Source of Truth Database

When introducing the idea of using Neo4j within an enterprise, one common assumption is that because Neo4j is a graph database it must not provide the ACID compliance that RDBMSs have delivered for so long. This assumption isn’t unfounded considering most of the NoSQL database solutions
have moved towards a performance-and-availability-at-all-costs model. But it’s a fact: Neo4j is a fully ACID-compliant and transactional database intended to be a secure and safe source-of-truth database for your enterprise.
Neo4j is a reliable, scalable and high-performing native graph database that’s suitable for enterprise use. Its proper ACID characteristics are a foundation of data reliability. Neo4j ensures that operations involving the modification of data happen within a transaction to guarantee consistent data.
This is especially important in graph because the paradigm for writing data reliably shifts when you introduce the concept of a relationship that is a primary entity within the database. To write a relationship reliably requires locking both the nodes it’s connected to in order to guarantee that they both agree on that relationship between them.

What is ACID?

For those that may not know or need a refresher as to what that acronym includes, here’s a quick summary. ACID is a set of properties that guarantee database transactions are processed reliably, where a transaction is a single logical operation on the database.
  • Atomicity requires that a transaction is all or nothing, which means if a portion of a transaction fails, the state of the database is left without changes.
  • Consistency ensures any transactional operation will leave the database in a consistent state as defined by the constraints and rules applied to the database.
  • Isolation guarantees concurrent transactions will execute as if they were performed sequentially requiring that within a transaction, altered data won’t be accessed by other operations prior to commit.
  • Durability means that once a transaction has been committed it will remain even if the database crashes immediately thereafter.

Neo4j ACID Overview

To completely preserve integrity of data while ensuring adequate transactional behavior, Neo4j supports 


Easier Data Migration with Neo4j

Data migration is one of the necessary evils involved with keeping a database aligned with the evolving needs of the business and applications using it. With the increasing demand for enterprises of all sizes to iterate more quickly and drive change from within, the data migration conversation
becomes much more frequent. Data migration procedures can take a very long time, or may not even be feasible, depending on the size and structure of the data in a database.
Data migration is not just an enterprise issue. Startups are changing at an even more rapid rate while they iterate on their product(s) and business model(s), trying to figure out exactly what they need to be. Being able to perform rapid, low-risk data migrations with minimal impact to existing applications using the database is one great benefit of Neo4j and its flexible, schema-free data model.

Neo4j Graph Database Data Migration

As a native graph database Neo4j provides several advantages when it comes to managing data migrations:
  • Neo4j treats relationships as primary entities within the database, which means you can add a new relationship to connect certain nodes in a new way without needing to migrate a table schema to enforce a new foreign key, insert all the corresponding references into each row of the table, or build a JOIN table.
  • Neo4j uses labels to group and index common nodes. A label is like a tag, and a node can have any number of labels. Labels are useful in a data migration because, while they associate nodes together under a certain type, they don’t bring with them, by default, a schema definition containing properties, data types and the like that must be adhered to by any node given that label. This means you can temporarily group a set of nodes with a certain label prior to migration and, once each node is migrated, move it to the new label or remove the label altogether to indicate it has been migrated successfully.
One of the most common data migrations that occurs within the Neo4j graph database is changing a concept from being stored as a property or an array value to a node with relationships. In Neo4j, nodes can be inserted to promote a concept that was originally a property into a primary entity in the graph to which other nodes can be connected. This can be done using MERGE to synthesize all the occurrences of a property’s unique value into a single node representing that thing. Then all the nodes related to that property can be connected to it with a contextually relevant relationship.
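Here is a minimal sketch of that kind of migration, using a hypothetical Person.city property being promoted to a City node connected by a LIVES_IN relationship, with a temporary ToMigrate label used to track progress as described above:

// Tag every node that still needs migrating
MATCH (p:Person) WHERE exists(p.city)
SET p:ToMigrate;

// Promote each distinct city value to its own node, connect it, and clean up
MATCH (p:ToMigrate)
MERGE (c:City {name: p.city})
MERGE (p)-[:LIVES_IN]->(c)
REMOVE p.city, p:ToMigrate;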

Data Migration Example 

Neo4j 3.0 Welcomes a New Era for Graphs

At GraphConnect at the end of April the Neo4j team announced the release of Neo4j 3.0. We had the opportunity to celebrate this release at The Honest Company last night with the Graph Database LA Meetup group, where I shared many of these highlights from the official Neo4j announcement. The first release in the 3.x series ushers in a new era of scalable yet reliable graph database technology, with this version of Neo4j based on a completely redesigned architecture that offers enhanced
developer productivity and varied deployment options at massive scale.

3 Things to Expect in Neo4j 3.0

Here’s what to expect with the new Neo4j 3.0:
  • Redesigned internals that remove the limits on the number of nodes, relationships and properties that can be stored and indexed.
  • Officially supported language drivers over the Bolt binary protocol and support for Java stored procedures, enabling full-stack developers to create powerful applications.
  • Streamlined deployment structure and configuration for deploying Neo4j in the cloud or on premise.

Diving Deeper into Neo4j 3.0

Here’s an in-depth look at what’s new in the latest version:
  • Unlimited Graph Storage
    By far the biggest headline in the release. A graph of unlimited size? Challenge accepted! Dynamic pointer compression expands Neo4j’s available address space as needed, making it possible to house graphs regardless of size. This capability is part of Neo4j Enterprise Edition, where it complements the scale-out clustering features.
  • Enhanced Cost-based Optimizer
    This is a huge one for us because most of the Cypher we write involves complex MERGE operations, so we need as much write performance as possible. The cost-based optimizer has been enhanced with support for write queries. The new parallel index population capability also allows indexes to be built more quickly.
  • Language Drivers & Bolt
    Bolt is great for Neo4j developers because it means better performance of the applications they build all the way around and enables them to go bigger and do more with Neo4j. Bolt is a connection-oriented protocol for graph access. It utilizes a portable binary encoding over WebSockets or TCP for lower latency and enhanced throughput, and it comes with built-in security, improving both graph database performance and developer experience.
    Official language drivers have been released to complement Bolt, which also encapsulate the protocol. These drivers include .NET, Java, JavaScript, and Python.
  • Java Stored Procedures
    This new and powerful performance facility offers low-level, direct graph access, giving you a way to run imperative code when you want to do complex work within the database. Neo4j comes bundled with built-in procedures, and the community APOC project provides many more. There are some very useful procedures in the APOC project so you should definitely check it out. One that stood out to me as immediately useful is the last one in the list, which makes periodic commit available for use outside of LOAD CSV (a small example of calling procedures follows this list).
  • Neo4j Browser Sync 
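As a small example of calling procedures from Cypher, the first statement below uses a built-in procedure and the second uses the APOC periodic-commit procedure mentioned above. This is a sketch that assumes the APOC library is installed; the Person label and checked property are placeholders for whatever batched update you need to run:

// List the procedures available in this database
CALL dbms.procedures() YIELD name RETURN name;

// Batch a large update, committing every 1000 nodes until nothing is left to change
CALL apoc.periodic.commit(
  "MATCH (p:Person) WHERE NOT exists(p.checked) WITH p LIMIT {limit} SET p.checked = true RETURN count(*)",
  {limit: 1000});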

Graph Advantage: Moving Beyond Big Data

Enterprises today are amassing data at a faster rate than ever before, and largely this data flows into a data warehouse, a data lake, or just individual databases where it sits. With enterprises struggling to leverage it in a holistic and meaningful way for their business, the appeal of “big data” is waning.
So how do enterprises begin moving beyond big data?

Data Driven and Connectedness

In recent years we’ve seen enterprises acting on the acknowledgement that organizations need to be more data driven, but there is still a gap in how to really do that well. In the years ahead we’ll see an increasing push by organizations implementing new technologies promising to get them there. It’s unfortunate, but I fear many organizations will be left disappointed. Disappointed not because the technology fails to get in place successfully, but because of the lack of real business value derived from it.
Many technical architects and business people alike have been captivated by the size and speed of data. However, when it comes to knowledge and understanding, these are not the most important parameters. Technologies that are scale-first place the focus in the wrong area for solving knowledge and understanding problems. What do you gain by writing 1 billion rows per day into Redshift if you don’t have that data connected in a meaningful way to the rest of your organization? (There is definitely a time and place for just getting data persisted, but that’s a very different scenario than a BI/Recommendation/Analytics conversation about driving business understanding and decision making.) Ideally you’d be doing both: getting your data persisted and connected at the same time. Too many enterprises are simply collecting and hoarding data at this point.
Being data driven with a concrete understanding of your organization is completely dependent on connectedness. It’s all about what things are connected to and understanding the ways they’re connected. This is the essential foundation of any business intelligence, cognitive, or predictive analytics. Without understanding at the core there is no movement beyond just having vast amounts of “big data”.

Getting Connected with Neo4j

Neo4j provides an advantage in managing data due to the ease with which Neo4j can be brought into your data architecture and the short period of time needed to start seeing the benefits of connecting data from your big data deluge. A flexible graph data model that is very representative of the real world is what Neo4j has


Neo4j is for the Non-Technical

Neo4j unifies organizations across departments and across teams, both technical and non-technical, 
enabling a greater level of understanding and clarity in communication than previously possible. A Neo4j graph model is whiteboard friendly and allows everyone from business to engineering groups to speak the same language of connections. Communicating in contextually relevant connections that bring together business concepts reduces the potential for misunderstandings that cause delays and rework later.

Neo4j Connects Your Organization by Connecting Your Data

The world today is highly connected. Graph databases are whiteboard friendly and effective at representing erratic and inconsistent relationships through intuitive means. They help provide insights and understanding by creating connections within complex big data sets. As enterprises become increasingly data driven, it is essential that all individuals, especially the non-technical groups, have the ability to collaborate with engineering in a more integrated fashion. Neo4j removes the intimidation factor of the technology typically required to deal with complex data and enables more unified collaboration because we all can relate to connections.
There are a number of reasons why both technical and non-technical teams within an organization can all agree on Neo4j:
  • It offers incredible performance
    The more connected data gets in typical RDBMS and NoSQL databases, the faster query performance degrades. It’s a fact that data within all organizations is growing rapidly in size and connectedness. Neo4j provides constant-time navigation through your connected data whether you’re one level deep or ten levels deep.
  • It guarantees data reliability
    Neo4j is fully ACID-compliant and transactional with a guarantee of referential integrity, which means that once your data is written it cannot be lost and two nodes will never disagree on the relationship between them.
  • It has tremendous flexibility
    With a Neo4j graph database, non-technical business personnel, IT, and data architect staff are united and move at the speed of business, thanks to the schema and structure of a graph model that “flexes” as business solutions and the market change. Your team won’t have to exhaust themselves modeling your domain in advance. Instead, they can make additions to the current structure without interrupting current functionality.
  • It’s very agile