
Monday, July 25, 2016

Neo4j Data Pipeline

Every enterprise has a constant flow of new data that needs to be processed and stored, which can be done effectively using a data pipeline. Upon introducing Neo4j into an enterprise data architecture, it becomes necessary to efficiently transform and load data into the Neo4j graph database. Doing this efficiently at scale with the enterprise integration patterns involved requires an intimate understanding of Neo4j write operations along with routing and queuing frameworks such as Apache Camel and ActiveMQ. Managing this complexity proves to be a common challenge from enterprise to enterprise.

One of the common needs we’ve observed over the years is that an enterprise that wants to move forward efficiently with a Neo4j graph database needs to be able to rapidly create a reliable and robust data pipeline that can aggregate, manage and write its ever-increasing volumes of data. The primary reason for this is to make it possible to write data in a consistent and reliable manner at a known flow rate. Solving this once and providing a robust solution for all is the driving force behind the creation of the GraphGrid Data Pipeline.

GraphGrid Data Pipeline

The GraphGrid Data Platform offers a robust data pipeline that manages high write throughput to Neo4j from varying input sources. The data pipeline manages batch operations, handles highly connected writes, throttles data flow, and carries out error handling.

Concurrent Write Operation Management

GraphGrid’s data pipeline handles concurrent write operations for any incoming data via strategies that preserve transactional integrity, size transaction batches appropriately, and throttle data flow. A majority of writes to Neo4j work well as concurrent write operations, but in scenarios where dense nodes are involved, sequential strategies can be utilized to avoid excessive write retries. The data pipeline also

Read more…

Keeping Your Data Current and Flowing into Neo4j

For an enterprise to excel today, a key aspect is how well it utilizes its data as a business asset. To grow and succeed as a whole, an enterprise must enable the usability, quality, and constant flow of its data into a connected state. An enterprise data architecture often involves a complex data life cycle with varying transformation processes. This makes it difficult to track the origin and flow of data, as well as to manage changes, audit trails, history, and a host of other critical processes.

Distributed Graph Database Platform with Neo4j

The dynamics of an increasingly distributed and connected world are shining the spotlight on a new generation of databases focused on more efficiently modeling, storing and querying the connected nature of the data enterprises deal with in the real world. But as graph database usage grows, solving the issue of handling large volumes of read and write operations at scale will pose a serious challenge for the growing market.
Graph databases like Neo4j are a perfect aggregation and landing place for data across the enterprise because they effectively deal with the challenges presented by variations in data. Enterprises are relying on Neo4j, a leading graph database, to effectively connect data for use by real-time enterprise applications. The big challenge, though, is efficiently and continuously flowing data into your Neo4j graph database.
To do this effectively, data connectors need to be utilized to perform ETL. The data extraction will come from your existing data source, such as a MySQL database. This extracted data is then transformed into Cypher statements or a CSV format for use with LOAD CSV, both of which can be routed and flowed effectively.
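As a rough sketch of the load side of that flow, assuming a hypothetical orders.csv exported from MySQL (with order_id and customer_id columns) and Customer nodes already loaded, the foreign key column becomes a relationship in the graph:

    // Read the exported CSV and turn the customer_id foreign key into a relationship.
    LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
    MATCH (c:Customer {customerId: row.customer_id})
    MERGE (o:Order {orderId: row.order_id})
    MERGE (c)-[:PLACED]->(o);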

Introducing a Graph Database into Your Data Architecture

A graph database is capable of offering long-lasting competitive advantages for organizations worldwide, from startups to the largest enterprises. Interest within the enterprise sector has surged dramatically over the past two years, and Forrester recently projected that graph databases will reach over 80% of leading enterprises within two years. Graph databases provide business benefits because they are built on the intuitive principle that connections exist between everything and everyone, a realistic representation of the way the world interacts. Even with all the benefits discussed in Graph Advantage: Why Every Enterprise Should Use a Graph Database, the introduction of a graph database into an enterprise, especially one that may have just finished getting its Hadoop implementation into production, can seem risky.

Graph Database Data Model Flexibility

One of the great strengths of the Neo4j graph database is its schema-free, flexible data model, which turns out to provide a very low-risk entry point for an enterprise to begin exploring the benefits of using a graph database. The Neo4j graph database is made to model and navigate connected data with high performance. It processes and stores data within the node and relationship structure defined by the written data, making it flexible enough to accommodate the many data models of the existing databases within an enterprise.
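As a small illustration of that flexibility, the Cypher sketch below (labels and properties are hypothetical) writes two nodes of the same label carrying different property sets, with no schema migration required:

    // Two Product nodes with different properties; the model is defined by what is written.
    CREATE (a:Product {sku: 'A-100', name: 'Widget'})
    CREATE (b:Product {sku: 'B-200', name: 'Gadget', weight: 1.2, color: 'red'})
    CREATE (a)-[:RELATED_TO]->(b);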

Enterprise Data Challenge

We know it’s not reasonable for an enterprise to go all in on a graph database and try to replace existing SQL or NoSQL databases overnight, and certainly a Hadoop data lake isn’t going to be replaced by a

Read more…

Graph Advantage: Why Every Enterprise Should Use a Graph Database

Data in the enterprise today is a bi-directional, always-flowing, continuously changing business asset. Yet it remains largely segmented and disconnected. For enterprises to begin converting their data into business value, this data must be connected, understood, and acted upon.

Enterprise Need for Graph Databases

Enterprise data stored in graph databases with explicit nodes and edges provides competitive advantages to organizations adopting graph databases in all industries, beyond the common social media use case of today. The increasing number of connected devices producing data, along with the need for an advancing enterprise to be data-driven in its decision making, creates a deep necessity for an enterprise to connect and understand its data in a meaningful way. When data is connected and accessible across the departments of an enterprise by using a graph database like Neo4j, its teams benefit from a more comprehensive awareness of the business and make more informed decisions that help the enterprise grow.
Today, CIOs and CTOs aren’t just after large data volume management. They also need to gain insight and direction from their current data. In this case, relationships between data points are a lot more important than the individual data points. To effectively leverage data relationships, enterprises should rely on a graph database that treats relationship information as a first-class citizen. Additionally, a graph database like Neo4j does more than just store data relationships effectively; it is also flexible in expanding the relationship

Monday, July 18, 2016

How Do I Load Data Into Neo4j?

The ability to load data into Neo4j is enabled through a variety of data loading APIs and tools. For processes where big data sets flow in or out of the Neo4j graph database, care needs to be taken to batch these read and write operations into batch sizes that are sympathetic to the master instance’s memory capacity as well as the transactional overhead of data writes.
Neo4j provides a number of APIs to import big data sets including:

  • the Cypher transactional endpoint, which uses the Cypher query language and is simple to utilize from any programming language, because files containing CQL can be structured to bulk load data and write consistently.
  • the Cypher data import capabilities exposed through LOAD CSV, which enable CSV files from a specified remote or local URL to be loaded and batched into desirable transaction sizes for importing massive amounts of data efficiently (see the sketch after this list).
  • the batch inserter, which removes transactional overhead but requires the database to be offline
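For the LOAD CSV route, a minimal sketch (the file name and columns are hypothetical) that commits the import in batches of 5,000 rows per transaction:

    // Commit every 5,000 rows so one huge transaction never builds up in memory.
    USING PERIODIC COMMIT 5000
    LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
    MERGE (p:Person {personId: row.id})
    SET p.name = row.name;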

Load Data Transactionally

To load or update data in Neo4j with efficient write throughput, a reasonable transaction size needs to be consistently maintained based on the complexity of the writes being performed. Transactions that are too small (consisting of one or a few updated elements) suffer from transaction-commit overhead. Transactions that are too large (involving hundreds of thousands or millions of elements) can lead to high memory usage for the transient transaction state. Therefore, an adequate transaction, as we’ve seen, should consist of anywhere from 1k to 10k elements.
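One common way to stay within that range from application code is to send rows as a query parameter and write each batch in its own transaction; a sketch assuming a hypothetical batch parameter holding a few thousand maps:

    // One transaction per batch; $batch is supplied by the calling application.
    UNWIND $batch AS row
    MERGE (p:Person {personId: row.id})
    SET p.name = row.name;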

Importing Initial Data

When it comes to large initial imports consisting of millions or billions of nodes, a transactional process doesn’t lead to maximum write performance. To saturate the available write speed, it’s important to bypass transaction semantics and create your initial data store in a “raw” manner via a

Read more…

Cypher is Awesome

Cypher is a declarative pattern matching language created by Neo4j for describing graph data representations effectively. Cypher is considered one of the most powerful features for effectively expressing graph database traversals when reading and writing data in Neo4j.

Cypher makes it possible for queries to express something like “bring back my friends’ friends right now” or “give me back all pages this page is linked to within the last day” in just a few lines of code. As such, graph database queries and operations across all languages and integrations with Neo4j are able to query in a consistent manner.
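For instance, the friends-of-friends question above might be written like the following sketch (the Person label, FRIEND relationship type, and the name value are hypothetical):

    // Friends of my friends, excluding myself, with duplicates removed.
    MATCH (me:Person {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof:Person)
    WHERE fof <> me
    RETURN DISTINCT fof.name;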
The reception to Cypher has been so great that Neo4j launched the OpenCypher initiative to make Cypher the SQL for graph databases. The organizations that are joining OpenCypher are very important to the graph database movement because supporting a common graph database query language means there will be more utilization and commonality across graph database implementations.

Cypher Query Syntax

Cypher looks a lot like ASCII art because it uses textual pattern representations to express nodes and relationships. Nodes are surrounded by parentheses, which look like circles, and relationships consist of dashes with square brackets. Here’s an example: (graphs)-[:ARE]-(everywhere)
If we want to find all people and their preferences, the query will involve identifiers for the person and thing. A pattern like “(person)-[:LIKES]->(thing)” could be utilized so the identifiers can be referred to later, say, for instance, to

Querying Your Neo4j Graph Database

There are different ways of querying, storing, managing and retrieving data in the Neo4j graph database. For a majority of users, querying with Cypher is a great experience for performing efficient and effective graph database traversals and interactions within the graph data model. For specific applications that require further control over how graphs are stored and queried in a high-performance, multi-threaded manner, the native Java API offers low-level access to the graph database for granular control over traversal and retrieval from the Neo4j graph database. When making use of the Java API, you’ll find that you’re given great freedom and flexibility to tell the Neo4j graph database how to best query your data for optimal results.

Querying with Cypher

The Cypher query language is an innovative SQL-like syntax designed for graphs that takes a more declarative approach. That means you tell Neo4j what you want, not how to acquire it. When running a Cypher query, you’re expressing to the graph database what you need from it. In return, Neo4j’s compiler translates the query into an executable plan describing a set of data operations. The plan is arranged so that the data obtained from the graph is processed by each operation in turn until a result is returned from the Cypher query.
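You can inspect that plan yourself by prefixing a query with EXPLAIN (or PROFILE, which also runs the query and reports the work done at each step); the query shown below is a hypothetical example:

    // Show the executable plan Neo4j builds for this query without running it.
    EXPLAIN
    MATCH (person:Person)-[:LIKES]->(thing)
    RETURN person.name, thing.name;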
The usual way of communicating with Neo4j consists of sending a Cypher query and parameters via a POST request to the Neo4j database server. Frameworks or libraries that wrap the Neo4j REST API methods for a programming language are called “drivers.” These drivers move queries and results over the network; Neo4j then translates the declarative syntax of Cypher into an executable plan.

Cypher Querying Supported by Major Language Drivers

Neo4j currently offers driver support for major programming languages, including Python, Ruby and many others. These language drivers make use of the same APIs and are conveniently made available. Thanks largely to the Neo4j community, there are Neo4j drivers present in almost every major programming language with a

Read more…

Native Graph Database Benefits

Choosing a native graph database provides granular control over all operations, from transactional behavior to on-disk data organization to clustering and driver protocols. With complete control over every aspect of the native graph database, fine-tuned graph traversal optimizations can be performed, and choices sympathetic to graph principles for reliability and ACID transactional support can be made and implemented without restriction.
Durability and certainty of the graph database records are crucial to preserve. Choosing reliability and making sure failed graph database transactions roll back maintains a consistent data state in the native graph database.

Graph Database Reliability

For graph databases, reliability is far more essential than availability, since the connectedness of the data makes them more demanding than aggregate databases. The issue of placing a graph layer over an existing datastore boils down to how data is written and which record is factual, since within a graph every relationship has two perspectives: the node on each side.
If mutations are made through multiple requests simultaneously, it’ll lead to an uncertain relationship status. A non-native graph database will try to resolve this by means of complex algorithms, but in the end they simply don’t work, leaving you with erroneous data. Such incorrectness can spread through the graph as well as other parts of your application that depend on relationships from a certain node’s perspective. If data correctness is your priority first and foremost, then it’s recommended to go

Native Graph Databases versus Non-Native Graph Databases

As with any graph database management system, native graph databases revolve around the ideas of storage and query engines that deal specifically with connected data persistence and traversal queries. The database query engine is responsible for running queries, modifying data, and extracting data. Native graph databases showcase traversal of the graph data model paired with strategic index usage for locating the beginning nodes for such operations. Storage involves how data can be physically

Read more…

What is a Graph Database?

A graph database, also known as a “graph-oriented database,” is a particular form of NoSQL database that uses a graph to query, store, and map out relationships. Such databases specifically serve to store data structures that are graph-oriented.
A graph database is an example of a storage solution in which linked elements are connected to each other without the need for an index. Groups of a specific entity can be accessed by dereferencing a pointer.
There are various kinds of graphs that can be stored. They can range from a single undirected graph to property graphs to hyper-graphs.

Sunday, July 3, 2016

Using Neo4j Cypher MERGE Effectively

One of the areas in Neo4j and Cypher that brings the most questions and discussion when I’m giving Neo4j trainings or working with other engineers building with Neo4j is how to use Cypher MERGE correctly and efficiently. The way I’ve explained Cypher MERGE to all our engineers and all the training attendees is this.
There are a few simple things to understand about how Neo4j handles Cypher MERGE operations to avoid undesired or unexpected behavior when using it.
1. The Cypher MERGE operation is a MATCH or CREATE of the entire pattern. This means that if any element of the pattern does NOT exist, Neo4j will attempt to create the entire pattern. Read more…
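A short Cypher sketch of that behavior, using hypothetical labels and properties: merging the whole pattern at once can duplicate nodes that already exist, so merge each node first and then the relationship:

    // Risky: if the relationship doesn't exist yet, this creates a new
    // Person AND a new Company, even when both nodes already exist.
    MERGE (p:Person {name: 'Alice'})-[:WORKS_AT]->(c:Company {name: 'Acme'});

    // Safer: MATCH-or-CREATE each node on its own, then the relationship.
    MERGE (p:Person {name: 'Alice'})
    MERGE (c:Company {name: 'Acme'})
    MERGE (p)-[:WORKS_AT]->(c);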

Data Modeling with Neo4j: “School Immunization in California” CSV to Graph

1 state, over 9 million children, and 42,981 rows of CSV immunization data. After many rough drafts, I was finally able to land on an efficient and aesthetically pleasing way to map out the immunization data of children in California (found and downloaded online from the California Department of Education*). In this post our goal is to walk through the data modeling process to show how this CSV data can be connected meaningfully with Neo4j. What makes this data so interesting is its varying degrees of location, three distinct grade levels, and a dense record of immunization numbers and percentages, all spanning two separate school years.
After successfully mapping the data, I could then easily explore it, answering questions such as: Where in California has the lowest number of children vaccinated?, Are fewer parents vaccinating their children in 2015 compared to 2014?, and Which age group is more up to date on its vaccinations?. Furthermore, I was able to clearly visualize the data in small and large quantities using the Neo4j graph.

Data Modeling with Neo4j: “Chemicals in Cosmetics” Step-by-Step Process

Take this unique dataset in a CSV format and transform it into a graph using Neo4j. Using the Neo4j model, we can compact the vast number of relationships and properties within the Chemicals in Cosmetics dataset, creating more meaningful and easily applicable data. Read more…

Neo4j for Your Modern Data Architecture

We recently sat down with Neo Technology, the company behind Neo4j, the world’s leading graph database, to talk more about our role as a trusted Neo4j solution partner and to dive deeper into our Neo4j Enterprise offerings.
Talk to me about GraphGrid. What’s your story?
So to understand GraphGrid, let’s dive into a little back story: We co-founded AtomRain nearly seven years ago with the vision to create an elite engineering team capable of providing enterprises worldwide with real business value by solving their most complex business challenges.Read more…

MySQL to Neo4J

You’ve probably heard that an effective way to move data from an existing relational database to a graph is using LOAD CSV. But what exactly does the process of converting all or part of the database tables from MySQL to Neo4j using LOAD CSV involve from start to finish? We’ll be using the MySQL 5 Northwind database as our example. There is a Neo4j tutorial that gives a similar explanation using Postgres and discusses the graph modeling aspects as well, so it’s definitely good to read through that. Here we’ll focus on MySQL and the CSV export in preparation for the Neo4j import.
Read more…

Modeling Time Series Data with Neo4j

I’ve been receiving many questions recently at trainings and meetups regarding how to effectively model time series data, with use cases ranging from hour-level precision to microsecond-level precision. In assessing the various approaches possible, I landed on a tree structure as the model that best fit the problem. The two key questions I found myself asking as I went through the process of building the time tree to connect the time series events were, “How granular do I really need to make this to efficiently work with and expose the time-based data being analyzed?” and “Do I need to generate all time nodes down to the desired precision level?” The balance that needs to be considered is… Read more…
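A rough Cypher sketch of the kind of time tree described here, assuming hour-level precision and hypothetical labels and relationship types:

    // Build (or reuse) the path down to the hour, then attach the event to it.
    MERGE (y:Year {value: 2016})
    MERGE (y)-[:HAS_MONTH]->(m:Month {value: 7})
    MERGE (m)-[:HAS_DAY]->(d:Day {value: 25})
    MERGE (d)-[:HAS_HOUR]->(h:Hour {value: 9})
    CREATE (e:Event {name: 'sensor reading'})
    CREATE (e)-[:OCCURRED_AT]->(h);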