
Monday, August 8, 2016

Data Validation and Testing Your Graph Data State

Data validation gives you insight into the quality of your data assets. It involves grading your organization consistently so that you can monitor your progress. When testing data, it’s essential to set metrics, along with follow-up steps and goals, to drive improvements. Data testing is even more crucial when loading data into a schema-free graph database like Neo4j. So how do we do it efficiently and continuously?

Schema-Free Nature of Neo4j and Data Validation

Neo4j is schema-free by nature, although it does provide some schema concepts that can be enforced. Being schema-free means that, as data flows through your Neo4j data pipeline into the graph, no constraints on data type are enforced. It also means that Neo4j will try to pick the best data type when a property is written unless you specify one explicitly, which matters for variations in numerical precision and for numerical values that you want stored as strings. So if you load data into Neo4j using LOAD CSV and write a property that consists only of digits but should be stored as a string, always wrap it in the Cypher toString() function so that the same property doesn’t end up holding varying data types.
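
One way to catch properties that have drifted into mixed data types is to inspect the property maps a query returns. The following is a minimal sketch in Python, not part of the original workflow: the function name find_mixed_type_properties and the sample actors list are hypothetical, and the property maps are assumed to have already been fetched from the graph by whatever client you use.

```python
from collections import defaultdict


def find_mixed_type_properties(nodes):
    """Return {property_name: sorted type names} for every property whose
    values do not all share one Python type across the given nodes."""
    seen = defaultdict(set)
    for props in nodes:
        for key, value in props.items():
            seen[key].add(type(value).__name__)
    # Keep only the properties that were observed with more than one type.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Hypothetical sample: property maps as a client might return them.
actors = [
    {"name": "Keanu Reeves", "zip": "90210"},
    {"name": "Carrie-Anne Moss", "zip": 90210},  # same value, stored as a number
]

mixed = find_mixed_type_properties(actors)
print(mixed)  # {'zip': ['int', 'str']}
```

An empty result means every property was seen with a single type in the sample, which is the state that wrapping values in toString() at load time is meant to guarantee.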

Data Validation with Postman, REST Requests, and Newman

For large-scale automated data validation, it’s beneficial to use a REST client like Postman to create a test collection of validation requests that run against the graph as new data flows into your Neo4j graph database, ensuring that it remains in a valid data state.
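
To make the idea of a validation request concrete, here is a rough sketch in Python of a single check: whether any Actor node lacks an ACTED_IN relationship to a Movie. It assumes the transactional Cypher HTTP endpoint of a local Neo4j 3.x server (newer versions use a different path), and the names NEO4J_TX_URL, ORPHAN_ACTORS, build_payload and count_violations are hypothetical. No request is sent here; the sample response is hand-written in the shape that endpoint returns.

```python
import json

# Assumption: transactional Cypher endpoint of a local Neo4j 3.x server.
NEO4J_TX_URL = "http://localhost:7474/db/data/transaction/commit"

# Counts Actor nodes that have no ACTED_IN relationship to any Movie.
ORPHAN_ACTORS = (
    "MATCH (a:Actor) "
    "WHERE NOT (a)-[:ACTED_IN]->(:Movie) "
    "RETURN count(a) AS violations"
)


def build_payload(cypher):
    """Wrap one Cypher statement in the JSON body the endpoint expects."""
    return json.dumps({"statements": [{"statement": cypher}]})


def count_violations(response_body):
    """Read the single count from the JSON response; raise on Cypher errors."""
    body = json.loads(response_body)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["results"][0]["data"][0]["row"][0]


# Hand-written sample response, not real server output.
sample_response = json.dumps({
    "results": [{"columns": ["violations"], "data": [{"row": [0]}]}],
    "errors": [],
})

payload = build_payload(ORPHAN_ACTORS)
assert count_violations(sample_response) == 0  # graph is in a valid state
```

In practice the payload would be sent as a POST request to NEO4J_TX_URL with a Content-Type of application/json and the server's credentials, and the final assertion would correspond to the test attached to the request in the collection.
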
The Neo4j graph database features a REST API that can be used to query the graph. This can be used to create a collection of REST requests that query the graph using Cypher with data validation questions like, “Does every Actor have an ACTED_IN relationship to a Movie?” which, when using
