
Monday, October 31, 2016

Data Modeling with Neo4j: “Chemicals in Cosmetics” Step-by-Step Process

This post takes a unique dataset in CSV format and transforms it into a graph using Neo4j. With the Neo4j model, we can condense the vast number of relationships and properties within the Chemicals in Cosmetics dataset, making the data more meaningful and easier to apply.

Meet the Data

Since 2005, all Californian cosmetic companies have been required to provide information on any cosmetic product that contains chemicals that cause or are suspected to cause cancer, developmental birth defects, or harm to the reproductive system. This list of the cosmetics and chemicals in question is openly provided on the California government website*. Even more intriguing are the numerous properties recorded for the products and chemicals, such as important dates and times, whether the product is still being sold, whether the chemical is still being used, and much more. The interconnectedness of this data illustrates the power of the property graph model and its ability to store information succinctly by giving equal weight to the nodes and the relationships.
Below is a list of the more ambiguous or elusive headers found in the CSV. Going through and understanding the headers is a vital step in developing an accurate graph model.

Headers & Descriptions:
1. CDPHId (#): CA Dept. of Public Health (CDPH) identification number for product. May occur more than once.
2. CSFId (#): CDPH identification number for CSF.
3. CSF: Color, scent, and/or flavor. Not all products have specific colors, scents, or flavors.
4. CompanyId (#): CDPH internal identification number for company.
5. PrimaryCategoryId (#): CDPH identification number for category.
6. PrimaryCategory: Type of product (13 primary categories).
7. SubCategoryId (#): CDPH internal identification number for subcategory.
8. SubCategory: Type of product within one of the 13 primary categories.
9. CASId (#): CDPH identification number for
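
To make the role of these headers more concrete, here is a minimal Python sketch of how flat CSV rows might collapse into unique nodes and relationships. It is only an illustration under stated assumptions: the sample rows are invented, and the node labels and relationship types (MAKES, IN_SUBCATEGORY, PART_OF, HAS_CSF) are hypothetical names chosen for this example, not necessarily the model built later in this post. Only the first eight headers described above are used.

```python
import csv
import io

# Hypothetical sample rows: every value below is invented for illustration.
SAMPLE_CSV = """CDPHId,CSFId,CSF,CompanyId,PrimaryCategoryId,PrimaryCategory,SubCategoryId,SubCategory
1,10,Rose,100,5,Makeup Products,50,Lipstick
1,11,Coral,100,5,Makeup Products,50,Lipstick
2,,,101,5,Makeup Products,51,Blush
"""


def build_graph(csv_text):
    """Collect unique nodes and relationships from flat CSV rows."""
    nodes = set()          # each node is a (label, id) pair
    relationships = set()  # each relationship is (start_node, type, end_node)

    for row in csv.DictReader(io.StringIO(csv_text)):
        product = ("Product", row["CDPHId"])
        company = ("Company", row["CompanyId"])
        category = ("PrimaryCategory", row["PrimaryCategoryId"])
        subcategory = ("SubCategory", row["SubCategoryId"])

        # Sets discard duplicates, so a CDPHId that appears on several
        # rows still produces a single Product node.
        nodes.update([product, company, category, subcategory])
        relationships.add((company, "MAKES", product))
        relationships.add((product, "IN_SUBCATEGORY", subcategory))
        relationships.add((subcategory, "PART_OF", category))

        # Not all products have a color, scent, or flavor.
        if row["CSFId"]:
            csf = ("CSF", row["CSFId"])
            nodes.add(csf)
            relationships.add((product, "HAS_CSF", csf))

    return nodes, relationships


nodes, relationships = build_graph(SAMPLE_CSV)
print(len(nodes), "nodes,", len(relationships), "relationships")
```

With these three sample rows, the repeated CDPHId on the first two rows yields one Product node with two HAS_CSF relationships, which is the kind of de-duplication that makes the graph form more compact than the CSV.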
