Project Description

Digital Farming Training Machine Learning Graph Database

Graph Database Verification and Optimization

Project Goal

The project focused on leveraging graph data to train machine learning models. To this end, different scenarios for data export, import and automated processing pipelines were established. In the process, the entire graph schema and its content were checked and validated against the SQL database to derive optimization approaches. Different options for enriching the graph schema with additional data for further analysis were also evaluated. A particular goal was the integration of weather data, which was not covered in the SQL database.

An efficient way to extract data from the graph and write it back via a Python or R development environment was established. With this setup, machine learning models can be trained dynamically and applied to the graph data.
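The read and write-back round trip described above could look roughly like the following minimal sketch. It assumes the official `neo4j` Python package as the Bolt client; the Cypher statements, the labels `Field` and `Measurement`, the relationship `HAS_MEASUREMENT`, the property `predicted_yield` and all helper names are hypothetical and not taken from the project.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical Cypher: labels, properties and aliases are assumptions.
READ_QUERY = (
    "MATCH (f:Field)-[:HAS_MEASUREMENT]->(m:Measurement) "
    "RETURN f.id AS field_id, m.value AS value"
)
WRITE_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (f:Field {id: row.field_id}) "
    "SET f.predicted_yield = row.prediction"
)


def read_rows(session: Any, query: str, **params: Any) -> List[Dict[str, Any]]:
    """Run a read query and return each record as a plain dict."""
    return [record.data() for record in session.run(query, **params)]


def write_rows(session: Any, query: str, rows: Iterable[Dict[str, Any]]) -> int:
    """Send all rows in one UNWIND statement and return how many were sent."""
    batch = list(rows)
    # consume() makes sure the statement has fully executed before returning.
    session.run(query, rows=batch).consume()
    return len(batch)


def open_session(uri: str, user: str, password: str):
    """Open a Bolt session; requires the `neo4j` package to be installed."""
    from neo4j import GraphDatabase  # imported lazily so the helpers above stay testable

    driver = GraphDatabase.driver(uri, auth=(user, password))
    return driver, driver.session()
```

The helpers take the session as an argument, so the same functions can be exercised with a stub session in tests and with a real Bolt session in the pipeline. Writing back with a single `UNWIND` statement avoids one network round trip per row.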

Graph database; tools used: SQL, Neo4j, Python and R

The Neo4j graph is fed by an SQL database and is intended for flexible queries that forward data to development environments.

Challenges

Documenting the complex graph structure in a generalized way is challenging. In addition, standard queries and aliases for the entities needed to be defined and documented. A graph schema offers many complex design options and can be built in very different ways.
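One way to keep standard queries and aliases consistent is to generate queries from a single alias table. The sketch below is only an illustration: the entity names, the labels `Field` and `WeatherObservation`, the variable names and the template itself are assumptions, not the project's actual conventions.

```python
from string import Template

# Hypothetical standard aliases: entity name -> (node label, Cypher variable).
ALIASES = {
    "field": ("Field", "f"),
    "weather": ("WeatherObservation", "w"),
}

# Hypothetical template. "$$id" renders as the Cypher parameter "$id",
# so values are still passed as query parameters, not pasted into the text.
NODE_BY_ID = Template("MATCH ($var:$label {id: $$id}) RETURN $var AS $entity")


def build_query(entity: str) -> str:
    """Render the lookup query for one entity using its standard alias."""
    label, var = ALIASES[entity]
    return NODE_BY_ID.substitute(var=var, label=label, entity=entity)


print(build_query("field"))
# prints: MATCH (f:Field {id: $id}) RETURN f AS field
```

Because every query is rendered from the same table, renaming a label or alias is a one-line change, and the table itself doubles as documentation of the agreed names.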

Applied Methods (Implementation)

  • Verification: By exporting data from the SQL database and the graph database to the same table-based data format, a full comparison of both databases could be performed. Differences were traced back to their root cause, and recommendations for changes to the graph schema were derived to ensure that no information is lost in the future.
  • Querying and interfacing: Guidelines for optimal data extraction queries, as well as query templates and standard aliases, were defined to unify data extraction. To interface with external development environments, a Python Bolt driver was implemented in a machine learning pipeline.
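The verification step can be illustrated with a small sketch that compares two table-based exports keyed by a shared identifier. It assumes both databases were exported to CSV with an `id` column; the column names, sample rows and function names are made up for illustration.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def load_table(csv_text: str, key: str) -> Dict[str, Row]:
    """Parse a CSV export into a dict keyed by the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def compare_exports(sql: Dict[str, Row], graph: Dict[str, Row]) -> Dict[str, List[str]]:
    """Report keys missing on either side and keys whose rows differ."""
    shared = sql.keys() & graph.keys()
    return {
        "missing_in_graph": sorted(sql.keys() - graph.keys()),
        "missing_in_sql": sorted(graph.keys() - sql.keys()),
        "mismatched": sorted(k for k in shared if sql[k] != graph[k]),
    }


# Made-up sample exports, for illustration only.
sql_csv = "id,crop\n1,wheat\n2,maize\n3,barley\n"
graph_csv = "id,crop\n1,wheat\n2,corn\n4,rye\n"

report = compare_exports(load_table(sql_csv, "id"), load_table(graph_csv, "id"))
print(report)
# prints: {'missing_in_graph': ['3'], 'missing_in_sql': ['4'], 'mismatched': ['2']}
```

Each entry in the report is a starting point for tracing a difference back to its cause, for example a row that was dropped on import or a value that changed datatype.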

Project outcome

In this project a recommendation catalog was created, including:

  • Syntax and datatype changes
  • Schema alignments for optimization of frequent queries to increase performance
  • Database optimizations
  • Corrections of potential schema deadlocks
  • Integration of weather data and further data sources

As an interface to development environments, a Python/R pipeline was built. Integrated into analysis tools, it allows the graph to be queried on demand, so that the results can be used to train machine learning models and apply them directly to additional data.

Category

COMPUTATIONAL LIFESCIENCE

Technologies

Neo4j
SQL
R
Python

Contact

Stefanie Supper
CEO
