The project focused on using the graph data to train machine learning models. To that end, several scenarios for data export, data import, and automated processing pipelines were established. As part of this work, the entire graph schema and its content were checked and validated against the SQL database to derive optimization approaches. Options for enriching the graph schema with additional data for further analysis were also evaluated. A particular goal was the integration of weather data, which is not covered in the SQL database.
An efficient way to extract data from the graph and write results back from a Python or R development environment was established. With this setup, machine learning models can be trained dynamically and applied to the graph data.
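The extract and write-back step can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the labels and properties (`Station`, `Measurement`, `RECORDED`, `predicted_value`) and the helper names `fetch_rows` and `write_rows` are assumptions made for the example. The helpers only rely on a session object that offers `run(query, **params)` and records that offer `data()`, as the official `neo4j` Python driver does.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical schema: labels, relationship type and property names are
# placeholders for illustration, not the project's real graph model.
EXPORT_QUERY = """
MATCH (s:Station)-[:RECORDED]->(m:Measurement)
RETURN s.id AS station_id, m.id AS measurement_id, m.value AS value
"""

WRITE_BACK_QUERY = """
UNWIND $rows AS row
MATCH (m:Measurement {id: row.measurement_id})
SET m.predicted_value = row.predicted_value
"""


def fetch_rows(session: Any, query: str, **params: Any) -> List[Dict[str, Any]]:
    """Run a read query and return every record as a plain dict."""
    return [record.data() for record in session.run(query, **params)]


def write_rows(session: Any, query: str, rows: Iterable[Dict[str, Any]],
               batch_size: int = 1000) -> int:
    """Send rows to the graph in batches via UNWIND; return the row count."""
    rows = list(rows)
    for start in range(0, len(rows), batch_size):
        # Each batch is passed as the $rows parameter of the write query.
        session.run(query, rows=rows[start:start + batch_size])
    return len(rows)
```

With the `neo4j` driver, a session would typically be obtained from `GraphDatabase.driver(uri, auth=(user, password))` followed by `driver.session()`; the URI and credentials are deployment-specific. Batching the write-back through `UNWIND` avoids one round trip per row, which is one plausible reading of "efficient" here.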
The Neo4j graph is fed from an SQL database and is intended to serve flexible queries that forward data to development environments.
Producing generalized documentation of the complex graph structure is challenging. In addition, standard queries and aliases for the entities had to be defined and documented. A graph schema can be built in many different ways, which leaves a large number of design options to weigh.
In this project a recommendation catalog was created, including:
- Syntax and datatype changes
- Schema adjustments that optimize frequent queries and improve performance
- Database optimizations
- Corrections of potential schema deadlocks
- Integration of weather data and further data sources
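As an illustration of the last item, weather observations could be normalized in Python and merged into the graph with a parameterized Cypher statement. The sketch below is an assumption about how this might look: the `WeatherObservation` label, the `HAS_WEATHER` relationship, the property names, the hourly rounding, and the helper `to_weather_rows` are all hypothetical and not taken from the project.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

# Hypothetical target schema: (:Station)-[:HAS_WEATHER]->(:WeatherObservation).
MERGE_WEATHER_QUERY = """
UNWIND $rows AS row
MATCH (s:Station {id: row.station_id})
MERGE (w:WeatherObservation {station_id: row.station_id,
                             observed_at: datetime(row.observed_at)})
SET w.temperature_c = row.temperature_c,
    w.precipitation_mm = row.precipitation_mm
MERGE (s)-[:HAS_WEATHER]->(w)
"""


def to_weather_rows(raw_records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize raw weather records into parameter rows for the MERGE query."""
    rows = []
    for rec in raw_records:
        if rec.get("temperature_c") is None:
            continue  # assumption: observations without a temperature are skipped
        # Assumption: observations are aligned to the full hour.
        observed = datetime.fromisoformat(rec["timestamp"]).replace(
            minute=0, second=0, microsecond=0)
        rows.append({
            "station_id": rec["station_id"],
            "observed_at": observed.isoformat(),
            "temperature_c": float(rec["temperature_c"]),
            "precipitation_mm": rec.get("precipitation_mm"),
        })
    return rows
```

Using `MERGE` rather than `CREATE` keeps repeated imports of the same observation from producing duplicate nodes, which matters when the weather feed is loaded by an automated pipeline.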
A Python / R pipeline was built as the interface to development environments. Integrated into analysis tools, it allows the graph to be queried on demand, so that the results can be used to train machine learning models and apply them directly to additional data.
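The train-and-apply part of such a pipeline might look like the following sketch. It assumes the query results arrive as lists of dicts and uses a hand-written least-squares line as a stand-in for whatever model the project actually trained; the field names `id`, `feature`, `target` and `predicted_value` and the functions `fit_line` and `score_rows` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score_rows(model: Tuple[float, float], rows: List[Dict[str, Any]],
               feature: str = "feature") -> List[Dict[str, Any]]:
    """Apply the fitted line to new rows; output is shaped for a write-back."""
    slope, intercept = model
    return [{"id": row["id"],
             "predicted_value": slope * row[feature] + intercept}
            for row in rows]


# Rows as they might come back from a graph query (made-up example values).
train_rows = [{"feature": 0.0, "target": 1.0},
              {"feature": 1.0, "target": 3.0},
              {"feature": 2.0, "target": 5.0}]
model = fit_line([r["feature"] for r in train_rows],
                 [r["target"] for r in train_rows])
predictions = score_rows(model, [{"id": 42, "feature": 3.0}])
```

The prediction rows carry the entity identifier alongside the predicted value, so they can be handed directly to a parameterized write query against the graph.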