The project focused on leveraging the graph data to train machine learning models. To this end, several scenarios for data export, import, and automated processing pipelines were established. In the process, the entire graph schema and its content were checked and validated against the SQL database in order to derive optimization approaches. Options for enhancing the graph schema with additional data for further analysis were also evaluated. In particular, integrating weather data, which was not covered in the SQL database, was a goal of this project.
An efficient way to extract data from the graph and write results back from a Python or R development environment was established. With this setup, machine learning models can be trained dynamically and applied to the graph data.
The Neo4j graph is fed by an SQL database and is intended for flexible queries that forward data to development environments.
For this task, high-resolution mobile mapping LiDAR scans of two German highways were provided by Cloud-Vermessung + Planung GmbH, which surveyed the road network for the Bavarian State Construction Administration. The survey vehicle was a conventional MB V-Class. The measuring unit was a Trimble MX9 with built-in Riegl scanners. This setup produced very dense point clouds with very high accuracy (+/- 1 cm). The scanner also captured structures such as bridges around the highway and delivered point clouds with intensity values and other additional data that can be used for the deep learning task.
Producing generalized documentation of the complex graph structure is challenging. In addition, standard queries and aliases for the entities needed to be defined and documented. A graph schema can be built in many different ways and offers numerous complex design options.
Verification: By exporting data from both the SQL database and the graph database to the same table-based data format, a full comparison of the two databases could be performed. Differences were traced back to their root causes, and recommendations for changes to the graph schema were derived to ensure that no information is lost in the future.
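The comparison step could look roughly like the following minimal sketch. It assumes both databases have already been exported to CSV with a shared key column; the column names (`id`, `name`, `length_m`), the sample data, and the helper names `load_rows` and `compare_exports` are hypothetical and not taken from the project.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into a dict mapping the key column to the full row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def compare_exports(sql_rows, graph_rows):
    """Return keys missing on either side plus field-level mismatches."""
    missing_in_graph = sorted(set(sql_rows) - set(graph_rows))
    missing_in_sql = sorted(set(graph_rows) - set(sql_rows))
    mismatches = {}
    for key in sorted(set(sql_rows) & set(graph_rows)):
        sql_row, graph_row = sql_rows[key], graph_rows[key]
        columns = sorted(set(sql_row) | set(graph_row))
        # Record (sql_value, graph_value) for every column that differs.
        diff = {
            col: (sql_row.get(col), graph_row.get(col))
            for col in columns
            if sql_row.get(col) != graph_row.get(col)
        }
        if diff:
            mismatches[key] = diff
    return missing_in_graph, missing_in_sql, mismatches


# Hypothetical sample exports, for illustration only.
SQL_EXPORT = "id,name,length_m\n1,A,120\n2,B,80\n3,C,45\n"
GRAPH_EXPORT = "id,name,length_m\n1,A,120\n2,B,85\n4,D,10\n"

if __name__ == "__main__":
    sql_rows = load_rows(SQL_EXPORT, "id")
    graph_rows = load_rows(GRAPH_EXPORT, "id")
    print(compare_exports(sql_rows, graph_rows))
```

Keeping both exports in the same flat format means every difference can be attributed to a specific key and column, which is the starting point for tracing it back to the schema.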
Querying and interfacing: Guidelines for optimal data extraction queries, as well as query templates and standard aliases, were defined to unify data extraction. To interface with external development environments, a Python Bolt driver was implemented in a machine learning pipeline.
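As an illustration of how query templates and standard aliases might be combined with the Bolt driver, here is a minimal sketch. The labels, aliases, and the helpers `build_match_query` and `fetch` are assumptions made for this example; only the calls into the official `neo4j` Python package (`GraphDatabase.driver`, `session.run`, `record.data`) are existing API, and that package must be installed separately.

```python
# Hypothetical standard aliases: one fixed short alias per node label.
STANDARD_ALIASES = {"Segment": "seg", "Measurement": "meas"}


def build_match_query(label, properties):
    """Build a Cypher query from a template using the standard alias.

    Labels cannot be passed as Cypher parameters, so they are checked
    against the known alias table before being placed in the query text.
    """
    if label not in STANDARD_ALIASES:
        raise KeyError(f"no standard alias defined for label {label!r}")
    alias = STANDARD_ALIASES[label]
    for prop in properties:
        if not prop.isidentifier():
            raise ValueError(f"invalid property name {prop!r}")
    returns = ", ".join(f"{alias}.{p} AS {alias}_{p}" for p in properties)
    return f"MATCH ({alias}:{label}) RETURN {returns} LIMIT $limit"


def fetch(uri, user, password, query, **params):
    """Run a query over Bolt and return the records as plain dicts."""
    # Imported lazily so the template helper works without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query, **params)]
    finally:
        driver.close()


if __name__ == "__main__":
    query = build_match_query("Segment", ["id", "length"])
    print(query)
    # Requires a reachable Neo4j instance; URI and credentials are placeholders.
    # rows = fetch("bolt://localhost:7687", "neo4j", "password", query, limit=10)
```

Returning columns under predictable names such as `seg_id` keeps downstream data frames consistent regardless of who wrote the query.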
In this project a recommendation catalog was created, including:
Syntax and datatype changes
Schema alignments for optimization of frequent queries to increase performance
Corrections of potential schema deadlocks
Integration of weather data and further data sources
As an interface to development environments, a Python/R pipeline was built. Through integration into analysis tools, the graph can be queried on demand, so that the results can be used to train machine learning models and apply them directly to additional data.
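The extract, train, and write-back cycle described here can be sketched as follows. This is only an illustration under stated assumptions: the records stand in for rows returned by a graph query, the model is a plain least-squares line rather than whatever models the project actually used, and the `Segment` label, the field names `feature`, `target`, and `predicted`, and the helper names are all hypothetical.

```python
# Hypothetical write-back query: one batched update per prediction run.
WRITE_BACK_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (s:Segment {id: row.id}) "
    "SET s.predicted = row.predicted"
)


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rows_for_write_back(records, slope, intercept):
    """Turn unlabelled records into the $rows parameter of the write-back query."""
    return [
        {"id": r["id"], "predicted": slope * r["feature"] + intercept}
        for r in records
    ]


if __name__ == "__main__":
    # Stand-ins for records extracted from the graph.
    labelled = [
        {"id": 1, "feature": 1.0, "target": 2.0},
        {"id": 2, "feature": 2.0, "target": 4.0},
        {"id": 3, "feature": 3.0, "target": 6.0},
    ]
    unlabelled = [{"id": 4, "feature": 4.0}]

    slope, intercept = fit_line(
        [r["feature"] for r in labelled], [r["target"] for r in labelled]
    )
    rows = rows_for_write_back(unlabelled, slope, intercept)
    print(rows)
    # With a live session, the batch could be sent as:
    # session.run(WRITE_BACK_QUERY, rows=rows)
```

Batching predictions through a single `UNWIND` statement avoids one round trip per node when results are written back to the graph.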