The goal of the project was to predict how corn hybrid variants would perform at new sites under varying environmental conditions, i.e. temperature, rainfall and soil conditions. Based on the provided data, a model was built to predict crop yields and to identify promising new combinations of hybrid variants and locations.
Several datasets were provided. They covered a 15-year period for more than 2,000 hybrids and 2,000 locations and included genetic markers as well as yield, soil and weather parameters.
Challenges & Solutions
Each dataset was analyzed thoroughly. Outliers among the locations were identified by cross-referencing geographic data, climate information and the multiplicity of recorded events, and were then removed.
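The summary does not state the exact outlier rule. As an illustration only, the sketch below swaps in a common, simple technique: Tukey fences (1.5 × interquartile range) applied to yield values separately for each location. The record layout (dicts with hypothetical keys `location` and `yield`) is an assumption.

```python
from collections import defaultdict
from statistics import quantiles


def tukey_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_yield_outliers(records, k=1.5):
    """Drop records whose yield lies outside the Tukey fences of their location.

    Assumption: each record is a dict with hypothetical keys 'location' and 'yield'.
    """
    by_location = defaultdict(list)
    for rec in records:
        by_location[rec["location"]].append(rec["yield"])

    bounds = {}
    for loc, ys in by_location.items():
        # quantiles() needs at least two points, so sparse locations are kept as-is
        if len(ys) >= 2:
            bounds[loc] = tukey_bounds(ys, k)
        else:
            bounds[loc] = (float("-inf"), float("inf"))

    return [
        r for r in records
        if bounds[r["location"]][0] <= r["yield"] <= bounds[r["location"]][1]
    ]
```

Grouping by location means a yield that is normal at one site but extreme at another is judged against its own site's distribution.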
The genetic dataset consisted of many subsets, which made it possible to reduce its dimensionality significantly without a considerable loss of information.
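The reduction method is not named in the summary. One reduction step that loses no information is sketched below under stated assumptions. It drops markers that are identical for every hybrid and markers that exactly duplicate an earlier one. Markers are assumed to be coded as small integers (e.g. 0/1/2) in a dict keyed by a hypothetical hybrid id. A projection method such as PCA may be what was actually used.

```python
def reduce_markers(genotypes):
    """Drop uninformative and duplicate marker columns.

    Assumption: genotypes maps a hybrid id to a list of marker codes
    (e.g. 0/1/2), and all lists have the same length.
    Returns (kept_indices, reduced_genotypes).
    """
    if not genotypes:
        return [], {}
    hybrids = list(genotypes)
    n_markers = len(genotypes[hybrids[0]])

    seen_columns = set()
    kept = []
    for j in range(n_markers):
        column = tuple(genotypes[h][j] for h in hybrids)
        if len(set(column)) == 1:
            continue  # monomorphic marker: the same for every hybrid, carries no information
        if column in seen_columns:
            continue  # exact duplicate of a marker that is already kept
        seen_columns.add(column)
        kept.append(j)

    reduced = {h: [genotypes[h][j] for j in kept] for h in hybrids}
    return kept, reduced
```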
By combining several spatio-temporal models, the weather could be predicted with an accuracy of 95%.
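The specific spatio-temporal models are not described. To illustrate only the spatial side, the sketch below uses inverse distance weighting (IDW) as a swapped-in baseline. It estimates a weather variable at a new site from surrounding stations. The temporal component is not shown, and station coordinates are assumed to be planar (projected) values.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate of a weather variable at `target`.

    Assumption: stations is a list of (x, y, value) tuples in projected
    (planar) coordinates, and target is an (x, y) tuple.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with a station
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den
```

With `power=2`, a station twice as far away contributes a quarter of the weight.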
The reduced genetic dataset was merged with the weather, soil and yield data into a single dataset, which served as the training set for the model. Using several algorithms, hybrid performance was predicted with an accuracy of 75%.
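A minimal sketch of the merge step follows, assuming three keys. Genetic features are keyed by hybrid, soil features by location, and weather features by (location, year). Every yield record is keyed by (hybrid, location, year). All names are hypothetical, and records missing any feature group are skipped, giving inner-join semantics.

```python
def build_training_rows(yields, genetics, soil, weather):
    """Join yield records with genetic, soil and weather features.

    Assumed (hypothetical) inputs:
      yields:   list of dicts with keys 'hybrid', 'location', 'year', 'yield'
      genetics: dict hybrid -> list of reduced marker features
      soil:     dict location -> list of soil features
      weather:  dict (location, year) -> list of weather features
    Records missing any feature group are skipped (inner join).
    Returns (X, y) as lists.
    """
    X, y = [], []
    for rec in yields:
        h, loc, yr = rec["hybrid"], rec["location"], rec["year"]
        if h not in genetics or loc not in soil or (loc, yr) not in weather:
            continue
        # Concatenate the feature groups into one row per yield observation
        X.append(genetics[h] + soil[loc] + weather[(loc, yr)])
        y.append(rec["yield"])
    return X, y
```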
The model was then applied to around 20,000 combinations of new locations and new hybrids, which made it possible to identify the best-performing hybrids for 2017.
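The final screening step amounts to scoring every candidate pair and keeping the best ones. The sketch below is based on assumptions. `predict` is a hypothetical callable standing in for the trained model, whose interface the summary does not describe, and `top_k` is an illustrative parameter.

```python
import heapq
from itertools import product


def rank_combinations(hybrids, locations, predict, top_k=10):
    """Score every (hybrid, location) pair and return the top_k by predicted yield.

    Assumption: predict is any callable (hybrid, location) -> predicted yield,
    used here as a stand-in for the trained model.
    Returns a list of (score, hybrid, location), best first.
    """
    scored = ((predict(h, loc), h, loc) for h, loc in product(hybrids, locations))
    # key= compares scores only, so hybrid/location ids never need to be comparable
    return heapq.nlargest(top_k, scored, key=lambda t: t[0])
```

`heapq.nlargest` keeps only `top_k` items in memory, so the generator of candidate pairs is never materialized as a full list.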