Humanity is increasingly confronted with the finite supply of fossil fuels and the environmental damage they cause. As the use of non-renewable energy sources grows, renewable energy is coming into focus; its production, however, poses a major challenge for operators of the electrical grid, especially in terms of planning. In my dissertation I investigated a data-driven solution to this problem. Among the renewable energy sources, I focused on wind energy. I produced short-term, 6- and 24-hour forecasts for 22 wind farms using statistical and data mining models.
I used two databases in the analysis. The first contains the utilization of the wind farms for 2012 and 2013 and was downloaded from the AEMO website. Because wind energy production is weather-dependent, the second contains weather forecasts for the same period and was downloaded from the ECMWF website.
The data were processed following the CRISP-DM methodology. All models were evaluated with the same sliding window validation method and compared using error metrics. On the one hand, the models' 6-hour forecasts were compared against a basic ARMA model; on the other hand, the 6-hour forecasts were compared with the 24-hour forecasts.
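As an illustration of sliding window validation, the following minimal Python sketch moves a fixed-length training window forward through a time series, forecasts the next few steps, and averages two common error metrics over all windows. It is not the code used in the dissertation: the function names, the window sizes, the choice of MAE and RMSE, and the two naive forecasters (a persistence forecast and a window mean, standing in for the ARMA baseline and the other models) are assumptions made for the example.

```python
import numpy as np

def sliding_window_splits(n_samples, train_size, horizon, step):
    """Yield (train_idx, test_idx) for a fixed-length window that slides forward."""
    start = 0
    while start + train_size + horizon <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + horizon)
        yield train_idx, test_idx
        start += step

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def evaluate(y, fit_predict, train_size, horizon, step):
    """Average MAE and RMSE of a forecaster over all sliding windows."""
    scores = []
    for train_idx, test_idx in sliding_window_splits(len(y), train_size, horizon, step):
        y_pred = fit_predict(y[train_idx], horizon)
        scores.append((mae(y[test_idx], y_pred), rmse(y[test_idx], y_pred)))
    return tuple(np.mean(scores, axis=0))

def persistence(y_train, horizon):
    """Naive forecaster (assumed stand-in): repeat the last observed value."""
    return np.full(horizon, y_train[-1])

def window_mean(y_train, horizon):
    """Naive forecaster (assumed stand-in): repeat the training-window mean."""
    return np.full(horizon, y_train.mean())

if __name__ == "__main__":
    # Synthetic hourly "utilization" series, for demonstration only.
    rng = np.random.default_rng(0)
    t = np.arange(24 * 60)
    y = 0.5 + 0.3 * np.sin(2 * np.pi * t / 24) + 0.05 * rng.standard_normal(t.size)
    for name, model in [("persistence", persistence), ("window mean", window_mean)]:
        for horizon in (6, 24):
            m, r = evaluate(y, model, train_size=24 * 7, horizon=horizon, step=24)
            print(f"{name:12s} horizon={horizon:2d}h  MAE={m:.3f}  RMSE={r:.3f}")
```

Because every forecaster is scored on exactly the same windows, differences in the averaged errors reflect the models and the forecast horizon rather than the choice of evaluation period.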
The results show that for the 6-hour forecasts the gradient boosting regressor, the random forest regressor and the ARMAX model provided the best predictions. For the longer, 24-hour forecasts the ARMAX model performed worse, falling even below the ARMA model, while the gradient boosting and random forest regressors continued to deliver the best results. The accuracy and fuller utilization of the data, for example through data transformations or the introduction of new variables, play an important role in predicting wind energy production. A distance of 10 to 30 km between the wind farms and the weather forecast sites did not clearly affect the results of the models.