Energy data is difficult to work with. Publicly available energy datasets are far from ideal: when it comes to power plant data, critical information is often missing, incorrect or mislabeled, compromising data quality, usability and, ultimately, value.
When we first set out to build the world's largest renewable energy project database at ENIAN in 2018, a significant proportion (~40%) of the data came from OpenEI. However, we quickly found that many of the project coordinates listed in these open-source databases did not point to actual project sites, such as solar PV arrays or wind farms. A quick check on Google Maps often showed that they instead pointed to the centre of the town or city nearest to where the project was actually located.
In fact, so much of the data suffered from this issue that geocoding each project's name with the Google Maps Geocoding API increased the coordinate accuracy of some datasets 10-fold. But how was this 10-fold increase measured?
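As a rough sketch of the geocoding step: the Geocoding API takes a free-text address (here, a project name) and returns coordinates in a JSON response. The project name, API key and coordinate values below are illustrative placeholders, and the response is a trimmed-down sample rather than a live API call.

```python
import json
from urllib.parse import urlencode

# Build the request URL for the Google Maps Geocoding API.
# "Example Wind Farm" is an illustrative project name; YOUR_API_KEY is a placeholder.
params = urlencode({"address": "Example Wind Farm, Texas", "key": "YOUR_API_KEY"})
url = f"https://maps.googleapis.com/maps/api/geocode/json?{params}"

# A trimmed-down sample of the JSON the API returns for a successful lookup.
sample_response = json.loads("""
{
  "status": "OK",
  "results": [
    {"geometry": {"location": {"lat": 32.4457, "lng": -100.5385}}}
  ]
}
""")

def extract_coordinates(response: dict):
    """Return (lat, lng) from a Geocoding API response, or None on failure."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

print(extract_coordinates(sample_response))  # → (32.4457, -100.5385)
```

Replacing a city-centre coordinate with the geocoded result for the project's own name is what produced the corrected dataset measured below.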
We determined that close to 100,000 solar PV and wind power plants are available for mapping from openly available sources on the web. Having an analyst, or a team of analysts, check the coordinates for each project one by one would take a very long time. Assume, for example, that a very dedicated analyst with a high level of focus takes 20 seconds (probably a bit longer) to find the coordinates in the database, enter them into mapping software, and determine whether the project is actually visible in the satellite image. That is 2,000,000 seconds, or roughly 23 days of very tedious searching, with no breaks. Account for sleeping, eating and leisure… you get the picture. It just doesn't scale.
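The back-of-envelope estimate works out as follows (the figures are the same ones used above; the 8-hour working day is an added assumption for the "with breaks" version):

```python
projects = 100_000          # solar PV and wind projects found in open sources
seconds_per_check = 20      # optimistic time for one manual coordinate check

total_seconds = projects * seconds_per_check
days_nonstop = total_seconds / (60 * 60 * 24)

print(total_seconds)            # 2000000
print(round(days_nonstop, 1))   # 23.1

# Allowing an 8-hour working day instead of round-the-clock checking:
working_days = total_seconds / (60 * 60 * 8)
print(round(working_days))      # 69
```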
To solve this inefficiency, we ran image recognition on the satellite images at each project's coordinates. A breakdown of the workflow was as follows:
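As a rough illustration of the per-project loop, here is a minimal sketch. `fetch_satellite_image` and `looks_like_project` are hypothetical stand-ins: in the real pipeline the image would come from a satellite imagery service and the classification from DeepSolar (solar) or the Keras turbine model.

```python
import random

def fetch_satellite_image(lat, lng):
    """Placeholder: fetch the satellite tile at the given coordinates."""
    return {"lat": lat, "lng": lng}

def looks_like_project(image, kind):
    """Placeholder classifier: True if a plant of `kind` appears visible."""
    return random.random() > 0.5

def validate_projects(projects):
    """Run the classifier at each project's coordinates and record the result."""
    results = []
    for p in projects:
        image = fetch_satellite_image(p["lat"], p["lng"])
        results.append({**p, "detected": looks_like_project(image, p["kind"])})
    return results

# Illustrative input rows, not real database entries.
sample = [
    {"name": "Example Solar Park", "kind": "solar", "lat": 35.0, "lng": -115.0},
    {"name": "Example Wind Farm", "kind": "wind", "lat": 53.4, "lng": -1.9},
]
for r in validate_projects(sample):
    print(r["name"], "->", "visible" if r["detected"] else "needs review")
```

The point of the loop is that a positive detection promotes a coordinate straight to "verified", while negatives are queued for a much smaller pile of manual review.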
To test for solar panels in the satellite images, we used DeepSolar, a model developed at Stanford University. To test for wind turbines, I developed my own basic model in Keras with TensorFlow, since no readily available model could be found at the time, and repeated the process shown in the figure above.
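The turbine detector isn't published, but a basic binary classifier in the spirit described might look like the following. The layer sizes, input resolution (64×64 RGB) and architecture are illustrative assumptions, not the original model:

```python
import numpy as np
import tensorflow as tf

# A minimal binary image classifier in Keras. The architecture is an
# illustrative sketch, not the actual turbine model used in the project.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(turbine visible)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# One untrained prediction on a random tile, to show the output: a single
# probability per input image.
tile = np.random.rand(1, 64, 64, 3).astype("float32")
prob = model.predict(tile, verbose=0)
print(prob.shape)  # (1, 1)
```

Training would then need a labelled set of tiles with and without turbines, which is exactly what the manually verified subset of projects provides.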
The workflow described in the flowchart above was run on the dataset both before and after correction with the Google Maps Geocoding API, which is how the 10-fold increase mentioned earlier was measured. Specifically, testing the satellite image for panels or turbines at each project's coordinates produced positive outputs from the image recognition models 10 times more often after the coordinates were corrected with the geocoding API. Images like the one in the figure above were saved in a per-project folder so the analysts could quickly scroll through and verify them.
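The metric itself is just the ratio of positive-detection rates before and after correction. The detection counts below are illustrative, not the actual ENIAN results:

```python
def positive_rate(detections):
    """Fraction of projects where the classifier found a plant in the image."""
    return sum(detections) / len(detections)

# Illustrative detection flags (True = panels/turbines visible):
# 4% positives on the raw coordinates, 40% after geocoding correction.
before = [True] * 4 + [False] * 96
after = [True] * 40 + [False] * 60

improvement = positive_rate(after) / positive_rate(before)
print(improvement)  # 10.0
```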
This project integrates AI assistance into an analyst's workflow to reduce the time taken to validate renewable energy project data. An adventurous future for the project could see satellite images of the whole world being scanned to locate unrecorded renewable energy projects, and perhaps even to infer their size.