This article series intends to explain how we at Greenbyte use Artificial Intelligence to provide deep insights for operators of wind and solar assets. In part one, we reminisced about the journey that got us to where we are now. Part two explains how we make our AI services available 24/7, continuously updated, and constantly expanding with our CI/CD pipeline. Part three will go in depth into some of the more advanced aspects and experiments we are running right now. You can find part one in the related articles below.
Keeping it simple
Pragmatism is one of our key tenets at Greenbyte. One way of looking at it is that we always solve our customers’ problems first, and our own problems second. Being a value-driven company, this comes naturally to us, even though it does mean we sometimes have to set the tech geek in us aside in order to deliver valuable insights.
In a field as complex and ever-changing as AI, this becomes especially important. Continuously reworking our thesis, data, models, and post-processing is the only recipe for success (that we have found), and in order to do that, we chose early insight over early infrastructure.
The first step
I still remember getting a good chuckle when I realized just how simple our first implementation was. Going into AI for the first time, you expect bleeding-edge tech that’s deployed by speaking phrases to Alexa (or whoever your favorite smart assistant is), kept load-balanced and monitored by evolutionary algorithms that self-improve several generations per second.
The reality? A tiny executable running on a virtual server, tirelessly crunching byte after byte of real-time data (in a batched manner, mind you, none of this streaming/clustered business the kids seem to be into these days). It was deployed manually by copying an executable anywhere onto the machine’s drive and running it from a terminal window.
The kicker? It worked. In fact, it worked wonders. We could iterate quickly, re-run and deploy, run locally, and everything else without any fuss. As a prototype stage, it doesn’t get simpler than this, but I can’t reiterate the point enough: it worked.
The second step
If you read part 1 (and if you haven’t, it’s OK, I’m not crying, I promise) you know the amount of work that was involved in that first phase before the insights were solid. However, once we matured from that stage, the problems of scale came at us. We have a lot of different customers, a metric ton of data, and a system that we don’t want to put unnecessary load on. It was no longer feasible to manually copy our executable to the server, or to start crunching large amounts of data there.
So, did we go all the way? Hadoop clusters with stream processing (I’m partial to Apache Spark, but you pick your poison) on networked GPU nodes (the Tesla does good work), versioned and orchestrated? Almost.
And by almost, I mean not even close. In this phase, we realized just how heavy training was, and how not-AI-ready our back-end was. So we separated AI training data from our main data stores. As a customer was introduced into our AI pilot program, we cloned relevant data, prepared and pre-processed it in a separate database suitable for the purpose, and then ran our training on that source. The executables were still built locally but distributed automatically, and we kept track of what versions were running for what customers.
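To make the "what versions were running for what customers" part concrete, here is a minimal Python sketch of such bookkeeping. It is an illustration only, not our actual tooling: the class name `DeploymentRegistry`, its methods, and the customer and version strings are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeploymentRegistry:
    """Hypothetical in-memory record of which build each customer runs."""

    # Current version per customer.
    versions: Dict[str, str] = field(default_factory=dict)
    # Append-only log of every deployment, in order.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def deploy(self, customer: str, version: str) -> None:
        """Record that `version` is now running for `customer`."""
        self.versions[customer] = version
        self.history.append((customer, version))

    def customers_on(self, version: str) -> List[str]:
        """Return the customers currently running `version`, sorted by name."""
        return sorted(c for c, v in self.versions.items() if v == version)


registry = DeploymentRegistry()
registry.deploy("customer-a", "1.0.0")
registry.deploy("customer-b", "1.0.0")
registry.deploy("customer-a", "1.1.0")  # customer-a is upgraded
```

After these calls, `registry.customers_on("1.0.0")` contains only `customer-b`, while the history still shows that `customer-a` once ran that version. Anything more durable (a database table, a config file) follows the same idea.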
Once it was time to run the models, we had a single (single!) application instance processing the data for all customers. It worked.
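For the curious, a single instance working through every customer in batches can be as plain as the Python sketch below. This is a simplified illustration under assumptions, not our production code: `batched`, `run_all_customers`, the batch size, and the stand-in "model" (a simple mean) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(readings: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive chunks of at most `size` readings."""
    batch: List[float] = []
    for reading in readings:
        batch.append(reading)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, chunk
        yield batch


def run_all_customers(
    data_by_customer: Dict[str, List[float]],
    model: Callable[[List[float]], float],
    batch_size: int = 3,
) -> Dict[str, List[float]]:
    """Process every customer sequentially in one process, one batch at a time."""
    results: Dict[str, List[float]] = {}
    for customer, readings in data_by_customer.items():
        results[customer] = [model(b) for b in batched(readings, batch_size)]
    return results


def mean(batch: List[float]) -> float:
    """Stand-in for a trained model: just averages the batch."""
    return sum(batch) / len(batch)


results = run_all_customers({"a": [1, 2, 3, 4], "b": [10]}, mean)
```

No queues, no workers, no cluster: one loop over customers, one loop over batches. It scales only as far as one machine does, which is exactly the trade-off described above.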
The third step — where we are now
As we rolled out our AI feature commercially, we needed to add additional complexity. Automatic versioning, deployment, and orchestration became a must, so we did that (in fact, you can read about it in a different blog post).
Performance became an issue, so we started moving our training and model runs to machines tailored for this type of heavy-duty computation. Obviously, this sent our bills skyrocketing, but it was worth it.
Are we at the point where we Spark all over our Hadoop clusters? No, not yet. Are we heading there? Yes. But we let the need for insight dictate our need for complexity.
As we add more AI services (look forward to a reveal soon!) the need for technical complexity grows, because new problems arise that cannot be solved otherwise. We embrace this, but never prematurely. Right now we’re putting a lot of the strategy and architecture in place to move forward as we add our third AI service, and eventually our millionth. This is exciting, and the tech geek in us can finally revel!
Part 3 will be released in August 2018.