Ratomir Vukadin
Machine Learning

Solar production forecasting with CatBoost and Azure

A solar energy forecasting platform combining weather data, inverter APIs, exchange prices, Azure services, and plant-specific CatBoost models.

3 min
CatBoost, Azure, forecasting, energy

The platform produces day-ahead and intra-day solar production forecasts. It integrates inverter APIs, weather forecasts, and energy exchange data to support traders and producers.

Product context

Solar production forecasting is valuable only when the prediction is connected to operational decisions. Traders need day-ahead and intra-day expectations, but those expectations change as weather forecasts shift, inverter data arrives, and market prices move. The platform was designed to combine those signals into a workflow that could explain expected production and support commercial decisions.

The system collects plant configuration, location data, trader and organization data, weather models, inverter measurements, and exchange prices. Each data source has different freshness, reliability, and structure, so the architecture separates static domain data from generated forecasts and high-volume weather payloads.

Architecture

The backend combines Azure Functions and Azure Container Apps. MongoDB stores the high-volume weather model payloads and generated plans, Redis caches generated data, and PostgreSQL stores static domain data such as plants, traders, locations, organizations, and users.
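As an illustration of how a cache can sit in front of the plan store, here is a minimal cache-aside sketch. It is not the platform's actual code: `TtlCache` is an in-memory stand-in for Redis, `load_from_store` stands in for a MongoDB query, and the key format and five-minute TTL are assumptions chosen for the example.

```python
import json
import time
from typing import Callable, Optional


class TtlCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_plan(
    plant_id: str,
    delivery_day: str,
    cache: TtlCache,
    load_from_store: Callable[[str, str], dict],
    ttl_seconds: float = 300.0,  # assumed TTL, not a value from the platform
) -> dict:
    """Return a generated plan, reading the document store only on a cache miss."""
    key = f"plan:{plant_id}:{delivery_day}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    plan = load_from_store(plant_id, delivery_day)
    cache.set(key, json.dumps(plan), ttl_seconds)
    return plan
```

Passing the loader and the clock in as arguments keeps the sketch testable without a running Redis or MongoDB instance.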

Azure Functions handle scheduled and event-driven collection work, including calls to weather providers, inverter APIs, and exchange data sources. Azure Container Apps host longer-running or service-style workloads where predictable runtime and deployment control are more useful than a purely serverless execution model.

The frontend is a React and Next.js trader portal with Plotly.js charts for comparing forecasts, generated plans, and historical production. The interface focuses on operational review rather than generic dashboard decoration: users need to scan plant-level output, understand changes, and make decisions from the latest available data.

Prediction engine

The first engine, a formula-based calculation, was replaced with plant-specific CatBoost models because each plant behaves differently over time. Python notebooks were used to compare model versions by absolute error and financial impact.

The model work started with a more standard calculation approach, then moved toward plant-specific learning because physical behavior varied by installation. Orientation, inverter characteristics, local weather patterns, and historical production all affect the relationship between forecast inputs and real output. A single generic formula was not enough to capture those differences.

CatBoost was a practical fit because it handles tabular features well and supports fast iteration during model comparison. The notebook workflow made it possible to test model versions against historical data before promoting a version into the production pipeline.
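A minimal sketch of the plant-specific idea: group the training rows by plant and fit one regressor per group. The column names (`plant_id`, `production_kwh`), the helper names, and the hyperparameters are assumptions for illustration, not the platform's actual schema or tuned settings. The model factory is a parameter so the grouping logic can be exercised with any object that offers `fit` and `predict`.

```python
from collections import defaultdict
from typing import Any, Callable, Mapping, Sequence


def make_catboost_model() -> Any:
    """Build one regressor. Imported lazily so the grouping logic runs without CatBoost."""
    from catboost import CatBoostRegressor

    # Placeholder hyperparameters, not tuned values from the platform.
    return CatBoostRegressor(iterations=300, loss_function="MAE", verbose=False)


def train_per_plant(
    rows: Sequence[Mapping[str, Any]],
    feature_names: Sequence[str],
    target: str = "production_kwh",  # hypothetical target column
    make_model: Callable[[], Any] = make_catboost_model,
) -> dict[str, Any]:
    """Fit a separate model for every plant_id found in the rows."""
    grouped: dict[str, tuple[list, list]] = defaultdict(lambda: ([], []))
    for row in rows:
        features, targets = grouped[row["plant_id"]]
        features.append([row[name] for name in feature_names])
        targets.append(row[target])

    models = {}
    for plant_id, (features, targets) in grouped.items():
        model = make_model()
        model.fit(features, targets)
        models[plant_id] = model
    return models


def predict_for_plant(
    models: Mapping[str, Any], plant_id: str, feature_rows: Sequence[Sequence[float]]
) -> list:
    """Route a prediction request to the model trained for that plant."""
    return list(models[plant_id].predict(feature_rows))
```

Keeping one model per plant means a retrain for a single installation does not touch the others, at the cost of needing enough history per plant to fit each model.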

Data and validation

Forecast quality was evaluated with both technical and business measures. Absolute error shows how far the prediction is from observed production. Financial impact shows whether the error matters in market terms. Looking at both prevents a model from appearing strong statistically while still causing poor trading decisions.
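The two measures can be sketched side by side. The cost function below is a deliberately simplified assumption: it prices each interval's absolute deviation at that interval's exchange price, whereas real imbalance settlement rules differ by market. The function names and sample numbers are illustrative only.

```python
from typing import Sequence


def mean_absolute_error(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation between forecast and observed production."""
    if not forecast or len(forecast) != len(actual):
        raise ValueError("forecast and actual must be non-empty and equally long")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)


def deviation_cost(
    forecast_mwh: Sequence[float],
    actual_mwh: Sequence[float],
    price_eur_per_mwh: Sequence[float],
) -> float:
    """Simplified financial impact: absolute deviation valued at the interval price."""
    if not (len(forecast_mwh) == len(actual_mwh) == len(price_eur_per_mwh)):
        raise ValueError("all series must cover the same intervals")
    return sum(
        abs(f - a) * p
        for f, a, p in zip(forecast_mwh, actual_mwh, price_eur_per_mwh)
    )


# Two hypothetical models with the same error size in different price hours.
actual = [10.0, 10.0]
prices = [20.0, 200.0]
model_a = [12.0, 10.0]  # misses by 2 MWh in the cheap hour
model_b = [10.0, 12.0]  # misses by 2 MWh in the expensive hour

mae_a = mean_absolute_error(model_a, actual)
mae_b = mean_absolute_error(model_b, actual)
cost_a = deviation_cost(model_a, actual, prices)
cost_b = deviation_cost(model_b, actual, prices)
```

Both models have the same mean absolute error (1.0 MWh), but under this pricing assumption the second model's miss costs ten times as much (400 versus 40), which is the kind of difference a purely statistical comparison would hide.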

Generated plans are stored separately from static configuration so the platform can preserve historical forecast versions. That helps with debugging, model comparison, and explaining why a trader saw a specific plan at a specific time.
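One way to model this is an append-only history that can answer "which plan was current at a given moment". The sketch below keeps versions in memory. The field names and the `PlanHistory` class are hypothetical, and a real deployment would back this with the document store rather than a dictionary.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PlanVersion:
    """One immutable generated plan (field names are illustrative)."""
    plant_id: str
    delivery_day: str
    created_at: datetime
    model_version: str
    hourly_kwh: tuple[float, ...]


class PlanHistory:
    """Append-only store: plans are never overwritten, only superseded."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], list[PlanVersion]] = {}

    def append(self, version: PlanVersion) -> None:
        key = (version.plant_id, version.delivery_day)
        versions = self._versions.setdefault(key, [])
        if versions and version.created_at < versions[-1].created_at:
            raise ValueError("versions must be appended in chronological order")
        versions.append(version)

    def as_of(
        self, plant_id: str, delivery_day: str, moment: datetime
    ) -> Optional[PlanVersion]:
        """Return the latest version created at or before the given moment."""
        versions = self._versions.get((plant_id, delivery_day), [])
        index = bisect_right([v.created_at for v in versions], moment)
        return versions[index - 1] if index else None
```

Recording `model_version` on every plan is what makes later model comparison possible: the same delivery day can be replayed against whichever model produced each stored version.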

Outcome

The architecture gives the product a path from data ingestion to plant-specific prediction and trader-facing review. It also keeps the system open for future improvements such as additional inverter providers, new weather models, revised market inputs, and model retraining workflows without replacing the entire platform.