

polars_ts.adapters.rl_env

Gymnasium-compatible RL environment for forecast-based decision making.

ForecastEnv

A gymnasium-like environment wrapping a polars-ts forecast pipeline.

At each step, the agent observes recent time series values and a forecast, then takes an action (e.g. inventory order, trading signal). The reward is computed from a configurable reward function.

Parameters

data
    NumPy array of shape (n_steps,) with the actual time series values.
forecasts
    NumPy array of shape (n_steps,) with forecast values.
window_size
    Number of recent observations provided as the observation.
reward_fn
    Callable (action, actual, forecast) -> float. Defaults to negative absolute error: -|actual - action|.
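The documented default reward can be sketched as a plain function; `neg_abs_error` is a hypothetical name for illustration, not an identifier from polars_ts:

```python
def neg_abs_error(action: float, actual: float, forecast: float) -> float:
    """Default reward sketch: negative absolute error between action and actual.

    `forecast` is accepted to match the (action, actual, forecast) signature
    but is unused by this default.
    """
    return -abs(actual - action)


# An action of 9.0 against an actual of 10.0 yields a reward of -1.0.
print(neg_abs_error(action=9.0, actual=10.0, forecast=9.5))  # -1.0
```

Any callable with the same signature can be passed as `reward_fn` to encode a domain-specific objective (e.g. asymmetric inventory costs).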

reset()

Reset the environment and return the initial observation.

step(action)

Take one step.

Parameters

action The agent's decision for this timestep.

Returns

tuple (observation, reward, done, info)
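The reset/step contract above can be exercised with a minimal self-contained sketch; `SketchForecastEnv` below is a hypothetical stand-in mirroring the documented behavior, not the actual polars_ts implementation:

```python
import numpy as np


class SketchForecastEnv:
    """Minimal sketch of a ForecastEnv-style environment (assumed behavior)."""

    def __init__(self, data, forecasts, window_size=3, reward_fn=None):
        self.data = np.asarray(data, dtype=float)
        self.forecasts = np.asarray(forecasts, dtype=float)
        self.window_size = window_size
        # Default reward: negative absolute error between action and actual.
        self.reward_fn = reward_fn or (lambda a, actual, fc: -abs(actual - a))
        self.t = window_size

    def _get_obs(self):
        # Recent actual values plus the forecast for the current step.
        recent = self.data[self.t - self.window_size : self.t]
        return np.concatenate([recent, [self.forecasts[self.t]]])

    def reset(self):
        self.t = self.window_size
        return self._get_obs()

    def step(self, action):
        actual = self.data[self.t]
        reward = self.reward_fn(action, actual, self.forecasts[self.t])
        self.t += 1
        done = self.t >= len(self.data)
        obs = None if done else self._get_obs()
        return obs, reward, done, {"actual": actual}


# Usage: a naive policy that acts on the forecast (last observation element).
data = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
forecasts = np.array([10.0, 11.5, 11.0, 12.5, 12.0, 13.5])
env = SketchForecastEnv(data, forecasts, window_size=3)

obs = env.reset()
total = 0.0
done = False
while not done:
    action = obs[-1]
    obs, reward, done, info = env.step(action)
    total += reward
print(round(total, 2))  # -1.0
```

Note that the 4-tuple `(observation, reward, done, info)` follows the classic gym convention; newer Gymnasium `Env.step` implementations return a 5-tuple with separate `terminated` and `truncated` flags.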

_get_obs()

Build observation: recent values + current forecast.
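The observation layout described here, the last `window_size` actual values followed by the current forecast, can be sketched with a single concatenation (illustrative values only):

```python
import numpy as np

# Assumed state at some timestep t: actuals so far and the forecast series.
data = np.array([10.0, 12.0, 11.0, 13.0])
forecasts = np.array([10.5, 11.5, 11.0, 12.5])
window_size, t = 3, 3

# Recent window of actuals, then the forecast for step t.
obs = np.concatenate([data[t - window_size : t], [forecasts[t]]])
print(obs)  # [10.  12.  11.  12.5]
```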