RL env
polars_ts.adapters.rl_env
Gymnasium-compatible RL environment for forecast-based decision making.
ForecastEnv
A gymnasium-like environment wrapping a polars-ts forecast pipeline.
At each step, the agent observes a window of recent time series values together with the current forecast, then takes an action (e.g. an inventory order or a trading signal). The reward is computed by a configurable reward function.
Parameters
data
NumPy array of shape (n_steps,) with the actual time series values.
forecasts
NumPy array of shape (n_steps,) with the forecast values.
window_size
Number of recent values included in the observation window.
reward_fn
Callable (action, actual, forecast) -> float mapping a decision and the realized outcome to a reward. Defaults to the negative absolute error -|actual - action| (the forecast argument is accepted but unused by the default).
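The default reward described above can be written as a plain function. A minimal sketch (the name neg_abs_error is illustrative, not part of the library):

```python
def neg_abs_error(action: float, actual: float, forecast: float) -> float:
    # Default reward: negative absolute error between the action and the
    # realized value; the forecast argument is accepted but unused.
    return -abs(actual - action)

# A perfect decision earns reward 0; errors are penalized linearly.
print(neg_abs_error(95.0, 100.0, 98.0))  # → -5.0
```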
reset()
Reset the environment and return the initial observation.
step(action)
Advance the environment by one timestep.
Parameters
action
The agent's decision for this timestep.
Returns
tuple
(observation, reward, done, info)
_get_obs()
Build the observation: the last window_size actual values concatenated with the current forecast.
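Putting the interface together, here is a minimal self-contained sketch of an environment with the documented reset/step/_get_obs behavior, using only NumPy. The class name ForecastEnvSketch and all implementation details are assumptions for illustration; the real ForecastEnv may differ:

```python
import numpy as np

class ForecastEnvSketch:
    """Illustrative sketch of the documented interface, not the real class."""

    def __init__(self, data, forecasts, window_size=5, reward_fn=None):
        self.data = np.asarray(data, dtype=float)
        self.forecasts = np.asarray(forecasts, dtype=float)
        self.window_size = window_size
        # Default reward: negative absolute error; forecast is unused.
        self.reward_fn = reward_fn or (lambda a, actual, fc: -abs(actual - a))
        self.t = window_size

    def _get_obs(self):
        # Recent actual values plus the forecast for the current step.
        recent = self.data[self.t - self.window_size : self.t]
        return np.append(recent, self.forecasts[self.t])

    def reset(self):
        # Start after an initial window so the first observation is full.
        self.t = self.window_size
        return self._get_obs()

    def step(self, action):
        actual = self.data[self.t]
        reward = self.reward_fn(action, actual, self.forecasts[self.t])
        self.t += 1
        done = self.t >= len(self.data)
        obs = self._get_obs() if not done else None
        return obs, reward, done, {"actual": actual}

# Rollout with a baseline policy that follows the forecast blindly.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
forecasts = data + 0.5  # deliberately biased forecasts
env = ForecastEnvSketch(data, forecasts, window_size=2)
obs = env.reset()
total = 0.0
done = False
while not done:
    action = obs[-1]  # last observation element is the current forecast
    obs, reward, done, info = env.step(action)
    total += reward
print(total)  # → -2.0 (four steps, each off by the 0.5 forecast bias)
```

With a constant forecast bias of 0.5, the forecast-following policy incurs reward -0.5 on each of the four decision steps, which is what the rollout prints.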