Env
polars_ts.marl.env
Portfolio environment for multi-agent RL decision-making.
PortfolioEnv
Gymnasium-like environment for portfolio allocation.
At each step, the agent observes recent returns for all assets and produces portfolio weights. The reward is the portfolio return minus transaction costs.
Parameters
returns
Array of shape (n_steps, n_assets) with per-period returns.
window_size
Number of recent return periods provided as the observation.
transaction_cost
Proportional cost applied to weight changes between steps.
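The reward described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's code: the helper name `step_reward` is hypothetical, and it assumes the cost is charged on the sum of absolute weight changes (turnover). The source only says the cost is proportional to weight changes.

```python
import numpy as np


def step_reward(weights, prev_weights, asset_returns, transaction_cost):
    """Portfolio return minus a proportional cost on turnover (assumed form)."""
    weights = np.asarray(weights, dtype=float)
    prev_weights = np.asarray(prev_weights, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)

    # Gross portfolio return: weighted sum of the per-asset returns.
    gross = float(weights @ asset_returns)
    # Assumption: cost is proportional to the L1 distance between weight vectors.
    turnover = float(np.abs(weights - prev_weights).sum())
    return gross - transaction_cost * turnover


reward = step_reward(
    weights=[0.5, 0.25, 0.25],
    prev_weights=[1 / 3, 1 / 3, 1 / 3],
    asset_returns=[0.01, -0.02, 0.03],
    transaction_cost=0.001,
)
```

With `transaction_cost=0.0` the second term vanishes and the reward is the gross portfolio return.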
from_dataframe(df, window_size=10, transaction_cost=0.0, id_col='unique_id', time_col='ds', target_col='y')
classmethod
Create an environment from a Polars panel DataFrame of prices.
Converts prices to log returns and pivots them to wide format.
reset()
Reset the environment and return the initial observation of shape (window_size, n_assets).
step(action)
Take one step with portfolio weights.
Parameters
action
Raw portfolio weights (will be normalized to sum to 1).
Returns
tuple
(observation, reward, done, info)
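One plausible way to normalize a raw action so it sums to 1 is to divide by its sum. The sketch below is an assumption, not the documented behavior: the helper name `normalize_weights` is hypothetical, the environment may use a different scheme (for example a softmax), and the equal-weight fallback for a near-zero sum is purely illustrative.

```python
import numpy as np


def normalize_weights(raw_action, eps=1e-12):
    """Scale raw weights so they sum to 1 (assumed normalization scheme)."""
    weights = np.asarray(raw_action, dtype=float)
    total = weights.sum()
    if abs(total) < eps:
        # Illustrative fallback: equal weights when the sum is (near) zero.
        return np.full(weights.shape, 1.0 / weights.size)
    return weights / total


weights = normalize_weights([2.0, 1.0, 1.0])
fallback = normalize_weights([0.0, 0.0])
```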
_get_obs()
Build observation: recent returns window of shape (window_size, n_assets).
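The observation window amounts to a row slice of the returns array. The sketch below uses a hypothetical helper `recent_window` and assumes the window covers the `window_size` periods immediately before step `t`. The environment's exact index convention is not stated in the source.

```python
import numpy as np


def recent_window(returns, t, window_size):
    """Return rows t - window_size .. t - 1 of a (n_steps, n_assets) array."""
    if t < window_size:
        raise ValueError("not enough history to fill the window")
    return returns[t - window_size : t]


# Six periods of returns for two assets (placeholder values).
returns = np.arange(12, dtype=float).reshape(6, 2)
obs = recent_window(returns, t=3, window_size=3)
```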