
Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure

Abstract

We consider an inventory replenishment problem with unknown and non-stationary demand. We design a structured reinforcement learning algorithm that efficiently adapts the replenishment policy to changing demand without any prior knowledge. Our proposed method integrates the known structural properties of a well-performing inventory replenishment policy with reinforcement learning. By exploiting the policy structure, we tune reinforcement learning to characterize the inventory replenishment policy and approximate the value function. In particular, we propose two methods for stochastic approximation of the gradient of the objective function. These novel reinforcement learning algorithms ensure an efficient convergence rate and lower algorithmic complexity for solving practical problems. The numerical results demonstrate that the proposed algorithms adaptively update the policy as demand changes and achieve lower inventory costs than various benchmarks. We also confirm the practical feasibility of the proposed method in a numerical study of a South Korean retail shop. Understanding the policy structure is beneficial for designing reinforcement learning algorithms that can address the inventory replenishment problem. These well-designed reinforcement learning algorithms are particularly promising when policy updates must be based on observations, without precise knowledge of non-stationary demand. These research findings could be extended to various inventory decisions for which policy structures are available. © 2023
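The general idea of exploiting a policy structure with stochastic approximation can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a base-stock (order-up-to) structure, a single-period holding/shortage cost, and a synthetic demand stream whose mean shifts halfway through. The names `sa_base_stock` and `simulate`, and the parameters `h`, `b`, and `step`, are hypothetical and chosen only for illustration.

```python
import random


def sa_base_stock(demands, h=1.0, b=4.0, s0=0.0, step=0.5):
    """Track a base-stock level s with a constant-step stochastic
    subgradient update on the per-period cost
    h * max(s - d, 0) + b * max(d - s, 0).

    Assumed setting (not taken from the abstract): h is the unit
    holding cost, b is the unit shortage cost.
    """
    s = s0
    history = []
    for d in demands:
        # Subgradient of the per-period cost with respect to s:
        # +h when stock exceeded demand, -b otherwise.
        grad = h if d < s else -b
        # Constant step size so the level keeps adapting when demand shifts.
        s = max(0.0, s - step * grad)
        history.append(s)
    return history


def simulate(seed=0, periods=4000, window=200):
    """Run the update on synthetic demand whose mean jumps from 20 to 50."""
    rng = random.Random(seed)
    half = periods // 2
    # Hypothetical non-stationary demand, used only for this sketch.
    demands = [rng.gauss(20.0 if t < half else 50.0, 5.0)
               for t in range(periods)]
    hist = sa_base_stock(demands)
    # Average level near the end of each demand regime.
    before = sum(hist[half - window:half]) / window
    after = sum(hist[-window:]) / window
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print(f"level before shift: {before:.1f}, after shift: {after:.1f}")
```

Because the policy is reduced to a single parameter, each update needs only the latest demand observation, and the constant step size lets the level follow the shift without any prior knowledge of the demand distribution. Under these assumptions the level should settle near the b/(b+h) quantile of the current demand distribution in each regime.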
