We present a method for maritime platform defense using constrained deep reinforcement learning (DRL), showing how the competing goals of reliably defending a fleet and conserving inventory can be managed through a dual optimization strategy. Against persistent and variable raids of threats, our agents minimize inventory expenditure subject to a constraint on the average time before a threat impacts the defended fleet. Critically, the inventory consideration is introduced only after the agent has learned to defend the fleet well enough to consistently satisfy the constraint. In evaluations in a realistic simulation environment with variable multi-ship geometries, we find that our strategy can be tuned to either (1) enable the agent to make significant gains in efficiency while losing very little reliability or (2) closely track specified reliability constraints while reducing inventory expenditure even further. The result is an agent with considerably stronger long-term viability, since the conserved inventory remains available for future engagements. We speculate on the potential of this method to provide a tunable, trustworthy artificial assistant to human decision-makers tasked with defense scheduling.
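
The dual strategy summarized above can be illustrated with a minimal sketch of a Lagrange-multiplier update with delayed activation of the inventory cost. This is not the authors' implementation: the class `DualController`, its parameters (`threshold`, `lr`, `warmup_hits`), the initial multiplier value, and the reward shaping are all hypothetical choices made for illustration, assuming the constraint is a lower bound on the average time before impact.

```python
from dataclasses import dataclass


@dataclass
class DualController:
    """Hypothetical dual-ascent controller (illustrative, not the paper's code)."""
    threshold: float       # assumed: required minimum average time before impact
    lr: float = 0.05       # assumed dual step size
    warmup_hits: int = 3   # assumed: consecutive satisfied evaluations before activation
    lam: float = 1.0       # Lagrange multiplier weighting the defense term
    streak: int = 0        # current run of evaluations that satisfied the constraint
    active: bool = False   # whether the inventory cost is part of the reward yet

    def update(self, avg_time_to_impact: float) -> None:
        """Advance the controller after one evaluation of the current policy."""
        if self.active:
            # Dual ascent: raise lam when the constraint is violated,
            # lower it (never below zero) when there is slack.
            violation = self.threshold - avg_time_to_impact
            self.lam = max(0.0, self.lam + self.lr * violation)
            return
        # Before activation, only track whether the constraint holds consistently.
        self.streak = self.streak + 1 if avg_time_to_impact >= self.threshold else 0
        if self.streak >= self.warmup_hits:
            self.active = True

    def shaped_reward(self, time_term: float, inventory_used: float) -> float:
        """Reward seen by the agent: defense only, then defense minus inventory."""
        if not self.active:
            return self.lam * time_term
        return self.lam * time_term - inventory_used
```

Under these assumptions, the agent first trains on the defense term alone; once the constraint has held for `warmup_hits` consecutive evaluations, the inventory penalty is switched on and `lam` adapts to keep the constraint near its bound. A smaller `threshold` would trade reliability for efficiency, which corresponds loosely to the tuning described in the abstract.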