Endow NPCs and other autonomous agents with the ability to acquire new behavior through reinforcement learning – an algorithmic approach to decision making that mimics the way humans and other animals learn
Companion Video: https://youtu.be/Wy7HDj2igPo
Tabular Q-Learning can be used to provide an NPC with intentional behavior, including avoiding enemy players, collecting health points, and much of the other behavior a human player could exhibit within the game environment. With this system, instead of having to create complex hand-crafted behavior trees, you simply reward the actions you want the agent to take, and it learns strategies on its own to receive those rewards.
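At its core, the technique is a single table update: after each action, the Q-value for the (state, action) pair is nudged toward the received reward plus the discounted value of the best next action. The C++ sketch below is only an illustration of that rule, not the Blueprint implementation shipped with this asset; the state and action counts, learning rate, and discount factor are assumed values.

```cpp
#include <algorithm>
#include <array>

// Illustrative constants -- the asset's Blueprint values may differ.
constexpr int kNumStates  = 2;   // e.g. light off / light on
constexpr int kNumActions = 2;   // e.g. wait / press switch
constexpr double kAlpha   = 0.1; // learning rate (assumed)
constexpr double kGamma   = 0.9; // discount factor (assumed)

// The Q-table: expected future reward for each (state, action) pair.
std::array<std::array<double, kNumActions>, kNumStates> Q{};

// One Q-learning update after taking `action` in `state`,
// receiving `reward`, and ending up in `nextState`.
void UpdateQ(int state, int action, double reward, int nextState)
{
    const double bestNext =
        *std::max_element(Q[nextState].begin(), Q[nextState].end());
    Q[state][action] += kAlpha * (reward + kGamma * bestNext - Q[state][action]);
}
```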
Q-learning is also the foundation for more advanced systems of intelligent behavior, such as those found in the AI Emotions Toolkit and the MindMaker DRL Engine. It is best suited to low-dimensional game environments with a limited number of actions and objects for the AI to learn from. For more complex games, try the Neurostudio Learning Engine.
In Q-Learning, behavior is divided into an exploration phase and an exploitation phase. During exploration, the agent acquires knowledge about the effects of its actions; during exploitation, it uses that knowledge to make strategic decisions. In this example project, the Q-learning algorithm is used to solve a 'match to sample' puzzle in which the NPC learns that it must activate a switch within the level while a light is on in order to receive a "food reward".
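Exploration and exploitation are commonly blended with an epsilon-greedy policy: with probability epsilon the agent picks a random action, otherwise it takes the best action its Q-table currently suggests. The sketch below illustrates that selection step in generic C++ (the asset itself does this in Blueprints); the two-element Q-row, matching an assumed wait/press-switch action set, is an illustrative assumption based on the puzzle described above.

```cpp
#include <algorithm>
#include <array>
#include <iterator>
#include <random>

// Epsilon-greedy action selection over one row of the Q-table.
// With probability `epsilon` the agent explores (random action);
// otherwise it exploits the best-known action for this state.
int ChooseAction(const std::array<double, 2>& qRow, double epsilon)
{
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    if (coin(rng) < epsilon)
    {
        std::uniform_int_distribution<int> any(0, static_cast<int>(qRow.size()) - 1);
        return any(rng);  // explore: try a random action
    }
    return static_cast<int>(
        std::distance(qRow.begin(),
                      std::max_element(qRow.begin(), qRow.end())));  // exploit
}
```

Early in training epsilon is typically kept high so the agent samples many (state, action) outcomes, then lowered so it settles into the reward-maximizing strategy it has discovered.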
Features:
· 1 Custom Structure – 2-dimensional array
· 1 AI Behavior Tree
· 1 AI Blackboard
· 1 AI Character Controller
· 1 AI Character Blueprint
Number of Blueprints: 1
Input: None
Network Replicated: No
Supported Development Platforms: Unreal Engine 4.15 and Up
Supported Target Build Platforms: All
Documentation: https://unrealai.wordpress.com/2017/12/19/q-learning/
Important/Additional Notes:
To disable visualization of the training phase, set all delay nodes after the AI movement code to 0, and change both delay nodes for the light turning off from 10 to 0. This somewhat defeats the purpose of learning whether the light is on or off, but it is necessary if you want to train the agent very quickly.