TransWikia.com

What is a policy in machine learning?

Data Science Asked by Ramya Raj on November 28, 2020

While I was reading the paper "Grounded Action Transformation for Robot Learning in Simulation", I came across the term "policy". Could someone explain to me what that actually is (in general and in the particular context of the paper)?

3 Answers

A policy is a state-action mapping. A 'state' is a formalism used in AI to represent the state of the world, i.e. the agent's idea of what the world is like. The action is, naturally, what the agent should do in that state. A policy just maps states to actions.

One of the basic problems in AI is how to maximize reward over time at some task. One strategy is for the agent to try to understand the system and predict the results of its actions and the reward that would follow. Another strategy is for the agent to try lots of things and record the results. Either of these (eventually) allows the agent to calculate a good policy, and once it's calculated, the difficult computational work is done and the agent just has to 'look up' what action to take in each state.
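To make the 'look up' idea concrete, here is a minimal sketch of the second strategy (try things, record the results, then derive a policy). The state names, action names, and rewards are made up for illustration, and the function `greedy_policy` is a hypothetical helper. It also simplifies things by ranking actions on average immediate reward only, whereas a real agent would usually care about reward accumulated over time.

```python
from collections import defaultdict

# Hypothetical logged experience: (state, action, reward) triples.
experience = [
    ("low_battery", "recharge", 1.0),
    ("low_battery", "explore", -1.0),
    ("full_battery", "explore", 2.0),
    ("full_battery", "recharge", 0.0),
    ("full_battery", "explore", 1.0),
]


def greedy_policy(experience):
    """Build a state -> action table from recorded results.

    For each state, pick the action with the highest average recorded reward.
    """
    # (state, action) -> [sum of rewards, number of trials]
    totals = defaultdict(lambda: [0.0, 0])
    for state, action, reward in experience:
        entry = totals[(state, action)]
        entry[0] += reward
        entry[1] += 1

    # state -> (best action so far, its mean reward)
    best = {}
    for (state, action), (total, count) in totals.items():
        mean = total / count
        if state not in best or mean > best[state][1]:
            best[state] = (action, mean)

    return {state: action for state, (action, _) in best.items()}


# Once the table exists, acting is just a dictionary lookup.
policy = greedy_policy(experience)
action = policy["low_battery"]
```

A plain table like this only works when the states can be enumerated. With continuous states (joint angles, images) the policy is typically a function rather than a table.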

Answered by tom on November 28, 2020

It's not so much a machine learning term as it is a control theory term. A "control policy" is a heuristic that suggests a particular set of actions in response to the current state of the agent (in your case, a robot) and the environment. In reinforcement learning, the policy is often represented by a neural network, in which case it is parameterized by the network weights. Changing the weights changes the policy, so a distribution over weights amounts to a distribution over policies, which is why fitting models in this context is often referred to as "policy search". It's not uncommon to use an ensemble for these kinds of problems, in which case each component of the ensemble is a different policy that recommends some action. The ensembling mechanism then either selects an action from one of these distinct policies (e.g. by vote or highest score) or combines their recommendations into a single action (e.g. by taking an average). The whole ensemble can also be described as representing a policy.
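As a rough illustration of both points (weights define the policy, and an ensemble of policies is itself a policy), here is a small sketch. It uses a linear function in place of a neural network to keep it short, and the names `make_linear_policy`, `average_ensemble`, and `vote_ensemble` are hypothetical, not taken from any library.

```python
from collections import Counter


def make_linear_policy(weights, bias):
    """Return a policy mapping a state (list of floats) to a continuous action.

    The weights and bias are the parameters: change them and you get
    a different policy.
    """
    def policy(state):
        return sum(w * s for w, s in zip(weights, state)) + bias
    return policy


def average_ensemble(policies):
    """Combine continuous-action policies by averaging their recommendations."""
    def policy(state):
        actions = [p(state) for p in policies]
        return sum(actions) / len(actions)
    return policy


def vote_ensemble(policies):
    """Combine discrete-action policies by majority vote."""
    def policy(state):
        votes = Counter(p(state) for p in policies)
        return votes.most_common(1)[0][0]
    return policy


# Two parameter settings give two different policies for the same state.
p1 = make_linear_policy([1.0, 0.0], 0.0)
p2 = make_linear_policy([0.0, 1.0], 0.0)
combined = average_ensemble([p1, p2])
torque = combined([2.0, 4.0])  # average of 2.0 and 4.0
```

Note that `combined` has the same signature as `p1` and `p2` (state in, action out), which is the sense in which the whole ensemble is also a policy.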

Answered by David Marx on November 28, 2020

A policy is a mapping from "states" (images, joint angles, robot position) to "actions" (joint positions, joint torques, options). In that paper, the parameterized policy used is a mapping from states (robot state, joint angles and joint velocities from a state observer) to actions (target joint positions) of the robot.
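To show just the input/output shape of such a policy, here is a hedged sketch. It is not the policy from the paper: the `RobotState` class, the single linear layer, and the parameter values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotState:
    joint_angles: List[float]      # radians, one entry per joint
    joint_velocities: List[float]  # radians per second, one entry per joint


def policy(state, weights, bias):
    """Map a robot state to target joint positions using one linear layer.

    weights has one row per joint; each row has one entry per feature
    (all joint angles followed by all joint velocities).
    """
    features = state.joint_angles + state.joint_velocities
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical two-joint robot and hand-picked parameters.
state = RobotState(joint_angles=[0.5, -0.5], joint_velocities=[1.0, 0.0])
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]
bias = [0.0, 0.25]
targets = policy(state, weights, bias)  # one target position per joint
```

Learning the policy then means searching for values of `weights` and `bias` that make the robot perform the task well.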

Answered by Haresh K Miriyala on November 28, 2020
