Department of Computer Science and Technology

Technical reports

W-learning:
competition among selfish Q-learners

Mark Humphrys

April 1995, 30 pages

DOI: 10.48456/tr-362

Abstract

W-learning is a self-organising action-selection scheme for systems with multiple parallel goals, such as autonomous mobile robots. It uses ideas drawn from the subsumption architecture for mobile robots (Brooks), implementing them with the Q-learning algorithm from reinforcement learning (Watkins). Brooks explores the idea of multiple sensing-and-acting agents within a single robot, more than one of which is capable of controlling the robot on its own if allowed. I introduce a model where the agents are not only autonomous, but are in fact engaged in direct competition with each other for control of the robot. Interesting robots are ones where no agent achieves total victory, but rather the state-space is fragmented among different agents. Having the agents operate by Q-learning proves to be a way to implement this, leading to a local, incremental algorithm (W-learning) to resolve competition. I present a sketch proof that this algorithm converges when the world is a discrete, finite Markov decision process. For each state, competition is resolved with the most likely winner of the state being the agent that is most likely to suffer the most if it does not win. In this way, W-learning can be viewed as ‘fair’ resolution of competition. In the empirical section, I show how W-learning may be used to define spaces of agent-collections whose action selection is learnt rather than hand-designed. This is the kind of solution-space that may be searched with a genetic algorithm.

Full text

PS (0.1 MB)

BibTeX record

@TechReport{UCAM-CL-TR-362,
  author =	 {Humphrys, Mark},
  title = 	 {{W-learning: competition among selfish Q-learners}},
  year = 	 1995,
  month = 	 apr,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-362.ps.gz},
  institution =  {University of Cambridge, Computer Laboratory},
  doi = 	 {10.48456/tr-362},
  number = 	 {UCAM-CL-TR-362}
}