Connect++ 0.6.0
A fast, readable connection prover for first-order logic.
Implementation of the ERWA algorithm for multiarmed bandits.
#include <ERWA.hpp>
Public Member Functions | |
ERWA (size_t, bool=true, bool=true) | |
size_t | get_choice () const |
void | set_epsilon (double _e) |
Reset epsilon to something different. | |
void | set_alpha (double _a) |
Reset alpha to something different. | |
size_t | choose () |
Choose using the current state. | |
void | reward (double) |
Provide reward for the most recent choice. | |
Private Member Functions | |
size_t | find_max () const |
Find the index of the currently maximum r_hat. | |
Private Attributes | |
boost::random::bernoulli_distribution | p |
boost::random::uniform_int_distribution | p2 |
vector< double > | r_hat |
double | epsilon |
double | alpha |
size_t | K |
size_t | n |
bool | epsilon_greedy |
bool | alpha_is_1_over_n |
bool | choose_next |
Belt and braces: warn if choose/reward happens in the wrong order. | |
size_t | choice |
Store the last choice made. | |
Static Private Attributes | |
static boost::random::mt19937 | random_generator |
Random source. | |
Friends | |
ostream & | operator<< (ostream &, const ERWA &) |
Implementation of the ERWA algorithm for multiarmed bandits.
It's easy enough to find a description of this algorithm. The implementation here is pretty much straight from my book, with additions based on Sutton and Barto. In the latter, they (1) have alpha(n) = 1/n and (2) allow epsilon-greediness. So those possibilities have been added.
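As a reminder of what is being estimated, the update can be written out as below. This is a sketch that assumes ERWA denotes the exponential recency-weighted average discussed by Sutton and Barto; the symbols (k for the chosen arm, r for the reward received, K for the number of arms) are illustrative, and the exact form used in ERWA.cpp is not shown on this page.

```latex
% Update of the estimate for the chosen arm k after receiving reward r
\hat{r}_k \leftarrow \hat{r}_k + \alpha \, (r - \hat{r}_k),
\qquad \text{with } \alpha \text{ constant, or } \alpha(n) = \frac{1}{n}

% Epsilon-greedy choice over K arms
\text{choice} =
\begin{cases}
  \text{uniform on } \{0, \dots, K-1\} & \text{with probability } \epsilon \\
  \arg\max_{k} \hat{r}_k & \text{otherwise}
\end{cases}
```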
ERWA::ERWA (size_t _K, bool _eg = true, bool _a = true)
Definition at line 36 of file ERWA.cpp.
size_t ERWA::choose ()
Choose using the current state.
Definition at line 63 of file ERWA.cpp.
size_t ERWA::find_max () const [private]
Find the index of the currently maximum r_hat.
Definition at line 50 of file ERWA.cpp.
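Finding the index of the maximum estimate is a plain argmax over r_hat. The free function below is a minimal sketch, not the code at ERWA.cpp line 50: the name find_max_sketch is hypothetical, and returning the first maximum on ties is an assumption, since the tie-breaking rule of the real member function is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for ERWA::find_max(): index of the largest estimate.
// Assumption: ties resolve to the lowest index (std::max_element behaviour).
std::size_t find_max_sketch(const std::vector<double>& r_hat) {
    assert(!r_hat.empty());  // an empty estimate vector has no maximum
    auto best = std::max_element(r_hat.begin(), r_hat.end());
    return static_cast<std::size_t>(std::distance(r_hat.begin(), best));
}
```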
void ERWA::reward (double reward)
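The intended calling pattern appears to be alternating calls to choose() and reward(). The class below is a self-contained sketch of that cycle under several assumptions: ErwaSketch, its constructor parameters and estimate() are hypothetical and not part of Connect++; it uses the standard library's random facilities rather than Boost; estimates start at zero; and a single counter n drives the 1/n step size, mirroring the single n attribute listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical stand-in for ERWA; not the Connect++ implementation.
class ErwaSketch {
public:
    ErwaSketch(std::size_t K, double eps, double a, bool one_over_n, unsigned seed = 0)
        : r_hat(K, 0.0), epsilon(eps), alpha(a), alpha_is_1_over_n(one_over_n), gen(seed) {
        assert(K > 0);
    }

    // Epsilon-greedy: explore uniformly with probability epsilon, else exploit.
    std::size_t choose() {
        if (!choose_next) std::cerr << "warning: choose() called twice without reward()\n";
        std::bernoulli_distribution explore(epsilon);
        if (explore(gen)) {
            std::uniform_int_distribution<std::size_t> any_arm(0, r_hat.size() - 1);
            choice = any_arm(gen);
        } else {
            auto best = std::max_element(r_hat.begin(), r_hat.end());
            choice = static_cast<std::size_t>(std::distance(r_hat.begin(), best));
        }
        choose_next = false;
        return choice;
    }

    // Move the estimate for the last choice towards the observed reward.
    void reward(double r) {
        if (choose_next) std::cerr << "warning: reward() called before choose()\n";
        ++n;
        const double step = alpha_is_1_over_n ? 1.0 / static_cast<double>(n) : alpha;
        r_hat[choice] += step * (r - r_hat[choice]);
        choose_next = true;
    }

    double estimate(std::size_t i) const { return r_hat.at(i); }

private:
    std::vector<double> r_hat;   // current estimate per arm
    double epsilon;              // exploration probability
    double alpha;                // constant step size (unused when 1/n is selected)
    bool alpha_is_1_over_n;      // use step size 1/n instead of alpha
    std::mt19937 gen;            // random source (std, not Boost, in this sketch)
    std::size_t n = 0;           // number of rewards seen so far
    bool choose_next = true;     // belt and braces: tracks call order
    std::size_t choice = 0;      // last arm chosen
};
```

A caller would loop over `auto k = bandit.choose();` followed by `bandit.reward(r);`, where r is whatever payoff arm k produced. With eps set to 0 the sketch is purely greedy, which makes its behaviour deterministic and easy to check by hand.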