Connect++ 0.6.0
A fast, readable connection prover for first-order logic.
ERWA Class Reference

Implementation of the ERWA algorithm for multiarmed bandits.

#include <ERWA.hpp>


Public Member Functions

 ERWA (size_t, bool=true, bool=true)
 
size_t get_choice () const
 
void set_epsilon (double _e)
 Reset epsilon to something different.
 
void set_alpha (double _a)
 Reset alpha to something different.
 
size_t choose ()
 Choose using the current state.
 
void reward (double)
 Provide reward for the most recent choice.
 

Private Member Functions

size_t find_max () const
 Find the index of the currently maximum r_hat.
 

Private Attributes

boost::random::bernoulli_distribution p
 
boost::random::uniform_int_distribution p2
 
vector< double > r_hat
 
double epsilon
 
double alpha
 
size_t K
 
size_t n
 
bool epsilon_greedy
 
bool alpha_is_1_over_n
 
bool choose_next
 Belt-and-braces: warn if choose/reward happens in the wrong order.
 
size_t choice
 Store the last choice made.
 

Static Private Attributes

static boost::random::mt19937 random_generator
 Random source.
 

Friends

ostream & operator<< (ostream &, const ERWA &)
 

Detailed Description

Implementation of the ERWA algorithm for multiarmed bandits.

It's easy enough to find a description of this algorithm. The implementation here is pretty much straight from my book, with additions based on Sutton and Barto. In the latter, they (1) have alpha(n) = 1/n and (2) allow epsilon-greediness, so both possibilities have been added.
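
As a rough illustration of the two step-size choices mentioned above, the sketch below applies the update r_hat <- r_hat + alpha * (reward - r_hat) to a single estimate. It is not taken from Connect++: the function names are hypothetical, and it assumes that ERWA refers to an exponential recency-weighted average. With alpha = 1/n the estimate is the plain mean of the rewards seen so far; with a constant alpha, recent rewards carry more weight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Connect++: one step of
// r_hat <- r_hat + alpha * (reward - r_hat).
double update_estimate(double r_hat, double reward, double alpha) {
  return r_hat + alpha * (reward - r_hat);
}

// Apply the update with alpha = 1/n, where n counts the rewards
// seen so far (starting at 1). The result is the sample mean.
double estimate_with_1_over_n(const std::vector<double>& rewards) {
  double r_hat = 0.0;
  for (std::size_t n = 1; n <= rewards.size(); ++n) {
    r_hat = update_estimate(r_hat, rewards[n - 1],
                            1.0 / static_cast<double>(n));
  }
  return r_hat;
}

// Apply the update with a fixed step size alpha in (0, 1].
// Older rewards are discounted geometrically.
double estimate_with_constant_alpha(const std::vector<double>& rewards,
                                    double alpha) {
  assert(alpha > 0.0 && alpha <= 1.0);
  double r_hat = 0.0;
  for (double reward : rewards) {
    r_hat = update_estimate(r_hat, reward, alpha);
  }
  return r_hat;
}
```

In this sketch n counts the updates applied to one estimate, which may differ from how a particular implementation maintains its counter.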

Definition at line 53 of file ERWA.hpp.

Constructor & Destructor Documentation

◆ ERWA()

ERWA::ERWA ( size_t _K,
bool _eg = true,
bool _a = true )

Definition at line 36 of file ERWA.cpp.

ERWA::ERWA(size_t _K, bool _eg, bool _a)
: K(_K)
, epsilon(0.0)
, alpha(0.0)
, p()
, p2(0, _K - 1)
, r_hat(_K, 0.0)
, choose_next(true)
, epsilon_greedy(_eg)
, alpha_is_1_over_n(_a)
, choice(0)
, n(0)
{}

Member Function Documentation

◆ choose()

size_t ERWA::choose ( )

Choose using the current state.

Definition at line 63 of file ERWA.cpp.

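size_t ERWA::choose() {
  // choose() and reward() are expected to alternate.
  if (!choose_next) {
    cerr << "STOP IT!! ERWA should be receiving reward..." << endl;
  }
  choose_next = false;
  if (epsilon_greedy && p(random_generator)) {
    // Explore: pick one of the K arms uniformly at random.
    choice = p2(random_generator);
  }
  else {
    // Exploit: pick the arm with the largest current estimate.
    choice = find_max();
  }
  n++;
  return choice;
}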
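
The selection step can be sketched in isolation as follows. This is a minimal stand-alone version that uses the standard library's <random> rather than Boost; the function name epsilon_greedy_choice and its parameters are illustrative assumptions, not part of the ERWA interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical free function, not part of Connect++.
// With probability epsilon pick an arm uniformly at random (explore);
// otherwise pick the arm with the largest estimate (exploit).
std::size_t epsilon_greedy_choice(const std::vector<double>& r_hat,
                                  double epsilon,
                                  std::mt19937& gen) {
  assert(!r_hat.empty());
  assert(epsilon >= 0.0 && epsilon <= 1.0);
  std::bernoulli_distribution explore(epsilon);
  if (explore(gen)) {
    std::uniform_int_distribution<std::size_t> any_arm(0, r_hat.size() - 1);
    return any_arm(gen);
  }
  // std::max_element returns the first maximum, so ties go to the
  // lowest index.
  auto best = std::max_element(r_hat.begin(), r_hat.end());
  return static_cast<std::size_t>(std::distance(r_hat.begin(), best));
}
```

With epsilon set to 0 the function is purely greedy, and with epsilon set to 1 every choice is uniformly random.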

◆ find_max()

size_t ERWA::find_max ( ) const
private

Find the index of the currently maximum r_hat.

Definition at line 50 of file ERWA.cpp.

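size_t ERWA::find_max() const {
  // lowest() is the most negative finite double; min() would be the
  // smallest positive one and would miss estimates that are all <= 0.
  double r = std::numeric_limits<double>::lowest();
  size_t result = 0;
  for (size_t i = 0; i < r_hat.size(); i++) {
    double d = r_hat[i];
    if (d > r) {
      r = d;
      result = i;
    }
  }
  return result;
}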
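
A detail that matters for an arg-max scan of this kind: for floating-point types, std::numeric_limits<double>::min() is the smallest positive normalised value, whereas std::numeric_limits<double>::lowest() is the most negative finite value. The stand-alone sketch below, under the hypothetical name index_of_max, seeds the running maximum with lowest() so that a vector holding only negative estimates is handled as well.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-alone arg-max, not part of Connect++.
// Returns the index of the first largest element.
std::size_t index_of_max(const std::vector<double>& values) {
  assert(!values.empty());
  double best = std::numeric_limits<double>::lowest();
  std::size_t result = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > best) {
      best = values[i];
      result = i;
    }
  }
  return result;
}
```

Because the comparison is strict, ties resolve to the lowest index.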

◆ get_choice()

size_t ERWA::get_choice ( ) const
inline

Definition at line 89 of file ERWA.hpp.

size_t get_choice() const {
  return choice;
}

◆ reward()

void ERWA::reward ( double reward)

Provide reward for the most recent choice.

Definition at line 78 of file ERWA.cpp.

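void ERWA::reward(double reward) {
  // choose() and reward() are expected to alternate.
  if (choose_next) {
    cerr << "STOP IT!! ERWA should be choosing..." << endl;
  }
  choose_next = true;
  double r = r_hat[choice];
  if (alpha_is_1_over_n) {
    // n is incremented by every call to choose(), whichever arm is picked.
    r_hat[choice] = r + ((1.0 / static_cast<double>(n)) * (reward - r));
  }
  else {
    // Constant step size: recent rewards carry more weight.
    r_hat[choice] = r + (alpha * (reward - r));
  }
}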
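
The intended call order, one choose() followed by one reward(), can be sketched with a small stand-in class. TinyBandit and play are hypothetical names used only for illustration. The stand-in is purely greedy with a constant step size, and it uses assertions where the class above prints a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration only; it mirrors the
// choose()/reward() call order but is not the Connect++ class.
class TinyBandit {
public:
  TinyBandit(std::size_t k, double a) : r_hat(k, 0.0), alpha(a) {}

  // Greedy choice: index of the first largest estimate.
  std::size_t choose() {
    assert(choose_next && "reward() must be called before choosing again");
    choose_next = false;
    choice = 0;
    for (std::size_t i = 1; i < r_hat.size(); ++i) {
      if (r_hat[i] > r_hat[choice]) choice = i;
    }
    return choice;
  }

  // Move the estimate for the last choice towards the observed reward.
  void reward(double r) {
    assert(!choose_next && "choose() must be called before rewarding");
    choose_next = true;
    r_hat[choice] += alpha * (r - r_hat[choice]);
  }

  double estimate(std::size_t i) const { return r_hat.at(i); }

private:
  std::vector<double> r_hat;
  double alpha;
  bool choose_next = true;
  std::size_t choice = 0;
};

// Alternate choose() and reward() for a fixed number of rounds.
// payoffs[i] is the deterministic reward paid by arm i.
std::size_t play(TinyBandit& bandit, const std::vector<double>& payoffs,
                 int rounds) {
  std::size_t last = 0;
  for (int t = 0; t < rounds; ++t) {
    last = bandit.choose();
    bandit.reward(payoffs[last]);
  }
  return last;
}
```

With payoffs of -1 and 2 and a step size of 0.5, the stand-in tries arm 0 once, sees its estimate drop below zero, and then stays on arm 1.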

◆ set_alpha()

void ERWA::set_alpha ( double _a)
inline

Reset alpha to something different.

Definition at line 103 of file ERWA.hpp.

void set_alpha(double _a) {
  alpha = _a;
}

◆ set_epsilon()

void ERWA::set_epsilon ( double _e)
inline

Reset epsilon to something different.

Definition at line 95 of file ERWA.hpp.

void set_epsilon(double _e) {
  epsilon = _e;
  // Rebuild the Bernoulli distribution so that p matches the new epsilon.
  boost::random::bernoulli_distribution<> new_p(_e);
  p = new_p;
}
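
The reason for rebuilding the distribution is that a Bernoulli distribution stores its own probability. A default-constructed one, in both Boost.Random and the standard library, uses 0.5, so a separately stored epsilon and the distribution agree only once they have been set together. The sketch below shows the same pattern with std::bernoulli_distribution; the struct ExploreCoin is a hypothetical name, not part of Connect++.

```cpp
#include <cassert>
#include <random>

// Hypothetical holder, not part of Connect++: keeps epsilon and the
// Bernoulli distribution that depends on it in sync.
struct ExploreCoin {
  double epsilon = 0.0;
  std::bernoulli_distribution coin{0.0};  // starts in sync with epsilon

  void set_epsilon(double e) {
    assert(e >= 0.0 && e <= 1.0);
    epsilon = e;
    // Replace the distribution so its probability matches epsilon.
    coin = std::bernoulli_distribution(e);
  }
};
```

Initialising the distribution explicitly, as done here with 0.0, avoids relying on the library default.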

Friends And Related Symbol Documentation

◆ operator<<

ostream & operator<< ( ostream & out,
const ERWA & erwa )
friend

Definition at line 92 of file ERWA.cpp.

ostream& operator<<(ostream& out, const ERWA& erwa) {
  out << "r_hats:" << endl;
  for (size_t i = 0; i < erwa.K; i++)
    out << erwa.r_hat[i] << " ";
  out << endl;
  return out;
}

Member Data Documentation

◆ alpha

double ERWA::alpha
private

Definition at line 66 of file ERWA.hpp.

◆ alpha_is_1_over_n

bool ERWA::alpha_is_1_over_n
private

Definition at line 70 of file ERWA.hpp.

◆ choice

size_t ERWA::choice
private

Store the last choice made.

Definition at line 79 of file ERWA.hpp.

◆ choose_next

bool ERWA::choose_next
private

Belt-and-braces: warn if choose/reward happens in the wrong order.

Definition at line 75 of file ERWA.hpp.

◆ epsilon

double ERWA::epsilon
private

Definition at line 65 of file ERWA.hpp.

◆ epsilon_greedy

bool ERWA::epsilon_greedy
private

Definition at line 69 of file ERWA.hpp.

◆ K

size_t ERWA::K
private

Definition at line 67 of file ERWA.hpp.

◆ n

size_t ERWA::n
private

Definition at line 68 of file ERWA.hpp.

◆ p

boost::random::bernoulli_distribution ERWA::p
private

Definition at line 62 of file ERWA.hpp.

◆ p2

boost::random::uniform_int_distribution ERWA::p2
private

Definition at line 63 of file ERWA.hpp.

◆ r_hat

vector<double> ERWA::r_hat
private

Definition at line 64 of file ERWA.hpp.

◆ random_generator

boost::random::mt19937 ERWA::random_generator
static private

Random source.

Underlying random number generator for epsilon-greediness.

Definition at line 61 of file ERWA.hpp.


The documentation for this class was generated from the following files: