size_t UCB::get_choice() {
cerr << "STOP IT!! UCB should be receiving reward..." << endl;
cerr << "STOP IT!! UCB should be choosing..." << endl;
ostream& operator<<(ostream& out, const UCB& ucb) {
Implementation of the UCB (upper confidence bound) algorithm for multi-armed bandits.
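The class description does not say which UCB variant is used. As a sketch, assuming the common UCB1 rule, each arm's score is its average reward plus an exploration bonus that shrinks as the arm is pulled more often. The function name `ucb1_index` and its parameters are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: UCB1 index for one arm (assumed variant).
// mean:  average reward observed for this arm
// pulls: number of times this arm has been chosen (must be > 0)
// total: total number of choices made across all arms
double ucb1_index(double mean, std::size_t pulls, std::size_t total) {
    // The bonus grows slowly with total choices and shrinks with this arm's pulls.
    return mean + std::sqrt(2.0 * std::log(static_cast<double>(total)) /
                            static_cast<double>(pulls));
}
```

With equal averages, the less-explored arm scores higher, which is what drives exploration.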
void reward(double)
Provide reward for the most recent choice.
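One plausible way for `reward(double)` to credit the most recent choice is an incremental running mean per arm, which avoids storing the reward history. The `ArmStats` struct and its members are hypothetical; the real class's internal state is not documented here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-arm statistics (not the documented class layout).
struct ArmStats {
    std::vector<double> mean;        // running average reward per arm
    std::vector<std::size_t> pulls;  // times each arm has been rewarded
    std::size_t last_choice = 0;     // plays the role of the `choice` member

    explicit ArmStats(std::size_t arms) : mean(arms, 0.0), pulls(arms, 0) {}

    // Credit reward r to the most recent choice with an incremental mean:
    // new_mean = old_mean + (r - old_mean) / n
    void reward(double r) {
        std::size_t i = last_choice;
        ++pulls[i];
        mean[i] += (r - mean[i]) / static_cast<double>(pulls[i]);
    }
};
```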
size_t choice
Store the last choice made.
static boost::random::mt19937 random_generator
Random source.
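A random source in a UCB bandit is typically used to break ties between arms with equal scores; the documentation does not say how this class uses it, so the following is an assumption. The sketch uses `std::mt19937` from `<random>`, which is the same Mersenne Twister configuration as `boost::random::mt19937`, so it compiles without Boost. The function name and the fixed seed are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: pick one arm uniformly at random from a non-empty
// list of tied arm indices.
std::size_t pick_random_tied(const std::vector<std::size_t>& tied) {
    // Function-local static stands in for the documented static member;
    // the fixed seed is an assumption made for reproducible runs.
    static std::mt19937 random_generator(12345);
    std::uniform_int_distribution<std::size_t> dist(0, tied.size() - 1);
    return tied[dist(random_generator)];
}
```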
bool choose_next
Belt-and-braces check: warns if choose() and reward() are called in the wrong order.
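The `choose_next` flag can be read as a two-state machine: a choice is due, or a reward is pending. A minimal sketch of that check, assuming it only warns and then carries on (as the `cerr` messages in the listing suggest), might look like this; `OrderGuard` and its methods are hypothetical.

```cpp
#include <iostream>

// Hypothetical guard mirroring `choose_next`: true while a choice is due,
// false while a reward for the last choice is awaited.
struct OrderGuard {
    bool choose_next = true;

    // Warns and returns false if choose() is called again before reward().
    bool on_choose() {
        bool ok = choose_next;
        if (!ok) std::cerr << "warning: choose() called twice without reward()\n";
        choose_next = false;
        return ok;
    }

    // Warns and returns false if reward() arrives with no pending choice.
    bool on_reward() {
        bool ok = !choose_next;
        if (!ok) std::cerr << "warning: reward() called without a pending choice\n";
        choose_next = true;
        return ok;
    }
};
```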
size_t choose()
Choose an arm based on the current state.
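A sketch of what `choose()` might do under the UCB1 rule (an assumed variant): play every arm once, then return the arm with the highest upper confidence bound. The function `choose_ucb1` and its inputs are hypothetical stand-ins for the class's internal state, and ties here simply go to the lowest index rather than being broken at random.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: mean[i] is arm i's average reward, pulls[i] its play count.
std::size_t choose_ucb1(const std::vector<double>& mean,
                        const std::vector<std::size_t>& pulls) {
    // Any arm never played is chosen first, so every bonus below is finite.
    for (std::size_t i = 0; i < pulls.size(); ++i)
        if (pulls[i] == 0) return i;

    std::size_t total = 0;
    for (std::size_t n : pulls) total += n;

    std::size_t best = 0;
    double best_index = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < mean.size(); ++i) {
        double index = mean[i] + std::sqrt(2.0 * std::log(static_cast<double>(total)) /
                                           static_cast<double>(pulls[i]));
        if (index > best_index) {  // strict '>' keeps the lowest index on ties
            best_index = index;
            best = i;
        }
    }
    return best;
}
```

For example, an arm pulled once can outrank a slightly better arm pulled 100 times, because its confidence bound is much wider.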