- ISBN: 9780470596692 | 0470596694
- Cover: Hardcover
- Copyright: 4/17/2012
This text presents optimal learning techniques with applications in energy, homeland security, health, sports, transportation science, biomedical research, biosurveillance, stochastic optimization, high technology, and complex resource allocation problems. The coverage utilizes a relatively new class of algorithmic strategies known as approximate dynamic programming, which merges dynamic programming (Markov decision processes), math programming (linear, nonlinear, and integer), simulation, and statistics. It features mathematical techniques that are applicable to a variety of situations, from identifying promising drug candidates to figuring out the best evacuation plan in the event of a natural disaster.
Warren B. Powell, PhD, is Professor of Operations Research and Financial Engineering at Princeton University, where he is founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. The recipient of the 2004 INFORMS Fellow Award, Dr. Powell is the author of Approximate Dynamic Programming: Solving the Curses of Dimensionality, Second Edition (Wiley). Ilya O. Ryzhov, PhD, is Assistant Professor in the Department of Decision, Operations, and Information Technologies at the Robert H. Smith School of Business at the University of Maryland. He has made fundamental contributions bridging ranking and selection with multiarmed bandits, and optimal learning with mathematical programming.
Preface | p. xv |
Acknowledgments | p. xix |
The Challenges of Learning | p. 1 |
Learning the Best Path | p. 2 |
Areas of Application | p. 4 |
Major Problem Classes | p. 12 |
The Different Types of Learning | p. 13 |
Learning from Different Communities | p. 16 |
Information Collection Using Decision Trees | p. 18 |
A Basic Decision Tree | p. 18 |
Decision Tree for Offline Learning | p. 20 |
Decision Tree for Online Learning | p. 21 |
Discussion | p. 25 |
Website and Downloadable Software | p. 26 |
Goals of this Book | p. 26 |
Problems | p. 27 |
Adaptive Learning | p. 31 |
The Frequentist View | p. 32 |
The Bayesian View | p. 33 |
The Updating Equations for Independent Beliefs | p. 34 |
The Expected Value of Information | p. 36 |
Updating for Correlated Normal Priors | p. 38 |
Bayesian Updating with an Uninformative Prior | p. 41 |
Updating for Non-Gaussian Priors | p. 42 |
The Gamma-Exponential Model | p. 43 |
The Gamma-Poisson Model | p. 44 |
The Pareto-Uniform Model | p. 45 |
Models for Learning Probabilities* | p. 46 |
Learning an Unknown Variance* | p. 49 |
Monte Carlo Simulation | p. 51 |
Why Does It Work?* | p. 54 |
Derivation of σ̃ | p. 54 |
Derivation of Bayesian Updating Equations for Independent Beliefs | p. 55 |
Bibliographic Notes | p. 57 |
Problems | p. 57 |
The Economics of Information | p. 61 |
An Elementary Information Problem | p. 61 |
The Marginal Value of Information | p. 65 |
An Information Acquisition Problem | p. 68 |
Bibliographic Notes | p. 70 |
Problems | p. 70 |
Ranking and Selection | p. 71 |
The Model | p. 72 |
Measurement Policies | p. 75 |
Deterministic Versus Sequential Policies | p. 75 |
Optimal Sequential Policies | p. 76 |
Heuristic Policies | p. 77 |
Evaluating Policies | p. 81 |
More Advanced Topics* | p. 83 |
An Alternative Representation of the Probability Space | p. 83 |
Equivalence of Using True Means and Sample Estimates | p. 84 |
Bibliographic Notes | p. 85 |
Problems | p. 85 |
The Knowledge Gradient | p. 89 |
The Knowledge Gradient for Independent Beliefs | p. 90 |
Computation | p. 91 |
Some Properties of the Knowledge Gradient | p. 93 |
The Four Distributions of Learning | p. 94 |
The Value of Information and the S-Curve Effect | p. 95 |
Knowledge Gradient for Correlated Beliefs | p. 98 |
Anticipatory Versus Experiential Learning | p. 103 |
The Knowledge Gradient for Some Non-Gaussian Distributions | p. 105 |
The Gamma-Exponential Model | p. 105 |
The Gamma-Poisson Model | p. 108 |
The Pareto-Uniform Model | p. 109 |
The Beta-Bernoulli Model | p. 111 |
Discussion | p. 113 |
Relatives of the Knowledge Gradient | p. 114 |
Expected Improvement | p. 114 |
Linear Loss* | p. 115 |
The Problem of Priors | p. 118 |
Discussion | p. 120 |
Why Does It Work?* | p. 120 |
Derivation of the Knowledge Gradient Formula | p. 120 |
Bibliographic Notes | p. 125 |
Problems | p. 125 |
Bandit Problems | p. 139 |
The Theory and Practice of Gittins Indices | p. 141 |
Gittins Indices in the Beta-Bernoulli Model | p. 142 |
Gittins Indices in the Normal-Normal Model | p. 145 |
Approximating Gittins Indices | p. 147 |
Variations of Bandit Problems | p. 148 |
Upper Confidence Bounding | p. 149 |
The Knowledge Gradient for Bandit Problems | p. 151 |
The Basic Idea | p. 151 |
Some Experimental Comparisons | p. 153 |
Non-Normal Models | p. 156 |
Bibliographic Notes | p. 157 |
Problems | p. 157 |
Elements of a Learning Problem | p. 163 |
The States of Our System | p. 164 |
Types of Decisions | p. 166 |
Exogenous Information | p. 167 |
Transition Functions | p. 168 |
Objective Functions | p. 168 |
Designing Versus Controlling | p. 169 |
Measurement Costs | p. 170 |
Objectives | p. 170 |
Evaluating Policies | p. 175 |
Discussion | p. 177 |
Bibliographic Notes | p. 178 |
Problems | p. 178 |
Linear Belief Models | p. 181 |
Applications | p. 182 |
Maximizing Ad Clicks | p. 182 |
Dynamic Pricing | p. 184 |
Housing Loans | p. 184 |
Optimizing Dose Response | p. 185 |
A Brief Review of Linear Regression | p. 186 |
The Normal Equations | p. 186 |
Recursive Least Squares | p. 187 |
A Bayesian Interpretation | p. 188 |
Generating a Prior | p. 189 |
The Knowledge Gradient for a Linear Model | p. 191 |
Application to Drug Discovery | p. 192 |
Application to Dynamic Pricing | p. 196 |
Bibliographic Notes | p. 200 |
Problems | p. 200 |
Subset Selection Problems | p. 203 |
Applications | p. 205 |
Choosing a Subset Using Ranking and Selection | p. 207 |
Setting Prior Means and Variances | p. 207 |
Two Strategies for Setting Prior Covariances | p. 208 |
Larger Sets | p. 209 |
Using Simulation to Reduce the Problem Size | p. 210 |
Computational Issues | p. 212 |
Experiments | p. 213 |
Very Large Sets | p. 214 |
Bibliographic Notes | p. 216 |
Problems | p. 216 |
Optimizing a Scalar Function | p. 219 |
Deterministic Measurements | p. 219 |
Stochastic Measurements | p. 223 |
The Model | p. 223 |
Finding the Posterior Distribution | p. 224 |
Choosing the Measurement | p. 226 |
Discussion | p. 229 |
Bibliographic Notes | p. 229 |
Problems | p. 229 |
Optimal Bidding | p. 231 |
Modeling Customer Demand | p. 233 |
Some Valuation Models | p. 233 |
The Logit Model | p. 234 |
Bayesian Modeling for Dynamic Pricing | p. 237 |
A Conjugate Prior for Choosing Between Two Demand Curves | p. 237 |
Moment Matching for Nonconjugate Problems | p. 239 |
An Approximation for the Logit Model | p. 242 |
Bidding Strategies | p. 244 |
An Idea From Multi-Armed Bandits | p. 245 |
Bayes-Greedy Bidding | p. 245 |
Numerical Illustrations | p. 247 |
Why Does It Work?* | p. 251 |
Moment Matching for Pareto Prior | p. 251 |
Approximating the Logistic Expectation | p. 252 |
Bibliographic Notes | p. 253 |
Problems | p. 254 |
Stopping Problems | p. 255 |
Sequential Probability Ratio Test | p. 255 |
The Secretary Problem | p. 261 |
Setup | p. 261 |
Solution | p. 262 |
Bibliographic Notes | p. 266 |
Problems | p. 266 |
Active Learning in Statistics | p. 269 |
Deterministic Policies | p. 270 |
Sequential Policies for Classification | p. 274 |
Uncertainty Sampling | p. 274 |
Query by Committee | p. 275 |
Expected Error Reduction | p. 277 |
A Variance-Minimizing Policy | p. 277 |
Mixtures of Gaussians | p. 280 |
Estimating Parameters | p. 280 |
Active Learning | p. 282 |
Bibliographic Notes | p. 283 |
Simulation Optimization | p. 285 |
Indifference Zone Selection | p. 288 |
Batch Procedures | p. 288 |
Sequential Procedures | p. 290 |
The 0-1 Procedure: Connection to Linear Loss | p. 292 |
Optimal Computing Budget Allocation | p. 293 |
Indifference-Zone Version | p. 293 |
Linear Loss Version | p. 295 |
When Does It Work? | p. 295 |
Model-Based Simulated Annealing | p. 296 |
Other Areas of Simulation Optimization | p. 298 |
Bibliographic Notes | p. 299 |
Learning in Mathematical Programming | p. 301 |
Applications | p. 303 |
Piloting a Hot Air Balloon | p. 303 |
Optimizing a Portfolio | p. 308 |
Network Problems | p. 309 |
Discussion | p. 313 |
Learning on Graphs | p. 313 |
Alternative Edge Selection Policies | p. 317 |
Learning Costs for Linear Programs* | p. 318 |
Bibliographic Notes | p. 324 |
Optimizing Over Continuous Measurements | p. 325 |
The Belief Model | p. 327 |
Updating Equations | p. 328 |
Parameter Estimation | p. 330 |
Sequential Kriging Optimization | p. 332 |
The Knowledge Gradient for Continuous Parameters* | p. 334 |
Maximizing the Knowledge Gradient | p. 334 |
Approximating the Knowledge Gradient | p. 335 |
The Gradient of the Knowledge Gradient | p. 336 |
Maximizing the Knowledge Gradient | p. 338 |
The KGCP Policy | p. 339 |
Efficient Global Optimization | p. 340 |
Experiments | p. 341 |
Extension to Higher-Dimensional Problems | p. 342 |
Bibliographic Notes | p. 343 |
Learning With a Physical State | p. 345 |
Introduction to Dynamic Programming | p. 347 |
Approximate Dynamic Programming | p. 348 |
The Exploration vs. Exploitation Problem | p. 350 |
Discussion | p. 351 |
Some Heuristic Learning Policies | p. 352 |
The Local Bandit Approximation | p. 353 |
The Knowledge Gradient in Dynamic Programming | p. 355 |
Generalized Learning Using Basis Functions | p. 355 |
The Knowledge Gradient | p. 358 |
Experiments | p. 361 |
An Expected Improvement Policy | p. 363 |
Bibliographic Notes | p. 364 |
Index | p. 381 |
Table of Contents provided by Ingram. All Rights Reserved. |
What is included with this book?
The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.
The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.