**Abstract:** In discrete time, option hedging and pricing amount to sequential risk minimization. In particular, a discrete-time version of the Black-Scholes-Merton (BSM) option pricing model can be formulated as a problem of dynamic Markowitz optimization of an option-replicating (hedge) portfolio made of an underlying stock and cash. This talk shows how this problem can be approached using Reinforcement Learning (RL). Once the problem is posed as an RL problem, option pricing and hedging can be done without any model for the underlying stock dynamics, using instead model-free, data-driven RL methods such as Q-learning and Fitted Q Iteration. As a result, both the option price and the hedge are obtained from a well-defined and converging maximization problem that uses only market prices and option trading data (inter-temporal re-hedges and hedge losses in the replicating portfolio) to find the optimal option hedge and price. The model can learn even when re-hedges in the data are suboptimal, noisy, or purely random. This means, in particular, that our RL model can learn the BSM model itself, if the world behaves according to BSM.

Computationally, the RL-based option pricing model is very simple, as it uses only basic linear algebra and linear regressions to compute the option price and hedge. The only tunable parameters in this approach are parameters defining the optimal hedge and price themselves. This approach does not need any model calibration (as there is no model anymore), and it automatically solves the volatility smile problem of the BSM model. We also discuss some extensions of this approach, including in particular an Inverse Reinforcement Learning setting, where inter-temporal losses from re-hedges are unobservable.
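As a rough illustration of how a hedge and a price can come out of a plain linear regression, the sketch below treats a single re-hedging step one period before expiry. It is a simplified, assumption-laden example and not the model presented in the talk: the function name `min_variance_hedge`, all parameter values, the zero interest rate, and the absence of a risk-aversion term are illustrative choices, and the lognormal simulation is used only to manufacture sample data that would otherwise come from market observations.

```python
import numpy as np


def min_variance_hedge(delta_s, next_value):
    """Least-squares slope of the next-step option value on the stock price change.

    This is the stock position that minimizes the variance of the
    one-step hedging error (a hypothetical helper for illustration).
    """
    ds = delta_s - delta_s.mean()
    return float(ds @ (next_value - next_value.mean()) / (ds @ ds))


# Illustrative sample data (assumption): one-step lognormal price moves.
rng = np.random.default_rng(0)
s0, strike, sigma, dt, n_paths = 100.0, 100.0, 0.2, 1.0 / 12.0, 50_000
z = rng.standard_normal(n_paths)
s1 = s0 * np.exp(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z)

# European put payoff at expiry, one step ahead.
payoff = np.maximum(strike - s1, 0.0)

# Hedge ratio from a linear regression of the payoff on the price change.
hedge = min_variance_hedge(s1 - s0, payoff)

# Residual after hedging; with a zero interest rate (assumption) and no
# risk premium, its mean serves as a simple price estimate.
residual = payoff - hedge * (s1 - s0)
price = float(residual.mean())

print(f"hedge ratio: {hedge:.3f}, price estimate: {price:.3f}")
print(f"payoff variance: {payoff.var():.3f}, hedged variance: {residual.var():.3f}")
```

In a multi-step setting the same regression would be repeated backward in time, with the next-step portfolio value taking the place of the terminal payoff; how the talk's method adds risk aversion and learns from recorded re-hedges is not reproduced here.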

**Bio:** Igor Halperin is Research Professor of Financial Machine Learning at NYU Tandon School of Engineering. His research focuses on using methods of Reinforcement Learning, Information Theory, neuroscience and physics for financial problems such as portfolio optimization, dynamic risk management, and inference of sequential decision-making processes of financial agents.

Igor has extensive industry experience in statistical and financial modeling, in particular in the areas of option pricing, credit portfolio risk modeling, portfolio optimization, and operational risk modeling. Prior to joining NYU Tandon, Igor was an Executive Director of Quantitative Research at JPMorgan, and before that he worked as a quantitative researcher at Bloomberg LP. Igor has published numerous articles in finance and physics journals, and is a frequent speaker at financial conferences. He has also co-authored the book “Credit Risk Frontiers” published by Bloomberg LP.

Igor has a Ph.D. in theoretical high energy physics from Tel Aviv University, and an M.Sc. in nuclear physics from St. Petersburg State Technical University. He advises several fintech and data science start-ups and risk management firms.