Machine Learning in Finance Workshop 2018

April 20, 2018
8:00 AM - 5:00 PM
Lerner Hall (2920 Broadway, New York, NY 10027)

The workshop is organized by:

cfe logo
Data Science logo

Workshop Program

The following is the schedule.

  • 8.15 - 9.00 Registration
  • 9.00 - 9.15 Introduction
  • 9.15 - 9.55 Dave Baggett (Inky)
    Title: Use of Machine Learning to Thwart Phishing Attacks  
  • 9.55 - 10.35 Igor Halperin (NYU, Tandon School of Engineering)
    Title: Model-free Option Pricing and Hedging by Reinforcement Learning 
  • 10.35 - 11.15 Alexander Statnikov (American Express)
    Title: Machine Learning Powers Better Decisioning
  • 11.15 - 11.45 Break
  • 11.45 - 12.25 Diego Klabjan (Northwestern, Industrial Engineering & Management Sciences)
    Title: Classification-Based Security Prediction Using Deep Neural Networks 
  • 12.25 - 13.05 Peter Decrem (Citi)
    Title: Application of Natural Language Processing and Related Machine Learning Techniques at Large Commercial Banks
  • 13.05 - 14.25 Lunch (A boxed lunch will be provided)
  • 14.25 - 15.05 Matthew Dixon (Stuart School of Business, Illinois Institute of Technology)
    Title: High Frequency Market Making with Machine Learning 
  • 15.05 - 15.45 Afsheen Afshar (Cerberus Capital, Chief Artificial Intelligence Officer)
    Title: Real-world challenges of using AI in the enterprise
  • 15.45 - 16.10 Break
  • 16.10 - 16.50 Harvey Stein (Bloomberg)
    Title: Big Data's Dirty Secret
  • 17.00 Wine reception - Davis Auditorium Lobby (CEPSR Building)

 

 

Participating Researchers/Practitioners

Use of Machine Learning to Thwart Phishing Attacks

Abstract: Email-based phishing has evolved from a long-shot one-off scam to a routine, mechanized, effective practice driving over a billion dollars of wire fraud a year. This talk will detail how machine learning -- specifically anomaly detection algorithms -- can identify and flag spear phishing attempts. We'll work through specific real-world examples and detail how we identify them.

Bio: 

David Baggett graduated Magna Cum Laude with a B.S./B.A. in Computer Science and Linguistics from the University of Maryland in 1992. As an undergraduate, he worked extensively with Bill Pugh and Bill Gasarch in the Computer Science Department, and with Sharon Inkelas in the Linguistics Department. In addition to undergraduate work in type theory and recursion theory, David studied tonal phenomena in African languages. He received the CMPS Dean's Award for Academic Excellence in 1992.

After the University of Maryland, David entered the Ph.D. program at the MIT Artificial Intelligence Laboratory, where he studied computational linguistics with Robert Berwick under an Office of Naval Research (NDSEG) Graduate Fellowship. His 1994 master's thesis, A System for Computational Phonology, included a complete implementation of autosegmental phonological theory.

In 1994, David left MIT to join video game company Naughty Dog, where he co-developed the Crash Bandicoot series for the Sony PlayStation. The Crash games were worldwide best-sellers that redefined the state of the art, leading Sony to adopt Crash as its mascot character. The success and appeal of Crash significantly contributed to Sony's ability to enter the video game console market and quickly establish supremacy over former industry leaders Nintendo and Sega.

The Crash series represented a technical high-water mark for the PlayStation 1. As one of two developers on the first game in the series, David pioneered the use of distributed polygonal scene precomputation to vastly reduce rendering time and greatly increase scene complexity. David also introduced octree-based collision detection, an essential innovation for Crash's lush, organic environments, to the video game world. David implemented many of the level processing tools and the entire rendering pipeline, and produced the music for the first three games in the series.

David next co-founded ITA Software with two other MIT AI Lab graduates. ITA Software developed the first new airfare pricing and shopping software in decades, and licensed the technology to most major US and many international airlines. ITA Software's technology powers Orbitz, Kayak, Hipmunk, and many other travel industry websites. Google completed its purchase of ITA Software for over $700M in April 2011. ITA Software now powers Google Flights.

David's latest venture is Inky -- a cloud-based email protection system that uses machine learning and computer vision algorithms to identify email forgeries and prevent phishing attacks. Try it!

David lives in the Washington, DC area with his wife, Catherine, and their two children — Charlie and Lizzie.

View a detailed Resume/Curriculum Vitae.

View slides

 

Model-free Option Pricing and Hedging by Reinforcement Learning

Abstract: In discrete time, option hedging and pricing amount to sequential risk minimization. In particular, a discrete-time version of the Black-Scholes-Merton (BSM) option pricing model can be formulated as a problem of dynamic Markowitz optimization of an option replicating (hedge) portfolio made of an underlying stock and cash. This talk shows how this problem can be approached using Reinforcement Learning (RL). Once the problem is posed as an RL problem, option pricing and hedging can be done without any model for the underlying stock dynamics, using instead model-free, data-driven RL methods such as Q-learning and Fitted Q Iteration. As a result, both option price and hedge are obtained by a well-defined and converging maximization problem that uses only market prices and option trading data (inter-temporal re-hedges and hedge losses in the replicating portfolio) to find the optimal option hedge and price. The model can learn when re-hedges in data are suboptimal/noisy, or even purely random. This means, in particular, that our RL model can learn the BSM model itself, if the world behaves according to BSM.

Computationally, the RL-based option pricing model is very simple, as it uses only basic linear algebra and linear regressions to compute the option price and hedge. The only tunable parameters in this approach are parameters defining the optimal hedge and price themselves. This approach does not need any model calibration (as there is no model anymore), and it automatically solves the volatility smile problem of the BSM model. We also discuss some extensions of this approach, including in particular an Inverse Reinforcement Learning setting, where inter-temporal losses from re-hedges are unobservable.     

Bio: Igor Halperin is Research Professor of Financial Machine Learning at NYU Tandon School of Engineering. His research focuses on using methods of Reinforcement Learning, Information Theory, neuroscience and physics for financial problems such as portfolio optimization, dynamic risk management, and inference of sequential decision-making processes of financial agents.

Igor has extensive industry experience in statistical and financial modeling, in particular in the areas of option pricing, credit portfolio risk modeling, portfolio optimization, and operational risk modeling. Prior to joining NYU Tandon, Igor was an Executive Director of Quantitative Research at JPMorgan, and before that he worked as a quantitative researcher at Bloomberg LP. Igor has published numerous articles in finance and physics journals, and is a frequent speaker at financial conferences. He has also co-authored the book “Credit Risk Frontiers” published by Bloomberg LP.

Igor has a Ph.D. in theoretical high energy physics from Tel Aviv University, and an M.Sc. in nuclear physics from St. Petersburg State Technical University. He advises several fintech and data science start-ups and risk management firms.

View slides

Machine Learning Powers Better Decisioning

Abstract: American Express has always been at the cutting edge of analytics and decision science. Today, our machine learning and artificial intelligence capabilities are critical to driving key decisions that better serve Card Members and produce industry-best credit and fraud risk management. Learn about American Express’ successes and challenges in applying machine learning to improve each and every process in the company, and the techniques that help sustain high performance of its models.

Bio: Alexander Statnikov is a Vice President of Machine Learning and Global Line Modeling in the Risk and Information Management organization at American Express. He plays an essential role in leading Machine Learning activities across the enterprise. He founded and leads the Machine Learning Advisory Board, a governance body that oversees and prioritizes initiatives related to Machine Learning, AI and Advanced Data Analytics conducted globally at American Express. Alexander is also functionally responsible for developing credit line assignment models across all markets and portfolios. Finally, he leads the Machine Learning Workgroup, which conducts cutting-edge research and strengthens knowledge and education in Machine Learning and Data Science. Prior to joining American Express, Alexander was an Associate Professor at New York University specializing in various areas of Machine Learning and Data Science. He is the author of 80+ articles, 5 books and monographs, and 13 patents (issued and pending).

Classification-Based Security Prediction Using Deep Neural Networks

Abstract: Predictions of market movement direction should take many financial instruments into account simultaneously because of their correlations. This intrinsic complexity leads to thousands of possible features, which makes the problem well suited to deep neural networks. We apply a feed forward network to thousands of features and compare it with more advanced recurrent neural networks that combine convolutional layers for feature embedding with long short-term memory cells. We also consider a model that makes predictions over a dynamic time horizon in the future based on the confidence of predictions. The models were evaluated on two data sets, one consisting of commodity futures and the other of ETFs. We found that in the walk forward evaluation process the models tend to overfit, and thus a new technique is introduced to cope with this phenomenon. The advanced models outperform the feed forward network in prediction accuracy.

Bio: Diego Klabjan is a professor at Northwestern University, Department of Industrial Engineering and Management Sciences. He is also Founding Director, Master of Science in Analytics. He obtained his doctorate in Algorithms, Combinatorics, and Optimization from the School of Industrial and Systems Engineering of the Georgia Institute of Technology in 1999, and in the same year he joined the University of Illinois at Urbana-Champaign. In 2007 he became an associate professor at Northwestern and in 2012 he was promoted to full professor. His research is focused on machine learning, deep learning and analytics with concentration in finance, transportation, sport, and bioinformatics. Professor Klabjan has led projects with large companies such as Intel, Baxter, Allstate, AbbVie, FedEx Express, General Motors, United Continental, and many others, and he is also assisting numerous start-ups with their analytics needs. He is also a founder of Opex Analytics LLC.

View slides

Application of Natural Language Processing and Related Machine Learning Techniques at Large Commercial Banks

Abstract: Increased digitalization of communication and recent advances in natural language processing allow us to satisfy new regulatory requirements and to advance automation in the financial industry. But our industry has its own quirks and challenges – a unique, highly formalized parlance coupled with a lack of large sets of labeled data. We use neural nets and a variety of tools from statistical machine learning to help us solve these evolving problems. Even more exciting, these methods can now be applied to pricing and risk management methods, fields that have largely stagnated over the last few decades and that have not adapted to the reduced holding periods of risk by liquidity providers. Comprehensive data policies and the ability to integrate probabilistic models on this data are preconditions for successful deployment of machine learning in capital markets.

Bio: Peter works in the Rates Trading group at Citi. He focuses on machine learning for the implementation of pricing and risk analytics. Peter has developed neural net applications for natural language processing, as well as probabilistic graphical models for pricing. Peter joined Citibank’s Fixed Income Algo trading group in 2011. This team has deployed the largest bank systematic trading and execution platform for treasuries and bond futures.   

Prior to joining Citi, Peter headed the Rates Group at Quantifi. There, he was responsible for managing the product development process for all Rates, Convertible Bonds and FX Options Solutions within the Quantifi product suite. Peter started his career in Research and Technology at Bear Stearns before heading fixed income derivatives research for Deutsche Bank. He has traded fixed income derivatives (linear and non-linear products), government bonds and agencies for Lehman Brothers and Salomon Brothers. He headed the fixed income derivatives trading desk for a number of European banks. Peter has worked on GPU-based software applications in his areas of expertise for more than ten years. He has appeared as a speaker at Nvidia and Microsoft conferences on the use of GPUs for the computation of risk, the pricing of exotic derivatives, and HPC in general.

High Frequency Market Making with Machine Learning

Abstract: High frequency trading has been characterized as an arms race with 'Red Queen' characteristics [Farmer, 2012]. It is improbable, even impossible, that many market participants can sustain a competitive advantage through the sole reliance on low latency trade execution systems. The growth in volume of market data, advances in computer hardware and commensurate prominence of machine learning in other disciplines have spurred the exploration of machine learning for price discovery. Even though the application of machine learning to price prediction has been extensively researched, the merit of this approach for high frequency market making has received little attention.

This paper introduces a trade execution model to evaluate the economic impact of classifiers through backtesting. Extending the concept of a confusion matrix, we present a 'trade information matrix' to attribute the expected profit and loss of tick level predictive classifiers under execution constraints, such as fill probabilities and position dependent trade rules, to correct and incorrect predictions. We apply the execution model and trade information matrix to Level II E-mini S&P 500 futures history and demonstrate an estimation approach for measuring the sensitivity of the P&L to classification error. Our approach directly evaluates the performance sensitivity of a market making strategy to classifier error and augments traditional market simulation based testing.
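The idea of extending a confusion matrix with profit and loss can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name, the label encoding (-1 down, 0 flat, +1 up), and the sample P&L figures are all hypothetical, and execution constraints such as fill probabilities are left out. It only shows how realized P&L might be attributed to each (true move, predicted move) cell.

```python
from collections import defaultdict


def trade_information_matrix(true_labels, predicted_labels, pnl):
    """Return (counts, pnl_sums), both keyed by (true_label, predicted_label).

    counts is an ordinary confusion matrix; pnl_sums holds the total P&L
    of the trades that fall into each cell. Illustrative only.
    """
    if not (len(true_labels) == len(predicted_labels) == len(pnl)):
        raise ValueError("inputs must have equal length")
    counts = defaultdict(int)
    pnl_sums = defaultdict(float)
    for t, p, x in zip(true_labels, predicted_labels, pnl):
        counts[(t, p)] += 1
        pnl_sums[(t, p)] += x
    return dict(counts), dict(pnl_sums)


# Hypothetical tick-level data: -1 = down, 0 = flat, +1 = up.
true_moves = [1, 1, -1, 0, -1, 1]
predictions = [1, -1, -1, 1, 1, 1]
trade_pnl = [12.5, -12.5, 12.5, -2.0, -12.5, 12.5]

counts, pnl_sums = trade_information_matrix(true_moves, predictions, trade_pnl)

# Split total P&L into the part earned on correct and on incorrect predictions.
correct = sum(v for (t, p), v in pnl_sums.items() if t == p)
incorrect = sum(v for (t, p), v in pnl_sums.items() if t != p)
print(counts)
print(correct, incorrect)
```

Comparing the two totals gives a rough sense of how much of a strategy's P&L depends on classifier accuracy, which is the kind of sensitivity the abstract refers to.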

Bio: Matthew Dixon is an Assistant Professor of Finance and Statistics at the Illinois Institute of Technology. His research in computational methods for finance is funded by Intel. Matthew began his career in structured credit trading at Lehman Brothers in London before pursuing academics and consulting for financial institutions in quantitative trading and risk modeling. He holds a Ph.D. in Applied Mathematics from Imperial College (2007) and has held postdoctoral and visiting professor appointments at Stanford University and UC Davis respectively. He has published over 20 peer reviewed academic publications on machine learning and financial modeling, has been cited in Bloomberg Markets and the Financial Times as an AI in fintech expert, and is a frequently invited speaker in Silicon Valley and on Wall Street. He has published R packages, served as a Google Summer of Code mentor and is the founder of an analytics consulting firm, Quiota LLC.

View slides

Real-world challenges of using AI in the enterprise

Abstract: Recent advances in the field of AI have been exciting and the resultant hype has been great. However, most enterprises have yet to substantively harness their data to positively affect their bottom lines. In this talk, we discuss some of the underlying technological and cultural reasons as well as approaches for success. From a technological perspective, there are a multitude of legacy systems with different formats and data models that must be merged. We discuss some approaches for managing this landscape using ad-hoc query methods. In addition, while technological and analytical challenges abound, having a high degree of cultural sensitivity, empathy for the end-user, and design-orientation are key to success. We discuss a few high profile examples of technologically advanced AI products that have failed to gain traction.

Bio: Afsheen Afshar is the Chief Artificial Intelligence Officer for Cerberus. He is a seasoned business leader and deeply technical professional with hands-on knowledge across the entire data and analytics value chain in multiple industries. His experience spans business management, rapid organizational growth, design and creation of industrial-scale data and analytics infrastructure, bleeding-edge artificial intelligence algorithm design, product design and creation, and large-scale value generation.

Professional History:

  • Chief Artificial Intelligence Officer and Senior Managing Director, Cerberus
  • Chief Data Science Officer and Managing Director, JPMorgan, Corporate and Investment Bank
  • Managing Director, Goldman Sachs

Accomplishments:

  • Creating and leading first artificial intelligence function from scratch for one of the world’s leading investment firms
  • Advising, assisting, and serving as a senior resource to Cerberus’s affiliates, portfolio companies, and investments worldwide
  • Created, grew, and led multiple industrial-strength data science and engineering functions at two of the world’s largest and most prestigious investment banks. One of the first to do so.
  • Created one of the first scalable and flexible data and analytics platforms that could be used by technical as well as non-technical professionals alike

 

Big Data's Dirty Secret

Abstract: Let the data speak for themselves. We apply machine learning to the problem of big data. These are two commonly heard phrases these days. But what data exactly are we speaking about, and what do we intend to do with it? What is ignored all too often is the quality of the data being used and how it impacts the analyses being done. Are there holes in the data? Are there anomalies? Given how dirty data can be, a more apt phrase might be "Garbage in, garbage out".

In this talk we will discuss some of the data problems we've encountered in financial data, and approaches that can be used to address them. Our particular focus will be on techniques we've employed to deal with missing data and bad data in credit default swap (CDS) spread histories.

Bio: Dr. Harvey J. Stein is Head of the Quantitative Risk Analytics Group at Bloomberg, responsible for all quantitative aspects of Bloomberg's risk analysis products. Dr. Stein is well known in the industry, having published and lectured on mortgage backed security valuation, CVA calculations, interest rate and FX modeling, credit exposure calculations, financial regulation, and other subjects. Dr. Stein is also on the board of directors of the IAQF, an adjunct professor at Columbia University, a board member of the Rutgers University Mathematical Finance program and of the NYU Enterprise Learning program, and organizer of the IAQF/Thalesians financial seminar series. He received his BA in mathematics from WPI in 1982 and his PhD in mathematics from UC Berkeley in 1991.

View slides

 

Registration

Online registration is now available. REGISTER HERE
Early registration is available until April 4th after which regular registration rates will apply. The early (regular) registration rates are:

Corporate delegates: $150 ($200)
Non-Columbia students*: $40 ($50)
Columbia students*: $30 ($40)

*Those availing of student rates will be required to show valid student ID at the event.

Please contact Professor Ali Hirsa for further details.

**The deadline to request a refund is Friday, April 13, 2018, at 12:00 noon.

Contact Information

Ali Hirsa