Machine Learning in Finance Workshop 2017

April 21, 2017
8:00 AM - 5:00 PM
Lerner Hall (2920 Broadway, New York, NY 10027)

The workshop is organized by the Center for Financial Engineering (CFE) and the Data Science Institute at Columbia University.

Workshop Program

The following is a tentative schedule.

  • 8.15 - 9.00 Registration
  • 9.00 - 9.15 Introduction
  • 9.15 - 9.55 Markus Pelger (Stanford, Management Science and Engineering)
    Title: Estimating Latent Asset Pricing Factors from Large-Dimensional Data
  • 9.55 - 10.35 Andrew Gelman (Columbia, Statistics and Political Science)
    Title: What Can We Learn from Data?
  • 10.35 - 11.15 Michael Kearns (U Penn, Computer and Information Science)
    Title: Trading Without Regret
  • 11.15 - 11.45 Break
  • 11.45 - 12.25 Bruno Dupire (Bloomberg)
    Title: Understanding What the Machine Understands
  • 12.25 - 13.05 Jonathan Larkin (Quantopian)
    Title: Herding Robotic Cats: Constructing a Single Portfolio from Hundreds of Thousands of Autonomous Strategies
  • 13.05 - 14.05 Lunch (A boxed lunch will be provided)
  • 14.05 - 14.45 Stefano Pasquali (BlackRock)
    Title: Unified liquidity risk management framework. Where and why use machine learning?
  • 14.45 - 15.25 Harry Mamaysky (Columbia, Graduate School of Business)
    Title: How News and Its Context Drive Risk and Returns Around the World
  • 15.25 - 15.45 Break
  • 15.45 - 16.25 Mayur Thakur (Goldman Sachs)
    Title: Surveillance Development: A Case Study
  • 16.25 - 17.05 Roberto Rigobon (MIT, Sloan)
    Title: The Billion Prices Project: Using Small Data to Improve Big Data
  • 17.10 Wine reception

Abstracts & Biographies

Understanding What the Machine Understands

Abstract: Machine Learning is a potent paradigm but is often perceived as an opaque methodology, and this black-box aspect prevents many potential users from trusting it. We show how to gain insight into the inner workings of various learning techniques by presenting visualizations that shed light on the learning process and interactive tools that improve the progress of the algorithm. We demonstrate this on various financial examples such as pricing of illiquid assets, surprise extraction, sentiment analysis, dividend classification, optimal VWAP replication, default prediction… We also show that it is possible to “interview” a neural net in order both to accelerate its training and to make it produce unexpected suggestions.

Bio: Bruno Dupire is head of Quantitative Research at Bloomberg L.P., which he joined in 2004. Prior to this assignment in New York, he headed the Derivatives Research teams at Société Générale, Paribas Capital Markets and Nikko Financial Products, where he was a Managing Director. He is best known for having pioneered the widely used Local Volatility model (the simplest extension of the Black-Scholes-Merton model to fit all option prices) in 1993 and the Functional Itô Calculus (a framework for path dependency) in 2009. He is a Fellow and Adjunct Professor at NYU and is in the Risk magazine “Hall of Fame”. He is the recipient of the 2006 “Cutting Edge Research” award from Wilmott Magazine and of the Risk Magazine “Lifetime Achievement” award for 2008. Back in 1987 he used neural nets to forecast currency movements.

What Can We Learn from Data?

Abstract: The standard framework for statistical inference leads to estimates that are horribly biased and noisy for many important examples, and these problems only get worse as we study subtle and interesting new questions. Methods such as significance testing are intended to protect us from hasty conclusions, but they have backfired: over and over again, people think they have learned from data when they have not. How can we have any confidence in what we think we've learned from data? One appealing strategy is replication and external validation, but this can be difficult in the real world of social science. We discuss statistical methods for actually learning from data without getting fooled.

Bio: Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He has received the Outstanding Statistical Application award from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), and A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina).

Andrew has done research on a wide range of topics, including: why it is rational to vote; why campaign polls are so variable when elections are so predictable; why redistricting is good for democracy; reversals of death sentences; police stops in New York City; the statistical challenges of estimating small effects; the probability that your vote will be decisive; seats and votes in Congress; social network structure; arsenic in Bangladesh; radon in your basement; toxicology; medical imaging; and methods in surveys, experimental design, statistical inference, computation, and graphics.

Trading Without Regret

Abstract: No-regret learning is a collection of tools designed to give provable performance guarantees in the absence of any statistical or other assumptions on the data (!), and thus stands in stark contrast to most classical modeling approaches. With origins stretching back to the 1950s, the field has yielded a rich body of algorithms and analyses that covers problems ranging from forecasting from expert advice to online convex optimization. I will survey the field, with special emphasis on applications to quantitative finance problems, including portfolio construction and inventory risk.
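To give a concrete flavor of this setting, below is a minimal, illustrative sketch (not taken from the talk) of the classic Hedge / multiplicative-weights algorithm for combining expert advice; the "experts", loss data, and learning rate are hypothetical stand-ins.

```python
import numpy as np

def hedge(loss_matrix, eta=0.1):
    """Run the Hedge (multiplicative weights) algorithm.

    loss_matrix: (T, K) array with the loss of each of K experts in each of
    T rounds, assumed to lie in [0, 1]. Returns the learner's cumulative loss.
    """
    T, K = loss_matrix.shape
    weights = np.ones(K) / K
    learner_loss = 0.0
    for t in range(T):
        # The learner's expected loss is the weighted average of expert losses.
        learner_loss += weights @ loss_matrix[t]
        # Multiplicative update: exponentially down-weight experts that did poorly.
        weights = weights * np.exp(-eta * loss_matrix[t])
        weights /= weights.sum()
    return learner_loss

# Toy example: 3 hypothetical trading signals ("experts") over 500 rounds.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(500, 3))
losses[:, 0] *= 0.5  # expert 0 is systematically better
print("learner loss:", round(hedge(losses), 1),
      "best expert loss:", round(losses.sum(axis=0).min(), 1))
```

With a learning rate of roughly sqrt(2 ln K / T), the learner's cumulative loss exceeds that of the best single expert in hindsight by at most O(sqrt(T ln K)), no matter how the losses are generated; this worst-case guarantee is what "no regret" refers to.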

Bio: Michael Kearns is Professor and National Center Chair in the Department of Computer and Information Science at the University of Pennsylvania, with secondary appointments in the Department of Economics and in the Statistics and the Operations, Information and Decisions departments of the Wharton School. Kearns is the Founding Director of Penn’s Warren Center for Network and Data Sciences, as well as the Penn Program in Networked and Social Systems Engineering. His research includes topics in machine learning, algorithmic game theory, computational social science, and quantitative finance and algorithmic trading. Kearns spent a decade at AT&T Bell Labs, where he was head of the AI department, which conducted a range of systems and foundational AI and machine learning research. He has consulted extensively in the technology and finance industries, and is currently Chief Scientist of MANA Partners, a New York trading and technology company. He studied math and computer science at UC Berkeley, and completed his Ph.D. in computer science at Harvard University in 1989. He is an elected Fellow of the American Academy of Arts and Sciences, the Association for Computing Machinery, the Association for the Advancement of Artificial Intelligence, and the Society for the Advancement of Economic Theory.

Herding Robotic Cats: Constructing a Single Portfolio from Hundreds of Thousands of Autonomous Strategies

Abstract: Many multi-strategy and multi-manager investment managers face a common problem: how to implement a single portfolio, subject to a single investment mandate, when the underlying strategies are autonomous, private, and independent. This talk presents a framework for solving this problem.

Bio: Jonathan directs Quantopian's investment strategy, leading the effort to identify, select, and allocate capital to investment algorithms created by Quantopian's community of more than 120,000 algorithm writers. Writers of selected algorithms share in the profits generated by their algorithms. Larkin's prior experience spans senior roles at some of the largest multi-manager and quantitative investment firms in the world. He was most recently a Portfolio Manager at Hudson Bay Capital Management LP. Previously, he held the roles of Portfolio Manager and Global Co-Head of Equities at BlueCrest Capital Management LP, Managing Director at Nomura Securities, and Senior Managing Director and Global Head of Equities at Millennium Management LLC.

How News and Its Context Drive Risk and Returns Around the World

Abstract: We develop a novel methodology for classifying the context of news articles to predict risk and return in stock markets. For a set of 52 developed and emerging market economies, we show that a parsimonious summary of news, including context-specific sentiment, predicts future country-level market outcomes, as measured by returns, volatilities, or drawdowns. Our approach avoids data-mining biases that may occur when relying on particular word combinations to detect changes in risk. The effect of present news on future market outcomes differs by news category, as well as across emerging and developed markets. Importantly, news stories about emerging markets contain more incremental information – in relation to known predictors of future returns – than do news stories about developed markets. We also find evidence of regime shifts in the relationship between future market outcomes and news. Out-of-sample testing confirms the efficacy of our approach for forecasting country-level market outcomes. (Joint work with Charles Calomiris)
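As a purely illustrative sketch (not the authors' construction), one way to build context-specific sentiment features is to score each article within its topic and aggregate to the country-month level, which can then serve as a regressor for future returns or volatility; the column names and word-count scheme below are hypothetical.

```python
import pandas as pd

# Hypothetical article-level data: one row per news article, with a topic
# ("context") label and simple positive/negative word counts.
articles = pd.DataFrame({
    "country": ["US", "US", "BR", "BR"],
    "month":   ["2017-01", "2017-01", "2017-01", "2017-01"],
    "topic":   ["credit", "markets", "credit", "politics"],
    "n_pos":   [12, 30, 4, 8],
    "n_neg":   [20, 10, 15, 9],
})

# Net sentiment per article, then averaged within country-month-topic cells,
# giving one "context-specific sentiment" regressor per topic.
articles["sent"] = (articles.n_pos - articles.n_neg) / (articles.n_pos + articles.n_neg)
features = (articles
            .groupby(["country", "month", "topic"])["sent"]
            .mean()
            .unstack("topic")
            .fillna(0.0))

# These features would then feed a predictive panel regression of next-month
# country returns or volatility; here we only show the feature matrix.
print(features)
```

In practice the topic labels would come from a classifier or topic model and the sentiment scores from a domain-specific dictionary, but the aggregation-to-panel step is the same.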

Bio: Harry Mamaysky is an associate professor of professional practice at Columbia Business School and the director of the News and Finance research initiative at the Business School’s Program for Financial Studies. He was formerly head of the Systemic Risk Group at Citigroup and a member of the firm's Risk Executive Committee. Prior to that, he was a senior portfolio manager in Citi Principal Strategies, where he co-managed the relative value credit book. Before joining Citigroup, he held positions with Old Lane, Morgan Stanley, and Citicorp. He was also an assistant professor of finance at the Yale School of Management from 2000 to 2002.

View slides

Unified liquidity risk management framework. Where and why use machine learning?

Abstract: We present an overview of several research ideas for applying machine learning in finance. In particular, we describe a holistic liquidity risk management framework in which machine learning is under investigation for some of the components, and we frame the problem as a complete multi-period optimization based on risk, transaction cost, and redemption modeling. As a concrete example of where advanced machine learning tools can deliver promising out-of-sample results, we present first research outcomes on fund-flow forecasting, using neural networks to estimate conditional extreme value distributions.
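As a minimal, purely illustrative sketch of the last idea (not BlackRock's model), the snippet below assumes a small network that maps fund and market features to the scale and shape of a generalized Pareto distribution for redemption exceedances over a threshold, fitted by maximizing the likelihood; all names and data are hypothetical.

```python
import torch
import torch.nn as nn

class ConditionalGPD(nn.Module):
    """Tiny MLP mapping fund/market features to generalized Pareto
    (scale, shape) parameters for redemption exceedances over a threshold."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x):
        out = self.net(x)
        scale = nn.functional.softplus(out[:, 0]) + 1e-6   # sigma > 0
        shape = nn.functional.softplus(out[:, 1]) + 1e-6   # xi > 0 (heavy tail)
        return scale, shape

def gpd_nll(y, scale, shape):
    # Negative log-likelihood of exceedances y >= 0 under a generalized Pareto law.
    return (torch.log(scale) + (1.0 / shape + 1.0) * torch.log1p(shape * y / scale)).mean()

# Synthetic example: 1000 observations, 5 features, heavy-tailed exceedances.
torch.manual_seed(0)
X = torch.randn(1000, 5)
y = torch.distributions.Pareto(1.0, 3.0).sample((1000,)) - 1.0

model = ConditionalGPD(n_features=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    scale, shape = model(X)
    loss = gpd_nll(y, scale, shape)
    loss.backward()
    opt.step()
print("final negative log-likelihood:", loss.item())
```

The design choice illustrated here is that the network predicts the parameters of a tail distribution rather than a point forecast, so the out-of-sample test can evaluate the whole conditional distribution of extreme redemptions.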

Bio: Stefano Pasquali, Managing Director, is the Head of the Liquidity Research Group at BlackRock Solutions. As Head of Liquidity Research, Mr. Pasquali is responsible for market liquidity modelling at both the security and portfolio level, as well as for estimating portfolio liquidity risk profiles. His responsibilities include defining cross-asset-class models, leveraging available trade data, and developing innovative machine-learning-based approaches to better estimate market liquidity. Mr. Pasquali is heavily involved in developing methodologies to estimate funding liquidity and to better estimate fund flows. These models include the cost of position or portfolio liquidation, time to liquidation, redemption estimation, and investor behavior modelling using a big data approach. Stefano is a member of the Government Relations Steering Committee within BlackRock.

Prior to BlackRock, Mr. Pasquali oversaw product development and research for Bloomberg's liquidity solution, introducing a big data approach to its financial analytics. His team designed and implemented models to estimate liquidity and risk across different asset classes, with a particular focus on OTC markets. Before this he led business development and research for fixed income evaluated pricing. Mr. Pasquali has more than 15 years of experience examining and implementing innovative approaches to calculating risk and market impact. He regularly speaks at industry events about the complexity and challenges of liquidity evaluation, particularly in the OTC marketplace. His approach to risk and liquidity evaluation is strongly influenced by over 20 years of experience working with big data, data mining, machine learning and database management.

Prior to moving to New York in 2010, Mr. Pasquali held senior positions at several European banks and asset management firms, where he oversaw risk management, portfolio risk analysis, model development and risk management committees. These accomplishments include the construction of a risk management process for a global asset management firm with over 100 billion in assets under management, driving projects from data acquisition and normalization to model development and portfolio management support.

Mr. Pasquali, a strong believer in academic contribution to the industry, has engaged in various conversations and collaborations with universities in the US, UK, and Italy. He also participates as a supervisor in the Experiential Learning Program and the Master of Quantitative Finance program at Rutgers University, and tutors students in research activities.

Before his career in finance, Mr. Pasquali was a researcher in theoretical and computational physics (in particular Monte Carlo simulation, solid-state physics, environmental science, and acoustic optimization). Originally from Carrara (Tuscany, Italy), he grew up in Parma. Mr. Pasquali is a graduate of Parma University, holds a master’s degree in Theoretical Physics, and held research fellowships in computational physics at Parma University and Reading University (UK).

Estimating Latent Asset Pricing Factors from Large-Dimensional Data

Abstract: We develop an estimator for latent factors in a large-dimensional panel of financial data that can explain expected excess returns. Statistical factor analysis based on Principal Component Analysis (PCA) has problems identifying factors with a small variance that are important for asset pricing. Our estimator searches for factors with a high Sharpe ratio that can explain both the expected return and the covariance structure. We derive the statistical properties of the new estimator using new results from random matrix theory and show that it can find asset-pricing factors that cannot be detected with PCA, even if a large amount of data is available. Applying the approach to portfolio and stock data, we find factors with Sharpe ratios more than twice as large as those based on conventional PCA. Our factors accommodate a large set of anomalies better than notable four- and five-factor alternative models.
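As a minimal illustrative sketch of the idea (not the authors' code), one way to make principal components "see" expected returns is to over-weight the sample means in the second-moment matrix before the eigendecomposition; the weight gamma and the toy data below are hypothetical.

```python
import numpy as np

def rp_pca(X, k, gamma=10.0):
    """Extract k latent factors from a T x N panel of excess returns X by
    applying PCA to the second-moment matrix with the sample means
    over-weighted by gamma (gamma = -1 recovers covariance-based PCA)."""
    T, N = X.shape
    mean = X.mean(axis=0)
    M = X.T @ X / T + gamma * np.outer(mean, mean)
    eigvals, eigvecs = np.linalg.eigh(M)
    loadings = eigvecs[:, ::-1][:, :k]      # eigh is ascending; take k largest
    factors = X @ loadings / N              # T x k, up to scale and sign
    return factors, loadings

# Toy panel: one weak latent factor with a non-trivial mean, plus noise.
rng = np.random.default_rng(0)
T, N = 500, 100
f = 0.02 + 0.05 * rng.standard_normal(T)    # latent factor
beta = rng.standard_normal(N)               # loadings
X = np.outer(f, beta) + 0.3 * rng.standard_normal((T, N))
factors, loadings = rp_pca(X, k=1, gamma=10.0)
# Sign of the recovered factor is indeterminate, so report absolute correlation.
print("abs corr with true factor:", abs(np.corrcoef(factors[:, 0], f)[0, 1]))
```

Setting gamma = -1 reduces the construction to PCA on the sample covariance matrix, so larger gamma values tilt the extracted components toward factors that also carry a risk premium.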

Bio: Markus Pelger is an Assistant Professor in the Management Science & Engineering Department at Stanford University and a Reid and Polly Anderson Faculty Fellow at Stanford University. His research interests are in statistics, financial econometrics, asset pricing and risk management. His work includes contributions in statistical factor analysis, high-frequency statistics, credit risk modeling and management compensation. He is particularly interested in how systematic risk factors and tail risk in the form of jumps influence the price of assets and the incentives of agents. For this purpose he has developed various statistical tools to estimate unknown risk factors from large-dimensional data sets and from high-frequency data, and he uses them empirically to predict asset prices and construct trading strategies. Markus received his Ph.D. in Economics from the University of California, Berkeley. He is a scholar of the German National Merit Foundation and was awarded a Fulbright Scholarship, an award from the Institute for New Economic Thinking, and the Eliot J. Swan Prize. He has two Diplomas, in Mathematics and in Economics, both with highest distinction, from the University of Bonn in Germany.

View slides

The Billion Prices Project: Using Small Data to Improve Big Data

Bio: Roberto Rigobon is the Society of Sloan Fellows Professor of Applied Economics at the Sloan School of Management, MIT, a research associate of the National Bureau of Economic Research, a member of the Census Bureau’s Scientific Advisory Committee, and a visiting professor at IESA.

Roberto is a Venezuelan economist whose areas of research are international economics, monetary economics, and development economics. He focuses on the causes of balance-of-payments crises and financial crises, and their propagation across countries - the phenomenon that has been identified in the literature as contagion. Currently he studies the properties of international pricing practices, tries to produce alternative measures of inflation, and is one of the two founding members of the Billion Prices Project and a co-founder of PriceStats.

Roberto joined the business school in 1997 and has won the "Teacher of the Year" award three times and the "Excellence in Teaching" award three times at MIT. He received his Ph.D. in economics from MIT in 1997, an MBA from IESA (Venezuela) in 1991, and his BS in Electrical Engineering from Universidad Simon Bolivar (Venezuela) in 1984. He is married with three kids.

View slides

Surveillance Development: A Case Study

Abstract: A “surveillance” is a mathematical model, implemented in code, that takes as input a large amount of data (for example, hundreds of millions of trades and billions of market data points) and identifies those parts of the data that look suspicious. In the above example, the surveillance model may identify, say, a hundred trades as suspicious. We will present one specific surveillance model and our experience implementing it in production. Through this example we will present the key technical challenges and a surveillance architecture that we have developed. This architecture lets us ingest and store billions of rows of data per day, is easy to update, and scales to hundreds of concurrently running jobs and dozens of users running ad hoc queries.

Bio: Mayur is head of the Data Analytics Group in the Global Compliance Division. He joined Goldman Sachs as a managing director in 2014. Prior to joining the firm, Mayur worked at Google, where he designed search algorithms for more than seven years.

Previously, he was an assistant professor of computer science at the University of Missouri. Mayur earned a PhD in Computer Science from the University of Rochester in 2004 and a BTech in Computer Science and Engineering from the Indian Institute of Technology, Delhi, in 1999. 

Registration

Online registration is available.
Early registration is available until April 4, after which regular registration rates will apply. The early (regular) registration rates are:


Corporate delegates: $100 ($150)
Non-Columbia students*: $30 ($40)
Columbia students*: $20 ($30)

*Those availing of student rates will be required to show valid student ID at the event.

Please contact Martin Haugh <[email protected]> for further details.

The deadline to request a refund is Wednesday, April 19, 2017, at 12:00 noon.

Contact Information

Ali Hirsa