Machine Learning in Finance Workshop

The workshop is organized by:



Scott Bauguess (Division of Economics and Risk Analysis at the SEC)

Title: The Hope and Limitations of Machine Learning in Market Risk Assessment

Market risk assessment at the SEC has historically been a manifestation of market supervision, registrant inspections, and review of referrals, tips, and complaints related to potential market misconduct. Efforts have expanded since the financial crisis to include increased focus on market risks beyond misconduct, such as the risk of sequential counterparty failure from the use of OTC derivatives and financial stability related to liquidity risks at pooled investment funds. These assessment efforts are often informed by human review of (mandatory) financial disclosures and other market information. More recently the Commission has engaged in big data initiatives and developed new analytical tools to aid assessment activity. While these programs offer promise and have already enhanced classical risk assessment activities, there remain challenges to collecting relevant data, generating risk measures, modeling market behaviors, and connecting model output to tangible actions for regulators to take. This presentation will illustrate some of the progress made so far and the challenges that lie ahead.


Terrence Hendershott (Haas School of Business, UC Berkeley)

Title: Are Institutions Informed About News?

This paper combines daily non-public data on buy and sell volume by institutions from 2003 through 2005 for NYSE-listed stocks with all news announcements from Reuters. Natural language processing categorizes the sentiment associated with each news story. We use institutional order flow (buy volume minus sell volume) as a quantitative measure of net trading by institutions. We find evidence that institutional investors are informed: i) institutional trading volume predicts the occurrence of news announcements; ii) institutional order flow predicts the sentiment of the news; iii) institutional order flow predicts the stock market reaction on news announcement days; iv) institutional order flow predicts the stock market reaction on crisis news days; v) institutional order flow predicts earnings announcement surprises; and vi) institutions do not believe the hype.
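The order-flow measure defined in the abstract (buy volume minus sell volume) can be sketched in a few lines. This is an illustrative sketch only: the function names, the normalized variant, and the sample volumes are assumptions made here, not the paper's code or data.

```python
def order_flow(buy_volume, sell_volume):
    """Net institutional trading: buy volume minus sell volume."""
    return buy_volume - sell_volume


def order_imbalance(buy_volume, sell_volume):
    """Order flow scaled by total volume, so it lies in [-1, 1].

    The scaling is an assumption added for comparability across stocks;
    the abstract only defines the unscaled difference.
    """
    total = buy_volume + sell_volume
    if total == 0:
        return 0.0  # no institutional trading that day
    return (buy_volume - sell_volume) / total


# Hypothetical (buy, sell) share volumes for three stock-days.
daily_volumes = [(1200, 800), (500, 900), (0, 0)]

flows = [order_flow(b, s) for b, s in daily_volumes]
imbalances = [order_imbalance(b, s) for b, s in daily_volumes]
```

A positive value indicates net institutional buying on that day and a negative value indicates net selling.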


Shawn Mankad (Robert H. Smith School of Business, University of Maryland)

Title: Do U.S. Financial Regulators Listen to the Public? Testing the Regulatory Process with the RegRank Algorithm

We examine the notice-and-comment process and its impact on regulatory decisions by analyzing the text of public rule-making documents of the Commodity Futures Trading Commission (CFTC) and associated comments. For this task, we develop a data mining framework and an algorithm called RegRank, which learns the thematic structure of regulatory rules and public comments and then assigns tone weights to each theme to come up with an aggregate score for each document. Based on these scores we test the hypothesis that the CFTC adjusts the final rule issued in the direction of the tone expressed in public comments. Our findings strongly support this hypothesis and further suggest that this mostly occurs in response to comments from the regulated financial industry. We posit that the RegRank algorithm and related text mining methods have the potential to empower the public to test whether it has been given "due process" and hence to keep government agencies in check.
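The scoring idea in the abstract (theme proportions combined with per-theme tone weights to give one score per document) can be sketched as a weighted sum. This is not the RegRank algorithm itself: the function names, the three themes, the numbers, and the sign-based test of the hypothesis are all assumptions made here for illustration.

```python
def document_score(theme_proportions, tone_weights):
    """Aggregate score: sum over themes of (theme share * tone weight)."""
    if len(theme_proportions) != len(tone_weights):
        raise ValueError("one tone weight is required per theme")
    return sum(p * w for p, w in zip(theme_proportions, tone_weights))


def moved_toward_comments(proposed, final, comments):
    """True if the final rule's score shifted, relative to the proposed
    rule, in the same direction as the comments' score (assumed test)."""
    return (final - proposed) * (comments - proposed) > 0


# Hypothetical tone weights for three themes (positive, negative, neutral).
tone_weights = [1.0, -1.0, 0.0]

# Hypothetical theme proportions for each document (each sums to 1).
proposed_score = document_score([0.5, 0.3, 0.2], tone_weights)
comment_score = document_score([0.2, 0.6, 0.2], tone_weights)
final_score = document_score([0.4, 0.4, 0.2], tone_weights)

shifted = moved_toward_comments(proposed_score, final_score, comment_score)
```

In this made-up example the comments are more negative in tone than the proposed rule, and the final rule's score also falls, so `shifted` is true.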


William Morokoff (Standard & Poor's)

Title: Modeling Challenges for Credit Risk and Economic Forecasting 

Machine learning techniques hold great promise for applications to credit risk and econometric modeling. However, these areas pose a number of special challenges associated with relatively low-frequency data (quarterly or annual) and the need to model the probability of rare events such as investment-grade defaults. In this talk we consider a number of applications of machine learning methods developed at S&P, including probability of default modeling, credit risk ranking, issuance forecasting, and automated review of regulatory filings. We consider the performance of these models and various challenges arising from data limitations as well as other factors such as calibration to expert judgment. We also consider advantages and disadvantages of different approaches in the context of the examples.


Stefano Pasquali (Bloomberg Research)  

Title: LIQUIDITY RISK AND MARKET IMPACT. Estimating liquidity using price uncertainty and machine learning. A fixed income case study.

The Liquidity Assessment Tool (LQA) helps measure market depth and liquidity of securities for the purposes of regulatory reporting, risk, and pre/post-trade analysis. Based on a machine learning approach, it provides information such as the probability of selling a specific volume at a specific price, the expected cost of liquidation, the expected maximum volume that can be liquidated given a maximum market impact, and the expected days to liquidate a specific volume given a maximum market impact. The tool also provides the level of uncertainty for each of these estimates.

Tobias Preis (Warwick Business School)

Title: Measuring and Predicting Human Behaviour Using Online Data

In this talk, I will outline some recent highlights of our research, addressing two questions. Firstly, can big data resources provide insights into crises in financial markets? By analysing Google query volumes for search terms related to finance and views of Wikipedia articles, we find patterns which may be interpreted as early warning signs of stock market moves. Secondly, can we provide insight into international differences in economic wellbeing by comparing patterns of interaction with the Internet? To answer this question, we introduce a future-orientation index to quantify the degree to which Internet users seek more information about years in the future than years in the past. We analyse Google logs and find a striking correlation between a country's GDP and the predisposition of its inhabitants to look forward. Our results illustrate the potential that combining extensive behavioural data sets offers for a better understanding of large-scale human economic behaviour.
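One simple way to operationalize the future-orientation index described above is as the ratio of search volume for the coming year to search volume for the previous year, computed per country and then correlated with GDP. The sketch below assumes that ratio form; the function names, the country labels, and all numbers are hypothetical and are not data from the talk.

```python
import math


def future_orientation_index(searches_next_year, searches_previous_year):
    """Ratio of searches for the coming year to searches for the past year.

    Values above 1 mean users looked forward more than backward.
    """
    if searches_previous_year <= 0:
        raise ValueError("previous-year search volume must be positive")
    return searches_next_year / searches_previous_year


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical search volumes: (next-year searches, previous-year searches).
search_volumes = {"A": (120, 100), "B": (90, 100), "C": (60, 100)}
# Hypothetical GDP figures for the same three countries.
gdp = {"A": 40000, "B": 30000, "C": 10000}

countries = sorted(search_volumes)
index = [future_orientation_index(*search_volumes[c]) for c in countries]
r = pearson(index, [gdp[c] for c in countries])
```

With only three made-up countries the value of `r` is not meaningful in itself; the sketch only shows the shape of the calculation.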

Preis, T., Moat, H. S., Stanley, H. E. & Bishop, S. R. Quantifying the Advantage of Looking Forward. Sci. Rep. 2, 350 (2012).

Preis, T., Moat, H. S. & Stanley, H. E. Quantifying trading behavior in financial markets using Google Trends. Sci. Rep. 3, 1684 (2013).

Moat, H. S., Curme, C., Avakian, A., Kenett, D. Y., Stanley, H. E. & Preis, T. Quantifying Wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013).

Curme, C., Preis, T., Stanley, H. E. & Moat, H. S. Quantifying the semantics of search behavior before stock market moves. PNAS 111, 11600 (2014).


Stephen Purpura (Context Relevant)   

Title: Dynamic Market Structure Discovery: Automated Detection with Machine Learning

Broad adoption of machine learning in finance is constrained by the time and expertise required to develop, deploy, and control complex systems. We demonstrate a simplified approach with automated generation of a machine learning model to predict market direction of interest rate swaps. Predictions are further improved by learning from client trading behavior in other products. Finally, we show how automation technology enables dynamic model updates and discovery of new signals without human intervention. We conclude with an overview of challenges and opportunities in finance.

Marti Subrahmanyam (Stern School of Business, NYU)

Title: Informed Options Trading Prior to M&A Announcements: Insider Trading?

We investigate informed trading activity in equity options prior to the announcement of corporate mergers and acquisitions (M&A). For the target companies, we document pervasive directional options activity, consistent with strategies that would yield abnormal returns to investors with private information. This is demonstrated by positive abnormal trading volumes, excess implied volatility and higher bid-ask spreads, prior to M&A announcements. These effects are stronger for out-of-the-money (OTM) call options and subsamples of cash offers for large target firms, which typically have higher abnormal announcement returns. The probability of option volume on a random day exceeding that of our strongly unusual trading (SUT) sample is trivial, at about three in a trillion. We further document a decrease in the slope of the term structure of implied volatility and an average rise in percentage bid-ask spreads, prior to the announcements. For the acquirer, we provide evidence that there is also unusual activity in volatility strategies. A study of all Securities and Exchange Commission (SEC) litigations involving options trading ahead of M&A announcements shows that the characteristics of insider trading closely resemble the patterns of pervasive and unusual option trading volume. Historically, the SEC has been more likely to investigate cases where the acquirer is headquartered outside the US, the target is relatively large, and the target has experienced substantial positive abnormal returns after the announcement.


500 W. 120th St., 918 Mudd, New York, NY 10027, 212-854-2905
©2012 Columbia University