Redefining NYC neighborhoods using open data and machine learning

Abstract: New York City’s (NYC’s) neighborhoods are a driving force in the lives of New Yorkers. The identities of residents and their neighborhoods are closely intertwined and a source of pride. However, the history and evolution of NYC’s neighborhoods don’t follow the rigid, cold lines of statistical and administrative boundaries. Instead, the neighborhoods we live and work in are the result of a more organic confluence of factors. NewerHoods is an interactive web app that uses open data to generate localized features at the US Census tract level and then clusters the tracts spatially to define data-driven neighborhoods. Traditional clustering algorithms tend to be aspatial, that is, indifferent to the geographic adjacency of tracts, which is problematic when identifying neighborhoods. To solve this, we use a spatial hierarchical clustering model, which allows us to balance spatial vicinity against similarity in the feature space. Users can select characteristics of interest (currently open data on housing, crime, and 311 complaints), visualize NewerHood clusters on an interactive map, find similar neighborhoods, and compare them against existing administrative boundaries. The tool is designed to enable users without in-depth data expertise to compare these redefined neighborhoods and incorporate them into their work and life. This presentation will dig into the methodology behind NewerHoods.
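
To make the trade-off between spatial vicinity and feature similarity concrete, here is a minimal Python sketch. It is not the NewerHoods implementation: the function name, the mixing weight alpha, the use of tract centroids for geographic distance, and the choice of average linkage are all assumptions made for illustration. It blends a rescaled feature-distance matrix with a rescaled geographic-distance matrix and then runs ordinary agglomerative clustering on the result.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def spatial_feature_clusters(features, centroids, n_clusters, alpha=0.5):
    """Cluster tracts on a blend of feature and geographic dissimilarity.

    Hypothetical helper for illustration only.
    features:  (n_tracts, n_features) array of tract-level features.
    centroids: (n_tracts, 2) array of tract centroid coordinates.
    alpha:     0 uses features only, 1 uses geography only.
    """
    # Condensed pairwise Euclidean distances in each space.
    d_feat = pdist(np.asarray(features, dtype=float))
    d_geo = pdist(np.asarray(centroids, dtype=float))

    # Rescale both to [0, 1] so that alpha weighs comparable quantities.
    # (Assumes at least two distinct rows in each input.)
    d_feat = d_feat / d_feat.max()
    d_geo = d_geo / d_geo.max()

    # Convex combination of the two dissimilarities.
    mixed = (1.0 - alpha) * d_feat + alpha * d_geo

    # Agglomerative clustering on the blended dissimilarities.
    tree = linkage(mixed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

With alpha near 0 the clusters follow the features alone and may be geographically scattered; with alpha near 1 they follow geography alone. Intermediate values trade one off against the other, which is the balance the abstract describes.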

NewerHoods was developed by the Two Sigma Data Clinic. The Two Sigma Data Clinic develops pro bono solutions that enable social impact organizations to use data and technology more effectively and have a greater impact on the communities they serve. As data-driven decision-making has proliferated across sectors, nonprofits have lagged behind due to funding and resource constraints. While many nonprofits may still be in the early stages of data collection, the widespread availability of open data has the potential to fill these gaps and inform their operations and programming.

Bio: Darren Vengroff is a Senior Vice President at Two Sigma, where he leads the AI Engineering team. Previously, Darren served as Chief Scientist at two startups (Rich Relevance and Meld) and as CTO of a third (Pelago). Prior to that, he was a Principal Engineer at where he developed personalization systems, a Core Strategist at Goldman Sachs, and a Software Engineer at Microsoft. He has also served as an advisor to the Gates Foundation and a number of startups. Darren holds both an M.S. and a Ph.D. in Computer Science from Brown University, as well as a B.S.E. in Computer Science from Princeton University.

500 W. 120th St., 918 Mudd, New York, NY 10027    212-854-2905
©2012 Columbia University