I work at the intersection of measurement, social science, and AI. At Microsoft, I built the evaluation infrastructure for Copilot and co-led the largest empirical study of AI usage ever conducted. Previously, I led data science teams at Apple Maps and held faculty appointments at Brown, Columbia, and the University of Colorado Boulder. My academic work is deeply interdisciplinary, spanning fields as diverse as geography, computer science, history, and statistics, with ~50 peer-reviewed publications and a book. I've held advisory roles with the U.S. Census Bureau, Oak Ridge National Laboratory, the National Academy of Sciences, and early-stage startups on how to measure products, people, places, and the economy. Once upon a time, I was the sole proprietor of an antiquarian bookshop in Manhattan.
Designed Copilot's core evaluation infrastructure and co-led the analysis of 37.5M conversations—the largest empirical study of AI usage ever conducted.
Appointed by the Department of Commerce to advise on the design, operation, and modernization of how the U.S. measures its population and economy. Served 2022–2025.
My work has been covered in Science, The Atlantic, the Associated Press, Axios, ABC News, and international media.
Served as interim DRI (directly responsible individual) for Apple's effort to build a map from scratch. Developed patented technology using mobile sensor data to detect and correct map errors at scale.
Published in PNAS, Demography, SIGIR, and more. Authored Urban Analytics (Sage, 2018). Recognized with the AAG Distinguished Scholar Award, the ASA's Spatial Analysis & Intergovernmental Statistics (SPAIG) Award, and the Michael Breheny Prize.
Directed strategy, institutional research, and digital transformation for a 35,000+ student institution. Led a team of 40+ staff and contractors.
Active lines of inquiry at the frontier of AI measurement, user understanding, and evaluation theory.
Building complex agentic evaluation systems that use LLM judges to assess product experiences in real time. This work is deployed as Copilot’s primary KPI, the Session Success Rate, which Microsoft AI CEO Mustafa Suleyman discusses in the press as the metric he optimizes above all others. These systems go beyond simple accuracy to evaluate whether conversations are genuinely useful, safe, and aligned with what users need, across millions of interactions.
Creating synthetic user agents, grounded in AI-mediated interviews with real people, that are realistic enough to stand in for actual users. The goal is to scale these agents into populations that enable deep product insights, competitive intelligence, and “soft flights”—running A/B experiments against synthetic users rather than launching to the public and waiting weeks for results. This compresses the experimentation cycle from weeks to hours while surfacing blind spots that traditional research misses.
Every metric carries a validity gap: the distance between what you think you’re measuring and what your metric actually captures. In large-scale AI and search systems, this gap can have profound consequences for products and organizations, because subtle choices in measurement design compound into large differences in what gets optimized. This research develops theoretical foundations for understanding and closing the validity gap in metrics that shape products used by hundreds of millions of people.
There is a broad literature (in the press, in academia, and from major labs) about how AI is poised to reshape the future of work: jobs displaced, productivity gained. But recent research from Microsoft and OpenAI has shown that people discuss personal matters with AI more often than professional ones. These aren’t trivial queries: they’re about what to buy, what to study, how to maintain health and well-being, what to do on a Sunday afternoon. AI now supports decisions ranging from the minor and routine to the major and life-altering. What are the collective consequences of these millions of AI-mediated choices? We are working to understand, and to build a theoretical framework for, how personal decisions made with AI aggregate into broader social and economic impacts.
Positions spanning industry, academia, and public service.
Selected peer-reviewed articles, books, patents, and book chapters.
I’m always interested in conversations about measurement, AI, and how to make better decisions with better data. If you’re working on something where these things matter, I’d enjoy hearing from you.
Get in Touch