Explore Data Collection
There is much disagreement about the extent to which financial incentives motivate study participants. We elicit preferences for being paid for completing a survey, including a one-in-twenty chance of winning a $100 electronic gift card, a guaranteed electronic gift card with the same expected value...
Particulate matter (PM) is a major, clinically important air pollutant. A large portion of emitted PM crosses borders, damaging health outside of its originating jurisdiction, but due in part to technical obstacles these pollutant flows remain unregulated. Proposed attribution approaches assume that...
The Census Bureau's Longitudinal Business Database (LBD) underpins many studies of firm-level behavior. It tracks longitudinally all employers in the nonfarm private sector but lacks information about business financing and owner characteristics. We address this shortcoming by linking LBD...
December 13, 2024 - Chapter
The Office of Management and Budget's Statistical Directive No. 15, first issued in 1977, revised in 1997, and revised again in 2024, sets minimum standards for federal government collection and reporting of data by race and ethnicity. We find that Directive 15 does accomplish its intended purpose of...
Author(s) - Jose Asturias, Emin Dinlersoz, John C. Haltiwanger, Rebecca J. Hutchinson & Alyson Plumb
How are applications to start new businesses related to aggregate economic activity? This paper explores the properties of three monthly business application series from the U.S. Census Bureau's Business Formation Statistics as economic indicators: all business applications, business applications...
Macro data suggests a convex relationship between inflation and economic slack, but identifying causality is challenging. Using micro data from large panel surveys of UK and US firms we show that the response of prices to demand shocks is also convex at the firm level. We obtain similar results...
Author(s) - Anton Korinek
Large language models (LLMs) have seen remarkable progress in speed, cost efficiency, accuracy, and the capacity to process larger amounts of text over the past year. This article is a practical guide to update economists on how to use these advancements in their research. The main innovations...
Author(s) - Prachi Srivastava, Nicholas Bloom, Philip Bunn, Paul Mizen, Gregory Thwaites & Ivan Yotzov
We analyse the importance of climate-related investment using a large economy-wide survey of UK firms. Over half of firms expect climate change to have a positive impact on their investment in the medium term, with around a quarter expecting a large impact of over 10%. Around two-thirds of these...
We introduce DOSE (Dynamically Optimized Sequential Experimentation) to elicit preference parameters. DOSE starts with a model of preferences and a prior over the parameters of that model, then dynamically chooses a customized question sequence for each participant according to an experimenter-selected...
We implement a survey-based randomized information treatment that generates independent variation in the inflation expectations and the uncertainty about future inflation of European households. This variation allows us to assess how both first and second moments of inflation expectations separately...
Statistical agencies have a dual mandate to provide accurate data and protect the privacy and confidentiality of data subjects. These mandates are fundamentally at odds and therefore must be balanced: more accurate data reduces privacy, while privacy protections introduce error that reduces accuracy...
Synthetic microdata (data retaining the structure of original microdata while replacing original values with modeled values for the sake of privacy) presents an opportunity to increase access to useful microdata for data users while meeting the privacy and confidentiality requirements for data...
Primary historical sources are often by-passed for secondary sources due to high human costs of accessing and extracting primary information, especially in lower-resource settings. We propose a supervised machine-learning approach to the natural language processing of Chinese historical data. An...
As distorted maps may mislead, Natural Language Processing (NLP) models may misrepresent. How do we know which NLP model to trust? We provide comprehensive guidance for selecting and applying NLP representations of patent text. We develop novel validation tasks to evaluate several leading NLP models...
National surveys are crucial for estimating key economic aggregates, including the unemployment rate, labor force participation, and household expenditures. The accuracy of these indicators is increasingly under scrutiny due to declining response rates and the consequent risk of nonresponse bias....
This paper considers the DeFi intermediation chain (the market structure that underlies the creation and distribution of ETH, the native cryptocurrency of Ethereum) to examine how information asymmetry shapes intermediation rents. We argue that using proof-of-stake blockchain technology in DeFi leads to...
The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains...
We study the distribution of political speech across U.S. firms. We develop a measure of political engagement based on firms' communications (earnings calls, regulatory filings, and social media), by training a large language model to identify statements that contain political opinions. Using these...
Large literatures have analyzed racial and ethnic disparities in economic outcomes and access to the safety net. For such analyses that rely on survey data, it is crucial that survey accuracy does not vary by race and ethnicity. Otherwise, the observed disparities may be confounded by differences in...
Threshold models have been widely used to analyze interdependent behavior, yet empirical research identifying people's thresholds is nonexistent. We introduce an incentivized method for eliciting thresholds and use it to study support for affirmative action in a large, stratified sample of the U.S....
The potential impact of nonresponse on election polls is well known and frequently acknowledged. Yet measurement and reporting of polling error has focused solely on sampling error, represented by the margin of error of a poll. Survey statisticians have long recommended measurement of the total...
This paper analyses the response of UK firms to monthly CPI inflation releases using high-frequency data from a large business survey. Firms' inflation perceptions and expectations respond within hours of new inflation data releases. Firm expectations are most responsive when inflation coverage in...
We combine a customized survey and randomized controlled trial (RCT) to study the effect of higher-order beliefs on U.S. retail investors' portfolio allocations. We find that investors' higher-order beliefs about stock market returns are correlated with but distinct from their first-order beliefs....
Using data from a large survey of American households, we compare density forecasts elicited with bins- and scenarios-based questions. We show that inflation density forecasts are sensitive to the survey question designs used to elicit them. The within-person discrepancy is smaller, but still...
We develop a method to identify the individual latent propensity to select into treatment and marginal treatment effects. Identification is achieved with survey data on individuals' subjective expectations of their treatment propensity and of their treatment-contingent outcomes. We use the method to...
Household surveys suffer from persistent and growing underreporting. We propose a novel procedure to adjust reported survey incomes for underreporting by estimating a model of misreporting whose main parameter of interest is the elasticity of regional national accounts income to regional survey...
We create a firm-level ChatGPT investment score, based on conference calls, that measures managers' anticipated changes in capital expenditures. We validate the score with interpretable textual content and its strong correlation with CFO survey responses. The investment score predicts future capital...
During episodes such as the global financial crisis and the Covid-19 pandemic, China experienced notable fluctuations in its GDP growth and key expenditure components. To explore the primary sources of these fluctuations, we construct a comprehensive dataset of GDP and its components in both nominal...
Author(s) - Alexander Dietrich, Edward S. Knotek II, Kristian O. Myrseth, Robert W. Rich, Raphael Schoenle & Michael Weber
This paper introduces a novel measure of consumer inflation expectations: We elicit and combine inflation forecasts across categories of personal consumption expenditure to form an aggregated measure of inflation expectations. Drawing on nearly 60,000 respondents, our data comprise the early low...
Analyses of self-reported-well-being (SWB) survey data may be confounded if people use response scales differently. We use calibration questions, designed to have the same objective answer across respondents, to measure dimensional (i.e., specific to an SWB dimension) and general (i.e., common...
The evolution of the IMS and IFS in the past several hundred years can be viewed through the lens of the Copernican heliocentric system developed over 500 years ago. We trace out the evolution across regimes of the IMS and IFS in terms of network representations of the Copernican system. We provide...
We investigate the potential for Large Language Models (LLMs) to enhance scientific practice within experimentation by identifying key areas, directions, and implications. First, we discuss how these models can improve experimental design, including improving the elicitation wording, coding...
The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These high-quality links allow researchers in the social sciences and other disciplines to construct a...
When race is not directly observed, regulators and analysts commonly predict it using algorithms based on last name and address. In small business lending, where regulators assess fair lending law compliance using the Bayesian Improved Surname Geocoding (BISG) algorithm, we document large prediction...

September 1, 2023 - Article
Author(s) - Michael Cafarella, Gabriel Ehrlich, Tian Gao, John C. Haltiwanger, Matthew D. Shapiro & Laura Zhao
One of the perennial challenges of constructing price indices like the Consumer Price Index (CPI) is that products change over time. This is often cited as a concern with regard to rapidly evolving products on the technological frontier, such as personal computers, cellphones, and automobiles. One...
How does access to data shape the rate, quality, and policy relevance of academic research? To shed light on this question, we study the impact of access to confidential microdata through the progressive geographic expansion of the U.S. Census Bureau's Federal Statistical Research Data Centers...
Author(s) - Michael Cafarella, Gabriel Ehrlich, Tian Gao, John C. Haltiwanger, Matthew D. Shapiro & Laura Zhao
This paper uses machine learning (ML) to estimate hedonic price indices at scale from item-level transaction and product characteristics. The procedure uses state-of-the-art approaches from hedonic econometrics and implements them with a neural network ML approach. Applying the methodology to...
Author(s) - Gabriel Ehrlich, John C. Haltiwanger, Ron S. Jarmin, David Johnson, Ed Olivares, Luke W. Pardue, Matthew D. Shapiro & Laura Zhao
This paper explores methods for constructing price indices from item-level transactions data on prices, quantities, and product attributes. The paper evaluates approaches that are feasible at scale, i.e., across the wide range of products, disparate encoding of attributes, and rapid product turnover...
Are speculators driving up oil prices? Should we raise energy prices to slow global warming? The present study takes a small number of such questions and compares the views of economic experts with those of the public. This comparison uses a panel of more than 2000 respondents from YouGov with the...
While linking records across large administrative datasets [big data] has the potential to revolutionize empirical social science research, many administrative data files do not have common identifiers and are thus not designed to be linked to others. To address this problem, researchers have...
Author(s) - Luke Sherman, Jonathan Proctor, Hannah Druckenmiller, Heriberto Tapia & Solomon M. Hsiang
The United Nations Human Development Index (HDI) is arguably the most widely used alternative to gross domestic product for measuring national development. This is in large part due to its multidimensional nature, as it incorporates not only income, but also education and health. However, the low...
Good data on the size and composition of the independent contractor workforce are elusive, with household survey and administrative tax data often disagreeing on levels and trends. We carried out a series of focus groups to learn how self-employed independent contractors speak about their work....
This paper estimates consumer demand for firearms with the aim of predicting the likely impacts of firearm regulations on the number and types of guns in circulation. We first conduct a stated-choice-based conjoint analysis and estimate an individual-level demand model for firearms. We validate our...
Author(s) - Deniz Dutz, Michael Greenstone, Ali Hortaçsu, Santiago Lacouture, Magne Mogstad, Azeem M. Shaikh, Alexander Torgovitsky & Winnie van Dijk
We examine why minority and poor households are often underrepresented in studies that require active participation. Using data from a serological study with randomized participation incentives, we find large participation gaps by race and income when incentives are low, but not when incentives are...
Remotely sensed measurements and other machine learning predictions are increasingly used in place of direct observations in empirical analyses. Errors in such measures may bias parameter estimation, but it remains unclear how large such biases are or how to correct for them. We leverage a new...
Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature within large corpora. Across these diverse...
Author(s) - G. Jacob Blackwood, Cindy Cunningham, Matthew Dey, Lucia S. Foster, Cheryl Grim, John C. Haltiwanger, Rachel L. Nesbit, Sabrina Wulff Pabilonia, Jay Stewart, Cody Tuttle & Zoltan Wolf
An important gap in most empirical studies of establishment-level productivity is the limited information about workers' characteristics and their tasks. Skill-adjusted labor input measures have been shown to be important for aggregate productivity measurement. Moreover, the theoretical literature on...
Understanding factors affecting the direction of innovation is a central aim of research in the economics of innovation. Progress on this topic has been inhibited by difficulties in measuring distance and movement in knowledge space. We describe a methodology that infers the mapping of the knowledge...
We use data from a large panel survey of UK firms to analyze the economic drivers of price setting since the start of the Covid pandemic. Inflation responded asymmetrically to movements in demand. This helps to explain why inflation did not fall much during the negative initial pandemic demand shock...
How do people compare bundles of social-distancing behaviors? During the COVID pandemic, we showed 676 online respondents in the US, UK, and Israel 30 pairs of brief videos of acquaintances meeting. We asked them to indicate which in each pair depicted greater risk of COVID infection. Their choices...