Scraping the abstracts of the discussion papers published by i4replication.org.
Last active: March 5, 2025 01:05
Gist: soodoku/02b6442f39e1c241896a079ba9923026
i4replication scraper
| url | abstract |
|---|---|
| https://ideas.repec.org/p/zbw/i4rdps/1.html | The long-recognized spurious regressions problem can lead to mistaken inference in panel instrumental variables (IV) estimation. Spurious correlations arising from correlated cycles in finite time horizons can make irrelevant instruments appear strong, with signable consequences for estimated IV coefficients, or interfere with valid inference of causal effects from IV coefficients estimated using relevant instruments. The inclusion of time fixed effects in interacted specifications does not always resolve these problems. We demonstrate these concerns by revisiting recent studies of the causal origins of conflict. We offer diagnostic and corrective recommendations for avoiding the pitfalls arising from time series exhibiting persistence. |
| https://ideas.repec.org/p/zbw/i4rdps/2.html | Increasing evidence suggests that the reproducibility and replicability of scientific findings are threatened by researchers employing questionable research practices (QRPs) in order to achieve publishable, positive and significant results. Numerous metrics have been developed to determine replication success, but it has not yet been established how well those metrics perform in the presence of QRPs. This paper aims to compare the performance of different metrics quantifying replication success in the presence of four different types of QRPs: cherry picking, questionable interim analyses, questionable inclusion of covariates, and questionable subgroup analyses. Our results show that the metric based on the golden sceptical p-value does better in maintaining low values of overall type-I error rate, but often needs larger replication sample sizes, especially when severe QRPs are employed. |
| https://ideas.repec.org/p/zbw/i4rdps/3.html | Selective publication is among the most-cited reasons for widespread replication failures. I show in a simple model of the publication process that the replication rate is completely unresponsive to the suppression of insignificant results. I then show that the expected replication rate falls below its intended target owing to issues with common power calculations in replication studies, even in the absence of other factors such as p-hacking or heterogeneous treatment effects. I estimate an empirical model to evaluate if issues with power calculations alone are sufficient to explain the low replication rates observed in large-scale replication studies. The model produces replication rate predictions (using only data from original studies) that are almost identical to observed replication rates in experimental economics and social science. In psychology, the model explains two-thirds of the gap between the replication rate and its intended target. I conclude by discussing alternative measures of replication that are more responsive to selective publication. | |
| https://ideas.repec.org/p/zbw/i4rdps/4.html | A common approach to identifying the causal impact of immigration on outcomes involves using a "shift-share" or Bartik instrument exploiting country-specific immigration inflows (shifts) and location-specific prior shares for the same countries. New econometric findings suggest this instrumental variables approach uses identifying variation not from the shifts, as previously believed, but rather from the shares and suggest a battery of checks to explore the sensitivity of estimates. In this note, I first replicate Hunt and Gauthier-Loiselle (2010) which estimates the effects of immigration on innovation via patenting, and second deploy these new checks from the econometric literature on shift-share instruments. I find that the results of Hunt and Gauthier-Loiselle (2010) (skilled immigration increases innovation and has positive spillovers on the innovation of others) replicate and hold up well to these new tests. |
| https://ideas.repec.org/p/zbw/i4rdps/5.html | This analysis is an independent replication of Heft-Neal et al. (2020). The original authors (HBBVB) provide evidence that particulate matter air pollution increases infant mortality in 30 African nations between 2000 and 2015. They provide three effect estimates. Using ordinary least squares, a 10 μg/m³ increase in PM2.5 exposure results in an estimated 8.6% increase in infant mortality. Using dust in the Bodélé depression as an instrumental variable, the same exposure increases infant mortality by 23.6%. Using rainfall in the Bodélé depression, the same exposure increases infant mortality by 24.3%. Using similar data and independently developed procedures I find corresponding estimates of 3.4%, 31.0%, and 29.7%. |
| https://ideas.repec.org/p/zbw/i4rdps/6.html | The social sciences face a replicability crisis. A key determinant of replication success is statistical power. We assess the power of political science research by collating over 16,000 hypothesis tests from about 2,000 articles. Using generous assumptions, we find that the median analysis has about 10% power and that only about 1 in 10 tests have at least 80% power to detect the consensus effects reported in the literature. We also find substantial heterogeneity in tests across research areas, with some being characterized by high power but most having very low power. To contextualize our findings, we survey political methodologists to assess their expectations about power levels. Most methodologists greatly overestimate the statistical power of political science research. | |
| https://ideas.repec.org/p/zbw/i4rdps/7.html | Tsai, Trinh, & Liu (2021) in their initial study sought to examine whether anticorruption efforts in authoritarian regimes affected public opinion of these regimes through not just direct effects, but also indirect effects through affecting evaluations of competence and morality. Conducting a conjoint study in China where respondents were asked to choose between two potential local officials, Tsai et al. found that 26% of the total effect of these officials punishing corrupt subordinates was estimated to come through indirect effects that go through evaluations of morality and competence. Using their code, I reproduced their original findings, and did not find any notable coding errors while doing so. Then, taking advantage of the fact that Tsai et al. included several additional covariates beyond punishment in their experiment, I engaged in an extension of the original model, using the same method, to examine whether economic performance characteristics have indirect effects on evaluation through competence and morality as well. I found results that suggest that economic performance does have an indirect effect on preferences through competence and morality. I then tested the robustness of Tsai et al.'s original heterogeneous sensitivity tests by varying cut points on two demographic variables and found that their findings of a lack of heterogeneous sensitivity remain robust to different cut-points. In all, my efforts suggest that Tsai et al.'s methods are valid and their findings robust. |
| https://ideas.repec.org/p/zbw/i4rdps/8.html | Amazon's Mechanical Turk is a very widely-used tool in business and economics research, but how trustworthy are results from well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias and over-reliance on results from plausibly under-powered studies. Even ignoring questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself erodes substantially the credibility of these studies' conclusions. The extent of the problems varies across the business, economics, management and marketing research fields (with marketing especially afflicted). The problems are not getting better over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility. |
| https://ideas.repec.org/p/zbw/i4rdps/9.html | 'Stock estimates' of missing women suggest that the problem is concentrated in South and East Asia and among young children. In contrast, 'flow estimates' suggest that gender bias in mortality is much larger, is as severe among adults as it is among children in India and China, and is larger in Sub-Saharan Africa than in India and China. We show that the different stock and flow measure results rely on the choice of the reference standard for mortality and an incomplete correction for different disease environments in the flow measure. Alternative reference standards reconcile the results of the two measures. | |
| https://ideas.repec.org/p/zbw/i4rdps/10.html | Kertzer (2022) conducts a meta-analysis of parallel experiments on samples of political elites and ordinary citizens. He examines whether the average treatment effect for elites is significantly different from the average treatment effect for citizens, finding that only 19 of 162 (11.7%) difference-in-difference estimates are statistically significant after adjusting for the false discovery rate. He also finds that elites and masses hold similar foreign policy attitudes after controlling for their demographic characteristics. In this replication report, we begin by running robustness and heterogeneity tests for the first claim. We find that the results survive many robustness tests. We also find, however, that only a small number of these treatments significantly affected masses (N=28) or elites (N=30). This low rate suggests the possibility that almost all of these experiments failed to successfully manipulate either masses or elites. If so, we may not be able to conclude with confidence that masses and elites respond similarly to experiments until political scientists produce more experiments with actual treatment effects or with successful manipulation checks in cases of null effects. In the second part of this replication report, we conceptually replicate the second Kertzer analysis, finding a strong correlation between elite and mass political decisions and attitudes, thus confirming Kertzer's analysis. |
| https://ideas.repec.org/p/zbw/i4rdps/11.html | Hundreds of studies have shown that air pollution affects health in the very short run. This played a key role in setting air quality standards. Yet, estimated effect sizes can vary widely across studies. Analyzing the results published in epidemiology and economics, we find that publication bias and a lack of statistical power could lead some estimates to be inflated. We then run real data simulations to identify the design parameters causing these issues. We show that this exaggeration may be driven by a small number of exogenous shocks, instruments with limited strength or sparse outcomes. Other literatures relying on comparable research designs could also be affected by these issues. Our paper provides a principled workflow to evaluate and avoid the risk of exaggeration when conducting an observational study. |
| https://ideas.repec.org/p/zbw/i4rdps/12.html | We conduct a replication of Settele (2022), an online survey experiment designed to find out how individuals' beliefs about the gender wage gap affect their policy preferences. We reproduce Results 1 and 2 of the study: how prior beliefs about the wage gap are distributed among individuals and how an information treatment causally affects policy demand. Our re-coded replication shows that the reported results are robust. |
| https://ideas.repec.org/p/zbw/i4rdps/13.html | Reanalyses of empirical studies and replications in new contexts are important for scientific progress. Journals in economics increasingly require authors to provide data and code alongside published papers, but how much does the economics profession actually replicate? This paper summarizes existing replication definitions and reviews how much economists replicate other scholars' work. We argue that in order to counter incentive problems potentially leading to a replication crisis, replications in the spirit of Merton's 'organized skepticism' are needed, which we call 'policing replications'. We review leading economics journals to show that policing replications are rare and conclude that more incentives to replicate are needed to reap the fruits of rising transparency standards. |
| https://ideas.repec.org/p/zbw/i4rdps/14.html | The scientific method is predicated on transparency, yet the pace at which transparent research practices are being adopted by the scientific community is slow. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation, and data falsification. To mitigate these problems and bolster research credibility, open data and preregistration practices have gained traction in the natural and social sciences. However, the extent of their adoption in different disciplines is unknown. We introduce procedures to identify the transparency of a research field using large-scale text analysis and machine learning classifiers. Using political science and international relations as an illustrative case, we examine 93,931 articles across the top 160 political science and international relations journals between 2010 and 2021. We find that approximately 21% of all statistical inference papers have open data and 5% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly. |
| https://ideas.repec.org/p/zbw/i4rdps/15.html | Duflo (2001) exploits a 1970s schooling expansion in Indonesia to estimate the returns to schooling. Under the study's difference-in-differences (DID) design, two patterns in the data (shallower pay scales for younger workers and negative selection in treatment) can violate the parallel trends assumption and bias results upward. In response, I follow up later, test for trend breaks timed to the intervention, and perform changes-in-changes (CIC). I also correct data errors, cluster variance estimates, incorporate survey weights to correct for endogenous sampling, and test for (and detect) instrument weakness. Weak-identification-robust inference yields imprecise, positive estimates. CIC estimates tilt slightly negative. |
| https://ideas.repec.org/p/zbw/i4rdps/16.html | This report presents a replication of Altindag et al. (2022) performed at the Oslo Replication Games in 2022. Altindag et al. (2022) estimate the effects of an age-specific lockdown on mental health outcomes and mobility among adults aged 65 and older in Turkey, using a regression discontinuity design. The authors find a decline in mobility, with a one-day decrease in the number of days spent outside and an increase in the probability of never going out by 30 percentage points. These point estimates are statistically significant at the 1% level. The mobility restrictions lead to a worsening in mental health outcomes of approximately 0.2 standard deviations, statistically significant at the 10% level in their preferred specification. In this paper we accomplish two things. First, we successfully reproduce Altindag et al.'s main findings. Second, we test the robustness of the results to a small number of changes to their preferred estimations by (1) not clustering the standard errors on the running variable, (2) not including control variables, and (3) calculating the optimal bandwidth using another technique. Point estimates for mobility outcomes are stable to all three manipulations, and standard errors only change marginally. Point estimates and standard errors for the mental health outcomes are somewhat more sensitive, especially to changing the optimal bandwidth selection method. However, the observed changes are reasonably expected when applying data-driven model selection methods to noisy data (to avoid overfitting, it is likely preferable to apply a less data-driven approach like the original authors did). Our general impression is that the original analyses and results are both theoretically plausible and credible, despite some defensible model dependencies. |
| https://ideas.repec.org/p/zbw/i4rdps/17.html | In Altindag et al. (2022), we estimate the effects of an age-specific lockdown policy on mobility and mental health outcomes among adults aged 65 and older in Turkey using a regression discontinuity design. Bonander et al. (2023) successfully replicate all our main findings. They argue that the estimates for mobility outcomes are all robust to alternative sensitivity checks while some of the estimates for mental health (which were statistically significant around the 5-9 percent level) lose significance at the conventional level of 10 percent in the more conservative specifications. In this reply, we provide approximately 7,000 additional estimates that comprise a near universe of RD estimates for all our outcomes, each possible monthly bandwidth, and each possible combination of covariate adjustment, kernel selection, estimation methodology, standard error adjustment, and kernel weighting selection. This comprehensive analysis shows that our original results are robust to these choices. We show that Bonander et al. (2023) rely on a selection of very narrow bandwidths that produce highly sensitive and uninformative estimates due to overfitting. We also show that Bonander et al. (2023) report imprecise estimates, which are outliers in the distribution of all estimates that can be reported. We conclude that broader statistical tests are more informative for robustness checks. |
| https://ideas.repec.org/p/zbw/i4rdps/18.html | The Credibility Revolution advances quantitative research designs intended to identify causal effects from observed data. The ensuing emphasis on internal validity however has enabled the neglect of construct and external validity. This article develops a framework we call causal specification. The framework formally demonstrates the joint necessity of assumptions regarding internal, construct and external validity for causal generalization. Indeed, the lack of any of the three types of validity undermines the Credibility Revolution's own goal to understand causality deductively. Without assumptions regarding construct validity, one cannot accurately label the cause or outcome. Without assumptions regarding external validity, one cannot label the conditions enabling the cause to have an effect. These assumptions ultimately are founded on qualitative and theoretical understandings of a causal process. As a result, causal specification clarifies the central role of qualitative research in underwriting deductive understandings of causality in quantitative research. | |
| https://ideas.repec.org/p/zbw/i4rdps/19.html | Gethin, Martínez-Toledano and Piketty (2022) analyze the long-run evolution of political cleavages using a new database on socioeconomic determinants of voting from approximately 300 elections in 21 Western democracies between 1948 and 2020. They find that, in the 1950s and 1960s, voting for the "left" was associated with lower-educated and low-income voters. After that, voting for the "left" has gradually become associated with higher-educated voters, while high-income voters have continued to vote for the "right". In the 2010s, there is a disconnection between the effects of income and education on voting. In this replication, we first conduct a computational reproduction, using the replication package provided by the authors. Second, we do a robustness replication testing to what extent the original results are robust to i) restricting the sample to "core" left and right parties, ii) analyzing the top 80% versus bottom 20%, iii) weighting by population, iv) dropping control variables, and v) using country fixed effects. The main results of the paper are found to be largely replicable and robust. |
| https://ideas.repec.org/p/zbw/i4rdps/20.html | Montero (2022) exploits a discontinuity in a land reform in El Salvador and reports two main findings. First, relative to outside-owned haciendas operated by contract workers, the productivity of worker-owned cooperatives is higher for staple crops and lower for cash crops. Second, cooperative property rights increase workers' incomes and compress wage distributions. In this comment, we show that the latter result rests on two mistakes: three-quarters of the observations are duplicates and income inequality is calculated over too few workers to be meaningful. When corrected, the data sources and research design provide no credible evidence regarding the causal effects of ownership structure on income levels and inequality. |
| https://ideas.repec.org/p/zbw/i4rdps/21.html | Danzer and Lavy (2018) study how the duration of paid parental leave affects children's educational performance using data from PISA. An extension of the maximum duration from 12 to 24 months in Austria had no statistically significant effect on average, but the authors highlight the existence of large and statistically significant heterogeneous effects that vary in sign depending on the education of mothers and children's gender. The policy increased the scores obtained by sons of highly educated mothers by 33% of a standard deviation (SD) in Reading and 40% SD in Science. On the contrary, sons of low-educated mothers experienced a decrease of 27% SD in Reading and 23% SD in Science. In this article, I replicate their study following the recommended estimation procedure, taking into account both the survey's stratified two-stage sample design and the fact that PISA relies on imputation to derive student scores. I show that the estimates of the effects of the parental leave extension become substantially smaller in absolute magnitude and non-significant. |
| https://ideas.repec.org/p/zbw/i4rdps/22.html | The regression discontinuity (RD) design offers identification of causal effects under weak assumptions, earning it a position as a standard method in modern political science research. But identification does not necessarily imply that causal effects can be estimated accurately with limited data. In this paper, we highlight that estimation under the RD design involves serious statistical challenges and investigate how these challenges manifest themselves in the empirical literature in political science. We collect all RD-based findings published in top political science journals in the period 2009-2018. The distribution of published results exhibits pathological features; estimates tend to bunch just above the conventional level of statistical significance. A reanalysis of all studies with available data suggests that researcher discretion is not a major driver of these features. However, researchers tend to use inappropriate methods for inference, rendering standard errors artificially small. A retrospective power analysis reveals that most of these studies were underpowered to detect all but large effects. The issues we uncover, combined with well-documented selection pressures in academic publishing, cause concern that many published findings using the RD design may be exaggerated. | |
| https://ideas.repec.org/p/zbw/i4rdps/23.html | Macroeconomic variables like unemployment, inflation, trade, or GDP are not set in stone: they are preliminary estimates that are constantly revised by statistical agencies. These data revisions, or data vintages, often provide conflicting information about the size of a country's economy or its level of development, reducing our confidence in established findings. Would researchers come to different conclusions if they used different vintages? To answer this question, I survey all articles published in a top political science journal between 2005 and 2020. I replicate three prominent articles and find that the use of different vintages can lead to different statistical results, calling into question the robustness of otherwise rigorous empirical research. These findings have two practical implications. First, researchers should always be transparent about their data sources and vintages. Second, researchers should be more modest about the precision and accuracy of their point estimates, since these estimates can mask large measurement errors. | |
| https://ideas.repec.org/p/zbw/i4rdps/24.html | Dhar et al. (2022) examine the effect of a gender attitude change program in secondary schools in India. In their preferred specification, the authors show that the program made the students report more gender-egalitarian attitudes by 0.18 of a standard deviation, and shifted self-reported behaviors to be more aligned with gender-progressive norms by 0.20 standard deviations (both significant at 1% level). In contrast, they found no effect on girls' aspirations, as these were already high before the intervention. The effects did not attenuate between the first end-line (right after the programme was completed) and the second (two years later). To put the paper's results in perspective, we first comment on the authors' deviations from their pre-registration and pre-analysis plans, provide detailed power calculations, and add multiple-hypothesis-testing-adjusted standard errors. Second, we show that the paper's results are perfectly reproducible. Third, we show that the results are robust to excluding control variables, and alternative ways of constructing indices and dealing with non-response. | |
| https://ideas.repec.org/p/zbw/i4rdps/25.html | Vellore Arthi, Brian Beach and W. Walker Hanlon (2022) investigate the effect of the Lancashire Cotton Famine on mortality, accounting for the migration response to the downturn. They use difference-in-differences to estimate the effect of the cotton famine on mortality. To account for the migration response to the cotton famine, they construct a linked dataset giving mortality rates by district of residence during the cotton famine, rather than by district of residence at the time of death. They find that the cotton famine increased mortality in cotton-textile producing districts, and that accounting for migration matters, in the sense that their estimates would have been markedly different had they not accounted for it. I check that ABH results are fully reproducible using their data and code, and that their claims are robust to (1) decreasing the age window for building the linked dataset, (2) modifying the specification and (3) computing different standard errors. The only significant discrepancy in results is that I find stronger effects of the cotton famine when I decrease the age window for building the linked dataset, likely because this reduces measurement errors. | |
| https://ideas.repec.org/p/zbw/i4rdps/26.html | This report presents a replication of Andersson (2019) performed at the Toronto Replication Games in 2023. Andersson (2019) estimates the effect of carbon taxes on CO₂ emissions in Sweden using the synthetic control method. His findings indicate a 10.9 percent reduction in emissions during the 1990-2005 period, which equates to -0.29 metric tons of CO₂ per capita in an average year. The results from an in-space placebo test show that Sweden had the highest post/pre-mean squared prediction error (MSPE) ratio, resulting in a placebo-based p-value of 1/15=0.067. We successfully reproduce these findings and conduct a series of pre-specified replication analyses to examine how robust the findings are to model specification choices. We run 14 alternative specifications with various combinations of pre-treatment outcome values, with and without covariates. The median point estimate from our replication analyses is -0.28 metric tons of CO₂ per capita (min: -0.34, max: -0.17). Placebo-based p-values are equal to 1/15=0.067 in seven specifications, 2/15=0.13 in six, and 4/15=0.27 in one. |
| https://ideas.repec.org/p/zbw/i4rdps/27.html | Borowiecki (2022) studies the influence of teachers on the style of their students in the domain of musical composition. The author finds that realized student-teacher pairs are on average 0.2-0.3 standard deviations more similar than unrealized, but possible, student-teacher pairs. In this report we provide the results of our replication of Borowiecki (2022). We direct our attention to the following tasks: 1) replicating the outcome variables used in the paper, starting from the raw data, and generating alternative measures of similarity between students and teachers; 2) testing the validity of the random teacher-student pairing, a key assumption for the validity of the estimation strategy employed in the paper. We can replicate most of the outcome variables, but not all of them, due to incomplete raw data. Our alternative measures of similarity confirm the robustness of the original results. We find significantly different characteristics between paired and unpaired students, suggesting that matching between students and teachers does not occur randomly. However, controlling for these characteristics in the main regressions leads to quantitatively similar results to the ones reported in the original paper. |
| https://ideas.repec.org/p/zbw/i4rdps/28.html | We replicate the analysis conducted by Frederiksen (2022a). We focus on assessing the computational and robustness replicability of their work. We find that their main exhibits and supplementary analysis are replicable, both when running their original Stata replication package and when we attempt to replicate their findings from scratch in R. We also conduct additional robustness checks by estimating additional specifications and by subsetting the dataset by the time taken by the respondent to complete the survey. We again find that their work is robust to our battery of alternative specifications. |
| https://ideas.repec.org/p/zbw/i4rdps/29.html | The relationship between social status and ethical behavior is a widely debated topic in research. In their study, Gsottbauer et al. (2022b) investigate whether higher socio-economic status is linked to lower ethical behavior, using data from two large survey experiments involving over 11,000 participants. In this replication project, we test the computational reproducibility and robustness replicability of their study, using the provided data and code from the replication package (Gsottbauer et al., 2022a). Nearly all the figures and tables were reproducible; in the process of reproducing the results, some minor rounding or transcription errors were discovered. In testing the robustness replicability, we find consistent results for our extensions. The effort for the replication was manageable, even though the authors treat categorical variables as numeric, or use manually coded interaction variables (i.e. in regression models). In summary, we applaud the transparency of Gsottbauer et al. (2022b) in facilitating replications, and make some general recommendations for further improvements for data-analysis studies. |
| https://ideas.repec.org/p/zbw/i4rdps/30.html | Hjort and Poulsen (2019) examine how fast Internet affects employment in Africa. Their difference-in-differences estimates exploit differences in the time at which locations were connected to the network of fast Internet cables. The authors find that fast Internet increases employment rates and that this effect is driven by high-skilled occupations. The authors show that, if anything, employment inequality falls when fast Internet becomes available. This study uses replication materials made available with the original article. It first attempts to reproduce results of the original paper from available replication materials. Most results are reproducible, but some are not. Second, this study presents a sensitivity analysis that tests how reported results vary depending on whether a specific country (or region) is excluded from the sample. The paper's results are found to differ in their sensitivity to the composition of the sample of observations. This analysis also helps to uncover that some specifications that use a large number of fixed effects might actually be too demanding for reasonable identification to be achieved from the data. |
| https://ideas.repec.org/p/zbw/i4rdps/31.html | Placebo tests, where a null result is used to support the validity of the research design, are common in economics. Such tests provide an incentive to underreport statistically significant tests, a form of reversed p-hacking. Based on a pre-registered analysis plan, we test for such underreporting in all papers meeting our inclusion criteria (n=377) published in 11 top economics journals between 2009 and 2021. If the null hypothesis is true in all tests, 2.5% of them should be statistically significant at the 5% level with an effect in the same direction as the main test (and 5% in total). The actual fraction of statistically significant placebo tests with an effect in the same direction is 1.29% (95% CI [0.83, 1.63]), and the overall fraction of statistically significant placebo tests is 3.10% (95% CI [2.2, 4.0]). Our results provide strong evidence of selective underreporting of statistically significant placebo tests in top economics journals. | |
| https://ideas.repec.org/p/zbw/i4rdps/32.html | Williams (2022) ties the political participation of Blacks to historical lynchings that occurred in the United States. Her findings document lower Black voter registration rates in southern counties with a greater number of historical lynchings. We show that this effect is driven by four outlier counties with relatively high Black lynching rates. Excluding these counties from the analysis yields a point estimate that is no longer statistically significant. Dropping the ninety-fifth percentile lynching rates and correcting the errors in voter registration rates rule out the effect size reported by Williams (2022), which now becomes close to zero and statistically insignificant. We also show that the main results are highly sensitive to the way lynching and voter registration rates are measured. | |
| https://ideas.repec.org/p/zbw/i4rdps/33.html | Atwood (2022) analyzes the effects of the 1963 U.S. measles vaccination on long-run labor market outcomes, using a generalized difference-in-differences approach. We reproduce the results of this paper and perform a battery of robustness checks. Overall, we confirm that the measles vaccination had positive labor market effects. While the negative effect on the likelihood of living in poverty and the positive effect on the probability of being employed are very robust across the different specifications, the headline estimate (the effect on earnings) is more sensitive to the exclusion of certain regions and survey years. | |
| https://ideas.repec.org/p/zbw/i4rdps/34.html | Henry, Zhuravskaya, and Guriev (2022) examine whether people are willing to share "alternative facts" espoused by right-wing populist parties before the 2019 European elections in France and how this interacted with the availability of fact-checking information. They find that both imposed and voluntary fact-checking reduce the likelihood of sharing false statements by approximately 45%, and that imposed and voluntary fact-checking have similar effect sizes. We reproduce these findings and introduce several alternative estimates to assess the robustness of the original results, including resolving an inconsistency in the handling of pre-treatment controls. Overall, our results align with the results of the original paper. The differences we find are small in absolute magnitude but, since many effects were small, not always trivial in terms of relative differences. This replication supports the conclusions of the original paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/35.html | We test the reproducibility and replicability of Dincecco et al. (2022), which reports a positive relationship between pre-colonial interstate warfare and long-run development patterns across India. Overall, we confirm that all of the study's estimates are computationally reproducible by using both the provided replication package in Stata and code written by the present authors in R. We test for and find no evidence of data manipulation in the final datasets. Concerning direct replicability, we consider different ways of measuring distance to conflicts and also alternative proxies for both the dependent variable and variables which capture channels by which the main effects operate. We are able to replicate the magnitude and significance of the estimated coefficient on conflict exposure in most of the tests, noting that while most estimates are substantively in line with the original study, some alternative measures of distance to conflict imply different magnitudes for estimates, and proxy estimates are sensitive to both the time period and type of conflict considered. | |
| https://ideas.repec.org/p/zbw/i4rdps/36.html | Bisbee and Honig (2022) examine the effect of the COVID-19 pandemic on voting for Bernie Sanders in the 2020 Democratic Party primary using a difference-in-differences design, finding evidence that exposure to COVID-19 resulted in a 7-15 percentage point increase in voting for Biden. The study also uses a regression design with district-level fixed effects to estimate the effect of the COVID-19 pandemic on voting for anti-establishment candidates during the US 2020 House primaries. It finds evidence that an increase in COVID cases was associated with a decline in voting for anti-establishment candidates in general, and for those endorsed by the Tea Party. We re-run the code for all tests in this paper, successfully reproducing its results in a preliminary replication. We then use the De Chaisemartin and D'Haultfoeuille difference-in-differences estimator to replicate their main results, finding that though the coefficient remains negative, the results are not statistically significant. We also replicate their tests regarding US House primary candidates using a different measure of anti-establishment candidates. Here, we find that the interaction term between anti-establishment candidates and COVID-19 remains statistically significant, with the same sign. Finally, we employ an expanded dataset that includes Congressional primary candidates that were omitted in the initial dataset, as well as a re-coded extremism variable that also includes candidates endorsed by Donald Trump. These updated findings corroborate the paper's initial results. However, due to a restrictive number of observations that interfered with our application of the De Chaisemartin and D'Haultfoeuille estimator, we believe that the expanded U.S. House primary results constitute the more robust half of our replication. | |
| https://ideas.repec.org/p/zbw/i4rdps/37.html | We perform a robustness replication analysis of Laffitte and Toubal (2022), which considers how multinational corporations shift profit to "tax havens", jurisdictions where they face lower tax burdens. We find that the main results of Laffitte and Toubal (2022) are fairly robust to alternative versions of three important researcher choices: i) the definition of tax havens; ii) the use of a continuous measure of tax-friendliness rather than a binary classification of tax havens; and iii) a sample that omits two small but "extreme" tax havens: Bermuda and Barbados. In all cases, results remain of the same sign and retain statistical significance, though the magnitudes are somewhat attenuated in our robustness exercises. | |
| https://ideas.repec.org/p/zbw/i4rdps/38.html | A fundamental question to the scientific enterprise is to what extent published scientific findings are credible. This question is related to the reproducibility and replicability of scientific findings where reproducibility is defined as testing if the results of an original study can be reproduced using the same data and replicability is defined as testing if the results of an original study hold in new data. We provide a framework for evaluating reproducibility and replicability in economics and divide reproducibility and replicability studies into five types: computational reproducibility, recreate reproducibility, robustness reproducibility, direct replicability and conceptual replicability, and we propose indicators to be reported for each type. | |
| https://ideas.repec.org/p/zbw/i4rdps/39.html | The politically motivated replacement in local governments is a pervasive fact in our modern democracies. Whether it has causal effects on the quality of public services, such as education, is a critical question and yet understudied. This paper uses a regression discontinuity design (RDD) for close elections to replicate Akhtari, Moreira and Trucco (2022) who find negative effects on the quality of public education in Brazil (.05-.08 standard deviations of lower test scores). I first reproduce these main results, finding minor computational differences that have no effect on the conclusions. I also show that the estimates for Brazil are in general robust to different specifications following Brodeur, Cook and Heyes (2020). Finally, I implement the same RDD framework now applied to Chilean administrative records to find null effects on test scores. Taken together, these results suggest that political turnover has weakly negative effects on service quality. | |
| https://ideas.repec.org/p/zbw/i4rdps/40.html | Alesina et al. (2023) examine how people perceive the number and characteristics of migrants and how those perceptions affect their support for redistribution. They find that respondents from the United States, United Kingdom, Sweden, Italy, Germany and France markedly overestimate the share of immigrants in each country, with the average respondent in all countries except Sweden overestimating by more than a factor of two. We reproduce these results using the original code and data and test the robustness by (i) including participants excluded for time to complete the survey, (ii) extending the analysis of misperceptions to all survey respondents, and (iii) using alternative authoritative estimates of the proportion of immigrants. We find that these checks marginally change the estimates of the size of the misperception but do not change the conclusions to be drawn from the analysis. Alesina et al. (2023) also test the effect on support for redistribution of showing videos on immigrant characteristics. We computationally reproduced the treatment effects on support for redistribution. | |
| https://ideas.repec.org/p/zbw/i4rdps/41.html | Holman et al. (2022; HMZ) propose women (compared to men) political leaders experience significant drops in public approval ratings after a transnational terrorist attack. After documenting how survey-based evaluations of then-Prime Minister Theresa May suffered after the 2017 Manchester Arena attack, HMZ assemble a country-quarter level panel database to explore the generality of their hypothesis. They report evidence suggesting women (compared to men) leaders systematically experience decreased public approval rates after major transnational terrorist attacks (p-value of 0.020). We find that result disappears once any of the following adjustments is implemented: (i) excluding election quarter covariates (p = 0.104); (ii) correcting objective coding errors in the election quarter covariates (p = 0.058); (iii) excluding the May-Manchester observation (p = 0.098); or (iv) clustering standard errors at the country level (p = 0.558). Exploring all 2⁵ combinations of the five control groups HMZ incorporate in their specification, none of them clears the 5% threshold of statistical significance once the corrected election quarter variables are employed. We conclude that the empirical evidence does not provide sufficient support for HMZ's abstract claim that "conventional theory on rally events requires revision: women leaders cannot count on rallies following major terrorist attacks." | |
| https://ideas.repec.org/p/zbw/i4rdps/42.html | Dertwinkel-Kalt et al. (2022) examine the effect of concentration bias - the tendency to overweight advantages that are concentrated in time relative to costs that are spread over multiple time periods - on intertemporal choice in a laboratory experiment. In their preferred empirical specification, the authors report that concentration bias leads to a 22.4% higher willingness to work than explained by a standard model of intertemporal discounting. We conduct a computational replication of the main results of the paper using the same procedures and original data. Our results confirm the sign, magnitude and statistical significance of the authors’ reported estimates across each of their five main findings. | |
| https://ideas.repec.org/p/zbw/i4rdps/43.html | In their paper, Altmann et al. (2022) investigate whether positive effects of behavioral policy interventions in policy-targeted domains come along with negative effects in policy non-targeted domains. Using lab and online experiments where subjects have to solve one policy-focused decision task and one non-focused background task, the authors show that increasing incentives or steering attention to the former led to higher attention spans, lower default adherence rates, and a higher choice quality in the decision task. However, because participants' focus was steered to the decision task, lower choice quality and lower attention spans emerged in the background task, particularly among individuals with lower cognitive capabilities and in complex decision tasks. The authors also report that the negative effects in the background task offset the positive effects in the decision task, ultimately yielding a net-zero effect overall. Therefore, the authors urge policymakers to also consider potential negative cognitive spillovers so as not to overestimate the benefits of behavioral policy interventions. All the results the authors report in the main text are significant at the 5% or 1% level. All findings presented in the main text of the paper can be replicated using the original Stata code and verified thoroughly using R. Additionally, we performed two robustness tests to ensure the reliability of the paper’s main results, and they remained consistent. Hence, the reported findings in the paper appear to be robust. | |
| https://ideas.repec.org/p/zbw/i4rdps/44.html | Jetter and Stockley (2023) successfully replicate nearly all 140 analyses we report in the original paper and appendix. In the process, they identified two errors. We appreciate this effort and made corrections to the data and code. Revising the analyses to correct these errors results in small changes to the output but does not change the significance, direction, or substantive effects of the central variables in the paper and does not alter our conclusions. The authors of the replication paper then extend their efforts beyond replication and, based on this work, conclude our work "does not provide sufficient support" for a gendered revision to the conventional rally 'round the flag framework. We respectfully disagree with their conclusion because it ignores theory, disregards key components of the critical test case, ignores evidence provided in the article and supplementary materials, revises the empirical approach, and commits to strict p-value cut-offs that risk Type II errors. | |
| https://ideas.repec.org/p/zbw/i4rdps/45.html | We replicate the analysis provided in Bokobza et al. (2022). They identify a causal effect of failed coup attempts on cabinet minister removals in autocracies on both the country and individual minister level and show that higher-ranking ministers and those holding strategic positions are more likely to be purged than more loyal and veteran ministers using fixed effects panel models. We focus on computational reproducibility and robustness replicability. In addition to reproducing the original results using Stata and R, we replicate analyses using random effects panel models and ordered beta regression models, reproduce analyses performed in R using different packages, replace the main independent variable, cluster standard errors at a different level, and add independent variables related to coup-proofing. We find that the original findings were reproducible and robust. | |
| https://ideas.repec.org/p/zbw/i4rdps/46.html | Marcus, Siedler and Ziebarth (2022 American Economic Journal: Economic Policy) examine the long-run health effects of a universal sports-club voucher program that was introduced in Saxony for primary school children in 2009. In 2018, the authors designed a survey that targeted the affected cohorts and nearby cohorts in Saxony and two neighboring states, and used a difference-in-differences identification strategy that exploits variation across states and cohorts in policy exposure. The authors document that treated individuals have knowledge of the program and recall receiving and redeeming the vouchers at higher rates, but find no effects on any health outcomes or behaviors. We successfully reproduce the main results of the paper exactly using data available in the paper's replication package and new Stata and R code. We also verify the robustness of the results using different outcomes, different control variables, different sample restrictions and different inference methods. | |
| https://ideas.repec.org/p/zbw/i4rdps/47.html | Goni (2022) relies on novel data on peerage marriages in Britain to examine the impact of matching technology on marital sorting. He relies on the London Season interruption (1861-1863) as a natural experiment that raised search costs and reduced market segregation. In his preferred specification, he exploits exogenous variation in women's probability to marry during the interruption for their age in 1861 and finds that the interruption increased the probability of marrying a commoner, reduced the probability of marrying an heir, increased the difference in spouses' family landholdings (in absolute value), decreased the difference in spouses' family landholdings (husband - wife), and increased the likelihood of never getting married (see Table 2, columns 1 to 6, respectively). First, we reproduce the paper's main findings and find no coding errors. Second, we test the robustness of the results to (1) the use of additional fixed effects and (2) sample restrictions. Finally, we examine the heterogeneous effects of this interruption by age and year. We find that original estimates are robust and are not significantly affected using these alternative specifications. | |
| https://ideas.repec.org/p/zbw/i4rdps/48.html | Hanushek et al. (2021) test how country-level measures of patience and risk-taking from the Global Preference Survey predict student performance on the Programme for International Student Assessment (PISA) math test. They find that country-level patience positively predicts math test scores and country-level risk-taking negatively predicts math test scores. They find similar results when holding country of residence characteristics constant and focusing on the preferences of the country of origin of migrants. We have checked the computational reproducibility and find that the data and analysis script provided by the authors allowed us to exactly reproduce the main tables in the paper. We also checked the robustness replicability by testing how robust the results are to decisions about imputation, weighting, operationalization of dependent variables, choice of control variables, and the inclusion of highly leveraged observations. We see that results are generally robust, though statistical significance of the risk-taking coefficient in the migrant analysis hinges on whether a control for OECD country of residence is included. Finally, we check the conceptual replicability of the results by using data from the Trends in International Mathematics and Science Study (TIMSS) instead of PISA - a different dataset with a different standardized test. This exercise shows that their results are robust to expanding the analysis to countries participating in both PISA and TIMSS. | |
| https://ideas.repec.org/p/zbw/i4rdps/49.html | Hussam et al. (2022a) use a cash grant experiment in India to demonstrate that community knowledge can help target high-growth microentrepreneurs. In their preferred specification, the authors find that the average marginal return to the grant is 9.4 percent per month, while estimated returns for entrepreneurs reported by peers to be in the top third of the community are between 24 percent and 30 percent. First, we reproduce the paper's main findings and uncover one minor coding error, which affects the estimates for one of the main tables but does not change the overall conclusions of the paper. Second, we test the robustness of the results to: (1) different treatment of outliers, (2) dropping surveyor and survey month fixed effects, and (3) using quartiles instead of terciles for grouping the ranking of entrepreneurs. The paper's results are robust to these robustness checks. Finally, we test heterogeneity of results by gender, which was not reported in the original study. | |
| https://ideas.repec.org/p/zbw/i4rdps/50.html | Hariri & Wingender add new nuance to the traditional wisdom that economic modernisation is a path to democracy. They show that the diffusion of repressive, military technologies causes a decline in the number of democratisations in the following years, and argue that this is because of a greater ability to forcefully oppress popular dissent. We conduct a robustness replication exercise, focussed on three tests: i) Are findings robust to alternative weightings of individual technologies in the instrument for country-aggregate military technology? ii) Is high leverage in individual countries, regions or time periods driving the global findings? iii) Are the strength of the IV and its independence from important macroeconomic indicators a chance occurrence? The main findings of the paper are largely robust to these tests. | |
| https://ideas.repec.org/p/zbw/i4rdps/51.html | Görtz et al. (2022) estimate the effects of innovations to future total factor productivity (TFP) on financial markets. In a Bayesian vector autoregression, they identify a TFP news shock as one that explains the largest share of the 40-quarter-ahead forecast error variance (FEV) of TFP. Their estimated impulse response functions show that a positive news shock significantly decreases credit market spreads and increases credit market supply. They also find that a shock that explains the maximum of the FEV of the "excess bond premium" (EBP) (Gilchrist and Zakrajsek 2012) causes similar responses. These results are consistent with an estimated DSGE model with financial frictions. We estimate the main IRFs of the study using the original data and a frequentist estimation approach. We obtain similar point estimates for the dynamic responses to TFP news and EBP max-share shocks. We also update their macroeconomic and financial time series, as some of the data has been revised substantially since their original estimate. We use the updated data to re-estimate the above-mentioned IRFs, and we find that the results are robust to this change in the data. Finally, we investigate the computational reproducibility of their DSGE results, and find that their provided code (consistent with warnings in their README file) does not execute in the most recent version of Dynare or Matlab. Using the version indicated in their replication files, we encounter issues estimating the posterior mode. | |
| https://ideas.repec.org/p/zbw/i4rdps/52.html | We use unique data from journal submissions to identify and unpack publication bias and p-hacking. We find that initial submissions display significant bunching, suggesting the distribution among published statistics cannot be fully attributed to a publication bias in peer review. Desk-rejected manuscripts display greater heaping than those sent for review, i.e., marginally significant results are more likely to be desk rejected. Reviewer recommendations, in contrast, are positively associated with statistical significance. Overall, the peer review process has little effect on the distribution of test statistics. Lastly, we track rejected papers and present evidence that the prevalence of publication biases is perhaps not as prominent as feared. | |
| https://ideas.repec.org/p/zbw/i4rdps/53.html | Okeke (2023) evaluates a policy experiment conducted in Nigeria, whereby communities were randomly allocated to receive a new doctor at the local public health center. The performance of these centers was compared to other sites which were allocated either a new midlevel health-care provider, or no additional staff. The study finds that communities assigned a new doctor experienced a decrease in seven-day infant mortality; such a decrease was not observed in communities assigned a midlevel health-care provider. This suggests that the effects are driven by the 'quality' of the additional doctor rather than by a quantity increase of an additional health worker. The size of the mortality reduction increased with increased exposure to the intervention. We first conduct a computational reproduction, rerunning the original code and data, finding that the results reported in the original study are reproducible. Second, we test the robustness of the results in several ways, by 1) adapting the existing controls to make the results robust to contamination bias, 2) altering and adding to the control variables included, 3) changing the specification or regression technique used, and 4) testing coding grouping and changing how service use was coded. These changes cause little change to the point estimates, although we find that the original paper's standard errors were overly conservative, and thus the statistical significance of some results was understated. | |
| https://ideas.repec.org/p/zbw/i4rdps/54.html | Hollyer, Klašnja, and Titiunik (2022) analyse the trade-off that political parties face between running programmatic campaigns and fielding charismatic candidates, whose electoral appeal may come at the cost of undermining the party brand. They argue that higher electoral volatility prompts parties to rely on charismatic candidates, even though they might not be as loyal to the party's programmatic stance. They substantiate their argument with a cross-national dataset and a quantitative case study in Brazil. We computationally reproduced and conducted further robustness tests for their cross-national study by translating the Stata code to R. Next, we conducted a computational reproduction and some additional robustness tests for the quantitative case study. We find that their cross-national analysis is reproducible, albeit with some minor discrepancies. The quantitative case study is also largely reproducible and both are robust in several ways. We conclude by making some suggestions about data dissemination and robustness checks for authors of regression discontinuity designs. | |
| https://ideas.repec.org/p/zbw/i4rdps/55.html | Peter Leeson, August Hardy and Paola Suarez (2022) test maximizing behaviour of panhandlers at several Metrorail stations in Washington, D.C. Their main findings are that "stations with more panhandling opportunities attract more panhandlers" (the first statement) and that "cross-station differences in hourly panhandling receipts are statistically indistinguishable from zero" (the second statement). We test computational reproducibility and robustness replicability of their results. We can reproduce both statements, in Stata and R. Our robustness replications for the first statement confirm the authors' results in the vast majority of cases (replication was successful in 91% of the cases). Our robustness replications for the second statement might raise doubts on this finding. We run weighted ANOVA tests, we change the bounds in minutes used by authors by 5 minutes in their robustness checks, we run Bartlett's tests of equality of variances of means, and run pair-wise tests of equality of means. In three out of four cases we cannot replicate the results, and the differences (of either means, medians or variances of donations) across Metrorail stations are statistically different from zero. We hypothesize that panhandlers have a general idea about which stations have more passers-by, and will rationally go more often there. However, they are unlikely to have information about smaller variations in the number of passers-by (e.g., variations in passers-by at the same station over time due to non-public events), and therefore might find it difficult to perfectly maximize donations. | |
| https://ideas.repec.org/p/zbw/i4rdps/56.html | This paper replicates the study "A Model of Secular Stagnation: Theory and Quantitative Evaluation" by Eggertsson et al. (2019) using the Dynare toolkit. Replication is important as it confirms the results of the original article, provides a user-friendly version using Dynare (Adjemian et al., 2022), and shows how to deal with large-scale models with occasionally binding constraints. The results show that the original Matlab code was fully replicated, but minor discrepancies were found between the paper's equations and the code. The two models produce similar dynamics but with small differences, particularly at the beginning of the simulation. | |
| https://ideas.repec.org/p/zbw/i4rdps/57.html | In their study, Grier et al. (2023) explore the causal relationship between campaign contributions and roll-call voting. Their analysis focuses on the influence of campaign contributions on two specific anti-sugar votes conducted in 2013 and 2018. The authors identify a substantial increase in inflation-adjusted sugar contributions from the sugar industry to incumbent politicians between these two voting events. The aim of our research is to replicate and validate the authors' main models. In addition to cross-platform replication, we conduct several robustness checks to further examine the reliability of their findings. These include (1) clustering the standard errors, (2) utilizing an Ordinary Least Squares (OLS) model instead of the authors' logistic regression, and (3) altering the dependent variable to represent the change in the vote from 2013 to 2018. Our results largely confirm the authors' findings and reveal additional insights regarding the money buys vote hypothesis. | |
| https://ideas.repec.org/p/zbw/i4rdps/58.html | Douenne and Fabre (2022) implement a representative survey following the Yellow Vests movement in France that started in opposition to the carbon tax in 2018. They find that a majority of French citizens would oppose a carbon tax and dividend program with proceeds paid equally to each adult. The authors further find that respondents have pessimistic beliefs about several aspects of the policy. They then show how informational treatments cause respondents to update these beliefs, and they finally estimate the causal effect of these beliefs on support for the policy. In this note, we focus on the second section of this paper: the causal effects of feedback on beliefs. Based on elicited household characteristics, Douenne and Fabre (2022) estimate whether each household "wins" or "loses" from the carbon tax and dividend reform. They provide this binary (win vs. lose) information to households and subsequently ask households to evaluate whether they believe they would financially benefit from the policy. By exploiting the discontinuity in win vs. lose feedback, they assess the degree to which feedback affects subjective beliefs, finding that a household that is told it will "win" as a result of the reform increases its subjective belief that it will not lose by about 25 percentage points. The subset of households that is part of the Yellow Vests movement, however, revises its subjective belief of not losing upwards by only 10 percentage points after being told that it will "win" from the carbon tax reform. Conversely, households who initially support the tax increase this belief by 41 percentage points when told they will "win." In this note we replicate this second section of the paper-the causal effects of feedback on beliefs- using the processed data provided by the authors. 
We successfully replicate the average treatment effect, but we find that the heterogeneous treatment effects may be biased due to model misspecification. While our results support the conclusion that these estimated effects depend on a household's attitudes toward the policy, we find that the source of heterogeneity differs. Further, we note two changes to the analysis that we believe are appropriate (which do not affect the conclusions drawn): first, some (1.8%) of observations in the dataset appear to be misclassified (wrongly coded as if a household would "lose" when in fact it would "win"), and second, the main causal analysis is based on a regression discontinuity design, but does not include standard components of such a design (e.g., an RD plot, optimal selection of bandwidth, density analysis, placebo tests). We update the design to address both of these points. We find results that generally support the main conclusions of Douenne and Fabre (2022), but we urge caution when interpreting the heterogeneous treatment effects. | |
| https://ideas.repec.org/p/zbw/i4rdps/59.html | Rossi (2022) examines the relative efficiency of skilled workers across countries. He finds the elasticity of skill efficiency with respect to GDP per worker is 1.4 and that the relative human capital accounts for only about 9 percent. We reproduce the paper's main findings and test the sensitivity of the results to (1) alternative samples and (2) additional controls for determining wages. We find the results remain robust to these alternative specifications, and the estimated values of the key elasticities remain nearly unchanged. | |
| https://ideas.repec.org/p/zbw/i4rdps/60.html | We study how author-editor and author-reviewer network connectivity and "match" influence editor decisions and reviewer recommendations of economic research at the Journal of Human Resources. Our empirical strategy employs several dimensions of fixed effects to overcome concerns of endogenous assignment of papers to editors and reviewers. Authors who attended the same PhD program, were ever colleagues with, are affiliates of the same National Bureau of Economic Research program(s), or are more closely linked via coauthorship networks as the handling editor are significantly more likely to avoid a desk rejection. Likewise, authors from the same PhD program or who previously worked with the reviewer are significantly more likely to receive a positive evaluation. We also find that sharing "signals" of ability, such as publishing in the "top five", attending a highly ranked PhD program, or being employed by a similarly ranked economics department, significantly influences editor decisions and/or reviewer recommendations. We find some evidence that published papers with greater author-editor connectivity subsequently receive fewer citations. | |
| https://ideas.repec.org/p/zbw/i4rdps/61.html | In this replication study, we revisit the main empirical claims of Hamel and Wilcox-Archuleta's (HW) 2022 study on the impact of daytime racial diversity on White Americans' voting behavior and racial attitudes. HW introduce a novel zip code level measure of racial diversity that accounts for the influx of Black workers during daytime, showing that conventional purely residential based measures often underestimate the true degree of experienced racial diversity. Using survey data from the CCES, their findings suggest a negative correlation between racial flux and White Americans' Democratic voting tendencies and a positive correlation with racial resentment and opposition to affirmative action, all while controlling for the residential share of Blacks in the zip code. We assess the replicability of these findings by: (1) replicating the main results using the provided replication code, (2) reconstructing the racial flux measure and survey from raw data, (3) conducting multiverse analyses, and (4) replicating the analysis using an alternative data source. Our replication validates the robustness and accuracy of HW's initial conclusions, emphasizing the role of daytime racial diversity in shaping White Americans' political and racial attitudes. | |
| https://ideas.repec.org/p/zbw/i4rdps/62.html | Dickens (2022) studies the role of trade on long-run inter-ethnic linguistic differences. He establishes that neighboring ethnolinguistic groups have smaller (lexicostatistical) linguistic distances when there is a larger agricultural productivity variation between them. Specifically, he establishes that pre-1500 land productivity variation (CSI SD) and its change due to the Columbian Exchange in the post-1500 (CSI SD CHANGE) era decrease linguistic distances between groups. In what can be considered his main specification, which includes geographical controls, spatial controls, and language family fixed effects (Table 1 column 5), he estimates that a one standard deviation increase in the change in land productivity variation (post-1500) decreases linguistic distances by 0.11 standard deviations (p-value | |
| https://ideas.repec.org/p/zbw/i4rdps/63.html | Gonzalez and Özak (2023) provide a direct and successful replication of Dickens (2022). Using a reconstructed version of the main independent variables from the same original sources, in addition to an updated version of the source data, the replicators confirm the main finding of the original study. In addition to the replication, Gonzalez and Özak (2023) develop an alternative measure of potential gains from inter-ethnic trade. They use this new measure in an interesting extension that delves deeper into the specifics of the inter-ethnic trade mechanism proposed and tested by Dickens (2022). In this response, I clarify two minor points about how the original data set was constructed, and contrast the potential shortcomings of the original and alternative measures of inter-ethnic gains from trade. | |
| https://ideas.repec.org/p/zbw/i4rdps/64.html | This replication report examines and extends the research conducted by Butera, Metcalfe, Morrison, and Taubinsky (2022) on "The Welfare Effects of Pride and Shame." The original paper explores the welfare implications of public recognition as a motivator for desirable behavior and introduces an empirical methodology to measure Public Recognition Utility (PRU), which quantifies the utility individuals experience when their actions are publicly recognized. This report focuses on the real effort experiment reported in the paper that was conducted using a classroom sample, a lab sample, and an online sample. I computationally reproduce the original results and verify their robustness. While reproducing the results, I found two minor coding errors in the replication package. Correcting these errors slightly changes some estimates reported in the paper but does not overturn any results. The main treatment effect findings are further robust to using different sets of controls and sample selection criteria. Moreover, I conduct a heterogeneity analysis which reveals significant variations in how participants value public recognition. Overall, the replication study confirms the original conclusions while providing additional insights into the heterogeneity of PRU shapes on an individual level. | |
| https://ideas.repec.org/p/zbw/i4rdps/65.html | Drobner (2022) examines the effect of manipulating experimental subjects' expectations about uncertainty resolution in learning about their performance on their belief updating patterns in an ego-relevant domain. In their preferred empirical specification, the author finds that individuals update their beliefs optimistically as they exhibit a higher belief adjustment in response to good compared to bad news only when they do not expect resolution of underlying uncertainty about their performance in an IQ test and neutrally when they know they will find out their relative performance at the end of the experiment. First, we reproduce all of the paper's findings without identifying any coding errors. Second, we test the robustness of the results to (1) adding individual covariates and (2) excluding subjects who exhibit a fundamental error in their belief updating from the analysis. We find no substantial changes in the main coefficients of interest with the inclusion of demographic variables in the analysis, consistent with demonstrated balance in covariates between the two experimental groups. Yet, several of the main estimates lose statistical significance and change from conservatism (under-updating) to over-inference (over-updating) in some conditions on the subset of participants excluding those who exhibit fundamental errors in belief updating. | |
| https://ideas.repec.org/p/zbw/i4rdps/66.html | Bianchi and Giorcelli (2022) study the long-term and spillover effects of a management intervention program on firm performance in the US, between 1940 and 1945. The authors find that the Training Within Industry (TWI) program led to positive effects which lasted for at least 10 years. Firm sales of treated firms increased by 5.3% in the first year after implementation, peaking at 21.7% after 8 years, before reducing to 16% gains after a decade. The authors claim that the program generated long-lasting changes in managerial practices. Finally, the program also led to positive spillover effects on the supply chain of treated firms. First, we reproduce the paper's main findings. Second, we test the robustness of the results to (1) changing the main specification sample and (2) testing other difference-in-differences estimators, using the same data, provided by the authors. We find that the results are robust to these changes. All point estimates in the study remain statistically significant and of similar magnitude. While the paper's findings reproduce and replicate, the challenges we encountered in reproducing the results lead us to recommend improvements to journals' code policies. | |
| https://ideas.repec.org/p/zbw/i4rdps/67.html | Córdova and Kras (2022) examine how the existence of a women's police station (WPS) in the place of residence influences citizens' attitudes toward gender-based violence in Brazil. In their analytical specification, the authors find that men are more likely to reject violence against women (VAW) and support bystander intervention in municipalities with a WPS, especially if the WPS has been operating for a long time. This paper examines the replicability and robustness of Córdova & Kras' (2022) findings. First, we reproduce the paper's main findings and uncover one minor coding error and three estimates that have been reported with the opposite sign compared to that in our reproduction; neither is of consequence for the study's main results. Second, we test the robustness of the results by (1) recoding one of the main explanatory variables and several of the control variables to account for non-linear trends, (2) using alternative techniques to estimate clustered standard errors, (3) consistently applying a 95% confidence level in the presentation of the results, (4) altering the propensity score matching (PSM) procedure as well as the composition of the variables used in the PSM robustness check, (5) using an alternative technique to test for multicollinearity, (6) excluding potential endogenous control variables, and (7) using an alternative coding for computing margins. Reassuringly, the results are robust to most of these tests. However, two of the robustness checks challenge parts of the paper's main findings. First, allowing for non-linearity in the effect of time since the establishment of WPS shows (a) a non-linear effect on VAW and (b) no apparent changes in either male or female attitudes over time once the WPS has been established. Second, the inclusion of other variables in the PSM procedure renders part of the main estimates of interest statistically nonsignificant (p | |
| https://ideas.repec.org/p/zbw/i4rdps/68.html | This paper reviews the impact of replications published as comments in the American Economic Review between 2010 and 2020. We examine their citations and influence on the original papers' subsequent citations. Our results show that comments are barely cited, and they do not affect the original paper's citations - even if the comment diagnoses substantive problems. Furthermore, we conduct an opinion survey among replicators and authors and find that there often is no consensus on whether the original paper's contribution sustains. We conclude that the economics literature does not self-correct, and that robustness and replicability are hard to define in economics. | |
| https://ideas.repec.org/p/zbw/i4rdps/69.html | In Talking Shops: The Effects of Caucus Discussion on Policy Coalitions, Zelizer analyzes the causal effect of caucus deliberations on legislative policy coalitions. In practice, political scientists have little empirical evidence on how policy discussions actually work among sitting legislators and whether these discussions have an effect on policy making and policy opinion. Taking on this challenge, Zelizer conducted two field experiments in an American state legislature. In short, the experiments randomized whether a bill was selected for discussion among a bi-partisan legislative caucus. The paper then measures and reports the corresponding effects of that discussion around the bill. Zelizer finds that deliberation increased the amount of co-sponsorship for a given bill, among both co-partisans and counter-partisans, but deliberation did not affect whether a bill was passed by the legislature or whether the bill received more amendments. We conduct a robustness replication of the main results of Talking Shops. Specifically, we reproduce Tables 3 and 4 of the paper under alternative specifications. We find that the main results of the paper are reproducible and robust to multiple alternative specifications. | |
| https://ideas.repec.org/p/zbw/i4rdps/70.html | This paper reanalyzes Khanna (2023), which studies labor market effects of schooling in India through regression discontinuity designs. Absent from the data are four districts close to the discontinuity; restoring them cuts the reduced-form impacts on schooling and log wages by 57% and 63%. Using regression-specific optimal bandwidths and a robust variance estimator clustered at the geographic unit of treatment makes impacts statistically indistinguishable from 0. That finding is robust to varying the identifying threshold and the bandwidth. The estimates of general equilibrium effects and elasticities of substitution are not unbiased and have effectively infinite first and second moments. | |
| https://ideas.repec.org/p/zbw/i4rdps/71.html | Roodman (2023) (henceforth R23) re-evaluates Khanna (2023) (henceforth K23). R23 is able to replicate K23's results, highlighting no mistakes in K23's analysis. R23 argues that K23's results may be sensitive to recreating part of the underlying district-level sample, using a subset of K23's datasets. In this reply, I show that despite concerns with R23's sample construction, K23's results are robust to evaluating R23's sample as given. R23 raises other secondary questions, which this reply answers. I also address R23's misinterpretations of K23's general equilibrium model. | |
| https://ideas.repec.org/p/zbw/i4rdps/72.html | Bold et al. (2022b) investigate the effect of providing access to a larger, centralized market where quality is rewarded with a premium on farm productivity and farming incomes from smallholder maize farmers in western Uganda, using a series of randomized experiments and a difference-in-differences approach. We successfully reproduce the results of this study using the publicly provided replication packet. We then test the robustness of these results by re-defining treatment and outcome variables, testing for model misspecification and the leverage of outliers, and testing for non-random selection in the Fisher-permutation process. Our results show that the findings in Bold et al. (2022b) are robust to a variety of decisions in the research process. This evokes confidence in the internal validity of the findings. | |
| https://ideas.repec.org/p/zbw/i4rdps/73.html | Mangonnet et al. (2022) examine whether political alignment at the national and sub-national levels explains the spatial designation of Protected Areas (PAs) in Brazil. Their identification relies on spatial discontinuities in political alignment across municipalities. They find that a president-mayor coalition alignment reduces the incidence of PAs by about one percentage point, whereas they find no party alignment effects. We were able to reproduce the paper's findings using the same code and software. Alternative software routines reproduce their results with small and inconsequential numerical differences. Moreover, robustness replications find consistent results for one out of the two treatments. Finally, we find no evidence of fabrication of data. | |
| https://ideas.repec.org/p/zbw/i4rdps/74.html | Ngangoué and Schotter (2023) investigate common-probability auctions. By running an experiment, they find that, in contrast to the substantial overbidding found in common-value auctions, bidding in strategically equivalent common-probability auctions is consistent with the Nash equilibrium. We reproduce their results in R, conduct robustness checks on how their sample was constructed, and consider possible heterogeneity. We confirm their documented qualitative results. | |
| https://ideas.repec.org/p/zbw/i4rdps/75.html | Gagliarducci and Paserman (2022) (henceforth GP) study gender differences in cooperative behavior among politicians using information from the U.S. House of Representatives between 1988 and 2010 on (i) the number of co-sponsors on bills and (ii) the share of co-sponsors from the rival party. Through different empirical strategies, they show that women-sponsored bills tend to have more co-sponsors, but the gap is only statistically significant among Republicans. Moreover, Republican women recruit a significantly larger share of co-sponsors from the rival party than Republican men, whereas the opposite is true among Democrats. GP argue that the observed pattern is consistent with a commonality of interest driving cooperation, rather than gender per se, since during this period Republican women were ideologically closer to the rival party than their male colleagues, while female Democrats were further away. We examine the robustness of these findings to (i) the correction of some errors in two control variables of the dataset used by GP and (ii) clustering the standard errors at the individual level, instead of the individual-term level. These changes have a relatively minor impact on results: most coefficients are still statistically significant and the main conclusions from the analysis are confirmed. Furthermore, we extend the analysis to the 2011-2020 period. The analysis of gender differences in bipartisan cooperation confirms GP's hypothesis that ideological distance plays an important role. However, results are slightly different when we analyze overall cooperation. The gender gap in favor of women is larger in magnitude than in GP and it is statistically significant in several specifications, providing support for the hypothesis that gender also matters for cooperation. | |
| https://ideas.repec.org/p/zbw/i4rdps/76.html | This report replicates and examines Bauer et al.'s (2021) paper on monetary policy transmission to financial markets. The paper introduces novel measures of monetary policy uncertainty and analyses its drivers. It also investigates the impact of uncertainty changes on interest rates and financial asset prices. We assess reproducibility, consolidate market uncertainty measures using PCA and Factor Analysis, and rigorously test the reduction of uncertainty after Federal Open Market Committee (FOMC) announcements. Our findings support the paper's claim of reduced uncertainty on meeting days. Additionally, we explore the implications of the uncertainty channel on various financial assets, such as Gold, the Swiss Franc, European stock indexes, and Bitcoin. | |
| https://ideas.repec.org/p/zbw/i4rdps/77.html | Bauer et al. (2022) derive market-based monetary policy uncertainty and uncover an 'FOMC uncertainty cycle' characterized by a fall of uncertainty after FOMC announcements and its subsequent build-up. Then, the authors show that the financial markets' response to monetary policy announcements depends on the level of short-rate uncertainty on the day before the FOMC announcement. First, we reproduced the paper's findings, though with Matlab version-specific issues. Second, we tested the robustness of the two main results of the paper. We show that the uncertainty cycle in the monetary policy uncertainty is confirmed when the crisis period is included in the sample or when the median instead of the average of changes in the monetary policy uncertainty is considered. However, the FOMC uncertainty cycle does not appear when the monetary policy uncertainty index (Husted et al. 2020) or the daily economic policy uncertainty index (Baker et al. 2016) are used as uncertainty proxies. | |
| https://ideas.repec.org/p/zbw/i4rdps/78.html | Figueiredo (2022) examines wage cyclicality across the skill mismatch distribution and finds large differences. Among the key results, wages are acyclical in good labor market matches but procyclical in poor matches. Using the public replication material provided by the authors, we were able to exactly duplicate the results of the study. Further, several additional robustness checks, such as removing (potentially correlated) covariates from the regressions, using different standard errors (rather than clustered ones), or using different time periods of the data, left the key results largely unchanged, with some minor caveats. | |
| https://ideas.repec.org/p/zbw/i4rdps/79.html | Guay and Johnston (2022) examine asymmetric politically motivated reasoning on the part of liberals and conservatives. In our replication of the paper we examine four potential issues with the analysis: confounding in the numeracy task, heterogeneity across ideological constraints, the use of control variables, and heterogeneity in the moderator index items. None of these potential issues are in fact issues. The results are quite robust. We found only one minor issue with the codebook, which does not affect the results. | |
| https://ideas.repec.org/p/zbw/i4rdps/80.html | Jones and Marinescu (2022) study the employment effects of a universal cash transfer in Alaska. Using a synthetic control method, they find that the transfer had no negative effects on employment. We reproduce the results using their replication package and investigate if the results hold when using different software to run the analysis. We also use different estimation techniques and perform sensitivity checks to assess robustness of the results. We find some differences in the size and significance of the average treatment effects on labor force participation and hours worked when we use different software (R) and various extensions of the synthetic control method. We also find smaller coefficients on part-time employment when including more covariates. However, these differences do not contradict the main conclusion of the paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/81.html | Carlson et al. (2022) examine the causal impact of banking competition by investigating a unique circumstance in the National Banking Era of the nineteenth century in the US, where a discontinuity in bank capital requirements occurred. On the one hand, their findings suggest that banks operating in markets with fewer barriers to entry tend to increase their lending activities, promoting real economic growth. On the other hand, banks in less restricted markets also exhibit a higher propensity for risk-taking, posing risks to financial stability. First, we fully reproduce the paper's outcomes apart from a minor discrepancy in the estimate of Table 9 attributed to issues in the provided code. Second, we test the robustness of the results by (i) changing the ranges used to select the sample of cities included in the analysis, (ii) adopting different options to address potential issues with outliers, and (iii) introducing additional control variables. We observe that the estimation results remain mostly consistent when subjecting them to various robustness checks. However, it is worth highlighting that the results can be partially influenced by the criteria used to select the sample of cities and the inclusion of control variables. | |
| https://ideas.repec.org/p/zbw/i4rdps/82.html | [Introduction:] This is a replication of Mayshar et al. (2022) (henceforth MMP). The article posits that the state (defined as societal hierarchy such as tax-levying elites) originated from cultivation of appropriable cereal grains, contrary to the conventional theory that the state originated from increased land productivity following the adoption of agriculture. The article uses multiple datasets to demonstrate a causal effect of cereal cultivation on hierarchy (Claim 1) without finding a similar effect for land productivity (Claim 2), and that societies based on roots or tubers display levels of hierarchy similar to nonfarming societies (Claim 3). (...) | |
| https://ideas.repec.org/p/zbw/i4rdps/83.html | The paper estimates the effect that changes in household vulnerability have on citizens' participation in clientelist relationships. The authors exploit two sources of variation in household vulnerability: rainfall shocks, and a randomized intervention that provided cisterns in drought-prone areas. We reproduce all the findings presented in the four main results tables presented in the paper. The results of our robustness replication show that the results in the original paper are robust to variations in the rainfall period used as a baseline to assess changes in household vulnerability, and to exclusions that eliminate individuals in the sample who may have been substituted with others at different survey points. However, some of the original results that explain the underlying mechanisms are sensitive to how "clientelist relationships" are defined. When more frequent interactions with politicians are used as the defining characteristic of households in clientelist relationships, we find that the original results suggesting clientelism as a significant mechanism are no longer statistically significant at any standard significance level. We note, however, that the authors, in a reply to questions we sent them after the Replication Games, convincingly show that their results are robust to changing the definition of the clientelist marker. | |
| https://ideas.repec.org/p/zbw/i4rdps/84.html | Bobonis, Gertler, Gonzalez-Navarro, and Nichter (2022) conduct a randomized control trial in rural Northeast Brazil designed to reduce the vulnerability of sampled households. In this development intervention, we constructed residential water cisterns across 425 neighborhood clusters in 40 municipalities, and we examine effects using a longitudinal panel survey and electoral data at the precinct level. Ma, Monpetit, and Nordstrom's (2023) comment confirms the reproducibility of our results. Moreover, their comment does not challenge any of our article's primary findings: the cisterns treatment significantly reduced citizens' vulnerability (Table 2), it decreased citizens' requests for private goods from politicians (Table 3), and it significantly decreased votes for incumbent mayors (Table 4). The comment by Ma, Monpetit, and Nordstrom (2023) discusses three aspects of robustness: (1) the matching of individuals in the panel over time, (2) how clientelist relationships are defined, and (3) the choice of historical rainfall period. With regards to the first aspect, the comment reports some age inconsistencies across waves for a relatively small subsample, even though it states that results remain "stable in terms of both magnitude and statistical significance" when excluding these observations. As discussed below, our longitudinal rostering procedure accurately identifies individuals across survey waves, though some minor measurement error exists in reported ages. With regards to the second aspect, the comment challenges Section VI of our article, which presents additional heterogeneity analyses in Table 5 to explore the role of clientelism in our primary results. More specifically, the comment argues that those results are not robust to a more restrictive coding of the binary clientelism marker employed to test heterogeneity. 
Contrary to their critique, we show that analyses in Table 5 of our article are indeed robust to a more restrictive coding. With regards to the third aspect, their comment indicates that halving the window of historical data used to normalize rainfall affects only a single, ancillary result: the cistern treatment's impact on one of three well-being measures we examine (Column 3 in Table 2). Since Ma, Monpetit, and Nordstrom (2023) indicate that "the overall message remains the same" - and it is not obvious that their approach is preferable - we do not discuss the third aspect below. | |
| https://ideas.repec.org/p/zbw/i4rdps/85.html | Xu (2022) estimates the causal impact of bank failures on the level of trades with a staggered difference-in-differences design and an IV strategy with Bartik instrument, using the 1866 banking crisis as a quasi-natural experiment. Findings, based on historical data on the trades and loans between London banks and banks around the world, show that countries exposed to bank failures in London immediately exported significantly less and did not recover their lost growth relative to unexposed places. Moreover, the effect lasted for decades. First, we reproduce the paper's main findings by running the original code and uncover three issues, one of which slightly affects the main estimates reported in the study. Second, we test the robustness of the results to (1) removing weights from the regressions, (2) using a spatial HAC correction for the standard errors, and (3) implementing a method for possibly heterogeneous treatment effects with a staggered difference-in-differences design. Overall, we conclude that the main findings are valid and robust. | |
| https://ideas.repec.org/p/zbw/i4rdps/86.html | Manekin and Mitts (2022) investigate the success chances of minority ethnic groups when engaging in non-violent protests demanding political change. First, using observational data, the authors find that the success rate for nonviolent campaign tactics is lower for excluded/minority ethnic groups than for non-excluded/majority ethnic groups. Second, the authors use two original survey experiments to show that non-violent protest by ethnic minorities is perceived as more violent and requiring more policing than identical protest by majorities. This report reproduces the paper computationally and conducts several sensitivity analyses for both the observational and the experimental parts of the paper. We can confirm the general direction of the postulated effects, but evidence becomes less consistent (effect magnitudes and significance levels are not robust to some of the changes). | |
| https://ideas.repec.org/p/zbw/i4rdps/87.html | Leininger et al. (2023) study the political consequences of temporary disenfranchisement. Taking advantage of differentiated voting eligibility thresholds applying in different elections in Germany, they analyze how first-time voters react when losing eligibility in a follow-up election. They exploit this setting in a difference-in-differences design using panel data. They find that temporary disenfranchisement decreases perceived external efficacy by 0.19 points on a five-point Likert scale and satisfaction with democracy by 0.14 points. Both results are statistically significant at the five-percent level. In contrast, internal efficacy and political interest remain unaffected by the treatment, and regaining voting eligibility is not associated with statistically significant changes in respondents' attitudes. This report focuses on the computational reproducibility and robustness replicability of these findings. To assess the paper's reproducibility, we first attempt to reproduce the paper's estimates and figures using the authors' replication materials. In a second step, we perform several robustness checks by means of alternative difference-in-differences specifications using coarsened exact matching and entropy balancing, and a closer examination of panel attrition. Overall, we find complete reproducibility of the original replication materials. Our robustness checks confirm the sign congruence and significance of coefficients reported in the original paper. We raise the issue of potential bias due to differential panel attrition rates between treated and untreated respondents. | |
| https://ideas.repec.org/p/zbw/i4rdps/88.html | Baron (2022) explores the independent effects of operational expenditure and capital expenditure on student outcomes in school districts across Wisconsin, exploiting the outcomes of close referendums. Using a dynamic regression discontinuity framework and a cubic specification, the author finds that narrowly passing an operational referendum increases operational expenditure per pupil by $298 each year on average over the ten-year period following the referendum. Of this, $198 is spent on instructional expenses. These point estimates are statistically significant at the 10% and 5% levels, respectively. We first reproduce the main results from the paper without any issues arising. Second, we test robustness to (1) dropping school districts in the top and bottom 5% of the revenue limits distribution, separately, and (2) dividing the time frame of the study into two periods: 1996-2005 and 2005-2014. We find that dropping the top 5% of school districts by revenue limits reduces the additional operational expenditure by $140 per pupil (lower by 50 percent) and that the effects of passing an operational referendum were nearly double in the earlier period compared to the later period. Lastly, we find that the estimated effects on student outcomes rely heavily on recent observations. | |
| https://ideas.repec.org/p/zbw/i4rdps/89.html | Mahmood and Jetter (2023) rely on daily wind conditions as an exogenous source of variation to assess the effects of 420 US drone strikes conducted in Pakistan from 2006 to 2016. The findings indicate that these drone strikes promote a subsequent surge in terrorism over the following days and weeks, contributing significantly to as much as 19% of all terrorist incidents and resulting in over 3,000 casualties in Pakistan during the specified period. In this comment, we successfully reproduce all the results from Mahmood and Jetter (2023), including tables and figures. We then conduct four sensitivity analyses to confirm the primary findings outlined in the original paper. We document the robustness of the main results in three out of four sensitivity checks, involving the omission of all controls across various specifications, utilization of the fixest package in R, and the inclusion of control variables determined through Lasso regressions. However, we show that the addition of year fixed effects substantially reduces the first-stage F-statistics and challenges the established negative relationship between wind gusts and drone strikes. | |
| https://ideas.repec.org/p/zbw/i4rdps/90.html | Cohen and Dechezleprêtre (2022) investigate the heterogeneous impact of temperature on mortality across Mexico, and how affordable healthcare services that target the low-income population attenuate the mortality effects of weather events. They find that while extreme temperatures are more dangerous than less extreme temperatures, the increased frequency of non-extreme temperatures means these temperatures cause more deaths. First, we reproduce the paper's main findings, uncovering a minor coding error that has a trivial effect on the main results. Second, we test the robustness of the results to clustering at the state level, omitting precipitation, and using a different weighting scheme. The original results are robust to all of these changes. | |
| https://ideas.repec.org/p/zbw/i4rdps/91.html | In this article, I perform a verification and a reproduction of the main results in Fernández and Fogli (2009), which estimates the role of culture in explaining the labor and fertility decisions of second-generation immigrant women in the United States in 1970. While I am able to verify Fernández and Fogli's (2009) main results as well as their robustness relative to both labor and fertility decisions, I am unable to reproduce them relative to labor decisions in alternative samples drawn from the same underlying population. | |
| https://ideas.repec.org/p/zbw/i4rdps/92.html | Rogowski et al. (2022) use secondary data to study the impact of historic postal infrastructure on economic development, both cross-country and within the US. Their results suggest a large positive effect of post offices on economic development that is robust across various sensitivity checks. We successfully computationally reproduce all results. In a robustness assessment, we find the results to be robust to simple changes in the analysis but observe some sensitivity to accounting for spatial trends in the cross-country analysis. Additionally, we correct a coding inconsistency and show that, in the corrected version, one main robustness check for the US analysis no longer supports the result. Despite this, we find the results to be robust overall, given the numerous analyses and robustness checks in the original paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/93.html | Using a Diamond-Mortensen-Pissarides (DMP) model with noisy signals on worker-firm match quality calibrated on data from 30 US states for 1999 and 2017, Pries and Rogerson argue that improved screening may explain the decrease in short-term employment spells observed in the US labor market. Using a decomposition exercise in a "reduced form" model, the authors show that changes in short-term employment spells (δ₁ and δ₂) are almost entirely accounted for by changes in the rate of learning on match quality α and in the probability of a good match πᵍ. Then, using a decomposition exercise in a "structural" model, they show in their main calibration strategy that changes in δ₁ and δ₂ are mainly driven by changes in α and σϵ, parameters pertaining to learning about match quality. First, we reproduce the authors' code in R and Python, two popular free and open-source programming languages. We find results identical to those in the paper. Second, we test the robustness of results to (1) using an earlier starting year, (2) adding additional states to the analysis, and (3) increasing the value of the 1999 mean vacancy duration parameter. The direction and relative size of the effect of each parameter on δ₁ and δ₂ are preserved in all robustness tests, corroborating the authors' argument. | |
| https://ideas.repec.org/p/zbw/i4rdps/94.html | Cheeseman and Peiffer (2022) field a survey experiment in Nigeria to test the effect of five different anti-corruption messages on participants' willingness to bribe public officials. They find that these messages generally fail to reduce bribes and could, in fact, increase bribes. They further show that these counterproductive effects of anti-corruption messages are especially pernicious for participants who believe corruption is widespread, whom they call "Pessimistic Perceivers." We find that Cheeseman and Peiffer's findings are computationally reproducible: using the same data and estimation procedures, we arrive at the same output reported in the original article. Furthermore, we find that following Cheeseman and Peiffer's strategy to dichotomize a three-item scale used as a moderating variable, their results are robust to different estimation strategies. However, we draw attention to several shortcomings of the original analysis. First, the distribution of the moderating variable is highly skewed: on a 0-1 scale, the mean value is 0.81. Cheeseman and Peiffer's dichotomization procedure is also sensitive to the cutoff threshold and produces unstable results. Similarly, when we employ more flexible estimation strategies for heterogeneous treatment effects when the moderator is measured on a continuous scale, the results appear less robust. | |
| https://ideas.repec.org/p/zbw/i4rdps/95.html | De Haas and Popov (2023) estimate the effect of country-level financial sector size and structure on decarbonization to show that countries with relatively more equity versus debt financing have more emission-efficient economies. We uncover multiple coding errors that change the magnitude and the precision of the coefficients of interest. These coding errors include misreporting of standard errors, and misspecifying generalized method of moments (GMM) estimators. We further provide robustness tests of the results to (1) restricting the sample to consistent sets of countries across the country and country-by-industry samples, and (2) using a limited information maximum likelihood (LIML) estimator to address a weak-instrument problem. We find that the results from the robustness checks are qualitatively different from the original results but similar to the corrected results. | |
| https://ideas.repec.org/p/zbw/i4rdps/96.html | Listo, Saberian and Thivierge (2023) conduct a careful replication of De Haas and Popov (2023) using the data, code, and instructions we made available at the time of publication. They highlight an inconsistency between the table notes and the main text in how the clustering of standard errors is described; uncover a coding mistake in the GMM regressions; and show that weak instruments are unlikely to bias our 2SLS coefficients. In this reply, we show that our results remain economically meaningful and statistically precise (though smaller in magnitude) when we (i) cluster standard errors by country in the country panel; (ii) correct the GMM code; and (iii) include or exclude China in the country and industry samples. | |
| https://ideas.repec.org/p/zbw/i4rdps/97.html | Herzog, Baron, and Gibbons (2022) explore the effects of exposure to official elite rhetoric and group cues on public support against the international nuclear weapons prohibition norm. The authors find that elite cues, in particular security and institutional cues, increase individuals' opposition to the Treaty on the Prohibition of Nuclear Weapons (TPNW). However, elite cues do not seem to have an effect on changing individuals' broader attitudes towards nuclear weapons, as measured by individuals' existing opposition to nuclear arms. We replicate and expand the authors' methods and results to test the robustness of the effects found in the study. First, we reproduce the main finding using the authors' original data and method. We do not find any coding errors that undermine the authors' analysis or conclusions. Second, we test the robustness of the results by (1) using a different operationalization of party identity, and (2) conducting an additional subgroup analysis by gender. We find no significant differences between our replicated and the original results; however, women's support for the TPNW is more responsive to security cues, while men's support is more responsive to institutional cues. | |
| https://ideas.repec.org/p/zbw/i4rdps/98.html | No abstract is available for this item. | |
| https://ideas.repec.org/p/zbw/i4rdps/99.html | Bouton et al. (2022) compare the properties of majority run-off and plurality rule elections in a laboratory setting, focusing on Duverger's prediction that plurality rule leads to higher levels of strategic voting. They produce a causal estimate of the difference in incidence of strategic voting across systems, finding more strategic voting under the plurality rule. However, they find that coordination is only higher under the plurality rule when voters are sufficiently divided over which candidate they prefer. They conclude that differences in electoral outcomes and voters' welfare are modest. We are able to computationally reproduce the original study's main findings using the authors' replication package. The replication package contained both raw data and a cleaned dataset, but did not include a script for cleaning the raw data or a codebook to make sense of it. Therefore, the majority of our work focused on producing code to evaluate and clean the authors' raw data. The authors sent a very helpful response to an earlier draft of this report and their communication improved the quality of our replication effort. | |
| https://ideas.repec.org/p/zbw/i4rdps/100.html | This article reviews and summarizes current reproduction and replication practices in political science. We first provide definitions for reproducibility and replicability. We then review data availability policies for 28 leading political science journals and present the results from a survey of editors about their willingness to publish comments and replications. We discuss new initiatives that seek to promote and generate high-quality reproductions and replications. Finally, we make the case for standards and practices that may help increase data availability, reproducibility, and replicability in political science. | |
| https://ideas.repec.org/p/zbw/i4rdps/101.html | Pre-registration is regarded as an important contributor to research credibility. We investigate this by analyzing the pattern of test statistics from the universe of randomized controlled trial (RCT) studies published in 15 leading economics journals. We draw two conclusions: (a) Pre-registration frequently does not involve a pre-analysis plan (PAP), or sufficient detail to constrain meaningfully the actions and decisions of researchers after data is collected. Consistent with this, we find no evidence that pre-registration in itself reduces p-hacking and publication bias. (b) When pre-registration is accompanied by a PAP we find evidence consistent with both reduced p-hacking and publication bias. | |
| https://ideas.repec.org/p/zbw/i4rdps/102.html | A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduces an additional layer of uncertainty not accounted for in reported standard errors and confidence intervals. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population heterogeneity, design heterogeneity, and analytical heterogeneity. We estimate each type's heterogeneity from multi-lab replication studies, prospective meta-analyses of studies varying experimental designs, and multi-analyst studies. Our results suggest that population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. A conservative interpretation of the estimates suggests that incorporating the uncertainty due to heterogeneity would approximately double sample standard errors and confidence intervals. We illustrate that heterogeneity of this magnitude, unless properly accounted for, has severe implications for statistical inference, with strongly increased rates of false scientific claims. | |
| https://ideas.repec.org/p/zbw/i4rdps/103.html | Summan, Nandi, and Bloom (2023; SNB) find that exposure of babies to India's Universal Immunization Programme (UIP) in the late 1980s increased their weekly wages in early adulthood by 0.138 log points and per-capita household consumption by 0.028 points. But the results are attained by regressing on age, in years, while controlling for year of birth, two variables that, as constructed, are nearly collinear. The results are therefore attributable to trends during the one-year survey period, such as inflation. A randomization exercise shows that when the true impacts are zero, the SNB estimator averages 0.088 points for wages and 0.039 points for consumption. | |
| https://ideas.repec.org/p/zbw/i4rdps/104.html | In this paper, we explore the reproducibility of the König et al. (2022) paper on the timing of bill initiation under coalition governments and validate its scope condition by expanding the analysis to an additional government and country, namely the United Kingdom's Conservative-Liberal Democrat coalition government of 2010 to 2015. We find that König et al. (2022)'s main analysis is robust to reproduction, and that König et al. (2022)'s results do not travel to the UK's typical majoritarian system. Our additional contribution also highlights the potential for future research to further address the endogeneity of legislative institutions to coalition governance, and possible institutional confounders to coalition policing. | |
| https://ideas.repec.org/p/zbw/i4rdps/105.html | Chowdhury, Sutter and Zimmermann (2022) assessed the risk, time, and social preferences of family members in rural Bangladesh, presenting two main findings. First, there is a strong and positive association between family members' preferences, even when controlling for personality traits and family background. Second, families can be grouped into two clusters: approximately 20% of the families are characterized by relatively impatient, risk-averse, and spiteful members, while the rest of the families have relatively patient, risk-tolerant, and prosocial members. Recognizing the pivotal role of cluster analysis in deriving the second result, we first successfully computationally reproduced the results, and then we conducted two types of robustness checks. The first examines the transformation of variables (continuous or categorical), affecting the proximity measure that is crucial to cluster analysis. The second assesses the effect of varying the number of clusters on the findings. Some results are robust, as we consistently find the small cluster of families identified by Chowdhury et al. (2022). However, divergent outcomes emerge with categorical variables (a logical choice given their nature) and a larger number of clusters (3 or 4). We conclude that, although the cluster analysis by Chowdhury et al. (2022) is valid, its outcomes significantly depend on the researcher's assumptions and choices. Careful consideration of several alternatives is essential in exploratory cluster analysis to identify stable groups. | |
| https://ideas.repec.org/p/zbw/i4rdps/106.html | This paper is a replication study of Brouwer, T., Galeotti, F., & Villeval, M. C. (2023), using the original data. The study explores how social norms are transmitted from one generation to another, specifically from parents to children. The authors conducted a field experiment involving 601 parents of children aged 3 to 12 in Lyon, France, to examine whether parents engage more in norm enforcement in the presence of their child, and whether the nature of punishment changes in the presence of the child. The study found that parents do engage more in norm enforcement in the presence of their child, and tend to use more indirect punishment when their child is present. This study highlights the role that parents play in transmitting social norms to their children. The replication analysis was successful, with the results of the original study being robust to changes in the model specification. | |
| https://ideas.repec.org/p/zbw/i4rdps/107.html | This study pushes our understanding of research reliability by reproducing and replicating claims from 110 papers in leading economics and political science journals. The analysis involves computational reproducibility checks and robustness assessments. It reveals several patterns. First, we uncover a high rate of fully computationally reproducible results (over 85%). Second, excluding minor issues like missing packages or broken file paths, we uncover coding errors for about 25% of studies, with some studies containing multiple errors. Third, we test the robustness of the results to 5,511 re-analyses. We find a robustness reproducibility of about 70%. Robustness reproducibility rates are relatively higher for re-analyses that introduce new data and lower for re-analyses that change the sample or the definition of the dependent variable. Fourth, 52% of re-analysis effect size estimates are smaller than the original published estimates and the average statistical significance of a re-analysis is 77% of the original. Lastly, we rely on six teams of researchers working independently to answer eight additional research questions on the determinants of robustness reproducibility. Most teams find a negative relationship between replicators' experience and reproducibility, while finding no relationship between reproducibility and the provision of intermediate or even raw data combined with the necessary cleaning codes. | |
| https://ideas.repec.org/p/zbw/i4rdps/108.html | This comment revisits the analysis in Christensen and Timmins (2022). We identify two critical errors in the original analysis, one in the data and the other in the code. When either error is corrected, several major results in the paper change, either in statistical significance or in effect size. The data error is a result of including fixed effects for the string variable 'city'. The raw variable is case sensitive and has many spelling mistakes. The coding error involves assigning a value of zero for the variable "of color" to both individuals identified as 'white' and as 'other' in the raw data. The level of clustering in the paper is also arguably too fine. Many of the results are not robust to clustering at the city level, as opposed to the subject pair level. In total, we affirm the authors' overarching claim of substantial and nuanced housing discrimination against racial minorities generally, and African Americans in particular; however, the effect sizes and significance are generally (although not always) smaller than the original authors' findings. Additionally, there are several instances where the effects of discrimination on African Americans are no longer statistically significant but the effect of discrimination on Hispanics becomes significant. | |
| https://ideas.repec.org/p/zbw/i4rdps/109.html | We thank the authors of this report for their careful re-analysis of our paper, "Sorting or Steering: The Effects of Housing Discrimination on Neighborhood Choice" and the Institute for Replication for supporting this work. In the process of replicating the results published in Christensen and Timmins (2022), Chen et al. raise two concerns and provide an independent set of analyses to address the concerns. In particular, the authors report findings from two changes to the original analysis: (1) regressions that make use of a variant of the 'city' variable that is used throughout the paper as a control, and (2) dropping observations for testers identified as 'other' (not white, LatinX, or Asian). The authors find that several of the coefficients and standard errors reported in the study are sensitive to these changes. They conclude that while significance and magnitude are affected in certain instances, their re-analysis affirms the paper's primary finding of substantial and nuanced housing racial discrimination. This document provides responses to the concerns discussed by Chen et al., additional analysis that addresses the concerns, and a discussion of the implications for the interpretation of findings in Christensen and Timmins (2022). (...) | |
| https://ideas.repec.org/p/zbw/i4rdps/110.html | Ambuehl et al. (2022) explore ways to evaluate interventions designed to enhance decision-making quality when individuals misjudge the outcomes of their choices. The authors propose a novel outcome metric that can distinguish between interventions better than conventional metrics such as financial literacy and directional behavioral responses. The proposed metric, which transforms price-metric bias into interpretable welfare loss measures, can be applied to evaluate various training programs on financial products. Table 4 of the paper reports the authors' main point estimates, which are significant at the 1% level. In this replication exercise, we first replicate the main findings of the original paper. Then, we modify the clustering method by using k-means with demographic variables as inputs, and we re-calculate standard errors with jackknife estimators. Finally, we include subjects who were excluded by the authors due to multiple switching in the multiple price lists. We find that all of these replications result in robust findings. Additionally, we successfully replicate Figure 4 from the paper. Notably, this replication demonstrates the insensitivity of the results to the choice of distance metric. | |
| https://ideas.repec.org/p/zbw/i4rdps/111.html | Wu et al. (2023) estimate the effect of classroom seating arrangements in China using a randomized control trial with two treatment schemes. The first treatment scheme involves seating high and low achieving students together, and the second treatment involves this same seating arrangement with financial incentives for the high-achieving students, if their deskmates' test scores improved. All statistically significant impacts come from the incentivized treatment scheme. Wu et al. (2023) find that low-achieving students sitting next to incentivized high-achieving students perform 0.24 SD (p-value=0.018) better on math exams. In addition, being assigned to the incentive treatment scheme increased extraversion and agreeableness for low and high achieving students. Lastly, they do not find much evidence of peer effects on test scores nor personality traits. This study is computationally reproducible using their provided replication package. We ran their code using Stata 14, 17, and 18. After running their replication package, we further investigated Tables 2-5. The main conclusions are generally robust to various coding decisions. Notably, in investigating the peer effects, when we change the specification to also control for the difference in baseline scores between the student and their deskmate, we find that the more dissimilar deskmates are at baseline, the bigger the peer effects. | |
| https://ideas.repec.org/p/zbw/i4rdps/112.html | Zaklan (2023) examines the Coasean independence property in the EU ETS. To test this property, the author studies whether emissions are independent from the free allowance allocation. Some allowances were given for free to all EU Member States until 2012. From 2013, allowances were fully auctioned, except in 10 countries that were granted an exception to continue to give free allowances to their firms. Treated firms are firms located in countries that do not receive free allowances anymore. Control firms are firms located in countries that continue to receive free allowances. The main analysis is conducted at the firm level using annual data from 2009 to 2017. Two-way fixed effects estimators are combined with 1-to-1 matching to estimate the impact of the treatment on firms' emissions. The main claim is that the independence property holds overall and for large emitters. Moreover, there is suggestive evidence that the independence property does not hold for small emitters. The study is reproducible. The Stata code runs smoothly and enough information is available to reproduce the main results using the R software. We apply different robustness checks on: the matching strategy, the specification, the level of clustering, the definition of the treatment and the definition of the cutoff that differentiates small and large emitters. We generally align with the author's assertion that the independence property is not rejected both overall and for large emitters. However, in most instances, we do not confirm the suggestive evidence that the independence property is rejected for small emitters. Moreover, the change in the definition of the treated firms is a robustness check to be considered separately, as it leads to sign reversal in most regressions. | |
| https://ideas.repec.org/p/zbw/i4rdps/113.html | Greenstone et al. examine the effect of the introduction of automatic air pollution monitoring on the reporting of local air pollution in China. Using 654 regression discontinuity designs (RDDs) based on city-level variation in the day that monitoring was automated, they find an immediate and lasting increase of 35 percent in reported PM10 concentrations post-automation. Moreover, they find that automation's introduction increases online searches for face masks and air filters by 200 percent and 28 percent, respectively, using an RDD. Results are consistent when using an event study design. First, we were able to computationally replicate the results. Second, we find that results are robust to more flexible specifications of the weather variables, to re-constructed weather variables using the same matching procedure as the authors (i.e., closest station) and meteorological data with additional weather stations, to alternative construction of the weather variables using an inverse distance weighted approach of the surrounding weather stations, and to more flexible choices of fixed effects (up to the city level). Finally, we find limited evidence of discontinuity in objective measures of ground pollution (i.e., AOD) for a sub-sample using alternative weather variables. The estimate, however, is economically insignificant. Moreover, no discontinuity is observed in the full sample. Therefore, we believe this result does not invalidate the original study's findings. | |
| https://ideas.repec.org/p/zbw/i4rdps/114.html | In their paper, Sampson (2023) introduces a theoretical framework and conducts empirical testing to elucidate the impact of gaps in countries' innovative efficiencies on income, wages, and trade dynamics. We successfully replicate the paper's findings by running the provided codes, and confirm the absence of any coding errors in the process. We also provide an extensive battery of robustness checks, which confirms the resilience of their results. We then scrutinize two key aspects of their study: the choice of developing countries and the innovation measure employed. The outcomes of this refined analysis partly temper the original paper's message of technology gaps driving inequality, underscoring the need for additional research in this domain. | |
| https://ideas.repec.org/p/zbw/i4rdps/115.html | Bai et al. (2023) examine the impact of individual networks on state building, focusing on the role of the leader Zeng Guofan during the Taiping Revolution in China between 1850 and 1864. In their main results, the authors demonstrate that being connected to Zeng increases the number of fatalities during the war after his assumption of power, with point estimates being significant at the 1% or 5% level. They also find a positive and significant effect of connections to Zeng among Hunan people on the number of national-level office positions, with point estimates significant at the 1% level. First, we reproduce the paper's main findings and identify minor inaccuracies in the codes that need fixing for the proper reproduction of some tables. However, these issues do not significantly impact the overall results. Second, we conduct additional checks and argue that the results are robust to variations in the number of fixed effects but highly dependent on the choice of econometric specification. We employ alternative models more suitable for data with a substantial number of zeros, revealing a decrease in the magnitude and significance of the estimates. Last, we perform spatial robustness checks, confirming the absence of spatial correlation between Hunan county and its neighboring regions, as suggested by the authors. | |
| https://ideas.repec.org/p/zbw/i4rdps/116.html | Giuliano and Nunn (2021), GN henceforth, provide econometric evidence that ancestral climatic variability is negatively associated with the current importance of tradition using a variety of data sources. This replication focuses on the results that use individual-level data and identifies major discrepancies between several econometric specifications described in the article and their corresponding code. We are able to correct most of these mistakes by realigning the code with the text. Once corrections are implemented, we obtain almost invariably a smaller and non-significant coefficient for climatic variability. | |
| https://ideas.repec.org/p/zbw/i4rdps/117.html | This note addresses the questions, concerns, and issues raised in "Understanding cultural persistence and change: a replication of Giuliano and Nunn (2021)." In terms of replicability, all of the tables in Giuliano and Nunn (2021) are correct, and the replication files match the output reported in the tables. In their note, the authors suggest alternative, more-restricted samples (e.g., omitting observations: under five years of age, under 16 years of age, living in rural locations, first or second-generation immigrants, with unmarried spouses, from specific ancestral groups, from the 1930 Census, etc.) and also less-restrictive samples (e.g., including grandchildren in analyses of parent-to-child cultural transmission for households that comprise three generations). We re-explain the logic of our baseline samples and why these samples are the most natural, as well as discuss the issues, complications, and incorrect reasoning associated with the authors' suggested alternatives. We also show, reproducing all relevant tables in full for each alternative raised, that our conclusions do not depend on these decisions. | |
| https://ideas.repec.org/p/zbw/i4rdps/118.html | Alan et al. (2023) carry out a field experiment where they randomly allocate 20 corporations in Turkey to a treatment group or a control group. White-collar employees at the headquarters of the corporations are invited to participate in a training program to improve the workplace environment. They report that the program reduces separation (workers quitting) and improves prosocial behavior, workplace quality and support networks. We test the robustness reproducibility of these results, focusing on the results reported in Table 8 of the original paper. We first successfully reproduce the results in Table 8 computationally based on the posted code and data, and we then carry out five robustness tests. We do not find robust support for an effect of the treatment on any of the four primary outcome variables (separation, prosocial behavior, workplace quality and support networks). The relative effect size of the robustness tests averaged across the primary hypotheses is 0.62, suggesting some inflation in the original effect sizes. The effects reported in the paper are driven by the additional employees added to the sample about one year after the initial baseline data collection and after the randomization of firms to treatment and control (and this sample is not balanced on observables across the treatment and control group). Not having access to the raw data limited the possible robustness tests. | |
| https://ideas.repec.org/p/zbw/i4rdps/119.html | Schwardmann et al. (2022) provide evidence from real-world debating competitions that being randomly assigned to, and arguing for, a given motion increases one's own belief in the merit of the motion and increases beliefs that factual statements supporting the motion are correct. We conduct a robustness replication focused on three main tests: i) Are results robust to the inclusion of controls for baseline beliefs via a difference-in-differences specification? ii) As error terms are plausibly correlated across outcome variables, are results robust to addressing this dependence through seemingly unrelated regression? iii) Are results robust to the inclusion of team-level fixed effects? All findings of the paper are robust to these tests and to a suite of other robustness exercises. We close our comment with a discussion of possible extensions, which indicate potential heterogeneity in self-persuasion by gender and by side of the debate. | |
| https://ideas.repec.org/p/zbw/i4rdps/120.html | We replicate the primary results from Ahlquist and Downey (2023, AD), who examine the effects of Chinese import competition on both industry- and state-level unionization in the US. We are able to directly replicate the main results in AD Tables 1 and 2. We consider two main extensions. First, we consider a version of the industry-level analysis that uses log union share instead of the level. We again find a significant negative effect on union share, although the effect on log union share explains a larger fraction of the total drop between 1990 and 2014. Second, for the state-level results, we segment the manufacturing employment share into unionized and nonunionized manufacturing. We find that at the state level, the impacts of import exposure are concentrated entirely in non-union manufacturing. The estimated impact on union manufacturing employment is actually positive, but small and statistically insignificant. This is in contrast with the results at the industry level, where the effects are negative for both union and non-union manufacturing and larger in magnitude for union manufacturing. | |
| https://ideas.repec.org/p/zbw/i4rdps/121.html | Gendron-Carrier et al. (2022) study the effect of subway openings on urban air pollution. The authors find a null average effect, but a negative effect in cities with high initial pollution. In this comment, I perform several robustness checks on the negative effect for high-pollution cities, and repeat the main analyses for low-pollution cities. I show that the main finding for high-pollution cities is robust, and find mixed results for low-pollution cities. I implement an alternative back-of-the-envelope calculation for the effect of subway openings on infant mortality, and find a smaller number of averted deaths. | |
| https://ideas.repec.org/p/zbw/i4rdps/122.html | Barro et al. (2022) investigate the quantity of safe assets held in the cross-section of developed countries and find that the average safe-asset ratio (ratio of safe assets to total assets) was 37% in 2015 and has remained relatively stable over time. They also document a crowding-out coefficient for private bonds relative to public bonds of around −0.5. In the second part of the analysis, they simulate a heterogeneous agent model with rare disasters and risk aversion to match the empirical findings. This report seeks to reproduce and confirm their results. Overall, we were largely able to replicate their findings and propose a few robustness checks. Apart from two regression outputs for which the signs and significance do not change, our results are very close to those of the original paper. Alternative models and estimators do not change the signs or significance levels. A more systematic approach to the parameter values in the simulations also points towards solid conclusions. | |
| https://ideas.repec.org/p/zbw/i4rdps/123.html | The so-called credibility revolution dominates empirical economics, with its promise of causal identification to improve scientific knowledge and ultimately policy. By examining the case of rural electrification in the Global South, this opinion paper exposes the limits of this evidence-based policy paradigm. The electrification literature boasts many studies using the credibility revolution toolkit, but at the same time several systematic reviews demonstrate that the evidence is divided between very positive and muted effects. This bifurcation presents a challenge to the science-policy interface, where policymakers, lacking the resources to sift through the evidence, may be drawn to the results that serve their (agency's) interests. The interpretation is furthermore complicated by unresolved methodological debates circling around external validity as well as selective reporting and publication decisions. These features, we argue, are not particular to the electrification literature but inherent to the credibility revolution toolkit. | |
| https://ideas.repec.org/p/zbw/i4rdps/124.html | We estimate the robustness reproducibility of key results from 17 non-experimental AER papers published in 2013 (8 papers) and 2022/23 (9 papers). We find that many of the results are not robust, with no improvement over time. The fraction of significant robustness tests (p<0.05) varies between 17% and 88% across the papers with a mean of 46%. The mean relative t/z-value of the robustness tests varies between 35% and 87% with a mean of 63%, suggesting selective reporting of analytical specifications that exaggerate statistical significance. A sample of economists (n=359) overestimates robustness reproducibility, but predictions are correlated with observed reproducibility. | |
| https://ideas.repec.org/p/zbw/i4rdps/125.html | Equivalence testing methods can provide statistically significant evidence that relationships are practically equal to zero. I demonstrate their necessity in a systematic reproduction of estimates defending 135 null claims made in 81 articles from top economics journals. 37-63% of these estimates cannot be significantly bounded beneath benchmark effect sizes. Though prediction platform data reveals that researchers find these equivalence testing 'failure rates' to be unacceptable, researchers actually expect unacceptably high failure rates, accurately predicting that failure rates exceed acceptable thresholds by around 23 percentage points. To obtain failure rates that researchers deem acceptable, one must contend that nearly half of published effect sizes in economics are practically equivalent to zero. Because such a claim is ludicrous, Type II error rates are likely quite high throughout economics. This paper provides economists with empirical justification, guidelines, and commands in Stata and R for conducting credible equivalence testing in future research. | |
| https://ideas.repec.org/p/zbw/i4rdps/126.html | Schulz (2022) shows how weak kin networks contributed to the rise of participatory institutions and how the medieval Catholic Church's marriage prohibitions contributed to the process by destroying European clan-based kin networks. The argument rests on three pieces of evidence. First, a cross-country analysis shows that countries with cousin-term differentiation score between 2.83 and 7.66 units less in modern democracy than non-differentiating countries. The point estimates are statistically significant at the 5% level using Conley SEs based on either genetic or geographical distance. Second, a historical analysis shows that one additional century of exposure to the Western Church increased the probability of a city being a commune by 12.2, an effect that is statistically significant at the 1% level using Conley SEs with distance cutoffs of 500 km or 2,500 km. Third, a 20th-century analysis of voter turnout and kin networks within European countries shows that doubling the cousin marriage rate decreases the probability of voting by about 1.8 percentage points. Following an epidemiological approach that links the kin networks of migrant mothers' countries of origin to the political participation of second-generation migrants in Europe, Schulz (2022) shows that cousin-term differentiation in the mother's country of origin reduces the probability of voting. The above results are all computationally reproducible. We identify only two minor coding errors: the SEs reported in Table 3 correspond to SEs clustered at the city level rather than Conley SEs, and the sample size in Table 5 is incorrect. Neither error affects the point estimates or their statistical significance. We also provide the missing code for the two figures in the paper. For the historical analysis, we conduct a robustness check on an alternative sample of cities. The magnitudes of the coefficients vary very little, and the statistical significance of the results remains unchanged. | |
| https://ideas.repec.org/p/zbw/i4rdps/127.html | We reproduce Shoub, Kelsey, Katelyn E. Stauffer, and Miyeon Song (May 2021). "Do Female Officers Police Differently? Evidence from Traffic Stops," with alternative specifications and interpretation of the results. While our reproduction confirms that female police officers are less likely to search drivers than male officers and female officers are more likely to find contraband upon a search, we re-evaluate the authors' claims on the equality of effectiveness between male and female officers and find that female officers in the dataset confiscated less contraband than male officers. | |
| https://ideas.repec.org/p/zbw/i4rdps/128.html | Moya Chin's (2023) paper argues that politicians in two-round majoritarian systems have to appeal more broadly than those in single-round elections. The author uses data for mayoral elections in Brazil. The key findings of the paper are that two-round systems (1) foster inclusiveness, (2) result in higher levels and wider distribution of public goods, and (3) lead to better immediate societal outcomes in terms of drop-out and elementary literacy rates. The author uses a regression discontinuity design to test her hypotheses. We test computational reproducibility and successfully duplicate the key results of the study. We also test for result replicability by modifying the data sample used by Chin (2023) while applying the same method. In nearly all cases, our results are very close (in terms of direction of effect, magnitude, and statistical significance) to those obtained by the original author, with only some relationships losing statistical significance. We reproduce and then replicate all three key empirical results obtained by the author, meaning that there is an effect on inclusiveness, distribution of public goods, and more immediate societal outcomes (although our study does not find a statistically significant effect of a two-round system on elementary literacy rates). | |
| https://ideas.repec.org/p/zbw/i4rdps/129.html | Morris and Shoub (2024) study whether fatal police shootings mobilize voter participation in presidential elections. They use a discontinuity-in-time design to causally estimate the effect of a police killing on turnout, comparing the voter participation of communities near a killing before and after election day. Morris and Shoub (2024) find that police killings spurred increased turnout, especially in Black communities, where the killing trended on Google, where the community was plurality Black, and where the victim's race was Black. They find that the local average treatment effect on participation within a quarter-mile radius of a police killing is upwards of 7 percentage points and statistically significant at the 95% level of confidence. We encounter difficulties when attempting to reproduce the analysis, but are able to replicate the main results using similar data. In fact, we find the effect of a proximate police killing on participation to be upwards of 8 percentage points. | |
| https://ideas.repec.org/p/zbw/i4rdps/130.html | Naidu and Yuchtman (2013) find that labor demand shocks in 19th-century Britain had an impact on master and servant prosecutions, as breaking an employee contract was a criminal offense until 1875. We first reproduce all regression tables in Naidu and Yuchtman (2013) and then test for robustness by using a triple difference where we compare the impact of labor demand shocks on master and servant prosecutions relative to other prosecutions, changing the functional form of key variables, including region*year interactive fixed effects, and conducting an influential-observation analysis. We find that the results are sensitive to the triple difference specification and to region*year FEs, and otherwise robust. Overall, we find the results are robust in 50% of the checks we ran, and the t/z scores were on average 74% as large as those in the original study. | |
| https://ideas.repec.org/p/zbw/i4rdps/131.html | Berger, Easterly, Nunn and Satyanath (2013) find that increased US political influence, arising from Cold War interventions, was used to create a larger export market for American products. They find that after CIA interventions, imports from the US increased dramatically, and the authors rule out other explanations. We first reproduce all regression tables in Berger et al. (2013), and then test for robustness by controlling for imports from other NATO countries, various forms of US aid, and sanctions, by multi-way clustering the errors, and by conducting an influential-observation analysis. We find that the impact of CIA interventions on US exports is sensitive to additional controls and omitting outliers, although adding in region*year interactive fixed effects tends to strengthen the results. Overall, we find that the paper's original results are robust, with a coefficient in the same direction and significant at 5%, in 17% of the robustness checks we ran (although 58% were significant at 10%). On average, the t/z scores are 58% as large as those in the original study. | |
| https://ideas.repec.org/p/zbw/i4rdps/132.html | Cloyne (2013) constructs a novel dataset documenting fiscal tax shocks in the United Kingdom using the narrative approach developed by Romer and Romer (2010), and estimates the impact of tax changes on GDP. He finds that a tax cut of one percent of GDP causes a 0.6 percent increase in output in the initial quarter of the policy, rising to a peak of 2.5 percent over three years. We first reproduce all of the VAR tables and figures in the original paper, and then test for robustness through a number of changes to the baseline regression model, particularly: changes in lag structure, changes in the control set, alternative estimation procedures, and excluding influential observations. In 60% of robustness tests, the impact effect is significant at the 95% level, with a mean estimated coefficient of 0.63, while in 70% of robustness tests the peak response remains significant at the 95% level, with a mean peak response of 2.27. | |
| https://ideas.repec.org/p/zbw/i4rdps/133.html | Pop-Eleches and Urquiola (2013) apply a regression discontinuity design to the Romanian secondary school system, and notably find that (a) students who go to a better school get higher scores on an exam used for university admission, (b) parents of students who get into a better school help their kids less with homework, and (c) kids who go to a slightly better school report more negative interactions with peers. We first reproduce all regression tables in Pop-Eleches and Urquiola (2013), and then test for robustness by unstacking the data, multi-way clustering, altering the cutoffs, altering control variables, and conducting an influential-observation analysis. Overall, we find the results for findings (a), (b), and (c) are robust in 100%, 42%, and 60% of the robustness checks we ran, respectively, and the t/z scores were on average 93%, 69%, and 92% as large as those in the original study. | |
| https://ideas.repec.org/p/zbw/i4rdps/134.html | Aghion, Van Reenen and Zingales (2013) find that institutional ownership causes an increase in innovation as measured by citation-weighted patent counts. To identify a causal effect, they use membership in the S&P 500 as an instrument for institutional ownership in a panel regression. We first replicate all regression tables in Aghion et al. (2013), and then test for robustness, mainly by adding in firm and sector*year fixed effects. We find that the positive relationship between institutional ownership and innovation is robust in 22% of robustness checks. On average, 2nd-stage z-scores were just 42.7% as large as those in the original study. We find that when we include firm fixed effects, membership in the S&P 500 actually has a negative (though significant only at the 10% level) impact on institutional ownership (among non-indexed funds). Lastly, we find that the original control-function IV regression suffers from multicollinearity, complicating inference. | |
| https://ideas.repec.org/p/zbw/i4rdps/135.html | We computationally reproduce the central findings in Mehmood (2022), which studied the effect of a 2010 reform in Pakistan replacing the presidential appointment of high-court judges with peer appointments. Mehmood leveraged judicial records interpreted and coded by lawyers in Pakistan at the levels of cases, districts, benches, and individual judges. We successfully execute all Stata code in the author's replication archive without any errors, then translate and execute that code in R, again finding no serious errors. Consequently, we reproduce the article's main findings from regressions in Tables 2-4. Additionally, we successfully reconstruct the primary treatment variables of these regressions, after corresponding with the author to clarify precisely how to do so. We then replicate the main findings from regressions in Tables 2-10. Finally, we identify several minor errors which left the article's findings intact. Overall, this report reveals no serious defects in Mehmood (2022). We publicly archive our replication code and a spreadsheet of our results. | |
| https://ideas.repec.org/p/zbw/i4rdps/136.html | Researchers utilizing regression discontinuity design (RDD) commonly test for running variable (RV) manipulation around a cutoff, but incorrectly assert that insignificant manipulation test statistics are evidence of negligible manipulation. I introduce simple frequentist equivalence testing procedures that can provide statistically significant evidence that RV manipulation around a cutoff is practically equal to zero. I then demonstrate the necessity of these procedures, leveraging replication data from 36 RDD publications to conduct 45 equivalence-based RV manipulation tests. Over 44% of RV density discontinuities at the cutoff cannot be significantly bounded beneath a 50% upward jump. Bounding equivalence-based manipulation test failure rates beneath 5% requires arguing that a 350% upward density jump is practically equal to zero. Meta-analytic estimates reveal that average RV manipulation around the cutoff is equivalent to a 26% upward density jump. These results imply that many published RDD estimates may be confounded by discontinuities in potential outcomes due to RV manipulation that remains undetectable by existing tests. I provide research guidelines and commands in Stata and R to help researchers conduct more credible equivalence-based manipulation testing in future RDD research. | |
| https://ideas.repec.org/p/zbw/i4rdps/137.html | This report inspects the reproducibility of a study by Dizon-Ross and Jayachandran (2023), which focused on differences in parents' spending on their daughters relative to sons, using a large sample of 6,673 observations in 1,084 households in Uganda. The original study found that the willingness to pay (WTP) of fathers for different goods for their daughters was lower than for their sons. We were able to computationally reproduce all original results using the original data and code. To test for recreate reproducibility, we tried to reproduce the results of the main analyses using new code and different software. We were not able to complete the reproduction without analyzing the original code and processed dataset. Neither the manuscript nor the online appendix made clear how the authors dealt with the multilevel structure of the data and how they controlled for different goods, which served as stimulus material. Because the raw data did not have clear labels and the replication package did not include a codebook, we were also unable to identify the variables needed for each analysis. However, after analyzing the original code, we were able to reproduce the original results in Mplus. The missing codebook and missing transcription of survey questions complicated the investigation of robustness reproducibility. Although the authors collected a large number of variables and provided them in the dataset, it was not possible to identify their meaning. Therefore, we were not able to conduct further analyses regarding the main findings of the study. Consequently, our robustness checks focused only on multicollinearity and on different constellations of the control variables reported in the paper. Our analyses showed that the results of the study are robust in this respect. In addition, the missing codebook and transcription of survey questions did not allow for direct replicability of the study. Conceptual replicability was not investigated. | |
| https://ideas.repec.org/p/zbw/i4rdps/138.html | Blair et al. (2023) examine the effect of UN peacekeeping on democratization in conflict-affected countries. They use fixed effects and instrumental variable estimators and find evidence that "UN missions with democracy promotion mandates are strongly positively correlated with the quality of democracy in host countries but that the magnitude of the relationship is larger for civilian than for uniformed personnel, stronger when peacekeepers engage rather than bypass host governments when implementing reforms, driven in particular by UN election administration and oversight, and more robust during periods of peace than during periods of civil war". Since the authors provide an impressive list of robustness checks, we focus on computational and robustness reproducibility. We replicate the findings using the Stata code provided in the replication material and reproduce all main analyses in R. We add year fixed effects to country fixed effects, cluster standard errors, use fixed and random panel regression estimators and ordered Beta regression estimators. We furthermore reproduce instrumental variable estimators with two different packages. We find that the original findings were reproducible and robust. | |
| https://ideas.repec.org/p/zbw/i4rdps/139.html | Esguerra, Vollmer and Wimmer (2023) examined respondents' desire to influence other people's choices via their own behaviour. They conducted a field study on German residents' registration for the COVID-19 vaccination, where 1,401 "Senders" made their registration decision, which could then be shared with a peer before or after that peer's decision, providing an understanding of motives and social pressure on decisions. The authors found that individual influence motives increase a participant's likelihood to register for vaccination, but social pressure effects do not alter it. We reproduced the results using the original code and data. We tested the robustness of the primary analysis by (i) using a logistic regression model, (ii) limiting the analysis to participants who inform their partner of their decision, and (iii) changing the criteria by which participants are recorded as "verified registered". We found that these tests did not materially change the effect size estimates or the conclusions to be drawn from the analysis. We also tested the authors' sub-analysis by the level of trust in the vaccine. We found that an alternative cutoff for the high-trust group did not materially change the result. | |
| https://ideas.repec.org/p/zbw/i4rdps/140.html | This is a replication study of Cook et al. (2023), a paper that investigates the determinants of access to nondiscriminatory public accommodations for African-Americans before the 1964 Civil Rights Act. They utilize the Negro Motorist Green Books and World War II casualty data to examine the impact of demographic shifts caused by wartime casualties on the prevalence of nondiscriminatory establishments. Using a difference-in-differences approach, they show that a 10% increase in white casualties led to a 0.6% increase in nondiscriminatory businesses. Further, an instrumental variable strategy indicates that a 10% rise in the Black population share is associated with increased nondiscriminatory services. Our replication study shows that the difference-in-differences estimates remain stable even after excluding states with the highest average white World War II casualties or Southern states. However, the instrumental variable estimates become sensitive to the use of robust standard errors. The reproduction of the figures and tables of the paper is mostly accurate, with a minor discrepancy in Table 3 Panel A column 3, where the original coefficient is stated as 0.0191, and the replicated coefficient is found to be 0.0263. | |
| https://ideas.repec.org/p/zbw/i4rdps/141.html | Graduating economics PhDs face intense competition when seeking faculty or research positions at universities and research institutions. We examine the relationship between statistically significant results, arguably used as indicators of research quality in a competitive academic market, and academic hiring outcomes. We start by investigating the determinants of academic success by analyzing 604 job market papers (JMPs) from 2018-2019 to 2020-2021. We then turn to the practice of p-hacking focusing on 150 empirical JMPs. We find evidence that marginally significant results in JMPs are associated with higher academic placement likelihoods. During the COVID-19 pandemic, a tighter job market strengthened this relationship without altering the p-hacking behavior of PhD candidates, suggesting that our results reflect a recruitment bias by academic employers. We also find evidence of publication bias, suggesting that recruiters may use statistical significance to gauge candidates' potential for future publications, thus influencing recruitment decisions. Overall, our findings provide insights into the dynamics of the academic job market and the factors influencing career trajectories in academia. | |
| https://ideas.repec.org/p/zbw/i4rdps/142.html | Banerjee, Duflo, and Sharma (BDS, 2021a) conduct a ten-year follow-up of a randomized transfer program in West Bengal. BDS find large effects on consumption, food security, income, and health. We conduct a replicability assessment. First, we successfully reproduce the results, thanks to a perfectly documented reproduction package. Results are robust across alternative specifications. We furthermore assess the paper's pre-specification diligence and the reporting in terms of external and construct validity. While the paper refers to a pre-registration, it lacks a pre-analysis plan. Assessing the validity of findings for other contexts is difficult absent necessary details about the exact treatment delivery. | |
| https://ideas.repec.org/p/zbw/i4rdps/143.html | Robustness reproductions and replicability discussions are on the rise in response to concerns about a potential credibility crisis in economics. This paper proposes a protocol to structure reproducibility and replicability assessments, with a focus on robustness. Starting with a computational reproduction upon data availability, the protocol encourages replicators to prespecify robustness tests, prior to implementing them. The protocol contains three different reporting tools to streamline the presentation of results. Beyond reproductions, our protocol assesses adherence to the pre-analysis plans in the replicated papers as well as external and construct validity. Our ambition is to put often controversial debates between replicators and replicated authors on a solid basis and contribute to an improved replication culture in economics. | |
| https://ideas.repec.org/p/zbw/i4rdps/144.html | Axbard and Deng (2024) exploit the rollout of new pollution monitors in China in 2015 in 177 medium-size cities to study the effect of air-quality monitors on enforcement actions by local governments and air quality. In their main difference-in-difference analysis, they identify the change in the probability of enforcement for firms that are close to versus further away from the monitor. They find that being within 10km of a monitor increases the probability that a firm receives any enforcement action by 0.0033 (standard error 0.00056) relative to a mean of 0.0046. Computationally, we successfully reproduce the main claims of the paper. We observe minor coding anomalies that do not have a material impact. We find that the main result on all enforcement is robust to all robustness checks: (1) randomization inference (2) alternative fixed effects and (3) multiple hypothesis testing. | |
| https://ideas.repec.org/p/zbw/i4rdps/145.html | Elisa Macchi (2023) investigates the impact of obesity on the perceived wealth of individuals using primary data collected in Kampala, Uganda. The study includes two complementary experiments: a beliefs experiment and a credit experiment. In the beliefs experiment, individuals assess the wealth of others based on weight-manipulated portraits, while in the credit experiment, loan officers evaluate creditworthiness using similar portraits. In this paper, we reproduce the author's results using the freely accessible replication package. Additionally, we test the robustness of the findings by (1) proposing different nutritional status categorizations, (2) applying alternative estimation strategies via ordered probit/logit models, (3) using different levels of clustering, and (4) excluding extreme values. Overall, our findings support the main conclusions of the original paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/146.html | Kao et al. (2024) use phone-based survey experiments in Jordan, Tunisia and Morocco to test whether established theories about the effect of descriptive representation on perceived democratic legitimacy hold in the Middle East. They find that the presence of women in deliberative bodies legitimizes decision-making even in more socially conservative, less democratic societies. We blindly reproduced their study and then extended their analysis with five additional robustness checks. We find that their analysis is reproducible and robust in several ways, although there were ambiguities in the original text which prolonged this process. Finally, we also extended their analysis by using iterative machine learning models to study heterogeneous treatment effects. We find that marital status as well as pre-treatment attitudes on related issues affect the response to the treatment. | |
| https://ideas.repec.org/p/zbw/i4rdps/147.html | Tappin, Berinsky, and Rand (2023) find that the effectiveness of persuasive messaging is not diminished by countervailing in-party leader cues, using a survey experiment fielded in the United States. In this robustness reproduction, we briefly summarize the original design and results before blindly reproducing the main results and conducting several additional robustness checks. We find that the original results are reproducible and robust to several additional checks. In so doing we contribute to the collaborative effort between the Institute for Replication (I4R) and Nature Human Behaviour to replicate recent findings published in the latter, and more broadly to advancing replication in political science. | |
| https://ideas.repec.org/p/zbw/i4rdps/148.html | Hjort and Poulsen (2019) frame the staggered arrival of submarine Internet cables on the shores of Africa circa 2010 as a difference-in-differences natural experiment. The paper finds positive impacts of broadband on individual- and firm-level employment and nighttime light emissions. These results largely are not robust to alternative geocoding of survey locations, to correcting for a satellite changeover at end-2009, and to revisiting a definition of the treated zone that has no clear technological basis, is narrower than the spatial resolution of nearly all the data sources, and is empirically suboptimal as a representation of the geography of broadband. | |
| https://ideas.repec.org/p/zbw/i4rdps/149.html | When do populist radical-right parties (PRRP) foster the (descriptive) representation of women? In a recently published paper, Weeks et al. (2023) coin the concept of 'strategic descriptive representation'. When facing electoral struggles, PRRP would exploit the existing gender gap and strategically increase the descriptive representation of women to attract female votes and fare better in the election. Using data on 58 elections across 19 countries, the authors test their argument and find conclusive evidence supporting it. In this paper, we offer a replication of the study. First, we assess the numerical reproducibility of the published findings ('verification'). Second, we investigate the 'robustness' of the findings and evaluate the results under alternative model specifications. While our replication study identifies minor issues with the verification and some of the model specifications, it most importantly shows that the main results of the paper are driven by a single outlier. The paper's key finding is hence contingent on the inclusion of a single observation (French Front National in 2012), which is a questionable observation as it only elected two MPs, one of whom was a woman. Additionally, this woman's election was seemingly caused by a combination of idiosyncratic factors discussed in the study. Once the case is excluded from the analysis the key model parameter shrinks close to zero and loses its statistical significance. Accordingly, in light of our findings, there is no clear evidence supporting strategic descriptive representation and electoral pressures do not seem sufficient to encourage PRRP to increase their share of female representatives. Correcting this empirical finding has important implications for both understanding PRRP's electoral strategies and women's representation. | |
| https://ideas.repec.org/p/zbw/i4rdps/150.html | Guinaudeau and Jankowski reassess our recent study on the use of strategic descriptive representation among political parties in Europe. The authors successfully replicate the vast majority of our findings and perform a number of additional robustness checks. They claim that one of our key findings is sensitive to the inclusion of one observation (the Front National, FN, 2012), and that alternative measurement or modeling strategies return different results. In this response, we address each claim in turn. We apply influential case diagnostics to detect all influential cases in our multilevel models, so as not to arbitrarily delete one influential observation but not another. On removing all influential cases, our results remain substantially the same. More importantly, because we do not agree with arbitrarily dropping observations, our findings are robust to different handling techniques for influential cases in multilevel models which downweight influential cases. Further, and in line with our original mixed methods approach, we provide an additional influential case study of the use of strategic descriptive representation by the FN in 2012, which is supportive of our theory and quantitative evidence. Finally, we respond to questions about our measurement and modeling decisions by highlighting the theoretical framework and scholarly literature that informs these decisions, which is largely disregarded by GJ. | |
| https://ideas.repec.org/p/zbw/i4rdps/151.html | We conduct a computational replication of Atanasov et al. (2023). In total, our analysis covers three variations: we use the cleaned dataset provided in the replication package, we clean the original data ourselves, and finally we extend the dataset to encompass an additional three years of data using the webscraper provided by the authors. The additional data boosts the final observation count by approximately one-quarter. We find that the results are robust; the data in the replication package results in nearly the same estimates and an extension of the data and specifications reduces the effect size and statistical significance, but does not change the conclusions. We further conduct a wide range of robustness checks. While some estimates have smaller effect sizes and lower statistical significance, all results support the original findings. | |
| https://ideas.repec.org/p/zbw/i4rdps/152.html | We provide a reproduction and replication of Brutger (2024), which examines the effects of the University of California, Berkeley's Pipeline Initiative in Political Science (PIPS) program on five self-reported outcomes related to interest and preparation towards pursuing graduate school. We are able to reproduce the author's results but do note some minor coding challenges. Our additional replication analysis confirms that the study's original results are robust to different model specifications. In future analysis of PIPS, we suggest that the author address our suggestions regarding the wording of the survey questions, sample selection, and statistical power. Overall, we commend the author on a good study of an important topic. | |
| https://ideas.repec.org/p/zbw/i4rdps/153.html | Overall, the results of this study are replicable using the clean dataset provided by the authors. However, the process of transforming the raw data into this clean dataset is at times unclear, and certain essential codes are missing from the replication materials, which may hinder complete replication. Despite these challenges, the paper's findings remain robust to a small number of robustness checks that we conducted, providing confidence in the reliability and stability of the results. Moving forward, efforts to enhance transparency in data transformation steps and inclusion of all necessary codes would facilitate more seamless replication and validation of the study's outcomes. | |
| https://ideas.repec.org/p/zbw/i4rdps/154.html | Sanders et al. (2024) made the central claim that effects found in eight meta-analyses are "strong evidence" (P | |
| https://ideas.repec.org/p/zbw/i4rdps/155.html | Guo et al. (2023) examine the impact of early-life experiences of politicians on their policy implementations. They utilize differential exposure of Chinese county party secretaries to the Great Famine of 1959-1961 as a natural experiment and investigate the impact on their policy preferences, in particular fiscal expenditure on agriculture and social security. In their baseline analytical specification, the authors find that exposure to a one percentage point more severe famine led counties governed by these politicians to a 0.8% higher fiscal expenditure on agriculture and a 1.1% higher expenditure on social security subsidies. Their point estimates are statistically significant at the 1% and 5% levels, respectively, depending on the included set of controls. First, we successfully computationally reproduce all quantitative claims, more precisely all tables and figures, of the paper, using the provided replication files. We uncover a minor coding error in one robustness-check specification (correcting it actually strengthens the study's main result), as well as a typo and a rounding error in another robustness check. Additionally, the summary statistics and exploratory data analysis of the paper were also computationally reproduced using a different software package. Second, we directly replicate the results by systematically varying the sample size in two ways. One, we drop a single control variable, which enlarges the sample by retaining the observations that have missing values for that variable, and two, we restrict the sample for the estimation of the impact on social security subsidies to only those observations that also report values for the agricultural fiscal subsidies. We find that retaining the observations with missing education information reduces the magnitude of all coefficients of interest (the interactions of famine severity with birth-year indicators); the impact on agricultural expenditure remains statistically significant, while the impact on social security subsidies is no longer statistically significant in three specifications and remains weakly significant in one. Similarly, estimating the impact on social security subsidies on the restricted sample of observations that also report agricultural subsidy values substantially reduces the magnitude and renders the coefficient statistically insignificant. | |
| https://ideas.repec.org/p/zbw/i4rdps/156.html | The study provides empirical evidence that a targeted policy can backfire because information signals affect non-targeted units. Specifically, the analysis of the policy aimed at regulating the harvesting of juvenile fish in Peru's Anchovy Fishery, by temporarily closing areas with high juvenile catch percentages, reveals an unintended increase of 48% in the overall seasonal juvenile catch percentage. This appears to be due to substantial spatial and temporal spillovers generated by the policy that reduces search costs for fishers. The study combines administrative micro-data used by the regulator to generate closures with biologically richer data from fishing firms. All results are easily computationally reproducible within a 5-hour time frame, except for the synthetic controls robustness check, which takes a considerable amount of time (appr. 64 hours) but works. We stress the robustness and reproducibility of the study by testing whether the analysis is robust to the use of different types of standard errors, and the findings appear unaffected. Overall, the full analysis and graphic outputs of the paper are reproducible using the publicly available complementary data and code from the AEJ website despite minor code interpretability challenges. | |
| https://ideas.repec.org/p/zbw/i4rdps/157.html | Funke, Schularick, and Trebesch (2023) investigate the impact of populist leaders on GDP growth in 60 countries. They build an original dataset identifying populist presidents and prime ministers from 1900 to 2020. They then examine changes in countries' GDP growth rates following a populist leader using various empirical methods. They find that 5-15 years after a populist leader, the GDP per capita in that country is lower. Focusing on the panel regression results (Table 2), which we replicate, the authors find a reduction in GDP growth rates of 0.8-1 percentage point per year, with p-values ranging from 0.000 to 0.023. We successfully computationally reproduce these estimates. Second, we recode the variable identifying populist leaders from the authors' source and examine the sensitivity of the estimates to changing the sample time period to include the "war" years of 1915-1945. We find that the results in our main change - using the extended time-period sample - are qualitatively similar to the original results, though with smaller and noisier point estimates. Specifically, the 5-year estimate in Table 2 column 3 changes from -0.97 (p-value 0.02) to -0.43 (p-value 0.2), and the 15-year estimate in Table 3 column 3 changes from -0.73 (p-value 0.01) to -0.53 (p-value 0.17). We then turn to sensitivity analysis regarding small differences in research choices about how to code the start of populist spells and which spells to include in the sample. We find the original results are highly robust to these changes. For example, in our Table 3 column 3, the estimated effect changes from the original -0.73 (p-value 0.01) to -0.75 (p-value | |
| https://ideas.repec.org/p/zbw/i4rdps/158.html | Moscona & Sastry (2023, Quarterly Journal of Economics) - henceforth MS23 - find that cropland values are significantly less damaged by extreme heat exposure (EHE) when crops are more exposed to technological innovation. However, MS23's 'innovation exposure' variable does not measure innovation, instead proxying innovation using a measure of crops' national heat exposure. A re-examination of MS23's replication data - which permits a close but inexact reproduction of MS23's published findings - shows that this proxy moderates EHE impacts for reasons unrelated to innovation. The proxy is practically identical to local EHE, so MS23's models examining interaction effects between their proxy and local EHE effectively interact local EHE with itself. I document extensive evidence that MS23's findings on 'innovation exposure' are simply artefacts of nonlinear impacts in local EHE, and uncover robustness issues for other key findings. I then construct direct measures of innovation exposure from MS23's crop variety and patenting data. Replacing MS23's proxy with these direct innovation measures decreases MS23's moderating effect estimates by at least 99.8% in standardized units; none of these new estimates are statistically significantly different from zero. Similar results arise from an instrumental variables strategy that instruments my direct innovation measures with MS23's heat proxy. These results cast doubt on the general capacity for market innovations to mitigate agricultural damage from climate change. | |
| https://ideas.repec.org/p/zbw/i4rdps/159.html | No abstract is available for this item. | |
| https://ideas.repec.org/p/zbw/i4rdps/160.html | Badinger and Schiman (2023) use a narrative high-frequency analysis of news and financial markets to develop a small set of restrictions on the structural shocks of a VAR of the Euro area. Their approach does not uniquely identify a structural representation, so their results are based on the distribution of a randomly generated set of parameters that satisfies the restrictions. Their method generates impulse responses that are consistent with macroeconomic theory, but that differ from previous studies that use alternative high-frequency identification strategies. They use this difference to argue that, unlike previous studies, their method is able to separate monetary policy surprises from confounding central bank information shocks - an important new contribution to the literature. I conducted two replication studies of their work on behalf of the Institute for Replication (I4R). First, I used the code provided in their replication package to replicate all of their main results, aside from the small variations expected in replicating a Monte Carlo study. Second, I attempted to use their original data to recreate their results using a different statistical software (Eviews 13). I was unable to replicate their results for two reasons. First, my program is unable to exactly replicate the custom prior they used to generate their reduced-form results. Second, my models routinely generated nonstationary VARs that nevertheless satisfied the identification restrictions. This differs from the authors' results, but is not surprising given the ambiguous stationarity of the underlying macro variables. | |
| https://ideas.repec.org/p/zbw/i4rdps/161.html | This report evaluates the computational reproducibility and analytical robustness of Exley and Kessler's (2024) investigation into "motivated errors," which suggests that individuals may rationalize selfish behavior by attributing their errors to confusion. Using the original data and code, we could regenerate all results reported in the manuscript and online appendices with full precision. However, our re-analysis identified significant limitations, including insufficiently annotated code, ambiguous variable naming, and the absence of essential participant-level data, which obstruct comprehensive robustness checks. These challenges underscore the importance of best practices in data and code sharing to enhance the transparency and credibility of economic research. Our reflection not only contributes to discussions on empirical rigor but also advocates for improved standards in sharing scholarly resources. | |
| https://ideas.repec.org/p/zbw/i4rdps/162.html | Liu, Shamdasani, and Taraz (2023) examine, among other things, the effect of temperature and precipitation in India during the growing season (June-February) on the agricultural and non-agricultural worker share in Indian districts in the medium run (decades) and in the long run (30 years). In their preferred analytical specification, they find that a 1°C increase in temperature leads to a 17% increase in the agricultural labor share (corresponding to a logarithmic coefficient of 0.157) and an 8.2% decrease in the nonagricultural labor share (corresponding to a logarithmic coefficient of -0.086) in the medium term. The effects are significant at the 5% and 1% level (5% and 5% with Conley standard errors), respectively. For precipitation, they do not find effects significantly different from 0. First, we reran the code and encountered neither execution nor coding errors. Second, we reproduced the main tables in a different programming language and found the same results. Lastly, we tested the robustness by weighting the districts by population size. Here, we find that the medium-run effects become markedly smaller and are no longer statistically significant. However, the long-run effects stay roughly the same. By splitting the sample into low- and high-population districts, we find that the medium-run effects are present only in less populated districts, with no effect in highly populated ones. | |
| https://ideas.repec.org/p/zbw/i4rdps/163.html | No abstract is available for this item. | |
| https://ideas.repec.org/p/zbw/i4rdps/164.html | Masatlioglu et al. (2023) show a strong intrinsic preference for positively skewed information over negatively skewed information through three laboratory and two field experiments. Using the provided replication package, we successfully computationally reproduce these results. Additionally, we test the robustness of the findings by employing alternative statistical tests, which confirmed the original conclusions. We also make minor comments about the paper that may be useful to researchers building on Masatlioglu et al. (2023)'s work. | |
| https://ideas.repec.org/p/zbw/i4rdps/165.html | Zárate, Quezada-Llanes, and Armenta (2024) examine whether Hispanic and Anglo voters change voting behavior if a political candidate speaks to them in Spanish and whether it matters how proficient in Spanish the candidate sounds. They find that Hispanic support for the Anglo and Hispanic candidates is higher in the native-like Spanish condition compared with the English-only condition. Relative to the English condition, non-native Spanish does not increase support for the Anglo candidate, but it decreases support for the Hispanic candidate. They find mixed effects for the Anglo participants. We conducted computational and robustness reproductions. First, we successfully computationally reproduced all the main results. Second, we restricted the analysis to participants who passed the manipulation checks (the authors did not do this). We find that the same results hold. Note that we restricted ourselves to Study 1, which is based on data from Prolific. | |
| https://ideas.repec.org/p/zbw/i4rdps/166.html | Grenet et al. (2022) examine the effect of quasi-random early offers on the probability of accepting an offer in Germany's university admission process. The authors demonstrate that the early offers lead to a statistically significant increase in the likelihood of accepting an offer. Their preferred explanation for this early-offer effect is that students gradually discover their preferences over time, a hypothesis also supported by survey data. First, we successfully computationally reproduce the main claims of the paper in STATA. Second, we reproduce the results in R, including producing the analysis data from scratch. Third, we test the robustness of the results by checking the identification assumption with new data, applying different standard errors for the main estimation, and using an alternative empirical model specification. | |
| https://ideas.repec.org/p/zbw/i4rdps/167.html | Metcalf and Stock (2023) find that an increase in carbon tax has a weakly positive effect on output and employment, along with a negative effect on CO2 emissions over a 6-year horizon. The paper identifies a carbon tax shock and uses it to quantify the effect of a permanent unexpected increase in the carbon tax rate. The effect of this increase is obtained using a policy counterfactual exercise based on dynamic effects estimated using panel local projections. We use the authors' own Stata replication package to reproduce the main results of the paper and carry out additional robustness tests. We also conduct these empirical analyses using popular open-source econometric libraries in R. We compare the original permanent carbon tax increase policy counterfactual impulse responses to standard one-time carbon tax shock impulse responses. The justification for this robustness test is that carbon tax rate changes are persistent, so that a transitory shock effectively mimics a permanent shock. We find that (1) the authors' replication package successfully reproduces the results of the paper; (2) alternative local projection specifications and policy counterfactuals largely exhibit the same qualitative properties as the main results of the paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/168.html | Colantone et al. (2024a) use survey data to examine how a major ban on combustion engine cars in Milan, Italy affected voting behavior of treated car owners. The authors find that the ban raised the probability of voting for the populist right wing Lega party by 15.4-18.3 percentage points, a 70-80% increase relative to the average car owner. The estimate is statistically significant at the 5% level. These effects are driven by dissatisfaction with money losses rather than more antagonistic attitudes towards environmental protection. In this report, we inspect the data and replication package of the paper with two sets of exercises. First, we successfully computationally reproduce all the main results of the paper. Second, we test the robustness of the authors' main results by exploring different definitions of control variables, variations in the regression specifications, and alternative econometric models and research designs. Our results generally confirm the authors' conclusions, but are smaller in magnitude and suggest that the ATTs in the original paper might have been overstated. | |
| https://ideas.repec.org/p/zbw/i4rdps/169.html | In a systematic review and meta-analysis, Wang et al. (2023) estimate the association of social isolation or loneliness with mortality outcomes. In their preferred analytical specification, the authors find an increased risk of mortality from all causes for both exposures: a pooled effect size for social isolation of 1.32; 95% confidence interval 1.26 to 1.39; P < 0.001; a pooled effect size for loneliness of 1.14; 95% CI, 1.08 to 1.20; P < 0.001. We computationally reproduce these results by extracting data from the article PDF and re-implementing the original analysis, and we compare the extracted data with data that we later received from the authors. Second, we assess the robustness of the main results against plausible alternative analytic choices in three areas: estimation of the random effects models, heterogeneity, and adjustment for publication bias. We find that the main claims of the original authors are robust, although the majority of methods to adjust for publication bias suggest somewhat smaller effects than the original estimates. | |
| https://ideas.repec.org/p/zbw/i4rdps/170.html | Gill and Prowse (2023) study response times using a repeated p-beauty contest (p = 0.7). Looking at between-subject variation in response times, they found that subjects who think for longer, on average, win more rounds and choose lower numbers. When comparing average response times and level-k behavior, they observed that higher k types think for longer. In general, we are able to reproduce their findings, despite a minor coding error and some missing information. We test the robustness of their results by comparing average and median response times and choices, separating the sample into quick and slow respondents, including additional controls, and using different estimation parameters. We do not find differences in choices between slow and quick respondents, somewhat contradicting their conclusions. Moreover, most subjects played faster as the game was repeated. The remaining results are robust to the inclusion of cohort effects and different parameter specifications in their regressions. | |
| https://ideas.repec.org/p/zbw/i4rdps/171.html | Furnas & LaPira (2024) examine the extent to which unelected political elites in the United States misperceive nationwide public opinion on salient policy issues. They find that unelected elites consistently misperceive public opinion in the direction of their own opinions on average. They estimate that unelected elites that strongly oppose (support) a policy perceive public opinion in favor of that policy to be about ten percentage points below (above) the actual level of public support. These results persist when considering the ideological underpinnings of each issue, elite partisanship, the relevance of partisanship in profession, elite professional community, and elite trust in partisan information sources. We attempt to reproduce these findings through three methods. First, we run the data analysis code as it was provided by the authors and successfully reproduced the paper's main findings. Second, we test the robustness of the results by generating estimates of public opinion from a separate nationally representative survey administered at the same time as the authors' survey. Lastly, we run additional robustness tests by examining whether results hold under different model specifications. We find that the authors' results are largely robust to these additional analyses. | |
| https://ideas.repec.org/p/zbw/i4rdps/172.html | Taylor et al. (2023) explored the impact of identity cues on online behavior, employing a large-scale field experiment on a social news aggregation website. Findings reveal that identity cues significantly influence how individuals form opinions and engage with online content, accounting for 28% to 61% of variation in voting associated with commenters' production, reputation, and reciprocity. The results highlight the role of identity cues in perpetuating social content evaluation disparities and suggest anonymized content votes could enhance overall content quality on social platforms. In the replication analysis of this study, we utilized the provided script on the same data, which was provided by the paper's author following non-disclosure agreements. Further the robustness of the results was also tested after applying a mixed effects model instead of the linear probability model. Our replication confirmed the overall reproducibility of the results using the provided script, but there were notable changes in the estimates. In our analysis, the variation in individuals forming opinions and engaging with online content, as measured by voting associated with commenters' production, reputation, and reciprocity, ranged from 15% to 60% due to identity cues. This indicates that a few effects are somewhat smaller than in the original study. Moreover, when using our alternative analytic approach, the results remained generally robust, but there were exceptions. Specifically, the model assessing the impact of identity cues on individuals in voting associated with commenters' production yielded different results: We generally found stronger evidence in form of higher statistical significance for the claims of the authors. | |
| https://ideas.repec.org/p/zbw/i4rdps/173.html | We reproduced West (2024) "Formal designation of Brazilian indigenous lands linked to small but consistent reductions in deforestation," which investigates the impact of formally recognizing Indigenous Lands (ILs) on deforestation rates in Brazil from 1986 to 2021. The original study uses a quasi-experimental design, employing temporal and sectional matching methods to compare deforestation rates before and after IL designation, concluding an average reduction of -0.05% in deforestation. To verify these findings, we conducted three main tests: a logit analysis, the consideration of negative deforestation values in the Atlantic Forest, and the synthetic control method. The logit analysis assessed the relationship between IL designation and covariates like land size, elevation, slope, and proximity to urban centers, confirming that these factors significantly influence IL designation, consistent with the original study. We also examined the treatment of negative deforestation values in the Atlantic Forest, originally treated as zero. By retaining these values, we found no significant impact on the study's overall results, indicating that the original methodological choice did not affect the main conclusions. Finally, the synthetic control method was used to replicate the counterfactual analysis of IL-designated areas, demonstrating that these areas consistently exhibited lower deforestation rates compared to the synthetic control post-2011. These tests confirmed the original study's findings, demonstrating that the formal designation of ILs contributes to small but significant reductions in deforestation, supporting the effectiveness of ILs as a strategy for environmental conservation and indigenous rights protection. The reproducibility of these results reinforces the study's conclusions. | |
| https://ideas.repec.org/p/zbw/i4rdps/174.html | Carvalho et al. (2023) propose a theoretical framework that explains the dynamics of long-run inflation expectations using short-run inflation surprises and beliefs about monetary policy. In an empirical exercise, they show that this concise framework predicts long-term inflation expectations well over long periods and across a multitude of countries. In this study we look at the reproducibility of the work and the robustness of the results across two dimensions - the strength of the empirical results and the robustness of the estimation methodology. Across the empirical dimension, we extend the model with data past the global pandemic and study the robustness of the results before 2020 as well as the strength of the conclusion after 2020. With respect to the methodological application, we utilise a different sampler to estimate the main non-linear specification. The original findings remain intact across both dimensions. | |
| https://ideas.repec.org/p/zbw/i4rdps/175.html | Zhang (2023) used an online, pre-registered, large-scale controlled experiment to test the effect of an endorsement of Joe Biden by the scientific journal Nature on several perceptual and behavioural outcomes. The main results of the paper were the following: the endorsement of Biden caused a large reduction in Trump supporters' trust in Nature and a considerably smaller reduction in their 'trust in US scientists'. The estimated effects are larger for individuals who, prior to the treatment, believed that Nature was unlikely to have endorsed a presidential candidate. The endorsement also made Trump supporters less likely to request COVID and vaccine related information from the endorsing journal. For Biden supporters, the respective estimated effects were generally positive, but small and insignificant. In his abstract, the author summarizes his key causal claim as follows: "political endorsement by scientific journals can undermine and polarize public confidence in the endorsing journals and the scientific community" (p.696). In this replication study, we computationally reproduced all results, with few and trivial exceptions. We then tested the robustness of those results that gave rise to Zhang's (2023) main causal claim. These tests include an alternative estimation method, an alternative way to capture support for the candidates, and a series of heterogeneity analyses by demographics. All test results support the author's findings but add interesting nuance. Some of our tests exploit variables from the raw data that were not included in the clean, published dataset, but the author willingly provided: a post-treatment 'manipulation check' that asked respondents to indicate the candidate that Nature actually endorsed, and data on requests for COVID related articles from other outlets besides Nature. We used these variables to conduct an Instrumental Variables (IV) procedure and test a 'causal mediation' model. Overall, and for Trump supporters in particular, our report corroborates the author's main finding of a strong negative effect of the endorsement on the overall perception of the endorser (Nature). However, the additional analysis provides weaker evidence for a reduction in trust in the scientific community more generally. | |
| https://ideas.repec.org/p/zbw/i4rdps/176.html | Carter (2024) examines the historical conditions that shape protection versus assimilation for indigenous communities, arguing that state-led conscription programs are one such factor. In a natural experiment leveraging conscription for a 1920s Peruvian highway designed to replicate a pre-colonial road system (Qhapaq Ñan), Carter finds through a geographic regression discontinuity design that eligibility for state conscription increased the likelihood of a municipality having an indigenous movement by about 30 percentage points (approximately .75 standard deviations) and scores on an omnibus accommodation measure by about .3 items (approximately .4 standard deviations). The omnibus measure includes the number of institutions that an indigenous community reports preserving (increased by .3 items on a 7-point scale, or .25 standard deviations), likelihood of having a communal land title (increased by 12 percentage points, or .3 standard deviations), and likelihood of registration with the government (increased by 9 percentage points, or .3 standard deviations). All point estimates are significant at the .1% level. We successfully computationally reproduce all main claims of the paper but find inconsistencies between the map of the road presented by Carter and that used by Franco et al. (2021) that affect its passage through a small number of municipalities. Because municipalities cannot be identified in the data, we investigate whether these municipalities drive the main findings by iteratively dropping municipalities and re-running the analysis, finding only minor changes in coefficient estimates across subsets. In addition, we explore a number of sensitivity analyses for the regression discontinuity design that vary the functional form, vary the bandwidth window, and use the Rosenbaum method for window selection. While the results remain consistent under all analyses, we recommend that further research recode treated municipalities on the basis of the alternative road map and explore the as-if random assumption in light of evidence linking proximity to the precolonial road to various economic and political outcomes. | |
| https://ideas.repec.org/p/zbw/i4rdps/177.html | Atwood (2022b) reports a positive effect of the 1963 measles vaccine on long-run economic outcomes. The identifying variation is from pre-vaccine average reported measles incidence, but this plausibly represents reporting capacity or initial health levels, rather than actual disease incidence. I extend the sample and use an event study to test for differential trends, and find trends that are inconsistent with a treatment effect of the vaccine. | |
| https://ideas.repec.org/p/zbw/i4rdps/178.html | Mattingly (2024) investigates how authoritarian leaders select military generals, focusing on the People's Liberation Army of China. Three main findings emerge. First, in general, Chinese leaders consider both personal ties (as a proxy for loyalty to the leader) and combat experience (as a proxy for competence) when promoting military officers. Second, personal ties are particularly relevant during periods of domestic threat. Third, combat experience only matters during periods of foreign threat. We successfully replicate all main results with Mattingly's (2024) database, only identifying minimal differences in calculated standard errors when employing Stata instead of R. However, results differ substantially in sign, magnitude, and statistical precision once we employ alternative, data-driven approaches to defining periods of domestic threat. Alternative specification results pertaining to foreign threat periods are more robust in sign but also vary in terms of magnitude and levels of statistical relevance. | |
| https://ideas.repec.org/p/zbw/i4rdps/179.html | Rigorous replication efforts are crucial for good social science, and I am grateful to Jetter and Swasito (2024), who replicate and extend the results of a recently published paper (Mattingly, 2024). My original paper examined, among other things, the ways in which periods of foreign and domestic threat shaped how the Chinese Communist Party selected officers for the People's Liberation Army (PLA). Jetter and Swasito confirm the core results are computationally replicable. To extend the results, they use alternative data sources to measure foreign and domestic threat. They conclude that the domestic threat results in particular are not robust to the alternative data source they use. I raise questions about the quality of this alternative data source. I also ask whether, even if the data were of higher quality, it would map onto the core concept. Finally, I argue for the importance of substantive knowledge of the case and qualitative scoring of the foreign and domestic threat variables. However, the points raised by Jetter and Swasito in their replication effort are important and well-taken. Measuring concepts such as foreign and domestic threat is challenging, and doing so is a potential avenue for future quantitative research. | |
| https://ideas.repec.org/p/zbw/i4rdps/180.html | Liu et al. (2023) examine the effect of climate change on labor allocation in India over a long time span. The authors find that rising temperatures are correlated with lower shares of workers in non-agricultural sectors. They also identify a likely mechanism: falling agricultural productivity reduces demand for non-agricultural goods or services, which in turn lowers labor demand in non-agricultural sectors. We undertake a reproduction and extension of Liu et al. (2023), and find that we are able to computationally reproduce all the numbers produced by the authors up to marginal differences in the calculation of standard errors. We describe a set of data issues that hindered full reproduction of the original dataset and, in one case, contradict a claim of data availability made by the authors. Finally, we test the robustness of the main results to a more consistent use of fixed effects and the use of Poisson regression, following Chen and Roth (2024). The Poisson regression approach does not alter the results, but in several of the new fixed effects specifications the authors' original results are less conclusive and lose statistical significance. | |
| https://ideas.repec.org/p/zbw/i4rdps/181.html | In this study, we evaluate the reproducibility and replicability of Scott Orr's (2022) innovative approach for identifying within-plant productivity differences across product lines. Orr's methodology allows the estimation of plant-product level productivity, contingent upon a well-behaved pre-estimated demand system, which requires the use of carefully chosen instrumental variables (IVs) for output prices. Using Orr's Stata replication package, we successfully replicate all primary estimates with the ASI Indian plant-level panel data from 2000 to 2007. Additionally, applying Orr's replication codes to a sample from 2011 to 2020 reveals that the suggested IVs do not perform as expected. | |
| https://ideas.repec.org/p/zbw/i4rdps/182.html | I would like to thank Hong and Luparello (2024) for their effort in replicating, as well as partially extending, the results in Orr (2022). This replication report makes what I believe to be three key points: 1. The replication package of Orr (2022) is computationally reproducible (up to two minor caveats, which I very much appreciate the authors catching). 2. The demand estimates are sensitive to a 30% cost-share threshold pursued when constructing input-price-based IVs. 3. The demand estimation strategy does not work when applied to a different sample (the same industry, but a later time period). In this brief note, I provide some minor commentary on each of these findings. I focus primarily on point 2, where my interpretation of their results differs slightly from their own. In particular, I believe these results largely show that the cost-share threshold does not matter much quantitatively, as long as one is careful to choose thresholds that make economic sense, although I also acknowledge that the precision of the estimates can be sensitive to this threshold. | |
| https://ideas.repec.org/p/zbw/i4rdps/183.html | We analyze over 44,000 economics working papers from 1980-2023 using a custom language model to construct knowledge graphs mapping economic concepts and their relationships, distinguishing between general claims and those supported by causal inference methods. The share of causal claims within papers rose from about 4% in 1990 to 28% in 2020, reflecting the "credibility revolution." Our findings reveal a trade-off between factors enhancing publication in top journals and those driving citation impact. While employing causal inference methods, introducing novel causal relationships, and engaging with less central, specialized concepts increase the likelihood of publication in top 5 journals, these features do not necessarily lead to higher citation counts. Instead, papers focusing on central concepts tend to receive more citations once published. However, papers with intricate, interconnected causal narratives, measured by the complexity and depth of causal channels, are more likely both to be published in top journals and to receive more citations. Finally, we observe a decline in reporting null results and increased use of private data, which may hinder transparency and replicability of economics research, highlighting the need for research practices that enhance both credibility and accessibility. | |
| https://ideas.repec.org/p/zbw/i4rdps/184.html | Our paper examines construct validity, an often neglected yet important element affecting the generalizability of individual study results. Construct validity deals with how the operationalization of a treatment corresponds to the broader construct it intends to speak to. The universe of potential operationalizations is referred to as the design space. As an empirical example, we systematically review 45 microfinance Randomized Controlled Trials to estimate the size of the design space. Variations in the treatment operationalization matter for the observed effect. We also show that most papers generalize from the operationalized treatment to a broad construct, mostly without acknowledging underlying assumptions. | |
| https://ideas.repec.org/p/zbw/i4rdps/185.html | Brock and De Haas (2023) study the effect of randomising applicant gender in small business loan applications that are reviewed by loan officers at a Turkish bank in a lab-in-the-field experiment based on real-life applications. The main results are: first, that loan approval rates are not gendered (direct discrimination); second, that loan officers are 6 percentage points (26%) more likely to make loan approval conditional on a guarantor when the applicant is female rather than male (indirect discrimination). In our computational replication we obtain the manuscript results. In addition, a robustness replication shows that the main results are partly driven by the role of loan types, job seniority and population differences among cities. | |
| https://ideas.repec.org/p/zbw/i4rdps/186.html | Lee (2022) evaluates whether elected officials update their policy positions based on expert evidence. His cross-subject and within-subject designs, run in the American local and state policymaking context, both confirm the capacity for politicians to update their beliefs in response to expert evidence, cutting across party lines and regardless of the valence of the information provided. Our replication finds that the study as published is nearly perfectly replicable following Lee's publicly available code, with some minor departures that merit revision. Following replication, we offer suggestions for improved clarity of the code, greater transparency about data accessibility, and clarification of the minor inconsistencies identified between the code and the published work. | |
| https://ideas.repec.org/p/zbw/i4rdps/187.html | Tong and Wei (2020) study the impact of unconventional monetary interventions on credit markets during the 2008 global financial crisis. They find that stock prices increase on intervention days, particularly for firms operating in sectors perceived to be more reliant on external funding. Moreover, they report a positive effect of unconventional interventions on firms' subsequent investment, employment, and R&D expenditure, and find that bank recapitalizations are more effective than other interventions. We replicated the reported findings using the data and programs provided by the authors. However, the results did not hold when using the Rajan-Zingales (1998) metric of firms' external financing needs for capital expenditure, one of the two liquidity measures proposed by Tong and Wei (2020). Furthermore, the results do not hold for U.S. and European firms, and they appear driven by a small subset of Canadian firms operating in the extraction of gold and silver ore. These findings raise questions about the results and policy implications proposed in the original paper. | |
| https://ideas.repec.org/p/zbw/i4rdps/188.html | The experimental study "Letting Down the Team? Social Effects of Team Incentives" by Philip Babcock and colleagues (2015) proposes that team incentives significantly enhance individual performance through social pressure and peer effects. The findings suggest that individuals are motivated by a desire to avoid disappointing their teammates, indicating that social dynamics, such as guilt and social pressure, play a crucial role in shaping behavior in team settings. In this report, we computationally reproduce the results from the original paper and perform several robustness checks. Overall, we confirm that the study is reproducible and find that the results hold across the performed robustness checks. | |
| https://ideas.repec.org/p/zbw/i4rdps/189.html | Karpowitz et al. (2024) examine the effect of racial diversity on group decision-making. The authors used OLS regression to analyze 2,694 citizens randomly assigned to 449 mock juries tasked with making decisions, and they find that the number of people of color (POC) affects private decisions but has less effect on group decisions. We successfully replicated the main results of the paper and found no coding errors, and we implemented the following robustness checks. First, we re-ran the analysis with different random seeds and found that the tables and graphs do not change, so the findings do not depend on an arbitrarily chosen seed. Second, we applied multiple imputation, rather than the authors' list-wise deletion, to handle missing values in a few variables. While the original result continues to hold, our findings suggest that the presence or absence of even one person of color on the jury might matter more than incremental increases in the number of POC on the jury. Third, we replaced POC with Black as a binary independent variable. When the group's racial composition is framed as Black versus non-Black, we do not observe a significant impact on individual punitiveness after deliberation. In addition, juries with one or no Black members showed greater punitiveness in their initial verdicts compared to juries with two or more Black members, but this effect diminished by the second round. | |
| https://ideas.repec.org/p/zbw/i4rdps/190.html | Experimental asset markets provide a controlled approach to studying financial markets. We attempt to replicate 17 key results from four prominent studies, collecting new data from 166 markets with 1,544 participants. Only 3 of the 14 original results reported as statistically significant were successfully replicated, with an average replication effect size of 2.9% of the original estimates. We fail to replicate findings on emotions, self-control, and gender differences in bubble formation but confirm that experience reduces bubbles and cognitive skills explain trading success. Our study demonstrates the importance of replications in enhancing the credibility of scientific claims in this field. | |
| https://ideas.repec.org/p/zbw/i4rdps/191.html | Balán et al. (2022) evaluate the impact of involving "local elites" in local tax collection in a large city in the Democratic Republic of Congo. Using a randomized controlled trial to vary the identities of tax collectors, they find that local elites' involvement raises tax compliance and total revenue by 50 and 44 percent, respectively. The paper argues that the primary mechanism behind the results is better targeting made possible by local elites' superior information about property holders' willingness and ability to pay. In this replication comment, we first reproduce the paper's main results. Then, we assess the robustness of the results by (1) employing randomization inference for statistical tests; (2) controlling for baseline characteristics that are not balanced; and (3) using an alternative method to examine the claims supporting the preferred mechanism of better targeting. We find robust estimates in (1). However, the results are less robust both in terms of statistical significance and magnitude for (2) and (3). We conclude that the average treatment effect is robust, while the main claim about mechanisms, the information channel, is less robust to alternative estimation approaches. We contextualize and discuss the significance of these results, including the negligible revenue potential even under full compliance. | |
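Two properties of the scraped file matter when loading it back: some abstracts (for example papers 175 and 176) contain line breaks inside a quoted CSV field, and the scraper writes no row at all when a request fails, only a "Abstract not found" row when the page lacks the abstract element. The sketch below is one way to check both, assuming the `url,abstract` layout the script writes; `load_abstracts`, `missing_abstracts`, and `absent_paper_numbers` are hypothetical helper names, not part of the original gist.

```python
import csv
from typing import Iterable


def load_abstracts(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse the url/abstract CSV written by the scraper.

    csv.DictReader keeps a quoted field that spans several lines (an abstract
    with a paragraph break) together as a single record.
    """
    return list(csv.DictReader(lines))


def missing_abstracts(rows: list[dict[str, str]]) -> list[str]:
    """Return the URLs for which the scraper wrote its placeholder text."""
    return [row["url"] for row in rows if row["abstract"] == "Abstract not found"]


def absent_paper_numbers(rows: list[dict[str, str]], first: int = 1, last: int = 191) -> list[int]:
    """Return paper numbers with no row at all (e.g. failed HTTP requests).

    Assumes URLs of the form .../i4rdps/<number>.html, as in base_url.
    """
    seen = set()
    for row in rows:
        # ".../i4rdps/173.html" -> 173
        seen.add(int(row["url"].rsplit("/", 1)[-1].removesuffix(".html")))
    return [n for n in range(first, last + 1) if n not in seen]


# Usage, assuming abstracts.csv is in the working directory:
# with open("abstracts.csv", newline="", encoding="utf-8") as f:
#     rows = load_abstracts(f)
# print(missing_abstracts(rows), absent_paper_numbers(rows))
```

Opening the file with `newline=""`, as the `csv` module documentation recommends, is what lets the embedded line breaks survive the round trip.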
```python
import csv
import time

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

# IDEAS/RePEc page for each I4R discussion paper; {} is the paper number
base_url = "https://ideas.repec.org/p/zbw/i4rdps/{}.html"

# Open a CSV file to save the abstracts
with open('abstracts.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['url', 'abstract'])

    # Iterate over papers 1 to 191 (range's upper bound is exclusive)
    for i in tqdm(range(1, 192), desc="Scraping abstracts"):
        url = base_url.format(i)
        try:
            # A timeout keeps one stalled connection from hanging the whole run
            response = requests.get(url, timeout=30)
        except requests.RequestException as exc:
            tqdm.write(f"Failed to retrieve {url}: {exc}")
            continue

        # Check if request was successful
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            abstract_section = soup.find('div', id='abstract-body')
            if abstract_section:
                # The " " separator keeps text from adjacent tags space-delimited
                abstract = abstract_section.get_text(" ", strip=True)
            else:
                abstract = "Abstract not found"
            # Write URL and abstract to CSV
            writer.writerow([url, abstract])
        else:
            # tqdm.write prints without breaking the progress bar
            tqdm.write(f"Failed to retrieve {url} (HTTP {response.status_code})")

        # Pause between requests to avoid hammering the server
        time.sleep(1)
```
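The parsing step can be isolated so it is testable without network access. This is a minimal sketch that assumes, as the script does, that IDEAS places the abstract in `<div id="abstract-body">`; `extract_abstract` is a hypothetical helper name. It also shows why a separator is passed to `get_text`: with `strip=True` alone, BeautifulSoup joins the stripped strings with an empty string, so text in adjacent tags runs together.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_abstract(html: str) -> Optional[str]:
    """Return the abstract text from an IDEAS paper page, or None if absent.

    Assumes the abstract sits in <div id="abstract-body">, the same selector
    the scraper uses.
    """
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("div", id="abstract-body")
    if section is None:
        return None
    # Join stripped text fragments with a space; get_text(strip=True) alone
    # would turn "<p>Hello</p><p>world</p>" into "Helloworld".
    return section.get_text(" ", strip=True)
```

Inside the loop, `abstract = extract_abstract(response.text) or "Abstract not found"` would then reproduce the script's placeholder behaviour.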