Inspired by the session of RC33 Logic and Methodology in Sociology in Toronto during the World Congress 2018.
The original title of this session talked about the replicability crisis in the social sciences. As the program coordinator, I found that title a little harsh. Is it really a crisis? I mean, what is a crisis? I worked under the social psychologist Diederik Stapel when he was exposed as a fraud, fabricating almost all data in his papers. That was a crisis, at least for our Faculty. In my most recent research, the data collection agency failed to provide me with 90% of respondents’ user agent strings. That felt like a crisis, since I needed that information to answer my research question. But a replicability crisis? I don’t know.
I keep hearing people talk about the importance of replication. And it just doesn’t get me excited. Yes, we need to replicate results in order to find out how robust they are. And yes, it is the foundation of science. As Popper argues, we do not take even our own observations seriously if we cannot replicate them. Replications are the cornerstone of theoretical development. If the results of a replication match the original, we have successfully replicated and increased the probability that the results are valid. Non-replicable results jeopardize scientific progress, waste resources, harm society and damage confidence in science. There is a gap between the values propagated by Popper’s model and actual practice in social science research: many studies are not replicable, and significant findings are published more frequently than non-significant ones.
Most of my personal academic career is based on non-significant results. In my line of research, that is good news: it means that different ways of conducting surveys do not produce different results, and that there is less room for bias. I have never had any problem publishing my papers with non-significant results. However, I have learned from colleagues in other disciplines that they do have problems publishing non-significant results. And that leads to publication bias.
I totally agree that we need replication to get a step further in scientific research. Studies are needed to determine the exact share of empirical results that cannot be replicated, and whether this problem differs across topical areas (e.g., educational research), research designs (e.g., experimental research) and data-collection methods (e.g., online surveys).
There are many reasons why previous results might not be replicated, however. First, cases of scientific misconduct have been identified, with my former boss Diederik Stapel as an example, where results were intentionally fabricated. Second, confirming studies get published more easily (in some disciplines!), whereas null results might remain invisible. Third, different, but equally plausible, methods of data collection and analysis often lead to different conclusions. And fourth, if we do not use the exact same dataset, is it possible in any sense to replicate results?
So, what is replicability? Replication may take different forms: reanalyzing (a) the same or (b) alternative data with (c) the same or (d) different methods. As Georg Mueller pointed out in this session at the World Congress, there are different types of replication. If the true relation is substantial, and both the first result and the meta-analysis are substantial, we have correct replication. If the true relation is absent, the first result is substantial, and the meta-analysis shows an absence of significant results, we have correct non-replication. But in all other instances, we have either wrong non-replication, wrong replication, or ambiguous results. I would love to see actual figures for these different types of replication, because in my view it would be VERY hard to find either correct replication or correct non-replication, and we would end up with our replication studies being wrong (replication or non-replication) or ambiguous. And what’s the point in that?
And if we were to undertake such a difficult enterprise, how should we design our study? Re-analyse the exact same dataset? What is best: replicate a study exactly, or take into account changes and new knowledge about the topic? Neuert replicated a study that is almost 15 years old. Since then we have gained new knowledge and literature, and there are new options, new devices, and changes in society. Isn’t it better to take that into account? Does that increase our knowledge compared to just replicating results?
One particularly interesting form of replication methodology is so-called ‘crowd-sourced data analysis’, where different teams conduct parallel analyses and later compare their methods and results. That makes sense. But replication with different datasets? How on earth can we exclude alternative explanations? The date of fieldwork, the population, the method of data collection, the visual design, just to name a few, cannot be held constant. And what if big groups of researchers work with the same dataset? What does it mean if the majority says one thing and the minority another? The minority can still be right.
So I don’t believe in replication. Just a personal opinion, and you are more than welcome to prove me wrong. I believe in transparency and open science. They give us the opportunity to form an opinion about data quality and the reliability and validity of results.
Breitenbach asked the authors of 23 articles whether they would share their data and syntax, and only 6 were willing to do so. Some said that co-authors were responsible for all calculations, or admitted that the reported details were insufficient for replication. I must admit, a couple of years ago I was asked to share my data and syntax. And I did not. I just could not find the time to check whether my syntax was readable to other people, and I did not want to share before I had checked. A STUPID reason, I know. But it was the real reason. I had nothing to hide and was not scared to share.
Luckily, for the past few years I have been obliged, like all other scientists in the Netherlands, to keep a data archive for each paper I publish. It should contain the raw dataset, the complete syntax, the published paper, a codebook, and a readme file with all relevant information for the paper. And luckily, more and more journals ask authors to share this information as a condition of publication.
That leads me to the title of this session: replication crisis. Is it a replicability crisis, or a crisis of trust in journals? If referees had worked properly, there would not be a problem of replicability, right? Is it fair to ask fellow scientists to judge papers pro bono? I know I find it difficult to say no to editors. And I know that reviewing papers is part of my job as a scientist. But if I want to do my review job properly, I need to sit down for a couple of days, preferably with the dataset and syntax, to be able to fully judge a paper. And I don’t know about you, but that just does not work in my schedule. Blind review does not help here. Maybe if reviews were not blinded, reviewers would do a better job to protect their good name. Maybe we should pay reviewers, or create professional reviewer positions. I don’t know. I am in a crisis. Just not a crisis about replication.
Your (future?!) President