Five years ago, Daniel MacArthur set out to build a massive library of human gene sequences—one of the biggest ever. The 60,706 raw sequences, collected from colleagues all over the globe, took up a petabyte of memory. It was the kind of flashy, blockbuster project that would secure MacArthur a coveted spot in one of science’s top three journals, launching his new lab at the Broad Institute into the scientific spotlight. But before all that happened, he did something that counted as an act of radicalism in the world of biology: He put it on the internet.
Posting scientific papers online before peer review—in so-called preprint archives—isn’t a new idea. Physicists have been publishing their work this way, free to the public, for decades. But for biologists, preprints are uncharted territory. And that territory is rapidly expanding as academia and its big-time funders shift toward a culture of openness. As preprints become more popular, they’re throwing the field into a state of uncertainty.
Science usually goes like this: Researcher runs experiment, researcher analyzes data, researcher writes up results. In high school biology, the process stops there. But in real life, that’s when the real slog starts. Researchers submit their results to the most prestigious journal they think might publish them … and then they wait. If the paper is rejected, they try another journal. Then they wait again. Once they get accepted, they go through a cycle of peer review, responding to critiques from an anonymous group of colleagues. On average, it takes biomedical researchers eight months to go from submission to publication, but sometimes it takes up to three years. All the while, scientific progress—the sequential building of knowledge, based on the work of others—gets held up.
That slow, rigorous process leaves academic publishing houses—including big names like Springer Nature, Wiley, and Elsevier—with control over the flow of scientific knowledge. By selling that knowledge back to universities, academics, and the public in pricey subscriptions and per-article fees, the global industry brings in more than $24 billion in revenue every year. But since the early 2000s, scientists and powerful funders like the Gates Foundation, the Ford Foundation, and the Wellcome Trust have championed alternatives to subscription publishing. Grant-givers want to stretch the public impact of their research dollars, which means knocking down pricey paywalls. And researchers want to break out of the brand-name journal merry-go-round, whose incentives, they believe, are distorting the quality of modern science.
Preprints could solve these issues by decoupling distribution of results from their certification via peer review. But publishers and some scientists worry preprints will only further dilute the research literature and endanger fields already struggling with reproducibility failures. And since preprints also threaten to dilute revenues at academic publishing houses, there’s more than just scientific integrity at stake.
Daniel MacArthur, like most scientists trying out the preprint scheme, didn’t totally abandon the traditional scientific publication track. His human exome reference library was eventually published in Nature, and would go on to be cited more than 800 times. But because he posted both the dataset and the preprint explaining it more than nine months before the peer-reviewed version came out, other scientists didn’t have to wait to start using his data. Between October 2015 and August 2016, scientists viewed his newly compiled exome data 3 million times and downloaded the preprint more than 18,000 times. Together, they helped researchers launch new investigations into the genetic factors underlying diseases like schizophrenia, Alzheimer’s, and cancer.
This, then, is the two-fold promise of preprints: Scientists get to demonstrate their scholarly contributions to potential funders while their manuscripts are being peer-reviewed for publication. And at the same time, the scientific community gets to see that work months or even years before they would otherwise. Just how quickly could preprints speed up scientific discovery? According to Stanford bioengineer Stephen Quake, if one preprint inspired the work of just two other people, biologists would see a five-fold acceleration in scientific progress within a decade.
Quake’s interest in cranking up to quintuple-time isn’t just hypothetical. He’s heading up one of the more ambitious biological projects of the 21st century—cataloguing every cell in the human body. In September, Quake was named co-president of the Chan Zuckerberg BioHub, a new $600 million center funded by Silicon Valley’s couple-in-chief. The BioHub’s premier project is the Human Cell Atlas, made possible by inventions (from Quake and others) that let scientists study individual cells on chips. To keep innovations like those flowing, Quake is requiring all of the BioHub’s 47 investigators to post preprints if they’re going to submit to a peer-reviewed journal. “This is a tipping point in biology,” Quake told a crowd at Stanford’s Big Data in Biomedicine Conference in May. “It’s a cultural choice, not a technological question.”
Stand and Deliver
Whether or not Quake’s estimate is correct, he is right about one thing: Biology is at a tipping point. Depending who you talk to, it’s either in the middle of a populist revolution or an existential crisis. In the past year, the popularity of biology preprints has taken off like a SpaceX Falcon 9. But they still only represent 1 percent of all scholarly work in the biomedical fields. In comparison, the preprint server for physicists and mathematicians today hosts 1,275,427 papers—about 70 percent of their academic canon.
For physicists, preprints have been the default method of sharing new work since the ’90s, when the high-energy particle folks first started bringing mimeographed copies of their submitted articles to physics conferences. That tradition eventually yielded a central repository that lived on the internet: arXiv.org.
Biology Preprint Foundings
- 2003 arXiv q-bio
- 2012 F1000Research
- 2013 PeerJ Preprints
- 2013 bioRxiv
- 2014 The Winnower
- 2016 preprints.org
- 2016 Wellcome Open Research
Biology’s nascent network is more fractured. There are currently seven active servers for biology preprints, depending on how you define them, with more showing up all the time. Without clear standards or expectations, scientists usually just choose whichever one they’re most familiar with.
Increasingly though, the go-to server is one called bioRxiv (pronounced bio-archive). In April, the Chan Zuckerberg Initiative agreed to a multi-year funding package (terms of which have not been disclosed) to solidify the future of bioRxiv. The money and engineering resources will also be used to bulk up the server’s automated tools for text mining, to make the repository’s content more accessible to researchers and easier to analyze with a machine. In addition to providing a digital home for preprints, bioRxiv lets scientists submit their work directly to more than 100 peer-reviewed journals with just a few clicks.
Richard Sever, a molecular biologist at Cold Spring Harbor Laboratory, co-founded bioRxiv in 2013. When it first started, scientists submitted about 50 papers each month. Most of them were genomics researchers and bioinformaticians, people who used to be physicists. They had switched professions only to find no preprint server, or culture of sharing, to match their previous experience. They were the earliest adopters of Sever’s new repository.
But lately, they’ve been joined by others. The fastest growing adopters are now neuroscientists, also known for their physics and computational backgrounds. “I suspect we’ll see lots of separate waves as new fields start to catch on,” says Sever. The topics are already a lot more diverse than when Sever started; back then, there were so many papers about Crispr, he joked they should rebrand as a journal devoted to the new gene editing technique. “Crispr is a perfect example of why preprints are needed,” he says. “Things are just happening so fast.” In the last year, bioRxiv has grown exponentially; as of March, it hit 1,000 new papers added per month.
That rapid expansion is creating serious friction. One of the concerns with preprints is that scientists will sacrifice accuracy for speed: in the rush to be first on the scientific record, they’ll wind up filling the internet with crap. Traditional peer review is supposed to catch mistakes and make sure a paper’s scientific reasoning is sound, and uploading a virgin paper means people will see work that could be wrong. But that’s kind of the whole point of preprints: They allow an entire field to weigh in, in public, instead of an anonymous few talking in a vacuum. Sever thinks it will actually make science more rigorous, allowing peer reviewers to glean broader insights from a paper’s public trial.
And, as UC Denver research librarian Jeffrey Beall points out, there’s already plenty of crap science out there—much of it a result of another, earlier attempt to fix the problems of academic publishing. In the early ’90s, so-called open access journals started to make scientific research free to anyone with working WiFi by shifting costs to scientists, who pay an upfront fee to cover editing. But it’s not a perfect solution: Some “pay-for-play” journals prey on novice scientists, charging exorbitant fees to publish junk papers without careful review. Beall calls these predatory publishers “the biggest threat to science since the Inquisition.” From 2012 to 2017, Beall maintained a blacklist of journals with dubious publishing practices, serving as a resource for scientists, journalists, and hiring committees. But in January, after five years, he was forced to take it down, as his university came under pressure from many of the journals he targeted.
Now, the preprint may succeed where open access journals have floundered. Beall is optimistic: “It’s a little more chaotic, but it doesn’t involve the exchange of money, which is the root of all evil in scholarly publishing,” he says. “I think preprint servers could be a way to make predatory publishers obsolete.”
And so far, the preprint has a pretty good track record. Right now, about 60 percent of the articles on bioRxiv go on to be published in a peer-reviewed journal. Look at someone like George Church, the decorated Harvard geneticist who was an early entrant to bioRxiv (and one of those people populating it with Crispr studies). Starting in 2014, he has posted 28 papers to the preprint server. Of those, 13 went on to be published in peer-reviewed journals like Nature Methods and Science Advances. The other 15 have not, but more than half of those were just added in the first six months of 2017.
Of course, Church is hardly the norm. A well-published (nay, famous) scientist like George Church has a much easier time choosing to preprint than biologists early in their careers. There are risks to publishing online before peer review: Scientists may not acknowledge preprints as establishing priority of discovery. Peer-reviewed journals could reject a manuscript if it has previously appeared as a preprint. And opening up the forum could also degrade the quality of the discussion. There’s already an app, developed by biostatisticians at Johns Hopkins, that allows people to swipe right on bioRxiv papers they like. Its creators say the “Tinder for preprints” is just for fun, but they do hope to learn from it how scientists value different kinds of work.
It wouldn’t be the first time that people used an online platform in weird ways its creators never intended—Twitter bots pushing fake news and Facebook groups sharing revenge porn. It’s impossible to know how people will use new tools before they actually use them. Are preprints a first step toward publication in a peer-reviewed journal, a working document, or something else entirely? In the absence of consensus, the rules around preprints are as varied as the biologists that publish them.
You may be wondering why scientists would even bother to publish in journals after they’ve posted a preprint—a system intentionally built to subvert the bottleneck of peer-reviewed publication. But the system of academic publishing and all the rewards built into it haven’t disappeared. Which means for now at least, biology careers aren’t made on bioRxiv. Traditional journals still hold the key to postdoc positions, tenure lines, and lab funding.
Some of those publications make no attempt to hide their disdain for preprints. Which forces scientists to choose between sharing their work openly and keeping it offline to give it a shot at a classy publication. The _New England Journal of Medicine_ doesn’t accept articles that have been released elsewhere first (though it does make its articles freely available six months after publication). The Proceedings of the National Academy of Sciences won’t take papers that appear as preprints if they have a Creative Commons License, which about 70 percent of bioRxiv papers do. On the other end of the spectrum, the open access journal PLoS Genetics actually sends its editors to scour bioRxiv and other preprint servers to look for papers to publish.
But perhaps no publisher better encapsulates the upheaval than Cell Press. Which is appropriate, given that its namesake journal was the first to propagate the idea that _where_ you published mattered more than what you published. In 1974, the Massachusetts Institute of Technology launched Cell to showcase the newly emerging field of molecular biology. At that time, the norm was for scientists to submit to the journal that best matched their subject matter, and for editors to publish any research that could pass peer review. But Cell’s first editor, a young biologist named Ben Lewin, treated his new journal like an exclusive club, rejecting far more papers than he published. He basically invented prestige publishing.
Within a few years, other titles like Nature and Science followed suit, jumping to the top of the newly established ranking system known as the “impact factor.” It does things like measure how many citations a paper gets and in what kinds of journals those citations appear. The impact factor is now commonly accepted as the currency of scientific prestige: Researchers who publish in “high-impact” journals are more likely to get job offers, grant money, and attention from the mainstream media.
Now with 30 high-impact journals to its name, Cell Press is one of those publishers that can make or break a career. It’s also had a rapidly shifting relationship with the preprint process. Before September of last year, Cell Press’s official policy required scientists who were planning on submitting to a journal to consult an editor before submitting a preprint. Unofficially, some researchers were told they couldn’t post a preprint until after they’d submitted to a Cell journal, some were told it was at the editor’s discretion, and some were discouraged from putting it up at all. This confused scientists and upset advocates for open publishing.
Fighting the blowback, Cell Press updated its language to clarify that any papers previously posted on a preprint server would be considered for publication, and that talking to an editor was just encouraged, not required. But still, scientists weren’t convinced that Cell titles were fully on board with preprints.
In March of this year, things got even murkier when the publisher launched its own early stage platform. Called “Sneak Peek,” it’s not exactly a preprint server (because scientists can only post if Cell Press has accepted their manuscript), and it’s not exactly open access (because you need to register for free to see them). But it does allow scientists to share work ahead of peer review and publication and brag about their high-profile placement at the same time. “Many scientists are concerned about the impact of endorsing and disseminating non-peer reviewed data, others champion speed and decentralization over quality control,” says Emilie Marcus, Cell Press’ CEO and the editor-in-chief of Cell. “Both perspectives hold merit and the challenge is to find a way forward that respects and sustains them both.”
In Valleyspeak, Cell is attempting to disrupt its disruptor. As many scientists have pointed out, it looks like the for-profit company is making moves to undercut a nonprofit model for open information sharing. There’s no real money in preprint servers: Nature Publishing Group tried it a decade ago (before it was Springer Nature) and shuttered the service after five years. All the other repositories out there are sustained by some combination of grants, donations, and support from academic institutions. But if every name-brand journal started hosting its own “sneak peek,” it would further fracture efforts to get all biological preprints in one place with one set of rules.
For more than a year, a scientist-driven initiative to promote preprints in biology, called ASAPbio, has been working with funders like the National Science Foundation, the National Institutes of Health, and the Gates Foundation to develop plans for one central aggregation site, along with bylaws and a community-elected body to govern it. But those conversations were put on hold in April, following the news of the partnership between the Chan Zuckerberg Initiative and bioRxiv. Not wanting to duplicate or compete with that partnership’s efforts, ASAPbio is now re-evaluating its roadmap to a centralized service.
It’s still too early to say whether bioRxiv will emerge as the one preprint server to rule them all, or if it will become just a part of a grander plan. But if it’s up to researchers like Daniel MacArthur, posting your work online won’t always be an act of rebellion. Someday, it’ll just be science as usual. “I don’t think we’re looking at a world where professional journals magically disappear,” he says. “But the publishing business model will no longer depend on being the sole gatekeepers of access to scientific communication.”
For now, MacArthur has a foot in two worlds. In the field of large-scale genetics, preprints have already become the default. But many of his clinical biology colleagues are still wary. So he’s trying to lead the way by example. On a webpage of Massachusetts General Hospital’s Analytic and Translational Genetics Unit, MacArthur and the unit’s other core faculty members have signed a pledge. They promise to deposit every manuscript from their labs to preprint servers like bioRxiv when they submit to a journal. “We believe that it is only a matter of time before the concept of restricted access to the products of scientific research becomes an anachronism,” they wrote. “And we hope that the human genetics and genomics community can play a leading role in the transition to more enlightened models.”
At the bottom, there’s a place where other scientists can add their names and labs and swear their own oaths of openness. Right now the list isn’t very long. But it’s only getting longer.