After more than 3/4 of a million guesses, in over 50,000 games played in 67 countries, the results are clear: Science sounds like gobbledygook.
arXiv vs. snarXiv has been live for 6 months now, and it’s time to take a look at the results. Here’s how the game works. The user sees two titles: one is the title of an actual theoretical high energy physics paper on the arXiv, and the other is a completely fake title randomly generated by the snarXiv. The user guesses which one is real, finds out if they’re right or wrong, and then starts over with a new pair of titles.
I’ve been recording the result of each guess, originally just out of curiosity. I never expected to get reasonable statistics on the more than 120,000 high energy theory papers on the arXiv. But after more than 750,000 guesses, that’s exactly what I’ve got, which means we can do some fun stuff.
The Most Fake-Sounding Papers
First, let’s take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the lowest percentage of correct guesses when users try to distinguish them from a randomly generated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, generating better statistics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.
A tip for future arXiv vs. snarXiv players: the snarXiv is more grammatical than the arXiv. If you see “Heterotic on Half-flat” and think “uh… Half-flat what?” then you can be nearly certain it’s a real scientific paper that was written to advance the boundaries of human knowledge.
As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spectacularly low percentage of which were correct.
|2/13||15%||Actions and Fermionic symmetries for D-branes in bosonic backgrounds (June 2003)|
|3/15||20%||CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension (December 2001)|
|6/27||22%||Families as Neighbors in Extra Dimension (January 2000)|
|6/25||24%||The Greening of Quantum Field Theory: George and I (October 1993)|
|3/12||25%||Dyonic solution of Horava-Lifshitz Gravity (April 2009)|
Stump the Experts
People with all sorts of backgrounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely suspicious of ridiculous-sounding words that have been co-opted for technical purposes — “‘Mirage Mediation?’ that can’t be real!” High energy physicists, on the other hand, are used to their own unfortunate verbiage. Unfortunately for them, however, there are still plenty of papers on the arXiv that sound like they were written by a computer.
Let’s define an “expert” game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.
|1/5||20%||An extended model for monopole catalysis of nucleon decay (November 2002)|
|1/5||20%||Space-time symmetry restoration in cosmological models with Kalb–Ramond and scalar fields (July 2004)|
|1/5||20%||Towards Evaluation of Stringy Non-Perturbative Effects (November 1995)|
|1/5||20%||Dimensional Reduction (July 1998)|
|1/5||20%||The Ridge, the Glasma and Flow (December 2008)|
|1/5||20%||Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins (June 1996)|
|2/7||28%||BGWM as Second Constituent of Complex Matrix Model (June 2009)|
|3/10||30%||Testing factorization (November 2001)|
|2/6||33%||Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology (April 2001)|
I have to say, some of these definitely sound like they came straight from the snarXiv. I really don’t know what Glasma is, though. The snarXiv does not have Glasma.
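For anyone who wants to try this kind of tally on their own data, here is a rough sketch of the "expert game" filter described above. It is not the code behind arXiv vs. snarXiv. The record layout (game id, paper title, whether the guess was correct), the function names, and the per-paper minimum of 5 guesses are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical record layout: (game_id, paper_title, guessed_correctly)

def expert_game_ids(guesses, min_guesses=5, min_score=0.80):
    """Return ids of games with >= min_guesses guesses and a score >= min_score."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for game_id, _title, ok in guesses:
        totals[game_id] += 1
        correct[game_id] += int(ok)
    return {g for g in totals
            if totals[g] >= min_guesses and correct[g] / totals[g] >= min_score}

def hardest_papers(guesses, min_paper_guesses=5):
    """Rank papers by fraction of correct guesses, counting expert games only.

    min_paper_guesses is an assumed cutoff, not one stated in the post.
    """
    experts = expert_game_ids(guesses)
    totals = defaultdict(int)
    correct = defaultdict(int)
    for game_id, title, ok in guesses:
        if game_id in experts:
            totals[title] += 1
            correct[title] += int(ok)
    ranked = [(correct[t] / totals[t], correct[t], totals[t], t)
              for t in totals if totals[t] >= min_paper_guesses]
    # Lowest fraction correct (most expert-stumping) first.
    return sorted(ranked)
```

Calling `hardest_papers(guesses)[:10]` on a full log would give a top-10 list in the same spirit as the table above, with each row holding the fraction correct, the correct count, the total count, and the title.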
Ok, so the papers above sound especially ridiculous. However, the average across all 750,000 guesses on all papers is still only 59% correct. While better than a monkey, this is not particularly good. Who’s responsible? Surely not the world’s top minds?
Here’s a ranking of some of the most-highly cited physicists on the arXiv (H-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers. A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.
I’d especially like to congratulate Frederik on his anomalously low 49 percent. You make us all worse than a monkey, Frederik.
Now let’s turn to even more famous people: physics bloggers (and authors). Here are some of the most prominent, ranked from most fake-sounding papers (smallest percentage) to least fake-sounding papers (largest percentage). I think there’s a lesson here somewhere, though it’s hard to be sure in some cases, due to small statistics.
|117/217||53%||Sean M. Carroll [blog]||68/112||60%||Lubos Motl [blog]|
|129/234||55%||Jacques Distler [blog]||202/331||61%||Mark Trodden [blog]|
|343/609||56%||Clifford V. Johnson [blog]||98/159||61%||Sabine Hossenfelder [blog]|
|56/97||57%||John C. Baez [blog]||220/352||62%||Lee Smolin [site]|
|306/505||60%||JoAnne Hewett [blog]||6/8||75%||Peter Woit [blog]|
Fake-Sounding and Real-Sounding Words
Suppose you’re writing a scientific paper, and you want to ensure that the general public doesn’t think it’s complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words (to ensure no single paper dominates this percentage, I’m requiring that each word appear in at least 5 titles).
Avoid these words! Turns out people don’t believe in “multiskyrmions.” Also, you shouldn’t mention “Saturn,” or use normal English words like “secret” or “enough.” By contrast, here’s a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.
In other words, if you want to be taken seriously as a scientist, you should call your next paper Unusual Naked, but Anomaly-Free.
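The per-word percentages in this section can be approximated with a few lines of Python. This is only a sketch under stated assumptions: the input format (title, correct guesses, total guesses), the simple lowercase tokenizer, and the choice to pool guesses across titles rather than average per-title percentages are all guesses on my part, not the method actually used for the lists above.

```python
import re
from collections import defaultdict

def word_stats(papers, min_titles=5):
    """papers: iterable of (title, correct_guesses, total_guesses).

    Returns (fraction_correct, word, n_titles) tuples, most fake-sounding
    first, for words appearing in at least min_titles titles.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    n_titles = defaultdict(int)
    for title, c, n in papers:
        # Count each word once per title, ignoring case (assumed tokenizer).
        for word in set(re.findall(r"[a-z][a-z\-]*", title.lower())):
            correct[word] += c
            total[word] += n
            n_titles[word] += 1
    rows = [(correct[w] / total[w], w, n_titles[w])
            for w in total if n_titles[w] >= min_titles and total[w] > 0]
    return sorted(rows)
```

Taking the first 10 rows gives the most fake-sounding words, and the last 10 the most realistic-sounding ones. The `min_titles` cutoff plays the role of the "at least 5 titles" requirement, so that no single paper dominates a word's percentage.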
Incidence of Apparent Hooey in Various Subfields
Papers on the arXiv can be associated with one or more physics subfields. Here’s a ranking of subfields with at least 50 guesses from most fake-sounding to least fake-sounding.
|39/81||48%||Adaptation and Self-Organizing Systems||134/226||59%||Combinatorics|
|172/340||50%||Data Analysis, Statistics and Probability||116/195||59%||Accelerator Physics|
|212/414||51%||History of Physics||140/235||59%||Soft Condensed Matter|
|167/307||54%||Operator Algebras||1898/3172||59%||Differential Geometry|
|136/248||54%||Rings and Algebras||114/190||60%||Group Theory|
|101/184||54%||Disordered Systems and Neural Networks||195/324||60%||Fluid Dynamics|
|282/512||55%||Pattern Formation and Solitons||216/358||60%||Functional Analysis|
|128/227||56%||Classical Analysis and ODEs||94/155||60%||Dynamical Systems|
|295/523||56%||Representation Theory||96/158||60%||Algebraic Topology|
|138/242||57%||Number Theory||206/337||61%||Symplectic Geometry|
|757/1327||57%||Strongly Correlated Electrons||41/67||61%||Symbolic Computation|
|4724/8190||57%||Quantum Algebra||103/168||61%||Classical Physics|
|2696/4666||57%||Exactly Solvable and Integrable Systems||54/88||61%||Complex Variables|
|766/1317||58%||Superconductivity||758/1229||61%||Mesoscopic Systems and Quantum Hall Effect|
|88/151||58%||Computational Physics||50/81||61%||Materials Science|
|1763/3011||58%||Algebraic Geometry||50/80||62%||Category Theory|
|8306/14161||58%||Mathematical Physics||76/120||63%||K-Theory and Homology|
|2072/3511||59%||Statistical Mechanics||59/93||63%||Instrumentation and Detectors|
|279/472||59%||Chaotic Dynamics||55/86||63%||Spectral Theory|
|58/98||59%||Analysis of PDEs||82/121||67%||Optics|
Performance by Country
These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses. So let’s look at how performance varies across the globe. Here’s a ranking of correct guesses from countries with at least 2000 total guesses.
It looks like having English as a first language is not particularly helpful.
Performance by School
Finally, universities account for about 1/8th of the total number of guesses on arXiv vs. snarXiv. Altogether, their performance is almost exactly average (59%). However, there are variations… Here’s a ranking of schools with at least 400 total guesses.
|1145/1388||82%||University of Colorado at Boulder||560/963||58%||The University of Chicago|
|553/785||70%||University of Regensburg||278/481||57%||UC Santa Barbara|
|317/481||65%||University of Washington||264/461||57%||Madison|
|718/1097||65%||Penn State||542/967||56%||University of Cambridge|
|549/849||64%||Princeton University||476/861||55%||UC Davis|
|444/691||64%||Imperial College London||859/1577||54%||California Institute of Technology|
|349/544||64%||Monash University||393/723||54%||Harvard University|
|376/597||62%||University of Illinois at Urbana-Champaign||363/671||54%||Stanford University|
|287/457||62%||Hebrew University of Jerusalem||219/423||51%||Yale University|
|284/461||61%||The University of Edinburgh||308/599||51%||University of Minnesota|
|261/435||60%||Boston University||281/551||50%||University of Warwick|
Congratulations to the University of Colorado at Boulder, which is the clear winner here. Also, I just wanted to say: seriously Harvard? Seriously?
Finally, before heading into the comments, let me do a crapload of disclaiming. This is obviously the least scientific survey of science ever conducted. The ranking of a paper as “fake-sounding” or “realistic-sounding” has as much to do with the peculiarities of the snarXiv as with the arXiv itself. Also, although 750,000 guesses is a lot in total — such that I’m fairly certain that the 59% overall average isn’t going anywhere — the statistics get dicey when chopped into small bits (see what I did there?). To be sure of anything, I guess we’ll just have to wait until the blogosphere writes more papers.
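To put rough numbers on "dicey," here is a back-of-the-envelope sketch using the textbook binomial standard error, sqrt(p(1-p)/n). It assumes every guess is an independent coin flip with the same odds, which is certainly not true here (the same players guess many times, and the approximation is crude for tiny samples), so treat the output as an order-of-magnitude illustration only.

```python
import math

def binomial_standard_error(p, n):
    """Standard error of an observed fraction p from n independent guesses."""
    return math.sqrt(p * (1 - p) / n)

# The overall average: 59% correct over roughly 750,000 guesses.
overall = binomial_standard_error(0.59, 750_000)

# A single paper from the expert table: 1 correct out of 5 guesses.
tiny = binomial_standard_error(0.20, 5)

print(f"overall average: +/- {100 * overall:.2f} percentage points")
print(f"a 1/5 paper:     +/- {100 * tiny:.0f} percentage points")
```

Under these assumptions the overall 59% is pinned down to roughly 0.06 percentage points, while a paper with 1 correct guess out of 5 carries an uncertainty of about 18 percentage points.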
- I couldn’t find an h-index ranking of physicists that was more recent than this one from 2005. I’m probably missing lots of names. Let me know and I’ll add them.
- In addition to being a top-notch physicist, Frederik also happens to be one of the world’s best arXiv vs. snarXiv players.
- When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn’t logging ip addresses, he looked at me very seriously and said: “You’re not logging ip addresses? You should always log ip addresses.”
- The high-scores leader “Ed” is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, “Ed” was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I’ve removed all his guesses from this analysis.
- Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
- I already got slammed for this on marginalrevolution.com, and I’ll surely get slammed again.