Posted in Humor, Physics, Projects

The arXiv According to arXiv vs. snarXiv

After more than 3/4 of a million guesses, in over 50,000 games played in 67 countries, the results are clear: Science sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it’s time to take a look at the results. Here’s how the game works. The user sees two titles: one is the title of an actual theoretical high energy physics paper on the arXiv, and the other is a completely fake title randomly generated by the snarXiv. The user guesses which one is real, finds out if they’re right or wrong, and then starts over with a new pair of titles.

I’ve been recording the result of each guess, originally just out of curiosity. I never expected to get reasonable statistics on more than 120,000 high energy theory papers on the arXiv. But after more than 750,000 guesses, that’s exactly what I’ve got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let’s take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the lowest percentage of correct guesses when users try to distinguish them from a randomly generated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, generating better statistics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

correct/guesses percent paper
36/138 26% Highlights of the Theory. B. Z. Kopeliovich and R. Peschanski (June 1998)
41/153 26% Heterotic on Half-flat. Sebastien Gurrieri, Andre Lukas and Andrei Micu (August 2004)
13/48 27% Relativistic confinement of neutral fermions with a trigonometric tangent potential. Luis B. Castro and Antonio S. de Castro (November 2006)
13/47 27% Toric Kahler metrics and AdS_5 in ring-like co-ordinates. Bobby S. Acharya, Suresh Govindarajan and Chethan N. Gowdigere (December 2006)
9/32 28% Aspects of U_A(1) breaking in the Nambu and Jona-Lasinio model. Alexander A. Osipov, Brigitte Hiller, Veronique Bernard and Alex H. Blin (July 2005)
35/116 30% Energy’s and amplitudes’ positivity. Alberto Nicolis, Riccardo Rattazzi and Enrico Trincherini (December 2009)
16/53 30% A covariant diquark-quark model of the nucleon in the Salpeter approach. Volker Keiner (March 1996)
51/167 30% Noncommutative Bundles and Instantons in Tehran. Giovanni Landi and Walter van Suijlekom (March 2006)
38/124 30% Baby steps beyond rainbow-ladder. Richard Williams and Christian S. Fischer (May 2009)
13/42 30% Transverse force on a moving vortex with the acoustic geometry. Peng-ming Zhang, Li-ming Cao, Yi-shi Duan and Cheng-kui Zhong (January 2005)
44/138 31% Testing factorization. Gudrun Hiller (November 2001)
19/59 32% Prospects for Mirage Mediation. Aaron Pierce and Jesse Thaler (April 2006)
49/152 32% Determining the dual. Arjan Keurentjes (July 2006)
11/34 32% Gravitational Dressing of Renormalization Group. I. R. Klebanov, I. I. Kogan and A. M. Polyakov (September 1993)
53/163 32% Charging Black Saturn? Brenda Chng, Robert Mann, Eugen Radu and Cristian Stelea (September 2008)
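Mechanically, each of these rankings is a filter on the number of guesses followed by a sort on the fraction correct. Below is a minimal Python sketch, assuming the game keeps per-title counts of correct and total guesses; the `TitleStats` record and the function names are hypothetical, not the game's actual code. One detail worth matching: the percentages in these tables appear to be truncated rather than rounded (41/153 is 26.8%, listed as 26%), so `percent_floor` truncates too. Reading "between 10 and 30 guesses" as 10 ≤ n < 30 is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class TitleStats:
    title: str
    correct: int  # guesses that correctly picked this real title
    total: int    # all guesses on pairs that included this title


def percent_floor(correct: int, total: int) -> int:
    """Percentage truncated toward zero; integer arithmetic avoids float rounding."""
    return 100 * correct // total


def lowest_scoring(stats, min_guesses, max_guesses=None, top_n=15):
    """Titles with the lowest fraction correct, restricted to
    min_guesses <= total < max_guesses (no upper bound if max_guesses is None)."""
    eligible = [
        s for s in stats
        if s.total >= min_guesses and (max_guesses is None or s.total < max_guesses)
    ]
    # Sort on the exact fraction rather than the truncated percentage.
    eligible.sort(key=lambda s: s.correct / s.total)
    return eligible[:top_n]


if __name__ == "__main__":
    # A few rows copied from the tables in this post.
    rows = [
        TitleStats("Testing factorization", 44, 138),
        TitleStats("Heterotic on Half-flat", 41, 153),
        TitleStats("Highlights of the Theory", 36, 138),
        TitleStats("Actions and Fermionic symmetries for D-branes in bosonic backgrounds", 2, 13),
    ]
    print("At least 30 guesses:")
    for s in lowest_scoring(rows, min_guesses=30):
        print(f"{s.correct}/{s.total} {percent_floor(s.correct, s.total)}% {s.title}")
    print("Up-and-comers (10 to 29 guesses):")
    for s in lowest_scoring(rows, min_guesses=10, max_guesses=30):
        print(f"{s.correct}/{s.total} {percent_floor(s.correct, s.total)}% {s.title}")
```

Run as a script, the first loop prints the three rows with at least 30 guesses in the same order and with the same percentages as the table, and the second picks up only the 2/13 title.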

A tip for future arXiv vs. snarXiv players: the snarXiv is more grammatical than the arXiv. If you see “Heterotic on Half-flat” and think “uh… Half-flat what?” then you can be nearly certain it’s a real scientific paper that was written to advance the boundaries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spectacularly low percentage of which were correct.

correct/guesses percent paper
2/13 15% Actions and Fermionic symmetries for D-branes in bosonic backgrounds. Donald Marolf, Luca Martucci and Pedro J. Silva (June 2003)
3/15 20% CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension. Gi-Chol Cho, Etsuko Izumi and Akio Sugamoto (December 2001)
6/27 22% Families as Neighbors in Extra Dimension. G. Dvali and M. Shifman (January 2000)
6/25 24% The Greening of Quantum Field Theory: George and I. Julian Schwinger (October 1993)
3/12 25% Dyonic solution of Horava-Lifshitz Gravity. Eoin Ó Colgáin and Hossein Yavartanoo (April 2009)

Stump the Experts

People with all sorts of backgrounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely suspicious of ridiculous-sounding words that have been co-opted for technical purposes — “‘Mirage Mediation?’ that can’t be real!” High energy physicists, on the other hand, are used to their own unfortunate verbiage. Unfortunately for them, however, there are still plenty of papers on the arXiv that sound like they were written by a computer.

Let’s define an “expert” game as one with at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

correct/guesses percent paper
1/6 16% Fields. W. Siegel (December 1999)
1/5 20% An extended model for monopole catalysis of nucleon decay. Y. Brihaye, D. Yu. Grigoriev, V. A. Rubakov and D. H. Tchrakian (November 2002)
1/5 20% Space-time symmetry restoration in cosmological models with Kalb–Ramond and scalar fields. E. Di Grezia, G. Mangano and G. Miele (July 2004)
1/5 20% Towards Evaluation of Stringy Non-Perturbative Effects. R. Brustein and B. A. Ovrut (November 1995)
1/5 20% Dimensional Reduction. Corinne A. Manogue and Tevian Dray (July 1998)
1/5 20% The Ridge, the Glasma and Flow. Larry McLerran (December 2008)
1/5 20% Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins. S. V. Chekanov, W. Kittel and V. I. Kuvshinov (June 1996)
2/7 28% BGWM as Second Constituent of Complex Matrix Model. A. Alexandrov, A. Mironov and A. Morozov (June 2009)
3/10 30% Testing factorization. Gudrun Hiller (November 2001)
2/6 33% Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology. L. C. Garcia de Andrade (April 2001)
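The expert tally is a two-stage filter: first keep only games that meet the "expert" definition (at least 5 guesses, at least 80% correct), then pool every guess from those games by paper. A minimal sketch, assuming a hypothetical log format in which each game is a list of (real_title, guessed_correctly) pairs; the post does not say how games are actually stored.

```python
from collections import defaultdict


def is_expert(game, min_guesses=5, min_percent=80):
    """game: list of (real_title, guessed_correctly) pairs (hypothetical format).
    Expert means at least min_guesses guesses and a score of min_percent or higher."""
    n = len(game)
    correct = sum(1 for _, ok in game if ok)
    # Integer comparison avoids float edge cases such as 4/5 vs 0.8.
    return n >= min_guesses and 100 * correct >= min_percent * n


def tally_expert_guesses(games):
    """Map each real title to (correct, total), counting only guesses made in expert games."""
    tally = defaultdict(lambda: [0, 0])
    for game in games:
        if not is_expert(game):
            continue
        for title, ok in game:
            tally[title][0] += int(ok)
            tally[title][1] += 1
    return {title: (c, t) for title, (c, t) in tally.items()}
```

Note that a single miss in an otherwise perfect 5-guess game still qualifies (4/5 is exactly 80%), which is one reason the per-paper counts in the table above are so small.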

I have to say, some of these definitely sound like they came straight from the snarXiv. I really don’t know what Glasma is, though. The snarXiv does not have Glasma.

Famous Physicists

Ok, so the papers above sound especially ridiculous. However, the average across all 750,000 guesses on all papers is still only 59% correct. While better than a monkey, this is not particularly good. Who’s responsible? Surely not the world’s top minds?

Here’s a ranking of some of the most highly cited physicists on the arXiv (h-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers.[1] A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.

178/360 49% Frederik Denef
129/253 50% A. M. Polyakov
69/135 51% Steven Weinberg
189/357 52% Howard Georgi
764/1419 53% Cumrun Vafa
332/615 53% Leonard Susskind
6/11 54% H. David Politzer
63/115 54% Gerard 't Hooft
282/514 54% Michael B. Green
348/622 55% Frank Wilczek
164/293 55% Erik Verlinde
109/194 56% Sheldon Glashow
406/718 56% Savas Dimopoulos
351/615 57% Mark B. Wise
246/429 57% Neil Turok
310/539 57% Lisa Randall
353/612 57% Dimitri Nanopoulos
209/362 57% Herman Verlinde
450/778 57% Gia Dvali
297/513 57% Nima Arkani-Hamed
416/717 58% Nathan Seiberg
94/162 58% Lawrence M. Krauss
368/633 58% Thomas Banks
535/917 58% Igor R. Klebanov
412/705 58% Steven S. Gubser
356/603 59% Juan Maldacena
481/813 59% Andrew Strominger
174/294 59% Brian R. Greene
289/488 59% Joseph Polchinski
573/963 59% Edward Witten
1194/1981 60% John Ellis
368/610 60% Shamit Kachru
315/520 60% Roman Jackiw
311/510 60% Hirosi Ooguri
592/966 61% Hitoshi Murayama
118/192 61% David J. Gross
381/606 62% Lawrence J. Hall
118/186 63% Stephen Hawking
383/600 63% Aneesh V. Manohar
170/257 66% Michael Peskin
222/326 68% John H. Schwarz

I’d especially like to congratulate Frederik on his anomalously low 49 percent. You make us all worse than a monkey, Frederik.[2]

The Blogosphere

Now let’s turn to even more famous people: physics bloggers (and authors). Here are some of the most prominent, ranked from most fake-sounding papers (smallest percentage) to least fake-sounding papers (largest percentage). I think there’s a lesson here somewhere, though it’s hard to be sure in some cases, due to small statistics.

117/217 53% Sean M. Carroll
129/234 55% Jacques Distler
343/609 56% Clifford V. Johnson
56/97 57% John C. Baez
306/505 60% JoAnne Hewett
68/112 60% Lubos Motl
202/331 61% Mark Trodden
98/159 61% Sabine Hossenfelder
220/352 62% Lee Smolin
6/8 75% Peter Woit

Fake-Sounding and Real-Sounding Words

Suppose you’re writing a scientific paper, and you want to ensure that the general public doesn’t think it’s complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words (to ensure no single paper dominates this percentage, I’m requiring that each word appear in at least 5 titles).

174/521 33% Saturn
66/196 33% half-flat
69/189 36% charging
54/147 36% caustic
80/208 38% highlights
76/195 38% multiskyrmions
100/252 39% secret
99/249 39% perturbing
78/194 40% pollution
87/214 40% enough

Avoid these words! Turns out people don’t believe in “multiskyrmions.” Also, you shouldn’t mention “Saturn,” or use normal English words like “secret” or “enough.” By contrast, here’s a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

76/100 76% cp-even
75/101 74% Argon
140/191 73% two-particle
74/102 72% self-coupling
84/117 71% unusual
87/122 71% spin-spin
90/127 70% anomaly-free
127/180 70% atlas
70/100 70% supersymmetry-breaking
128/183 69% naked

In other words, if you want to be taken seriously as a scientist, you should call your next paper Unusual Naked, but Anomaly-Free.
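Both word lists pool guesses over every title that contains a given word, so a word's score is the sum of its papers' correct guesses over the sum of their total guesses. A minimal sketch, assuming per-paper (title, correct, total) counts and a simple lowercase, hyphen-preserving tokenizer chosen here for illustration; the post does not describe how titles were split into words.

```python
import re
from collections import defaultdict

# Illustrative tokenizer (an assumption): lowercase, keep letters/digits and
# internal hyphens, so "Half-flat" -> "half-flat".
WORD_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def words_in(title):
    """Distinct words in a title, so a repeated word counts once per title."""
    return set(WORD_RE.findall(title.lower()))


def word_scores(stats, min_titles=5):
    """stats: iterable of (title, correct, total), one entry per real paper.
    Returns word -> (correct, total) pooled over every title containing the word,
    keeping only words that appear in at least min_titles titles."""
    pooled = defaultdict(lambda: [0, 0, 0])  # [correct, total, number of titles]
    for title, correct, total in stats:
        for word in words_in(title):
            entry = pooled[word]
            entry[0] += correct
            entry[1] += total
            entry[2] += 1
    return {w: (c, t) for w, (c, t, n) in pooled.items() if n >= min_titles}


def most_fake_words(scores, top_n=10):
    """Words ordered by lowest pooled fraction of correct guesses."""
    return sorted(scores, key=lambda w: scores[w][0] / scores[w][1])[:top_n]
```

Pooling raw counts weights each paper by how often it was shown, which matters here because low-scoring papers are cycled through the game more often. Common words such as "of" or "the" easily clear the five-title bar, so a real analysis would likely filter them out.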

Incidence of Apparent Hooey in Various Subfields

Papers on the arXiv can be cross-listed in one or more subject areas, many of them outside physics. Here’s a ranking of subject areas with at least 50 guesses, from most fake-sounding to least fake-sounding.

39/81 48% Adaptation and Self-Organizing Systems
48/98 48% Popular Physics
172/340 50% Data Analysis, Statistics and Probability
212/414 51% History of Physics
167/307 54% Operator Algebras
136/248 54% Rings and Algebras
101/184 54% Disordered Systems and Neural Networks
282/512 55% Pattern Formation and Solitons
128/227 56% Classical Analysis and ODEs
295/523 56% Representation Theory
39/69 56% Probability
67/118 56% Geophysics
138/242 57% Number Theory
757/1327 57% Strongly Correlated Electrons
4724/8190 57% Quantum Algebra
2696/4666 57% Exactly Solvable and Integrable Systems
766/1317 58% Superconductivity
88/151 58% Computational Physics
1763/3011 58% Algebraic Geometry
8306/14161 58% Mathematical Physics
2072/3511 59% Statistical Mechanics
279/472 59% Chaotic Dynamics
58/98 59% Analysis of PDEs
99/167 59% Plasma Physics
134/226 59% Combinatorics
638/1075 59% Other
116/195 59% Accelerator Physics
140/235 59% Soft Condensed Matter
1898/3172 59% Differential Geometry
114/190 60% Group Theory
195/324 60% Fluid Dynamics
216/358 60% Functional Analysis
94/155 60% Dynamical Systems
96/158 60% Algebraic Topology
500/820 60% Atomic Physics
180/295 61% Geometric Topology
206/337 61% Symplectic Geometry
41/67 61% Symbolic Computation
103/168 61% Classical Physics
54/88 61% Complex Variables
758/1229 61% Mesoscopic Systems and Quantum Hall Effect
50/81 61% Materials Science
50/80 62% Category Theory
76/120 63% K-Theory and Homology
59/93 63% Instrumentation and Detectors
55/86 63% Spectral Theory
82/121 67% Optics

Performance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.[3] So let’s look at how performance varies across the globe. Here’s a ranking, by percentage of correct guesses, of countries with at least 2,000 total guesses.[4]

1400/2256 62% Austria
10665/17467 61% Germany
3111/5183 60% Israel
1658/2825 58% Spain
4134/7080 58% Italy
5804/9960 58% France
2355/4071 57% Switzerland
3485/6083 57% Australia
1690/2958 57% Sweden
2393/4189 57% Japan
89632/158059 56% United States
12137/21471 56% United Kingdom
7233/13053 55% Canada
2281/4183 54% India
1652/3037 54% Finland
2270/4266 53% Russian Federation
3496/6593 53% Netherlands
1061/2001 53% Argentina

It looks like having English as a first language is not particularly helpful.

Performance by School

Finally, universities account for about 1/8th of the total number of guesses on arXiv vs. snarXiv. Altogether, their performance is almost exactly average (59%). However, there are variations… Here’s a ranking of schools with at least 400 total guesses.

1145/1388 82% University of Colorado at Boulder
553/785 70% University of Regensburg
317/481 65% University of Washington
718/1097 65% Penn State
1001/1538 65% Berkeley
549/849 64% Princeton University
1277/1981 64% MIT
444/691 64% Imperial College London
349/544 64% Monash University
376/597 62% University of Illinois at Urbana-Champaign
287/457 62% Hebrew University of Jerusalem
284/461 61% The University of Edinburgh
261/435 60% Boston University
560/963 58% The University of Chicago
278/481 57% UC Santa Barbara
264/461 57% Madison
542/967 56% University of Cambridge
237/426 55% Cornell University
476/861 55% UC Davis
471/855 55% Columbia University
859/1577 54% California Institute of Technology
393/723 54% Harvard University
363/671 54% Stanford University
219/423 51% Yale University
308/599 51% University of Minnesota
281/551 50% University of Warwick

Congratulations to the University of Colorado at Boulder, which is the clear winner here.[5] Also, I just wanted to say: seriously Harvard? Seriously?


Finally, before heading into the comments, let me do a crapload of disclaiming. This is obviously the least scientific survey of science ever conducted. The ranking of a paper as “fake-sounding” or “realistic-sounding” has as much to do with the peculiarities of the snarXiv as with the arXiv itself.[6] Also, although 750,000 guesses is a lot in total — such that I’m fairly certain that the 59% overall average isn’t going anywhere — the statistics get dicey when chopped into small bits (see what I did there?). To be sure of anything, I guess we’ll just have to wait until the blogosphere writes more papers.
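The "dicey" part can be made concrete with the usual binomial standard error, sqrt(p(1 − p)/n), for an observed fraction p over n guesses. A minimal sketch using the normal approximation (which is itself shaky for very small n or p near 0 or 1; a Wilson interval is a common alternative). The overall figure below assumes exactly 59% of exactly 750,000 guesses, since the post gives only rounded totals.

```python
import math


def binomial_std_error(correct: int, total: int) -> float:
    """Standard error sqrt(p(1 - p) / n) of the observed fraction p = correct / total,
    using the normal approximation to the binomial."""
    p = correct / total
    return math.sqrt(p * (1 - p) / total)


if __name__ == "__main__":
    # Assumed round numbers: 59% of 750,000 guesses.
    print(f"overall: +/- {100 * binomial_std_error(442_500, 750_000):.2f} points")
    # Peter Woit's 6/8 from the blogosphere table.
    print(f"6/8: +/- {100 * binomial_std_error(6, 8):.0f} points")
```

The overall average comes out at about ±0.06 percentage points, while 6/8 carries roughly ±15 points, which is why the small-sample rows in the tables above deserve a grain of salt.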

  1. I couldn’t find an h-index ranking of physicists that was more recent than this one from 2005. I’m probably missing lots of names. Let me know and I’ll add them.
  2. In addition to being a top-notch physicist, Frederik also happens to be one of the world’s best arXiv vs. snarXiv players.
  3. When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn’t logging IP addresses, he looked at me very seriously and said: “You’re not logging IP addresses? You should always log IP addresses.”
  4. The high-scores leader “Ed” is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, “Ed” was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I’ve removed all his guesses from this analysis.
  5. Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
  6. I already got slammed for this on marginalrevolution.com, and I’ll surely get slammed again.

7 Responses to “The arXiv According to arXiv vs. snarXiv”. Comments are closed.

  1. Pingback: Quick Links | A Blog Around The Clock Sep 19, 2010 @ 7:01 pm

    […] The arXiv According to arXiv vs. snarXiv […]

  2. Erkcan says: Mar 08, 2011 @ 4:21 am


    Excellent study. One thing as an experimentalist: Why don’t you add some statistical error to the individual scores on arxiv vs. snarxiv? While I was playing it, I felt my score was slowly converging on to a particular value and thus I felt it would be nice to see the uncertainty. I am assuming that my skill is indeed measurable. The best estimator would indeed be n_correct/n_total. At the very least we can assume binomial errors on this quantity. So when the game reports 3/4=75%, it could instead say 75+-21%.

    Also given that you have the “stump the experts” section, you must be keeping data that allows extracting more detailed statistics of the players? Would you mind publishing histograms of how many guesses per game are played? Distribution of the results? (ie. do the scores distribute like a Gaussian? what is the rms? etc.)

    Anyway, cool stuff. I am upset that I heard about the whole thing so late…


  3. mgary says: Apr 10, 2012 @ 10:01 am

    You might want to check the dates on the submissions from Boulder, I suspect you’ll find that they occurred during TASI 2010. There was a little bit of a contest going on to see who could get the longest streak of consecutive correct guesses.

  4. jelmer says: Apr 23, 2012 @ 3:07 pm

    The first 10 attempts I answered wrong, but after a while you start to see a pattern and improve. Maybe you can only show the score after every N answers, and not show right/wrong to prevent learning the pattern of fake paper names.

  5. Andrew says: Feb 06, 2013 @ 3:18 pm

    14 of 20

    At least for the ones I received, it seemed the real article titles tended to be more declarative of experimental results and specific quantities than theoretical discussion. I also leaned more towards simpler titles with less jargon or jargon-y words (not that I’d have any chance of distinguishing what are and aren’t real terms). While the sampling isn’t random according to the stats page, it makes intuitive sense to me that the fake submissions would strongly tend towards long, jargon-filled titles. Not that physics papers are known for simple, concise titles, but I’d also think that the real titles would be at least slightly biased towards simplicity and concision.

    This line of thought makes me wonder if some familiarity with the subject matter (but below actual expertise) might actually hurt someone’s performance as they might focus more on the titles’ subject matter rather than factors that might be better indicative of if an individual title is real or not.

    It’s probably just cognitive bias, but I think I might be able to at least keep up with the actual physicists (or at least beat that damn monkey) as a layperson over a larger number of trials.

  6. BonnieB says: Jul 02, 2013 @ 6:47 pm

    This is hilarious! After my first 30 or so tries I was a “Physics Grad” but soon dropped off to dumber than a monkey :-) … I wanted to see if there was a pattern and then just picked randomly. Great fun! I’m a retired science librarian with BA in mammalian zoology but loved physics once I was working in a sci/tech library. I am happy to see I still have some talent at picking out snark. Must be thanks to the astrophysicists I worked for at the end of my career :-)

  7. Pingback: ONCOgen: the Clinical Trial Generator | Michael Harrison — The Monograph Oct 25, 2016 @ 7:02 pm

    […] from a database of such papers and the other computer generated. The overall correct guess rate sits around 59%. I went 1 for 6 on my first […]