The arXiv According to arXiv vs. snarXiv

After more than 3/4 of a million guesses, in over 50,000 games played in 67 countries, the results are clear: Science sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it's time to take a look at the results. Here's how the game works. The user sees two titles: one is the title of an actual theoretical high energy physics paper on the arXiv, and the other is a completely fake title randomly generated by the snarXiv. The user guesses which one is real, finds out if they're right or wrong, and then starts over with a new pair of titles.

I've been recording the result of each guess, originally just out of curiosity. I never expected to get reasonable statistics on the over 120,000 high energy theory papers on the arXiv. But after more than 750,000 guesses, that's exactly what I've got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let's take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the lowest percentage of correct guesses when users try to distinguish them from a randomly generated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, generating better statistics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

guesses | percent | paper | authors (date)
36/138 | 26% | Highlights of the Theory | B. Z. Kopeliovich and R. Peschanski (June 1998)
41/153 | 26% | Heterotic on Half-flat | Sebastien Gurrieri, Andre Lukas and Andrei Micu (August 2004)
13/48 | 27% | Relativistic confinement of neutral fermions with a trigonometric tangent potential | Luis B. Castro and Antonio S. de Castro (November 2006)
13/47 | 27% | Toric Kahler metrics and AdS_5 in ring-like co-ordinates | Bobby S. Acharya, Suresh Govindarajan and Chethan N. Gowdigere (December 2006)
9/32 | 28% | Aspects of U_A(1) breaking in the Nambu and Jona-Lasinio model | Alexander A. Osipov, Brigitte Hiller, Veronique Bernard and Alex H. Blin (July 2005)
35/116 | 30% | Energy's and amplitudes' positivity | Alberto Nicolis, Riccardo Rattazzi and Enrico Trincherini (December 2009)
16/53 | 30% | A covariant diquark-quark model of the nucleon in the Salpeter approach | Volker Keiner (March 1996)
51/167 | 30% | Noncommutative Bundles and Instantons in Tehran | Giovanni Landi and Walter van Suijlekom (March 2006)
38/124 | 30% | Baby steps beyond rainbow-ladder | Richard Williams and Christian S. Fischer (May 2009)
13/42 | 30% | Transverse force on a moving vortex with the acoustic geometry | Peng-ming Zhang, Li-ming Cao, Yi-shi Duan and Cheng-kui Zhong (January 2005)
44/138 | 31% | Testing factorization | Gudrun Hiller (November 2001)
19/59 | 32% | Prospects for Mirage Mediation | Aaron Pierce and Jesse Thaler (April 2006)
49/152 | 32% | Determining the dual | Arjan Keurentjes (July 2006)
11/34 | 32% | Gravitational Dressing of Renormalization Group | I. R. Klebanov, I. I. Kogan and A. M. Polyakov (September 1993)
53/163 | 32% | Charging Black Saturn? | Brenda Chng, Robert Mann, Eugen Radu and Cristian Stelea (September 2008)

A tip for future arXiv vs. snarXiv players: the snarXiv is more grammatical than the arXiv. If you see "Heterotic on Half-flat" and think "uh... Half-flat what?" then you can be nearly certain it's a real scientific paper that was written to advance the boundaries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spectacularly low percentage of which were correct.

guesses | percent | paper | authors (date)
2/13 | 15% | Actions and Fermionic symmetries for D-branes in bosonic backgrounds | Donald Marolf, Luca Martucci and Pedro J. Silva (June 2003)
3/15 | 20% | CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension | Gi-Chol Cho, Etsuko Izumi and Akio Sugamoto (December 2001)
6/27 | 22% | Families as Neighbors in Extra Dimension | G. Dvali and M. Shifman (January 2000)
6/25 | 24% | The Greening of Quantum Field Theory: George and I | Julian Schwinger (October 1993)
3/12 | 25% | Dyonic solution of Horava-Lifshitz Gravity | Eoin Ó Colgáin and Hossein Yavartanoo (April 2009)
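
For anyone who wants to play along at home, the rankings above boil down to a tally and a sort. Here's a minimal sketch, assuming each guess is stored as a (title, was_correct) pair. That record format, the function name most_fake_sounding, and the toy data are all made up for illustration; this is not the actual arXiv vs. snarXiv code.

```python
from collections import defaultdict


def most_fake_sounding(guesses, min_guesses=30, top=15):
    """Rank real papers by fraction of correct guesses, lowest first.

    guesses: iterable of (title, was_correct) pairs (hypothetical format).
    Returns a list of (fraction_correct, correct, total, title) tuples.
    """
    tally = defaultdict(lambda: [0, 0])  # title -> [correct, total]
    for title, was_correct in guesses:
        tally[title][1] += 1
        if was_correct:
            tally[title][0] += 1

    ranked = [
        (correct / total, correct, total, title)
        for title, (correct, total) in tally.items()
        if total >= min_guesses  # drop papers with too few guesses
    ]
    ranked.sort()  # lowest fraction correct (most fake-sounding) first
    return ranked[:top]


# Toy data, invented for illustration only.
toy_guesses = (
    [("A", True)] * 1 + [("A", False)] * 3
    + [("B", True)] * 3 + [("B", False)] * 1
    + [("C", False)] * 2
)
toy_ranking = most_fake_sounding(toy_guesses, min_guesses=4, top=2)
```

The min_guesses cutoff is doing real work here: lower it to 10 and you get the up-and-comers, along with a lot more noise.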

Stump the Experts

People with all sorts of backgrounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely suspicious of ridiculous-sounding words that have been co-opted for technical purposes — "'Mirage Mediation?' that can't be real!" High energy physicists, on the other hand, are used to their own unfortunate verbiage. Unfortunately for them, however, there are still plenty of papers on the arXiv that sound like they were written by a computer.

Let's define an "expert" game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

guesses | percent | paper | authors (date)
1/6 | 16% | Fields | W. Siegel (December 1999)
1/5 | 20% | An extended model for monopole catalysis of nucleon decay | Y. Brihaye, D. Yu. Grigoriev, V. A. Rubakov and D. H. Tchrakian (November 2002)
1/5 | 20% | Space-time symmetry restoration in cosmological models with Kalb--Ramond and scalar fields | E. Di Grezia, G. Mangano and G. Miele (July 2004)
1/5 | 20% | Towards Evaluation of Stringy Non-Perturbative Effects | R. Brustein and B. A. Ovrut (November 1995)
1/5 | 20% | Dimensional Reduction | Corinne A. Manogue and Tevian Dray (July 1998)
1/5 | 20% | The Ridge, the Glasma and Flow | Larry McLerran (December 2008)
1/5 | 20% | Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins | S. V. Chekanov, W. Kittel and V. I. Kuvshinov (June 1996)
2/7 | 28% | BGWM as Second Constituent of Complex Matrix Model | A. Alexandrov, A. Mironov and A. Morozov (June 2009)
3/10 | 30% | Testing factorization | Gudrun Hiller (November 2001)
2/6 | 33% | Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology | L. C. Garcia de Andrade (April 2001)

I have to say, some of these definitely sound like they came straight from the snarXiv. I really don't know what Glasma is, though. The snarXiv does not have Glasma.
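
The "expert" cut defined above is easy to write down. Here's a rough sketch, assuming each game is stored as a list of (title, was_correct) pairs; the storage format and the names is_expert_game and expert_guesses are hypothetical, not taken from the real game.

```python
def is_expert_game(results, min_guesses=5, min_percent=80):
    """results: list of booleans, one per guess in a single game.

    A game is "expert" if it has at least min_guesses guesses and a
    score of min_percent or higher. Integer arithmetic avoids any
    floating-point fuzz right at the 80% boundary.
    """
    total = len(results)
    correct = sum(results)
    return total >= min_guesses and 100 * correct >= min_percent * total


def expert_guesses(games):
    """Collect every guess made in an expert game.

    games: list of games, each a list of (title, was_correct) pairs
    (hypothetical format).
    """
    kept = []
    for game in games:
        if is_expert_game([ok for _, ok in game]):
            kept.extend(game)
    return kept
```

The guesses that survive this cut can then be tallied per paper exactly as for the full data set, which is why the expert table has such small denominators.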

Famous Physicists

Ok, so the papers above sound especially ridiculous. However, the average across all 750,000 guesses on all papers is still only 59% correct. While better than a monkey, this is not particularly good. Who's responsible? Surely not the world's top minds?

Here's a ranking of some of the most highly cited physicists on the arXiv (h-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers. (I couldn't find an h-index ranking of physicists that was more recent than this one from 2005. I'm probably missing lots of names. Let me know and I'll add them.) A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.

guesses | percent | physicist
178/360 | 49% | Frederik Denef
129/253 | 50% | A. M. Polyakov
69/135 | 51% | Steven Weinberg
189/357 | 52% | Howard Georgi
764/1419 | 53% | Cumrun Vafa
332/615 | 53% | Leonard Susskind
6/11 | 54% | H. David Politzer
63/115 | 54% | Gerard 't Hooft
282/514 | 54% | Michael B. Green
348/622 | 55% | Frank Wilczek
164/293 | 55% | Erik Verlinde
109/194 | 56% | Sheldon Glashow
406/718 | 56% | Savas Dimopoulos
351/615 | 57% | Mark B. Wise
246/429 | 57% | Neil Turok
310/539 | 57% | Lisa Randall
353/612 | 57% | Dimitri Nanopoulos
209/362 | 57% | Herman Verlinde
450/778 | 57% | Gia Dvali
297/513 | 57% | Nima Arkani-Hamed
416/717 | 58% | Nathan Seiberg
94/162 | 58% | Lawrence M. Krauss
368/633 | 58% | Thomas Banks
535/917 | 58% | Igor R. Klebanov
412/705 | 58% | Steven S. Gubser
356/603 | 59% | Juan Maldacena
481/813 | 59% | Andrew Strominger
174/294 | 59% | Brian R. Greene
289/488 | 59% | Joseph Polchinski
573/963 | 59% | Edward Witten
1194/1981 | 60% | John Ellis
368/610 | 60% | Shamit Kachru
315/520 | 60% | Roman Jackiw
311/510 | 60% | Hirosi Ooguri
592/966 | 61% | Hitoshi Murayama
118/192 | 61% | David J. Gross
381/606 | 62% | Lawrence J. Hall
118/186 | 63% | Stephen Hawking
383/600 | 63% | Aneesh V. Manohar
170/257 | 66% | Michael Peskin
222/326 | 68% | John H. Schwarz

I'd especially like to congratulate Frederik on his anomalously low 49 percent. You make us all worse than a monkey, Frederik.[1]

The Blogosphere

Now let's turn to even more famous people: physics bloggers (and authors). Here are some of the most prominent, ranked from most fake-sounding papers (smallest percentage) to least fake-sounding papers (largest percentage). I think there's a lesson here somewhere, though it's hard to be sure in some cases, due to small statistics.

Fake-Sounding and Real-Sounding Words

Suppose you're writing a scientific paper, and you want to ensure that the general public doesn't think it's complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words (to ensure no single paper dominates this percentage, I'm requiring that each word appear in at least 5 titles).

guesses | percent | word
174/521 | 33% | Saturn
66/196 | 33% | half-flat
69/189 | 36% | charging
54/147 | 36% | caustic
80/208 | 38% | highlights
76/195 | 38% | multiskyrmions
100/252 | 39% | secret
99/249 | 39% | perturbing
78/194 | 40% | pollution
87/214 | 40% | enough

Avoid these words! Turns out people don't believe in "multiskyrmions." Also, you shouldn't mention "Saturn," or use normal English words like "secret" or "enough." By contrast, here's a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

guesses | percent | word
76/100 | 76% | cp-even
75/101 | 74% | Argon
140/191 | 73% | two-particle
74/102 | 72% | self-coupling
84/117 | 71% | unusual
87/122 | 71% | spin-spin
90/127 | 70% | anomaly-free
127/180 | 70% | atlas
70/100 | 70% | supersymmetry-breaking
128/183 | 69% | naked

In other words, if you want to be taken seriously as a scientist, you should call your next paper Unusual Naked, but Anomaly-Free.
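
The word rankings above pool the guesses from every title containing a given word. Here's one way that could be sketched, assuming per-paper tallies are already available as a dict of title -> (correct, total). Splitting titles on whitespace and lowercasing them is my simplification here, and the name word_stats is hypothetical.

```python
from collections import defaultdict


def word_stats(paper_tallies, min_titles=5):
    """Pool guesses over all titles containing each word.

    paper_tallies: dict mapping title -> (correct, total) (hypothetical
    format). Returns (fraction_correct, word) pairs sorted from most
    fake-sounding to most realistic-sounding.
    """
    per_word = defaultdict(lambda: [0, 0, 0])  # word -> [correct, total, titles]
    for title, (correct, total) in paper_tallies.items():
        # set() so a word repeated within one title is counted once.
        for word in set(title.lower().split()):
            entry = per_word[word]
            entry[0] += correct
            entry[1] += total
            entry[2] += 1

    return sorted(
        (correct / total, word)
        for word, (correct, total, titles) in per_word.items()
        # Require several titles so no single paper dominates a word.
        if titles >= min_titles and total > 0
    )
```

The first few entries of the result are the words to avoid, and the last few are the ones to put in your next title.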

Incidence of Apparent Hooey in Various Subfields

Papers on the arXiv can be associated with one or more physics subfields. Here's a ranking of subfields with at least 50 guesses from most fake-sounding to least fake-sounding.

guesses | percent | subfield
39/81 | 48% | Adaptation and Self-Organizing Systems
48/98 | 48% | Popular Physics
172/340 | 50% | Data Analysis, Statistics and Probability
212/414 | 51% | History of Physics
167/307 | 54% | Operator Algebras
136/248 | 54% | Rings and Algebras
101/184 | 54% | Disordered Systems and Neural Networks
282/512 | 55% | Pattern Formation and Solitons
128/227 | 56% | Classical Analysis and ODEs
295/523 | 56% | Representation Theory
39/69 | 56% | Probability
67/118 | 56% | Geophysics
138/242 | 57% | Number Theory
757/1327 | 57% | Strongly Correlated Electrons
4724/8190 | 57% | Quantum Algebra
2696/4666 | 57% | Exactly Solvable and Integrable Systems
766/1317 | 58% | Superconductivity
88/151 | 58% | Computational Physics
1763/3011 | 58% | Algebraic Geometry
8306/14161 | 58% | Mathematical Physics
2072/3511 | 59% | Statistical Mechanics
279/472 | 59% | Chaotic Dynamics
58/98 | 59% | Analysis of PDEs
99/167 | 59% | Plasma Physics
134/226 | 59% | Combinatorics
638/1075 | 59% | Other
116/195 | 59% | Accelerator Physics
140/235 | 59% | Soft Condensed Matter
1898/3172 | 59% | Differential Geometry
114/190 | 60% | Group Theory
195/324 | 60% | Fluid Dynamics
216/358 | 60% | Functional Analysis
94/155 | 60% | Dynamical Systems
96/158 | 60% | Algebraic Topology
500/820 | 60% | Atomic Physics
180/295 | 61% | Geometric Topology
206/337 | 61% | Symplectic Geometry
41/67 | 61% | Symbolic Computation
103/168 | 61% | Classical Physics
54/88 | 61% | Complex Variables
758/1229 | 61% | Mesoscopic Systems and Quantum Hall Effect
50/81 | 61% | Materials Science
50/80 | 62% | Category Theory
76/120 | 63% | K-Theory and Homology
59/93 | 63% | Instrumentation and Detectors
55/86 | 63% | Spectral Theory
82/121 | 67% | Optics

Performance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.[2] So let's look at how performance varies across the globe. Here's a ranking of correct guesses from countries with at least 2000 total guesses.[3]

guesses | percent | country
1400/2256 | 62% | Austria
10665/17467 | 61% | Germany
3111/5183 | 60% | Israel
1658/2825 | 58% | Spain
4134/7080 | 58% | Italy
5804/9960 | 58% | France
2355/4071 | 57% | Switzerland
3485/6083 | 57% | Australia
1690/2958 | 57% | Sweden
2393/4189 | 57% | Japan
89632/158059 | 56% | United States
12137/21471 | 56% | United Kingdom
7233/13053 | 55% | Canada
2281/4183 | 54% | India
1652/3037 | 54% | Finland
2270/4266 | 53% | Russian Federation
3496/6593 | 53% | Netherlands
1061/2001 | 53% | Argentina

It looks like having English as a first language is not particularly helpful.

Performance by School

Finally, universities account for about 1/8th of the total number of guesses on arXiv vs. snarXiv. Altogether, their performance is almost exactly average (59%). However, there are variations... Here's a ranking of schools with at least 400 total guesses.

guesses | percent | school
1145/1388 | 82% | University of Colorado at Boulder
553/785 | 70% | University of Regensburg
317/481 | 65% | University of Washington
718/1097 | 65% | Penn State
1001/1538 | 65% | Berkeley
549/849 | 64% | Princeton University
1277/1981 | 64% | MIT
444/691 | 64% | Imperial College London
349/544 | 64% | Monash University
376/597 | 62% | University of Illinois at Urbana-Champaign
287/457 | 62% | Hebrew University of Jerusalem
284/461 | 61% | The University of Edinburgh
261/435 | 60% | Boston University
560/963 | 58% | The University of Chicago
278/481 | 57% | UC Santa Barbara
264/461 | 57% | Madison
542/967 | 56% | University of Cambridge
237/426 | 55% | Cornell University
476/861 | 55% | UC Davis
471/855 | 55% | Columbia University
859/1577 | 54% | California Institute of Technology
393/723 | 54% | Harvard University
363/671 | 54% | Stanford University
219/423 | 51% | Yale University
308/599 | 51% | University of Minnesota
281/551 | 50% | University of Warwick

Congratulations to the University of Colorado at Boulder, which is the clear winner here.[4] Also, I just wanted to say: seriously Harvard? Seriously?

Disclaimer

Finally, before heading into the comments, let me do a crapload of disclaiming. This is obviously the least scientific survey of science ever conducted. The ranking of a paper as "fake-sounding" or "realistic-sounding" has as much to do with the peculiarities of the snarXiv as with the arXiv itself.[5] Also, although 750,000 guesses is a lot in total (such that I'm fairly certain that the 59% overall average isn't going anywhere), the statistics get dicey when chopped into small bits (see what I did there?). To be sure of anything, I guess we'll just have to wait until the blogosphere writes more papers.
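
To put a rough number on "dicey": if each guess is treated as an independent coin flip with a fixed chance of being right (an assumption, and probably a generous one), the uncertainty on a percentage is the usual binomial standard error, sqrt(p(1-p)/n). A minimal sketch follows; the function name binomial_error is made up for illustration.

```python
import math


def binomial_error(correct, total):
    """Return (fraction_correct, standard_error) for a tally.

    Assumes independent guesses with a fixed success probability, which
    is only an approximation for real players and real titles.
    """
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)


# A tiny tally from the tables versus the full data set (59% of 750,000).
small_p, small_err = binomial_error(6, 11)
overall_p, overall_err = binomial_error(442500, 750000)
```

For the 6/11 tally the error comes out at around 15 percentage points, while for the overall 59% it is well under a tenth of a percentage point, which is the whole story of this section in two numbers.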

  1. In addition to being a top-notch physicist, Frederik also happens to be one of the world's best arXiv vs. snarXiv players.
  2. When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn't logging IP addresses, he looked at me very seriously and said: "You're not logging IP addresses? You should always log IP addresses."
  3. The high-scores leader "Ed" is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, "Ed" was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I've removed all his guesses from this analysis.
  4. Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
  5. I already got slammed for this on marginalrevolution.com, and I'll surely get slammed again.

Comments

  1. Erkcan, Mar 8 2011 04:21:35
    Hi, Excellent study. One thing as an experimentalist: Why don't you add some statistical error to the individual scores on arxiv vs. snarxiv? While I was playing it, I felt my score was slowly converging on to a particular value and thus I felt it would be nice to see the uncertainty. I am assuming that my skill is indeed measurable. The best estimator would be indeed by n_correct/n_total. At the very least we can assume binomial errors on this quantity. So when the game reports 3/4=75%, it could instead say 75+-21%. Also given that you have the "stump the experts" section, you must be keeping data that allows extracting more detailed statistics of the players? Would you mind publishing histograms of how many guesses per game are played? Distribution of the results? (ie. do the scores distribute like a Gaussian? what is the rms? etc.) Anyway, cool stuff. I am upset that I heard about the whole thing so late... e.
  2. mgary, Apr 10 2012 10:01:45
    You might want to check the dates on the submissions from Boulder, I suspect you'll find that they occurred during TASI 2010. There was a little bit of a contest going on to see who could get the longest streak of consecutive correct guesses.
  3. jelmer, Apr 23 2012 15:07:15
    The first 10 attempts I answered wrong, but after a while you start to see a pattern and improve. Maybe you can only show the score after every N answers, and not show right/wrong to prevent learning the pattern of fake paper names.
  4. Andrew, Feb 6 2013 15:18:02
    14 of 20 At least for the ones I received, it seemed the real article titles tended to be more declarative of experimental results and specific quantities than theoretical discussion. I also leaned more towards simpler titles with less jargon or jargon-y words (not that I'd have any chance of distinguishing what are and aren't real terms). While the sampling isn't random according to the stats page, it makes intuitive sense to me that the fake submissions would strongly tend towards long, jargon-filled titles. Not that physics papers are known for simple, concise titles, but I'd also think that the real titles would be at least slightly biased towards simplicity and concision. This line of thought makes me wonder if some familiarity with the subject matter (but below actual expertise) might actually hurt someone's performance as they might focus more on the titles' subject matter rather than factors that might be better indicative of if an individual title is real or not. It's probably just cognitive bias, but I think I might be able to at least keep up with the actual physicists (or at least beat that damn monkey) as a layperson over a larger number of trials.
  5. BonnieB, Jul 2 2013 18:47:33
    This is hilarious! After my first 30 or so tries I was a "Physics Grad" but soon dropped off to dumber than a monkey :-) ... I wanted to see if there was a pattern and then just picked randomly. Great fun! I'm a retired science librarian with BA in mammalian zoology but loved physics once I was working in a sci/tech library. I am happy to see I still have some talent at picking out snark. Must be thanks to the astrophysicists I worked for at the end of my career :-)