The arXiv According to arXiv vs. snarXiv

Sep 17, 2010

After more than 3/4 of a million guesses, in over 50,000 games played in 67 countries, the results are clear: Science sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it's time to take a look at the results. Here's how the game works. The user sees two titles: one is the title of an actual theoretical high energy physics paper on the arXiv, and the other is a completely fake title randomly generated by the snarXiv. The user guesses which one is real, finds out if they're right or wrong, and then starts over with a new pair of titles.

I've been recording the result of each guess, originally just out of curiosity. I never expected to get reasonable statistics on the over 120,000 high energy theory papers on the arXiv. But after more than 750,000 guesses, that's exactly what I've got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let's take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the lowest percentage of correct guesses when users try to distinguish them from a randomly generated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, generating better statistics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

guesses	percent	paper	authors (date)
36/138	26%	Highlights of the Theory	B. Z. Kopeliovich and R. Peschanski (June 1998)
41/153	26%	Heterotic on Half-flat	Sebastien Gurrieri, Andre Lukas and Andrei Micu (August 2004)
13/48	27%	Relativistic confinement of neutral fermions with a trigonometric tangent potential	Luis B. Castro and Antonio S. de Castro (November 2006)
13/47	27%	Toric Kahler metrics and AdS_5 in ring-like co-ordinates	Bobby S. Acharya, Suresh Govindarajan and Chethan N. Gowdigere (December 2006)
9/32	28%	Aspects of U_A(1) breaking in the Nambu and Jona-Lasinio model	Alexander A. Osipov, Brigitte Hiller, Veronique Bernard and Alex H. Blin (July 2005)
35/116	30%	Energy's and amplitudes' positivity	Alberto Nicolis, Riccardo Rattazzi and Enrico Trincherini (December 2009)
16/53	30%	A covariant diquark-quark model of the nucleon in the Salpeter approach	Volker Keiner (March 1996)
51/167	30%	Noncommutative Bundles and Instantons in Tehran	Giovanni Landi and Walter van Suijlekom (March 2006)
38/124	30%	Baby steps beyond rainbow-ladder	Richard Williams and Christian S. Fischer (May 2009)
13/42	30%	Transverse force on a moving vortex with the acoustic geometry	Peng-ming Zhang, Li-ming Cao, Yi-shi Duan and Cheng-kui Zhong (January 2005)
44/138	31%	Testing factorization	Gudrun Hiller (November 2001)
19/59	32%	Prospects for Mirage Mediation	Aaron Pierce and Jesse Thaler (April 2006)
49/152	32%	Determining the dual	Arjan Keurentjes (July 2006)
11/34	32%	Gravitational Dressing of Renormalization Group	I. R. Klebanov, I. I. Kogan and A. M. Polyakov (September 1993)
53/163	32%	Charging Black Saturn?	Brenda Chng, Robert Mann, Eugen Radu and Cristian Stelea (September 2008)

A tip for future arXiv vs. snarXiv players: the snarXiv is more grammatical than the arXiv. If you see "Heterotic on Half-flat" and think "uh... Half-flat what?" then you can be nearly certain it's a real scientific paper that was written to advance the boundaries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spectacularly low percentage of which were correct.

guesses	percent	paper	authors (date)
2/13	15%	Actions and Fermionic symmetries for D-branes in bosonic backgrounds	Donald Marolf, Luca Martucci and Pedro J. Silva (June 2003)
3/15	20%	CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension	Gi-Chol Cho, Etsuko Izumi and Akio Sugamoto (December 2001)
6/27	22%	Families as Neighbors in Extra Dimension	G. Dvali and M. Shifman (January 2000)
6/25	24%	The Greening of Quantum Field Theory: George and I	Julian Schwinger (October 1993)
3/12	25%	Dyonic solution of Horava-Lifshitz Gravity	Eoin Ó Colgáin and Hossein Yavartanoo (April 2009)
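
For the curious, a ranking like the first table in this section could be computed along the following lines. This is only a sketch in Python, not the code behind the game: the function name `rank_fake_sounding` and the input format (one `(title, was_correct)` pair per recorded guess) are assumptions made for illustration. The percentages in the table look rounded down (41/153 is shown as 26%), so the sketch does the same.

```python
from collections import defaultdict


def rank_fake_sounding(guesses, min_guesses=30, top=15):
    """Rank titles from most to least fake-sounding.

    guesses: iterable of (title, was_correct) pairs, one per recorded guess.
    Returns up to `top` rows of (correct, total, percent, title).
    """
    tally = defaultdict(lambda: [0, 0])  # title -> [correct, total]
    for title, was_correct in guesses:
        tally[title][0] += int(was_correct)
        tally[title][1] += 1

    # Keep only titles with enough guesses to say anything at all.
    eligible = [(c, n, t) for t, (c, n) in tally.items() if n >= min_guesses]

    # Lowest fraction of correct guesses first.
    eligible.sort(key=lambda row: row[0] / row[1])

    # Integer division rounds the percentage down.
    return [(c, n, 100 * c // n, t) for c, n, t in eligible[:top]]
```

Lowering `min_guesses` would surface the up-and-comers too, at the price of much noisier percentages.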

Stump the Experts

People with all sorts of backgrounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely suspicious of ridiculous-sounding words that have been co-opted for technical purposes — "'Mirage Mediation?' that can't be real!" High energy physicists, on the other hand, are used to their own unfortunate verbiage. Unfortunately for them, however, there are still plenty of papers on the arXiv that sound like they were written by a computer.

Let's define an "expert" game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

guesses	percent	paper	authors (date)
1/6	16%	Fields	W. Siegel (December 1999)
1/5	20%	An extended model for monopole catalysis of nucleon decay	Y. Brihaye, D. Yu. Grigoriev, V. A. Rubakov and D. H. Tchrakian (November 2002)
1/5	20%	Space-time symmetry restoration in cosmological models with Kalb--Ramond and scalar fields	E. Di Grezia, G. Mangano and G. Miele (July 2004)
1/5	20%	Towards Evaluation of Stringy Non-Perturbative Effects	R. Brustein and B. A. Ovrut (November 1995)
1/5	20%	Dimensional Reduction	Corinne A. Manogue and Tevian Dray (July 1998)
1/5	20%	The Ridge, the Glasma and Flow	Larry McLerran (December 2008)
1/5	20%	Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins	S. V. Chekanov, W. Kittel and V. I. Kuvshinov (June 1996)
2/7	28%	BGWM as Second Constituent of Complex Matrix Model	A. Alexandrov, A. Mironov and A. Morozov (June 2009)
3/10	30%	Testing factorization	Gudrun Hiller (November 2001)
2/6	33%	Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology	L. C. Garcia de Andrade (April 2001)

I have to say, some of these definitely sound like they came straight from the snarXiv. I really don't know what Glasma is, though. The snarXiv does not have Glasma.
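
The "expert" filter defined above is simple enough to sketch. Again, this is an illustration under assumptions rather than the game's actual code: `is_expert_game` and `expert_tally` are made-up names, and each game is assumed to be stored as a list of `(title, was_correct)` pairs.

```python
from collections import Counter


def is_expert_game(game, min_guesses=5, min_percent=80):
    """True if the game has at least 5 guesses and a score of 80% or higher."""
    correct = sum(1 for _, was_correct in game if was_correct)
    # Integer comparison avoids floating-point trouble right at the 80% line.
    return len(game) >= min_guesses and 100 * correct >= min_percent * len(game)


def expert_tally(games):
    """Count (correct, total) per title, using guesses from expert games only."""
    correct, total = Counter(), Counter()
    for game in games:
        if not is_expert_game(game):
            continue
        for title, was_correct in game:
            total[title] += 1
            correct[title] += int(was_correct)
    return {title: (correct[title], total[title]) for title in total}
```

Sorting the resulting tallies by fraction correct would then give a list like the one in the table.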

Famous Physicists

Ok, so the papers above sound especially ridiculous. However, the average across all 750,000 guesses on all papers is still only 59% correct. While better than a monkey, this is not particularly good. Who's responsible? Surely not the world's top minds?

Here's a ranking of some of the most highly cited physicists on the arXiv (h-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers. (I couldn't find an h-index ranking of physicists that was more recent than this one from 2005. I'm probably missing lots of names. Let me know and I'll add them.) A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.

178/360	49%	Frederik Denef	246/429	57%	Neil Turok	289/488	59%	Joseph Polchinski
129/253	50%	A. M. Polyakov	310/539	57%	Lisa Randall	573/963	59%	Edward Witten
69/135	51%	Steven Weinberg	353/612	57%	Dimitri Nanopoulos	1194/1981	60%	John Ellis
189/357	52%	Howard Georgi	209/362	57%	Herman Verlinde	368/610	60%	Shamit Kachru
764/1419	53%	Cumrun Vafa	450/778	57%	Gia Dvali	315/520	60%	Roman Jackiw
332/615	53%	Leonard Susskind	297/513	57%	Nima Arkani-Hamed	311/510	60%	Hirosi Ooguri
6/11	54%	H. David Politzer	416/717	58%	Nathan Seiberg	592/966	61%	Hitoshi Murayama
63/115	54%	Gerard 't Hooft	94/162	58%	Lawrence M. Krauss	118/192	61%	David J. Gross
282/514	54%	Michael B. Green	368/633	58%	Thomas Banks	381/606	62%	Lawrence J. Hall
348/622	55%	Frank Wilczek	535/917	58%	Igor R. Klebanov	118/186	63%	Stephen Hawking
164/293	55%	Erik Verlinde	412/705	58%	Steven S. Gubser	383/600	63%	Aneesh V. Manohar
109/194	56%	Sheldon Glashow	356/603	59%	Juan Maldacena	170/257	66%	Michael Peskin
406/718	56%	Savas Dimopoulos	481/813	59%	Andrew Strominger	222/326	68%	John H. Schwarz
351/615	57%	Mark B. Wise	174/294	59%	Brian R. Greene

I'd especially like to congratulate Frederik on his anomalously low 49 percent. You make us all worse than a monkey, Frederik.¹

The Blogosphere

Now let's turn to even more famous people: physics bloggers (and authors). Here are some of the most prominent, ranked from most fake-sounding papers (smallest percentage) to least fake-sounding papers (largest percentage). I think there's a lesson here somewhere, though it's hard to be sure in some cases, due to small statistics.

117/217	53%	Sean M. Carroll [blog]	68/112	60%	Lubos Motl [blog]
129/234	55%	Jacques Distler [blog]	202/331	61%	Mark Trodden [blog]
343/609	56%	Clifford V. Johnson [blog]	98/159	61%	Sabine Hossenfelder [blog]
56/97	57%	John C. Baez [blog]	220/352	62%	Lee Smolin [site]
306/505	60%	JoAnne Hewett [blog]	6/8	75%	Peter Woit [blog]

Fake-Sounding and Real-Sounding Words

Suppose you're writing a scientific paper, and you want to ensure that the general public doesn't think it's complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words (to ensure no single paper dominates this percentage, I'm requiring that each word appear in at least 5 titles).

174/521	33%	Saturn	76/195	38%	multiskyrmions
66/196	33%	half-flat	100/252	39%	secret
69/189	36%	charging	99/249	39%	perturbing
54/147	36%	caustic	78/194	40%	pollution
80/208	38%	highlights	87/214	40%	enough

Avoid these words! Turns out people don't believe in "multiskyrmions." Also, you shouldn't mention "Saturn," or use normal English words like "secret" or "enough." By contrast, here's a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

76/100	76%	cp-even	87/122	71%	spin-spin
75/101	74%	Argon	90/127	70%	anomaly-free
140/191	73%	two-particle	127/180	70%	atlas
74/102	72%	self-coupling	70/100	70%	supersymmetry-breaking
84/117	71%	unusual	128/183	69%	naked

In other words, if you want to be taken seriously as a scientist, you should call your next paper "Unusual Naked, but Anomaly-Free."
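
The word rankings in this section pool the guesses from every title containing a given word. Here is a rough Python sketch of that bookkeeping. The name `word_stats`, the input format (a dict from title to `(correct, total)`), and especially the tokenization (lowercase, split on whitespace, strip a little punctuation) are all assumptions for illustration; the real analysis may well have split titles differently.

```python
from collections import defaultdict

PUNCTUATION = ".,:;?!()'\""


def word_stats(paper_tallies, min_titles=5):
    """Pool guesses by word.

    paper_tallies: dict mapping title -> (correct, total).
    Returns a dict mapping word -> (correct, total), keeping only words
    that appear in at least `min_titles` distinct titles.
    """
    pooled = defaultdict(lambda: [0, 0, 0])  # word -> [correct, total, titles]
    for title, (correct, total) in paper_tallies.items():
        # A set, so a word repeated within one title is counted once.
        words = {w.strip(PUNCTUATION) for w in title.lower().split()}
        for word in words - {""}:
            pooled[word][0] += correct
            pooled[word][1] += total
            pooled[word][2] += 1
    return {w: (c, n) for w, (c, n, k) in pooled.items() if k >= min_titles}
```

The `min_titles` cutoff is what keeps a single paper from dominating a word's percentage.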

Incidence of Apparent Hooey in Various Subfields

Papers on the arXiv can be associated with one or more physics subfields. Here's a ranking of subfields with at least 50 guesses from most fake-sounding to least fake-sounding.

39/81	48%	Adaptation and Self-Organizing Systems	134/226	59%	Combinatorics
48/98	48%	Popular Physics	638/1075	59%	Other
172/340	50%	Data Analysis, Statistics and Probability	116/195	59%	Accelerator Physics
212/414	51%	History of Physics	140/235	59%	Soft Condensed Matter
167/307	54%	Operator Algebras	1898/3172	59%	Differential Geometry
136/248	54%	Rings and Algebras	114/190	60%	Group Theory
101/184	54%	Disordered Systems and Neural Networks	195/324	60%	Fluid Dynamics
282/512	55%	Pattern Formation and Solitons	216/358	60%	Functional Analysis
128/227	56%	Classical Analysis and ODEs	94/155	60%	Dynamical Systems
295/523	56%	Representation Theory	96/158	60%	Algebraic Topology
39/69	56%	Probability	500/820	60%	Atomic Physics
67/118	56%	Geophysics	180/295	61%	Geometric Topology
138/242	57%	Number Theory	206/337	61%	Symplectic Geometry
757/1327	57%	Strongly Correlated Electrons	41/67	61%	Symbolic Computation
4724/8190	57%	Quantum Algebra	103/168	61%	Classical Physics
2696/4666	57%	Exactly Solvable and Integrable Systems	54/88	61%	Complex Variables
766/1317	58%	Superconductivity	758/1229	61%	Mesoscopic Systems and Quantum Hall Effect
88/151	58%	Computational Physics	50/81	61%	Materials Science
1763/3011	58%	Algebraic Geometry	50/80	62%	Category Theory
8306/14161	58%	Mathematical Physics	76/120	63%	K-Theory and Homology
2072/3511	59%	Statistical Mechanics	59/93	63%	Instrumentation and Detectors
279/472	59%	Chaotic Dynamics	55/86	63%	Spectral Theory
58/98	59%	Analysis of PDEs	82/121	67%	Optics
99/167	59%	Plasma Physics

Performance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.² So let's look at how performance varies across the globe. Here's a ranking of correct guesses from countries with at least 2000 total guesses.³

1400/2256	62%	Austria	2393/4189	57%	Japan
10665/17467	61%	Germany	89632/158059	56%	United States
3111/5183	60%	Israel	12137/21471	56%	United Kingdom
1658/2825	58%	Spain	7233/13053	55%	Canada
4134/7080	58%	Italy	2281/4183	54%	India
5804/9960	58%	France	1652/3037	54%	Finland
2355/4071	57%	Switzerland	2270/4266	53%	Russian Federation
3485/6083	57%	Australia	3496/6593	53%	Netherlands
1690/2958	57%	Sweden	1061/2001	53%	Argentina

It looks like having English as a first language is not particularly helpful.

Performance by School

Finally, universities account for about 1/8th of the total number of guesses on arXiv vs. snarXiv. Altogether, their performance is almost exactly average (59%). However, there are variations... Here's a ranking of schools with at least 400 total guesses.

1145/1388	82%	University of Colorado at Boulder	560/963	58%	The University of Chicago
553/785	70%	University of Regensburg	278/481	57%	UC Santa Barbara
317/481	65%	University of Washington	264/461	57%	Madison
718/1097	65%	Penn State	542/967	56%	University of Cambridge
1001/1538	65%	Berkeley	237/426	55%	Cornell University
549/849	64%	Princeton University	476/861	55%	UC Davis
1277/1981	64%	MIT	471/855	55%	Columbia University
444/691	64%	Imperial College London	859/1577	54%	California Institute of Technology
349/544	64%	Monash University	393/723	54%	Harvard University
376/597	62%	University of Illinois at Urbana-Champaign	363/671	54%	Stanford University
287/457	62%	Hebrew University of Jerusalem	219/423	51%	Yale University
284/461	61%	The University of Edinburgh	308/599	51%	University of Minnesota
261/435	60%	Boston University	281/551	50%	University of Warwick

Congratulations to the University of Colorado at Boulder, which is the clear winner here.⁴ Also, I just wanted to say: seriously Harvard? Seriously?

Disclaimer

Finally, before heading into the comments, let me do a crapload of disclaiming. This is obviously the least scientific survey of science ever conducted. The ranking of a paper as "fake-sounding" or "realistic-sounding" has as much to do with the peculiarities of the snarXiv as with the arXiv itself.⁵ Also, although 750,000 guesses is a lot in total — such that I'm fairly certain that the 59% overall average isn't going anywhere — the statistics get dicey when chopped into small bits (see what I did there?). To be sure of anything, I guess we'll just have to wait until the blogosphere writes more papers.
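
To put a rough number on "dicey": if each guess is treated as an independent coin flip with a fixed chance of being right (an assumption, and a generous one), a common estimate of the uncertainty on a fraction p measured from n guesses is the binomial standard error, sqrt(p(1-p)/n). The helper below is a hypothetical sketch in Python, not something the game reports.

```python
import math


def fraction_with_error(correct, total):
    """Return (p, standard_error) for `correct` right answers out of `total`.

    Uses the simple binomial estimate sqrt(p * (1 - p) / total), which is
    itself unreliable when `total` is very small or p is near 0 or 1.
    """
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)
```

By this estimate, a score of 6/8 is 75% give or take about 15 percentage points, while an overall 59% from 750,000 guesses is pinned down to well under a tenth of a percentage point.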

¹ In addition to being a top-notch physicist, Frederik also happens to be one of the world's best arXiv vs. snarXiv players.
² When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn't logging IP addresses, he looked at me very seriously and said: "You're not logging IP addresses? You should always log IP addresses."
³ The high-scores leader "Ed" is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, "Ed" was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I've removed all his guesses from this analysis.
⁴ Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
⁵ I already got slammed for this on marginalrevolution.com, and I'll surely get slammed again.

Comments

Erkcan, Mar 8 2011 04:21:35
Hi, Excellent study. One thing as an experimentalist: Why don't you add some statistical error to the individual scores on arxiv vs. snarxiv? While I was playing it, I felt my score was slowly converging on to a particular value and thus I felt it would be nice to see the uncertainty. I am assuming that my skill is indeed measurable. The best estimator would indeed be n_correct/n_total. At the very least we can assume binomial errors on this quantity. So when the game reports 3/4=75%, it could instead say 75±21%. Also given that you have the "stump the experts" section, you must be keeping data that allows extracting more detailed statistics of the players? Would you mind publishing histograms of how many guesses per game are played? Distribution of the results? (i.e. do the scores distribute like a Gaussian? what is the rms? etc.) Anyway, cool stuff. I am upset that I heard about the whole thing so late... e.
mgary, Apr 10 2012 10:01:45
You might want to check the dates on the submissions from Boulder, I suspect you'll find that they occurred during TASI 2010. There was a little bit of a contest going on to see who could get the longest streak of consecutive correct guesses.
jelmer, Apr 23 2012 15:07:15
The first 10 attempts I answered wrong, but after a while you start to see a pattern and improve. Maybe you can only show the score after every N answers, and not show right/wrong to prevent learning the pattern of fake paper names.
Andrew, Feb 6 2013 15:18:02
14 of 20. At least for the ones I received, it seemed the real article titles tended to be more declarative of experimental results and specific quantities than theoretical discussion. I also leaned more towards simpler titles with less jargon or jargon-y words (not that I'd have any chance of distinguishing what are and aren't real terms). While the sampling isn't random according to the stats page, it makes intuitive sense to me that the fake submissions would strongly tend towards long, jargon-filled titles. Not that physics papers are known for simple, concise titles, but I'd also think that the real titles would be at least slightly biased towards simplicity and concision. This line of thought makes me wonder if some familiarity with the subject matter (but below actual expertise) might actually hurt someone's performance as they might focus more on the titles' subject matter rather than factors that might be better indicative of if an individual title is real or not. It's probably just cognitive bias, but I think I might be able to at least keep up with the actual physicists (or at least beat that damn monkey) as a layperson over a larger number of trials.
BonnieB, Jul 2 2013 18:47:33
This is hilarious! After my first 30 or so tries I was a "Physics Grad" but soon dropped off to dumber than a monkey :-) ... I wanted to see if there was a pattern and then just picked randomly. Great fun! I'm a retired science librarian with BA in mammalian zoology but loved physics once I was working in a sci/tech library. I am happy to see I still have some talent at picking out snark. Must be thanks to the astrophysicists I worked for at the end of my career :-)