
The arXiv According to arXiv vs. snarXiv

After more than 3/4 of a mil­lion guesses, in over 50,000 games played in 67 coun­tries, the results are clear: Sci­ence sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it’s time to take a look at the results. Here’s how the game works. The user sees two titles: one is the title of an actual the­o­ret­i­cal high energy physics paper on the arXiv, and the other is a com­pletely fake title ran­domly gen­er­ated by the snarXiv. The user guesses which one is real, finds out if they’re right or wrong, and then starts over with a new pair of titles.

I’ve been record­ing the result of each guess, orig­i­nally just out of curios­ity. I never expected to get rea­son­able sta­tis­tics on the over 120,000 high energy the­ory papers on the arXiv. But after more than 750,000 guesses, that’s exactly what I’ve got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let’s take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the low­est per­cent­age of cor­rect guesses when users try to dis­tin­guish them from a ran­domly gen­er­ated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, gen­er­at­ing bet­ter sta­tis­tics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

guesses per­cent paper
36/138 26% “Highlights of the Theory”, B. Z. Kopeliovich and R. Peschanski (June 1998)
41/153 26% “Heterotic on Half-flat”, Sebastien Gurrieri, Andre Lukas and Andrei Micu (August 2004)
13/48 27% “Relativistic confinement of neutral fermions with a trigonometric tangent potential”, Luis B. Castro and Antonio S. de Castro (November 2006)
13/47 27% “Toric Kahler metrics and AdS_5 in ring-like co-ordinates”, Bobby S. Acharya, Suresh Govindarajan and Chethan N. Gowdigere (December 2006)
9/32 28% “Aspects of U_A(1) breaking in the Nambu and Jona-Lasinio model”, Alexander A. Osipov, Brigitte Hiller, Veronique Bernard and Alex H. Blin (July 2005)
35/116 30% “Energy’s and amplitudes’ positivity”, Alberto Nicolis, Riccardo Rattazzi and Enrico Trincherini (December 2009)
16/53 30% “A covariant diquark-quark model of the nucleon in the Salpeter approach”, Volker Keiner (March 1996)
51/167 30% “Noncommutative Bundles and Instantons in Tehran”, Giovanni Landi and Walter van Suijlekom (March 2006)
38/124 30% “Baby steps beyond rainbow-ladder”, Richard Williams and Christian S. Fischer (May 2009)
13/42 30% “Transverse force on a moving vortex with the acoustic geometry”, Peng-ming Zhang, Li-ming Cao, Yi-shi Duan and Cheng-kui Zhong (January 2005)
44/138 31% “Testing factorization”, Gudrun Hiller (November 2001)
19/59 32% “Prospects for Mirage Mediation”, Aaron Pierce and Jesse Thaler (April 2006)
49/152 32% “Determining the dual”, Arjan Keurentjes (July 2006)
11/34 32% “Gravitational Dressing of Renormalization Group”, I. R. Klebanov, I. I. Kogan and A. M. Polyakov (September 1993)
53/163 32% “Charging Black Saturn?”, Brenda Chng, Robert Mann, Eugen Radu and Cristian Stelea (September 2008)
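
For the curious, a ranking like this takes only a few lines to compute. Here is a minimal sketch in Python, assuming each guess is logged as a (title, was_correct) pair. That record layout and the function name `most_fake_sounding` are hypothetical, made up for illustration, and not the game's actual code.

```python
from collections import defaultdict

def most_fake_sounding(guesses, min_guesses=30, top=15):
    """Rank titles by fraction of correct guesses, lowest first.

    `guesses` is an iterable of (title, was_correct) pairs; this layout
    is an assumption made for illustration.
    """
    tally = defaultdict(lambda: [0, 0])  # title -> [correct, total]
    for title, was_correct in guesses:
        tally[title][0] += int(was_correct)
        tally[title][1] += 1
    # Keep only titles with enough guesses to be worth ranking.
    ranked = [
        (correct / total, correct, total, title)
        for title, (correct, total) in tally.items()
        if total >= min_guesses
    ]
    ranked.sort(key=lambda row: row[0])  # most fake-sounding first
    return ranked[:top]
```

The `min_guesses` cutoff plays the role of the "at least 30 guesses" rule: without it, a title that was guessed wrong once would top the list at 0%.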

A tip for future arXiv vs. snarXiv play­ers: the snarXiv is more gram­mat­i­cal than the arXiv. If you see “Het­erotic on Half-flat” and think “uh… Half-flat what?” then you can be nearly cer­tain it’s a real sci­en­tific paper that was writ­ten to advance the bound­aries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spec­tac­u­larly low per­cent­age of which were correct.

guesses per­cent paper
2/13 15% “Actions and Fermionic symmetries for D-branes in bosonic backgrounds”, Donald Marolf, Luca Martucci and Pedro J. Silva (June 2003)
3/15 20% “CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension”, Gi-Chol Cho, Etsuko Izumi and Akio Sugamoto (December 2001)
6/27 22% “Families as Neighbors in Extra Dimension”, G. Dvali and M. Shifman (January 2000)
6/25 24% “The Greening of Quantum Field Theory: George and I”, Julian Schwinger (October 1993)
3/12 25% “Dyonic solution of Horava-Lifshitz Gravity”, Eoin Ó Colgáin and Hossein Yavartanoo (April 2009)

Stump the Experts

Peo­ple with all sorts of back­grounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely sus­pi­cious of ridiculous-sounding words that have been co-opted for tech­ni­cal pur­poses — “‘Mirage Medi­a­tion?’ that can’t be real!” High energy physi­cists, on the other hand, are used to their own unfor­tu­nate ver­biage. Unfor­tu­nately for them, how­ever, there are still plenty of papers on the arXiv that sound like they were writ­ten by a computer.

Let’s define an “expert” game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

guesses per­cent paper
1/6 16% “Fields”, W. Siegel (December 1999)
1/5 20% “An extended model for monopole catalysis of nucleon decay”, Y. Brihaye, D. Yu. Grigoriev, V. A. Rubakov and D. H. Tchrakian (November 2002)
1/5 20% “Space-time symmetry restoration in cosmological models with Kalb-Ramond and scalar fields”, E. Di Grezia, G. Mangano and G. Miele (July 2004)
1/5 20% “Towards Evaluation of Stringy Non-Perturbative Effects”, R. Brustein and B. A. Ovrut (November 1995)
1/5 20% “Dimensional Reduction”, Corinne A. Manogue and Tevian Dray (July 1998)
1/5 20% “The Ridge, the Glasma and Flow”, Larry McLerran (December 2008)
1/5 20% “Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins”, S. V. Chekanov, W. Kittel and V. I. Kuvshinov (June 1996)
2/7 28% “BGWM as Second Constituent of Complex Matrix Model”, A. Alexandrov, A. Mironov and A. Morozov (June 2009)
3/10 30% “Testing factorization”, Gudrun Hiller (November 2001)
2/6 33% “Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology”, L. C. Garcia de Andrade (April 2001)
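
The “expert” cut is easy to state in code. Below is a rough sketch in Python, assuming a game is stored as a list of (title, was_correct) pairs. That layout and the names `is_expert` and `expert_tally` are hypothetical; only the thresholds (at least 5 guesses, 80% or higher) come from the definition above.

```python
from collections import defaultdict

def is_expert(game, min_guesses=5, min_score=0.8):
    """A game is a list of (title, was_correct) pairs (assumed layout)."""
    if len(game) < min_guesses:
        return False
    correct = sum(1 for _, ok in game if ok)
    return correct / len(game) >= min_score

def expert_tally(games):
    """Return {title: (correct, total)} counted over expert games only."""
    tally = defaultdict(lambda: [0, 0])
    for game in games:
        if not is_expert(game):
            continue  # skip short or low-scoring games entirely
        for title, ok in game:
            tally[title][0] += int(ok)
            tally[title][1] += 1
    return {title: tuple(counts) for title, counts in tally.items()}
```

Note that the filter is applied per game, not per player, so the same person can contribute both expert and non-expert games.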

I have to say, some of these def­i­nitely sound like they came straight from the snarXiv. I really don’t know what Glasma is, though. The snarXiv does not have Glasma.

Famous Physi­cists

Ok, so the papers above sound espe­cially ridicu­lous. How­ever, the aver­age across all 750,000 guesses on all papers is still only 59% cor­rect. While bet­ter than a mon­key, this is not par­tic­u­larly good. Who’s respon­si­ble? Surely not the world’s top minds?

Here’s a rank­ing of some of the most-highly cited physi­cists on the arXiv (H-index of 40 or higher, with a few other notable folks thrown in), accord­ing to the per­cent­age of cor­rect guesses on their papers.[1] A smaller per­cent­age means that their papers sound more like com­plete flap­doo­dle. I should note for the sake of my career that this has absolutely noth­ing to do with the qual­ity of said papers.

178/360 49% Frederik Denef | 246/429 57% Neil Turok | 289/488 59% Joseph Polchinski
129/253 50% A. M. Polyakov | 310/539 57% Lisa Randall | 573/963 59% Edward Witten
69/135 51% Steven Weinberg | 353/612 57% Dimitri Nanopoulos | 1194/1981 60% John Ellis
189/357 52% Howard Georgi | 209/362 57% Herman Verlinde | 368/610 60% Shamit Kachru
764/1419 53% Cumrun Vafa | 450/778 57% Gia Dvali | 315/520 60% Roman Jackiw
332/615 53% Leonard Susskind | 297/513 57% Nima Arkani-Hamed | 311/510 60% Hirosi Ooguri
6/11 54% H. David Politzer | 416/717 58% Nathan Seiberg | 592/966 61% Hitoshi Murayama
63/115 54% Gerard ’t Hooft | 94/162 58% Lawrence M. Krauss | 118/192 61% David J. Gross
282/514 54% Michael B. Green | 368/633 58% Thomas Banks | 381/606 62% Lawrence J. Hall
348/622 55% Frank Wilczek | 535/917 58% Igor R. Klebanov | 118/186 63% Stephen Hawking
164/293 55% Erik Verlinde | 412/705 58% Steven S. Gubser | 383/600 63% Aneesh V. Manohar
109/194 56% Sheldon Glashow | 356/603 59% Juan Maldacena | 170/257 66% Michael Peskin
406/718 56% Savas Dimopoulos | 481/813 59% Andrew Strominger | 222/326 68% John H. Schwarz
351/615 57% Mark B. Wise | 174/294 59% Brian R. Greene

I’d espe­cially like to con­grat­u­late Fred­erik on his anom­alously low 49 per­cent. You make us all worse than a mon­key, Fred­erik.[2]

The Blo­gos­phere

Now let’s turn to even more famous peo­ple: physics blog­gers (and authors). Here are some of the most promi­nent, ranked from most fake-sounding papers (small­est per­cent­age) to least fake-sounding papers (largest per­cent­age). I think there’s a les­son here some­where, though it’s hard to be sure in some cases, due to small statistics.

Fake-Sounding and Real-Sounding Words

Sup­pose you’re writ­ing a sci­en­tific paper, and you want to ensure that the gen­eral pub­lic doesn’t think it’s com­plete malarkey. How do you do it? Here are the 10 words with the low­est per­cent­age of cor­rect guesses (most fake-sounding) for titles con­tain­ing those words (to ensure no sin­gle paper dom­i­nates this per­cent­age, I’m requir­ing that each word appear in at least 5 titles).

174/521 33% Saturn | 76/195 38% multiskyrmions
66/196 33% half-flat | 100/252 39% secret
69/189 36% charging | 99/249 39% perturbing
54/147 36% caustic | 78/194 40% pollution
80/208 38% highlights | 87/214 40% enough

Avoid these words! Turns out people don’t believe in “multiskyrmions.” Also, you shouldn’t mention “Saturn,” or use normal English words like “secret” or “enough.” By contrast, here’s a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

76/100 76% cp-even | 87/122 71% spin-spin
75/101 74% Argon | 90/127 70% anomaly-free
140/191 73% two-particle | 127/180 70% atlas
74/102 72% self-coupling | 70/100 70% supersymmetry-breaking
84/117 71% unusual | 128/183 69% naked

In other words, if you want to be taken seri­ously as a sci­en­tist, you should call your next paper Unusual Naked, but Anomaly-Free.
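
The word rankings above pool guesses over every title that contains a given word. Here is one way that could be sketched in Python, assuming per-title counts are already available as a dict of title to (correct, total). The function name `word_scores` and the tokenization (lowercase, split on whitespace) are simplifying assumptions, not a description of how the tables were actually produced.

```python
from collections import defaultdict

def word_scores(title_stats, min_titles=5):
    """Aggregate guess counts over every title containing each word.

    `title_stats` maps title -> (correct, total); splitting on whitespace
    and lowercasing is a simplifying assumption about tokenization.
    """
    agg = defaultdict(lambda: [0, 0, 0])  # word -> [correct, total, titles]
    for title, (correct, total) in title_stats.items():
        for word in set(title.lower().split()):  # count each title once per word
            agg[word][0] += correct
            agg[word][1] += total
            agg[word][2] += 1
    scores = {
        word: (c, t, c / t)
        for word, (c, t, n) in agg.items()
        if n >= min_titles and t > 0
    }
    # Lowest fraction correct (most fake-sounding) first.
    return sorted(scores.items(), key=lambda item: item[1][2])
```

The `min_titles` cutoff is the "at least 5 titles" requirement: it keeps one heavily played paper from deciding a word's fate on its own.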

Inci­dence of Appar­ent Hooey in Var­i­ous Subfields

Papers on the arXiv can be asso­ci­ated with one or more physics sub­fields. Here’s a rank­ing of sub­fields with at least 50 guesses from most fake-sounding to least fake-sounding.

39/81 48% Adaptation and Self-Organizing Systems | 134/226 59% Combinatorics
48/98 48% Popular Physics | 638/1075 59% Other
172/340 50% Data Analysis, Statistics and Probability | 116/195 59% Accelerator Physics
212/414 51% History of Physics | 140/235 59% Soft Condensed Matter
167/307 54% Operator Algebras | 1898/3172 59% Differential Geometry
136/248 54% Rings and Algebras | 114/190 60% Group Theory
101/184 54% Disordered Systems and Neural Networks | 195/324 60% Fluid Dynamics
282/512 55% Pattern Formation and Solitons | 216/358 60% Functional Analysis
128/227 56% Classical Analysis and ODEs | 94/155 60% Dynamical Systems
295/523 56% Representation Theory | 96/158 60% Algebraic Topology
39/69 56% Probability | 500/820 60% Atomic Physics
67/118 56% Geophysics | 180/295 61% Geometric Topology
138/242 57% Number Theory | 206/337 61% Symplectic Geometry
757/1327 57% Strongly Correlated Electrons | 41/67 61% Symbolic Computation
4724/8190 57% Quantum Algebra | 103/168 61% Classical Physics
2696/4666 57% Exactly Solvable and Integrable Systems | 54/88 61% Complex Variables
766/1317 58% Superconductivity | 758/1229 61% Mesoscopic Systems and Quantum Hall Effect
88/151 58% Computational Physics | 50/81 61% Materials Science
1763/3011 58% Algebraic Geometry | 50/80 62% Category Theory
8306/14161 58% Mathematical Physics | 76/120 63% K-Theory and Homology
2072/3511 59% Statistical Mechanics | 59/93 63% Instrumentation and Detectors
279/472 59% Chaotic Dynamics | 55/86 63% Spectral Theory
58/98 59% Analysis of PDEs | 82/121 67% Optics
99/167 59% Plasma Physics

Per­for­mance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.[3] So let’s look at how performance varies across the globe. Here’s a ranking of correct guesses from countries with at least 2000 total guesses.[4]

1400/2256 62% Austria | 2393/4189 57% Japan
10665/17467 61% Germany | 89632/158059 56% United States
3111/5183 60% Israel | 12137/21471 56% United Kingdom
1658/2825 58% Spain | 7233/13053 55% Canada
4134/7080 58% Italy | 2281/4183 54% India
5804/9960 58% France | 1652/3037 54% Finland
2355/4071 57% Switzerland | 2270/4266 53% Russian Federation
3485/6083 57% Australia | 3496/6593 53% Netherlands
1690/2958 57% Sweden | 1061/2001 53% Argentina

It looks like hav­ing Eng­lish as a first lan­guage is not par­tic­u­larly helpful.

Per­for­mance by School

Finally, uni­ver­si­ties account for about 1/8th of the total num­ber of guesses on arXiv vs. snarXiv. Alto­gether, their per­for­mance is almost exactly aver­age (59%). How­ever, there are vari­a­tions… Here’s a rank­ing of schools with at least 400 total guesses.

1145/1388 82% University of Colorado at Boulder | 560/963 58% The University of Chicago
553/785 70% University of Regensburg | 278/481 57% UC Santa Barbara
317/481 65% University of Washington | 264/461 57% Madison
718/1097 65% Penn State | 542/967 56% University of Cambridge
1001/1538 65% Berkeley | 237/426 55% Cornell University
549/849 64% Princeton University | 476/861 55% UC Davis
1277/1981 64% MIT | 471/855 55% Columbia University
444/691 64% Imperial College London | 859/1577 54% California Institute of Technology
349/544 64% Monash University | 393/723 54% Harvard University
376/597 62% University of Illinois at Urbana-Champaign | 363/671 54% Stanford University
287/457 62% Hebrew University of Jerusalem | 219/423 51% Yale University
284/461 61% The University of Edinburgh | 308/599 51% University of Minnesota
261/435 60% Boston University | 281/551 50% University of Warwick

Con­grat­u­la­tions to the Uni­ver­sity of Col­orado at Boul­der, which is the clear win­ner here.[5] Also, I just wanted to say: seri­ously Har­vard? Seriously?


Finally, before head­ing into the com­ments, let me do a crapload of dis­claim­ing. This is obvi­ously the least sci­en­tific sur­vey of sci­ence ever con­ducted. The rank­ing of a paper as “fake-sounding” or “realistic-sounding” has as much to do with the pecu­liar­i­ties of the snarXiv as with the arXiv itself.[6] Also, although 750,000 guesses is a lot in total — such that I’m fairly cer­tain that the 59% over­all aver­age isn’t going any­where — the sta­tis­tics get dicey when chopped into small bits (see what I did there?). To be sure of any­thing, I guess we’ll just have to wait until the blo­gos­phere writes more papers.
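
To put a rough number on “dicey,” one option is the binomial standard error, sqrt(p(1-p)/n), on each fraction of correct guesses. The sketch below is in Python and makes a loud assumption: it treats every guess as an independent coin flip with a fixed probability of being right, which is certainly not true of real players. The function name `binomial_error` is made up for illustration, and the three sample counts are taken from the tables above (“Fields,” Edward Witten, and Mathematical Physics).

```python
import math

def binomial_error(correct, total):
    """Return (fraction, standard error) for `correct` out of `total` guesses.

    Assumes independent guesses with a fixed success probability; the
    estimate is crude when `total` is small or the fraction is near 0 or 1.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)

# Sample counts copied from the tables in this post.
for correct, total in [(1, 6), (573, 963), (8306, 14161)]:
    p, err = binomial_error(correct, total)
    print(f"{correct}/{total}: {100 * p:.1f}% +/- {100 * err:.1f}%")
```

The error shrinks like one over the square root of the number of guesses, which is why the overall average is solid while a 1/6 in the expert table is mostly noise.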

  1. I couldn’t find an h-index ranking of physicists that was more recent than this one from 2005. I’m probably missing lots of names. Let me know and I’ll add them.
  2. In addition to being a top-notch physicist, Frederik also happens to be one of the world’s best arXiv vs. snarXiv players.
  3. When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn’t logging IP addresses, he looked at me very seriously and said: “You’re not logging IP addresses? You should always log IP addresses.”
  4. The high-scores leader “Ed” is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, “Ed” was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I’ve removed all his guesses from this analysis.
  5. Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
  6. I already got slammed for this on marginalrevolution.com, and I’ll surely get slammed again.

6 Responses to “The arXiv According to arXiv vs. snarXiv”

  2. Erkcan says: Mar 08, 2011 @ 4:21 am


    Excellent study. One thing as an experimentalist: Why don’t you add some statistical error to the individual scores on arxiv vs. snarxiv? While I was playing it, I felt my score was slowly converging on to a particular value and thus I felt it would be nice to see the uncertainty. I am assuming that my skill is indeed measurable. The best estimator would indeed be n_correct/n_total. At the very least we can assume binomial errors on this quantity. So when the game reports 3/4=75%, it could instead say 75±21%.

    Also given that you have the “stump the experts” sec­tion, you must be keep­ing data that allows extract­ing more detailed sta­tis­tics of the play­ers? Would you mind pub­lish­ing his­tograms of how many guesses per game are played? Dis­tri­b­u­tion of the results? (ie. do the scores dis­trib­ute like a Gauss­ian? what is the rms? etc.)

    Any­way, cool stuff. I am upset that I heard about the whole thing so late…


  3. mgary says: Apr 10, 2012 @ 10:01 am

    You might want to check the dates on the sub­mis­sions from Boul­der, I sus­pect you’ll find that they occurred dur­ing TASI 2010. There was a lit­tle bit of a con­test going on to see who could get the longest streak of con­sec­u­tive cor­rect guesses.

  4. jelmer says: Apr 23, 2012 @ 3:07 pm

    The first 10 attempts I answered wrong, but after a while you start to see a pat­tern and improve. Maybe you can only show the score after every N answers, and not show right/wrong to pre­vent learn­ing the pat­tern of fake paper names.

  5. Andrew says: Feb 06, 2013 @ 3:18 pm

    14 of 20

    At least for the ones I received, it seemed the real arti­cle titles tended to be more declar­a­tive of exper­i­men­tal results and spe­cific quan­ti­ties than the­o­ret­i­cal dis­cus­sion. I also leaned more towards sim­pler titles with less jar­gon or jargon-y words (not that I’d have any chance of dis­tin­guish­ing what are and aren’t real terms). While the sam­pling isn’t ran­dom accord­ing to the stats page, it makes intu­itive sense to me that the fake sub­mis­sions would strongly tend towards long, jargon-filled titles. Not that physics papers are known for sim­ple, con­cise titles, but I’d also think that the real titles would be at least slightly biased towards sim­plic­ity and concision.

    This line of thought makes me won­der if some famil­iar­ity with the sub­ject mat­ter (but below actual exper­tise) might actu­ally hurt someone’s per­for­mance as they might focus more on the titles’ sub­ject mat­ter rather than fac­tors that might be bet­ter indica­tive of if an indi­vid­ual title is real or not.

    It’s prob­a­bly just cog­ni­tive bias, but I think I might be able to at least keep up with the actual physi­cists (or at least beat that damn mon­key) as a layper­son over a larger num­ber of trials.

  6. BonnieB says: Jul 02, 2013 @ 6:47 pm

    This is hilar­i­ous! After my first 30 or so tries I was a “Physics Grad” but soon dropped off to dumber than a mon­key :-) … I wanted to see if there was a pat­tern and then just picked ran­domly. Great fun! I’m a retired sci­ence librar­ian with BA in mam­malian zool­ogy but loved physics once I was work­ing in a sci/tech library. I am happy to see I still have some tal­ent at pick­ing out snark. Must be thanks to the astro­physi­cists I worked for at the end of my career :-)
