
The snarXiv

The snarXiv is a ran­dom high-energy the­ory paper gen­er­a­tor incor­po­rat­ing all the lat­est trends, entropic rea­son­ing, and excit­ing mod­uli spaces. The arXiv is sim­i­lar, but occa­sion­ally less ran­dom.[1]

Actu­ally, the snarXiv only gen­er­ates tan­ta­liz­ing titles and abstracts at the moment, while the arXiv deliv­ers match­ing papers as well. Details of the imple­men­ta­tion are below.[2] I’m the author, and I don’t remem­ber exactly why I decided to do this. I did already have the frame­work lying around from a pre­vi­ous project, and I swear I spent more time doing research last week­end than imple­ment­ing snarXiv.org.

Sug­gested Uses for the snarXiv[3]

Context-Free Gram­mars

The snarXiv is based on a context-free grammar (CFG), basically a set of rules for computer-generated mad libs.[5] Each rule in a CFG consists of a term and a set of choices for how to make that term. The choices can contain text, or other terms, or even refer recursively to the term being defined. The CFG syntax used on the snarXiv is a collection of statements “term ::= choices”, where choices is a list of possibilities separated by “|”. Some possibilities are just text, but the ones that look like “<newterm>” are directions to go find the definition for newterm and fill it in. For instance, the following grammar

nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous

can produce nounphrases like “apple,” “enormous smelly mailman,” or “super super smelly chartreuse mailman.” The snarXiv’s grammar is 622 lines long, and ends like this:

morecomments ::= <smallinteger> figures | JHEP style | Latex file
  | no figures | BibTeX | JHEP3 | typos corrected
  | <nzdigit> tables | added refs | minor changes
  | minor corrections | published in PRD
  | reference added | pdflatex
  | based on a talk given on <physicistname>'s <nzdigit>0th birthday
  | talk presented at the international <pluralphysconcept> workshop
comments ::= <smallinteger> pages | <comments>, <morecomments>

primarysubj ::= High Energy Physics - Theory (hep-th)|
                High Energy Physics - Phenomenology (hep-ph)|
secondarysubj ::= Nuclear Theory (nucl-th)|
      Cosmology and Extragalactic Astrophysics (astro-ph.CO)|
      General Relativity and Quantum Cosmology (gr-qc)|
      Statistical Mechanics (cond-mat.stat-mech)
papersubjects ::= <primarysubj> | <papersubjects>; <secondarysubj>

paper ::= <title> \\ <authors> \\ <comments> \\ <papersubjects> \\ <abstract>
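
To make the rule syntax concrete, here is a rough OCaml sketch of reading one “term ::= choices” line into a term name and its list of choices. It is not the snarXiv’s actual compiler; the names symbol, symbols_of_choice, and parse_rule are made up for illustration, and the sketch assumes that a rule fits on a single line and that literal text never contains “<” or “|”.

```ocaml
(* A piece of a choice: literal text, or a reference to another term. *)
type symbol = Lit of string | Ref of string

(* Split one choice such as "super <nounphrase>" into symbols.
   Assumption: '<' only ever opens a term reference. *)
let symbols_of_choice s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = '<' then begin
      match String.index_from_opt s i '>' with
      | Some j ->
        (* Text between '<' and '>' is the referenced term's name. *)
        go (j + 1) (Ref (String.sub s (i + 1) (j - i - 1)) :: acc)
      | None ->
        (* Unclosed '<': keep the remainder as literal text. *)
        List.rev (Lit (String.sub s i (n - i)) :: acc)
    end else begin
      (* Literal text runs up to the next '<' or the end of the choice. *)
      let j =
        match String.index_from_opt s i '<' with
        | Some j -> j
        | None -> n
      in
      go j (Lit (String.sub s i (j - i)) :: acc)
    end
  in
  go 0 []

(* Parse a single-line rule "term ::= choice | choice | ...".
   Returns None if the line contains no "::=". *)
let parse_rule line =
  let n = String.length line in
  let rec find i =
    if i + 3 > n then None
    else if String.sub line i 3 = "::=" then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> None
  | Some i ->
    let term = String.trim (String.sub line 0 i) in
    let rhs = String.sub line (i + 3) (n - i - 3) in
    let choices =
      String.split_on_char '|' rhs
      |> List.map String.trim
      |> List.map symbols_of_choice
    in
    Some (term, choices)

let () =
  match parse_rule "noun ::= apple | pear | mailman" with
  | Some (term, choices) ->
    Printf.printf "%s has %d choices\n" term (List.length choices)
  | None -> print_endline "not a rule"
```

Rules that continue over several lines, like morecomments above, would need the continuation lines joined before being passed to parse_rule.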

The coolest and most nat­ural thing to do with a CFG is exploit recur­sive­ness as much as pos­si­ble. The more recur­sion built in, the less pre­dictable and richer the out­put. For instance, the fol­low­ing def­i­n­i­tion of a “space” has three rules: space, singspace, plu­ral­space, which refer recur­sively to each other in many dif­fer­ent ways, allow­ing for a huge num­ber of possibilities.

space ::= <pluralspace> | <singspace> | <mathspace>

singspace ::= a <spacetype> | a <spaceadj> <spacetype>
   | <properspacename> | <spaceadj> <properspacename>
   | <mathspace> | <mathspace>
   | a <bundletype> bundle over <space>
   | <singspace> fibered over <singspace>
   | the moduli space of <pluralspace>
   | a <spacetype> <spaceproperty>
   | the <spacepart> of <space>
   | a <group> <groupaction> of <singspace>
   | the near horizon geometry of <singspace>
pluralspace ::= <spacetype>s | <spaceadj> <spacetype>s
   | <n> copies of <mathspace>
   | <pluralspace> fibered over <space>
   | <spacetype>s <spaceproperty>
   | <bundletype> bundles over <space>
   | moduli spaces of <pluralspace>
   | <group> <groupaction>s of <pluralspace>

Of course, there’s also a dan­ger that in a very small num­ber of cases the out­put might be a lit­tle patho­log­i­cal. The noun­phrase exam­ple above, for instance, can pro­duce any phrase of the form “super super … super enor­mous pear.” The snarXiv sim­i­larly occa­sion­ally men­tions QFTs liv­ing on “the mod­uli space of mod­uli spaces of mod­uli spaces of mod­uli spaces of mod­uli spaces of SU(3) bun­dles over ellip­ti­cally fibered Enriques sur­faces.” Too much recur­sion can also quickly lead to expo­nen­tially long abstracts, which are even harder to read all the way through than the usual ones on the arXiv.
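
To put a rough number on “a very small number of cases”, assume each choice in a rule is picked uniformly at random, as the randelt helper in the generated code later in this post does. The nounphrase rule then recurses with probability 1/3 each time it is expanded, so the number K of leading “super”s is geometrically distributed:

```latex
P(K = k) = \left(\tfrac{1}{3}\right)^{k} \cdot \tfrac{2}{3}, \qquad
\mathbb{E}[K] = \frac{1/3}{1 - 1/3} = \tfrac{1}{2}, \qquad
P(K \ge 5) = \left(\tfrac{1}{3}\right)^{5} = \tfrac{1}{243} \approx 0.4\%.
```

More generally, treating expansions as a branching process: if expanding a term produces on average m further references to itself, the expected number of expansions is 1/(1 - m) when m < 1 and infinite when m ≥ 1. For nounphrase, m = 1/3 gives 3/2 expansions on average, consistent with the one mandatory expansion plus E[K] = 1/2. Choices with two recursive references, such as “<singspace> fibered over <singspace>”, push m up quickly, which is where the very long outputs come from.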

The Guts

To get some actual output from the grammar definition, the most straightforward thing would be to write a script that reads in the grammar and works its way down the tree, starting with the top term and filling in definitions recursively until it gets a block of text. Instead of using an external script, the snarXiv compiles each grammar into its own program, a technique that originated from a freshman CS project and evolved minimally from there. It’s less straightforward and not clearly better, but maybe a bit more fun. A Perl script compiles the grammar file into OCaml code (snarxiv.ml):

type phrase = Str of string | Opts of phrase array array

let _ = Random.self_init ()

let randelt a = a.(Random.int (Array.length a))
let rec print phr = match phr with
  Str  s       -> print_string s
| Opts options ->
    let parts = randelt options in
    Array.iter print parts

(* Grammar definitions *)
let rec top = Opts [|
  [| paper;|];
|]

(* ... other definitions elided ... *)

and comments = Opts [|
  [| smallinteger; Str " pages";|];
  [| comments; Str ", "; morecomments;|];
|]

and primarysubj = Opts [|
  [| Str "High Energy Physics - Theory (hep-th)";|];
  [| Str "High Energy Physics - Phenomenology (hep-ph)";|];
|]

and secondarysubj = Opts [|
  [| Str "Nuclear Theory (nucl-th)";|];
  [| Str "Cosmology and Extragalactic Astrophysics (astro-ph.CO)";|];
  [| Str "General Relativity and Quantum Cosmology (gr-qc)";|];
  [| Str "Statistical Mechanics (cond-mat.stat-mech)";|];
|]

and papersubjects = Opts [|
  [| primarysubj;|];
  [| papersubjects; Str "; "; secondarysubj;|];
|]

and paper = Opts [|
  [| title; Str " \\\\ "; authors; Str " \\\\ "; comments; Str " \\\\ "; papersubjects; Str " \\\\ "; abstract; Str " ";|];
|]

let _ = print top
let _ = print_string "\n"

And snarxiv.ml is now a specialized program that, when compiled and run, spits out a paper title and abstract. This setup is more elaborate than necessary, but OCaml is a lovely language for recursive structures, and the code is nice and simple. OCaml is also fast, allowing the snarXiv to generate papers even more swiftly than your favorite Python script, or Ed Witten in the ’80s.
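
As a smaller, self-contained illustration of the same encoding, here is a minimal sketch of how the toy nounphrase grammar from earlier might look after compilation. It is written by hand rather than produced by the Perl script, and render is a hypothetical variant of print that returns the text instead of writing it to stdout, which makes the result easier to inspect.

```ocaml
(* Same representation as snarxiv.ml: a phrase is either literal text or
   a set of alternatives, each alternative being a sequence of phrases. *)
type phrase = Str of string | Opts of phrase array array

(* Pick a uniformly random element of a non-empty array. *)
let randelt a = a.(Random.int (Array.length a))

(* Hypothetical variant of [print]: expand a phrase into a string by
   choosing one alternative at random wherever there is a choice. *)
let render phr =
  let buf = Buffer.create 64 in
  let rec go = function
    | Str s -> Buffer.add_string buf s
    | Opts options -> Array.iter go (randelt options)
  in
  go phr;
  Buffer.contents buf

(* nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
   noun       ::= apple | pear | mailman
   adj        ::= smelly | chartreuse | enormous *)
let rec nounphrase = Opts [|
  [| noun |];
  [| adj; Str " "; adj; Str " "; noun |];
  [| Str "super "; nounphrase |];
|]

and noun = Opts [|
  [| Str "apple" |];
  [| Str "pear" |];
  [| Str "mailman" |];
|]

and adj = Opts [|
  [| Str "smelly" |];
  [| Str "chartreuse" |];
  [| Str "enormous" |];
|]

let () =
  Random.self_init ();
  print_endline (render nounphrase)
```

The mutually recursive let rec ... and ... definition is what lets nounphrase refer to itself and to terms defined after it, just as top, comments, and paper do in the listing above.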

Other CFGs

A few years ago, the CFG-based CS paper gen­er­a­tor SCI­gen made a splash by get­ting one of their papers accepted to the con­fer­ence SCI 2005. Their web­site has details, and links to some other ran­dom gen­er­a­tors around the web.

  1. For those who aren’t high energy physicists, and are still interested (though I can’t imagine who that would be), the “X” in arXiv or snarXiv is supposed to be a Greek chi. We’re meant to pronounce them like archive (as in “archive of physics papers”) and snarchive (as in “snarky archive of physics papers”).
  2. Please don’t sue me, arXiv.org, for stealing your CSS file and your beautiful color scheme. Also, Werner Heisenberg, if you’re still alive, please don’t sue me or my computer for libel.
  3. If someone pretentious is annoying you, and you use the theorem generator instead, you could try something like this.
  4. And check out the results. Also, pick up the unofficial arXiv vs. snarXiv wallpaper.
  5. I first encountered these in freshman year of college in an assignment for CS51: Abstraction and Design in Computer Programming. We had to implement a CFG in LISP, and the cleverest won its author lunch at the faculty club. The eventual winner was my friend Matt Gline’s theorem generator, which has since been enhanced with LaTeX, commutative diagrams, Ajax, and stuff like that.

37 Responses to “The snarXiv”

  1. yanzhang says: Mar 25, 2010 @ 9:22 pm

    I like the fact you update at roughly the same fre­quency I do, yet with such a bet­ter design men­tal­ity. Added to feed.


  2. katiey says: Jun 03, 2010 @ 2:13 am

    arXiv vs. snarXiv just sucked me in for a good 10 min­utes. The game and the site are hilarious.

  3. tklose says: Jun 03, 2010 @ 2:34 pm

    Hi, arXiv vs. snarXiv is pretty neat! Is there any plan to pub­lish the scores of the real arXiv papers? Which ones are the most crack­potty (I’ve seen ones with 4/4 peo­ple think this is from snarXiv) and which ones are pretty clear? Which of the snarXiv titles man­age to fool most?

  4. davidsd says: Jun 03, 2010 @ 3:06 pm


    Thanks! I am indeed keep­ing track of that data, and I’m hop­ing to fig­ure out a good way to present it. Cur­rently, arXiv vs. snarXiv dis­plays ran­dom arXiv papers 70% of the time, and dis­plays papers with mul­ti­ple wrong guesses the other 30% of the time. There are over 130,000 hep-th/ph papers on the arXiv, so this seemed like a good trade­off between even­tu­ally going through all the papers, and get­ting large amounts of data on par­tic­u­larly fake-looking papers. What­ever I do, though, I won’t be able to make a defin­i­tive list of the most crack­potty papers any­time soon, since many papers won’t get cycled through in the near future. For now, I may set­tle for a list of “some crackpotty-sounding papers.” There are already a few big names on it…

  5. linh says: Jun 03, 2010 @ 9:15 pm

    arXiv vs. snarXiv is an awe­some way to keep one­self occu­pied dur­ing meetings :)

  6. Uncle Al says: Jun 04, 2010 @ 5:26 pm

    Where is the list of scores? I want to know if I am objec­tively stu­pid or a full-paid NSF Grad­u­ate Fel­low­ship diver­sity matric­u­lant. Two can play at this game,

    Do oppo­site shoes vio­late the Equiv­a­lence Principle?

  7. davidsd says: Jun 05, 2010 @ 3:45 am

    Alright, I imple­mented a high scores list. Cur­rently the aver­age score for every­one is about 61%.

  8. Chris says: Jun 05, 2010 @ 5:06 am

    The heuristic of choosing the paper with the longer title results in the right answer about 63% of the time.

  9. tklose says: Jun 05, 2010 @ 7:50 am

    Wow, so there are already over 200000 ques­tions played! Lumi­nos­ity is ramp­ing up pretty hard! Sta­tis­tics on all 130000+ papers are com­ing in :D

  10. Pingback: arXiv vs. snarXiv « Strange Quark In London Jun 06, 2010 @ 6:09 am

    […] from the arXiv and which is generated by a random title generator (more about which is explained here). I can score about 66% in the long run (without using Google) how much do you […]

  11. finnw says: Jun 06, 2010 @ 7:48 am

    There are some other weaknesses, as someone pointed out on Facebook when I posted a link. Articles with dollar signs in the title are usually real. Titles mentioning “instantons”, “PDF” or “QFT” are fake about 75% of the time. Maybe you could bias the selection towards/away from certain keywords to offset this.

  12. Pingback: Anonymous Jun 06, 2010 @ 10:04 am

    […] […]

  13. Pingback: Physical Sciences & Engineering Blog » Blog Archive » snarXiv.org – Summer Fun Jun 07, 2010 @ 2:06 pm

    […] stu­dent in the­o­ret­i­cal high-energy physics at Har­vard Uni­ver­sity, cre­ated snarXiv.org, which he describes as “a ran­dom high-energy the­ory paper gen­er­a­tor incor­po­rat­ing all the lat­est trends, entropic […]

  14. Pingback: A couple of links « Hyper tiling Jun 08, 2010 @ 6:34 am

    […] recently men­tioned, I think it is only fair (pace Sokal) to sug­gest to check out the snarXiv, a spoof ver­sion of the well-known archive of sci­en­tific preprints (the arXiv). If you have a moment, play […]

  15. Akhil Mathew says: Jun 10, 2010 @ 3:25 pm

    This is hilarious!

    I knew I was bad at physics, but appar­ently, I’m worse than a monkey…

  16. city of the lits says: Jun 11, 2010 @ 10:06 am

    This reminds me of the Post­mod­ernism Gen­er­a­tor cre­ated by Andrew Bul­hak in 1996. The sim­i­lar­i­ties are quite strik­ing. The PG oper­ates by cre­at­ing ran­dom jour­nal arti­cles that spoof the kind of papers that are reg­u­larly pub­lished in lit crit and phi­los­o­phy jour­nals with a post­mod­ernist bent (quite the major­ity, in those fields :) Here’s a Wikipedia link for those inter­ested, http://en.wikipedia.org/wiki/Postmodernism_Generator. As some­one who spent a num­ber of years study­ing phi­los­o­phy at Sarah Lawrence Col­lege in the 80’s (prob­a­bly the peak of PM insan­ity, at a school absolutely sat­u­rated by it :), I really got a tremen­dous illicit thrill from the PG. It reminded me of quite a few papers I read (and wrote!) at that time in my life. Ah, physics…(sigh of great relief)

  17. city of the lits says: Jun 11, 2010 @ 10:30 am

    “A few years ago, the CFG-based CS paper gen­er­a­tor SCI­gen made a splash by get­ting one of their papers accepted to the con­fer­ence SCI 2005. Their web­site has details, and links to some other ran­dom gen­er­a­tors around the web.”

    I see that Sokal (statistical thermodynamics, NYU) is briefly mentioned in a Pingback, but in case someone here isn’t familiar with the so-called “Sokal affair”, I’d like to mention Sokal’s paper entitled “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity”, which he submitted to the journal “Social Text” (they publish in the field of postmodern cultural studies) in 1996. The journal accepted this nonsensical paper, a piece so well-filled with postmodernist jargon and impenetrable arguments that anyone with even a fleeting familiarity with Foucault or Derrida will howl with laughter at a cursory glance. Highly recommended!

  18. iris\ says: Jun 12, 2010 @ 10:14 pm

    @ akhil
    refer­ring to the high school biol­ogy of evo­lu­tion and pun­ning your worse-than-a-monkey stand on physics,
    Your-physics belongs in the monkey-verse
    (hyphens allud­ing to how your ances­tor gave you the bio­me­chan­i­cal legacy of apes)
    Have there been any such gen­er­a­tors of aca­d­e­m­i­cal biology…?

  19. John Baez says: Jun 19, 2010 @ 7:16 pm

    Cur­rently the fake abstracts are easy to rec­og­nize as such, but maybe you could fix that by hav­ing a game where peo­ple tried to spot them, and keep­ing the ones that peo­ple were unable to spot as fake. Sur­vival of the fittest!

  20. Pingback: Randomness and reality « The Lumber Room Jun 19, 2010 @ 11:00 pm

    […] “this is too weird to be gen­er­ated by the gram­mar” — and failed.) You can read his About page for details. (“Sug­gested Uses for the snarXiv: [..] If you’re a grad­u­ate student, […]

  21. tklose says: Jun 21, 2010 @ 3:53 pm

    Hi, has any one arti­cle already won the crack­pot award? John Baez’ idea for find­ing the best fake arti­cles seems neat, too!

  22. davidsd says: Jun 21, 2010 @ 4:45 pm

    I’m hop­ing to do a full analy­sis of crack­pot­ti­ness on the arXiv some­time soon. I’ll write a post here when I do. Update 9/17/10: Here it is! For now, the pre­lim­i­nary win­ner of the crack­pot award is Secrets of the Met­ric In N=4 and N=2* Geome­tries [hep-th/0105235] with 0/10 cor­rect guesses. Maybe a close sec­ond is Near-horizon brane-scan revived [arXiv:0804.3675] with 5/34 cor­rect guesses.

    I like John’s idea too — it might be fun to imple­ment. I also think there’s a lot of room just for opti­miz­ing the snarXiv to sound more like the arXiv. Cur­rently, its vocab­u­lary and sen­tence struc­tures are lim­ited to what­ever popped into my head while I was writ­ing the gram­mar. There’s no sense in which the adjec­tives that appear, for instance, rep­re­sent a com­plete list of com­mon adjec­tives in arXiv titles. And the abstracts gram­mar def­i­nitely doesn’t have enough flex­i­bil­ity to mimic every­thing that peo­ple tend to write. My sense is that many who fig­ured out how to beat the snarXiv sim­ply learned what it could spit out and what it couldn’t, and that was basi­cally enough to dis­tin­guish it from the arXiv every time.

    Maybe I’ll find the time at some point to make improve­ments to the snarXiv gram­mar myself, but I was kind of hop­ing that some nerdy peo­ple might want to help out (I’m sup­posed to be doing research, you see :) ). The gram­mar is freely avail­able online, and this post has instruc­tions on how to set up the actual gen­er­a­tor from the grammar.

  23. Trackback: Travels in a Mathematical World Jul 02, 2010 @ 5:14 am

    Car­ni­val of Math­e­mat­ics #67…

    In arXiv vs. snarXiv, the game is to say which of two article titles is from the real arXiv, a “highly-automated electronic archive and distribution server for research articles”, and which is from the spoof snarXiv, a “random high-energy the…

  24. David Gerard says: Jul 11, 2010 @ 9:31 am

    Dude. Can we recruit you to Ratio­nal­Wiki? I added snarXiv to http://rationalwiki.org/wiki/arXiv , and Philip Gibbs has shown up to defend viXra …

  25. George says: Jul 27, 2010 @ 8:36 am

    Hi! snarXiv is so much fun. Thanks! I hope you implement some of those ideas mentioned. By the way, it is heretical in China. Perhaps a misunderstanding :)

  26. davidsd says: Jul 27, 2010 @ 4:47 pm

    @George, That is awe­some… Strong evi­dence that the snarXiv is an impor­tant tool in the search for truth.

  27. Pingback: arXiv versus snarXiv « Statisfaction Dec 16, 2010 @ 11:29 am

    […] pretty much the same, but the titles and the abstracts on snarXiv are gen­er­ated ran­domly. To quote their “about” page, the arXiv is sim­i­lar, but occa­sion­ally less ran­dom. Now, even if you’re not into high energy […]

  28. Pingback: Test Your Ignorance of Physics « The Folly of Human Conceits Aug 24, 2011 @ 2:44 pm

    […] snarXiv, accord­ing to its cre­ator, “only gen­er­ates tan­ta­liz­ing titles and abstracts at the […]

  29. Pingback: Portal de la retórica posmoderna y cientificista | Carlos Reynoso Dec 22, 2011 @ 1:34 pm

    […] The snarXiv – Publishes lists of scientific articles with their respective abstracts. At first it may […]

  30. Mitchell Porter says: Apr 24, 2012 @ 3:19 am

    I started a “snarxiv blog” in imi­ta­tion of the “arxiv blog” (though right now it’s more of a “vixra blog”). I must have vis­ited this site of yours in the past, but your iden­tity didn’t reg­is­ter. How­ever, recently there was this cool paper about con­for­mal blocks, and now on a revisit to the home­page of the snarxiv’s cre­ator, I see that’s *your* paper.

    Just to com­plete the cir­cle, I was in touch with the author of the Post­mod­ernism Server when it was being cre­ated — in its orig­i­nal ver­sion, you’ll some­times see works by “Porter” in the bib­li­og­ra­phy; that’s me. It’s a small world. (In fact, I went to high school with one of the pio­neers of small-world theory…)

  31. davidsd says: Apr 24, 2012 @ 7:37 am

    I had no idea there was a snarXiv blog, but now I am delighted that it exists! Also, I’m glad you enjoyed my con­for­mal blocks paper.

  32. Daniel says: Aug 29, 2012 @ 3:14 am

    The high score list is sense­less because it implies that the player should be able to dis­cern sense from non­sense, when in fact a lot of real papers are not much bet­ter than ran­domly gen­er­ated gib­ber­ish. Just think about the Bog­danov Affair.

  33. Felipe j. Llanes-Estrada says: Oct 26, 2012 @ 3:47 pm

    I find this hilarious and had great fun, but please be careful with a list of unrecognizably titled papers: you might inadvertently hurt someone. I have noticed that several ill-titled papers come from the Far East, where English skills vary in quality. Showing up in a prominent position in that list might hurt the career prospects of an otherwise valid colleague.

  34. Pingback: vetenskapliga textgeneratorer och stil | Vetenskaplig.se Oct 09, 2013 @ 9:54 am

    […] SnarXiv […]

  35. Pingback: The snarXiv vs arXiv « Um Passeio Aleatório Feb 24, 2014 @ 5:41 pm

    […] And so I discover an evolution of the primitive Gerador de Lero Lero: the snarXiv. This service, created in 2010, aims to randomly generate scientific papers (more precisely, only abstracts and titles, for now) in high-energy physics, using an algorithm that takes into account the latest trends in the field. In the words of the creator himself (http://davidsd.org/2010/03/the-snarxiv/): […]

  36. Pingback: Scienza in grammelot – Ocasapiens - Blog - Repubblica.it Feb 25, 2014 @ 10:53 am

    […] (2) Private message: Leave SnarXiv alone, Cyril, physics needs it for its April Fools’ jokes… […]

  37. Pingback: Diverses: 10 Jahre Rosetta, gefakte Artikel und lustige Titel, Wissenschaftsvideos und überschnelle Einschlagskrater – Astrodicticum Simplex Mar 02, 2014 @ 6:21 am

    […] try out. Most of you probably know the literature database arXiv. But do you also know snarXiv? This program generates completely fake, yet often surprisingly real-sounding titles of […]
