
The snarXiv

The snarXiv is a random high-energy theory paper generator incorporating all the latest trends, entropic reasoning, and exciting moduli spaces. The arXiv is similar, but occasionally less random.[1]

Actually, the snarXiv only generates tantalizing titles and abstracts at the moment, while the arXiv delivers matching papers as well. Details of the implementation are below.[2] I’m the author, and I don’t remember exactly why I decided to do this. I did already have the framework lying around from a previous project, and I swear I spent more time doing research last weekend than implementing snarXiv.org.

Suggested Uses for the snarXiv[3]

Context-Free Grammars

The snarXiv is based on a context-free grammar (CFG), basically a set of rules for computer-generated mad libs.[5] Each rule in a CFG consists of a term and a set of choices for how to make that term. The choices can contain text, or other terms, or even refer recursively to the term being defined. The CFG syntax used on the snarXiv is a collection of statements “term ::= choices”, where choices is a list of possibilities separated by “|”. Some possibilities are just text, but the ones that look like “<newterm>” are directions to go find the definition for newterm and fill it in. For instance, the following grammar

nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous

can produce nounphrases like “apple,” “enormous smelly mailman,” or “super super smelly chartreuse mailman.” The snarXiv’s grammar is 622 lines long, and ends like this:

morecomments ::= <smallinteger> figures | JHEP style | Latex file
  | no figures | BibTeX | JHEP3 | typos corrected
  | <nzdigit> tables | added refs | minor changes
  | minor corrections | published in PRD
  | reference added | pdflatex
  | based on a talk given on <physicistname>'s <nzdigit>0th birthday
  | talk presented at the international <pluralphysconcept> workshop
comments ::= <smallinteger> pages | <comments>, <morecomments>

primarysubj ::= High Energy Physics - Theory (hep-th)|
                High Energy Physics - Phenomenology (hep-ph)
secondarysubj ::= Nuclear Theory (nucl-th)|
      Cosmology and Extragalactic Astrophysics (astro-ph.CO)|
      General Relativity and Quantum Cosmology (gr-qc)|
      Statistical Mechanics (cond-mat.stat-mech)
papersubjects ::= <primarysubj> | <papersubjects>; <secondarysubj>

paper ::= <title> \\ <authors> \\ <comments> \\ <papersubjects> \\ <abstract>
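The “term ::= choices” syntax is simple enough to parse in a few lines. As a rough sketch (this is not the snarXiv’s actual compiler script, whose source isn’t shown here), the OCaml below splits a single-line rule into its alternatives and separates literal text from <term> references. The names piece, parse_alternative, find_defn, and parse_rule are made up for illustration, and multi-line rules like the ones in the excerpt above are deliberately not handled.

```ocaml
(* A piece of an alternative: literal text, or a reference to another term. *)
type piece = Lit of string | Ref of string

(* Split one alternative into pieces. Text between '<' and '>' becomes a Ref;
   everything else, including spaces, is kept as literal text. *)
let parse_alternative (s : string) : piece list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let next c = String.index_from_opt s i c in
      match s.[i], next '>', next '<' with
      | '<', Some j, _ -> go (j + 1) (Ref (String.sub s (i + 1) (j - i - 1)) :: acc)
      | '<', None, _ -> List.rev (Lit (String.sub s i (n - i)) :: acc)
      | _, _, Some j -> go j (Lit (String.sub s i (j - i)) :: acc)
      | _, _, None -> List.rev (Lit (String.sub s i (n - i)) :: acc)
  in
  go 0 []

(* Position of the first ::= separator in a line, if there is one. *)
let find_defn (line : string) : int option =
  let n = String.length line in
  let rec go i =
    if i + 3 > n then None
    else if String.sub line i 3 = "::=" then Some i
    else go (i + 1)
  in
  go 0

(* Parse a single-line rule into (term, alternatives). Returns None when the
   line contains no ::= separator. *)
let parse_rule (line : string) : (string * piece list list) option =
  match find_defn line with
  | None -> None
  | Some k ->
      let term = String.trim (String.sub line 0 k) in
      let body = String.sub line (k + 3) (String.length line - k - 3) in
      let alts =
        String.split_on_char '|' body
        |> List.map String.trim
        |> List.map parse_alternative
      in
      Some (term, alts)

let () =
  match parse_rule "nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>" with
  | Some (term, alts) ->
      Printf.printf "%s has %d alternatives\n" term (List.length alts)
  | None -> print_endline "not a rule"
```

Keeping the spaces inside the literal pieces means an alternative like “<spacetype>s” comes out as a reference followed directly by the letter s, with no gap in between.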

The coolest and most natural thing to do with a CFG is to exploit recursion as much as possible. The more recursion built in, the richer and less predictable the output. For instance, the following definition of a “space” uses three rules (space, singspace, and pluralspace) that refer recursively to each other in many different ways, allowing for a huge number of possibilities.

space ::= <pluralspace> | <singspace> | <mathspace>

singspace ::= a <spacetype> | a <spaceadj> <spacetype>
   | <properspacename> | <spaceadj> <properspacename>
   | <mathspace> | <mathspace>
   | a <bundletype> bundle over <space>
   | <singspace> fibered over <singspace>
   | the moduli space of <pluralspace>
   | a <spacetype> <spaceproperty>
   | the <spacepart> of <space>
   | a <group> <groupaction> of <singspace>
   | the near horizon geometry of <singspace>
pluralspace ::= <spacetype>s | <spaceadj> <spacetype>s
   | <n> copies of <mathspace>
   | <pluralspace> fibered over <space>
   | <spacetype>s <spaceproperty>
   | <bundletype> bundles over <space>
   | moduli spaces of <pluralspace>
   | <group> <groupaction>s of <pluralspace>

Of course, there’s also a danger that in a very small number of cases the output might be a little pathological. The nounphrase example above, for instance, can produce any phrase of the form “super super … super enormous pear.” Similarly, the snarXiv occasionally mentions QFTs living on “the moduli space of moduli spaces of moduli spaces of moduli spaces of moduli spaces of SU(3) bundles over elliptically fibered Enriques surfaces.” Too much recursion can also quickly lead to exponentially long abstracts, which are even harder to read all the way through than the usual ones on the arXiv.
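To put a rough number on “a very small number of cases,” here is a back-of-the-envelope estimate for the toy nounphrase grammar. It assumes each of the three alternatives is picked with equal probability, so the recursive “super” alternative is chosen one time in three; the real snarXiv grammar has many more rules, and its odds are not worked out here.

```latex
% Toy grammar only: the alternative "super <nounphrase>" has probability 1/3.
P(k \text{ supers}) = \frac{2}{3}\left(\frac{1}{3}\right)^{k},
\qquad
\mathbb{E}[k] = \sum_{k=0}^{\infty} k\,\frac{2}{3}\left(\frac{1}{3}\right)^{k} = \frac{1}{2},
\qquad
P(k \ge 5) = \left(\frac{1}{3}\right)^{5} = \frac{1}{243} \approx 0.4\%.

% More generally, if one expansion of a term spawns on average m further
% expansions of the same term, the expected number of expansions N is
\mathbb{E}[N] = \frac{1}{1 - m} \quad (m < 1),
\qquad
\mathbb{E}[N] = \infty \quad (m \ge 1).
```

For the toy grammar m = 1/3, giving 3/2 expansions of nounphrase on average, which matches the average of half a “super” per phrase. Once m reaches 1, the expected length is no longer finite, which is one way to see how a heavily recursive rule can produce runaway output.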

The Guts

To get some actual output from the grammar definition, the most straightforward thing would be to write a script that reads in the grammar and works its way down the tree, starting with the top term and filling in definitions recursively until it gets a block of text. Instead of using an external script, the snarXiv compiles each grammar into its own program, a technique that originated from a freshman CS project and evolved minimally from there. It’s less straightforward and not clearly better, but maybe a bit more fun. A Perl script compiles the grammar file into OCaml code (snarxiv.ml):

(* A phrase is either literal text or a set of options; each option is a
   sequence of phrases to print one after another. *)
type phrase = Str of string | Opts of phrase array array

let _ = Random.self_init ()

(* Pick a uniformly random element of an array. *)
let randelt a = a.(Random.int (Array.length a))

(* Print a phrase, choosing one option at random wherever there is a choice. *)
let rec print phr = match phr with
  Str  s       -> print_string s
| Opts options ->
    let parts = randelt options in
    Array.iter print parts

(* Grammar definitions *)
let rec top = Opts [|
  [| paper;|];
|]

(* ... remaining definitions omitted from this excerpt ... *)

and comments = Opts [|
  [| smallinteger; Str " pages";|];
  [| comments; Str ", "; morecomments;|];
|]

and primarysubj = Opts [|
  [| Str "High Energy Physics - Theory (hep-th)";|];
  [| Str "High Energy Physics - Phenomenology (hep-ph)";|];
|]

and secondarysubj = Opts [|
  [| Str "Nuclear Theory (nucl-th)";|];
  [| Str "Cosmology and Extragalactic Astrophysics (astro-ph.CO)";|];
  [| Str "General Relativity and Quantum Cosmology (gr-qc)";|];
  [| Str "Statistical Mechanics (cond-mat.stat-mech)";|];
|]

and papersubjects = Opts [|
  [| primarysubj;|];
  [| papersubjects; Str "; "; secondarysubj;|];
|]

and paper = Opts [|
  [| title; Str " \\\\ "; authors; Str " \\\\ "; comments; Str " \\\\ "; papersubjects; Str " \\\\ "; abstract; Str " ";|];
|]

let _ = print top
let _ = print_string "\n"

And snarxiv.ml is now a specialized program that, when compiled and run, spits out a paper title and abstract. This setup is more elaborate than necessary, but OCaml is a lovely language for recursive structures, and the code is nice and simple. OCaml is also fast, allowing the snarXiv to generate papers even more swiftly than your favorite Python script, or Ed Witten in the ’80s.
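For comparison, here is a minimal sketch of the more straightforward interpreter approach: keep the grammar as a lookup table and walk it at run time, instead of compiling it into its own program. This is not the snarXiv’s code. The table holds the toy nounphrase grammar from earlier, and the names piece, grammar, expand, expand_piece, and the pick parameter are all hypothetical. Passing the chooser in as pick is just a convenience so the random source can be swapped for a fixed one when testing.

```ocaml
(* A piece of an alternative: literal text, or a reference to another term. *)
type piece = Lit of string | Ref of string

(* The toy grammar as a table: each term maps to an array of alternatives. *)
let grammar : (string * piece list array) list = [
  ("nounphrase", [|
     [Ref "noun"];
     [Ref "adj"; Lit " "; Ref "adj"; Lit " "; Ref "noun"];
     [Lit "super "; Ref "nounphrase"] |]);
  ("noun", [| [Lit "apple"]; [Lit "pear"]; [Lit "mailman"] |]);
  ("adj", [| [Lit "smelly"]; [Lit "chartreuse"]; [Lit "enormous"] |]);
]

(* Expand a term into text. The pick function is given the number of
   alternatives n and must return an index from 0 to n - 1.
   Raises Not_found if the term has no definition in the table. *)
let rec expand (pick : int -> int) (term : string) : string =
  let alts = List.assoc term grammar in
  let chosen = alts.(pick (Array.length alts)) in
  String.concat "" (List.map (expand_piece pick) chosen)

(* Literal text is emitted as is; a reference is expanded recursively. *)
and expand_piece pick = function
  | Lit s -> s
  | Ref t -> expand pick t

let () =
  Random.self_init ();
  print_endline (expand Random.int "nounphrase")
```

With Random.int as the chooser this prints a random nounphrase on each run. A fixed chooser such as (fun _ -> 0) always takes the first alternative and yields “apple”.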

Other CFGs

A few years ago, the CFG-based CS paper generator SCIgen made a splash by getting one of their papers accepted to the conference SCI 2005. Their website has details, and links to some other random generators around the web.

  1. For those who aren’t high energy physicists, and are still interested (though I can’t imagine who that would be), the “X” in arXiv or snarXiv is supposed to be a Greek chi. We’re meant to pronounce them like archive (as in “archive of physics papers”) and snarchive (as in “snarky archive of physics papers”).
  2. Please don’t sue me, arXiv.org, for stealing your CSS file and your beautiful color scheme. Also, Werner Heisenberg, if you’re still alive, please don’t sue me or my computer for libel.
  3. If someone pretentious is annoying you, and you use the theorem generator instead, you could try something like this.
  4. And check out the results. Also, pick up the unofficial arXiv vs. snarXiv wallpaper.
  5. I first encountered these in freshman year of college in an assignment for CS51: Abstraction and Design in Computer Programming. We had to implement a CFG in LISP, and the cleverest won its author lunch at the faculty club. The eventual winner was my friend Matt Gline’s theorem generator, which has since been enhanced with LaTeX, commutative diagrams, Ajax, and stuff like that.

39 Responses to “The snarXiv”

  1. yanzhang says: Mar 25, 2010 @ 9:22 pm

    I like the fact you update at roughly the same frequency I do, yet with such a better design mentality. Added to feed.


  2. katiey says: Jun 03, 2010 @ 2:13 am

    arXiv vs. snarXiv just sucked me in for a good 10 minutes. The game and the site are hilarious.

  3. tklose says: Jun 03, 2010 @ 2:34 pm

    Hi, arXiv vs. snarXiv is pretty neat! Is there any plan to publish the scores of the real arXiv papers? Which ones are the most crackpotty (I’ve seen ones with 4/4 people think this is from snarXiv) and which ones are pretty clear? Which of the snarXiv titles manage to fool most?

  4. davidsd says: Jun 03, 2010 @ 3:06 pm


    Thanks! I am indeed keeping track of that data, and I’m hoping to figure out a good way to present it. Currently, arXiv vs. snarXiv displays random arXiv papers 70% of the time, and displays papers with multiple wrong guesses the other 30% of the time. There are over 130,000 hep-th/ph papers on the arXiv, so this seemed like a good tradeoff between eventually going through all the papers, and getting large amounts of data on particularly fake-looking papers. Whatever I do, though, I won’t be able to make a definitive list of the most crackpotty papers anytime soon, since many papers won’t get cycled through in the near future. For now, I may settle for a list of “some crackpotty-sounding papers.” There are already a few big names on it…

  5. linh says: Jun 03, 2010 @ 9:15 pm

    arXiv vs. snarXiv is an awesome way to keep oneself occupied during meetings :)

  6. Uncle Al says: Jun 04, 2010 @ 5:26 pm

    Where is the list of scores? I want to know if I am objectively stupid or a full-paid NSF Graduate Fellowship diversity matriculant. Two can play at this game,

    Do opposite shoes violate the Equivalence Principle?

  7. davidsd says: Jun 05, 2010 @ 3:45 am

    Alright, I implemented a high scores list. Currently the average score for everyone is about 61%.

  8. Chris says: Jun 05, 2010 @ 5:06 am

    The heuristic of choosing the paper with the longer title results in the right answer about 63% of the time.

  9. tklose says: Jun 05, 2010 @ 7:50 am

    Wow, so there are already over 200000 questions played! Luminosity is ramping up pretty hard! Statistics on all 130000+ papers are coming in :D

  10. Pingback: arXiv vs. snarXiv « Strange Quark In London Jun 06, 2010 @ 6:09 am

    […] from the arXiv and which is generated by a random title generator (more about which is explained here). I can score about 66% in the long run (without using Google) how much do you […]

  11. finnw says: Jun 06, 2010 @ 7:48 am

    There are some other weaknesses, as someone pointed out on Facebook when I posted a link. Articles with dollar signs in the title are usually real. Titles mentioning “instantons”, “PDF” or “QFT” are fake about 75% of the time. Maybe you could bias the selection towards/away from certain keywords to offset this.

  12. Pingback: Anonymous Jun 06, 2010 @ 10:04 am

    […] […]

  13. Pingback: Physical Sciences & Engineering Blog » Blog Archive » snarXiv.org – Summer Fun Jun 07, 2010 @ 2:06 pm

    […] student in theoretical high-energy physics at Harvard University, created snarXiv.org, which he describes as “a random high-energy theory paper generator incorporating all the latest trends, entropic […]

  14. Pingback: A couple of links « Hyper tiling Jun 08, 2010 @ 6:34 am

    […] recently mentioned, I think it is only fair (pace Sokal) to suggest to check out the snarXiv, a spoof version of the well-known archive of scientific preprints (the arXiv). If you have a moment, play […]

  15. Akhil Mathew says: Jun 10, 2010 @ 3:25 pm

    This is hilarious!

    I knew I was bad at physics, but apparently, I’m worse than a monkey…

  16. city of the lits says: Jun 11, 2010 @ 10:06 am

    This reminds me of the Postmodernism Generator created by Andrew Bulhak in 1996. The similarities are quite striking. The PG operates by creating random journal articles that spoof the kind of papers that are regularly published in lit crit and philosophy journals with a postmodernist bent (quite the majority, in those fields :) Here’s a Wikipedia link for those interested, http://en.wikipedia.org/wiki/Postmodernism_Generator. As someone who spent a number of years studying philosophy at Sarah Lawrence College in the 80’s (probably the peak of PM insanity, at a school absolutely saturated by it :), I really got a tremendous illicit thrill from the PG. It reminded me of quite a few papers I read (and wrote!) at that time in my life. Ah, physics…(sigh of great relief)

  17. city of the lits says: Jun 11, 2010 @ 10:30 am

    “A few years ago, the CFG-based CS paper generator SCIgen made a splash by getting one of their papers accepted to the conference SCI 2005. Their website has details, and links to some other random generators around the web.”

    I see that Sokal (statistical thermodynamics, NYU) is briefly mentioned in a Pingback, but in case someone here isn’t familiar with the so-called “Sokal affair”, I’d like to mention Sokal’s paper entitled “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity”, which he submitted to the journal “Social Text” (they publish in the field of postmodern cultural studies) in 1996. The journal accepted this nonsensical paper, a piece so well-filled with postmodernist jargon and impenetrable arguments that anyone with even a fleeting familiarity with Foucault or Derrida will howl with laughter after a cursory glance at it. Highly recommended!

  18. iris\ says: Jun 12, 2010 @ 10:14 pm

    @ akhil
    referring to the high school biology of evolution and punning your worse-than-a-monkey stand on physics,
    Your-physics belongs in the monkey-verse
    (hyphens alluding to how your ancestor gave you the biomechanical legacy of apes)
    Have there been any such generators of academical biology…?

  19. John Baez says: Jun 19, 2010 @ 7:16 pm

    Currently the fake abstracts are easy to recognize as such, but maybe you could fix that by having a game where people tried to spot them, and keeping the ones that people were unable to spot as fake. Survival of the fittest!

  20. Pingback: Randomness and reality « The Lumber Room Jun 19, 2010 @ 11:00 pm

    […] “this is too weird to be generated by the grammar”, and failed.) You can read his About page for details. (“Suggested Uses for the snarXiv: [..] If you’re a graduate student, […]

  21. tklose says: Jun 21, 2010 @ 3:53 pm

    Hi, has any one article already won the crackpot award? John Baez’ idea for finding the best fake articles seems neat, too!

  22. davidsd says: Jun 21, 2010 @ 4:45 pm

    I’m hoping to do a full analysis of crackpottiness on the arXiv sometime soon. I’ll write a post here when I do. Update 9/17/10: Here it is! For now, the preliminary winner of the crackpot award is Secrets of the Metric In N=4 and N=2* Geometries [hep-th/0105235] with 0/10 correct guesses. Maybe a close second is Near-horizon brane-scan revived [arXiv:0804.3675] with 5/34 correct guesses.

    I like John’s idea too — it might be fun to implement. I also think there’s a lot of room just for optimizing the snarXiv to sound more like the arXiv. Currently, its vocabulary and sentence structures are limited to whatever popped into my head while I was writing the grammar. There’s no sense in which the adjectives that appear, for instance, represent a complete list of common adjectives in arXiv titles. And the abstracts grammar definitely doesn’t have enough flexibility to mimic everything that people tend to write. My sense is that many who figured out how to beat the snarXiv simply learned what it could spit out and what it couldn’t, and that was basically enough to distinguish it from the arXiv every time.

    Maybe I’ll find the time at some point to make improvements to the snarXiv grammar myself, but I was kind of hoping that some nerdy people might want to help out (I’m supposed to be doing research, you see :) ). The grammar is freely available online, and this post has instructions on how to set up the actual generator from the grammar.

  23. Trackback: Travels in a Mathematical World Jul 02, 2010 @ 5:14 am

    Carnival of Mathematics #67…

    In arXiv vs. snarXiv, the game is to say which of two article titles is from the real arXiv, a “highly-automated electronic archive and distribution server for research articles”, and which is from the spoof snarXiv, a “random high-energy the…

  24. David Gerard says: Jul 11, 2010 @ 9:31 am

    Dude. Can we recruit you to RationalWiki? I added snarXiv to http://rationalwiki.org/wiki/arXiv , and Philip Gibbs has shown up to defend viXra …

  25. George says: Jul 27, 2010 @ 8:36 am

    Hi! snarXiv is so much fun. Thanks! I hope you implement some of those ideas mentioned. By the way, it is heretical in China
    Perhaps a misunderstanding :)

  26. davidsd says: Jul 27, 2010 @ 4:47 pm

    @George, That is awesome… Strong evidence that the snarXiv is an important tool in the search for truth.

  27. Pingback: arXiv versus snarXiv « Statisfaction Dec 16, 2010 @ 11:29 am

    […] pretty much the same, but the titles and the abstracts on snarXiv are generated randomly. To quote their “about” page, the arXiv is similar, but occasionally less random. Now, even if you’re not into high energy […]

  28. Pingback: Test Your Ignorance of Physics « The Folly of Human Conceits Aug 24, 2011 @ 2:44 pm

    […] snarXiv, according to its creator, “only generates tantalizing titles and abstracts at the […]

  29. Pingback: Portal de la retórica posmoderna y cientificista | Carlos Reynoso Dec 22, 2011 @ 1:34 pm

    […] The snarXiv – Publishes lists of scientific articles with their respective abstracts. At first it may […]

  30. Mitchell Porter says: Apr 24, 2012 @ 3:19 am

    I started a “snarxiv blog” in imitation of the “arxiv blog” (though right now it’s more of a “vixra blog”). I must have visited this site of yours in the past, but your identity didn’t register. However, recently there was this cool paper about conformal blocks, and now on a revisit to the homepage of the snarxiv’s creator, I see that’s *your* paper.

    Just to complete the circle, I was in touch with the author of the Postmodernism Server when it was being created – in its original version, you’ll sometimes see works by “Porter” in the bibliography; that’s me. It’s a small world. (In fact, I went to high school with one of the pioneers of small-world theory…)

  31. davidsd says: Apr 24, 2012 @ 7:37 am

    I had no idea there was a snarXiv blog, but now I am delighted that it exists! Also, I’m glad you enjoyed my conformal blocks paper.

  32. Daniel says: Aug 29, 2012 @ 3:14 am

    The high score list is senseless because it implies that the player should be able to discern sense from nonsense, when in fact a lot of real papers are not much better than randomly generated gibberish. Just think about the Bogdanov Affair.

  33. Felipe j. Llanes-Estrada says: Oct 26, 2012 @ 3:47 pm

    I find this hilarious and had great fun,
    but please be careful with a list of unrecognizably titled papers,
    you might inadvertently hurt someone. I have noticed that several ill-titled papers
    come from the far east where English skills vary in quality.
    Showing up in a prominent position in that list might hurt the career prospects of
    an otherwise valid colleague.

  34. Pingback: vetenskapliga textgeneratorer och stil | Vetenskaplig.se Oct 09, 2013 @ 9:54 am

    […] SnarXiv […]

  35. Pingback: The snarXiv vs arXiv « Um Passeio Aleatório Feb 24, 2014 @ 5:41 pm

    […] And so I discover an evolution of the primitive Gerador de Lero Lero: the snarXiv. This service, created in 2010, aims to randomly generate scientific papers (more precisely only abstracts and titles, for now) in high-energy physics through an algorithm that takes into account the latest trends in the field. In the words of the creator himself (http://davidsd.org/2010/03/the-snarxiv/): […]

  36. Pingback: Scienza in grammelot – Ocasapiens - Blog - Repubblica.it Feb 25, 2014 @ 10:53 am

    […] (2) Private message: Forget about SnarXiv, Cyril, physics needs it for April Fools’ jokes… […]

  37. Pingback: Diverses: 10 Jahre Rosetta, gefakte Artikel und lustige Titel, Wissenschaftsvideos und überschnelle Einschlagskrater – Astrodicticum Simplex Mar 02, 2014 @ 6:21 am

    […] try out. Most of you probably know the literature database arXiv. But do you also know snarXiv? This program generates completely fake, yet often astonishingly real-sounding titles of […]

  38. Pingback: Ode to SnarXiv | In the Dark Apr 30, 2014 @ 3:55 am

    […] So many things pass me by these days that I’m not usually surprised when I have no idea what people around me are talking about. I am however quite surprised that, until yesterday, I had never heard of the snarXiv. As its author explains: […]

  39. kevin says: Nov 16, 2014 @ 6:44 pm

    I noticed that you have the Compton effect up on display on the top banner. I thought it to myself so I’m not really sure what to do with that knowledge except say yeahhhhh something fishy is going on
