The snarXiv

The snarXiv is a random high-energy theory paper generator incorporating all the latest trends, entropic reasoning, and exciting moduli spaces. The arXiv is similar, but occasionally less random.1

Actually, the snarXiv only generates tantalizing titles and abstracts at the moment, while the arXiv delivers matching papers as well. Details of the implementation are below.2 I'm the author, and I don't remember exactly why I decided to do this. I did already have the framework lying around from a previous project, and I swear I spent more time doing research last weekend than implementing snarXiv.org.

Suggested Uses for the snarXiv3

Context-Free Grammars

The snarXiv is based on a context-free grammar (CFG), basically a set of rules for computer-generated mad libs.5 Each rule in a CFG consists of a term and a set of choices for how to make that term. The choices can contain text, or other terms, or even refer recursively to the term being defined. The CFG syntax used on the snarXiv is a collection of statements "term ::= choices", where choices is a list of possibilities separated by "|". Some possibilities are just text, but the ones that look like "<newterm>" are directions to go find the definition for newterm and fill it in. For instance, the following grammar

nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous

can produce nounphrases like "apple," "enormous smelly mailman," or "super super smelly chartreuse mailman." The snarXiv's grammar is 622 lines long, and ends like this:

...
morecomments ::= <smallinteger> figures | JHEP style | Latex file
               | no figures | BibTeX | JHEP3 | typos corrected
               | <nzdigit> tables | added refs | minor changes
               | minor corrections | published in PRD
               | reference added | pdflatex
               | based on a talk given on <physicistname>'s <nzdigit>0th birthday
               | talk presented at the international <pluralphysconcept> workshop
comments ::= <smallinteger> pages | <comments>, <morecomments>

primarysubj ::= High Energy Physics - Theory (hep-th)|
                High Energy Physics - Phenomenology (hep-ph)
secondarysubj ::= Nuclear Theory (nucl-th)|
                  Cosmology and Extragalactic Astrophysics (astro-ph.CO)|
                  General Relativity and Quantum Cosmology (gr-qc)|
                  Statistical Mechanics (cond-mat.stat-mech)
papersubjects ::= <primarysubj> | <papersubjects>; <secondarysubj>

paper ::= <title> \\ <authors> \\ <comments> \\ <papersubjects> \\ <abstract>
...
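To make the toy nounphrase grammar above concrete, here is a minimal sketch of the "fill in definitions recursively" idea in OCaml. It is not the snarXiv's own code: the names symbol, grammar, expand, and generate are made up for illustration, and the pick parameter (a function that chooses an index among n options) is an assumption added so the random choice can be swapped out.

```ocaml
(* Hypothetical sketch: symbol, grammar, expand and generate are illustrative
   names and do not come from the snarXiv source. *)

(* A piece of a choice is either literal text or a reference to another term. *)
type symbol = Text of string | Term of string

(* The toy grammar from the text: each term maps to its list of choices. *)
let grammar = [
  ("nounphrase", [ [Term "noun"];
                   [Term "adj"; Term "adj"; Term "noun"];
                   [Text "super"; Term "nounphrase"] ]);
  ("noun", [ [Text "apple"]; [Text "pear"]; [Text "mailman"] ]);
  ("adj", [ [Text "smelly"]; [Text "chartreuse"]; [Text "enormous"] ]);
]

(* Expand a term into a list of words. [pick n] must return an index in
   [0, n); passing Random.int gives a uniformly random choice. *)
let rec expand pick term =
  let choices = List.assoc term grammar in
  let choice = List.nth choices (pick (List.length choices)) in
  List.concat (List.map (expand_symbol pick) choice)

(* Literal text is kept as is; a term reference is expanded recursively. *)
and expand_symbol pick = function
  | Text s -> [s]
  | Term t -> expand pick t

(* Join the expanded words with single spaces. *)
let generate pick term = String.concat " " (expand pick term)

let () =
  Random.self_init ();
  print_endline (generate Random.int "nounphrase")
```

Passing a fixed chooser instead of Random.int makes the walk deterministic: always taking the first choice yields "apple", and always taking the second yields "chartreuse chartreuse pear".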

The coolest and most natural thing to do with a CFG is to exploit recursion as much as possible. The more recursion built in, the richer and less predictable the output. For instance, the following definition of a "space" has three rules (space, singspace, and pluralspace) that refer recursively to each other in many different ways, allowing for a huge number of possibilities.

space ::= <pluralspace> | <singspace> | <mathspace>

singspace ::= a <spacetype> | a <spaceadj> <spacetype>
            | <properspacename> | <spaceadj> <properspacename>
            | <mathspace> | <mathspace>
            | a <bundletype> bundle over <space>
            | <singspace> fibered over <singspace>
            | the moduli space of <pluralspace>
            | a <spacetype> <spaceproperty>
            | the <spacepart> of <space>
            | a <group> <groupaction> of <singspace>
            | the near horizon geometry of <singspace>
pluralspace ::= <spacetype>s | <spaceadj> <spacetype>s
              | <n> copies of <mathspace>
              | <pluralspace> fibered over <space>
              | <spacetype>s <spaceproperty>
              | <bundletype> bundles over <space>
              | moduli spaces of <pluralspace>
              | <group> <groupaction>s of <pluralspace>


Of course, there's also a danger that in a very small number of cases the output might be a little pathological. The nounphrase example above, for instance, can produce any phrase of the form "super super ... super enormous pear." The snarXiv similarly occasionally mentions QFTs living on "the moduli space of moduli spaces of moduli spaces of moduli spaces of moduli spaces of SU(3) bundles over elliptically fibered Enriques surfaces." Too much recursion can also quickly lead to exponentially long abstracts, which are even harder to read all the way through than the usual ones on the arXiv.
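One common way to keep runaway recursion in check is a depth budget. The sketch below is only an illustration of that idea and is not something the snarXiv is known to do. It reuses the phrase type from snarxiv.ml, and it assumes a convention that the first alternative of every rule is non-recursive. The names render_capped and max_depth, and the pick parameter, are hypothetical.

```ocaml
(* Hypothetical sketch: render_capped, max_depth and pick are illustrative
   names, not part of the snarXiv source. *)
type phrase = Str of string | Opts of phrase array array

(* The toy nounphrase grammar, written in the same style as snarxiv.ml.
   Assumed convention: the first alternative of every rule is non-recursive. *)
let noun = Opts [| [| Str "apple" |]; [| Str "pear" |]; [| Str "mailman" |] |]
let adj =
  Opts [| [| Str "smelly" |]; [| Str "chartreuse" |]; [| Str "enormous" |] |]
let rec nounphrase = Opts [|
  [| noun |];
  [| adj; Str " "; adj; Str " "; noun |];
  [| Str "super "; nounphrase |];
|]

(* Render a phrase to a string. [pick n] returns an index in [0, n). Once the
   depth budget is spent, always take alternative 0, which bounds the
   recursion as long as alternative 0 never leads back to its own rule. *)
let render_capped pick max_depth phr =
  let buf = Buffer.create 64 in
  let rec go depth = function
    | Str s -> Buffer.add_string buf s
    | Opts options ->
        let i =
          if depth >= max_depth then 0 else pick (Array.length options) in
        Array.iter (go (depth + 1)) options.(i)
  in
  go 0 phr;
  Buffer.contents buf

let () =
  Random.self_init ();
  print_endline (render_capped Random.int 5 nounphrase)
```

With a chooser that always takes the last (recursive) alternative and a budget of 3, the output is "super super super apple" rather than an endless run of supers. The price is that deep phrases become a bit more predictable, since everything past the budget collapses to the first alternative.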

The Guts

To get some actual output from the grammar definition, the most straightforward thing would be to write a script that reads in the grammar and works its way down the tree, starting with the top term, filling in definitions recursively until it gets a block of text. Instead of using an external script, the snarXiv compiles each grammar into its own program, a technique that originated from a freshman CS project and evolved minimally from there. It's less straightforward and not clearly better, but maybe a bit more fun. A Perl script compiles the grammar file into OCaml code (snarxiv.ml):

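(* A phrase is literal text, or a list of alternatives, each a sequence of phrases. *)
type phrase = Str of string | Opts of phrase array array

(* Seed the random number generator. *)
let _ = Random.self_init ()

(* Pick a uniformly random element of an array. *)
let randelt a = a.(Random.int (Array.length a))

(* Print a phrase, choosing one alternative at random at each Opts node. *)
let rec print phr = match phr with
    Str s -> print_string s
  | Opts options ->
      let parts = randelt options in
      Array.iter print parts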

(* Grammar definitions *)
let rec top = Opts [|
[| paper;|];
|]

...

and comments = Opts [|
[| smallinteger; Str " pages";|];
[| comments; Str ", "; morecomments;|];
|]

and primarysubj = Opts [|
[| Str "High Energy Physics - Theory (hep-th)";|];
[| Str "High Energy Physics - Phenomenology (hep-ph)";|];
|]

and secondarysubj = Opts [|
[| Str "Nuclear Theory (nucl-th)";|];
[| Str "Cosmology and Extragalactic Astrophysics (astro-ph.CO)";|];
[| Str "General Relativity and Quantum Cosmology (gr-qc)";|];
[| Str "Statistical Mechanics (cond-mat.stat-mech)";|];
|]

and papersubjects = Opts [|
[| primarysubj;|];
[| papersubjects; Str "; "; secondarysubj;|];
|]

and paper = Opts [|
[| title; Str " \\\\ "; authors; Str " \\\\ "; comments; Str " \\\\ "; papersubjects; Str " \\\\ "; abstract; Str " ";|];
|]

let _ = print top
let _ = print_string "\n"

And snarxiv.ml is now a specialized program that, when compiled and run, spits out a paper title and abstract. This setup is more elaborate than necessary, but OCaml is a lovely language for recursive structures, and the code is nice and simple. OCaml is also fast, allowing the snarXiv to generate papers even more swiftly than your favorite Python script, or Ed Witten in the 80's.
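The compile step itself is not shown on this page, so the following is only a guess at its shape, written in OCaml rather than Perl. It takes one parsed rule and emits a definition in the same textual form as the listing above. The symbol type and the functions emit_symbol, emit_choice, and emit_rule are hypothetical names, and the sketch ignores details such as the first definition starting with "let rec" instead of "and".

```ocaml
(* Hypothetical sketch of the grammar-to-OCaml step. The real compiler is a
   Perl script that is not shown here; symbol, emit_symbol, emit_choice and
   emit_rule are made-up names. *)
type symbol = Text of string | Term of string

(* Render one symbol as an OCaml expression: literal text becomes Str "...",
   a term reference becomes the name of the corresponding OCaml value.
   %S prints a string as an OCaml literal, quotes and escapes included. *)
let emit_symbol = function
  | Text s -> Printf.sprintf "Str %S" s
  | Term t -> t

(* Render one choice as an inner array literal such as [| a; Str " b";|]; *)
let emit_choice symbols =
  "[| " ^ String.concat "; " (List.map emit_symbol symbols) ^ ";|];"

(* Render a whole rule as one "and name = Opts [| ... |]" definition. *)
let emit_rule name choices =
  String.concat "\n"
    ([Printf.sprintf "and %s = Opts [|" name]
     @ List.map emit_choice choices
     @ ["|]"])

let () =
  print_endline
    (emit_rule "comments"
       [ [Term "smallinteger"; Text " pages"];
         [Term "comments"; Text ", "; Term "morecomments"] ])
```

Run as is, this prints a comments definition in the same shape as the one in snarxiv.ml above. A real compiler would also need to parse the "term ::= choices" syntax and decide how spaces between symbols turn into Str pieces, which this sketch leaves to the caller.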

Other CFGs

A few years ago, the CFG-based CS paper generator SCIgen made a splash by getting one of their papers accepted to the conference SCI 2005. Their website has details, and links to some other random generators around the web.

  1. For those who aren't high energy physicists, and are still interested (though I can't imagine who that would be), the "X" in arXiv or snarXiv is supposed to be a Greek chi. We're meant to pronounce them like archive (as in "archive of physics papers") and snarchive (as in "snarky archive of physics papers").
  2. Please don't sue me, arXiv.org, for stealing your CSS file and your beautiful color scheme. Also, Werner Heisenberg, if you're still alive, please don't sue me or my computer for libel.
  3. If someone pretentious is annoying you, and you use the theorem generator instead, you could try something like this.
  4. And check out the results. Also, pick up the unofficial arXiv vs. snarXiv wallpaper.
  5. I first encountered these in freshman year of college in an assignment for CS51: Abstraction and Design in Computer Programming. We had to implement a CFG in LISP, and the cleverest won its author lunch at the faculty club. The eventual winner was my friend Matt Gline's theorem generator, which has since been enhanced with LaTeX, commutative diagrams, ajax, and stuff like that.

Comments

  1. yanzhang, Mar 25 2010 21:22:51
    I like the fact you update at roughly the same frequency I do, yet with such a better design mentality. Added to feed. -Y
  2. katiey, Jun 3 2010 02:13:35
    arXiv vs. snarXiv just sucked me in for a good 10 minutes. The game and the site are hilarious.
  3. tklose, Jun 3 2010 14:34:57
    Hi, arXiv vs. snarXiv is pretty neat! Is there any plan to publish the scores of the real arXiv papers? Which ones are the most crackpotty (I've seen ones with 4/4 people think this is from snarXiv) and which ones are pretty clear? Which of the snarXiv titles manage to fool most?
  4. davidsd, Jun 3 2010 15:06:34
    tklose, Thanks! I am indeed keeping track of that data, and I'm hoping to figure out a good way to present it. Currently, arXiv vs. snarXiv displays random arXiv papers 70% of the time, and displays papers with multiple wrong guesses the other 30% of the time. There are over 130,000 hep-th/ph papers on the arXiv, so this seemed like a good tradeoff between eventually going through all the papers, and getting large amounts of data on particularly fake-looking papers. Whatever I do, though, I won't be able to make a definitive list of the most crackpotty papers anytime soon, since many papers won't get cycled through in the near future. For now, I may settle for a list of "some crackpotty-sounding papers." There are already a few big names on it...
  5. linh, Jun 3 2010 21:15:39
    arXiv vs. snarXiv is an awesome way to keep oneself occupied during meetings :)
  6. Uncle Al, Jun 4 2010 17:26:07
    Where is the list of scores? I want to know if I am objectively stupid or a full-paid NSF Graduate Fellowship diversity matriculant. Two can play at this game, http://www.mazepath.com/uncleal/erotor1.jpg Do opposite shoes violate the Equivalence Principle?
  7. davidsd, Jun 5 2010 03:45:40
    Alright, I implemented a high scores list. Currently the average score for everyone is about 61%.
  8. Chris, Jun 5 2010 05:06:42
    The heuristic of choosing the paper with the longer title results in the right answer about 63% of the time.
  9. tklose, Jun 5 2010 07:50:31
    Wow, so there are already over 200000 questions played! Luminosity is ramping up pretty hard! Statistics on all 130000+ papers are coming in :D
  10. finnw, Jun 6 2010 07:48:49
    There are some other weaknesses, as someone pointed out on Facebook when I posted a link. Articles with dollar signs in the title are usually real. Titles mentioning "instantons", "PDF" or "QFT" are fake about 75% of the time. Maybe you could bias the selection towards/away from certain keywords to offset this.
  11. Akhil Mathew, Jun 10 2010 15:25:07

    This is hilarious!

    I knew I was bad at physics, but apparently, I'm worse than a monkey...

  12. city of the lits, Jun 11 2010 10:06:02
    This reminds me of the Postmodernism Generator created by Andrew Bulhak in 1996. The similarities are quite striking. The PG operates by creating random journal articles that spoof the kind of papers that are regularly published in lit crit and philosophy journals with a postmodernist bent (quite the majority, in those fields :) Here's a Wikipedia link for those interested, http://en.wikipedia.org/wiki/Postmodernism_Generator. As someone who spent a number of years studying philosophy at Sarah Lawrence College in the 80's (probably the peak of PM insanity, at a school absolutely saturated by it :), I really got a tremendous illicit thrill from the PG. It reminded me of quite a few papers I read (and wrote!) at that time in my life. Ah, physics...(sigh of great relief)
  13. city of the lits, Jun 11 2010 10:30:48
    "A few years ago, the CFG-based CS paper generator SCIgen made a splash by getting one of their papers accepted to the conference SCI 2005. Their website has details, and links to some other random generators around the web." I see that Sokal (statistical thermodynamics, NYU) is briefly mentioned in a Pingback, but in case someone here isn't familiar with the so-called "Sokal affair", I'd like to mention Sokal's paper entitled "Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity" which he submitted to the journal "Social Text" (they publish in the field of postmodern cultural studies) in 1996. The journal accepted this nonsensical paper, a piece so well-filled with postmodernist jargon and impenetrable arguments that anyone with even a fleeting familiarity with Foucault or Derrida will howl with laughter at a cursory glance. Highly recommended!
  14. iris\, Jun 12 2010 22:14:31
    @ akhil referring to the high school biology of evolution and punning your worse-than-a-monkey stand on physics, Your-physics belongs in the monkey-verse (hyphens alluding to how your ancestor gave you the biomechanical legacy of apes) Have there been any such generators of academical biology...?
  15. davidsd, Jun 21 2010 16:45:25
    I'm hoping to do a full analysis of crackpottiness on the arXiv sometime soon. I'll write a post here when I do. Update 9/17/10: Here it is! For now, the preliminary winner of the crackpot award is Secrets of the Metric In N=4 and N=2* Geometries [hep-th/0105235] with 0/10 correct guesses. Maybe a close second is Near-horizon brane-scan revived [arXiv:0804.3675] with 5/34 correct guesses. I like John's idea too -- it might be fun to implement. I also think there's a lot of room just for optimizing the snarXiv to sound more like the arXiv. Currently, its vocabulary and sentence structures are limited to whatever popped into my head while I was writing the grammar. There's no sense in which the adjectives that appear, for instance, represent a complete list of common adjectives in arXiv titles. And the abstracts grammar definitely doesn't have enough flexibility to mimic everything that people tend to write. My sense is that many who figured out how to beat the snarXiv simply learned what it could spit out and what it couldn't, and that was basically enough to distinguish it from the arXiv every time. Maybe I'll find the time at some point to make improvements to the snarXiv grammar myself, but I was kind of hoping that some nerdy people might want to help out (I'm supposed to be doing research, you see :) ). The grammar is freely available online, and this post has instructions on how to set up the actual generator from the grammar.
  16. John Baez, Jun 19 2010 19:16:56
    Currently the fake abstracts are easy to recognize as such, but maybe you could fix that by having a game where people tried to spot them, and keeping the ones that people were unable to spot as fake. Survival of the fittest!
  17. tklose, Jun 21 2010 15:53:18
    Hi, has any one article already won the crackpot award? John Baez' idea for finding the best fake articles seems neat, too!
  18. David Gerard, Jul 11 2010 09:31:11
    Dude. Can we recruit you to RationalWiki? I added snarXiv to http://rationalwiki.org/wiki/arXiv , and Philip Gibbs has shown up to defend viXra ...
  19. George, Jul 27 2010 08:36:43
    Hi! snarXiv is so much fun. Thanks! I hope you implement some of those ideas mentioned. By the way, it is heretical in China www.websitepulse.com/help/testtools.china-test.html Perhaps a misunderstanding :)
  20. davidsd, Jul 27 2010 16:47:27
    @George, That is awesome... Strong evidence that the snarXiv is an important tool in the search for truth.
  21. davidsd, Apr 24 2012 07:37:50
    I had no idea there was a snarXiv blog, but now I am delighted that it exists! Also, I'm glad you enjoyed my conformal blocks paper.
  22. Mitchell Porter, Apr 24 2012 03:19:13
    I started a "snarxiv blog" in imitation of the "arxiv blog" (though right now it's more of a "vixra blog"). I must have visited this site of yours in the past, but your identity didn't register. However, recently there was this cool paper about conformal blocks, and now on a revisit to the homepage of the snarxiv's creator, I see that's *your* paper. Just to complete the circle, I was in touch with the author of the Postmodernism Server when it was being created - in its original version, you'll sometimes see works by "Porter" in the bibliography; that's me. It's a small world. (In fact, I went to high school with one of the pioneers of small-world theory...)
  23. Felipe j. Llanes-Estrada, Oct 26 2012 15:47:04
    I find this hilarious and had great fun, but please be careful with a list of unrecognizably titled papers; you might inadvertently hurt someone. I have noticed that several ill-titled papers come from the Far East, where English skills vary in quality. Showing up in a prominent position in that list might hurt the career prospects of an otherwise valid colleague.
  24. Daniel, Aug 29 2012 03:14:53
    The high score list is senseless because it implies that the player should be able to discern sense from nonsense, when in fact a lot of real papers are not much better than randomly generated gibberish. Just think about the Bogdanov Affair.
  25. kevin, Nov 16 2014 18:44:02
    i noticed that you have the compton effect up on display up on the top banner. I thought it to myself so im not really sure what to do with that knowledge except say yeahhhhh something fishy is going on