I have started reading, when not reading philosophy, writing, or grading, Steven Hall's The Raw Shark Texts. It's ... not that great, really. It sort of wants to be Snow Crash except with MEMES!!! and no cyberpunk elements. Well, whatever. There is a puzzle with which the narrator is confronted: an earlier, pre-amnesia version of himself (don't ask) has left a message encoded for him in a curious wise. First, it's in morse code according to a recorded light going on or off. (Shades of Cryptonomicon.) That's not so interesting. The interesting part is that the morse-encoded letter stands for one of eight other letters, depending on the layout of the QWERTY keyboard. So, say, "b" could be any of: v, n,r, t, y, f, g, h. The eight adjacent letters, wrapping around, with the two lower rows shifted hard to the left to make a grid. (This makes what do with p or l or the like a little underdefined, but one can stipulate a solution.) The narrator notes that it's difficult to decode because you can't do it piecewise—you'll only be able to tell whether you've begun correctly by verifying that that beginning allows you to get all the way to the end. For instance: if you get something like "I would like to tell you that", and then are left over with one letter, somewhere you've made a mistake.
This looks, thought I, like a job for backtracking! Surely Our Hero could just write up a little program to parse his text into words and then check to see if any of them made a lick of sense. If I still had SICP, or had ever really learned Scheme, or Icon, or any language with continuations, I would presumably know how to do this sort of problem in my sleep, but I seem to have come up with something that works ok using only Python's generators. If Our Hero tried that strategy he'd probably have to come up with a better algorithm and a better-regulated dictionary, though: run on the string "usnrzpo", I did get the intended string—"i am tall"—back out, but it was one of 45983, and I suspect that run on a text of, say, a hundred or so characters it would be nigh interminable. And not only that, "I am tall" wasn't even the only comprehensible result: maybe the message is recommending a game ("netball") or calumniating an airport ("hate lax") or expressing confidence in the sender's magical powers ("mage am i"). With a longer input string there would be more ways for multiple comprehensible outputs to occur. So even if you do get to the end, you can't be certain that that's the right message, since it could just be coincidental.
Well, I'm not familiar with the book, but surely as the length and complexity of the message grows, the number of comprehensible outputs will dwindle? - although we may need some additional assumptions here (the sender intends the message as a whole to convey a single message rather than a number of short ones, speaks standard English, doesn't make spelling errors, etc.).
Also, with a longer code, one could use the statistical distributions of letters in English to guide the search, reducing the complexity even if the search space is larger . . .
Posted by: horus kemwer | December 11, 2007 at 12:16 PM