It is traditional for carping blog posts to work their way through their subjects line by line, or at least section by section, which is (now that I review the article) indeed highly tempting. In fact, let's yield to the temptation, though only for some sections: §2.1, §4.1, and §6.1 (though I also think §6.2 is not so hot). I will also now observe that the whole thing is quite superficial and proceeds mostly by way of rhetorical questions.
(After having written this post I realize that I didn't find a place to include this interesting post on undefined behavior, so I'll just stick it here.)
§2.1, "The Dual Nature of Programs"
I'll just consider the first and fourth paragraphs of this section. The first:
Many authors (Moor 1978; Rapaport 2005b; Colburn 2004) discuss the so-called dual nature of programs. On the face of it, a program appears to have both a textual and a mechanical or process-like guise. As text, a program can be edited. But its manifestation on a machine-readable disk seems to have quite different properties. In particular, it can be executed on a physical machine. So according to the principle of the indiscernibility of identicals (§3.3), the two guises cannot be the same entity. Of course, anyone persuaded by this duality is under an obligation to say something about the relationship between these two apparent forms of existence.
What, one wonders, do the authors mean by this? Consider the following:
Some source code lives always and forever in human-readable form; it is never compiled to anything else. I believe that PHP is (or was) like this. In that case, the source code can be executed on a physical machine (by an interpreter, but that's immaterial).
Some source code is never directly executed; it must be compiled. The executable can be executed on a physical machine; the source code cannot. (Or at least, not in the same way the executable can; there exist polyglot programs.) There doesn't seem to be any reason at all to balk at calling the source and the executable different entities, any more than there would be a reason to balk at calling the recipe and the cake different entities. (Or, perhaps less contentiously, calling Der Zauberberg and The Magic Mountain different entities, whatever one's ontology of fictional works is, or score and performance. (It occurs to me that this is perhaps not less contentious. But it should be!)) I take it the second paragraph is supposed to tell us why this is in fact problematic, and it does seem that, say, byte compilation is more like a series of transformations all of the same sort than performing a musical score is. But the evocation of the supposed peculiar difficulty is pretty unconvincing (also I don't think anyone refers to compilation as implementation).
Nevertheless, the means by which the authors seek to draw attention to the intuitive difference between source code and executable is faulty. True, one can write a program on a piece of paper and edit it with a pencil and an eraser (or one could punch holes in cards oneself, I guess, with a steady hand), but more often one writes a program using a computer, in which case it has, of course, "a manifestation on a machine-readable disk", which is, obviously, editable. And a compiled program is also editable, though not as easily.
It's also worth noting that even an executable is only actually capable of being run given a certain context: even something that requires no language runtime and links to no external libraries will only be runnable by an operating system that understands its format (or by an emulator that does). This is independent of its representation on disk. (And for that matter its representation on disk is still the wrong thing to draw our attention to, because what we would presumably like to call the same file will be represented on a disk differently depending on the filesystem used.)
The fourth paragraph:
A slightly different perspective on these issues starts from the question of program identity. When are two programs taken to be the same? Such issues arise for example in attempts to determine the legal identity of a piece of software. If we identify a program with its textual manifestation then the identity of a program is sensitive to changes in its appearance (e.g. changing the font). Evidently, it is not the text alone that provides us with any philosophically interesting notion of program identity. Rather, to reach an informed criterion of identity we need to take more account of semantics and implementation. We shall return to this subject in §3 and §6.
This whole post really only exists because of the egregious badness of the fourth sentence in this paragraph: were it not for that, I'd have found the article baffling but probably not been motivated to write about it. Here is the issue: the font in which source code is displayed is not a property of its text, any more than (e.g.) where the line wrapping occurs, or the color in which the text is displayed, is. This is fairly close to exactly equivalent to arguing that if we individuate programs by their textual manifestations, then looking at your monitor through sunglasses creates a different program.
A much better example of a trivial change which we might not want to claim leads to a difference in program identity is the insertion of grammatically irrelevant characters—in many languages this can be done by fiddling with indentation or lineation—or by changing representations of characters. (Or even the insertion of comments, which are grammatically relevant.) But in the first case the reason for denying that they change the identity of the program seems close at hand—they're grammatically irrelevant, so they aren't really there at all. (Consider Lisp programmers, who traditionally incline to the view that the program is the AST, and that they're just writing a representation of the AST, and consequently occasionally claim that Lisp has no parentheses.) And really one can imagine a whole host of at least diverting questions here:
Have you got the same program after something that doesn't affect the AST but does make things clearer to the reader (e.g. alpha conversion (in the absence of quotation or other tricky things), or even just comments)?
After local changes that do affect the AST but shouldn't affect the semantics (e.g. eta conversion, though I believe in some type systems this can lead to different inferred types)?
After global changes that radically affect the AST but shouldn't affect the semantics (e.g. conversion to CPS in a language in which that won't completely trash the stack)?
Does the same source code compiled with different compiler optimization options produce different programs?
Is a polyglot quine the same program as itself?
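The first two of these questions can be made concrete with Python's standard `ast` module (the toy sources below are my own invention, not anything from the article): comments and layout vanish at parse time, while renaming a variable changes the concrete, named AST even though it shouldn't change the semantics — so an AST criterion of identity answers the two cases differently.

```python
import ast

# Three renderings of "the same" program. The first two differ only in
# comments, lineation, and grammatically irrelevant whitespace; the
# third is an alpha-converted variant (x renamed to y).
src_a = "def f(x):\n    return x + 1  # increment\n"
src_b = "def f(x): return x + 1"
src_c = "def f(y):\n    return y + 1\n"

def canonical(src: str) -> str:
    # ast.dump discards comments and layout, so it individuates
    # programs by their syntax trees rather than by their texts.
    return ast.dump(ast.parse(src))

print(canonical(src_a) == canonical(src_b))  # True: layout and comments aren't in the AST
print(canonical(src_a) == canonical(src_c))  # False: the named AST notices the renaming
```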
§4.1, "Proofs in Computer Science"
It is averred that proofs in computer science are shallow and uninteresting, and do not employ mathematical abstraction, and that this distinguishes them (to their discredit) from mathematical proofs. The sole example of a proof in computer science is one of the correctness of a straightforward algorithm for computing a factorial. It is obvious that all arguments in CS are at about that level, especially as regards abstraction. Or, I guess, look here. (I also wish to link to this post, which includes the marginally hilarious function definition
memoize :: Functor f => (forall a. (n -> a, [a]) -> f a) -> T f n
memoize f = let x = prop7 (f . lemma) in T (yoneda (fst x)) (memoize (snd x))
.) The cited statement about proofs in CS also seems hard to reconcile with the authors' reference to sophisticated type systems later, since one of the things people interested in sophisticated type systems are often interested in establishing is that the type system is sound and the type checker will terminate, which—as I understand it—tends to involve equally sophisticated logical systems.
First, note the sentence "A more logical approach to the analysis of abstraction, that does have some strong advocacy (Wright 1983; Hale 1987).". Of course. We continue in a vein that makes one wonder if the authors actually know how "abstraction" is typically used in a CS context:
the conceptual investigation of abstraction in computer science is in its infancy. Colburn (2007) suggests that the distinction between abstraction in mathematics and abstraction in computer science lies in the fact that in mathematics abstraction is information neglect whereas in computer science it is information hiding. That is, abstractions in mathematics ignore what is judged to be irrelevant (e.g. the colour of similar triangles). In contrast, in computer science, any details that are ignored at one level of abstraction (e.g. Java programmers need not worry about the precise location in memory associated with a particular variable) must not be ignored by one of the lower levels (e.g. the virtual machine handles all memory allocations).
This is a shockingly terrible example of abstraction (but it seems to be a matter of information neglect on the part of the programmer), being just about the least abstract one could be. It doesn't even qualify as a toy example. One might also observe that frequently one wants to combine two values of the same type in such a way that the combining operation has a "default" that acts as an identity, without caring about precisely what the combination comes to or what the default is, and abstract over that. Naturally in order for this to be useful some specific operations will have to be defined—what would be the point, otherwise? But what would be the point of the exact same abstraction in mathematics if there were no particular mathematical structures that fulfilled it?
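The abstraction gestured at here is, of course, a combining operation together with an identity element for it (a monoid, in the usual jargon). A minimal sketch of programming against that abstraction alone — the function and the sample operations are my own illustration, not anything from the article:

```python
from functools import reduce

def combine_all(items, op, identity):
    # Written against the abstraction alone: combine with op, starting
    # from its identity, without caring what either actually is.
    return reduce(op, items, identity)

# Two specific structures fulfilling the abstraction:
print(combine_all([1, 2, 3], lambda a, b: a + b, 0))       # 6: numbers under addition
print(combine_all([[1], [2, 3]], lambda a, b: a + b, []))  # [1, 2, 3]: lists under concatenation
```

The triviality of the instances is beside the point: the abstraction is only useful because particular structures fulfil it, exactly as in mathematics.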