Combining the concept of arrows with Python's subprocess and multiprocessing libraries suggests the possibility of a compact and efficient mini-language for expressing shell pipelines in code. One could imagine, that is, a function that took a specification for a process to be run and returned an arrow that could be combined with other such arrows, the whole composition eventually to be run, something like the following (in Haskell syntax, though I'm really thinking of a Python library, and assuming that the names "hgdiff", "grep", etc., represent curried applications of this imagined arrow-producing function):
runProcessA $ hgdiff "somefile" >>> ((grep "^+") &&& (grep "^-")) >>> (first $ wc "-l")
(which would return a 2-tuple whose first element is the number of lines added to the file and whose second is the lines removed from it), where the fan-out operator "&&&" would take care of properly distributing its input to each of the processes that are its arguments, and of capturing their output and passing it along to whatever comes next in the chain. Distributing the input does not seem too hard: read from the pipe representing the fan-out's input, create a multiprocessing.Pipe for each of the argument processes, and write what was read to each of them; the function that actually runs each subprocess would then write that input to the subprocess's own pipe. Capturing the outputs seems trickier at the moment.
While the basic concept of a pipeline obviously doesn't require multiprocessing, using the arrow syntax to express the fanning-out of the same input to multiple child processes is not only pleasingly compact but also, or so it seems, offers a built-in annotation for when multiprocessing can be used and parallelism exploited. (One can even conceive of employing arrow laws for optimization purposes here. In fact, depending on how we can define first, (***), etc., and if we can conceive of the processes as purely producing output for one another, rather than affecting global state in ways that would constrain reordering (obviously a questionable assumption!), then we could rewrite the above as runProcessA $ hgdiff "somefile" >>> ((grep "^+" >>> wc "-l") &&& (grep "^-"))* and do other similar transformations, which, I don't know, could be advantageous.)
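A minimal sketch of what such a mini-language might look like in Python, assuming a hypothetical Arrow class (not an existing library) in which the operators >>, * and & stand in for >>>, *** and &&&. Rather than the multiprocessing.Pipe plumbing described above, it takes a simpler route: each process arrow buffers its whole input and output through subprocess.run, and *** runs its two sides in threads, which is enough to let the child processes run in parallel. The names proc and hg_diff_counts are likewise illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


class Arrow:
    """Wraps a one-argument callable so it can be combined like an arrow.

    Hypothetical operator mapping chosen for this sketch:
      f >> g  ~  f >>> g   (sequence)
      f * g   ~  f *** g   (act on the two halves of a pair)
      f & g   ~  f &&& g   (fan the same input out to both)
    """

    def __init__(self, run):
        self.run = run

    def __call__(self, x):
        return self.run(x)

    def __rshift__(self, other):
        # Feed this arrow's output into the next one.
        return Arrow(lambda x: other(self(x)))

    def __mul__(self, other):
        def run(pair):
            a, b = pair
            # Run both sides concurrently; when they are subprocesses the
            # threads mostly wait on I/O, so the children run in parallel.
            with ThreadPoolExecutor(max_workers=2) as pool:
                left = pool.submit(self, a)
                right = pool.submit(other, b)
                return left.result(), right.result()
        return Arrow(run)

    def __and__(self, other):
        # f &&& g = arr dup >>> (f *** g)
        return arr(lambda x: (x, x)) >> (self * other)


def arr(fn):
    """Lift a pure Python function into an Arrow."""
    return Arrow(fn)


def first(f):
    """first f = f *** arr id"""
    return f * arr(lambda x: x)


def proc(*argv):
    """Arrow running a command: its input (bytes) goes to the command's
    stdin, and the command's complete stdout is the arrow's output."""
    def run(data):
        # No check=True: grep exits with status 1 when nothing matches,
        # which is not a failure for this purpose.
        return subprocess.run(argv, input=data, stdout=subprocess.PIPE).stdout
    return Arrow(run)


def hg_diff_counts(path):
    """The post's example pipeline; needs hg, grep and wc on PATH."""
    pipeline = (proc("hg", "diff", path)
                >> (proc("grep", "^+") & proc("grep", "^-"))
                >> first(proc("wc", "-l")))
    # hg diff does not read stdin, so it is given empty input.
    return pipeline(b"")
```

One caveat of the operator choice: in Python & binds more loosely than >>, the reverse of &&& and >>> in Haskell, so fan-outs need explicit parentheses, as in hg_diff_counts. Calling hg_diff_counts("somefile") would return a pair of bytes: wc's count of added lines and the removed lines themselves.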
I'm sure that the basic idea here has been worked out in great detail by real actual Haskell-heads. Lord knows I don't want to try to wrangle with actually implementing anything like this in Python at the moment: pressing issues concerning barn facades and Megarians confront me.
* Reasoning thus: f &&& g can be expressed as arr dup >>> (f *** (arr id)) >>> ((arr id) *** g) (where dup duplicates its input and arr lifts a pure function into an arrow), and, if there's an independent definition of (***), first can be defined as first f = f *** (arr id), so that (f &&& g) >>> first h = arr dup >>> (f *** (arr id)) >>> ((arr id) *** g) >>> (h *** (arr id)). But as those arr ids make clear, g does not affect the transfer of f's output to h, so we can (one would need a definition of (***) to actually prove this, natch) rewrite that as arr dup >>> (f *** (arr id)) >>> (h *** (arr id)) >>> ((arr id) *** g), and then, since first f >>> first h = first (f >>> h), as arr dup >>> ((f >>> h) *** (arr id)) >>> ((arr id) *** g). And if that's legitimate, it's equivalent to (f >>> h) &&& g.
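The footnote's chain of rewrites can be checked mechanically for the simplest arrow, ordinary functions, where arr is the identity, >>> is composition and *** applies two functions to the halves of a pair. The sketch below uses the hypothetical names seq, split, fanout and stages for >>>, ***, &&& and the list of five expressions, and evaluates every stage on a toy diff. The final values agree; but once f, g and h log their calls, the step that moves h ahead of g changes the order of the effects, which is exactly why the reordering depends on the processes not touching shared state.

```python
from functools import reduce

# Sketch: plain Python functions stand in for arrows, so arr is the
# identity, >>> is composition and *** acts on pairs.


def arr(fn):
    """Lift a pure function; for the function arrow this is a no-op."""
    return fn


def ident(x):
    return x


def dup(x):
    return (x, x)


def seq(*arrows):
    """a >>> b >>> ...: apply the arrows left to right."""
    return reduce(lambda f, g: (lambda x: g(f(x))), arrows)


def split(f, g):
    """f *** g: apply f to the first component, then g to the second."""
    return lambda pair: (f(pair[0]), g(pair[1]))


def first(f):
    """first f = f *** arr id"""
    return split(f, arr(ident))


def fanout(f, g):
    """f &&& g = arr dup >>> (f *** arr id) >>> (arr id *** g)"""
    return seq(arr(dup), split(f, arr(ident)), split(arr(ident), g))


def stages(f, g, h):
    """Each expression in the footnote's chain of rewrites, in order."""
    i = arr(ident)
    return [
        seq(fanout(f, g), first(h)),                           # (f &&& g) >>> first h
        seq(arr(dup), split(f, i), split(i, g), split(h, i)),  # expanded
        seq(arr(dup), split(f, i), split(h, i), split(i, g)),  # h moved ahead of g
        seq(arr(dup), split(seq(f, h), i), split(i, g)),       # first f >>> first h = first (f >>> h)
        fanout(seq(f, h), g),                                  # (f >>> h) &&& g
    ]


# Pure stand-ins for grep "^+", grep "^-" and wc -l on a diff's lines.
def plus(lines):
    return [line for line in lines if line.startswith("+")]


def minus(lines):
    return [line for line in lines if line.startswith("-")]


diff_lines = ["+++ b/somefile", "+new line", "-old line", " context"]
# Every stage yields (2, ["-old line"]); the "+++" header matches too,
# just as it would for grep "^+".
results = [stage(diff_lines) for stage in stages(plus, minus, len)]

# With side effects the values still agree, but the order of effects does not.
log = []


def traced(name, fn):
    def run(x):
        log.append(name)
        return fn(x)
    return run


def effect_orders(f, g, h):
    orders = []
    for stage in stages(traced("f", f), traced("g", g), traced("h", h)):
        log.clear()
        stage(diff_lines)
        orders.append(tuple(log))
    return orders
```

For these inputs, effect_orders(plus, minus, len) records f, g, h for the first two stages and f, h, g for the last three: the swap in the third stage is the one step that plain composition alone does not justify.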