Monday, February 1, 2010

Literate Programming

During a conversation about programming methodologies with a friend of mine, a network guru who also happens to be one of the best practical utility programmers I've ever met, I brought up the subject of Literate Programming. He glanced at me quizzically and asked, "What's the opposite of Literate Programming? Illiterate Programming?" In some ways his off-the-cuff (and somewhat derisive) joke sums up why I've been attracted to the concepts of Literate Programming for years.

After all, what does the word "literate" mean?

Merriam-Webster's Online Dictionary has the following definitions:

1 a : educated, cultured; b : able to read and write

2 a : versed in literature or creative writing : literary; b : lucid, polished; c : having knowledge or competence

Too much source code and documentation shows neither education nor culture. It is hard to read, reveals little skill at writing, and is not creative, lucid, or polished.

But what exactly constitutes Literate Programming with an upper case L and P?

Literate Programming is a system created by Donald Knuth in the early 1980s and introduced in his 1984 article of the same name, which was later collected in his 1992 book Literate Programming. Knuth is one of the most important figures in computer science. He wrote the multi-volume work The Art of Computer Programming and created the TeX formatting and typesetting language.

Knuth's premise in putting forth the principles of Literate Programming was that programs should be written to support comprehension by the humans who write, read, and maintain them, rather than being shoehorned into a format that primarily meets the execution needs of the computer.

In his own words:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Donald Knuth. "Literate Programming (1984)" in Literate Programming. CSLI, 1992, pg. 99.

Knuth accomplished this by devising a system in which two kinds of document could be extracted from a single source document.

One document would be a human-readable article explaining how the program works, typeset and attractive to view. It would include the machine-readable source code, but arranged and formatted so that the article serves the reader's need to understand how the program operates, rather than the compilation and execution needs of the machine.

The second document would be oriented towards the needs of the machine. All the necessary boilerplate code would be in its proper place so that the compiler or interpreter could process it and perform the tasks the program was intended for.

Knuth chose the name "weave" for the utility that produces the human-oriented article, and the name "tangle" for the utility that produces the machine-readable code.
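To make the weave/tangle split a little more concrete, here is a toy sketch in Python. It is not Knuth's WEB. The chunk notation is modelled loosely on noweb's convention, where a line "<<name>>=" opens a named code chunk and a line holding only "@" returns to prose. The functions parse, weave and tangle are hypothetical names invented for this illustration. This weave emits plain text rather than TeX, and this tangle does no error checking.

```python
import re

# A tiny literate source. Prose explains the program in reader order; the
# code chunks are defined in that same order but refer to each other by name.
SOURCE = """\
This program greets the user. We first describe the overall shape.

<<hello.py>>=
<<imports>>
def main():
    <<print the greeting>>

main()
@

The greeting itself is one line, explained here in the order a reader
needs it rather than the order the interpreter needs it.

<<print the greeting>>=
print(f"Hello from {sys.argv[0]}")
@

Only now do we mention the import it depends on.

<<imports>>=
import sys
@
"""

DEF_RE = re.compile(r"^<<(.+)>>=\s*$")     # opens a chunk definition
REF_RE = re.compile(r"^(\s*)<<(.+)>>\s*$")  # a whole-line reference to a chunk


def parse(source):
    """Return (segments, chunks).

    segments keeps the document order for weave; chunks maps each chunk
    name to a list of code blocks (a name may be defined more than once).
    """
    segments, chunks, block = [], {}, None
    for line in source.splitlines():
        m = DEF_RE.match(line)
        if m:
            block = []
            chunks.setdefault(m.group(1), []).append(block)
            segments.append(("code", m.group(1), block))
        elif block is not None and line.strip() == "@":
            block = None                      # back to prose
        elif block is not None:
            block.append(line)
        else:
            segments.append(("text", line))
    return segments, chunks


def weave(segments):
    """Human-oriented output: prose as written, chunks shown in place."""
    out = []
    for seg in segments:
        if seg[0] == "text":
            out.append(seg[1])
        else:
            _, name, block = seg
            out.append(f"  <{name}> =")
            out.extend("  | " + line for line in block)
    return out


def tangle(chunks, name, indent=""):
    """Machine-oriented output: expand chunk `name`, replacing each
    reference with the referenced code, indented to match the reference."""
    out = []
    for block in chunks[name]:
        for line in block:
            m = REF_RE.match(line)
            if m:
                out.extend(tangle(chunks, m.group(2), indent + m.group(1)))
            else:
                out.append(indent + line if line else line)
    return out


if __name__ == "__main__":
    segments, chunks = parse(SOURCE)
    print("\n".join(weave(segments)))
    print("----- tangled hello.py -----")
    print("\n".join(tangle(chunks, "hello.py")))
```

The woven text keeps the explanation in the order chosen for the reader, with the import mentioned last. The tangled output, by contrast, puts import sys at the top and the print call inside main(), because that is where the interpreter needs them. Real tools add much more, such as TeX or HTML output, cross-references, and detection of circular chunk references.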

While Literate Programming hasn't exactly become the dominant programming system in the information technology world, it has attracted many enthusiasts over the years, and as computer languages, technology, and standards have evolved, new tools and approaches have been created.

Knuth's original system was known as WEB. Its weave produced output in his typesetting language TeX, and its tangle produced Pascal code.

Since then some of the tools and systems have included:

CWEB
nuweb
funnelweb
Various SGML and XML based approaches

If you are interested in learning more about this fascinating and useful approach to software development and documentation, there are many good online sources.
The Literate Programming website is a good starting point.
The Wikipedia article on Literate Programming is also useful.

One article I've particularly enjoyed, since Perl is my language of choice, is Mark Jason Dominus's explanation of why document extraction tools like Perl's POD do not constitute literate programming.

Literate Programming is an approach which is worth exploring by anyone who is serious about producing good quality software and documentation.
