Wednesday, February 24, 2010

The end of OpenSolaris?

Slashdot linked to this interesting article about Oracle's hazy plans for OpenSolaris, with some speculation that they may be shutting down support for it. Having worked in a Solaris environment for fifteen years, I find this sort of sad. Even though I don't use OpenSolaris, I recognize that several relatively recent Solaris features are groundbreaking (particularly DTrace and ZFS).

Friday, February 12, 2010

Kevin Rose on Buzz

Given that Buzz is in the face of Gmail users at the moment, it doesn't exactly qualify as "Off the Beaten Path". It is, however, an important development in communications technology because of the ubiquitous nature of Google, and because of the dramatic way they launched it.

My first impression was that it was nothing more than a frontal attack on Facebook. Kevin Rose made this video, which puts forward another view: that Buzz is primarily about giving Google a more sophisticated data-mining pool of customer information.

Tuesday, February 9, 2010

Putting Together Literate Programming Tools -- tangle


One of the best ways for me to acquire knowledge and skills on a subject is to agree to a project or talk on that topic. Learning about literate programming has been a half-baked project of mine for years, so in order to get off my posterior with respect to the project I volunteered to give a talk at the Atlanta Perlmongers on February 25th. Given the amount of work involved, I then decided I may as well do modified versions of the talk for other interested audiences, so I agreed to speak at the Atlanta Linux Enthusiasts meeting on March 18th as well.

I wrote a brief overview of literate programming in a previous post.

I decided that my approach would be to put together some simple tools. Since I'm somewhat Perl-centric, I decided to do as much of the project as I could realistically finish by late February using Perl (and to make the target code for my tangle a Perl script as well).

After reading Norman Walsh's article Literate Programming in XML, I decided that DocBook made a great deal of sense as the markup language for the source file. It was devised for technical documentation, has a rich set of elements, and there are many tools for working with it (including many XSL stylesheets to facilitate conversion to XHTML, PDF, and other formats).

At first I devised an XML element whose attributes included both a position and a file (to allow for multi-file projects), but then decided that for my first few passes over the project I would simplify it to the same sort of "id" attribute Norman Walsh used in his article.

Since "id" is already a docbook attribute for a tag I was using, and since I wanted to visually simplify things for the talks, I called my tag sourcecode, and the attribute fragment_id.

I'm going to talk a little bit here about my "tangle" utility, which I named "tangy.pl", and will cover my "weave" in the next post.

The concepts behind tangy.pl are not very complicated, at least in the manner in which I'm approaching them. The developer has fragments of machine-usable code scattered across a larger document in jumbled order, and the tangle has to put them together in the order needed by the machine. I'm going to show you the code for my first two passes over this utility. Bear in mind that my testing has been minimal, but it seems to work for my purposes right now (and if you see any problems with it, please post a comment).

On my first pass, I used the most primitive approach imaginable and made an attribute named "position" hold an integer representing the fixed position of the code fragment within the tangled file. Since I'm pulling data from presumably well-formed XML, I decided to use the Perl module XML::LibXML, which seems to work for my purposes. Here's the primitive version of the utility:



#!/usr/bin/perl -w

use strict;
use XML::LibXML;

my $source_file = shift;

# Parse the DocBook source and grab every sourcecode element.
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($source_file);
my $xpc = XML::LibXML::XPathContext->new($tree);
my @n = $xpc->findnodes('//sourcecode');

my @tanglefile;

# Store each fragment's text in the slot given by its position attribute.
foreach my $nod (@n) {
    $tanglefile[$nod->getAttribute('position')]
        = $nod->getFirstChild()->getData;
}

# Print the fragments in position order, skipping any unfilled slots.
foreach my $fragment (@tanglefile) {
    print $fragment if defined $fragment;
}



All this code does is create an XML parser, build a tree, and shove the source code fragments into an array based on the contents of the position attribute. The obvious problem with this is that if you rearrange the code, you have to go back through the document renumbering every position attribute. Whether you do that by hand or automate it, it's a Rube Goldbergish solution.
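
To make the problem concrete, a source document using this scheme might contain fragments like the following (purely illustrative); inserting a new fragment between them means renumbering everything that follows:

<sourcecode position="1">
my $count = 0;
</sourcecode>

<sourcecode position="2">
$count++;
</sourcecode>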

So I adopted a solution somewhat like Norman Walsh's more purely XML approach, and renamed the attribute fragment_id. The most straightforward way I could come up with to rapidly rearrange and edit the instructions for tangling was to create a separate file named outline. As long as I name the fragment_id attributes descriptively and sensibly, I can insert and rearrange fragments rapidly. Here's the (quickly tested) code I came up with to do the tangle using the outline file:

#!/usr/bin/perl -w

use strict;
use XML::LibXML;

my $source_file = shift;

# Parse the DocBook source and grab every sourcecode element.
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($source_file);
my $xpc = XML::LibXML::XPathContext->new($tree);
my @n = $xpc->findnodes('//sourcecode');

my %tanglefile;
my @outline;

# Key each fragment's text by its fragment_id attribute.
foreach my $nod (@n) {
    $tanglefile{$nod->getAttribute('fragment_id')}
        = $nod->getFirstChild()->getData;
}

# The outline file lists one fragment_id per line, in tangle order.
open OUTLINE, 'outline'
    or die "Can't open the outline file: $!";

while (<OUTLINE>) {
    chomp;
    push @outline, $_;
}

close OUTLINE;

# Walk the outline and print the fragments in the order it specifies.
foreach my $id (@outline) {
    print $tanglefile{$id};
}


This code is the same as my earlier version, except that instead of storing the fragments in an array it builds a hash keyed by fragment_id, with each fragment's content as the value, and then uses the order of the outline file to walk through the hash, printing out the machine-usable code in the correct order.
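
For illustration, suppose the DocBook source contained sourcecode elements with fragment_id values such as preamble, read_input, and print_report (hypothetical names). The outline file would simply list those ids, one per line, in the order the machine needs them:

preamble
read_input
print_report

Since tangy.pl prints the assembled code to standard output, a run might look something like this (the file names are placeholders), with the outline file sitting in the current directory because its name is still hard-coded at this stage:

perl tangy.pl mydocument.xml > myprogram.pl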

This is just the beginning stage of this tangle utility. Other things on my to-do list are to add command-line arguments so the name of the outline file can be specified on the command line, to put the "file" attribute back in to support multi-file projects, to convert the standalone utility into a method within a module, and to write unit tests for that module.
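
As a rough sketch of the first item on that list (and only that item), the name of the outline file could be made configurable with Getopt::Long; the --outline option name is just a placeholder I picked for illustration, and the rest of the script would be unchanged:

#!/usr/bin/perl -w

use strict;
use Getopt::Long;

# Fall back to the current hard-coded name if no option is given.
my $outline_file = 'outline';
GetOptions('outline=s' => \$outline_file)
    or die "Usage: tangy.pl [--outline FILE] source.xml\n";

my $source_file = shift
    or die "Usage: tangy.pl [--outline FILE] source.xml\n";

open OUTLINE, '<', $outline_file
    or die "Can't open the outline file '$outline_file': $!";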

I've set up a Google Code project called literate-programming-tools to make the progress of this little project available. So far I've posted these scripts and some unfinished (but usable for testing) DocBook.

If you see any problems with anything I've presented or done thus far, either post a comment here or get in touch with me.

Saturday, February 6, 2010

Demon Sheep and Going Viral

In general I plan on keeping this blog out of the partisan political arena except on issues which directly impact the world of technology. But I can't resist commenting on Carly Fiorina's "Demon Sheep" video. It demonstrates what happens when you have just the right mix of camcorder, livestock, Monty Python clipart, minimal costuming skills, and ample quantities of powerful hallucinogenic drugs. If you haven't seen this masterpiece of modern political film yet, enjoy the link.

Monday, February 1, 2010

Literate Programming

During a conversation about programming methodologies with a network guru friend of mine, who also happens to be one of the best practical utility programmers I've ever met, I brought up the subject of Literate Programming. He glanced at me quizzically and asked, "What's the opposite of Literate Programming? Illiterate Programming?" In some ways his off-the-cuff (and somewhat derisive) joke sums up why I've been attracted to the concepts of Literate Programming for years.

After all, what are the definitions of the word literate?

Merriam-Webster's Online Dictionary has the following definitions:

1 a : educated, cultured  b : able to read and write

2 a : versed in literature or creative writing : literary  b : lucid, polished  c : having knowledge or competence

Too much computer-language source code and documentation exhibits neither education nor culture, is unreadable, shows little ability to write, and is neither creative, lucid, nor polished.

But what exactly constitutes Literate Programming with an upper case L and P?

Literate Programming is a system created by Donald Knuth in the late 1970s and early 1980s, and made public in his 1984 paper of the same name. Knuth is one of the most important figures in computer science. He wrote the multi-volume work The Art of Computer Programming, and is the creator of the TeX typesetting system.

Knuth's premise in putting forth the principles of Literate Programming was that programs should be written in a manner which supports comprehension by the humans who are writing, reading, and maintaining the program, not primarily written to be shoe-horned into a format which meets the execution needs of the computer itself.

In his own words:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Donald Knuth, "Literate Programming" (1984), in Literate Programming, CSLI, 1992, p. 99.

The way Knuth accomplished this was to devise a system of programming whereby two types of document could be extracted from a single source document.

One document would be a human-readable article explaining how the program works, typeset and attractive to view. It would include the machine-readable source code within it, but arranged and formatted such that the article serves the reader's need to understand how the program operates, not primarily arranged to serve the compilation and execution needs of the machine.

The second document would be oriented towards the needs of the machine. All the necessary boilerplate code would be in its proper place so that the compiler or interpreter could process it and perform the tasks for which the program was intended.

Knuth chose the name "weave" for the utility which produces the human-oriented article, and the name "tangle" for the utility which produces the machine-readable code.
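
To make that division concrete, here is a tiny sketch written in the style of a later literate programming tool, noweb, rather than Knuth's original WEB (the chunk name and the one-line Perl payload are invented for illustration):

@ This documentation chunk explains, in prose meant for the human
reader, why and how the greeting is produced.

<<print the greeting>>=
print "Hello, world!\n";
@

Running the weave side of such a tool over this source produces a typeset document in which the prose and the code chunk appear together for the reader; running the tangle side extracts and assembles the named code chunks into a plain file the interpreter or compiler can process.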

While Literate Programming hasn't exactly become the dominant programming system in the information technology world, it has attracted many enthusiasts over the years, and as computer languages, technology, and standards have evolved, new tools and approaches have been created.

Knuth's original system was known as WEB; its weave produced output in his typesetting language TeX, and its tangle produced Pascal code.

Since then some of the tools and systems have included:

CWEB
nuweb
FunnelWeb
Various SGML- and XML-based approaches

If you are interested in learning more about this fascinating and useful approach to software development and documentation, there are many good online sources.
The website Literate Programming is a good starting point.
The Wikipedia article on Literate Programming is also useful.

One particular article which I've enjoyed, since Perl is my language of choice, is Mark Jason Dominus's article explaining why documentation extraction tools like Perl's POD do not constitute literate programming.

Literate Programming is an approach which is worth exploring by anyone who is serious about producing good quality software and documentation.