Tuesday, February 9, 2010

Putting Together Literate Programming Tools -- tangle


One of the best ways for me to acquire knowledge and skills in a subject is to commit to a project or a talk on that topic. Learning about literate programming has been a half-baked project of mine for years, so in order to get off my posterior I volunteered to give a talk at the Atlanta Perlmongers on February 25th. Then, given the amount of work involved, I decided I might as well present modified versions of the talk to other interested audiences, so I also agreed to speak at the Atlanta Linux Enthusiasts meeting on March 18th.

I wrote a brief overview of literate programming in a previous post.

I decided that my approach would be to put together some simple tools. Since I'm somewhat Perl-centric, I decided to write as much of the project in Perl as I could realistically finish by late February (and to make the target code for my tangle a Perl script as well).

After reading Norman Walsh's article "Literate Programming in XML", I decided that DocBook made a great deal of sense as the markup language for the source file. It was devised for technical writing, has a rich set of elements, and has many tools for working with it (including many XSL stylesheets that make conversion to XHTML, PDF, and other formats easy).

At first I devised an XML element that carried two attributes, a position attribute and a file attribute (to allow for multi-file projects), but for my first few passes over the project I decided to simplify it to the same sort of "id" attribute Norman Walsh used in his article.
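To make the file-plus-position idea concrete, here is a minimal sketch in plain Perl of how such a multi-file tangle could work. It assumes the sourcecode elements have already been parsed into hashrefs; the sample data and the tangle_files name are hypothetical, and instead of using array slots it sorts each file's fragments by position. It is an illustration of the design, not code from tangy.pl.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for parsed <sourcecode> elements that carry both
# a file attribute and a position attribute.
my @fragments = (
    { file => 'Report.pm', position => 2, text => "sub run { print 'ok' }\n" },
    { file => 'report.pl', position => 2, text => "Report::run();\n" },
    { file => 'Report.pm', position => 1, text => "package Report;\n" },
    { file => 'report.pl', position => 1, text => "use Report;\n" },
);

# Group fragments by target file, then order each group by position.
sub tangle_files {
    my @frags = @_;
    my %by_file;
    push @{ $by_file{ $_->{file} } }, $_ for @frags;

    my %output;
    foreach my $file (keys %by_file) {
        $output{$file} = join '',
            map  { $_->{text} }
            sort { $a->{position} <=> $b->{position} } @{ $by_file{$file} };
    }
    return \%output;    # file name => tangled contents
}

my $tangled = tangle_files(@fragments);
print "== $_ ==\n$tangled->{$_}" for sort keys %$tangled;
```

Sorting on the position value means the numbers only need to be in the right relative order, rather than being exact array indexes.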

Since "id" is already a DocBook attribute on elements I was using, and since I wanted to keep things visually simple for the talks, I called my element sourcecode and the attribute fragment_id.

I'm going to talk a little about my "tangle" utility here, which I named tangy.pl, and will cover my "weave" in the next post.

The concepts behind tangy.pl are not very complicated, at least in the way I'm approaching it. The developer has fragments of machine-usable code scattered across a larger document in jumbled order, and the tangle has to put them together in the order the machine needs. I'm going to show you the code for my first two passes at this utility. Bear in mind that my testing has been minimal, but it seems to work for my purposes for now (and if you see any problems with it, please post a comment).

On my first pass I used the most primitive approach imaginable: the attribute "position" holds an integer giving the fixed position of the code fragment within the tangled file. Since I'm pulling data from (presumably) well-formed XML, I decided to use the Perl module XML::LibXML, which seems to work for my purposes. Here's the primitive version of the utility:



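#!/usr/bin/perl -w

use strict;
use XML::LibXML;

# Name of the DocBook/XML source file, taken from the command line.
my $source_file = shift
    or die "usage: $0 source.xml\n";

# Parse the source and collect every <sourcecode> element.
my $parser = XML::LibXML->new();
my $tree   = $parser->parse_file($source_file);
my $xpc    = XML::LibXML::XPathContext->new($tree);
my @n      = $xpc->findnodes('//sourcecode');

my @tanglefile;

# Store each fragment's text in the array slot named by its
# position attribute (no backslash here: "= \ ..." would store a
# scalar reference, and print would emit SCALAR(0x...)).
foreach my $nod (@n) {
    $tanglefile[ $nod->getAttribute('position') ] =
        $nod->getFirstChild()->getData;
}

# Print the fragments in position order, skipping unused or empty slots.
foreach my $fragment (@tanglefile) {
    next unless defined $fragment && length $fragment;
    print $fragment;
}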



All this code does is create an XML parser, build a tree, and shove the source code fragments into an array indexed by the value of each fragment's position attribute. The obvious problem with this approach is that if you rearrange the code, you have to go back through the document renumbering every position attribute. Whether you do that by hand or automate it, it's a Rube Goldberg-ish solution.
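To try the position-array logic without XML::LibXML, here is a minimal sketch in plain Perl. It assumes the fragments have already been pulled out of the document as (position, text) pairs in document order; the tangle_by_position function and the sample fragments are hypothetical stand-ins for the loop in the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [position, text] pairs in the order they
# appear in the document and returns the tangled text.
sub tangle_by_position {
    my @pairs = @_;
    my @tanglefile;

    # Each fragment lands in the array slot named by its position,
    # regardless of where it appeared in the document.
    foreach my $pair (@pairs) {
        my ($position, $text) = @$pair;
        $tanglefile[$position] = $text;
    }

    # Unused positions leave undef holes in the array; skip them.
    return join '', grep { defined $_ && length $_ } @tanglefile;
}

# Fragments in "document order", which differs from machine order.
my $tangled = tangle_by_position(
    [ 30, "print_report();\n" ],
    [ 10, "use strict;\n" ],
    [ 20, "sub print_report { print 'ok' }\n" ],
);
print $tangled;
```

Positions 10, 20 and 30 leave undef slots in between, which the final grep skips; moving a fragment still means editing the numbers themselves.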

So I switched to a solution somewhat like Norman Walsh's more purely XML approach, and renamed the attribute fragment_id. The most straightforward way I could find to rearrange and edit the tangling instructions quickly was to put them in a separate file named outline, which lists one fragment_id per line in the order the fragments should appear in the tangled file. As long as I give the fragment_id attributes descriptive, sensible names, I can insert and rearrange fragments quickly. Here's the (quickly tested) code I came up with to do the tangle using the outline file:

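#!/usr/bin/perl -w

use strict;
use XML::LibXML;

# Name of the DocBook/XML source file, taken from the command line.
my $source_file = shift
    or die "usage: $0 source.xml\n";

# Parse the source and collect every <sourcecode> element.
my $parser = XML::LibXML->new();
my $tree   = $parser->parse_file($source_file);
my $xpc    = XML::LibXML::XPathContext->new($tree);
my @n      = $xpc->findnodes('//sourcecode');

my %tanglefile;
my @outline;

# Map each fragment_id to the code text it labels.
foreach my $nod (@n) {
    $tanglefile{ $nod->getAttribute('fragment_id') } =
        $nod->getFirstChild()->getData;
}

# The outline file lists one fragment_id per line, in tangled order.
open my $outline_fh, '<', 'outline'
    or die "Can't open the outline file: $!";

while (<$outline_fh>) {
    chomp;
    push @outline, $_;
}
close $outline_fh;

# Print the fragments in outline order.
foreach my $id (@outline) {
    print $tanglefile{$id};
}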


This code is essentially the same as my earlier version, except that instead of storing the fragments in an array, it stores them in a hash keyed by fragment_id (fragment_id => content), reads the list of fragment_ids from the outline file, and walks through that list in order, printing the machine-usable code in the correct sequence.
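One weakness of the hash lookup is that a typo in the outline only produces a "Use of uninitialized value" warning under -w, and the fragment silently drops out of the output. Here is a minimal sketch, in plain Perl, of an outline-driven tangle that reports unknown ids instead; tangle_by_outline and the sample fragments are hypothetical, and the outline is passed as a list rather than read from the outline file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: $fragments maps fragment_id => code text, and
# @outline lists fragment_ids in the order the machine needs them.
sub tangle_by_outline {
    my ($fragments, @outline) = @_;
    my @missing;
    my $output = '';

    foreach my $id (@outline) {
        next unless length $id;           # tolerate blank outline lines
        if (exists $fragments->{$id}) {
            $output .= $fragments->{$id};
        }
        else {
            push @missing, $id;           # named in outline, absent in source
        }
    }

    die "outline names unknown fragment(s): @missing\n" if @missing;
    return $output;
}

my %fragments = (
    main    => "print_report();\n",
    pragmas => "use strict;\n",
    report  => "sub print_report { print 'ok' }\n",
);

my $tangled = tangle_by_outline(\%fragments, qw(pragmas report main));
print $tangled;
```

Failing loudly on an unknown id seems preferable here, since a missing fragment would otherwise only show up when the tangled script misbehaves.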

This is just the beginning of this tangle utility. Other things on my to-do list are to add command-line arguments so the name of the outline file can be specified on the command line, to put the "file" attribute back in to support multi-file projects, to convert the standalone utility into a method within a module, and to write unit tests for that module.
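For the command-line item, one possible approach uses the core Getopt::Long module. This is a hypothetical sketch: the --outline option name, the parse_args function, and the default of "outline" (matching the file name the current script hard-codes) are my assumptions, not features tangy.pl has. GetOptionsFromArray is exported on request by reasonably recent versions of Getopt::Long.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument parser: --outline FILE names the outline file
# (default "outline"), and the single remaining argument is the source.
sub parse_args {
    my @args    = @_;
    my $outline = 'outline';
    my $usage   = "usage: tangy.pl [--outline FILE] source.xml\n";

    # GetOptionsFromArray removes recognized options from @args.
    GetOptionsFromArray(\@args, 'outline=s' => \$outline)
        or die $usage;
    die $usage unless @args == 1;

    return ($outline, $args[0]);
}

my ($outline_file, $source_file) =
    parse_args('--outline', 'chapter1.outline', 'chapter1.xml');
print "outline: $outline_file, source: $source_file\n";
```

In the real script the call would be parse_args(@ARGV).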

I've set up a Google Code project called literate-programming-tools to make the progress of this little project available. So far I've posted these scripts and some unfinished (but usable for testing) DocBook source.

If you see any problems with anything I've presented or done so far, either leave a comment here or get in touch with me.
