Wednesday, 29 January 2014

Documenting Shell Scripts

Intro

I was recently involved in developing a rather large library of shell script functions. I'm aware that one should pick a proper scripting language (Python, Perl, Ruby, ...) for more complex programming tasks, but sometimes you simply don't have a choice.

Abusing Doxygen

One issue was finding a way to generate API-style documentation straight from the shell script code. Searching the Internet, I mostly found obscure ways of using (not to say abusing) Doxygen for that task (here and here). To me that sounded too complex and confusing - all I wanted was to add some comments with markup that would eventually turn into an HTML documentation file.

Plain Old Documentation In Shell Script

After a while I found this idea by Milivoj, who uses the well-proven Plain Old Documentation (POD) markup to document shell scripts. POD is the markup used in the Perl world for source code documentation.
Milivoj uses HERE documents to embed POD markup in shell scripts, which in my opinion is as close as you can get to the original Perl way. However, I decided on a simpler and at the same time more shell-like approach to embedding POD.

The POD format

This is an example of a file containing POD markup:
=head1 Important heading

This is inside the important heading section. 

  this is example code inside important heading.

=head1 Another important heading

This is another important heading content.

=head2 This is a sub heading

This is the content of the subheading.

POD2HTML

The following command converts the POD markup into an HTML page:
cat pod_example.pod | pod2html > pod_example.html
The tool that performs the magic is pod2html, which is part of the standard Perl distribution. See its manual page for more details. I was more than happy with the defaults, though:

Example HTML page, generated from a POD file using pod2html

 

Embedding Into Shell Script

The next step was to include the POD markup somehow in the shell script library. The Perl way of including POD markup in source code is to start a documentation section with a "=" tag (like =head1 ...) and to close it with the special tag =cut. After this last tag, normal Perl code starts again. Since the Perl interpreter is aware of the "=" tags, POD can be inserted directly into the source.

Unfortunately, the shell interpreter (in my case bash and ksh are possible) does not know about "=" tags, so I needed a different way to embed POD in shell scripts. As already mentioned, Milivoj's approach is to use HERE documents to mimic the Perl embedding of POD markup, which I felt is smart but hard to read in a large code base.
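
For comparison, the HERE document technique can be sketched roughly as follows. This is my reading of the general idiom, not necessarily Milivoj's exact code, and the function greet is a made-up example:

```shell
# The no-op command ':' swallows the here-document, so the shell skips the POD text.
# Quoting the delimiter ('=cut') prevents variable and command expansion inside it.
# The here-document ends at the line '=cut', which is also where the POD section ends.
: <<'=cut'

=head1 greet NAME

Prints a greeting for NAME (hypothetical example function).

=cut

greet() {
    printf 'Hello, %s\n' "$1"
}
```

Because the POD section starts with a "=" tag and ends with =cut, a POD tool should be able to read such a script without any preprocessing.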

After thinking for a while I came up with a less smart but more readable solution (as always, in my opinion ;-) ). Here is an example of how I use POD markup inside the shell script library:
## =head1 Intro
##
## This is a library with useful shell script functions.
##
## =head1 Functions
##
## This section contains the functions available.
##
## =head2 firstFunction "an argument"
## 
##  # example usage
##  firstFunction "some argument"
##
## This is the first function which accepts one argument.
##
firstFunction() {
    typeset arg1=$1
    ...
}

## =head2 secondFunction "an argument"
##
##  #example again
##  secondFunction "argument"
##
## Also the second function is important.
##
secondFunction() {
    typeset arg1=$1   
    ...
}

# I can still write normal, single-hash comments
...
The double-hash comments are reserved for POD markup; normal single-hash comments keep their original purpose. I prefer this style of embedding over the HERE document variant because (as with e.g. Javadoc) the documentation consists only of specially formatted comments, nothing else.

To convert the shell script into HTML documentation, the command line is now slightly longer:
cat pod_example.sh | egrep '^##' | sed 's/^##\s\?//g' | \
pod2html > pod_example.html
egrep keeps only the POD markup lines (those beginning with a double hash); sed then removes the double hash and, if present, the single whitespace character that follows it. The result is plain POD, which is fed directly into pod2html.
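
One caveat: \s and \? in the sed expression are GNU extensions, so the pipeline may not behave the same with other sed implementations. Here is a minimal sketch of a more portable variant, assuming a POSIX sed. The function name extract_pod is my own invention for illustration:

```shell
# extract_pod FILE (hypothetical helper): print the POD embedded in '##' comments.
extract_pod() {
    # -n suppresses the default output; the 'p' flag prints only the lines on
    # which the substitution matched, so no separate grep step is needed.
    # [[:space:]]\{0,1\} is the POSIX way to say "one optional whitespace character".
    sed -n 's/^##[[:space:]]\{0,1\}//p' "$1"
}
```

It would then be used as: extract_pod pod_example.sh | pod2html > pod_example.html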

The result is as nice as always:

Example HTML, this time generated out of POD markup inside a shell script

Saturday, 11 January 2014

Poor man's error handling with goto

I quite often deal with code built from C functions that return their status as an integer. That way a parent function can check whether its child function calls returned an error. The parent function then returns "0" if everything ran well and any other number otherwise. With this design you can build a call stack which gives the user (some logging provided) an idea of where an error happened.

This is a simple example:
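// in some header:
#define OK 0
#define NOT_OK 1

// declared elsewhere; each returns OK on success
int calcSomething(void);
int calcSomethingElse(void);

// in the source file:
#include <stdio.h>

int doCalculations(void)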
{
    int rc = 0;

    rc = calcSomething();

    if ( rc != OK )
    {
        printf("function calcSomething had an error.\n");
        return NOT_OK;
    }

    rc = calcSomethingElse();

    if ( rc != OK )
    {
        printf("function calcSomethingElse had an error.\n");
        return NOT_OK;
    }

    return OK;
}
The problem here is that the flow of the business logic is hard to follow because the error handling is interleaved with it. To improve readability I started to separate the business logic from the error handling inside the function using goto:
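// in some header:
#define OK 0
#define NOT_OK 1

// declared elsewhere; each returns OK on success
int calcSomething(void);
int calcSomethingElse(void);

// in the source file:
#include <stdio.h>

int doCalculations(void)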
{
    int rc = 0;

    rc = calcSomething();
    if ( rc != OK ) goto error_calcSomething;

    rc = calcSomethingElse();
    if ( rc != OK ) goto error_calcSomethingElse;

    return OK;


// error handling    
error_calcSomething:
    printf("function calcSomething had an error.\n");
    return NOT_OK;

error_calcSomethingElse:
    printf("function calcSomethingElse had an error.\n");
    return NOT_OK;
}
Now the upper part of the function deals mostly with the business logic (you can't get rid of the return code checking entirely), while the lower part is used purely for error handling. This technique is quite popular; I also spotted it in the Linux kernel sources.

In my defense I need to state: I use the evil goto for one purpose only, namely error handling and freeing locally allocated heap data - nothing else.

By the way: glibc offers an alternative way to print a backtrace with its backtrace function. However, it is not part of the ANSI C standard, in case that matters to you.