Sonntag, 30. November 2014

Quick C Tricks - Structured Include Path

In a larger C project you might want to have your API (=the collection of your header files) more structured instead of throwing all the header files into one flat directory.

The cpputest project is using a nice and simple approach. Their software (a C and C++ unit test library we are using at work) consists out of core functions and extended functionality.

Here is a sketch of the include path layout:
/.../cpputest/include/
                      CppUTest/
                              MemoryLeakDetector.h
                              TestHarness.h
                                ...
                      CppUTestExt/
                              GMock.h
                              MockFailure.h
                              ...  
The compiler gets the main include directory -I/.../cpputest/include/. In your code (for example to start writing a unit test) you include the remaining path:
#include <CppUTest/TestHarness.h>
... 
Looking at this include line you instandly know that TestHarness.h belongs to CppUTest and (if you are familiar with the library) it is part of the core functionality (=not inside CppUTestExt).

This structuring approach gives you simple namespacing of your headers - which also your Java co-works will appriciate ;-)

Freitag, 28. November 2014

Quick C Tricks - Explicit Declaration Of String Terminator Byte

In C a string is an array of single characters terminated by the NUL-value ('\0'). I came across the following notation to clearly communicate this fact:
// declaring a 10 byte long string buffer which 
// will be later terminated by '\0'
char myString[10+1];
Of couse I could have written the above line like this:
// same thins as above but not that obvious
char myString[11];
It's a cosmetical thing - but they help to improve the readability of your code.

Dienstag, 25. November 2014

Learning Tests - Something you Shouldn't Smile About

I admit - at university I was one of the guys who would rarely attent a non-technical course. Just because I thought that this is the easy stuff - I don't need to waste my time with that.

Well, this aditute has changed.  Over the years I've learned the benefits of reflecting your work style, your personal progress and so on. Attempting to follow the principles of the agile manifesto is one of these findings.

The topic of this post are Learning tests, something a collegue of mine introduced to me and which I problably would have smiled about in university since it is not technical in itself - but it is a nice and easy way to invest your (usually limited) coding time carefully.

So what is it all about?Learning tests is about using your existing unit test framework to have a play with some new library function or class you haven't used before (or a long time ago). You are writing tests not against your own implementation but against the currently not so well understood new library function/ class you are planing to use in your own implementation later.

Learning tests are a kind of a step in between - the unit tests for a new feature you are planning to implement are there but you feel that you should examine the internal mechanics you will utilize for the implementation a little bit further (e.g. some complicated external library).
At this stage you slide in one ore more Learning tests which are writting using your every day unit testing frame work and hence live on with all the other unit tests. In your Learning tests you have a play with the unknown library until you understand how to employ it in your own work. You then move on to your actual implementation.

That's all about it. Nice, simple and definitly helpful.

Freitag, 21. November 2014

Fibonacci numbers with Oracle WITH clause (SQL recursion)

While studying the Declaritive Programming section of Composing Programs, I stumbled accross a SQL example which calculated the fibonacci numbers using the SQL WITH clause. Since this is at work my major secret weapon to tackle complex queries I changed the example to run on Oracle SQL:

WITH compute_fib(previous, curr) AS
(
  SELECT 0, 1 FROM dual UNION ALL
  SELECT curr, previous+curr FROM compute_fib WHERE curr < 60
)
-- Oracles endless loop protection
CYCLE previous, curr SET is_cycle TO '1' DEFAULT 'Y' 

SELECT previous FROM compute_fib;

For those of you who wonder what this CYCLE expression stands for this is a good summary:

When a node is encountered for the second time, it will be included in the result set, its column value IS_CYCLE (a new column [...] with the statement above) is set to Y and the recursion stops then and there – so there will not be a second iteration starting at this node. Note however that any node where a cycle starts is included in the result set twice. [source]

This example is also introducing a nice aspect of the WITH clause I wasn't aware of so far - you can provide the column names of the subquery as arguments. See those two snippets for what I mean:

-- this is how I used to name the columns of a subquery.
-- the first SELECT of a UNION also defines the column names
WITH old_way AS
(
  SELECT 1 AS first_row, 
         2 AS second_row 
  FROM dual 
  UNION ALL
  SELECT 3, 4 FROM dual
)
SELECT * FROM old_way;


-- this is much clearer way where the subquery 
-- explicitly defines the column names
WITH new_way(first_row, second_row) AS
(
  SELECT 1, 2 FROM dual UNION ALL
  SELECT 3, 4 FROM dual
)
SELECT * FROM new_way;

Sonntag, 2. November 2014

Reflections on SICP (Part 1) - Function dispatching in C

Yesterday I've committed the last work for the Scheme project of SICP. Work, namely a lot of mighty C coding is mainly responsible for not having finished the course yet. However, since I am in the last quarter of the course material I decided to recap from time to time - what are the things I've learned, what was good, what was surprising and so on.

One of my favorite learnings is the idea of a dispatch table. Instead of having a long list of if/elseif branches which direct to a certain function the idea is to have this dispatching being done in a hash table. Since the table is dynamic the dispatching can be changed at run time (late binding).

Although I appreciate the "get my quickly from A to B" qualities of Python most of my daily business is based on good old C and Java. In this post I want to sketch out how function dispatching could be implemented in C - as usual as high level as possible.

Here is a naive Python example of using a dispatch table. See the last code example of this post of a more Pythonic version. This following code will be the template of our transformation to C later:

#!/usr/bin/python3

def _sumAll(numberList):
    sum = 0
    for number in numberList:
        sum = sum + number
    
    return sum

def _sumEven(numberList):
    sum = 0
    for number in numberList:
        if number % 2 == 0:
            sum = sum + number

    return sum

def _multiplyAll(numberList):
    prod = 1
    for number in numberList:
        prod = prod * number
    
    return prod


dispatchTable = {
    'sumAll' : _sumAll,
    'sumEven' : _sumEven,
    'multiplyAll' : _multiplyAll
}

def accumulateList(method, numberList):
    return dispatchTable[method](numberList)

numbers = (2,4,5,6)

print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))

Dispatching takes place in accumulateList where the dictionary dispatchTable is being looked up for the entry method. The function returned  is directly executed, hence the numberList argument.

So let'S look at C. As a friend of high-level expressions I'm using as usual GLib data structures. GList for the number list and GHashtable for the dispatch table.
#include <stdio.h>
#include <glib-2.0/glib.h>

static GHashTable* dispatchTable;

int _sumAll(GList* numberList) {
    int sum = 0;
    GList *iter = numberList;

    while(iter != NULL) {
        sum = sum + GPOINTER_TO_INT(iter->data);
        iter = g_list_next(iter);
    }

    return sum;
}

int _sumEven(GList* numberList) {
    int sum = 0;
    GList *iter = numberList;

    while(iter != NULL) {
        int number = GPOINTER_TO_INT(iter->data);
        if (number % 2 == 0) {
            sum = sum + number;
        }
        iter = g_list_next(iter);
    }

    return sum;
}

int _multiplyAll(GList* numberList) {
    int prod = 1;
    GList *iter = numberList;

    while(iter != NULL) {
        prod = prod * GPOINTER_TO_INT(iter->data);
        iter = g_list_next(iter);
    }

    return prod;
}

void createDispatchTable() {
    dispatchTable = g_hash_table_new(g_str_hash, g_str_equal);

    g_hash_table_insert(dispatchTable, g_strdup("sumAll"), 
                        _sumAll);
    g_hash_table_insert(dispatchTable, g_strdup("sumEven"), 
                        _sumEven);
    g_hash_table_insert(dispatchTable, g_strdup("multiplyAll"), 
                        _multiplyAll);
}

int accumulateList(const char* method, GList* numberList) {
    int (*func)(GList*);
    func = g_hash_table_lookup(dispatchTable, method);

    return func(numberList);
}

void destroyDispatchTable() {
    g_hash_table_destroy(dispatchTable);
}

int main() {
    GList* numbers = NULL;
    createDispatchTable();

    numbers = g_list_append(numbers, GINT_TO_POINTER (2));
    numbers = g_list_append(numbers, GINT_TO_POINTER (4));
    numbers = g_list_append(numbers, GINT_TO_POINTER (5));
    numbers = g_list_append(numbers, GINT_TO_POINTER (6));

    printf("%i\n",accumulateList("sumAll", numbers));
    printf("%i\n",accumulateList("sumEven", numbers));
    printf("%i\n",accumulateList("multiplyAll", numbers));

    destroyDispatchTable();
    g_list_free(numbers);

    return 0;
}

The code structure is close to the Python example before. A remark for the strange GINT_TO_POINTER and GPOINTER_TO_INT macros: This is the GLib way of storing and retrieving integers in their data structures. For more details refer to the GLib manual.

createDispatchTable uses a GLib GHashtable to store string to function pointer pairs.

accumulateList is retrieving the right function pointer by looking up the dispatch table. As the next step the function is executed and it's result is directly returned.

Usually I would append a common prefix to the public functions createDispatchTable(),
accumulateLis()  and destroyDispatchTable() to indicate that they belong to one module (see here for details) but for this example I tried to keep it simple.

So what do we think about that. Well, of course we could add some macro magic to shrink certain function calls (like the g_list_append) but I tend to use the C language straight, falling back on the preprocessor only if I can't achieve my goals directly.

The reason for that is more related to (good) code style. A macro always adds a new level of indirection which needs to be resolved by a human brain when trying to understand the code.  Eclipse CDTs macro expansion feature is definitly helpful here but I rather live without personal syntactic sugar.

The matter is different little fellows like GINT_TO_POINTER and GPOINTER_TO_INT. They represent a feature which can't be accomplished by using the language straight - so their fine.

I personally would like to see the definition of the dispatch table somehow outside a function, more prominent and if I could choose defined by something like designated initializers.

The GHashtable approach does not support that so we would need to replace it by something like a struct (e.g. DispatchTableEntry) and an array of these DispatchTableEntry struct.  The accumulateList function would then iterate over this array of structs looking for the right entry.

I tried that approach but it has a couple of downsides. One thing is that for simple usage of designated initializers the dispatch table array ideally has a fixed size. My personal favorite, a NULL terminated array of unknown length would lead to a rather odd syntax. So there is a trade-off here.

Also, implementing my own lookup functionality in accumulateList isn't what I was looking for - I wanted to use this out of the shelf.

An other obvious approach would be to define enums for the function names and use them as an index of an ordinary C array where the value would be the according function pointer. The lookup would be just a simple array index access using the enum as well.

Beside the missing runtime flexibility (you can't add enums at runtime) this approach has the problem that you maintain data at two sections: a new dispatch route would mean an entry in enums (new function name) and an extra entry in the array (the actual dispatch entry).

To my knowledge the best solution for this double data maintenance issue are X macros whose philosophy dates back to the 1960s. When you're in a pure C environment (poor you) this would be my way to go.

After weighting my options I stuck with the C implementation above. To add a new dispatch route function createDispatchTable has to be extended. Everything else is using battle proof implementations (GLib functions) in favour of (once again) rolling my own stuff.

To summarize the C example: this is as good as it gets in (high-level) C.

To conclude I promised to provide a Pythonic version of our example:
#!/usr/bin/python3

from operator import add,mul
from functools import reduce

onlyEvenFilter = lambda x: x % 2 == 0

dispatchTable = {
    'sumAll' : lambda list: reduce(add, list),
    'sumEven' : lambda list: reduce(add, filter(onlyEvenFilter, list)),
    'multiplyAll' : lambda list: reduce(mul, list)
}

def accumulateList(method, numberList):
    return dispatchTable[method](numberList)

numbers = (2,4,5,6)

print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))

That is more then 80 line of code in C versus 21 in Python - the same functionality, expressed more clearer (in my opinion) in 1/4 of the code ;-)