Thursday, 19 September 2013

The C Module Pattern

Update Feb. 2015: As I have learned in the meantime, the "thing" described below has an official name. It is called an "Abstract Data Type" and is nowadays the way one should design one's C code. See "C Interfaces and Implementations: Techniques for Creating Reusable Software" for more on that.

I mentioned before that I am currently reading the book Test Driven Development for Embedded C. An unexpected but very welcome outcome of this study is a much clearer idea of how to structure C modules. Considering the amount of C code which is around in this world, it is astonishing that there apparently is no common pattern for this task.

The approach proposed by James Grenning, the author of the book, is close to what we consider object oriented. However, it is far simpler than GNOME's GObject interpretation of OO.

The module pattern is all about a clear separation of modules to support loose coupling, predictable module and function names, and standardized module constructors and destructors.

There are two variations of the pattern, which build on top of each other. The first one is the single instance module, which is presumably the most common one. If more than one instance of a module is required at the same time, the multiple instance module variant is the one to choose. However, they don't differ that much.

Common Rules

Both variants have a couple of things in common.

Dependency Inversion Via Interfaces


This might sound odd in the world of C, but the language has all we need to follow the dependency inversion principle. Under this principle, a module does not depend directly on another module but on its interface.

In C terms, a client module only uses the functions and constants provided by the header file of the module it depends on. The client doesn't care about the implementation of a function. It relies only on the declaration in the header.

This is already common practice when we use libraries like stdlib - we don't care about the implementation of the functions offered in stdlib.h but we simply use them.

 

Information Hiding Inside The Module


To hide module-internal variables and functions inside the module, they are marked with the C keyword static. Also, the declaration of private functions (forward declaration) takes place at the top of the module file, not inside the header. This thinking was new to me but makes much more sense: the interface (header) only contains the module's communication with the outside world, nothing else.
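A minimal sketch of this rule, assuming a made-up Thermostat module (the module, its functions and the temperature limits are hypothetical and not taken from the book). For brevity the header part and the implementation part are shown in one listing:

```c
#include <assert.h>

/* --- what would live in Thermostat.h: the public interface only --- */
void Thermostat_SetTarget( int celsius );
int  Thermostat_GetTarget( void );

/* --- everything below would live in Thermostat.c --- */

/* private constants and state, invisible to other translation units */
enum { MIN_CELSIUS = 5, MAX_CELSIUS = 30 };
static int targetCelsius = 20;

/* forward declaration of a private helper: top of the .c file, not the header */
static int clampToRange( int celsius );

/* Public Functions */
void Thermostat_SetTarget( int celsius ) {
    targetCelsius = clampToRange( celsius );
    /* invariant: the stored value always stays inside the allowed range */
    assert( targetCelsius >= MIN_CELSIUS && targetCelsius <= MAX_CELSIUS );
}

int Thermostat_GetTarget( void ) {
    return targetCelsius;
}

/* Private Functions */
static int clampToRange( int celsius ) {
    if ( celsius < MIN_CELSIUS ) return MIN_CELSIUS;
    if ( celsius > MAX_CELSIUS ) return MAX_CELSIUS;
    return celsius;
}
```

A client that only includes the header can call the two Thermostat_ functions, but any attempt to call clampToRange() or to touch targetCelsius from another file fails at link time.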

Always Constructors And Destructors


This was also a new concept for me which I loved from the first minute. Users of the module always initialize the module with a Create function and clean up the module data later with a Destroy function. Again, this rule always applies, even if for the moment one of the functions has only a stub implementation.

What we gain here is a clear, predictable way of opening and closing the communication with the module. Now it is much harder to forget to free data, since this is what the module's destructor will usually do for us.
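As an illustration, here is a small sketch of a module whose constructor allocates a buffer and whose destructor releases it. MessageLog and all of its functions are invented for this example, and error handling is reduced to assert() to keep it short:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- public interface (would be MessageLog.h, a hypothetical module) --- */
void        MessageLog_Create( void );
void        MessageLog_Append( const char* text );
const char* MessageLog_Get( void );
void        MessageLog_Destroy( void );

/* --- private module data --- */
enum { CAPACITY = 256 };
static char* buffer = NULL;

void MessageLog_Create( void ) {
    /* calloc zeroes the memory, so the log starts as an empty string */
    buffer = calloc( CAPACITY, 1 );
    assert( buffer != NULL );   /* sketch only: real code should report this */
}

void MessageLog_Append( const char* text ) {
    assert( buffer != NULL );   /* Create must have been called first */
    size_t used = strlen( buffer );
    /* append only what still fits, leaving room for the terminating '\0' */
    strncat( buffer, text, CAPACITY - used - 1 );
}

const char* MessageLog_Get( void ) {
    assert( buffer != NULL );
    return buffer;
}

void MessageLog_Destroy( void ) {
    /* the destructor owns the cleanup, the client never calls free itself */
    free( buffer );
    buffer = NULL;
}
```

The client brackets every use with MessageLog_Create() and MessageLog_Destroy() and never sees malloc or free.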

Module And Function Naming


The following rules are simple and effective. Modules have a meaningful name, like Database.c, which lets us assume that this module deals with the database. So far so good. As the next step, the module's public functions use the name of the module as their prefix. The function GetOrderData() thus becomes Database_GetOrderData(). With this notation it is easy to see where GetOrderData was implemented.

Remark One


This rule has a downside: if you try to give your functions meaningful names to avoid additional comments (as you should do as a clean coder) and you've got a lot of parameters (which is not good style anyway, but unfortunately harder to get around in C than e.g. in JavaScript with its instant JSON objects), then your function's signature might get quite long and is likely to break the 80 characters line width rule.

In that case I spread the function signature over multiple lines (see examples below). As I tend to be obsessed with clean, verbose code, I don't like that, but I keep following this rule anyway, since in my opinion the clear structure I gain outweighs this downside.

Remark Two


Now it is getting slightly esoteric ;-) but since I strongly believe that code style matters, let's take a closer look at the function names:

I usually prefer the Java style camel case notation where methods and functions start lower case whereas classes and interfaces begin with a capital letter. For our C function naming rule here there are two possible ways to go:
  • Database_getOrderData()
  • Database_GetOrderData()
The first one transfers the Java notation into our naming rule. However, I decided to go for the latter one, which I believe is a bit quicker to grasp when you read the code. This is also the notation proposed by James Grenning, author of Test Driven Development for Embedded C.

Single Instance Module

For this example there won't be an implementation, only the public interface (the header file) is presented.

// RecordCollection.h - Single Instance

#ifndef D_RecordCollection_H
#define D_RecordCollection_H

void RecordCollection_Create( void );
void RecordCollection_Add( const char* artist, const char* title );
void RecordCollection_PrintContainsArtist( const char* artist );
void RecordCollection_Destroy( void );

#endif  /* D_RecordCollection_H */

The module RecordCollection contains a constructor and a destructor function (RecordCollection_Create() and RecordCollection_Destroy()). Besides that, there is a function to add a new record to the collection and one function to display whether or not an artist is present in the collection.

The interface does not reveal how RecordCollection is organized internally. We don't know (and we don't want to know) if the module is using a struct to store its internal data or maybe something else. As its users, the only thing we get is a simple instruction on how to work with that module.
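The post deliberately leaves the implementation open. Purely as an illustration, one possible implementation could look like the sketch below. It is an assumption, not the book's code: it stores the records in a private linked list, uses no GLib, and reduces error handling to assert().

```c
/* RecordCollection.c - single instance, one possible sketch without GLib */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* normally pulled in via #include "RecordCollection.h" */
void RecordCollection_Create( void );
void RecordCollection_Add( const char* artist, const char* title );
void RecordCollection_PrintContainsArtist( const char* artist );
void RecordCollection_Destroy( void );

/* private type and data: one list for the whole program */
typedef struct Record {
    char* artist;
    char* title;
    struct Record* next;
} Record;

static Record* records = NULL;

/* private function declarations */
static char* copyString( const char* text );
static int containsArtist( const char* artist );

/* Public Functions */
void RecordCollection_Create( void ) {
    records = NULL;
}

void RecordCollection_Add( const char* artist, const char* title ) {
    Record* record = malloc( sizeof *record );
    assert( record != NULL );
    record->artist = copyString( artist );
    record->title  = copyString( title );
    record->next   = records;   /* prepend to the list */
    records = record;
}

void RecordCollection_PrintContainsArtist( const char* artist ) {
    puts( containsArtist( artist ) ? "Yepp, got it." : "No, not found." );
}

void RecordCollection_Destroy( void ) {
    while ( records != NULL ) {
        Record* next = records->next;
        free( records->artist );
        free( records->title );
        free( records );
        records = next;
    }
}

/* Private Functions */
static char* copyString( const char* text ) {
    char* copy = malloc( strlen( text ) + 1 );
    assert( copy != NULL );
    strcpy( copy, text );
    return copy;
}

static int containsArtist( const char* artist ) {
    for ( const Record* r = records; r != NULL; r = r->next ) {
        if ( strcmp( r->artist, artist ) == 0 ) return 1;
    }
    return 0;
}
```

Because the list lives in a single static variable, every caller shares the same collection, which is exactly the limitation the multiple instance variant removes.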

Multiple Instance Module

The previously introduced single instance module has one drawback - we can't use more than one at a time. Sticking to the analogy of our example, I can't have a Long Player (LP, 12 inch vinyl) object and a Singles (7 inch) object at the same time. With the single instance module it is all one.

To keep separate lists of our vinyl we have to convert our RecordCollection module into a multiple instance module. For that purpose our interface looks like this:
// RecordCollection.h - Multiple Instance

#ifndef D_RecordCollection_H
#define D_RecordCollection_H

#include <glib.h>  /* GHashTable is part of the interface */

GHashTable* RecordCollection_Create( void );
void RecordCollection_Add( GHashTable* collection, 
                           const char* artist, 
                           const char* title ); 
void RecordCollection_PrintContainsArtist( GHashTable* collection, 
                                           const char* artist ); 
void RecordCollection_Destroy( GHashTable* collection ); 

#endif  /* D_RecordCollection_H */
On creation, the constructor RecordCollection_Create() returns a pointer to a GHashTable object somewhere in memory. The destructor RecordCollection_Destroy() in turn accepts a pointer to this object to free the memory it occupies.

The remaining two user functions have almost the same signature as their counterparts in the single instance example - except for the newly added first argument, which passes the current instance of our RecordCollection to the function.

To finish this section, this time I'll give you a simple implementation of RecordCollection plus a client program using it. After the code example I finish the post with a discussion of the methodology presented, so bear with me.
// RecordCollection.c

#include <glib.h>
#include <stdio.h>
#include <stdlib.h>
#include "RecordCollection.h"

// declaration of private function inside module, 
// not visible in the interface (header)
static gboolean containsArtist( GHashTable* collection, 
                                const char* artist );

GHashTable* RecordCollection_Create( void ) {
    // keys and values are created with g_strdup, so they are released with g_free
    GHashTable* collection = g_hash_table_new_full( g_str_hash,  
                                                    g_str_equal,
                                                    g_free,
                                                    g_free );
    return collection;
}

// Public Functions
void RecordCollection_Add( GHashTable* collection, 
                           const char* artist, 
                           const char* title ) {

    g_hash_table_insert( collection, 
                         g_strdup( artist ), 
                         g_strdup( title ) );
}

void RecordCollection_PrintContainsArtist( GHashTable* collection, 
                                           const char* artist ) {

    if ( containsArtist( collection, artist ) ) {
        printf( "Yepp, got it.\n" );
    }
    else {
        printf( "No, not found.\n" );
    }   
}

void RecordCollection_Destroy( GHashTable* collection ) {
    g_hash_table_destroy ( collection );
}

// Private Functions
static gboolean containsArtist( GHashTable* collection, 
                                const char* artist ) {

    return g_hash_table_contains( collection, artist );
}
// simple application of RecordCollection
// with multiple instances

#include <glib.h>
#include "RecordCollection.h"

int main() {
   GHashTable* myLPs = RecordCollection_Create();
   GHashTable* mySingles = RecordCollection_Create();

   RecordCollection_Add( myLPs, "Marvin Gaye", 
                                "What's Going On" );

   RecordCollection_Add( myLPs, "Baltic Fleet", 
                                "Towers" );

   RecordCollection_Add( mySingles, "Josh Rouse", 
                                    "Winter in the Hamptons" );

   RecordCollection_Add( mySingles, "Team 4", 
                                    "Ich zeig den Weg" );

   // Yepp 
   RecordCollection_PrintContainsArtist( myLPs, "Marvin Gaye");

   // No
   RecordCollection_PrintContainsArtist( myLPs, "Team 4");

   // Yepp
   RecordCollection_PrintContainsArtist( mySingles, "Team 4");

   RecordCollection_Destroy( myLPs );
   RecordCollection_Destroy( mySingles );
}
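The header above still tells the client that a GHashTable is used internally. A common variation, in line with the "Abstract Data Type" idea mentioned in the update note at the top, hides this behind an incomplete struct type. The following sketch is an assumption and not the code from the book: it replaces GLib with a small growable array, and it swaps the printing function for a hypothetical RecordCollection_ContainsArtist() that returns a flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- RecordCollection.h (variation): only an incomplete type is exposed --- */
typedef struct RecordCollection RecordCollection;

RecordCollection* RecordCollection_Create( void );
void RecordCollection_Add( RecordCollection* collection,
                           const char* artist,
                           const char* title );
int  RecordCollection_ContainsArtist( const RecordCollection* collection,
                                      const char* artist );
void RecordCollection_Destroy( RecordCollection* collection );

/* --- RecordCollection.c: the struct layout is known only in here --- */
typedef struct {
    char* artist;
    char* title;
} Record;

struct RecordCollection {
    Record* records;
    size_t  count;
    size_t  capacity;
};

static char* copyString( const char* text ) {
    char* copy = malloc( strlen( text ) + 1 );
    assert( copy != NULL );
    strcpy( copy, text );
    return copy;
}

RecordCollection* RecordCollection_Create( void ) {
    RecordCollection* collection = malloc( sizeof *collection );
    assert( collection != NULL );
    collection->records  = NULL;
    collection->count    = 0;
    collection->capacity = 0;
    return collection;
}

void RecordCollection_Add( RecordCollection* collection,
                           const char* artist,
                           const char* title ) {
    if ( collection->count == collection->capacity ) {
        /* grow the array: start with 4 slots, then double */
        size_t newCapacity = collection->capacity ? collection->capacity * 2 : 4;
        Record* grown = realloc( collection->records,
                                 newCapacity * sizeof *grown );
        assert( grown != NULL );
        collection->records  = grown;
        collection->capacity = newCapacity;
    }
    Record* slot = &collection->records[ collection->count++ ];
    slot->artist = copyString( artist );
    slot->title  = copyString( title );
}

int RecordCollection_ContainsArtist( const RecordCollection* collection,
                                     const char* artist ) {
    for ( size_t i = 0; i < collection->count; i++ ) {
        if ( strcmp( collection->records[i].artist, artist ) == 0 ) return 1;
    }
    return 0;
}

void RecordCollection_Destroy( RecordCollection* collection ) {
    if ( collection == NULL ) return;
    for ( size_t i = 0; i < collection->count; i++ ) {
        free( collection->records[i].artist );
        free( collection->records[i].title );
    }
    free( collection->records );
    free( collection );
}
```

With this header the client program keeps the same shape (two Create calls, several Add calls, two Destroy calls), but it no longer needs glib.h, and the storage could be changed later without touching any client code.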

Discussion

The multiple instance module is as close as we get to objects and object methods in plain C. To get a feeling for how my RecordCollection might have looked in Java, I've sketched out an interface and an example usage of the RecordCollection object, leaving out its implementation:
// possible interface definition in Java

public interface RecordCollectionInterface
{
    void add( String artist, String name);
    void printContainsArtist( String artist );
}
// creating and using the RecordCollection object in Java
// fortunately the garbage collector takes care of the destruction

RecordCollection myLPs = new RecordCollection();
myLPs.add("Marvin Gaye", "What's Going On");
myLPs.printContainsArtist("Marvin Gaye");
You can compare for yourself, but in Java (or any other OO language) you can basically achieve the same functionality with less code - however, that's no news. I won't enter the performance discussion, though.

But still, for me the good news is, that there are ways to write C modules which come quite close to the behavior of objects and object methods in OO languages - of course leaving inheritance completely out.  I can accept the additional syntactical effort required. I mean, did you ever try to run Java on your Arduino?

Monday, 16 September 2013

Inserting "Spy" Code in Plain C With Link Time Substitution

I'm currently working my way through the highly recommendable book Test Driven Development For Embedded C. Although I'm not applying the techniques in an embedded environment, most of the book is still valuable for general C programming.

While test driving code one often has to spy on objects. That means calling a method or function of a library which is usually not under our control, while still verifying the result of that call.
An example would be a logging function provided by a third party library. You can't change the library, but you are still interested in the result of your code (the code under test) calling this logging function with a certain string.

In Java this problem is resolved by employing the dependency inversion principle. The third party logging library is not used directly inside the code. Instead, an interface is defined which offers its own logging function.
Two objects now implement this interface. The first one is the production object, which in our example is probably just a wrapper around a call to the original logging library.
The second object is a spy object (officially called a test double) which implements the same interface but offers additional methods such as getLastLoggingMessage.
The original application code is changed to work only with an object of the type of the new interface - not with the logging class directly. With the additional details provided by the test object you can then continue writing your test logic. This methodology is called dependency injection.

Interestingly enough, this technique is also possible in a pure C environment. I might cover the whole approach in another post in the future. Here, I want to demonstrate how the technical part of dependency injection can be done. In my opinion the most convenient way to achieve this in C is a method called link time substitution.

The idea is to let the compiler (well, the linker part of the compiler) pick your spy implementation instead of the original implementation.

But first, one step back. Compiling C files takes place in two stages: 1) compiling (roughly, transforming the C code into an object file) and 2) linking (copying the object code of the functions needed into the final binary). Link time substitution happens in the second stage - at the linker level. In my opinion it is not totally obvious how exactly this works, so let's have a little example here.
// program.c

#include "fav_music.h"

int main() {
   tellFavoriteMusic(); 
}
This is our main program, which calls the function tellFavoriteMusic. As can be seen, this function is not implemented in the same file but is provided somewhere else. To tell the compiler about tellFavoriteMusic, the header file fav_music.h is included.
// fav_music.h

void tellFavoriteMusic( void );
The header file contains only the function's signature. Now there are two ways of interpreting what is happening here.

A C programmer would say that the content of fav_music.h is simply inserted into program.c before anything else happens. Since the declaration of the function now appears in the file before its invocation in main, the compiler (first stage) is happy with the setup. This is called a forward declaration.
program.c doesn't need to see the actual implementation of tellFavoriteMusic at this stage. It is later up to the linker to figure that out.
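The effect of a forward declaration can be seen in a single file as well. In the following sketch (the function names are made up for this illustration) the caller is compiled before the compiler has seen any function body:

```c
#include <assert.h>
#include <string.h>

/* declaration only: this is the kind of line an #include pastes in */
const char* favoriteMusic( void );

/* compiles fine although no body of favoriteMusic has been seen yet */
size_t favoriteMusicLength( void ) {
    const char* music = favoriteMusic();
    assert( music != NULL );   /* strlen must not be handed a null pointer */
    return strlen( music );
}

/* the definition follows later; in the post it lives in another object file */
const char* favoriteMusic( void ) {
    return "Indie";
}
```

If the definition at the bottom were removed, the compile stage would still succeed and only the link stage would report an undefined reference.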

A Java programmer would probably say that program.c is programmed against the fav_music.h interface without knowing its exact implementation. Since the Test Driven Development principles come from an object oriented background (thinking of the good old Smalltalk days), I prefer the latter, rather high level explanation in this context.

As next step we create two implementations of tellFavoriteMusic in two separate files which will be eventually compiled into two distinct object files:
//fav_music_indie.c
 
#include <stdio.h>

void tellFavoriteMusic() {
    printf("I like Indie!!!\n");
}
//fav_music_soul.c
 
#include <stdio.h>

void tellFavoriteMusic() {
    printf("I like Soul!!!\n");
}
Generating the object files for the three source files is straightforward:
$ gcc -c program.c fav_music_indie.c fav_music_soul.c 
$ ls *.o
fav_music_indie.o  fav_music_soul.o  program.o
So let's recap what we've done. We transformed (compiled) our three C files into object files. The compiler (first stage) did not bother about the double implementation of tellFavoriteMusic, since at stage one it only looks at the current source file.

Let's step to stage two - linking. program.o currently contains a reference which says "I need an implementation of tellFavoriteMusic to execute." It is the task of the linker to find this implementation in the available object and library files.
So let's start with the obvious approach and see what happens:
$ gcc -o program.out fav_music_indie.o fav_music_soul.o program.o 
fav_music_soul.o: In function `tellFavoriteMusic':
fav_music_soul.c:(.text+0x0): multiple definition of `tellFavoriteMusic'
fav_music_indie.o:fav_music_indie.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
Not surprisingly, the linker complains about the double implementation of tellFavoriteMusic. The situation does not change when the object files are moved to different positions on the command line.

To resolve this issue, we need to look at how the linker proceeds. First of all, file order counts! In the above example, fav_music_indie.o is processed first, then fav_music_soul.o and then program.o. Secondly, a double implementation of a function in object files (as in our example) always gives an error, since the linker does not know which one to choose.

But there is a trick: if one implementation is stuffed into a static library (.a file), the linker first picks the implementation from the object file and only falls back to the implementation offered inside the library if the symbol is still unresolved - provided that the object file comes before the library on the command line.

We have a little play with that and generate an archive next:
$ ar cr fav_music_soul_lib.a fav_music_soul.o
$ ls *.a
fav_music_soul_lib.a
We have now a library fav_music_soul_lib.a which contains our object file fav_music_soul.o and therefore provides an implementation of the function tellFavoriteMusic.
$ gcc -o program.out program.o fav_music_soul_lib.a 
$ ./program.out
I like Soul!!!
The linker could puzzle everything together and provided us with the expected result. Let's get prepared for the last step.
Again, we offer the linker the second implementation of tellFavoriteMusic, but this time we place our object files first and have the library as the last argument:
$ gcc -o program.out fav_music_indie.o program.o fav_music_soul_lib.a
$ ./program.out 
I like Indie!!!
Now it has done the trick: the linker first picked the implementation of tellFavoriteMusic from the object file. By the time the linker hit the archive fav_music_soul_lib.a, the dependency on tellFavoriteMusic was already resolved, so it ignored the implementation offered there.

Coming back to the beginning of this post, with this little trick dependency injection is also working quite nicely with plain C.
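To connect this with the logging example from the beginning, here is a sketch of what a spy could look like. All file and function names (Logger.h, LoggerSpy, Startup_Run) are hypothetical, and the files are shown in one listing for brevity. In a real setup LoggerSpy.c would be an object file placed on the linker command line before the library that contains the production Logger_Log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- Logger.h: the interface both implementations share --- */
void Logger_Log( const char* message );

/* --- LoggerSpy.h: extra access that only the tests include --- */
const char* LoggerSpy_GetLastMessage( void );

/* --- Startup.c: the code under test, it knows only Logger.h --- */
void Startup_Run( void ) {
    Logger_Log( "startup complete" );
}

/* --- LoggerSpy.c: linked into the test binary instead of the real logger --- */
enum { MAX_MESSAGE = 128 };
static char lastMessage[MAX_MESSAGE] = "";

void Logger_Log( const char* message ) {
    assert( message != NULL );
    /* record the message instead of writing it anywhere;
       snprintf truncates overly long messages and always terminates */
    snprintf( lastMessage, sizeof lastMessage, "%s", message );
}

const char* LoggerSpy_GetLastMessage( void ) {
    return lastMessage;
}
```

A test then calls Startup_Run() and compares LoggerSpy_GetLastMessage() with the expected text, without the production logging library ever being involved.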

Thursday, 12 September 2013

False Alarm While Checking For Memory Leaks in GLib2 Hashmaps

After I finished a complex implementation using GLib2 hash tables, the memory checker valgrind reported several memory leaks. After spending some time tracking down the issue, it all came down to the following supposedly problematic code:
#include <glib.h>

int main() {
    GHashTable* item = g_hash_table_new_full( g_str_hash, g_str_equal, g_free, g_free );
    g_hash_table_insert( item, g_strdup("key") , g_strdup("value") );
    g_hash_table_destroy( item );
}
Valgrind complained about 1,512 possibly lost bytes:
valgrind --leak-check=full ./hash.out

==12624== Memcheck, a memory error detector
==12624== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==12624== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==12624== Command: ./hash.out
==12624== 
==12624== 
==12624== HEAP SUMMARY:
==12624==     in use at exit: 5,656 bytes in 12 blocks
==12624==   total heap usage: 17 allocs, 5 frees, 5,758 bytes allocated
==12624== 
==12624== 1,512 bytes in 3 blocks are possibly lost in loss record 9 of 10
==12624==    at 0x402A420: memalign (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==12624==    by 0x402A4DE: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==12624==    by 0x40592E1: ??? (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.3)
==12624==    by 0x40A573F: g_slice_alloc (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.3)
==12624==    by 0x4079EB1: g_hash_table_new_full (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.3)
==12624==    by 0x804883D: main (in /home/xubuntu/projects/playing_with_glib2/hash.out)
==12624== 
==12624== LEAK SUMMARY:
==12624==    definitely lost: 0 bytes in 0 blocks
==12624==    indirectly lost: 0 bytes in 0 blocks
==12624==      possibly lost: 1,512 bytes in 3 blocks
==12624==    still reachable: 4,144 bytes in 9 blocks
==12624==         suppressed: 0 bytes in 0 blocks
==12624== Reachable blocks (those to which a pointer was found) are not shown.
==12624== To see them, rerun with: --leak-check=full --show-reachable=yes
==12624== 
==12624== For counts of detected and suppressed errors, rerun with: -v
==12624== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

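After intensive experimenting and eventually consulting my favorite search engine, it turned out that newer versions of GLib2 (including mine) by default use a memory allocation mechanism (the slice allocator, visible as g_slice_alloc in the trace above) which leads to false alarms in valgrind, even if the hash table is destroyed correctly after usage.

To overcome this, set the two environment variables G_SLICE and G_DEBUG before calling valgrind: G_SLICE=always-malloc makes GLib allocate slices with plain malloc, and G_DEBUG=gc-friendly makes it reset freed memory to zero, which helps memory checkers. Read more here and here.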

With this test configuration valgrind is satisfied:
G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind --leak-check=full ./hash.out 

==12671== Memcheck, a memory error detector
==12671== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==12671== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==12671== Command: ./hash.out
==12671== 
==12671== 
==12671== HEAP SUMMARY:
==12671==     in use at exit: 2,116 bytes in 5 blocks
==12671==   total heap usage: 11 allocs, 6 frees, 2,274 bytes allocated
==12671== 
==12671== LEAK SUMMARY:
==12671==    definitely lost: 0 bytes in 0 blocks
==12671==    indirectly lost: 0 bytes in 0 blocks
==12671==      possibly lost: 0 bytes in 0 blocks
==12671==    still reachable: 2,116 bytes in 5 blocks
==12671==         suppressed: 0 bytes in 0 blocks
==12671== Reachable blocks (those to which a pointer was found) are not shown.
==12671== To see them, rerun with: --leak-check=full --show-reachable=yes
==12671== 
==12671== For counts of detected and suppressed errors, rerun with: -v
==12671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Tuesday, 10 September 2013

Initializing a Glib2 String Hash-Map

GLib2 is a big help when working with plain C. I had a hard time using a key/value hash map (in GLib2 it is called "hash table") for strings.
My naive, lazy approach was this one:
// the lookup fails here: result is NULL
#include <glib.h>
#include <stdio.h>

void testHashMap( const char* key, const char* value ) {

    GHashTable* test = g_hash_table_new( NULL, NULL );
    g_hash_table_insert ( test , g_strdup(key),  g_strdup(value) );

    char * result = g_hash_table_lookup( test,  "mykey" );

    printf("found %s\n", result );
}

int main() {
    testHashMap( "mykey", "myval");
}
As you can see I didn't really bother with the parameters of g_hash_table_new since I assumed that it would choose the correct hash function (first parameter) and the correct compare function (parameter two) automatically.

No, it doesn't. The manual for the function reveals that passing NULL makes the hash table work with plain pointers: keys are hashed and compared by their address. To let the hash table work with strings as expected, the hash function g_str_hash and the compare function g_str_equal need to be supplied.
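The difference between the two kinds of comparison can be shown without GLib. In the sketch below, pointerEqual and stringEqual are stand-ins written for this example; they only mimic the idea behind GLib's direct comparison and g_str_equal, they are not the library functions themselves:

```c
#include <assert.h>
#include <string.h>

typedef int (*EqualFunc)( const void* a, const void* b );

/* two keys are equal only if they are the very same address */
int pointerEqual( const void* a, const void* b ) {
    return a == b;
}

/* two keys are equal if they contain the same characters */
int stringEqual( const void* a, const void* b ) {
    return strcmp( a, b ) == 0;
}

/* returns 1 if a stored copy of "mykey" matches a lookup with "mykey" */
int keysMatch( EqualFunc equal ) {
    assert( equal != NULL );
    char stored[] = "mykey";        /* stands in for the g_strdup'ed key */
    const char* lookup = "mykey";   /* the literal used for the lookup */
    return equal( stored, lookup );
}
```

keysMatch( pointerEqual ) yields 0 because the copy lives at a different address than the literal, while keysMatch( stringEqual ) yields 1. This is the same reason why the lookup in the first listing comes back empty.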

This results in the following, now working code:
// this outputs "myval" as expected
#include <glib.h>
#include <stdio.h>

void testHashMap( const char* key, const char* value ) {

    GHashTable* test = g_hash_table_new( g_str_hash, g_str_equal );
    g_hash_table_insert ( test , g_strdup(key),  g_strdup(value) );

    char * result = g_hash_table_lookup( test,  "mykey" );

    printf("found %s\n", result );
}

int main() {
    testHashMap( "mykey", "myval");
}

Sunday, 8 September 2013

Review of "Software Sanierung" by Sebastian Kübeck - Part 1c - Refactoring

The extensive introduction of the book "Software Sanierung" by Sebastian Kübeck naturally has to cover the topic of refactoring as well. Since I have already dealt with test driven development (TDD) quite intensively in the past, there was hardly anything new for me in this section.

An overview of the refactorings covered:
  • Rename identifiers (meaningful names for variables, constants, classes...)
  • Extract method from a larger block of code
  • Inline method, the opposite of "extract": move the logic (after reorganizing it) back into the original method
  • Pull up method - moving a method from the child class into the parent class. Useful when several child classes use the same method.
  • Extract interface - creating a new interface and then deriving the existing class from this interface. Optionally, methods of the class can move into the interface as well. Many of the examples in the book rest on the dependency inversion principle, which demands that classes should, wherever possible, depend not on other classes but only on their interfaces. This refactoring plays in exactly that league.
  • Extract class - deriving a parent class from a class, which can optionally also take method implementations from the original class with it - quite in contrast to interfaces, which can only take over declarations. In the end, the last two refactorings are everyday business in object oriented development.
A Good Idea: Sketching Refactorings
Since it is not always clear at the outset what the best approach is, the author suggests "sketching" the upcoming refactoring: outside of version control, source files can be renamed, copied around and newly created. The result is not meant to be compilable code but rather an idea of the direction in which the refactoring should be driven.

Tuesday, 3 September 2013

Native character strings with Oracle Pro*C and C

There are times when you can't get around the Oracle Pro*C precompiler. It allows you to access an Oracle database from within your C programs. However, having seen things like JDBC in Java, Pro*C feels like technology from a past century.
Yes, the precompiler and I are not exactly friends. If you're allowed to introduce a third-party open source library, I highly recommend OCILIB for working with your Oracle database. It allows a more intuitive and C native way (without precompiling) to access the data.

Coming back to Pro*C. All code I came across was handling character strings with the Oracle supplied VARCHAR struct. This struct contains the actual data (arr field) and its length (len field). Before using the string, one has to NUL-terminate it manually, like in this example:
EXEC SQL BEGIN DECLARE SECTION;
VARCHAR string1[20];
VARCHAR string2[20];
EXEC SQL END DECLARE SECTION;

EXEC SQL DECLARE c cursor FOR
SELECT 'a test',
       ' an other test ' 
FROM dual;

EXEC SQL OPEN c;
EXEC SQL WHENEVER NOT FOUND DO break;

for (;;) {
    EXEC SQL FETCH c INTO :string1, :string2;

    // manually adding '\0' to terminate the strings
    string1.arr[string1.len] = '\0';
    string2.arr[string2.len] = '\0';

    printf("..%s..\n", string1.arr);
    printf("..%s..\n", string2.arr);
}
EXEC SQL CLOSE c; 
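Why the manual termination is needed becomes clearer in plain C. The sketch below is an approximation of the struct that the precompiler generates for a VARCHAR declaration. Varchar20, fakeFetch and terminate are names made up for this example, and fakeFetch only imitates what a FETCH does to the struct:

```c
#include <assert.h>
#include <string.h>

/* roughly what "VARCHAR string1[20];" turns into */
typedef struct {
    unsigned short len;
    unsigned char  arr[20];
} Varchar20;

/* stands in for EXEC SQL FETCH: fills arr and len, writes no terminator */
void fakeFetch( Varchar20* target, const char* columnValue ) {
    size_t length = strlen( columnValue );
    assert( length <= sizeof target->arr );
    memcpy( target->arr, columnValue, length );
    target->len = (unsigned short) length;
}

/* adds the terminator; returns 0 if the buffer is completely full
   and therefore has no room left for the '\0' */
int terminate( Varchar20* value ) {
    if ( value->len >= sizeof value->arr ) return 0;
    value->arr[value->len] = '\0';
    return 1;
}
```

The guard in terminate() points at a pitfall of the manual approach: if a fetched value fills all 20 bytes, writing to arr[len] would land one byte behind the array, so the buffer should be declared one byte larger than the longest expected value.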

After spending some hours with the Pro*C developer's guide, I discovered a more C-like way to accomplish the same thing. Basically, it uses a different character mapping, "STRING", which is set either as a precompiler option or inline, like in the following example:
// either as precompiler option or inline in the code
EXEC ORACLE OPTION (CHAR_MAP=STRING);

EXEC SQL BEGIN DECLARE SECTION;
char string1[20];
char string2[20];
EXEC SQL END DECLARE SECTION;

EXEC SQL DECLARE c cursor FOR
SELECT 'a test',
       ' an other test ' 
FROM dual;

EXEC SQL OPEN c;
EXEC SQL WHENEVER NOT FOUND DO break;

for (;;) {
    EXEC SQL FETCH c INTO :string1, :string2;
    printf("..%s..\n", string1);
    printf("..%s..\n", string2);
}
EXEC SQL CLOSE c;
Both examples do the same thing; the last one relieves you of dealing with the NUL termination of strings yourself.