Monday, 16 September 2013

Inserting "Spy" Code in Plain C With Link Time Substitution

I'm currently working my way through the highly recommended book Test Driven Development For Embedded C. Although I'm not applying the techniques in an embedded environment, most of the book is still valuable for general C programming.

When test-driving code, one often has to spy on objects. That is, the code under test calls a method or function of a library that is usually not under our control, and the test needs to verify the result of that call.
An example would be a logging function provided by a third-party library. You can't change the library, but you are still interested in the result of your code (the code under test) calling this logging function with a certain string.

In Java this problem is solved by applying the dependency inversion principle. The third-party logging library is not used directly inside the code. Instead, an interface is defined that offers its own logging function.
Two objects then implement this interface. The first is the production object, which in our example is probably just a wrapper around a call to the original logging library.
The second is a spy object (officially called a test double), which implements the same interface but offers additional methods such as getLastLoggingMessage.
The original application code is changed to work only with an object of the new interface type, not with the logging class directly. With the additional details provided by the spy object you can then continue writing your test logic. This methodology is called dependency injection.

Interestingly enough, this technique is also possible in a pure C environment. I might write another post in the future to explain the whole approach. Here, I want to demonstrate how the technical part of dependency injection can be done. In my opinion, the most convenient way to achieve this in C is a method called link time substitution.

The idea is to let the compiler (well, the linker part of the compiler) pick your spy implementation instead of the original implementation.

But first, one step back. Building a C program takes place in two stages: 1) compiling (roughly, transforming the C code into an object file) and 2) linking (copying the object code of the needed functions into the final binary). Link time substitution happens in the second stage, at the linker level. In my opinion it is not totally obvious how precisely this works, so let's look at a little example.
// program.c

#include "fav_music.h"

int main() {
   tellFavoriteMusic(); 
}
This is our main program, which calls the function tellFavoriteMusic. As can be seen, this function is not implemented in the same file but is provided somewhere else. To tell the compiler about tellFavoriteMusic, the header file fav_music.h is included.
// fav_music.h

void tellFavoriteMusic( void );
The header file contains only the function's signature. Now there are two ways of interpreting what is happening here.

A C programmer would say that the content of fav_music.h is simply inserted into program.c before anything else happens. Since the declaration of the function now appears in the file before its invocation in main, the compiler (first stage) is happy with the setup. This is called a forward declaration.
program.c doesn't need to see the actual implementation of tellFavoriteMusic at this stage. It is up to the linker to figure that out later.

A Java programmer would probably say that program.c is written against the fav_music.h interface without knowing its exact implementation. Since the Test Driven Development principles come from an object-oriented background (thinking of the good old Smalltalk days), I prefer the latter, rather high-level explanation in this context.

As the next step, we create two implementations of tellFavoriteMusic in two separate files, which will eventually be compiled into two distinct object files:
//fav_music_indie.c
 
#include <stdio.h>

void tellFavoriteMusic( void ) {
    printf("I like Indie!!!\n");
}
//fav_music_soul.c
 
#include <stdio.h>

void tellFavoriteMusic( void ) {
    printf("I like Soul!!!\n");
}
Generating the object files for the three source files is straightforward:
$ gcc -c program.c fav_music_indie.c fav_music_soul.c 
$ ls *.o
fav_music_indie.o  fav_music_soul.o  program.o
So let's recap what we've done. We transformed (compiled) our three C files into object files. The compiler (first stage) did not bother about the double implementation of tellFavoriteMusic, since at stage one it only looks at the current source file.

Let's move on to stage two: linking. program.o currently contains a reference that says "I need an implementation of tellFavoriteMusic to execute." It is the linker's task to find this implementation in the available object and library files.
So let's start with the obvious approach and see what happens:
$ gcc -o program.out fav_music_indie.o fav_music_soul.o program.o 
fav_music_soul.o: In function `tellFavoriteMusic':
fav_music_soul.c:(.text+0x0): multiple definition of `tellFavoriteMusic'
fav_music_indie.o:fav_music_indie.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
Not surprisingly, the linker complains about the double implementation of tellFavoriteMusic. Moving the object files to different positions on the command line does not change the situation.

To resolve this issue, we need to look at how the linker proceeds. First of all, file order counts! In the above example, fav_music_indie.o is processed first, then fav_music_soul.o, and then program.o. Secondly, a double implementation of a function in object files (as in our example) always gives an error, since the linker does not know which one to choose.

But there is a trick: if one implementation is put into a static library (.a file), the linker first picks the implementation from the object file and only falls back to the implementation in the library if the function is still unresolved when it reaches the library. This works provided that the object file comes before the library on the command line.

Let's play with that a little and generate an archive next:
$ ar cr fav_music_soul_lib.a fav_music_soul.o
$ ls *.a
fav_music_soul_lib.a
We now have a library fav_music_soul_lib.a which contains our object file fav_music_soul.o and therefore provides an implementation of the function tellFavoriteMusic.
$ gcc -o program.out program.o fav_music_soul_lib.a 
$ ./program.out
I like Soul!!!
The linker could piece everything together and gave us the expected result. Let's prepare for the last step.
Again, we offer the linker the second implementation of tellFavoriteMusic, but this time we place our object files first and the library as the last argument:
$ gcc -o program.out fav_music_indie.o program.o fav_music_soul_lib.a
$ ./program.out 
I like Indie!!!
That did the trick: the linker first picked the implementation of tellFavoriteMusic from the object file. By the time the linker reached the archive fav_music_soul_lib.a, the dependency on tellFavoriteMusic was already resolved, so it ignored the implementation offered there.

Coming back to the beginning of this post: with this little trick, dependency injection also works quite nicely in plain C.
