Friday, 27 November 2015

Thoughts On "SE-Radio Episode 242: Dave Thomas on Innovating Legacy Systems"

In episode 242 Software Engineering Radio interviewed Dave Thomas about how to deal with legacy systems. I liked the show so much that I had to do a sketchnote:

Controversial And Very Inspiring At The Same Time - SE Radio 242 with Dave Thomas

Actually, I am a faithful follower of Working Effectively With Legacy Code: isolate the piece of code you want to change (dependency breaking), write tests for it, and then modify the code using TDD. Over time I got quite good at it - even in C. However, it's a lot of effort - even when you're trained.
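
To make "dependency breaking" concrete, here is a minimal sketch of a seam in C. All names (read_sensor, sensor_above_threshold) are invented for this example: a hard-wired call is routed through a function pointer so that a test can substitute a fake without touching the application logic.

    /* Sketch of a seam in C: the legacy code originally called the
     * hardware function directly; routing the call through a function
     * pointer lets a test replace it. All names are invented. */
    #include <assert.h>

    typedef int (*sensor_read_fn)(void);

    static int read_sensor_real(void) {
        return 42; /* would talk to the real device here */
    }

    /* the seam: production code uses the default, tests overwrite it */
    static sensor_read_fn read_sensor = read_sensor_real;

    /* legacy logic under test, now isolated from the hardware */
    static int sensor_above_threshold(int threshold) {
        return read_sensor() > threshold;
    }

    /* test double and a first unit test */
    static int read_sensor_fake(void) { return 100; }

    int main(void) {
        read_sensor = read_sensor_fake;      /* dependency broken */
        assert(sensor_above_threshold(50));  /* runs without hardware */
        return 0;
    }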

Dave said, "Unit tests are a waste of time, focus on acceptance tests" (end-to-end tests). The problem with end-to-end tests is that they are even harder to set up. Instead of mocking the objects around you, you have to provide all the external dependencies or at least good replacements: test databases, test middleware, test clients...
Anyway, once you've managed all that and written your first end-to-end test, things get a lot easier. Covering "unhappy paths" with tests is now actually quite simple - drop a central database table, switch off the middleware, send faulty messages to your application and check what's going on.
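
As a sketch of what such an unhappy-path check can look like - assuming, purely for this example, that the application under test listens on local TCP port 7777 and answers malformed input with a line containing "ERROR" - the whole test boils down to: send garbage, expect a controlled failure.

    /* End-to-end "unhappy path" sketch: send a deliberately malformed
     * message and check that the application answers with an error
     * instead of crashing. Port and protocol are assumptions. */
    #include <arpa/inet.h>
    #include <assert.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        assert(sock >= 0);

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7777);             /* assumed test port */
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        assert(connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0);

        const char *garbage = "NOT-A-VALID-MESSAGE\n";
        write(sock, garbage, strlen(garbage));

        char reply[128] = {0};
        read(sock, reply, sizeof(reply) - 1);
        assert(strstr(reply, "ERROR") != NULL);  /* assumed error reply */

        close(sock);
        puts("unhappy path handled gracefully");
        return 0;
    }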

With all this virtualization (Docker being the latest hype) and infrastructure as code (Puppet, Chef, ...) we now have good tools to write end-to-end tests which are repeatable, automated and maintainable.
Surely this was not as simple in 2004 when "Working Effectively With Legacy Code" came out.

Dave's statements reminded me of the Golden Master approach, which is quite similar. However, there the initial end-to-end test is only meant to provide a basic safety net on the way towards unit test coverage. The latter is the actual goal of "Golden Master" testing.
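
For comparison, a minimal sketch of the Golden Master idea itself - legacy_calc() and the file names are placeholders invented for this example: record the output of the legacy code over a large input range once, then diff every later run against that recording.

    /* Golden Master sketch: write the results for a wide input range
     * to a file; the first run is kept as the "golden master",
     * later runs are diffed against it. */
    #include <stdio.h>

    static int legacy_calc(int x) {   /* stands in for real legacy code */
        return x * x - 3 * x + 7;
    }

    int main(void) {
        FILE *out = fopen("current.txt", "w");
        if (out == NULL)
            return 1;
        for (int x = -1000; x <= 1000; x++)
            fprintf(out, "%d -> %d\n", x, legacy_calc(x));
        fclose(out);
        /* first run:  cp current.txt golden.txt
         * later runs: diff golden.txt current.txt */
        return 0;
    }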

So yes, maybe going from the outside to the inside is nowadays a better way of creating a safety net. I am still not convinced to ditch unit testing of old code completely, but this is - as always - something you have to try out.

Wednesday, 25 November 2015

Something For The Toolshelf - Code Analysis Tools Used For Security Analysis Of Truecrypt

Recently the Bundesamt für Sicherheit in der Informationstechnik (BSI), an authority of the German government, released a security analysis of Truecrypt. The analysis was carried out by the Fraunhofer-Institut für Sichere Informationstechnologie (SIT) in Darmstadt, Germany. The institute is part of the Fraunhofer society, a research organization spread across Germany.

From a software engineering perspective I was curious which approach the researchers took to evaluate the code base.


GOTO

Apparently the Truecrypt authors also liked their goto. Here is what the study says on goto (my translation):

To implement exception handling, the usage of goto is generally accepted, since the language C does not offer a dedicated construct for that. Recent research has found that programmers meanwhile use the goto statement predominantly in a sensible way.

Die Verwendung von goto wird jedoch im Allgemeinen zur Umsetzung einer Ausnahmebehandlung akzeptiert, da die Sprache C kein eigenes Konstrukt hierfür kennt. Neuere Untersuchungen haben ergeben, dass Programmierer mittlerweile die goto-Anweisung überwiegend nur noch in sinnvoller Weise verwenden. (original)

On that topic the study quotes An empirical study of goto in C, a paper which was released as a pre-print in February 2015 and which was the subject of my previous post.


Complexity Of The Source Code

To measure complexity, the authors of the study employed a tool called Lizard, which can deal with a bunch of languages including C, C++, Java, Python and JavaScript.

Here is the feature list taken from the Github page of Lizard:
  • the nloc (lines of code without comments),
  • CCN (cyclomatic complexity number),
  • token count of functions,
  • parameter count of functions.

As its measure of complexity, the study uses the cyclomatic complexity:

As a measure of the complexity of the control flow, the cyclomatic complexity in particular is used. Values higher than 15 are an indicator that refactoring would be sensible. Values above 30 often go along with flawed code. (my translation)

Als Maß für die Kontrollflusskomplexität wird insbesondere die zyklomatische Komplexität verwendet. Werte größer 15 sind ein Indiz dafür, dass Refaktorierung sinnvoll ist. Werte über 30 gehen oft mit fehlerhaftem Code einher. (original)
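
As a quick worked example of where the number comes from: the cyclomatic complexity of a function is the number of its branch points plus one, where if, for, while, case, && and || each count as one. The (invented) function below therefore has a CCN of 4 - a function that Lizard flags with a value above 15 simply accumulates many more of these branch points.

    /* 3 branch points + 1 = cyclomatic complexity (CCN) of 4 */
    int count_positive(const int *values, int n) {
        if (values == 0)             /* branch point 1 */
            return -1;
        int hits = 0;
        for (int i = 0; i < n; i++)  /* branch point 2 */
            if (values[i] > 0)       /* branch point 3 */
                hits++;
        return hits;
    }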

Code Duplicates

To find identical pieces of source code, the authors of the study use Duplo, a duplicate finder for C and C++. With its default settings the tool considers three or more identical lines of code as duplicates.


Static Code Analysis

For this kind of analysis three tools were used: Coverity, Cppcheck and the Clang Static Analyzer. The interesting point here is that there was almost no overlap in the errors found by the three tools - which brings me to the conclusion that it is a sensible investment to integrate more than one static analyzer into the Continuous Integration chain.
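
To illustrate why the overlap is small: different analyzers are simply tuned to different defect classes. The snippet below (invented for this example, not taken from Truecrypt) contains two classic findings, a memory leak on an early return and a missing malloc check, and in my experience tools differ in which of the two they report.

    /* Two classic static-analysis findings in one invented function */
    #include <stdio.h>
    #include <stdlib.h>

    int print_first_line(const char *path) {
        char *buf = malloc(256);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;              /* finding 1: buf leaks here */
        if (fgets(buf, 256, f))     /* finding 2: buf may be NULL */
            printf("%s", buf);
        fclose(f);
        free(buf);
        return 0;
    }

    int main(int argc, char **argv) {
        return argc > 1 ? print_first_line(argv[1]) : 0;
    }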

Monday, 23 November 2015

Rehabilitating C's goto

I admit it - I regularly write gotos. Actually, almost all of my non-pure functions see at least one goto, always for the same reason: handling errors and cleaning up resources. I already wrote about the technique one and a half years ago.

Example For Error Handling And Cleanup using goto [1]

In my eyes the usage of goto for cleanup and error handling is a good thing. The flow of application logic is not unnecessarily cluttered with local error handling. Instead, the function is divided into two parts: the upper part, which contains the application logic, and the lower part, which contains the error handling and the cleanup of resources.
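
A minimal sketch of this two-part structure (file names and buffer size invented for the example):

    #include <stdio.h>
    #include <stdlib.h>

    /* Upper part: application logic, bailing out via goto on error.
     * Lower part: error handling and cleanup in reverse order. */
    int copy_first_line(const char *src, const char *dst) {
        int rc = -1;
        char *buf = NULL;
        FILE *in = NULL, *out = NULL;

        in = fopen(src, "r");
        if (in == NULL)
            goto cleanup;
        out = fopen(dst, "w");
        if (out == NULL)
            goto cleanup;
        buf = malloc(1024);
        if (buf == NULL)
            goto cleanup;
        if (fgets(buf, 1024, in) == NULL)
            goto cleanup;
        if (fputs(buf, out) == EOF)
            goto cleanup;

        rc = 0;      /* success */

    cleanup:         /* single exit: release whatever was acquired */
        free(buf);
        if (out) fclose(out);
        if (in) fclose(in);
        return rc;
    }

    int main(void) {
        return copy_first_line("in.txt", "out.txt");
    }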

However, using these gotos always left me feeling like I was doing something in a gray zone: there is an old ban from the sixties (Letters to the Editor: Go To Statement Considered Harmful, Dijkstra, 1968), but without talking too much about it in public, C programmers still carry on writing goto.

The paper An Empirical Study of goto in C Code, released as a pre-print in February 2015, now takes an interesting second look at this old ban.

The international group of researchers involved in the paper analyzed 2 million lines of C code collected from 11K Github repositories. I leave the reading of the entire paper up to you and jump directly to the important part of the conclusion:

...far from being maintenance nightmares, most usages of goto follow disciplined, well-designed usage patterns, that are handled by specific constructs in more modern languages. 
The most common patterns are error-handling and cleanup, for which exception handling constructs exist in most modern languages, but not in C. Even properties such as several goto statements jumping to the same label have benefits, such as reducing code duplication, in spite of the coordinate problem described by Dijkstra.

That sounds like good news to me - I can finally exit the gray zone.