Memory allocation in C/Solaris

A big reason for preferring Java over C/C++ is automatic memory management. The garbage collector reclaims an object once all references to it have disappeared, which largely prevents the "memory leak" problem endemic to C/C++ programs. In C, memory allocation and freeing is a manual process and a tricky one at that. You don't want to lose every reference to allocated memory, since you then won't be able to free it. Similarly, you don't want to free memory to which you still hold one or more references.

I had a serious problem recently with memory allocation in a C daemon. I used the GNU debugger, gdb (sdb or dbx would have been nice but they don't come bundled with Solaris), to determine that a particular block of memory was being overwritten. Now, the only way that could happen would be if the memory had been freed somewhere and then handed out again by a later allocation. Also, it didn't occur the first time a request was processed, only on all subsequent calls. I had to find out where the memory block was being freed in that particular circumstance.

Now, anybody who has done any development in C has probably done the same thing: written their own wrappers for the malloc/free library calls. I initially used this approach, but none of the code which was calling the wrappers was freeing that specific memory block. Okay, so it had to be some other function calling the library version of free directly. I had to find some way of trapping the call to free by inserting code between the application and the system library. I also needed to be able to call the real library routine in order to obtain the required functionality.
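For readers who haven't written such wrappers, here is a minimal sketch of the idea. The names xmalloc, xfree, XMALLOC, XFREE and liveBlocks are hypothetical and are not taken from the daemon's code. Each call logs the pointer along with the source file and line of the caller, and a counter keeps track of how many blocks are outstanding.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper names; not taken from the original daemon. */
static long	liveBlocks = 0;	/* allocations not yet freed */

/* Allocate through the library malloc and log who asked for the block. */
void *
xmalloc( size_t size, const char *file, int line ) {
	void	*ptr = malloc( size );

	if( ptr != NULL )
		liveBlocks++;
	fprintf( stderr, "%s:%d malloc( %lu ) = %p\n", file, line,
		(unsigned long) size, ptr );
	return( ptr );
}

/* Log who is releasing the block, then hand it to the library free. */
void
xfree( void *ptr, const char *file, int line ) {
	fprintf( stderr, "%s:%d free( %p )\n", file, line, ptr );
	if( ptr != NULL )
		liveBlocks--;
	free( ptr );
}

/* Callers use the macros so that file and line are filled in for them. */
#define XMALLOC( size )	xmalloc( (size), __FILE__, __LINE__ )
#define XFREE( ptr )	xfree( (ptr), __FILE__, __LINE__ )
```

A scheme like this only sees calls that go through the macros. A direct call to free from some other function is invisible to it.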

Solaris provides a facility which could make this work: the dynamic linker. Through the use of the dlsym call I could get the address of the library routine. I could then statically link my versions of malloc/free into the executable and have them call the library versions. The general framework of the call looks like this:

#include	<dlfcn.h>

	returnType	(*functionPtr)( argTypes );
	returnType	result;

	functionPtr = ( returnType(*)( argTypes ) )
		dlsym( RTLD_NEXT, functionName );
	result = (*functionPtr)( args );

In this code snippet, returnType is the function return type (e.g. void * for malloc), argTypes are the argument types (e.g. size_t for malloc), functionName is the name of the function as a string (e.g. "malloc"), and args are the actual arguments. Let's use the boilerplate to write code which we can use as our own version of malloc:

#include	<dlfcn.h>
#include	<malloc.h>

void *
malloc( size_t size ) {
	/* static, so the library routine is only looked up on the first call */
	static void	*(*functionPtr)( size_t );
	void	*result;

	if( functionPtr == NULL )
		functionPtr = ( void *(*)( size_t ) ) dlsym( RTLD_NEXT, "malloc" );
	if( functionPtr == NULL )
		return( NULL );	/* lookup failed; report it as a failed allocation */

	/* call the library version of malloc */
	result = (*functionPtr)( size );

	return( result );
}

Two points are worth noting here, both about the includes. The dlfcn.h header must be included because it defines symbols such as RTLD_NEXT. Including malloc.h is simply convenient: why bother declaring the allocation functions yourself when the declarations already exist in a system header file? We could have included stdlib.h instead, which is probably a good idea depending on the functions you want to use.

NOTE: You need to include the -ldl linker argument in order to use the dlsym call. I usually just add it to the end of the LDFLAGS line in my makefile.

In order to capture all references to the memory allocation routines, you also have to implement the calloc, realloc and free calls. The nice thing about our framework is that we can insert any code we wish either before or after the library call. In my case, I maintain a table of memory as it's allocated and freed. But the real question remained: where was that particular block of memory being freed?
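As an aside, such a table can be quite simple. The following is a minimal sketch rather than the code I actually used; the names tableAdd, tableRemove, tableCount and TABLE_SIZE are hypothetical. It assumes a fixed-size static array so that the bookkeeping never has to call malloc from inside the malloc wrapper.

```c
#include <stddef.h>

/* Hypothetical fixed-size table; the size is an arbitrary assumption. */
#define TABLE_SIZE	1024

struct block {
	void	*address;	/* NULL marks an unused slot */
	size_t	size;
};

static struct block	table[ TABLE_SIZE ];

/* Record a new allocation; returns 0 on success, -1 if there is no room. */
int
tableAdd( void *address, size_t size ) {
	int	i;

	if( address == NULL )
		return( -1 );
	for( i = 0; i < TABLE_SIZE; i++ ) {
		if( table[ i ].address == NULL ) {
			table[ i ].address = address;
			table[ i ].size = size;
			return( 0 );
		}
	}
	return( -1 );
}

/* Forget a freed block; returns 0 if it was known, -1 otherwise. */
int
tableRemove( void *address ) {
	int	i;

	if( address == NULL )
		return( -1 );
	for( i = 0; i < TABLE_SIZE; i++ ) {
		if( table[ i ].address == address ) {
			table[ i ].address = NULL;
			table[ i ].size = 0;
			return( 0 );
		}
	}
	return( -1 );
}

/* Count the blocks that are still outstanding. */
int
tableCount( void ) {
	int	i, count = 0;

	for( i = 0; i < TABLE_SIZE; i++ )
		if( table[ i ].address != NULL )
			count++;
	return( count );
}
```

The malloc wrapper would call tableAdd with the result of the library call, and the free wrapper would call tableRemove before releasing the block. A return of -1 from tableRemove points at a block the table has never seen or one that has already been freed.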

I added a function called setMark which takes a pointer and stores it where the free routine can see it. The free routine can then examine its argument and take some action as soon as an attempt is made to free the marked memory block. Here's a look at the free code:

#include	<stdio.h>
#include	<stdlib.h>

void
free( void *ptr ) {
	/* static, so the library routine is only looked up on the first call */
	static void	(*functionPtr)( void * );

	/* mark is the file-scope pointer stored by setMark */
	if( ptr != NULL && ptr == mark ) {
		fprintf( stderr, "attempt to free memory at %p\n", ptr );
		abort();
	}

	if( functionPtr == NULL )
		functionPtr = ( void (*)( void * ) ) dlsym( RTLD_NEXT, "free" );
	if( functionPtr != NULL )
		(*functionPtr)( ptr );
}

As you can infer, the mark variable is set by the setMark function. I call abort here to force a core dump. Why do I want a core dump? I run gdb with the executable and the core file names as arguments and use the where command, which gives me a complete stack backtrace. If you've compiled the code with gcc and the -g flag, you'll even be provided with the corresponding line numbers in the source code. Not only that, but you can move around in the stack, examine variables, etc.
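The setMark side is not shown above, so here is a minimal sketch of what it could look like. It assumes that mark is a file-scope variable in the same source file as the free wrapper. The isMarked helper is hypothetical and only spells out the comparison that the free wrapper performs.

```c
#include <stddef.h>

/* Assumed to live in the same source file as the free wrapper. */
static void	*mark = NULL;	/* block to watch; NULL means none */

/* Remember the address whose release should trigger the abort. */
void
setMark( void *ptr ) {
	mark = ptr;
}

/* Hypothetical helper: true if ptr is the block currently being watched. */
int
isMarked( const void *ptr ) {
	return( ptr != NULL && ptr == mark );
}
```

The application calls setMark right after allocating the suspect block, and calls it again with NULL to stop watching.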

In my case, this was enough information to find a call to free a structure which had never been populated. It just so happened that it wasn't called until the second time through. The contents of the uninitialized structure depended on what was on the stack at the time. Unfortunately, the data on the stack was pointing to the element I was still using and so it was freed. Once I removed the offending call, the program worked properly. Of course, I do have to ensure that the call is still made in those cases where the structure has been populated, or else I'll have myself a memory leak.
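A common defensive idiom makes the release call harmless whether or not the structure was ever populated. This is a sketch of that general idiom and not the fix described above, which was simply removing the call. The struct request type and the three function names are hypothetical stand-ins for the daemon's own structure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure standing in for the one in the daemon. */
struct request {
	char	*buffer;	/* heap block owned by this structure */
};

/* Put the structure into a known empty state before any other use. */
void
requestInit( struct request *req ) {
	req->buffer = NULL;
}

/* Fill the structure; returns 0 on success, -1 on allocation failure. */
int
requestPopulate( struct request *req, const char *text ) {
	size_t	length = strlen( text ) + 1;

	free( req->buffer );	/* drop any earlier contents */
	req->buffer = malloc( length );
	if( req->buffer == NULL )
		return( -1 );
	memcpy( req->buffer, text, length );
	return( 0 );
}

/* Safe in both cases, because free( NULL ) does nothing. */
void
requestRelease( struct request *req ) {
	free( req->buffer );
	req->buffer = NULL;
}
```

The idiom only works if requestInit runs before anything else touches the structure. A structure declared on the stack and never initialized has exactly the problem described above.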

As I mentioned at the beginning, management of memory allocated from the heap is a huge challenge in C/C++ programming. Fortunately, there are facilities in Solaris which can be utilized to assist in locating offending code. The complete source code is available here and includes additional tracking facilities which I ended up not needing. Hopefully, you'll never need to make use of these techniques, but if you do then I hope that you find this code helpful.