April 1, 2021
In a previous blog, we outlined how AppScope uses function interpositioning to extract information from applications, in user mode, at run time. In this blog, we want to provide an overview of a few of the many interposition mechanisms that we’ve found valuable in building AppScope. This blog delves into application development details, and will be of particular interest to developers who love to maximize their apps’ performance.
Library preloading is a feature of modern dynamic linker/loaders (ld.so). Dynamic linking and loading is available on most Unix-based and Windows systems. The linker/loader’s preload feature allows a user-specified shared library to be loaded before all other shared libraries required by an executable. On Linux, the library to preload is typically named in the LD_PRELOAD environment variable.
The dynamic linker resolves an executable’s external references using the symbols exported by its libraries, searching them in the order in which the libraries were loaded. For example, if a library includes a function named fwrite, its symbol table will include an entry for fwrite. The address of fwrite is determined when the library is loaded, and the linker uses that address to resolve references to fwrite.
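To make load-order resolution concrete, here is a toy model in C. It is not how ld.so is implemented (the real linker uses hash tables, symbol versioning, and lookup scopes). The struct names, the library names libshim.so and libc.so, and the string "addresses" are hypothetical stand-ins. The search simply returns the first definition found while walking the libraries in load order.

```c
#include <stddef.h>
#include <string.h>

/* One exported symbol; "where" is a label standing in for a real address. */
struct symbol {
    const char *name;
    const char *where;
};

/* A loaded library and its exported symbol table. */
struct library {
    const char *soname;
    const struct symbol *symbols;
    size_t count;
};

/* Hypothetical symbol tables for a preloaded shim and for libc. */
static const struct symbol shim_syms[] = {
    { "fwrite", "libshim.so:fwrite" },
};
static const struct symbol libc_syms[] = {
    { "fwrite", "libc.so:fwrite" },
    { "fopen",  "libc.so:fopen" },
};

/* Libraries in load order: the preloaded shim comes before libc. */
static const struct library load_order[] = {
    { "libshim.so", shim_syms, 1 },
    { "libc.so",    libc_syms, 2 },
};

/* Walk the libraries in load order and return the first definition of
 * name, so an earlier library shadows a later one. NULL if unresolved. */
const char *resolve_symbol(const struct library *libs, size_t nlibs,
                           const char *name)
{
    for (size_t i = 0; i < nlibs; i++)
        for (size_t j = 0; j < libs[i].count; j++)
            if (strcmp(libs[i].symbols[j].name, name) == 0)
                return libs[i].symbols[j].where;
    return NULL;
}
```

Searching all of load_order for "fwrite" returns the shim’s entry, while searching only load_order + 1 (libc alone) returns libc’s entry. Symbols the shim does not define, such as fopen, still fall through to libc.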
Now assume that an application uses fwrite. It depends on the libc.so library, because that is where fwrite was resolved when the application was built. At run time, the dynamic loader resolves fwrite to the address in libc.so once libc.so is loaded. Now, if a library is preloaded before libc.so, and the preloaded library defines a function named fwrite, the dynamic linker will resolve fwrite’s address to the preloaded library instead of to libc.so. The function fwrite has just been interposed. It is now up to the interposed fwrite function to locate the address of fwrite in libc.so, so that it can (in turn) call the original fwrite.
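A minimal sketch of such an interposed fwrite, assuming Linux with glibc: the shim defines fwrite and uses dlsym(RTLD_NEXT, "fwrite") to find the next definition in load order, which is normally the one in libc.so. The byte counter and the shim_bytes_written() accessor are hypothetical, added only to show where instrumentation would go. This is not AppScope’s implementation, and the lazy lookup assumes a single-threaded caller.

```c
#define _GNU_SOURCE          /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Signature of the real fwrite, so the looked-up pointer can be called. */
typedef size_t (*fwrite_fn)(const void *, size_t, size_t, FILE *);

static fwrite_fn real_fwrite;    /* next fwrite in load order (libc's) */
static size_t bytes_written;     /* hypothetical metric gathered by the shim */

/* Interposed fwrite: because this definition is found first, callers land here. */
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    if (real_fwrite == NULL) {
        /* RTLD_NEXT skips this object and searches the objects loaded after it. */
        real_fwrite = (fwrite_fn)dlsym(RTLD_NEXT, "fwrite");
        if (real_fwrite == NULL)
            return 0;            /* no underlying fwrite found; report nothing written */
    }
    size_t items = real_fwrite(ptr, size, nmemb, stream);
    bytes_written += items * size;   /* record, then pass the result through */
    return items;
}

/* Hypothetical accessor so the collected data can be inspected. */
size_t shim_bytes_written(void)
{
    return bytes_written;
}
```

Built as a shared object (for example, gcc -shared -fPIC -o libfwrite_shim.so fwrite_shim.c -ldl, where the file names are placeholders), the shim could be preloaded with LD_PRELOAD=./libfwrite_shim.so ./app. RTLD_NEXT is what keeps the shim from calling itself: a lookup through RTLD_DEFAULT would find the preloaded fwrite first.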
Library preloading has been around for a long time, with multiple uses. Library preload is most commonly used to replace the memory allocation subsystem. For example, Valgrind uses this mechanism to track memory leaks.
There are several alternative memory allocators that supply their own definitions of malloc and free, and many of them are in common use. The ptmalloc2 allocator is the default used by glibc. Chromium replaces it with tcmalloc, a memory allocator developed by Google. The jemalloc allocator has been used in FreeBSD, and has found its way into numerous applications that rely on its predictable behavior.
You too could implement your own malloc(3) and free(3) functions, which you could use to perform leak checking or memory access control. In that case, the preloaded library would implement the functions you want to interpose.
Note that library preloading can interpose only functions that are resolved through dynamic linking, that is, functions provided by shared libraries. It cannot interpose functions that the application itself defines, nor any function in a statically linked executable.
GOT stands for Global Offset Table, and this topic gets very low-level and detailed very quickly. We’ll try to keep it simple, and describe the conditions under which the GOT is needed to interpose functions.
PLT stands for Procedure Linkage Table. The dynamic linker uses the PLT to enable calls to external functions, whose addresses are not known until run time. Final address resolution is performed by the dynamic linker at run time, and the Global Offset Table is used together with the PLT to hold the resolved addresses. At the risk of oversimplification, the PLT is the code that gets executed when an external function is called, while the GOT is the data that holds the actual function address.
By default, the dynamic linker uses what’s known as lazy binding. By convention, when the dynamic linker loads a library, it puts an identifier and the address of a resolution function into known places in the GOT. The first call to an external function then goes through a default stub in the PLT, which loads the identifier and calls into the dynamic linker. The linker resolves the address of the function being called and updates the associated GOT entry. The next time that PLT entry is called, it loads the function’s actual address from the GOT instead of going through the dynamic linker’s lookup. Very cool!
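The lazy-binding sequence above can be modeled in plain C. This is a toy model, not how ld.so works internally: real PLT stubs are a few machine instructions, and the linker locates GOT slots through relocation entries. The got array, the stub functions, resolve(), the plt_* wrappers, and got_hook() are all hypothetical names. Overwriting a slot with got_hook() is the essence of GOT hooking: every later call through that PLT entry lands in the replacement.

```c
#include <stddef.h>

typedef int (*func_t)(int);

enum { SLOT_DOUBLE, SLOT_NEGATE, NSLOTS };

static int lazy_stub_double(int x);
static int lazy_stub_negate(int x);

/* "GOT": one pointer per imported function, initially aimed at a stub
 * that triggers resolution on first use. */
static func_t got[NSLOTS] = { lazy_stub_double, lazy_stub_negate };

static int resolver_calls;   /* how many times the "dynamic linker" ran */

/* Library implementations that the resolver will find. */
static int real_double(int x) { return 2 * x; }
static int real_negate(int x) { return -x; }

/* The "dynamic linker": find the real function for a slot and patch the
 * GOT so later calls skip resolution entirely. */
static func_t resolve(int slot)
{
    static const func_t table[NSLOTS] = { real_double, real_negate };
    resolver_calls++;
    got[slot] = table[slot];
    return got[slot];
}

/* Default stubs: resolve, then forward the original call. */
static int lazy_stub_double(int x) { return resolve(SLOT_DOUBLE)(x); }
static int lazy_stub_negate(int x) { return resolve(SLOT_NEGATE)(x); }

/* "PLT" entries: what callers actually call; always jump through the GOT. */
int plt_double(int x) { return got[SLOT_DOUBLE](x); }
int plt_negate(int x) { return got[SLOT_NEGATE](x); }

/* GOT hooking: redirect a slot and return its previous target for chaining. */
func_t got_hook(int slot, func_t replacement)
{
    func_t previous = got[slot];
    got[slot] = replacement;
    return previous;
}

/* A sample hook that counts calls, then chains to the saved target. */
static func_t original_double;
static int hook_calls;
static int traced_double(int x)
{
    hook_calls++;
    return original_double(x);
}
```

Calling plt_double twice runs the resolver only once, and after got_hook(SLOT_DOUBLE, traced_double) the same PLT entry reaches the hook. In this model, hooking a slot that has not been resolved yet and then chaining to the saved stub would let the resolver overwrite the hook, so the sketch assumes the slot is resolved before it is hooked.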