ELF static libraries are simply collections of relocatable object files, plus a symbol table that maps each symbol to the object file that defines it. They are usually built with the ar tool.

As a general rule, when such a library is used in a link editor invocation, the linker will not simply add all the relocatable files from the library to the output artifact (unless we explicitly request it to do so). Instead, it will perform selective extraction: only those files which are required to satisfy symbol references the linker has already seen will be used. Other files will not take part in the linking process.
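To make selective extraction concrete, here is a minimal sketch of the idea in C. It is a toy model, not how any real linker is implemented: the struct member and struct symset types, the extract_members function and the fixed-size arrays are all hypothetical names made up for illustration. It ignores symbol classes and bindings entirely. The one detail it adds is that extracting a member may introduce new undefined references, so the archive is scanned again until a pass extracts nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS  8   /* hypothetical limit on names per archive member */
#define MAX_NAMES 64  /* hypothetical limit on names per set */

/* Toy archive member: the names it defines and the names it refers to. */
struct member {
    const char *name;
    const char *defines[MAX_SYMS]; /* NULL-terminated */
    const char *needs[MAX_SYMS];   /* NULL-terminated */
    bool extracted;
};

/* Toy set of symbol names. */
struct symset {
    const char *names[MAX_NAMES];
    size_t count;
};

static bool set_contains(const struct symset *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

static void set_add(struct symset *s, const char *name)
{
    if (set_contains(s, name))
        return;
    assert(s->count < MAX_NAMES);
    s->names[s->count++] = name;
}

/* A member is needed if it defines a name that was referenced
 * but has not been defined yet. */
static bool member_is_needed(const struct member *m,
                             const struct symset *undefined,
                             const struct symset *defined)
{
    for (size_t i = 0; i < MAX_SYMS && m->defines[i] != NULL; i++)
        if (set_contains(undefined, m->defines[i]) &&
            !set_contains(defined, m->defines[i]))
            return true;
    return false;
}

/* Scans the archive repeatedly until a full pass extracts nothing.
 * Returns the number of members extracted. */
static size_t extract_members(struct member *members, size_t n,
                              struct symset *undefined,
                              struct symset *defined)
{
    size_t extracted = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            struct member *m = &members[i];
            if (m->extracted || !member_is_needed(m, undefined, defined))
                continue;
            m->extracted = true;
            extracted++;
            progress = true;
            for (size_t k = 0; k < MAX_SYMS && m->defines[k] != NULL; k++)
                set_add(defined, m->defines[k]);
            /* The new member may bring new references with it. */
            for (size_t k = 0; k < MAX_SYMS && m->needs[k] != NULL; k++)
                set_add(undefined, m->needs[k]);
        }
    }
    return extracted;
}
```

With an archive holding b.o (defining g), a.o (defining f and referring to g) and c.o (defining h), and a single pending reference to f, this model extracts a.o on the first pass, b.o on the second, and never touches c.o.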

The situation is a little more complex than the wording above suggests, mainly because there is some confusion about what a defined symbol actually is. If we refer to Oracle’s Linker Guide, we’ll notice that, for the purpose of link editing, symbols fall into three different classes:

  • undefined symbols are easy: they give the name of a symbol we want to refer to without providing a definition, so we know neither where the symbol lives nor its size. It is the linker’s job to find an appropriate definition for this symbol, possibly in a different object file. They typically (but not exclusively) arise from global declarations in C files. For example, this snippet:

    extern int i_am_undefined;
    

    would result in an undefined symbol in the compiled relocatable file, provided the code actually refers to the variable.

  • defined symbols are also easy: in contrast to undefined symbols, they provide a complete definition for a symbol, including its position within a relocatable object file (or in memory, for executables and shared objects) and its size. Among other things, they are generated by initialized C variables:

    int i_am_defined = 1;
    
  • tentative symbols are more complex: they sit roughly halfway between undefined and defined symbols. They don’t come with a position in the file; instead, they provide an alignment constraint. They basically declare: “I don’t care where you put this symbol within the program, but wherever it ends up, its address must be a multiple of the alignment I provide”. For example, a tentative symbol with an alignment of 4 may be allocated in the final executable so that it is loaded in memory at address 0x100000, but not at 0x100001.

    The basic idea is that, at the end of linking, if a certain symbol name has only been seen as a tentative symbol (possibly alongside undefined references), the linker allocates storage for the tentative one somewhere. Since tentative symbols are intended to be initialized with 0, they usually end up in the BSS, to avoid wasting precious bytes in the data section.

    Conversely, if the linker finds a defined symbol with the same name somewhere in a different object file, the tentative definition is ignored and the defined symbol replaces it.

    There is also a special case covering what happens if the linker sees multiple tentative symbols with the same name but different sizes: it retains the symbol with the largest size.

    The primary source of tentative symbols is uninitialized C globals:

    int i_am_tentative;
    

    This apparently odd behaviour makes it possible to define the same symbol in multiple object files and have the linker merge the definitions automatically instead of reporting a multiple-definition error: tentative symbols clash neither with each other nor with defined symbols. Within ELF relocatable files, they are represented as symbols whose section index is the special value SHN_COMMON; for this reason, they are also called COMMON symbols.
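The three classes map directly onto the section index stored in each ELF symbol table entry. The sketch below assumes a Linux system whose <elf.h> provides Elf64_Sym, SHN_UNDEF, SHN_COMMON and ELF64_ST_BIND; the enum sym_class type and the classify, is_weak and address_satisfies_alignment helpers are names I made up for illustration. It relies on the ELF convention that, for a COMMON symbol in a relocatable file, st_value holds the alignment constraint rather than an address.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Sym, SHN_*, STB_*, ELF64_ST_BIND (Linux/glibc) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three classes discussed above. */
enum sym_class { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };

/* Classifies a symbol table entry taken from a relocatable file.
 * Note: the reserved entry at index 0 also reports SHN_UNDEF. */
static enum sym_class classify(const Elf64_Sym *sym)
{
    if (sym->st_shndx == SHN_UNDEF)
        return SYM_UNDEFINED;
    if (sym->st_shndx == SHN_COMMON)
        return SYM_TENTATIVE;
    return SYM_DEFINED; /* a real section index, or SHN_ABS */
}

/* True if the symbol has weak binding. */
static bool is_weak(const Elf64_Sym *sym)
{
    return ELF64_ST_BIND(sym->st_info) == STB_WEAK;
}

/* Checks whether a candidate address honours the alignment requested
 * by a tentative symbol. Only meaningful for COMMON symbols, where
 * st_value carries the alignment. */
static bool address_satisfies_alignment(const Elf64_Sym *sym, uint64_t address)
{
    assert(classify(sym) == SYM_TENTATIVE);
    uint64_t alignment = sym->st_value;
    return alignment <= 1 || address % alignment == 0;
}
```

For a tentative symbol with an alignment of 4, the last helper accepts 0x100000 and rejects 0x100001, matching the example given earlier.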

Symbols also have a binding, which may be either weak or global (local binding is not considered here, as local symbols cannot be used outside their own object file and thus do not take part in symbol resolution). Global symbols have higher priority than weak symbols, so if the linker must choose between a weak and a global symbol, it will choose the latter. Weak symbols also get special treatment when static libraries are involved: no file is extracted from a library if its only use would be to satisfy a weak symbol.
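The priority rules described so far, for relocatable files passed directly to the linker, can be summarised by a small resolution function. This is a simplified model under my own assumptions, not the algorithm of any specific linker: struct symbol, the two enums and resolve are hypothetical names, and the order in which the rules are applied (binding first, then class, then size) is my reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a symbol as seen during resolution. */
enum sym_class   { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };
enum sym_binding { BIND_WEAK, BIND_GLOBAL };

struct symbol {
    enum sym_class cls;
    enum sym_binding binding;
    size_t size;
};

/* Picks the winner between two symbols with the same name.
 * Returns false when the pair is a multiple-definition error. */
static bool resolve(const struct symbol *a, const struct symbol *b,
                    struct symbol *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    /* An undefined reference is satisfied by whatever the other side has
     * (with two references, either one will do in this sketch). */
    if (a->cls == SYM_UNDEFINED || b->cls == SYM_UNDEFINED) {
        *out = (a->cls == SYM_UNDEFINED) ? *b : *a;
        return true;
    }

    /* Global beats weak. */
    if (a->binding != b->binding) {
        *out = (a->binding == BIND_GLOBAL) ? *a : *b;
        return true;
    }

    /* Two definitions: an error if global, first one kept if weak. */
    if (a->cls == SYM_DEFINED && b->cls == SYM_DEFINED) {
        if (a->binding == BIND_GLOBAL)
            return false;
        *out = *a;
        return true;
    }

    /* Two tentative symbols: the larger one is retained. */
    if (a->cls == SYM_TENTATIVE && b->cls == SYM_TENTATIVE) {
        *out = (b->size > a->size) ? *b : *a;
        return true;
    }

    /* One defined, one tentative: the definition replaces the tentative. */
    *out = (a->cls == SYM_DEFINED) ? *a : *b;
    return true;
}
```

For instance, resolving a global tentative symbol of size 4 against a global tentative symbol of size 8 yields the 8-byte one, while two global defined symbols make resolve report an error.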

I have noticed that weak tentative symbols are a somewhat mysterious entity. In fact, it seems that compilers refuse to emit them. Consider a simple uninitialized global C variable like this:

int foo;

If we compile a file containing just this definition and read its symbol table (GCC 10 and later default to -fno-common, so they need -fcommon to reproduce this), we get:

7: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM foo

This is perfectly fine: our tentative C definition produced a global COMMON symbol, whose section index is SHN_COMMON (for such a symbol the value column holds the required alignment, 4 here, rather than an address). Now, let’s try again with:

#pragma weak foo
int foo;

This time we get:

7: 0000000000000000     4 OBJECT  WEAK   DEFAULT    3 foo

What? The symbol ended up in a real section, so clearly this is not a common symbol. If we take a look at the sections we find:

[ 3] .bss              NOBITS           0000000000000000  00000040
0000000000000004  0000000000000000  WA       0     0     4

So the symbol actually ended up in the BSS. This was to be expected, since uninitialized globals in C are initialized to 0 and storing them in the BSS saves space. However, we might have expected a symbol with weak binding and the SHN_COMMON section index. This is not the case: the resulting symbol is indistinguishable from the one produced by a weak, explicitly zero-initialized C global variable. So, when we talk about weak tentative symbols in the rest of this article, we should remember that such symbols are technically equivalent to weak defined symbols.

Now back to static libraries. What makes the documentation ambiguous is its somewhat unclear use of the expression “defined symbol”. Given the three classes above, one would say that it refers only to the second class. Likewise, when it talks about “undefined symbols” or “symbol references”, one would think that only the first class is involved. This leaves tentative definitions out and does not clarify the behaviour that applies to them, which is what I want to pin down here.

It is interesting to see what happens exactly when we throw a static library into a linking operation in which a relocatable file has already mentioned a symbol, which also appears in one of the library’s object files: will it be extracted or not?

Basically, we have a link command like this:

ld -o a.out first.o lib.a

where lib.a holds a file second.o, and both first.o and second.o mention a symbol with the same name.

In the discussion that follows, ld refers to GNU ld. While it is just one implementation of an ELF linker, it is widespread enough that it can be used to check what actually happens in some corner cases. I also tested gold, which ships alongside GNU ld in the GNU binutils. The versions used for the tests are:

$ LANG=C ld -v
GNU ld (GNU Binutils) 2.33.1
$ LANG=C ld.gold -v
GNU gold (GNU Binutils 2.33.1) 1.16

I have tried matching all symbol cases (weak/global undefined/tentative/defined) in the relocatable file against all symbol cases in the archive. This is what happened:

  • if the symbol in first.o is weak, second.o is not extracted from the archive. This means not only, as the documentation clearly states, that weak undefined references are never used to pull files from archives. It also means that defined symbols in archives never take precedence over weak tentative symbols in relocatable files, something that would have happened had second.o been passed directly to the linker. Tentative symbols are never merged in this case either: if second.o contained a tentative definition larger than the one in first.o, the larger size would not be picked up. Again, this is contrary to what happens when linking relocatable files directly.
  • if the symbol in first.o is undefined (and, because of the previous point, has global binding), second.o is extracted from the archive if it provides a tentative or defined symbol. The binding of the symbol in the archived file is irrelevant.
  • if the symbol in first.o is tentative (and, again, has global binding), GNU ld pulls second.o only if its symbol is defined and has global binding. gold does not extract the file even in this case.

The following table details the description above by listing every case:

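first.o             second.o            Extracted?
Binding  Class      Binding  Class
Global   Undefined  Global   Undefined  No
Global   Undefined  Global   Tentative  Yes
Global   Undefined  Global   Defined    Yes
Global   Undefined  Weak     Undefined  No
Global   Undefined  Weak     Tentative  Yes
Global   Undefined  Weak     Defined    Yes
Global   Tentative  Global   Undefined  No
Global   Tentative  Global   Tentative  No
Global   Tentative  Global   Defined    Only by GNU ld
Global   Tentative  Weak     Undefined  No
Global   Tentative  Weak     Tentative  No
Global   Tentative  Weak     Defined    No
Global   Defined    Global   Undefined  No
Global   Defined    Global   Tentative  No
Global   Defined    Global   Defined    No
Global   Defined    Weak     Undefined  No
Global   Defined    Weak     Tentative  No
Global   Defined    Weak     Defined    No
Weak     Undefined  Global   Undefined  No
Weak     Undefined  Global   Tentative  No
Weak     Undefined  Global   Defined    No
Weak     Undefined  Weak     Undefined  No
Weak     Undefined  Weak     Tentative  No
Weak     Undefined  Weak     Defined    No
Weak     Tentative  Global   Undefined  No
Weak     Tentative  Global   Tentative  No
Weak     Tentative  Global   Defined    No
Weak     Tentative  Weak     Undefined  No
Weak     Tentative  Weak     Tentative  No
Weak     Tentative  Weak     Defined    No
Weak     Defined    Global   Undefined  No
Weak     Defined    Global   Tentative  No
Weak     Defined    Global   Defined    No
Weak     Defined    Weak     Undefined  No
Weak     Defined    Weak     Tentative  No
Weak     Defined    Weak     Defined    No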
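
The whole table collapses into a handful of conditions. The following C sketch encodes the observations above for the two linker versions I tested; it is a summary of measured behaviour, not code taken from either linker. The struct symbol type, the enums and the is_extracted function are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the symbol mentioned by first.o or second.o. */
enum sym_class   { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };
enum sym_binding { BIND_WEAK, BIND_GLOBAL };
enum linker      { LINKER_GNU_LD, LINKER_GOLD };

struct symbol {
    enum sym_class cls;
    enum sym_binding binding;
};

/* Tells whether second.o is extracted from lib.a, given the symbol as it
 * appears in first.o and in second.o (observed with binutils 2.33.1). */
static bool is_extracted(struct symbol first, struct symbol second,
                         enum linker linker)
{
    /* A weak symbol in first.o never pulls anything from the archive. */
    if (first.binding == BIND_WEAK)
        return false;

    /* A member that only refers to the symbol cannot satisfy anything. */
    if (second.cls == SYM_UNDEFINED)
        return false;

    switch (first.cls) {
    case SYM_UNDEFINED:
        /* Any tentative or defined symbol will do, whatever its binding. */
        return true;
    case SYM_TENTATIVE:
        /* Only GNU ld lets a global definition override the tentative. */
        return linker == LINKER_GNU_LD &&
               second.cls == SYM_DEFINED &&
               second.binding == BIND_GLOBAL;
    case SYM_DEFINED:
        /* Already defined: nothing is needed from the archive. */
        return false;
    }

    assert(0 && "unknown symbol class");
    return false;
}
```

The only row where the linker argument matters is a global tentative symbol in first.o matched against a global defined symbol in second.o.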