[This article is still INCOMPLETE. I will continue to wrap it up as soon as I can... it may take weeks or months. There is a lot of information I would like to include...]
Proper organization of C/C++ source code is essential for code maintenance. Unfortunately, poorly organized code is still found everywhere, which makes maintenance very difficult.
In this article, I attempt to summarize some good practices in writing good C/C++ header files. I found that some similar articles do not emphasize certain important points, do not justify the ideas properly, or do not state exceptions to the common practices. I attempt to include these here.
The following rules summarize provably good practices for writing C/C++ header files. They are quite essential, and have been (or should be) included in good coding standards.
The following sections elaborate on the rules in detail. The description, justification, checking, and exceptions of each rule are presented. (Note that weightless, single, minimal and self-sufficing are not standard technical terms for describing the practices. I borrowed or invented these terms to summarize the rules because I could not find any established terms for such practices.)
When you invoke a compiler, it goes through preprocessing, compilation, assembly, and linking phases. There are some language constructs that do not convert to concrete executable instructions but only aid in compilation by providing useful information. These constructs can be included in a header file.
The following constructs will be handled during the preprocessing phase:
The following constructs only provide references for generating correct intermediate code (and do not add a single byte to the output):
The following components, on the other hand, should not appear in a header file:
A header file should essentially contain only interface information, that is, the specification of some implementation file(s) of a library. In other words, everything included in a header file serves only as a reference, and should not add a single byte to the executable.
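As a rough illustration of this distinction, the sketch below separates constructs that add no bytes on their own (a "header part") from the definitions that do (an "implementation part"). Both parts are shown in one file only so that the example compiles by itself. All names (SHAPES_MAX_SIDES, Point, Point_t, shape_count, manhattan) are hypothetical.

```cpp
// ---- What would live in the header (a hypothetical shapes.h) ----
#ifndef SHAPES_H
#define SHAPES_H

#define SHAPES_MAX_SIDES 8            // macro: handled by the preprocessor

struct Point { int x; int y; };       // type definition: describes layout only
typedef struct Point Point_t;         // typedef: another name for the same type

extern int shape_count;               // variable declaration: reserves no storage
int manhattan(Point_t a, Point_t b);  // function prototype: generates no code

#endif /* SHAPES_H */

// ---- What would live in exactly one implementation file ----
int shape_count = 0;                  // the single definition that occupies storage

int manhattan(Point_t a, Point_t b)
{
    // Sum of the absolute differences of the coordinates.
    int dx = a.x > b.x ? a.x - b.x : b.x - a.x;
    int dy = a.y > b.y ? a.y - b.y : b.y - a.y;
    return dx + dy;
}
```

An implementation file that includes only the header part can call manhattan() and read shape_count, yet the storage and the code exist once, in the file that holds the definitions.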
Suppose that you have defined some ordinary functions, class methods, or global variables in a header file, and that more than one implementation file includes the header file. Your compiler should produce errors complaining 'multiple definition of blah blah...' when linking all object files together, and should not generate any executable. (If it does, avoid it!)
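A small sketch may help show why declarations are safe to repeat while definitions are not. It repeats the declarations, as would happen when several files include the same header, but provides each definition only once. This is a single-file stand-in for the multi-file case, and request_count and next_request_id are made-up names.

```cpp
// Declarations: what each includer of the (hypothetical) header would see.
extern int request_count;
int next_request_id();

// The same declarations again, as if a second file included the header.
// Repeating a declaration is harmless.
extern int request_count;
int next_request_id();

// Definitions: these belong in exactly one implementation file.
// Writing either of them a second time, which is what a definition placed
// in a header amounts to once two files include it, violates the
// one-definition rule.
int request_count = 0;

int next_request_id()
{
    return ++request_count;
}
```

Keeping the `extern` declaration in the header and the definition in a single implementation file gives every includer access to the same variable without tripping the linker.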
If you have defined some file scope variables (e.g., static int stupid_var;) in a header file, then you are in big trouble! Suppose that two implementation files p1.c and p2.c have included it. The compiler will silently compile all files without errors, because it will generate two sets of such variables, one set for each implementation file! (But it may generate some warnings if an implementation file does not make use of all the variables defined.) When some function in p1.c modifies the values of these variables, p2.c won't know about it! This is rarely the desired behavior. Furthermore, the more implementation files that include the header file, the bigger the final executable will be, since the compiler will generate a private copy of these variables for each of these implementation files.
To find out whether your header file is weightless, simply compare the executables compiled without and with the header file included. (Note: don't include debugging information during compilation.) The two executables should be exactly the same if the header file is weightless (and it does not otherwise affect the original implementation).
Here's an example of how I tested a header myheader.h in a GNU/Linux environment. You can apply a similar approach on other OSs. First, create a dummy source file weightless.c:
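The effect can be imitated in a single file. In the sketch below, a macro stands in for the header's contents and two namespaces play the roles of p1.c and p2.c. Both are simplifications made only for illustration, since real translation units are separate files, and set_var and get_var are hypothetical helpers.

```cpp
// Stand-in for the header's contents. In a real project this line would sit
// in the header and be pulled in by #include.
#define MYHEADER_CONTENTS static int stupid_var = 0;

// Each namespace plays the role of one translation unit (an assumption made
// only so that the example fits in a single file).
namespace p1 {
    MYHEADER_CONTENTS                        // p1's private copy
    void set_var(int v) { stupid_var = v; }  // modifies p1's copy only
}

namespace p2 {
    MYHEADER_CONTENTS                        // p2's private copy
    int get_var() { return stupid_var; }     // reads p2's copy only
}
```

After a call such as p1::set_var(42), p2::get_var() still reports the initial value, because the two copies share a name but not storage.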
int main(){return 0;}
Compile it with gcc:
$ gcc -O3 weightless.c -o a.out
Modify the dummy source file to include the header of interest (in this case, myheader.h):
#include "myheader.h"
int main(){return 0;}
Compile it with gcc:
$ gcc -O3 weightless.c -o b.out
Compare the executables:
$ diff a.out b.out
The header is weightless if the output is empty. (No news is good news!) If you see "Binary files a.out and b.out differ" then the header is not weightless. (Or, your header may be weightless but you have a lousy compiler.)
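On systems without diff, the same byte-for-byte comparison can be done with a few lines of C++. The helper below is only a sketch. The name same_bytes is made up, and it treats a file that cannot be opened as "not identical".

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files can be opened and hold exactly the same bytes.
bool same_bytes(const std::string& path_a, const std::string& path_b)
{
    std::ifstream a(path_a.c_str(), std::ios::binary);
    std::ifstream b(path_b.c_str(), std::ios::binary);
    if (!a || !b) {
        return false;  // an unreadable file counts as "not identical"
    }

    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    // Walk both streams together and stop at the first differing byte.
    while (a_it != end && b_it != end) {
        if (*a_it != *b_it) {
            return false;
        }
        ++a_it;
        ++b_it;
    }
    // Identical only if both files ended at the same position.
    return a_it == end && b_it == end;
}
```

Calling same_bytes("a.out", "b.out") then plays the role of the diff command above: a result of true corresponds to empty diff output.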
It would be interesting to find out if the ANSI C/C++ headers follow this rule. Let's construct the following file weightless_ansicpp_headers.cpp (some lines have been omitted to save space):
#ifdef INCLUDE_MOST
// All ANSI C/C++ headers except <iostream>
#include <algorithm>
#include <bitset>
// ... Omitted
#include <cwctype>
#endif /* INCLUDE_MOST */
#ifdef INCLUDE_IOSTREAM
#include <iostream>
#endif /* INCLUDE_IOSTREAM */
/* Dummy main */
int main(){}
So, defining INCLUDE_MOST allows us to examine all ANSI C++ standard headers but iostream, while defining INCLUDE_IOSTREAM allows us to examine iostream. (The reason for examining iostream separately will become clear soon.) I performed the checking under GNU/Linux with g++ 4.0.0 as follows:
$ g++ -O3 weightless_ansicpp_headers.cpp -o empty.out
$ g++ -DINCLUDE_MOST -O3 weightless_ansicpp_headers.cpp -o most.out
$ g++ -DINCLUDE_IOSTREAM -O3 weightless_ansicpp_headers.cpp -o iostream.out
$ g++ -DINCLUDE_MOST -DINCLUDE_IOSTREAM -O3 weightless_ansicpp_headers.cpp -o all.out
$ ls -latr *.out
-rwxrwxr-x 1 user1 user1 4680 Jun 14 11:45 empty.out
-rwxrwxr-x 1 user1 user1 4680 Jun 14 11:45 most.out
...
-rwxrwxr-x 1 user1 user1 5587 Jun 14 11:47 iostream.out
-rwxrwxr-x 1 user1 user1 5587 Jun 14 11:45 all.out
$ diff empty.out most.out
$ diff iostream.out all.out
From the output, we know that the header iostream is not weightless. But why can a properly designed and implemented standard header file violate this rule? The next section explains and justifies it.
The rule of weightlessness holds as a result of using the header file to include only interface information. Pragmatically, this rule has been violated by iostream. Here's the story.
In C++, the initialization order of global objects across translation units is unspecified. On some occasions, you may need to initialize some internal static components before anybody uses them. There are a few ways to solve this problem. The easiest way is probably to introduce a special static variable to the class, which is guaranteed to be initialized to zero. Then, in every method of the class, check the variable and perform the initialization if required, as shown below:
class NeedInit {
    static int first_time; // Initialized to zero by compiler
    static ABC_Type this_static_component_needs_to_be_initialized;
    ...
};
void NeedInit::fn1(void)
{
    if (!first_time) { init(); first_time = 1; }
    ...
}
void NeedInit::fn2(void)
{
    if (!first_time) { init(); first_time = 1; }
    ...
}
This approach is ugly if there are many class methods, and it also unavoidably introduces additional overhead for each invocation of any method involved. An alternative approach is to introduce a tricky auxiliary static object in the header file to help initialize the class of interest:
class NeedInit {
    ...
};
class NeedInit_Aux {
    static unsigned int count; // Initialized to zero by compiler
public:
    NeedInit_Aux() {
        if (!count++) {
            ... // initialize NeedInit
        }
    }
};
// Force initialization before main(). NOT weightless!
static NeedInit_Aux NeedInit_Aux_initializer;
This approach, pioneered by Jerry Schwarz, seems better as it requires only a small change to the original class. However, unlike the former approach, this approach requires a header file that is not weightless. Is this justifiable?
Let's compare these approaches analytically. Let F be the number of files that require the class NeedInit (and hence the number of files that include the header), M be the number of methods in the class NeedInit, and n be the total number of run-time invocations of these methods. Note that, for a compiled executable, n is variable, while F and M are constants. It is easy to show that the first approach incurs an extra O(M) = O(1) space overhead (one check per method) and O(n) time overhead (one check per invocation), while the second approach incurs an extra O(F) = O(1) space overhead (one auxiliary object per including file) and O(F) = O(1) time overhead (one constructor call per auxiliary object at startup). So, the second approach outperforms the first approach asymptotically.
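The counts can be checked with a small instrumented sketch. It is not the iostream implementation. The classes LazyChecked and CounterAux, the counters checks and inits, and the helper call_methods are all hypothetical, and three auxiliary objects in one file stand in for F = 3 including files.

```cpp
// Approach 1: every method checks a flag before doing its work.
class LazyChecked {
    static int first_time;           // static storage: starts at zero
    static void ensure_init() {
        ++checks;                    // one check per method invocation
        if (!first_time) { ++inits; first_time = 1; }
    }
public:
    static long checks;              // instrumentation only
    static int inits;                // instrumentation only
    static void fn1() { ensure_init(); }
    static void fn2() { ensure_init(); }
};
int LazyChecked::first_time = 0;
long LazyChecked::checks = 0;
int LazyChecked::inits = 0;

// Approach 2: an auxiliary object per including file does the check once,
// in its constructor, before main() starts.
class CounterAux {
    static unsigned int count;       // static storage: starts at zero
public:
    static long checks;              // instrumentation only
    static int inits;                // instrumentation only
    CounterAux() {
        ++checks;                    // one check per auxiliary object
        if (!count++) { ++inits; }
    }
};
unsigned int CounterAux::count = 0;
long CounterAux::checks = 0;
int CounterAux::inits = 0;

// Three auxiliary objects stand in for F = 3 files including the header.
static CounterAux aux_for_file1;
static CounterAux aux_for_file2;
static CounterAux aux_for_file3;

// Invoke the methods of approach 1 a total of n times.
void call_methods(long n)
{
    for (long i = 0; i < n; ++i) {
        if (i % 2) { LazyChecked::fn2(); } else { LazyChecked::fn1(); }
    }
}
```

After call_methods(n), LazyChecked::checks equals n, while CounterAux::checks stays at the number of auxiliary objects no matter how many calls are made. Both approaches perform the actual initialization exactly once.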
Taking these factors into consideration, the second approach is the better choice for the implementation of the iostream library. Although iostream is not weightless, the use of the trick is fully justifiable. Since each inclusion of iostream incurs additional space overhead, it is important not to include the header in an implementation file that needs nothing from it. In fact, generally speaking, you should not include any redundant header file.
Nonetheless, it is extremely rare that you have to rely on this trick. The class can usually be redesigned to avoid resorting to it. You can find more information about the trick in Section 3.11.4 of the book The Design and Evolution of C++ (by Bjarne Stroustrup), and in online articles on the technique, which is often called the Schwarz counter or nifty counter.
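One redesign that often removes the need for the trick is to hide the component behind a function that holds it as a local static object. This is offered only as an illustration, not as something the sources above prescribe. C++ initializes such an object the first time control passes through its declaration, so the header needs nothing more than the class definition and a function declaration. Registry and registry() are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Header part: a class definition and a function declaration.
// Neither creates an object in the files that include them.
class Registry {
public:
    void add(const std::string& name) { names_.push_back(name); }
    std::size_t size() const { return names_.size(); }
private:
    std::vector<std::string> names_;
};

Registry& registry();

// Implementation part: lives in exactly one implementation file.
Registry& registry()
{
    static Registry instance;  // constructed on the first call, not before main()
    return instance;
}
```

Every caller goes through registry(), so the object is ready whenever it is first needed, even from the constructor of another global object. The price is a small check on each call, much like the first approach, but it is confined to a single function.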