GCC/Clang Compilation Optimize
- GNU Binutils
- DWARF
- Precompiled Headers
- 编译加速 (统一编译)
- 编译优化级别
- Options for Debugging Your Program - GCC - 调试相关的编译选项
- Include What You Use (A tool for use with clang to analyze #includes in C and C++ source files)
- 编译二进制大小优化
- 其他
- 优化调试
- Q&A
- Refer
GNU Binutils
The GNU Binary Utilities, or binutils, are a set of programming tools for creating and managing binary programs, object files, libraries, profile data, and assembly source code.
The GNU Binutils are typically used in conjunction with compilers such as the GNU Compiler Collection (gcc
), build tools like make
, and the GNU Debugger (gdb
).
The binutils include the following commands:
name | usage |
---|---|
as | assembler popularly known as GAS (GNU Assembler) |
ld | linker |
gprof | profiler |
addr2line | convert address to file and line |
ar | create, modify, and extract from archives |
c++filt | demangling filter for C++ symbols |
dlltool | creation of Windows dynamic-link libraries |
gold | alternative linker for ELF files |
nlmconv | object file conversion to a NetWare Loadable Module |
nm | list symbols exported by object file |
objcopy | copy object files, possibly making changes |
objdump | dump information about object files |
ranlib | generate indices for archives (for compatibility; same as ar -s ) |
readelf | display content of ELF files |
size | list total and section sizes |
strings | list printable strings |
strip | remove symbols from an object file |
windmc | generates Windows message resources |
windres | compiler for Windows resource files |
DWARF
The name DWARF is something of a pun, since it was developed along with the ELF object file format. The name is an acronym for “Debugging With Arbitrary Record Formats”.
DWARF
is a widely used, standardized debugging data format. DWARF
was originally designed along with Executable and Linkable Format (ELF
), although it is independent of object file formats. The name is a medieval fantasy complement to “ELF” that had no official meaning, although the backronym “Debugging With Arbitrary Record Formats” has since been proposed.
DWARF 是一种用于表示源代码调试信息的标准格式。调试信息通常包括变量名、类型信息、行号等,用于在调试过程中帮助开发人员了解程序的运行状态。DWARF 的不同版本提供了不同的特性和优化,其中 DWARF version 5 是最新的版本,它引入了许多改进,包括更紧凑的表示形式和更高效的数据访问方式。
GCC 11 将 DWARF version 5 作为默认的 debug info 版本,这意味着当使用 GCC 11 编译项目时,生成的二进制文件将包含 DWARF version 5 格式的调试信息。由于 DWARF version 5 的优化,这使得生成的二进制文件尺寸显著缩小,同时仍保留了丰富的调试信息。
在实际项目中,这种尺寸缩小可以带来诸多好处,如节省磁盘空间、加快传输速度和提高加载速度等。因此,升级到 GCC 11 可以帮助开发人员更高效地处理大型项目和二进制文件。
More:
- https://en.wikipedia.org/wiki/DWARF
- DWARF, 调试信息存储格式
- Michael J. Eager (April 2012). “Introduction to the DWARF Debugging Format” (PDF). Retrieved 2015-01-08.
- https://dwarfstd.org/dwarf5-press-release.html
Debugging Formats
There are several debugging formats: stabs
, COFF
, PE-COFF
, OMF
, IEEE-695
, and two variants of DWARF
, to name some common ones. I’m not going to describe these in any detail. The intent here is only to mention them to place the DWARF
Debugging Format in context.
The name stabs
comes from symbol table strings, since the debugging data were originally saved as strings in Unix’s a.out
object file’s symbol table. Stabs
encodes the information about a program in text strings. Initially quite simple, stabs
has evolved over time into a quite complex, occasionally cryptic and less-than-consistent debugging format. Stabs
is not standardized nor well documented. Sun Microsystems has made a number of extensions to stabs. GCC has made other extensions, while attempting to reverse engineer the Sun extensions. Nonetheless, stabs is still widely used.
A Brief History of DWARF
DWARF 1 ─ Unix SVR4 sdb and PLSIG
Dwarf was developed by Brian Russell, Ph.D., at Bell Labs in 1988 for use with the C compiler and sdb debugger in Unix System V Release 4 (SVR4). The Programming Languages Special Interest Group (PLSIG
), part of Unix International (UI), documented the DWARF generated by SVR4 as DWARF Version 1 in 1992. Although the original DWARF had several clear shortcomings, most notably that it was not very compact, the PLSIG decided to standardize the SVR4 format with only minimal modification. It was widely adopted within the embedded sector where it continues to be used today, especially for small processors.
DWARF 2 ─ PLSIG
The PLSIG continued to develop and document extensions to DWARF to address several issues, the most important of which was to reduce the size of debugging data that were generated. There were also additions to support new languages such as the upandcoming C++ language. DWARF Version 2 was released as a draft standard in 1993.
Since Unix International had disappeared and PLSIG disbanded, several organizations independently decided to extend DWARF 1 and 2. Some of these extensions were specific to a single architecture, but others might be applicable to any architecture. Unfortunately, the different organizations didn’t work together on these extensions. Documentation on the extensions is generally spotty or difficult to obtain. Or as a GCC developer might suggest, tongue firmly in cheek, the extensions were well documented: all you have to do is read the compiler source code. DWARF was well on its way to following COFF and becoming a collection of divergent implementations rather than being an industry standard.
DWARF 3 ─ Free Standards Group
Despite several online discussions about DWARF on the PLSIG email list (which survived under X/Open [later Open Group] sponsorship after UI’s demise), there was little impetus to revise (or even finalize) the document until the end of 1999. At that time, there was interest in extending DWARF to have better support for the HP/Intel IA64 architecture as well as better documentation of the ABI used by C++ programs. These two efforts separated, and the author took over as Chair for the revived DWARF Committee.
DWARF 4 ─ DWARF Debugging Format Committee
After the Free Standards Group merged with Open Source Development Labs (OSDL) in 2007 to form the Linux Foundation, the DWARF Committee returned to independent status and created its own web site at dwarfstd.org. Work began on Version 4 of the DWARF in 2007.
The DWARF Version 4 Standard was released in June, 2010, following a public review.
Work on DWARF Version 5 started in February, 2012. This version is expected to be completed in 2014.
Debugging Information Entry (DIE)
- Tags and Attributes
The basic descriptive entity in DWARF is the Debugging Information Entry (DIE
). A DIE has a tag, which specifies what the DIE describes and a list of attributes which fill in details and further describes the entity.
A DIE (except for the topmost) is contained in or owned by a parent DIE and may have sibling DIEs or children DIEs. Attributes may contain a variety of values: constants (such as a function name), variables (such as the start address for a function), or references to another DIE (such as for the type of a function’s return value).
The following figure shows C’s classic hello.c
program with a simplified graphical representation of its DWARF description. The topmost DIE represents the compilation unit. It has two “children”, the first is the DIE describing main and the second describing the base type int which is the type of the value returned by main. The subprogram DIE is a child of the compilation unit DIE, while the base type DIE is referenced by the Type attribute in the subprogram DIE. We also talk about a DIE “owning” or “containing” the children DIEs.
- Types of DIEs
DIEs can be split into two general types. Those that describe data including data types and those that describe functions and other executable code.
Precompiled Headers
This document describes the design and implementation of Clang’s precompiled headers (PCH). If you are interested in the end-user view, please see the User’s Manual.
更多参考:
- Clang precompiled headers and improving C++ compile times, conclusion
- Why precompiled headers do (not) improve C++ compile times
- How do I generate and use precompiled headers with Clang++?
Using Precompiled Headers with clang
The Clang compiler frontend, clang -cc1
, supports two command line options for generating and using PCH
files.
To generate PCH
files using clang -cc1
, use the option -emit-pch
:
$ clang -cc1 test.h -emit-pch -o test.h.pch
This option is transparently used by clang when generating PCH
files. The resulting PCH
file contains the serialized form of the compiler’s internal representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the -include-pch
option:
$ clang -cc1 -include-pch test.h.pch test.c -o test.s
说明:上面的
clang -cc1
在实际使用中应替换为clang++
。例如,要生成预编译头文件my_header.pch
,可以使用命令clang++ -x c++-header -std=c++11 -o my_header.pch my_header.hpp
Design Philosophy
Precompiled headers are meant to improve overall compile times for projects, so the design of precompiled headers is entirely driven by performance concerns. The use case for precompiled headers is relatively simple: when there is a common set of headers that is included in nearly every source file in the project, we precompile that bundle of headers into a single precompiled header (PCH
file). Then, when compiling the source files in the project, we load the PCH
file first (as a prefix header), which acts as a stand-in for that bundle of headers.
A precompiled header implementation improves performance when:
-
Loading the PCH file is significantly faster than re-parsing the bundle of headers stored within the PCH file. Thus, a precompiled header design attempts to minimize the cost of reading the PCH file. Ideally, this cost should not vary with the size of the precompiled header file.
-
The cost of generating the PCH file initially is not so large that it counters the per-source-file performance improvement due to eliminating the need to parse the bundled headers in the first place. This is particularly important on multi-core systems, because PCH file generation serializes the build when all compilations require the PCH file to be up-to-date.
预编译头文件的实现主要在以下两个方面改善性能:
加载 PCH 文件的速度明显快于重新解析 PCH 文件中存储的头文件集合。因此,预编译头文件设计试图最小化读取 PCH 文件的成本。理想情况下,这个成本不应随预编译头文件的大小而变化。 这意味着,通过使用预编译头文件,编译器可以快速地加载已经解析过的头文件内容,而不需要重新解析这些头文件。这将减少每个源文件的编译时间,从而提高整个项目的编译速度。
生成 PCH 文件的初始成本不应过大,以免抵消由于消除解析捆绑头文件的需要而带来的每个源文件的性能改进。这在多核系统上尤为重要,因为 PCH 文件生成会在所有编译都需要最新的 PCH 文件时序列化构建。 这意味着,尽管生成预编译头文件会带来一定的开销,但这个开销不应过大,以免影响预编译头文件带来的性能提升。在多核系统上,这一点尤为重要,因为生成预编译头文件可能会导致编译过程中的其他任务等待,从而降低并行编译的效果。
总之,预编译头文件实现通过加快加载 PCH 文件的速度和控制生成 PCH 文件的成本来提高编译性能。这使得编译器能够更快地处理头文件,从而提高整个项目的编译速度。
Clang’s precompiled headers are designed with a compact on-disk representation, which minimizes both PCH creation time and the time required to initially load the PCH file. The PCH file itself contains a serialized representation of Clang’s abstract syntax trees and supporting data structures, stored using the same compressed bitstream as LLVM’s bitcode file format.
Clang’s precompiled headers are loaded “lazily” from disk. When a PCH file is initially loaded, Clang reads only a small amount of data from the PCH file to establish where certain important data structures are stored. The amount of data read in this initial load is independent of the size of the PCH file, such that a larger PCH file does not lead to longer PCH load times. The actual header data in the PCH file–macros, functions, variables, types, etc.–is loaded only when it is referenced from the user’s code, at which point only that entity (and those entities it depends on) are deserialized from the PCH file. With this approach, the cost of using a precompiled header for a translation unit is proportional to the amount of code actually used from the header, rather than being proportional to the size of the header itself.
Clang 编译器如何以“懒加载”(lazy loading)的方式从磁盘加载预编译头文件(PCH)。懒加载意味着只有在实际需要时才加载数据,这有助于提高性能和降低内存使用。
当 PCH 文件最初被加载时,Clang 只从 PCH 文件中读取少量数据以确定某些重要数据结构的存储位置。这个初始加载阶段读取的数据量与 PCH 文件的大小无关,因此较大的 PCH 文件不会导致更长的加载时间。
PCH 文件中的实际头文件数据(如宏、函数、变量、类型等)只有在用户代码中引用时才会被加载。此时,只有该实体(以及它所依赖的实体)会从 PCH 文件中被反序列化。通过这种方法,使用预编译头文件的成本与实际从头文件中使用的代码量成正比,而不是与头文件的大小成正比。
总之,Clang 编译器通过懒加载的方式从磁盘加载预编译头文件,从而提高了性能。这种方法使得使用预编译头文件的成本与实际使用的代码量成正比,而不是与头文件的大小成正比。这有助于在保持编译速度的同时,降低内存使用。
When given the -print-stats
option, Clang produces statistics describing how much of the precompiled header was actually loaded from disk. For a simple “Hello, World!” program that includes the Apple Cocoa.h
header (which is built as a precompiled header), this option illustrates how little of the actual precompiled header is required:
*** PCH Statistics:
933 stat cache hits
4 stat cache misses
895/39981 source location entries read (2.238563%)
19/15315 types read (0.124061%)
20/82685 declarations read (0.024188%)
154/58070 identifiers read (0.265197%)
0/7260 selectors read (0.000000%)
0/30842 statements read (0.000000%)
4/8400 macros read (0.047619%)
1/4995 lexical declcontexts read (0.020020%)
0/4413 visible declcontexts read (0.000000%)
0/7230 method pool entries read (0.000000%)
0 method pool misses
For this small program, only a tiny fraction of the source locations, types, declarations, identifiers, and macros were actually deserialized from the precompiled header. These statistics can be useful to determine whether the precompiled header implementation can be improved by making more of the implementation lazy.
Precompiled headers can be chained. When you create a PCH while including an existing PCH, Clang can create the new PCH by referencing the original file and only writing the new data to the new file. For example, you could create a PCH out of all the headers that are very commonly used throughout your project, and then create a PCH for every single source file in the project that includes the code that is specific to that file, so that recompiling the file itself is very fast, without duplicating the data from the common headers for every file.
预编译头文件(PCH)可以被链接在一起。当你在创建一个新的 PCH 时包含了一个已有的 PCH,Clang 可以通过引用原始文件并只将新数据写入新文件来创建新的 PCH。这种方法允许在不重复公共头文件数据的情况下,更高效地为每个源文件创建 PCH。
举个例子,你可以为项目中经常使用的所有头文件创建一个 PCH,然后为项目中的每个源文件创建一个 PCH,该 PCH 包含特定于该文件的代码。这样,在重新编译文件本身时,速度会非常快,同时避免了为每个文件重复公共头文件的数据。
通过链接预编译头文件,可以在保持编译速度的同时,减少生成的 PCH 文件的大小。这种方法在处理大型项目时尤为有用,因为它可以有效地减少编译时间和磁盘空间占用。
Precompiled Header Contents
Clang’s precompiled headers are organized into several different blocks, each of which contains the serialized representation of a part of Clang’s internal representation. Each of the blocks corresponds to either a block or a record within LLVM’s bitstream format. The contents of each of these logical blocks are described below.
For a given precompiled header, the llvm-bcanalyzer utility can be used to examine the actual structure of the bitstream for the precompiled header. This information can be used both to help understand the structure of the precompiled header and to isolate areas where precompiled headers can still be optimized, e.g., through the introduction of abbreviations.
- Metadata Block
- Source Manager Block
- Preprocessor Block
- Types Block
- Declarations Block
- Statements and Expressions
- Identifier Table Block
- Method Pool Block
Precompiled Header Integration Points
The “lazy” deserialization behavior of precompiled headers requires their integration into several completely different submodules of Clang. For example, lazily deserializing the declarations during name lookup requires that the name-lookup routines be able to query the precompiled header to find entities within the PCH file.
编译加速 (统一编译)
bUseUnityBuild
Whether to unify C++ code into larger files for faster compilation.
bForceUnityBuild
Whether to force C++ source files to be combined into larger files for faster compilation.
- https://stackoverflow.com/questions/45110783/when-do-i-need-to-include-cpp-files
- https://accu.org/journals/overload/25/138/thomason_2360/
- https://leegoonz.blog/2020/04/26/disable-to-buseunitybuild-even-building-to-ue4-source-code/
- https://docs.unrealengine.com/5.1/en-US/build-configuration-for-unreal-engine/
编译优化级别
gcc
- 编译环境:x86-64 gcc 4.8.5
- 在线编译工具:https://gcc.godbolt.org/
- GCC优化选项说明:Options That Control Optimization
gcc默认使用-O0
优化级别,可以通过gcc -Q --help=optimizers -O<number>
查看每个优化级别的差异。
$gcc -Q --help=optimizers -O0 | head -n20
The following options control optimizations:
-O<number>
-Ofast
-Og
-Os
-faggressive-loop-optimizations [enabled]
-falign-functions [disabled]
-falign-jumps [disabled]
-falign-labels [disabled]
-falign-loops [disabled]
-fasynchronous-unwind-tables [enabled]
-fbranch-count-reg [enabled]
-fbranch-probabilities [disabled]
-fbranch-target-load-optimize [disabled]
-fbranch-target-load-optimize2 [disabled]
-fbtr-bb-exclusive [disabled]
-fcaller-saves [disabled]
-fcombine-stack-adjustments [disabled]
-fcommon [enabled]
-fcompare-elim [disabled]
测试代码:
#include <cstdio>
int func()
{
int x = 3;
return x;
}
int func2()
{
int y = 4;
return y;
}
// 1M
//char str[1024 * 1024] = {0, 1};
int main()
{
int a = 1;
int b = 2;
int c = func();
int d = a + b + c;
printf("d(%d)\n", d);
return 0;
}
Makefile:
# Compare gcc and clang, default is gcc
#CC = /root/compile/llvm_install/bin/clang
#CXX = /root/compile/llvm_install/bin/clang++
CFLAGS = -Werror -Wall -g -pipe
CFLAGS += -O0
#CFLAGS += -O1
#CFLAGS += -O2
#CFLAGS += -O3
# https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Optimize-Options.html#Optimize-Options
# https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Link-Options.html#Link-Options
CFLAGS += -fdata-sections -ffunction-sections
INCLUDE = -I./
LIBPATH = -L./
LIBS =
BIN = demo
OBJS = demo.o
MAP_FILE = mapfile
.PHONY: clean
all: $(BIN)
$(BIN): $(OBJS)
#$(CXX) -o $@ $^ $(LIBPATH) $(LIBS) -Wl,-Map=$(MAP_FILE)
$(CXX) -o $@ $^ $(LIBPATH) $(LIBS) -Wl,-Map=$(MAP_FILE) -Wl,--gc-sections
@echo "build $(BIN) ok"
install:
@echo "nothing to install"
clean:
rm -f $(OBJS) $(BIN) $(MAP_FILE)
%.o: %.cpp
$(CXX) $(CFLAGS) $(INCLUDE) -c $<
%.o: %.c
$(CC) $(CFLAGS) $(INCLUDE) -c $<
使用gcc编译,优化选项(默认级别,不做优化):-O0
(gdb) b main
Breakpoint 1 at 0x400605: file demo.cpp, line 20.
(gdb) r
Starting program: /root/test/cpp/cpp_strip/demo
Breakpoint 1, main () at demo.cpp:20
20 int a = 1;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.tl2.3.x86_64 libgcc-4.8.5-4.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64
(gdb) disassemble /m main
Dump of assembler code for function main():
19 {
0x00000000004005fd <+0>: push %rbp
0x00000000004005fe <+1>: mov %rsp,%rbp
0x0000000000400601 <+4>: sub $0x10,%rsp
20 int a = 1;
=> 0x0000000000400605 <+8>: movl $0x1,-0x4(%rbp)
21 int b = 2;
0x000000000040060c <+15>: movl $0x2,-0x8(%rbp)
22
23 int c = func();
0x0000000000400613 <+22>: callq 0x4005ed <func()>
0x0000000000400618 <+27>: mov %eax,-0xc(%rbp)
24
25 int d = a + b + c;
0x000000000040061b <+30>: mov -0x8(%rbp),%eax
0x000000000040061e <+33>: mov -0x4(%rbp),%edx
0x0000000000400621 <+36>: add %eax,%edx
0x0000000000400623 <+38>: mov -0xc(%rbp),%eax
0x0000000000400626 <+41>: add %edx,%eax
0x0000000000400628 <+43>: mov %eax,-0x10(%rbp)
26
27 printf("d(%d)\n", d);
0x000000000040062b <+46>: mov -0x10(%rbp),%eax
0x000000000040062e <+49>: mov %eax,%esi
0x0000000000400630 <+51>: mov $0x4006cd,%edi
0x0000000000400635 <+56>: mov $0x0,%eax
0x000000000040063a <+61>: callq 0x4004d0 <printf@plt>
28
29 return 0;
0x000000000040063f <+66>: mov $0x0,%eax
30 }
0x0000000000400644 <+71>: leaveq
0x0000000000400645 <+72>: retq
End of assembler dump.
使用clang编译,优化选项(默认级别,不做优化):-O0
(gdb) b main
Breakpoint 1 at 0x40115f: file demo.cpp, line 20.
(gdb) r
Starting program: /root/test/cpp/cpp_strip/demo
Breakpoint 1, main () at demo.cpp:20
20 int a = 1;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.tl2.3.x86_64 libgcc-4.8.5-4.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64
(gdb) disas /m main
Dump of assembler code for function main:
19 {
0x0000000000401150 <+0>: push %rbp
0x0000000000401151 <+1>: mov %rsp,%rbp
0x0000000000401154 <+4>: sub $0x20,%rsp
0x0000000000401158 <+8>: movl $0x0,-0x4(%rbp)
20 int a = 1;
=> 0x000000000040115f <+15>: movl $0x1,-0x8(%rbp)
21 int b = 2;
0x0000000000401166 <+22>: movl $0x2,-0xc(%rbp)
22
23 int c = func();
0x000000000040116d <+29>: callq 0x401140 <func()>
0x0000000000401172 <+34>: mov %eax,-0x10(%rbp)
24
25 int d = a + b + c;
0x0000000000401175 <+37>: mov -0x8(%rbp),%eax
0x0000000000401178 <+40>: add -0xc(%rbp),%eax
0x000000000040117b <+43>: add -0x10(%rbp),%eax
0x000000000040117e <+46>: mov %eax,-0x14(%rbp)
26
27 printf("d(%d)\n", d);
0x0000000000401181 <+49>: mov -0x14(%rbp),%esi
0x0000000000401184 <+52>: movabs $0x402000,%rdi
0x000000000040118e <+62>: mov $0x0,%al
0x0000000000401190 <+64>: callq 0x401030 <printf@plt>
0x0000000000401195 <+69>: xor %eax,%eax
28
29 return 0;
0x0000000000401197 <+71>: add $0x20,%rsp
0x000000000040119b <+75>: pop %rbp
0x000000000040119c <+76>: retq
End of assembler dump.
也可通过在线编译工具反汇编。
clang
- Clang optimization levels
- CLang optimizations on Mac OSX
- Compiling With Clang Optimization Flags
- In clang, how do you use per-function optimization attributes?
- How to change optimization level of one function?
- https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-selectively-disabling-optimization
- clang ignoring attribute noinline
- https://clang.llvm.org/docs/AttributeReference.html
- Does LLVM/clang have flags to control code padding?
To sum it up, to find out about compiler optimization passes:
llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments
clang additionally runs some higher level optimizations, which we can retrieve with:
echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#
Documentation of individual passes is available here.
You can compare the effect of changing high-level flags such as -O like this:
diff -wy --suppress-common-lines \
<(echo 'int;' | clang -xc -Os - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp) \
<(echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp)
# will tell you that -O0 is indeed the default.
-O0 优化选项:
$echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\#
clang version 3.5.2 (tags/RELEASE_352/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
"/usr/local/bin/clang-3.5" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "-disable-free" "-disable-llvm-verifier" "-main-file-name" "-" "-mrelocation-model" "static" "-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "x86-64" "-dwarf-column-info" "-resource-dir" "/usr/local/bin/../lib/clang/3.5.2" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/local/bin/../lib/clang/3.5.2/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-O0" "-fdebug-compilation-dir" "/data/home/gerryyang/pracing/build/release/src/gamesvr/CMakeFiles/gamesvr.dir" "-ferror-limit" "19" "-fmessage-length" "198" "-mstackrealign" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-o" "/tmp/--e22b2f.o" "-x" "c" "-"
"/bin/ld" "--eh-frame-hdr" "-m" "elf_x86_64" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "/dev/null" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crt1.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbegin.o" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64" "-L/usr/local/bin/../lib64" "-L/lib/../lib64" "-L/usr/lib/../lib64" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.." "-L/usr/local/bin/../lib" "-L/lib" "-L/usr/lib" "/tmp/--e22b2f.o" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "-lc" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o"
-O2 优化选项:
$echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\#
clang version 3.5.2 (tags/RELEASE_352/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
"/usr/local/bin/clang-3.5" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-disable-free" "-disable-llvm-verifier" "-main-file-name" "-" "-mrelocation-model" "static" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info" "-resource-dir" "/usr/local/bin/../lib/clang/3.5.2" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/local/bin/../lib/clang/3.5.2/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-O2" "-fdebug-compilation-dir" "/data/home/gerryyang/pracing/build/release/src/gamesvr/CMakeFiles/gamesvr.dir" "-ferror-limit" "19" "-fmessage-length" "198" "-mstackrealign" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-o" "/tmp/--ebb79a.o" "-x" "c" "-"
"/bin/ld" "--eh-frame-hdr" "-m" "elf_x86_64" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "/dev/null" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crt1.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbegin.o" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64" "-L/usr/local/bin/../lib64" "-L/lib/../lib64" "-L/usr/lib/../lib64" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.." "-L/usr/local/bin/../lib" "-L/lib" "-L/usr/lib" "/tmp/--ebb79a.o" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "-lc" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o"
-O3 优化选项:
$echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#
clang version 3.5.2 (tags/RELEASE_352/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
"/usr/local/bin/clang-3.5" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-emit-obj" "-disable-free" "-disable-llvm-verifier" "-main-file-name" "-" "-mrelocation-model" "static" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info" "-resource-dir" "/usr/local/bin/../lib/clang/3.5.2" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/local/bin/../lib/clang/3.5.2/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-O3" "-fdebug-compilation-dir" "/data/home/gerryyang/pracing/build/release/src/gamesvr/CMakeFiles/gamesvr.dir" "-ferror-limit" "19" "-fmessage-length" "198" "-mstackrealign" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-o" "/tmp/--1ea6a8.o" "-x" "c" "-"
"/bin/ld" "--eh-frame-hdr" "-m" "elf_x86_64" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "/dev/null" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crt1.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbegin.o" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64" "-L/usr/local/bin/../lib64" "-L/lib/../lib64" "-L/usr/lib/../lib64" "-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.." "-L/usr/local/bin/../lib" "-L/lib" "-L/usr/lib" "/tmp/--1ea6a8.o" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "-lc" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o" "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o"
diff -wy --suppress-common-lines <(echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp) <(echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp)
"-mrelax-all" <
"-mdisable-fp-elim" <
> "-momit-leaf-frame-pointer"
"-O0" | "-O2"
> "-vectorize-loops"
> "-vectorize-slp"
diff -wy --suppress-common-lines \
> <(echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp) \
> <(echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp)
"-O2" | "-O3"
With version 3.5
the passes are as follow (parsed output of the command above):
- default (-O0): -targetlibinfo -verify -verify-di
- -O1 is based on -O0
- adds: -correlated-propagation -basiccg -simplifycfg -no-aa -jump-threading -sroa -loop-unswitch -ipsccp -instcombine -memdep -memcpyopt -barrier -block-freq -loop-simplify -loop-vectorize -inline-cost -branch-prob -early-cse -lazy-value-info -loop-rotate -strip-dead-prototypes -loop-deletion -tbaa -prune-eh -indvars -loop-unroll -reassociate -loops -sccp -always-inline -basicaa -dse -globalopt -tailcallelim -functionattrs -deadargelim -notti -scalar-evolution -lower-expect -licm -loop-idiom -adce -domtree -lcssa
- -O2 is based on -01
- adds: -gvn -constmerge -globaldce -slp-vectorizer -mldst-motion -inline
- removes: -always-inline
- -O3 is based on -O2
- adds: -argpromotion
- -Os is identical to -O2
- -Oz is based on -Os
- removes: -slp-vectorizer
Extensions for selectively disabling optimization
Clang provides a mechanism for selectively disabling optimizations in functions and methods.
To disable optimizations in a single function definition, the GNU-style or C++11 non-standard attribute optnone
can be used.
// The following functions will not be optimized.
// GNU-style attribute
__attribute__((optnone)) int foo() {
// ... code
}
// C++11 attribute
[[clang::optnone]] int bar() {
// ... code
}
To facilitate disabling optimization for a range of function definitions, a range-based pragma is provided. Its syntax is #pragma clang optimize
followed by off
or on
.
All function definitions in the region between an off
and the following on
will be decorated with the optnone
attribute unless doing so would conflict with explicit attributes already present on the function (e.g. the ones that control inlining).
#pragma clang optimize off
// This function will be decorated with optnone.
int foo() {
// ... code
}
// optnone conflicts with always_inline, so bar() will not be decorated.
__attribute__((always_inline)) int bar() {
// ... code
}
#pragma clang optimize on
If no on
is found to close an off
region, the end of the region is the end of the compilation unit.
Note that a stray #pragma clang optimize on
does not selectively enable additional optimizations when compiling at low optimization levels. This feature can only be used to selectively disable optimizations.
clang ignoring attribute noinline
I expected __attribute__((noinline))
, when added to a function, to make sure that that function gets emitted. This works with gcc, but clang still seems to inline it.
Here is an example, which you can also open on Godbolt:
namespace {
__attribute__((noinline))
int inner_noinline() {
return 3;
}
int inner_inline() {
return 4;
}
int outer() {
return inner_noinline() + inner_inline();
}
}
int main() {
return outer();
}
When build with -O3
, gcc
emits inner_noinline
, but not inner_inline
:
(anonymous namespace)::inner_noinline():
mov eax, 3
ret
main:
call (anonymous namespace)::inner_noinline()
add eax, 4
ret
Clang insists on inlining it:
main: # @main
mov eax, 7
ret
If adding a parameter to the functions and letting them perform some trivial work, clang respects the noinline attribute: https://godbolt.org/z/NNSVab
Shouldn’t noinline be independent of how complex the function is? What am I missing?
Answers:
Does clang support noinline attribute?
It doesn’t have its own category in the list of attributes, but if you search for noinline there, you will find it mentioned several times.
Also, looking at the version with parameters, if I remove it there, both functions are inlined. So clang seems to at least know it.
related: noinline attribute is not respected in -O1 and above #3409
__attribute__((noinline))
prevents the compiler from inlining the function. It doesn’t prevent it from doing constant folding. In this case, the compiler was able to recognize that there was no need to call inner_noinline
, either as an inline insertion or an out-of-line call. It could just replace the function call with the constant 3
.
It sounds like you want to use the optnone
attribute instead, to prevent the compiler from applying even the most obvious of optimizations (as this one is).
Options for Debugging Your Program - GCC - 调试相关的编译选项
To tell GCC
to emit extra information for use by a debugger, in almost all cases you need only to add -g
to your other options. Some debug formats can co-exist (like DWARF with CTF) when each of them is enabled explicitly by adding the respective command line option to your other options.
GCC allows you to use -g
with -O
. The shortcuts taken by optimized code may occasionally be surprising: some variables you declared may not exist at all; flow of control may briefly move where you did not expect it; some statements may not be executed because they compute constant results or their values are already at hand; some statements may execute in different places because they have been moved out of loops. Nevertheless it is possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.
If you are not using some other optimization option, consider using -Og
(see Options That Control Optimization) with -g
. With no -O
option at all, some compiler passes that collect information useful for debugging do not run at all, so that -Og
may result in a better debugging experience.
-g
Produce debugging information in the operating system’s native format (stabs, COFF, XCOFF, or DWARF). GDB can work with this debugging information.
On most systems that use stabs format, -g
enables use of extra debugging information that only GDB can use; this extra information makes debugging work better in GDB but probably makes other debuggers crash or refuse to read the program. If you want to control for certain whether to generate the extra information, use -gvms
(see below).
-ggdb
Produce debugging information for use by GDB. This means to use the most expressive format available (DWARF, stabs, or the native format if neither of those are supported), including GDB extensions if at all possible.
-gdwarf / -gdwarf-version
Produce debugging information in DWARF format (if that is supported). The value of version may be either 2, 3, 4 or 5; the default version for most targets is 5 (with the exception of VxWorks, TPF and Darwin/Mac OS X, which default to version 2, and AIX, which defaults to version 4).
Note that with DWARF Version 2, some ports require and always use some non-conflicting DWARF 3 extensions in the unwind tables.
Version 4 may require GDB 7.0 and -fvar-tracking-assignments for maximum benefit. Version 5 requires GDB 8.0 or higher.
Include What You Use (A tool for use with clang to analyze #includes in C and C++ source files)
Here, the main benefit of include-what-you-use comes from the flip side: “don’t include what you don’t use.”
参考:https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md
- 更快的编译。当
cpp
文件包含冗余头文件时,编译器会读取、预处理和解析更多的代码,如果有模板存在,则会引入更多的代码,这会加大编译构建时间。 - 更好的重构。假如准备重构
foo.h
,使得它不再使用vector
,很可能会从foo.h
文件中移除#include<vector>
。理论上可以这么做,但实际上不行,因为其他文件可能会通过foo.h
来间接引用vector
,贸然移除会造成其他文件编译失败。iwyu 工具可以找到并去掉这种间接引用。 - 头文件自注释。通过查看必须头文件注释,可知道该功能依赖于其他哪些子功能。
- 使用前向声明代替 include 语句,减少依赖,减少可执行程序大小。
Since some coding standards have taken to discourage forward declarations, IWYU has grown a
--no_fwd_decls
mode to embrace this alternative strategy. Where IWYU’s default behavior is to minimize the number of include directives, IWYU with--no_fwd_decls
will attempt to minimize the number of times each type is redeclared. The result is that include directives will always be preferred over local forward declarations, even if it means including a header just for a name-only type declaration.
For more in-depth documentation, see docs.
NOTE: Include-what-you-use makes heavy use of Clang internals, and will occasionally break when Clang is updated. We build IWYU regularly against Clang mainline to detect and fix such compatibility breaks as soon as possible.
Build
How to build standalone
This build mode assumes you already have compiled LLVM and Clang libraries on your system, either via packages for your platform or built from source. To set up an environment for building IWYU:
- Create a directory for IWYU development, e.g.
iwyu
- Clone the IWYU Git repo:
iwyu$ git clone https://github.com/include-what-you-use/include-what-you-use.git
- Presumably, you’ll be building IWYU with a released version of LLVM and Clang, so check out the corresponding branch. For example, if you have Clang 6.0 installed, use the
clang_6.0
branch. IWYUmaster
tracks LLVM & Clangmain
:
iwyu$ cd include-what-you-use
iwyu/include-what-you-use$ git checkout clang_6.0
- Create a build root and use CMake to generate a build system linked with LLVM/Clang prebuilts:
# This example uses the Makefile generator, but anything should work.
iwyu/include-what-you-use$ cd ..
iwyu$ mkdir build && cd build
# For IWYU 0.10/Clang 6 and earlier
iwyu/build$ cmake -G "Unix Makefiles" -DIWYU_LLVM_ROOT_PATH=/usr/lib/llvm-6.0 ../include-what-you-use
# For IWYU 0.11/Clang 7 and later
iwyu/build$ cmake -G "Unix Makefiles" -DCMAKE_PREFIX_PATH=/usr/lib/llvm-7 ../include-what-you-use
(substitute the llvm-6.0
or llvm-7
suffixes with the actual version compatible with your IWYU branch)
or, if you have a local LLVM and Clang build tree, you can specify that as CMAKE_PREFIX_PATH
for IWYU 0.11 and later:
iwyu/build$ cmake -G "Unix Makefiles" -DCMAKE_PREFIX_PATH=~/llvm-project/build ../include-what-you-use
- Once CMake has generated a build system, you can invoke it directly from
build
, e.g.
iwyu/build$ make
How to build as part of LLVM
Instructions for building LLVM and Clang are available at https://clang.llvm.org/get_started.html.
To include IWYU in the LLVM build, use the LLVM_EXTERNAL_PROJECTS
and LLVM_EXTERNAL_*_SOURCE_DIR
CMake variables when configuring LLVM:
llvm-project/build$ cmake -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS=clang -DLLVM_EXTERNAL_PROJECTS=iwyu -DLLVM_EXTERNAL_IWYU_SOURCE_DIR=/path/to/iwyu /path/to/llvm-project/llvm
llvm-project/build$ make
This builds all of LLVM, Clang and IWYU in a single tree.
Usage
$include-what-you-use --help
USAGE: include-what-you-use [-Xiwyu --iwyu_opt]... <clang opts> <source file>
Here are the <iwyu_opts> you can specify (e.g. -Xiwyu --verbose=3):
--check_also=<glob>: tells iwyu to print iwyu-violation info
for all files matching the given glob pattern (in addition
to the default of reporting for the input .cc file and its
associated .h files). This flag may be specified multiple
times to specify multiple glob patterns.
--keep=<glob>: tells iwyu to always keep these includes.
This flag may be specified multiple times to specify
multiple glob patterns.
--mapping_file=<filename>: gives iwyu a mapping file.
--no_default_mappings: do not add iwyu's default mappings.
--pch_in_code: mark the first include in a translation unit as a
precompiled header. Use --pch_in_code to prevent IWYU from
removing necessary PCH includes. Though Clang forces PCHs
to be listed as prefix headers, the PCH-in-code pattern can
be used with GCC and is standard practice on MSVC
(e.g. stdafx.h).
--prefix_header_includes=<value>: tells iwyu what to do with
in-source includes and forward declarations involving
prefix headers. Prefix header is a file included via
command-line option -include. If prefix header makes
include or forward declaration obsolete, presence of such
include can be controlled with the following values
add: new lines are added
keep: new lines aren't added, existing are kept intact
remove: new lines aren't added, existing are removed
Default value is 'add'.
--transitive_includes_only: do not suggest that a file add
foo.h unless foo.h is already visible in the file's
transitive includes.
--max_line_length: maximum line length for includes.
Note that this only affects comments and alignment thereof,
the maximum line length can still be exceeded with long
file names (default: 80).
--no_comments: do not add 'why' comments.
--no_fwd_decls: do not use forward declarations.
--verbose=<level>: the higher the level, the more output.
--quoted_includes_first: when sorting includes, place quoted
ones first.
--cxx17ns: suggests the more concise syntax introduced in C++17
In addition to IWYU-specific options you can specify the following
options without -Xiwyu prefix:
--help: prints this help and exits.
--version: prints version and exits.
Running on single source file
The simplest way to use IWYU is to run it against a single source file:
include-what-you-use $CXXFLAGS myfile.cc
where $CXXFLAGS
are the flags you would normally pass to the compiler.
Plugging into existing build system
Typically there is already a build system containing the relevant compiler flags for all source files. Replace your compiler with include-what-you-use
to generate a large batch of IWYU advice. Depending on your build system/build tools, this can take many forms, but for a simple GNU Make system it might look like this:
make -k CXX=include-what-you-use CXXFLAGS="-Xiwyu --error_always"
The additional
-Xiwyu --error_always
switch makesinclude-what-you-use
always exit with an error code, so the build system knows it didn’t build a .o file. Hence the need for-k
.
In this mode include-what-you-use
only analyzes the .cc
(or .cpp
) files known to your build system, along with their corresponding .h
files. If your project has a .h
file with no corresponding .cc
file, IWYU will ignore it unless you use the --check_also
switch to add it for analysis together with a .cc
file. It is possible to run IWYU against individual header files, provided the compiler flags are carefully constructed to match all includers.
Using with CMake
CMake has grown native support for IWYU as of version 3.3. See their documentation for CMake-side details.
New in version 3.3.
This property is implemented only when <LANG> is C or CXX.
Specify a semicolon-separated list containing a command line for the include-what-you-use tool. The Makefile Generators and the Ninja generator will run this tool along with the compiler and report a warning if the tool reports any problems.
The CMAKE_CXX_INCLUDE_WHAT_YOU_USE
option enables a mode where CMake first compiles a source file, and then runs IWYU on it.
Use it like this:
mkdir build && cd build
CC="clang" CXX="clang++" cmake -DCMAKE_CXX_INCLUDE_WHAT_YOU_USE=include-what-you-use ...
These examples assume that include-what-you-use
is in the PATH
. If it isn’t, consider changing the value to an absolute path. Arguments to IWYU can be added using CMake’s semicolon-separated list syntax, e.g.:
... cmake -DCMAKE_CXX_INCLUDE_WHAT_YOU_USE="include-what-you-use;-w;-Xiwyu;--verbose=7" ...
The option appears to be separately supported for both C and C++, so use CMAKE_C_INCLUDE_WHAT_YOU_USE
for C code.
Using with a compilation database
The iwyu_tool.py
script pre-dates the native CMake support, and works off the compilation database format. For example, CMake generates such a database named compile_commands.json
with the CMAKE_EXPORT_COMPILE_COMMANDS
option enabled.
The script’s command-line syntax is designed to mimic Clang’s LibTooling, but they are otherwise unrelated. It can be used like this:
mkdir build && cd build
CC="clang" CXX="clang++" cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ...
iwyu_tool.py -p .
Unless a source filename is provided, all files in the project will be analyzed.
See iwyu_tool.py --help
for more options.
Applying fixes
We also include a tool that automatically fixes up your source files based on the IWYU recommendations. This is also alpha-quality software! Here’s how to use it (requires python3):
make -k CXX=include-what-you-use CXXFLAGS="-Xiwyu --error_always" 2> /tmp/iwyu.out
python3 fix_includes.py < /tmp/iwyu.out
If you don’t like the way fix_includes.py
munges your #include
lines, you can control its behavior via flags. fix_includes.py --help
will give a full list, but these are some common ones:
-b
: Put blank lines between system and Google includes--nocomments
: Don’t add the ‘why’ comments next to includes
Which pragma should I use?
Ideally, IWYU should be smart enough to understand your intentions (and intentions of the authors of libraries you use), so the first answer should always be: none.
In practice, intentions are not so clear – it might be ambiguous whether an #include
is there by clever design or by mistake, whether an #include
serves to export symbols from a private header through a public facade or if it’s just a left-over after some clean-up. Even when intent is obvious, IWYU can make mistakes due to bugs or not-yet-implemented policies.
IWYU pragmas have some overlap, so it can sometimes be hard to choose one over the other. Here’s a guide based on how I understand them at the moment:
- Use
IWYU pragma: keep
to force IWYU to keep any#include
directive that would be discarded under its normal policies. - Use
IWYU pragma: always_keep
to force IWYU to keep a header in all includers, whether they contribute any used symbols or not. - Use
IWYU pragma: export
to tell IWYU that one header serves as the provider for all symbols in another, included header (e.g. facade headers). UseIWYU pragma: begin_exports/end_exports
for a whole group of included headers. - Use
IWYU pragma: no_include
to tell IWYU that the file in which the pragma is defined should never#include
a specific header (the header may already be included via some other#include
.) - Use
IWYU pragma: no_forward_declare
to tell IWYU that the file in which the pragma is defined should never forward-declare a specific symbol (a forward declaration may already be available via some other#include
.) - Use
IWYU pragma: private
to tell IWYU that the header in which the pragma is defined is private, and should not be included directly. - Use
IWYU pragma: private, include "public.h"
to tell IWYU that the header in which the pragma is defined is private, andpublic.h
should always be included instead. - Use
IWYU pragma: friend ".*favorites.*"
to overrideIWYU pragma: private
selectively, so that a set of files identified by a regex can include the file even if it’s private.
The pragmas come in three different classes;
- Ones that apply to a single
#include
directive (keep
,export
) - Ones that apply to a file being included (
private
,friend
,always_keep
) - Ones that apply to a file including other headers (
no_include
,no_forward_declare
)
Some files are both included and include others, so it can make sense to mix and match.
Why include-what-you-use is difficult
This section is informational, for folks who are wondering why include-what-you-use requires so much code and yet still has so many errors.
Include-what-you-use has the most problems with templates and macros. If your code doesn’t use either, IWYU will probably do great. And, you’re probably not actually programming in C++…
编译二进制大小优化
在使用 Clang 编译器时,有多种方法可以优化生成的二进制文件大小。以下是一些建议:
- 优化级别:使用
-Os
或-Oz
优化选项。这些选项专门针对生成较小的二进制文件进行优化。
clang -Os -o output_file input_file.c
或者
clang -Oz -o output_file input_file.c
-
去除调试信息:如果不需要调试信息,请确保不使用
-g
选项。如果需要调试信息,但希望减小文件大小,可以考虑使用-Wl,--compress-debug-sections=zlib
将调试信息压缩。 -
链接时优化(LTO):使用链接时优化可以在链接阶段进行更多优化,这可能有助于减小生成的二进制文件大小。要启用 LTO,请使用
-flto
选项.
clang -flto -Os -o output_file input_file.c
- 去除未使用的代码和数据:使用
-ffunction-sections
和-fdata-sections
选项将函数和数据放入单独的节(section),然后使用链接器选项--gc-sections
删除未使用的节:
clang -Os -ffunction-sections -fdata-sections -o output_file input_file.c -Wl,--gc-sections
-
静态链接:尽量避免静态链接,因为它会将库的整个内容包含到二进制文件中。相反,使用动态链接可以减小二进制文件大小。
-
符号剥离:使用
strip
工具删除不必要的符号信息。这不仅可以减小二进制文件大小,还可以防止其他人轻松地逆向工程您的代码。在编译完成后,运行以下命令:
strip output_file
请注意,这将删除所有符号信息,使调试变得困难。因此,仅在不需要调试信息时执行此操作。
- 代码优化:在源代码级别进行优化。例如,删除不必要的代码,减少全局变量的使用,使用更小的数据类型等。
通过结合使用这些技巧,可以在使用 Clang 编译器时优化生成的二进制文件大小。请注意,某些优化可能会影响程序的性能和可调试性,因此在选择优化方法时要权衡利弊。
删除不使用的 Dead Codes (-fdata-sections / -ffunction-sections / -Wl,–gc-sections)
参考Compilation options通过下面两步,去除代码没有使用的函数:
- 添加编译选项
CFLAGS += -fdata-sections -ffunction-sections
- 添加链接选项
-Wl,--gc-sections
通过上面两步,会将函数代码生成为独立的section,并在链接的时候去除不用的Dead Codes。
注意:此选项对gcc和clang都生效。
$readelf -t demo.o
There are 26 section headers, starting at offset 0x10b0:
Section Headers:
[Nr] Name
Type Address Offset Link
Size EntSize Info Align
Flags
[ 0]
NULL NULL 0000000000000000 0000000000000000 0
0000000000000000 0000000000000000 0 0
[0000000000000000]:
[ 1] .text
PROGBITS PROGBITS 0000000000000000 0000000000000040 0
0000000000000000 0000000000000000 0 4
[0000000000000006]: ALLOC, EXEC
[ 2] .data
PROGBITS PROGBITS 0000000000000000 0000000000000040 0
0000000000000000 0000000000000000 0 4
[0000000000000003]: WRITE, ALLOC
[ 3] .bss
NOBITS NOBITS 0000000000000000 0000000000000040 0
0000000000000000 0000000000000000 0 4
[0000000000000003]: WRITE, ALLOC
[ 4] .text._Z4funcv
PROGBITS PROGBITS 0000000000000000 0000000000000040 0
0000000000000010 0000000000000000 0 1
[0000000000000006]: ALLOC, EXEC
[ 5] .text._Z5func2v
PROGBITS PROGBITS 0000000000000000 0000000000000050 0
0000000000000010 0000000000000000 0 1
[0000000000000006]: ALLOC, EXEC
[ 6] .rodata
PROGBITS PROGBITS 0000000000000000 0000000000000060 0
0000000000000007 0000000000000000 0 1
[0000000000000002]: ALLOC
[ 7] .text.main
PROGBITS PROGBITS 0000000000000000 0000000000000067 0
0000000000000049 0000000000000000 0 1
[0000000000000006]: ALLOC, EXEC
[ 8] .rela.text.main
RELA RELA 0000000000000000 0000000000001970 24
0000000000000048 0000000000000018 7 8
[0000000000000000]:
[ 9] .debug_info
PROGBITS PROGBITS 0000000000000000 00000000000000b0 0
000000000000076f 0000000000000000 0 1
[0000000000000000]:
...
并且最终的链接代码中,不会存在未使用函数的代码:
$nm -C demo | grep func
00000000004005f0 T func()
查看mapfile,可以看到func2被discard了:
Discarded input sections
...
.text 0x0000000000000000 0x0 demo.o
.data 0x0000000000000000 0x0 demo.o
.bss 0x0000000000000000 0x0 demo.o
.text._Z5func2v
0x0000000000000000 0x10 demo.o
...
$objdump -s -j .text._Z5func2v demo.o
demo.o: file format elf64-x86-64
Contents of section .text._Z5func2v:
0000 554889e5 c745fc04 0000008b 45fc5dc3 UH...E......E.].
refer:
- https://stackoverflow.com/questions/6687630/how-to-remove-unused-c-c-symbols-with-gcc-and-ld
- https://stackoverflow.com/questions/54996229/is-ffunction-sections-fdata-sections-and-gc-sections-not-working
- https://stackoverflow.com/questions/17710024/clang-removing-dead-code-during-static-linking-gcc-equivalent-of-wl-gc-sect
strip
strip
用于删除目标文件中的符号(Discard symbols from object files),通常用于删除已生成的可执行文件和库中不需要的符号。- 在想要减少文件的大小,并保留对调试有用的信息时,使用
-d
选项,可以删除不使用的信息(文件名和行号等),并可以保留函数名等一般的符号,用gdb进行调试时,只要保留了函数名,即便不知道文件名和行号,也可以进行调试。 - 使用
-R
选项,是可删除其他任意信息的选项,在执行strip -R .text demo1
后,程序的text部分(代码部分)会被完全删除,从而导致程序的无法运行。 - 实际上,对
.o
文件以及.a
文件使用strip后,就不能进行和其他目标文件的链接操作。这是由于文件对链接器符号有依赖性,所以最好不要从.o
和.a
文件中删除符号。 - 对release的版本strip,当用户环境产生coredump后,可以通过包含调试信息的开发版本在开发环境进行调试。
- 虽然在磁盘容量足够大的PC中,可能不会出现想要将可执行文件变小的情况。但在容量有限的环境,或想要通过网络复制并运行程序时,
strip
却是一个方便的工具。
# 去除目标文件中的符号
$ strip objfile
$ nm objfile
nm: objfile: no symbols
# 删除代码段
$ strip -R .text demo1
$ ./demo1
Segmentation fault (core dumped)
objcopy (分离调试信息)
objcopy
- copy and translate object file- 实际上,在
objcopy
上使用-strip-*
选项后也能进行与strip
同样的处理。
例如:
# 拷贝出一个符号表文件
$ objcopy --only-keep-debug mainO3 mainO3.symbol
# 拷贝出一个不包含调试信息的执行文件
$ objcopy --strip-debug mainO3 mainO3.bin
$ objcopy --add-gnu-debuglink=mainO3.symbol mainO3
- 用
objcopy
嵌入可执行文件的数据,objcopy
可以将任意文件转换为可以链接的目标文件。
例如:可以将foo.jpg
转换为x86用的ELF32形式的目标文件foo.o
$ objcopy -I binary -O elf32-i386 -B i386 foo.jpg foo.o
脚本工具:
#!/bin/bash
scriptdir=`dirname ${0}`
scriptdir=`(cd ${scriptdir}; pwd)`
scriptname=`basename ${0}`
set -e
function errorexit()
{
errorcode=${1}
shift
echo $@
exit ${errorcode}
}
function usage()
{
echo "USAGE ${scriptname} <tostrip>"
}
tostripdir=`dirname "$1"`
tostripfile=`basename "$1"`
if [ -z ${tostripfile} ] ; then
usage
errorexit 0 "tostrip must be specified"
fi
cd "${tostripdir}"
debugdir=.debug
debugfile="${tostripfile}.debug"
if [ ! -d "${debugdir}" ] ; then
echo "creating dir ${tostripdir}/${debugdir}"
mkdir -p "${debugdir}"
fi
echo "stripping ${tostripfile}, putting debug info into ${debugfile}"
objcopy --only-keep-debug "${tostripfile}" "${debugdir}/${debugfile}"
strip --strip-debug --strip-unneeded "${tostripfile}"
objcopy --add-gnu-debuglink="${debugdir}/${debugfile}" "${tostripfile}"
chmod -x "${debugdir}/${debugfile}"
refer: How to generate gcc debug symbol outside the build target?
LLVM Link Time Optimization: Design and Implementation - LTO
使用 LTO 通过牺牲更多的编译时间,通过跨模块的上下文信息,实现编译优化。
由于编译器一次只编译优化一个编译单元,所以只是在做局部优化,而利用 LTO,利用链接时的全局视角进行操作,从而得到能够进行更加极致的优化。
跨模块优化的效果,也即开启 LTO 主要有这几点好处:
- 将一些函数內联化
- 去除了一些无用代码
- 对程序有全局的优化作用
比较体验不好的是,LTO 会导致编译和链接变慢,以及会使用更多的内存,所以即使到现在,也没有看到 LTO 被广泛地使用。
- 代码优化利器 LTO 介绍
- 开启Link Time Optimization(LTO)后到底有什么优化?
- Link Time Optimizations: New Way to Do Compiler Optimizations
- ThinLTO: Scalable and Incremental LTO
- LTO with LLVM and CMake
- Clang: How to check if LTO was performed
Description
LLVM features powerful intermodular optimizations which can be used at link time. Link Time Optimization (LTO) is another name for intermodular optimization when performed during the link stage. This document describes the interface and design between the LTO optimizer and the linker.
Design Philosophy
The LLVM Link Time Optimizer provides complete transparency, while doing intermodular optimization, in the compiler tool chain. Its main goal is to let the developer take advantage of intermodular optimizations without making any significant changes to the developer’s makefiles or build system. This is achieved through tight integration with the linker. In this model, the linker treats LLVM bitcode files like native object files and allows mixing and matching among them. The linker uses libLTO, a shared object, to handle LLVM bitcode files. This tight integration between the linker and LLVM optimizer helps to do optimizations that are not possible in other models. The linker input allows the optimizer to avoid relying on conservative escape analysis.
Example of link time optimization
The following example illustrates the advantages of LTO’s integrated approach and clean interface. This example requires a system linker which supports LTO through the interface described in this document. Here, clang transparently invokes system linker.
- Input source file
a.c
is compiled into LLVM bitcode form. - Input source file
main.c
is compiled into native object code.
--- a.h ---
extern int foo1(void);
extern void foo2(void);
extern void foo4(void);
--- a.c ---
#include "a.h"
static signed int i = 0;
void foo2(void) {
i = -1;
}
static int foo3() {
foo4();
return 10;
}
int foo1(void) {
int data = 0;
if (i < 0)
data = foo3();
data = data + 42;
return data;
}
--- main.c ---
#include <stdio.h>
#include "a.h"
void foo4(void) {
printf("Hi\n");
}
int main() {
return foo1();
}
To compile, run:
% clang -flto -c a.c -o a.o # <-- a.o is LLVM bitcode file
% clang -c main.c -o main.o # <-- main.o is native object file
% clang -flto a.o main.o -o main # <-- standard link command with -flto
- In this example, the linker recognizes that
foo2()
is an externally visible symbol defined in LLVM bitcode file. The linker completes its usual symbol resolution pass and finds thatfoo2()
is not used anywhere. This information is used by the LLVM optimizer and it removesfoo2()
. - As soon as
foo2()
is removed, the optimizer recognizes that conditioni < 0
is alwaysfalse
, which meansfoo3()
is never used. Hence, the optimizer also removesfoo3()
. - And this in turn, enables linker to remove
foo4()
.
This example illustrates the advantage of tight integration with the linker. Here, the optimizer can not remove foo3()
without the linker’s input.
Alternative Approaches
- Compiler driver invokes link time optimizer separately.
In this model the link time optimizer is not able to take advantage of information collected during the linker’s normal symbol resolution phase. In the above example, the optimizer can not remove foo2()
without the linker’s input because it is externally visible. This in turn prohibits the optimizer from removing foo3()
.
- Use separate tool to collect symbol information from all object files.
In this model, a new, separate, tool or library replicates the linker’s capability to collect information for link time optimization. Not only is this code duplication difficult to justify, but it also has several other disadvantages. For example, the linking semantics and the features provided by the linker on various platform are not unique. This means, this new tool needs to support all such features and platforms in one super tool or a separate tool per platform is required. This increases maintenance cost for link time optimizer significantly, which is not necessary. This approach also requires staying synchronized with linker developments on various platforms, which is not the main focus of the link time optimizer. Finally, this approach increases end user’s build time due to the duplication of work done by this separate tool and the linker itself.
Multi-phase communication between libLTO and linker
The linker collects information about symbol definitions and uses in various link objects which is more accurate than any information collected by other tools during typical build cycles. The linker collects this information by looking at the definitions and uses of symbols in native .o files and using symbol visibility information. The linker also uses user-supplied information, such as a list of exported symbols. LLVM optimizer collects control flow information, data flow information and knows much more about program structure from the optimizer’s point of view. Our goal is to take advantage of tight integration between the linker and the optimizer by sharing this information during various linking phases.
问题
-fdebug-types-section
When using DWARF Version 4 or higher, type DIEs can be put into their own .debug_types
section instead of making them part of the .debug_info
section. It is more efficient to put them in a separate comdat section since the linker can then remove duplicates. But not all DWARF consumers support .debug_types
sections yet and on some objects .debug_types
produces larger instead of smaller debugging information.
-fdebug-types-section
选项用于在生成 DWARF 调试信息时将类型定义(type DIEs)放入单独的 .debug_types 节中,而不是将它们作为 .debug_info 节的一部分。这个选项适用于 DWARF 版本4及更高版本。
将类型定义放入单独的 .debug_types 节有以下优势:
链接器效率:链接器可以通过合并重复的类型定义来减小生成的调试信息的大小。将类型定义放入单独的 .debug_types 节(通常是comdat节)可以让链接器更容易地识别和删除重复的类型定义。
然而,使用 -fdebug-types-section 选项也存在一些限制和问题:
DWARF 消费者的兼容性:并非所有处理 DWARF 调试信息的工具(如调试器和分析器)都支持 .debug_types 节。在这种情况下,使用 -fdebug-types-section 可能会导致兼容性问题。
调试信息大小:在某些情况下,使用 .debug_types 节可能会导致生成的调试信息更大,而不是更小。这取决于具体的对象文件和类型定义。
总之,-fdebug-types-section 选项用于将类型定义放入单独的 .debug_types 节,以提高链接器效率。然而,在使用此选项时,请注意兼容性和调试信息大小的潜在问题。在选择是否使用此选项时,请根据您的项目需求和目标平台进行权衡。
–compress-debug-sections=zlib
--compress-debug-sections=none
--compress-debug-sections=zlib
--compress-debug-sections=zlib-gnu
--compress-debug-sections=zlib-gabi
--compress-debug-sections=zstd
On ELF platforms, these options control how DWARF debug sections are compressed using zlib.
使用 -Wl,–compress-debug-sections=zlib 可以压缩调试信息,从而减小生成的二进制文件大小。然而,在使用此选项时,有一些注意事项和可能的问题:
调试器兼容性:并非所有调试器都支持压缩后的调试信息。在使用压缩调试信息的二进制文件进行调试时,请确保您的调试器(如GDB)支持处理压缩后的调试节。较新版本的GDB通常支持这一点。
解压缩开销:虽然压缩调试信息可以减小文件大小,但在调试过程中,调试器需要解压缩这些信息。这可能会导致调试过程稍微变慢。对于大型项目,解压缩时间可能会有所增加。
链接器支持:并非所有链接器都支持 –compress-debug-sections 选项。在使用此选项时,请确保您的链接器支持它。通常,较新版本的GNU ld链接器支持此功能。
二进制文件可移植性:如果您需要将二进制文件分发给其他用户,他们可能使用不同的调试器或操作系统。在这种情况下,使用压缩调试信息可能会导致兼容性问题。在将二进制文件分发给其他用户之前,请确保他们的环境支持处理压缩后的调试信息。
总之,在使用 -Wl,–compress-debug-sections=zlib 选项时,请确保您的工具链和调试器支持处理压缩后的调试信息。同时,请注意,在某些情况下,这可能会影响调试过程的性能。
通过 readelf -S
可以查看 .debug_info
在使用压缩后的大小变化:
$ls -lh unittestsvr*
-rwxr-xr-x 1 gerryyang users 224M 7月 22 12:04 unittestsvr.nozip
-rwxr-xr-x 1 gerryyang users 106M 7月 22 12:04 unittestsvr.zip
$readelf -S unittestsvr.nozip | grep -A1 .debug_info
[34] .debug_info PROGBITS 0000000000000000 02b55390
000000000442cdcd 0000000000000000 0 0 1
$readelf -S unittestsvr.zip | grep -A1 .debug_info
[34] .debug_info PROGBITS 0000000000000000 0396840d
0000000001e87e1a 0000000000000000 C 0 0 1
截至目前(2022年2月),Clang编译器和GNU ld链接器尚未支持 –compress-debug-sections=zstd 选项。目前,GNU ld链接器支持的调试信息压缩方法是zlib(–compress-debug-sections=zlib)。
如果您希望使用 zstd 压缩调试信息,可以考虑在编译和链接完成后手动压缩调试信息。以下是一个使用 objcopy 工具手动压缩调试信息的示例:
首先,使用-g选项编译源代码以生成调试信息:
clang -g -o output_file input_file.c
使用objcopy将未压缩的调试信息从二进制文件中提取到单独的文件:
objcopy --only-keep-debug output_file output_file.debug
使用zstd手动压缩提取的调试信息:
zstd -o output_file.debug.zst output_file.debug
将压缩后的调试信息与二进制文件关联:
objcopy --add-gnu-debuglink=output_file.debug.zst output_file
请注意,这种方法可能不被所有调试器支持,因为它们可能无法识别zstd压缩的调试信息。在使用此方法之前,请确保您的调试器支持处理zstd压缩的调试信息。
其他
-fno-rtti / -frtti
- https://desk.zoho.com.cn/portal/sylixos/zh/kb/articles/c-%E7%BC%96%E8%AF%91%E9%80%89%E9%A1%B9-fno-rtti-%E5%92%8C-frtti%E6%B5%85%E6%9E%90
- https://stackoverflow.com/questions/23912955/disable-rtti-for-some-classes
- https://stackoverflow.com/questions/36261573/gcc-c-override-frtti-for-single-class
-Wl,–start-group / -Wl,–end-group
What are the –start-group and –end-group command line options?
What is the purpose of those command line options? Please help to decipher the meaning of the following command line:
-Wl,--start-group -lmy_lib -lyour_lib -lhis_lib -Wl,--end-group -ltheir_lib
Apparently it has something to do with linking, but the GNU manual is quiet what exactly grouping means.
Answers:
It is for resolving circular dependences between several libraries (listed between -(
and -)
).
Citing Why does the order in which libraries are linked sometimes cause errors in GCC? or man ld http://linux.die.net/man/1/ld
-(
archives-)
or--start-group
archives--end-group
The archives should be a list of archive files. They may be either explicit file names, or -l options.
The specified archives are searched repeatedly until no new undefined references are created. Normally, an archive is searched only once in the order that it is specified on the command line. If a symbol in that archive is needed to resolve an undefined symbol referred to by an object in an archive that appears later on the command line, the linker would not be able to resolve that reference. By grouping the archives, they all be searched repeatedly until all possible references are resolved.
Using this option has a significant performance cost. It is best to use it only when there are unavoidable circular references between two or more archives.
So, libraries inside the group can be searched for new symbols several time, and you need no ugly constructs like -llib1 -llib2 -llib1
PS archive means basically a static library (*.a
files)
优化调试
time-trace: timeline / flame chart profiler for Clang
Q&A
Relocation overflow and code models
There are several strategies to mitigate relocation overflow issues.
- Make the program smaller by reducing code and data size.
- Partition the large monolithic executable into the main executable and a few shared objects.
- Switch to the medium code model
- Use compiler options such as
-Os
,-Oz
and link-time optimization that focuses on decreasing the code size. - For compiler instrumentations (e.g.
-fsanitize=address
,-fprofile-generate
), move some data to large data sections. - Use linker script commands
INSERT BEFORE
andINSERT AFTER
to reorder output sections.
在某些情况下,当静态链接的二进制文件超过2GB时,可能会遇到relocation overflow问题。这是因为在大型程序中,某些指针和地址可能超出了编译器为其分配的空间。为了解决这个问题,可以尝试以下方法:
- 使用大型模型或大型地址空间:在编译时,可以选择使用大型模型(例如-mcmodel=large)或大型地址空间(例如-mlarge-address-aware)。这将允许编译器和链接器使用更大的地址空间,以便处理大型程序。具体的编译选项可能因编译器而异,请参阅编译器文档以获取适当的选项。
- 分割程序:如果可能,将程序分割成多个较小的模块或库。这可以减小每个模块的大小,降低relocation overflow的风险。此外,使用动态链接库(DLL)或共享对象(SO)可以进一步减小二进制文件的大小。
- 优化代码:检查代码以查找潜在的优化点,例如删除未使用的代码、减少全局变量的使用、优化数据结构和算法等。这可以帮助减小二进制文件的大小,从而降低relocation overflow的风险。
- 更新编译器和链接器:确保使用的编译器和链接器是最新版本,因为它们可能包含解决relocation overflow问题的修复和改进。此外,尝试使用其他编译器,看看它们是否能更好地处理大型程序。
- 考虑使用动态链接:虽然静态链接可以将所有依赖项打包到单个二进制文件中,但它可能导致文件过大。如果可能,考虑改用动态链接,将依赖项链接为共享库或动态链接库。这样可以减小二进制文件的大小,并减轻relocation overflow问题。
请注意,解决relocation overflow问题可能需要对代码、编译选项和链接过程进行多方面的调整。在尝试上述方法时,请根据具体情况选择合适的策略。
Refer
- C++ 服务编译耗时优化原理及实践 (美团)