<akpm@osdl.org>
[PATCH] Add CONFIG for -mregparm=3
From: Andi Kleen <ak@muc.de>, me.
Using -mregparm=3 shrinks the kernel further:
(compiled with gcc 3.4, without -funit-at-a-time; using the latter
together with -Os shrinks .text even more, making over 700KB difference)
   text    data     bss     dec     hex filename
4129346  708629  207240 5045215  4cfbdf vmlinux
3892905  708629  207240 4808774  496046 vmlinux-regparm
This one helps even more, >236KB .text difference. Clearly worth
the effort.
This patch adds an option to use -mregparm=3 while compiling the kernel. I
did an LTP run and it showed no additional failures over a non-regparm
kernel.
According to some gcc developers it should be safe to use in all gccs that
are still supported (2.95 and up).
I didn't make it the default because it will break all binary only modules
(although they can be fixed by adding a wrapper that calls them with
"asmlinkage"). Actually it may be a good idea to make this the default with
2.7.1 or somesuch.
We add new kbuild infrastructure: the command
scripts/gcc-version.sh $(CC)
will print out the version of gcc in a canonical 4-digit form suitable for
performing numerical tests against.
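A minimal sketch of what the canonical 4-digit form looks like, assuming the
major and minor numbers are each zero-padded to two digits (so gcc 3.4 becomes
0304). The helper name canon_version is hypothetical, and it takes the numbers
as arguments instead of querying $(CC), so this illustrates the output format
rather than reproducing the script:

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR and MINOR as a zero-padded 4-digit string.
# Assumption: two digits each, e.g. "3 4" -> 0304 and "2 95" -> 0295.
canon_version() {
    printf '%02d%02d\n' "$1" "$2"
}

v=$(canon_version 3 4)

# test(1) compares both operands as decimal integers, so the fixed-width
# form can be checked numerically against a minimum version.
if [ "$v" -ge 0295 ]; then
    echo "gcc $v is 2.95 or newer"
fi
```

The fixed width is what makes a plain integer comparison safe: 2.95 (0295)
sorts below 3.4 (0304), which a naive "2.95" versus "3.4" comparison would not
guarantee.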
DESC
arch/i386/Makefile,scripts/gcc-version.sh,Makefile small fixes
EDESC
From: Serge Belyshev <33554432@mtu-net.ru>
arch/i386/Makefile:
* omitted $(KBUILD_SRC)/ in script call.
scripts/gcc-version.sh:
* GNU tail no longer supports 'tail -1' syntax.
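For illustration, the POSIX spelling of the same operation is "tail -n 1".
The wrapper name last_line below is hypothetical and only exists to make the
example self-contained:

```shell
#!/bin/sh
# Print only the last line of standard input, using the POSIX "-n" option
# instead of the obsolete "tail -1" spelling.
last_line() {
    tail -n 1
}

printf 'first\nsecond\nlast\n' | last_line    # prints: last
```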
We should consider adding the -fweb option:
vanilla:
$ size vmlinux
   text    data     bss     dec     hex filename
3056270  526780  386056 3969106  3c9052 vmlinux
with -fweb:
$ size vmlinux
   text    data     bss     dec     hex filename
3049523  526780  386056 3962359  3c75f7 vmlinux
Also note 0.1 ... 1.0% speedup in various benchmarks.
This option is not enabled by default at -O2 because it
(like -fomit-frame-pointer) makes debugging impossible.
<akpm@osdl.org>
[PATCH] Use -funit-at-a-time on ia32
From: Andi Kleen <ak@muc.de>
The upcoming gcc 3.4 has a new compilation mode called unit-at-a-time.
What it does is to first load the whole file into memory and then generate
the output. This allows it to use a better inlining strategy, drop unused
static functions and use -mregparm automatically for static functions.
It does not seem to compile significantly slower.
This is also available in some of the 3.3 based "hammer branch"
compilers used in distributions (at least in SuSE and Mandrake)
Some tests show impressive .text shrinkage from unit-at-a-time.
e.g. here is the same kernel compiled with -fno-unit-at-a-time and
-funit-at-a-time with a gcc 3.4 snapshot. The gains are really
impressive:
   text    data     bss     dec     hex filename
4129346  708629  207240 5045215  4cfbdf vmlinux-nounitatatime
3999250  674853  207208 4881311  4a7b9f vmlinux-unitatatime
.text shrinks by over 130KB! And .data shrinks too.
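The savings can be checked directly from the size output above. A small
sketch, where the helper name saved is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: bytes saved when a section goes from size $1 to size $2.
saved() {
    echo $(( $1 - $2 ))
}

saved 4129346 3999250    # .text: prints 130096
saved 708629 674853      # .data: prints 33776
```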
At first glance the numbers look almost too good to be true, but they have been
verified with several configurations and seem to be real. It looks like
we have a lot of stupid inlines or dead functions. I'm really not
sure why it is that much better. But it's hard to argue with hard
numbers.
[A bloat-o-meter comparison between the two vmlinuxes can be found in
http://www.firstfloor.org/~andi/unit-vs-no-unit.gz . It doesn't show
any obvious candidates unfortunately, just lots of small changes]
With the gcc 3.3-hammer from SuSE 9.0 the gains are a bit smaller, but
still noticeable (>100KB on .text)
This patch enables -funit-at-a-time on ia32 if the compiler is gcc-3.4 or
later. We had several reports of gcc-3.3 producing very early lockups.
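A sketch of the version gate described here, assuming the compiler version is
available in the 4-digit form printed by scripts/gcc-version.sh (0304 for gcc
3.4). The function name unit_at_a_time_flag is hypothetical and the actual
Makefile logic is not reproduced:

```shell
#!/bin/sh
# Hypothetical helper: echo the flag only for canonical versions >= 0304
# (gcc 3.4); print nothing for older compilers such as 3.3 (0303).
unit_at_a_time_flag() {
    if [ "$1" -ge 0304 ]; then
        echo "-funit-at-a-time"
    fi
}

unit_at_a_time_flag 0304    # prints: -funit-at-a-time
unit_at_a_time_flag 0303    # prints nothing
```

Gating on the version rather than on whether the compiler accepts the flag
keeps gcc-3.3 builds out even where a distribution compiler understands
-funit-at-a-time.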