2021-03-12 12:01:36

by Peter Zijlstra

Subject: [PATCH 0/2] x86: Remove ideal_nops[]

Hi!

A while ago Steve complained about x86 being weird for having different NOPs [1]

Having cursed the same thing before, I figured it was time to look at the NOP
situation.

32bit simply isn't a performance target anymore, so all we need is a set of
NOPs that works on all.

x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
P6 and is architecturally mandated for x86_64. However, some uarchs made the
choice to limit NOPL decoding to a single port, which obviously limits NOPL
throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
prefixes, hobbling prefix NOP throughput.

But the thing is, all the modern uarchs can handle both without issue; that is
AMD K10 (2007) and later and Intel Ivy Bridge (2012) and later. The only
exception is Atom, which has the prefix penalty.

Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
simply irrelevant today, remove variable NOPs and use NOPL.

This gives us deterministic NOPs and restores sanity.



[1] https://lkml.kernel.org/r/[email protected]
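
Side note for readers of the thread: the two NOP families contrasted above can be written out as byte strings. The sketch below is illustrative Python, not kernel code. The table follows the multi-byte NOP encodings recommended in the Intel SDM; `prefix_nop` reflects one reading of "prefix NOP" as a plain 0x90 padded with 0x66 operand-size prefixes. The names `NOPL_BY_LEN`, `prefix_nop` and `fill_nops` are hypothetical and do not come from the patches.

```python
# Recommended multi-byte NOP encodings, indexed by length in bytes.
# Lengths 3..8 use the 0F 1F /0 (NOPL) opcode; lengths 1 and 2 are the
# classic NOP and its operand-size-prefixed form.
NOPL_BY_LEN = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}


def prefix_nop(length):
    """Assumed 'prefix NOP' form: 0x90 preceded by (length - 1) 0x66 prefixes."""
    return bytes([0x66] * (length - 1) + [0x90])


def fill_nops(length, max_len=8):
    """Pad `length` bytes, taking the longest table entry at each step."""
    out = bytearray()
    while length > 0:
        n = min(length, max_len)
        out += NOPL_BY_LEN[n]
        length -= n
    return bytes(out)


if __name__ == "__main__":
    print(fill_nops(11).hex(" "))
    print(prefix_nop(5).hex(" "))
```

With a single fixed table like this, the padding for a given length is always the same byte sequence, which is the "deterministic NOPs" property the cover letter asks for. A 5-byte prefix NOP carries four prefixes, which is the case the ">~3 prefixes" decoding penalty is about.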


2021-03-12 14:32:09

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 1:00 PM Peter Zijlstra <[email protected]> wrote:
>
> Hi!
>
> A while ago Steve complained about x86 being weird for having different NOPs [1]
>
> Having cursed the same thing before, I figured it was time to look at the NOP
> situation.
>
> 32bit simply isn't a performance target anymore, so all we need is a set of
> NOPs that works on all.
>
> x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
> P6 and is architecturally mandated for x86_64. However, some uarchs made the
> choice to limit NOPL decoding to a single port, which obviously limits NOPL
> throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
> prefixes, hobbling prefix NOP throughput.
>
> But the thing is, all the modern uarchs can handle both without issue; that is
> AMD K10 (2007) and later and Intel Ivy Bridge (2012) and later. The only
> exception is Atom, which has the prefix penalty.
>
> Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> simply irrelevant today, remove variable NOPs and use NOPL.
>

Hi Peter,

I am an Intel SandyBridge power user and want the ultimate performance
on my hardware.

What does this change exactly mean to/for me?

I got this laptop as the last gift for my birthday in 2012 from my mother.
She died the same year.
So, this is a bit sentimental hardware for me.

It's amazing what all this laptop was involved in.
10+ years of LLVM/Clang for Linux-kernel and Linux graphics stack.
Worked in a Ubuntu/precise 12.04 LTS WUBI (installation) environment -
5 years (full LTS period) long!
How many Linux-kernel bugs got reported and/or fixed...
Debian/stretch...Debian/bullseye with no fresh installation. Rolling release.

I remember my decision in March 2012 not to choose that Asus notebook
with the first hardware-revision of IvyBridge and bought
conservatively a SandyBridge Gen. 2 Samsung notebook.

It's a pity to see no or restricted/limited Vulkan support.

If you are not concerned - life goes on for you.

It's like being white colored not understanding what "Black Lives
Matter" really means.
If people use or talk about white/black listings then allow/deny lists.
Or being a female software developer having a 10-15% less salary
because you are not male - in the same department!
This week we had our 100th anniversary of International Women's Day.
I am not black - I am male - I am not concerned - Life goes on?

Again, this machine is able to do fast Linux-kernel builds with an
adapted Debian Linux v5.10 kernel-config.
If you do NOT use Debian's LLVM/Clang - means build a selfmade
stage1-only LLVM toolchain (saves ~1 hour of build-time) - or a
ThinLTO+PGO optimized LLVM toolchain (saves again ~1 hour of
build-time).
Latest Linus Git with Clang-CFI took me today approx. 04:20
[hh:mm] with a selfmade stage1-only LLVM toolchain version 12.0.0-rc3.
Again, this is amazing.

What I wanna try to say is:
This is old hardware but you can - if you are smart enough -
optimize your builds.

On the other hand I can understand dropping support for XXX whatever hardware...
Where is the limit(ation):
Support 10 years or 7 years old hardware?

Sorry, I am a bit concerned that this is the beginning - or a backdoor
? - to drop (optimized) Intel SandyBridge support.

So, what do I need to do - to have "ultimate performance" back for
SandyBridge with your patchset :-)?

Yes, you are right: Life goes on.

Regards,
- Sedat -


> This gives us deterministic NOPs and restores sanity.
>
>
>
> [1] https://lkml.kernel.org/r/[email protected]
>

2021-03-12 14:49:22

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> What does this change exactly mean to/for me?

Probably nothing.

I would be very surprised if it would be at all noticeable for you -
it's not like the kernel is executing long streams of NOPs in fast
paths.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-12 17:28:16

by Steven Rostedt

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, 12 Mar 2021 15:47:26 +0100
Borislav Petkov <[email protected]> wrote:

> On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> > What does this change exactly mean to/for me?
>
> Probably nothing.
>
> I would be very surprised if it would be at all noticeable for you -
> it's not like the kernel is executing long streams of NOPs in fast
> paths.
>

With ftrace enabled, every function starts with a NOP. But that said, the
simple answer is for Sedat to apply the patches on his box and do some
performance testing. It doesn't matter if you are white, black, male,
female, or anything in between. As my daughter's swim coach said; it's the
numbers that matter here. Run a bunch of benchmarks on your box on the
latest kernel, apply Peter's patches, and then run the benchmarks again on
the latest kernel with Peter's patches and then report the difference. If
it's negligible then there's nothing to worry about.

-- Steve

2021-03-12 17:38:12

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 6:26 PM Steven Rostedt <[email protected]> wrote:
>
> On Fri, 12 Mar 2021 15:47:26 +0100
> Borislav Petkov <[email protected]> wrote:
>
> > On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> > > What does this change exactly mean to/for me?
> >
> > Probably nothing.
> >
> > I would be very surprised if it would be at all noticeable for you -
> > it's not like the kernel is executing long streams of NOPs in fast
> > paths.
> >
>
> With ftrace enabled, every function starts with a NOP. But that said, the
> simple answer is for Sedat to apply the patches on his box and do some
> performance testing. It doesn't matter if you are white, black, male,
> female, or anything in between. As my daughter's swim coach said; it's the
> numbers that matter here. Run a bunch of benchmarks on your box on the
> latest kernel, apply Peter's patches, and then run the benchmarks again on
> the latest kernel with Peter's patches and then report the difference. If
> it's negligible then there's nothing to worry about.
>

Hey Steve, you degraded me to a number :-).

I dunno which Git tree this patchset applies to, but I check if I can
apply the patchset to my current local Git.
Then build a kernel in the same build-environment.
Lemme see.

To say with Linus's words:
"Numbers talk - bullshit walks."

- Sedat -

2021-03-12 17:49:32

by Steven Rostedt

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, 12 Mar 2021 18:35:45 +0100
Sedat Dilek <[email protected]> wrote:


> Hey Steve, you degraded me to a number :-).

It's the internet, everyone is a number.

>
> I dunno which Git tree this patchset applies to, but I check if I can
> apply the patchset to my current local Git.

Try Linus's latest.

> Then build a kernel in the same build-environment.
> Lemme see.
>
> To say with Linus's words:
> "Numbers talk - bullshit walks."

Exactly.

-- Steve

2021-03-12 17:49:50

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 06:35:45PM +0100, Sedat Dilek wrote:
> Hey Steve, you degraded me to a number :-).

How did he degrade you to a number?! Actually, he went the length to
patiently explain what you could do.

> I dunno which Git tree this patchset applies to, but I check if I can

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

master branch.

> apply the patchset to my current local Git.
> Then build a kernel in the same build-environment.

Yes, what Steve said. You can run some benchmarks and compare
before/after numbers.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-12 18:15:28

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 6:47 PM Steven Rostedt <[email protected]> wrote:
>
> On Fri, 12 Mar 2021 18:35:45 +0100
> Sedat Dilek <[email protected]> wrote:
>
>
> > Hey Steve, you degraded me to a number :-).
>
> It's the internet, everyone is a number.
>
> >
> > I dunno which Git tree this patchset applies to, but I check if I can
> > apply the patchset to my current local Git.
>
> Try Linus's latest.
>

$ git describe origin/HEAD
v5.12-rc2-338-gf78d76e72a46

I adapted 1/2 in arch/x86/include/asm/jump_label.h to fit ^^^, see attachment.

- Sedat -




> > Then build a kernel in the same build-environment.
> > Lemme see.
> >
> > To say with Linus's words:
> > "Numbers talk - bullshit walks."
>
> Exactly.
>
> -- Steve


Attachments:
20210312_peterz_x86_remove_ideal_nops-dileks-v2.mbx (25.42 kB)

2021-03-12 19:05:44

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 7:13 PM Sedat Dilek <[email protected]> wrote:
>
> On Fri, Mar 12, 2021 at 6:47 PM Steven Rostedt <[email protected]> wrote:
> >
> > On Fri, 12 Mar 2021 18:35:45 +0100
> > Sedat Dilek <[email protected]> wrote:
> >
> >
> > > Hey Steve, you degraded me to a number :-).
> >
> > It's the internet, everyone is a number.
> >
> > >
> > > I dunno which Git tree this patchset applies to, but I check if I can
> > > apply the patchset to my current local Git.
> >
> > Try Linus's latest.
> >
>
> $ git describe origin/HEAD
> v5.12-rc2-338-gf78d76e72a46
>
> I adapted 1/2 in arch/x86/include/asm/jump_label.h to fit ^^^, see attachment.
>

Forget this.

With latest Linus Git you need to apply "x86/jump_label: Mark
arguments as const to satisfy asm constraints" from tip Git.

- Sedat -

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

>
> > > Then build a kernel in the same build-environment.
> > > Lemme see.
> > >
> > > To say with Linus's words:
> > > "Numbers talk - bullshit walks."
> >
> > Exactly.
> >
> > -- Steve

2021-03-12 21:01:01

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> simply irrelevant today, remove variable NOPs and use NOPL.

Just ran them on my SNB box:

cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
stepping : 7

with the usual perf stat kernel build workload with
CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
a NOP at its beginning when ftrace is disabled (thx Steve).

./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage

before: tip-master

Performance counter stats for 'make -s -j9 bzImage' (5 runs):

3,213,728.10 msec task-clock # 7.307 CPUs utilized ( +- 0.01% )
339,270 context-switches # 0.106 K/sec ( +- 0.09% )
31,472 cpu-migrations # 0.010 K/sec ( +- 0.64% )
62,070,684 page-faults # 0.019 M/sec ( +- 0.01% )
11,498,198,009,323 cycles # 3.578 GHz ( +- 0.01% ) (83.33%)
8,235,957,366,696 stalled-cycles-frontend # 71.63% frontend cycles idle ( +- 0.01% ) (83.33%)
5,976,456,688,814 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.02% ) (66.67%)
7,553,156,344,376 instructions # 0.66 insn per cycle
# 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
1,635,468,917,524 branches # 508.901 M/sec ( +- 0.00% ) (83.34%)
51,888,292,932 branch-misses # 3.17% of all branches ( +- 0.02% ) (83.33%)

439.809 +- 0.156 seconds time elapsed ( +- 0.04% )


after: tip-master-nops

Performance counter stats for 'make -s -j9 bzImage' (5 runs):

3,217,113.67 msec task-clock # 7.307 CPUs utilized ( +- 0.03% )
339,425 context-switches # 0.106 K/sec ( +- 0.20% )
31,724 cpu-migrations # 0.010 K/sec ( +- 0.54% )
62,027,130 page-faults # 0.019 M/sec ( +- 0.01% )
11,508,779,965,901 cycles # 3.577 GHz ( +- 0.03% ) (83.34%)
8,241,212,210,440 stalled-cycles-frontend # 71.61% frontend cycles idle ( +- 0.04% ) (83.33%)
5,982,615,533,177 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.06% ) (66.66%)
7,546,407,430,314 instructions # 0.66 insn per cycle
# 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
1,634,187,006,479 branches # 507.967 M/sec ( +- 0.00% ) (83.33%)
51,941,580,371 branch-misses # 3.18% of all branches ( +- 0.01% ) (83.33%)

440.266 +- 0.195 seconds time elapsed ( +- 0.04% )


So here's numbers talk, bullshit walks. And with those numbers no
bullshit can remain lingering around anyway.
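
Side note for readers of the thread: the before/after comparison above can be reduced to a delta and a relative change. The sketch below is plain arithmetic on the two "seconds time elapsed" lines. Treating the reported "+-" values as independent one-sigma uncertainties and combining them in quadrature is an assumption made here, not something the perf output states. The helper name `compare` is hypothetical.

```python
import math


def compare(before, after):
    """Compare two (mean_seconds, plus_minus_seconds) measurements.

    Returns (delta_seconds, relative_change, combined_plus_minus).
    The quadrature sum assumes the two +- values are independent
    one-sigma uncertainties, which is an assumption of this sketch.
    """
    (m0, e0), (m1, e1) = before, after
    delta = m1 - m0
    rel = delta / m0
    combined = math.sqrt(e0 ** 2 + e1 ** 2)
    return delta, rel, combined


# Values copied from the gcc runs above (tip-master vs tip-master-nops).
gcc_before = (439.809, 0.156)
gcc_after = (440.266, 0.195)

if __name__ == "__main__":
    delta, rel, combined = compare(gcc_before, gcc_after)
    print(f"delta = {delta:.3f} s ({rel:.3%}), combined +- {combined:.3f} s")
```

For these inputs the delta is a bit under half a second on a roughly 440 second build, about 0.1%.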

Cheers!

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 05:43:06

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Fri, Mar 12, 2021 at 10:00 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> > Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> > simply irrelevant today, remove variable NOPs and use NOPL.
>
> Just ran them on my SNB box:
>
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
> stepping : 7
>
> with the usual perf stat kernel build workload with
> CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
> a NOP at its beginning when ftrace is disabled (thx Steve).
>
> ./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage
>
> before: tip-master
>
> Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
> 3,213,728.10 msec task-clock # 7.307 CPUs utilized ( +- 0.01% )
> 339,270 context-switches # 0.106 K/sec ( +- 0.09% )
> 31,472 cpu-migrations # 0.010 K/sec ( +- 0.64% )
> 62,070,684 page-faults # 0.019 M/sec ( +- 0.01% )
> 11,498,198,009,323 cycles # 3.578 GHz ( +- 0.01% ) (83.33%)
> 8,235,957,366,696 stalled-cycles-frontend # 71.63% frontend cycles idle ( +- 0.01% ) (83.33%)
> 5,976,456,688,814 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.02% ) (66.67%)
> 7,553,156,344,376 instructions # 0.66 insn per cycle
> # 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
> 1,635,468,917,524 branches # 508.901 M/sec ( +- 0.00% ) (83.34%)
> 51,888,292,932 branch-misses # 3.17% of all branches ( +- 0.02% ) (83.33%)
>
> 439.809 +- 0.156 seconds time elapsed ( +- 0.04% )
>
>
> after: tip-master-nops
>
> Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
> 3,217,113.67 msec task-clock # 7.307 CPUs utilized ( +- 0.03% )
> 339,425 context-switches # 0.106 K/sec ( +- 0.20% )
> 31,724 cpu-migrations # 0.010 K/sec ( +- 0.54% )
> 62,027,130 page-faults # 0.019 M/sec ( +- 0.01% )
> 11,508,779,965,901 cycles # 3.577 GHz ( +- 0.03% ) (83.34%)
> 8,241,212,210,440 stalled-cycles-frontend # 71.61% frontend cycles idle ( +- 0.04% ) (83.33%)
> 5,982,615,533,177 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.06% ) (66.66%)
> 7,546,407,430,314 instructions # 0.66 insn per cycle
> # 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
> 1,634,187,006,479 branches # 507.967 M/sec ( +- 0.00% ) (83.33%)
> 51,941,580,371 branch-misses # 3.18% of all branches ( +- 0.01% ) (83.33%)
>
> 440.266 +- 0.195 seconds time elapsed ( +- 0.04% )
>
>
> So here's numbers talk, bullshit walks. And with those numbers no
> bullshit can remain lingering around anyway.
>

Here are my numbers.

My CPU:

cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz
stepping : 7

My base was Linus Git:

$ git describe master
v5.12-rc2-338-gf78d76e72a46

I used Peter's patchset plus a required pre-patch so that it cleanly
applies against Linus Git:

x86/jump_label: Mark arguments as const to satisfy asm constraints
x86: Remove dynamic NOP selection
objtool,x86: Use asm/nops.h

My benchmark was to build a Linux-kernel with LLVM/Clang v12.0.0-rc3
on Debian/testing AMD64.

Patchset applied for a first build:

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
[email protected]
KBUILD_BUILD_TIMESTAMP=2021-03-12 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-7~bullseye+dileks1':

55605704.79 msec task-clock # 3.568 CPUs utilized
8317406 context-switches # 0.150 K/sec
261843 cpu-migrations # 0.005 K/sec
288312867 page-faults # 0.005 M/sec
107642573933061 cycles # 1.936 GHz
82531165255218 stalled-cycles-frontend # 76.67% frontend cycles idle
64932777217096 stalled-cycles-backend # 60.32% backend cycles idle
59591288273663 instructions # 0.55 insn per cycle
# 1.38 stalled cycles per insn
10906545460023 branches # 196.141 M/sec
489809039153 branch-misses # 4.49% of all branches

15582.829443660 seconds time elapsed

53102.403996000 seconds user
2547.134916000 seconds sys

Rebuilding the same code-base after booting into the kernel that has
the above patchset applied:

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-8-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
[email protected]
KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-8~bullseye+dileks1':

56976758.12 msec task-clock # 3.589 CPUs utilized
8334519 context-switches # 0.146 K/sec
269340 cpu-migrations # 0.005 K/sec
288451841 page-faults # 0.005 M/sec
110795226760909 cycles # 1.945 GHz
85643743105935 stalled-cycles-frontend # 77.30% frontend cycles idle
68146424096780 stalled-cycles-backend # 61.51% backend cycles idle
59559370217381 instructions # 0.54 insn per cycle
# 1.44 stalled cycles per insn
10902087911812 branches # 191.343 M/sec
490447660403 branch-misses # 4.50% of all branches

15875.267204283 seconds time elapsed

54502.552543000 seconds user
2519.914516000 seconds sys

Simply comparing the build-times:
~15583 vs. ~15875 means approx. 5mins more build-time.
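
Side note for readers of the thread: the "approx. 5mins" figure can be checked from the two "seconds time elapsed" values quoted above. This is a minimal sketch; the function name `build_time_delta` is hypothetical. Each configuration was timed once, so the result carries no variance estimate.

```python
def build_time_delta(first_s, second_s):
    """Return (delta in seconds, delta in minutes, delta relative to first)."""
    delta = second_s - first_s
    return delta, delta / 60.0, delta / first_s


# Elapsed times copied from the two perf outputs above.
first_build_s = 15582.829443660
second_build_s = 15875.267204283

if __name__ == "__main__":
    secs, mins, rel = build_time_delta(first_build_s, second_build_s)
    print(f"{secs:.1f} s = {mins:.2f} min ({rel:.2%} of the first build)")
```

That is a little under five minutes, or just under 2% of a build that takes more than four hours.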

Attached are my linux-configs and above mentioned build-times (in case
Gmail has truncated them).

- Sedat -


Attachments:
build-time_5.12.0-rc2-7-amd64-clang12-cfi.txt (1.31 kB)
config-5.12.0-rc2-7-amd64-clang12-cfi (233.78 kB)
config-5.12.0-rc2-8-amd64-clang12-cfi (233.78 kB)
build-time_5.12.0-rc2-8-amd64-clang12-cfi.txt (1.31 kB)

2021-03-13 08:53:33

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 06:26:15AM +0100, Sedat Dilek wrote:
> x86/jump_label: Mark arguments as const to satisfy asm constraints

Where do I find this patch?

> x86: Remove dynamic NOP selection
> objtool,x86: Use asm/nops.h
>
> My benchmark was to build a Linux-kernel with LLVM/Clang v12.0.0-rc3
> on Debian/testing AMD64.
>
> Patchset applied for a first build:
>
> Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
> PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
> KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza

There's a reason I have -s for silent in the build - printing output
during the build creates a *lot* of variance. And you have excessive
printing with V=1 and KBUILD_VERBOSE=1.

Also, you need to repeat those workloads a couple of times - one is not
enough. That's why I have --repeat 5 in there.

Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:

---
#!/bin/bash
echo $0

make -s clean
echo 3 > /proc/sys/vm/drop_caches
---

so that you can avoid pagecache influence.
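
Side note for readers of the thread: the methodology above (quiet build, several repetitions, a pre-run hook that resets state) can be sketched without perf as well. The harness below is a rough stand-in under stated assumptions, not a replacement for `perf stat --repeat ... --pre=...`. It only measures wall-clock time, and its mean and sample standard deviation are not claimed to match how perf computes its "+-" column. The name `time_runs` and its parameters are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeat=5, pre=None):
    """Time `cmd` (an argv list) `repeat` times.

    If `pre` (also an argv list) is given, it runs before every timed run,
    outside the timed region, e.g. a script that cleans the tree and drops
    the page cache. Returns (mean_seconds, stdev_seconds, samples).
    """
    samples = []
    for _ in range(repeat):
        if pre is not None:
            subprocess.run(pre, check=True)
        start = time.perf_counter()
        # Discard stdout so that terminal printing does not add variance.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev, samples


if __name__ == "__main__":
    # Harmless stand-in workload; a real run would pass the make command
    # and the pre-build script shown above instead.
    mean, stdev, _ = time_runs([sys.executable, "-c", "pass"], repeat=3)
    print(f"{mean:.4f} +- {stdev:.4f} seconds time elapsed")
```

Running the pre-hook outside the timed region keeps the cleanup cost out of the measurement, which matches the intent of the `--pre` script described above.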

Lemme rerun here with clang.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 11:29:41

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 09:49:23AM +0100, Borislav Petkov wrote:
> Lemme rerun here with clang.

clang11 is almost twice as slow as gcc but difference is still
negligible: ~0.6 seconds.

./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 LLVM=1 LLVM_IAS=1 bzImage

before:

Performance counter stats for 'make -s -j9 LLVM=1 LLVM_IAS=1 bzImage' (5 runs):

5,576,081.48 msec task-clock # 7.664 CPUs utilized ( +- 0.03% )
496,841 context-switches # 0.089 K/sec ( +- 0.11% )
30,245 cpu-migrations # 0.005 K/sec ( +- 0.53% )
49,702,714 page-faults # 0.009 M/sec ( +- 0.00% )
19,954,704,926,347 cycles # 3.579 GHz ( +- 0.02% ) (83.33%)
15,920,125,996,460 stalled-cycles-frontend # 79.78% frontend cycles idle ( +- 0.03% ) (83.33%)
13,177,812,137,935 stalled-cycles-backend # 66.04% backend cycles idle ( +- 0.04% ) (66.67%)
8,778,060,061,848 instructions # 0.44 insn per cycle
# 1.81 stalled cycles per insn ( +- 0.00% ) (83.33%)
1,852,121,066,032 branches # 332.155 M/sec ( +- 0.00% ) (83.33%)
84,048,262,434 branch-misses # 4.54% of all branches ( +- 0.02% ) (83.33%)

727.572 +- 0.305 seconds time elapsed ( +- 0.04% )

after:

Performance counter stats for 'make -s -j9 LLVM=1 LLVM_IAS=1 bzImage' (5 runs):

5,581,654.38 msec task-clock # 7.665 CPUs utilized ( +- 0.01% )
496,274 context-switches # 0.089 K/sec ( +- 0.12% )
30,645 cpu-migrations # 0.005 K/sec ( +- 0.54% )
49,711,551 page-faults # 0.009 M/sec ( +- 0.01% )
19,968,933,753,686 cycles # 3.578 GHz ( +- 0.01% ) (83.33%)
15,925,776,797,854 stalled-cycles-frontend # 79.75% frontend cycles idle ( +- 0.01% ) (83.33%)
13,182,158,323,446 stalled-cycles-backend # 66.01% backend cycles idle ( +- 0.01% ) (66.67%)
8,778,619,885,119 instructions # 0.44 insn per cycle
# 1.81 stalled cycles per insn ( +- 0.00% ) (83.33%)
1,852,096,100,464 branches # 331.818 M/sec ( +- 0.01% ) (83.33%)
84,264,257,355 branch-misses # 4.55% of all branches ( +- 0.03% ) (83.33%)

728.2400 +- 0.0613 seconds time elapsed ( +- 0.01% )

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 12:12:40

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 9:51 AM Borislav Petkov <[email protected]> wrote:
>
> On Sat, Mar 13, 2021 at 06:26:15AM +0100, Sedat Dilek wrote:
> > x86/jump_label: Mark arguments as const to satisfy asm constraints
>
> Where do I find this patch?
>

Here we go:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

> > x86: Remove dynamic NOP selection
> > objtool,x86: Use asm/nops.h
> >
> > My benchmark was to build a Linux-kernel with LLVM/Clang v12.0.0-rc3
> > on Debian/testing AMD64.
> >
> > Patchset applied for a first build:
> >
> > Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
> > PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
> > KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
>
> There's a reason I have -s for silent in the build - printing output
> during the build creates a *lot* of variance. And you have excessive
> printing with V=1 and KBUILD_VERBOSE=1.
>
> Also, you need to repeat those workloads a couple of times - one is not
> enough. That's why I have --repeat 5 in there.
>
> Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:
>
> ---
> #!/bin/bash
> echo $0
>
> make -s clean
> echo 3 > /proc/sys/vm/drop_caches
> ---
>
> so that you can avoid pagecache influence.
>

OK, I see.

- Sedat -

> Lemme rerun here with clang.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 12:17:13

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 01:10:29PM +0100, Sedat Dilek wrote:
> Here we go:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

That's why I told earlier you to use tip/master - that patch is already
in it and all you would've needed to do is to apply the two nop patches.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 12:44:28

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 1:15 PM Borislav Petkov <[email protected]> wrote:
>
> On Sat, Mar 13, 2021 at 01:10:29PM +0100, Sedat Dilek wrote:
> > Here we go:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8
>
> That's why I told earlier you to use tip/master - that patch is already
> in it and all you would've needed to do is to apply the two nop patches.
>

Thanks for all your testing and suggestions.

For me it was easier to apply these 3 patches on top of my custom
patchset to see what impact Peter's patchset has.

AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?
I run my "normal" workflow(s) (and build-script) for easier comparison
on my side.

Big thank-you for testing with LLVM/Clang v11.x - twice as slow as with GCC :-(.
A selfmade ThinLTO+PGO optimized LLVM toolchain v11.x/v12-rcX/v13-git
is here as fast as Debian's GCC-v10.2.1 to build a Linux-kernel -
approx. 03:30 [hh:mm] - full adapted Debian v5.10.y kernel-config.
Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
week) binaries?

- Sedat -

2021-03-13 12:52:38

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 01:38:22PM +0100, Sedat Dilek wrote:
> AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?

The tailored .config for that particular test box.

> Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
> week) binaries?

The partition on that box I used is debian testing, so no:

$ apt search llvm-1* 2>/dev/null | grep llvm-1
libllvm-11-ocaml-dev/testing,testing 1:11.0.1-2 amd64
llvm-10/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-dev/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-runtime/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-tools/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-11/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-dev/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-doc/testing,testing 1:11.0.1-2 all
llvm-11-examples/testing,testing 1:11.0.1-2 all
llvm-11-runtime/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-tools/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 13:01:47

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 1:49 PM Borislav Petkov <[email protected]> wrote:
>
> On Sat, Mar 13, 2021 at 01:38:22PM +0100, Sedat Dilek wrote:
> > AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?
>
> The tailored .config for that particular test box.
>
> > Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
> > week) binaries?
>
> The partition on that box I used is debian testing, so no:
>
> $ apt search llvm-1* 2>/dev/null | grep llvm-1
> libllvm-11-ocaml-dev/testing,testing 1:11.0.1-2 amd64
> llvm-10/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-dev/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-runtime/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-tools/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-11/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-dev/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-doc/testing,testing 1:11.0.1-2 all
> llvm-11-examples/testing,testing 1:11.0.1-2 all
> llvm-11-runtime/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-tools/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
>

You can add Debian/experimental APT sources.list ...

[ /etc/apt/sources.list.d/debian-experimental.list ]
deb http://ftp.debian.org/debian experimental main contrib non-free
deb https://deb.debian.org/debian experimental main non-free contrib

[ /etc/apt/preferences.d/99_debian-experimental.pref ]
Package: *
Pin: release o=Debian,a=experimental
Pin-Priority: 99

This gives LLVM/Clang v12 packages an APT prio of 99 - meaning no
auto-upgrade installations will be done.
You have full control by doing it manually.

Refresh the package information from the APT repositories:

root# apt-get update

What clang-12 version is/are available?

root# apt-cache policy clang-12

Simulate an install (note: --no-install-recommends option):

root# apt-get install llvm-12 clang-12 lld-12 llvm-12-tools --no-install-recommends -t experimental -s

option -s: simulate

Really do an installation of LLVM/Clang v12 stuff:

root# apt-get install llvm-12 clang-12 lld-12 llvm-12-tools --no-install-recommends -t experimental -y

option -y: yes

If you like to test.

Of course you can use packages from <apt-llvm.org> repositories.
I can give you APT sources.list plus pref files if you desire.

Have more fun.

- Sedat -

2021-03-13 13:31:06

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 01:58:56PM +0100, Sedat Dilek wrote:
> You can add Debian/experimental APT sources.list ...

I could but I don't expect clang12 to behave any differently here.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-13 13:50:43

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 2:29 PM Borislav Petkov <[email protected]> wrote:
>
> On Sat, Mar 13, 2021 at 01:58:56PM +0100, Sedat Dilek wrote:
> > You can add Debian/experimental APT sources.list ...
>
> I could but I don't expect clang12 to behave any differently here.
>

Agreed in terms of build-time.
There were some improvements and optimizations to LLVM/Clang but twice
as slow is really hard compared with GCC.

I was thinking more in the direction of "compatibility" of tip tree
with recent LLVM/Clang other than what is officially supported via
Kbuild-system.

Let me look if I will do a selfmade ThinLTO+PGO optimized LLVM
toolchain v12.0.0-rc3 this weekend.

- Sedat -

2021-03-15 17:09:07

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 13, 2021 at 2:47 PM Sedat Dilek <[email protected]> wrote:
[ ... ]
> Let me look if I will do a selfmade ThinLTO+PGO optimized LLVM
> toolchain v12.0.0-rc3 this weekend.
>

I did it.

Here are some fresh numbers:

[ Selfmade LLVM toolchain v12.0.0-rc3 "stage1-only" ]
[ Host-Kernel: 5.12.0-rc2-8-amd64-clang12-cfi includes Peter's NOPS patchset ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-9-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
[email protected]
KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-9~bullseye+dileks1':

55936351.95 msec task-clock # 3.580 CPUs utilized
8291848 context-switches # 0.148 K/sec
269686 cpu-migrations # 0.005 K/sec
288389721 page-faults # 0.005 M/sec
108344049253836 cycles # 1.937 GHz
83228135285263 stalled-cycles-frontend # 76.82% frontend cycles idle
65616255370809 stalled-cycles-backend # 60.56% backend cycles idle
59590373937199 instructions # 0.55 insn per cycle
# 1.40 stalled cycles per insn
10906265495505 branches # 194.976 M/sec
488578274434 branch-misses # 4.48% of all branches

15622.926203302 seconds time elapsed

53453.974928000 seconds user
2526.773533000 seconds sys


[ Selfmade LLVM toolchain v12.0.0-rc3 "thinlto_pgo_optimized" ]
[ Host-Kernel: Debian's 5.10.19-1 kernel ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-10-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
[email protected]
KBUILD_BUILD_TIMESTAMP=2021-03-14 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-10~bullseye+dileks1':

  40223080.69 msec  task-clock               # 3.434 CPUs utilized
           7438923  context-switches         # 0.185 K/sec
            245636  cpu-migrations           # 0.006 K/sec
         288073015  page-faults              # 0.007 M/sec
    77325441657129  cycles                   # 1.922 GHz
    55357463522675  stalled-cycles-frontend  # 71.59% frontend cycles idle
    38978871249074  stalled-cycles-backend   # 50.41% backend cycles idle
    55178265045056  instructions             # 0.71 insn per cycle
                                             # 1.00 stalled cycles per insn
     9749166033571  branches                 # 242.377 M/sec
      431303563167  branch-misses            # 4.42% of all branches

11714.751645982 seconds time elapsed

37951.117840000 seconds user
2313.807151000 seconds sys


[ Selfmade LLVM toolchain v12.0.0-rc3 "thinlto_pgo_optimized" ]
[ Host-Kernel: 5.12.0-rc2-10-amd64-clang12-cfi includes Peter's NOPS patchset ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-1-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
[email protected]
KBUILD_BUILD_TIMESTAMP=2021-03-15 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc3-1~bullseye+dileks1':

  40632207.25 msec  task-clock               # 3.406 CPUs utilized
           8216832  context-switches         # 0.202 K/sec
            277610  cpu-migrations           # 0.007 K/sec
         281331052  page-faults              # 0.007 M/sec
    77031538570411  cycles                   # 1.896 GHz                     (83.33%)
    55247905369487  stalled-cycles-frontend  # 71.72% frontend cycles idle   (83.33%)
    39046795510242  stalled-cycles-backend   # 50.69% backend cycles idle    (66.67%)
    54592585444704  instructions             # 0.71 insn per cycle
                                             # 1.01 stalled cycles per insn  (83.33%)
     9641589406714  branches                 # 237.289 M/sec                 (83.33%)
      435317273069  branch-misses            # 4.51% of all branches         (83.33%)

11928.047003788 seconds time elapsed

38187.685111000 seconds user
2502.075987000 seconds sys

As said in an earlier email:
A ThinLTO+PGO optimized LLVM-toolchain saves approx. 60mins of build-time
here (15623s vs. 11715s elapsed, i.e. about 65mins).

With the host-kernel that includes Peter's NOPS patchset the build took
approx. 3mins longer (11928s vs. 11715s elapsed).
That is the brewing time of one single Turkish tea bag.

Attached are the 3 build-time log-files.

- Sedat -
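
The derived columns and the build-time differences in the mail above can be re-checked from the raw counters. The following is only a sketch of that arithmetic: the dictionary keys (`stage1_nops`, `pgo_debian`, `pgo_nops`) and the helper names are made up for illustration, and the numbers are copied from the three perf-stat outputs.

```python
# Raw counters copied from the three perf-stat runs quoted in this mail.
# The run labels are hypothetical names chosen for this sketch.
RUNS = {
    "stage1_nops": {  # stage1-only toolchain, 5.12-rc2 host kernel + NOPS patchset
        "cycles": 108344049253836,
        "instructions": 59590373937199,
        "elapsed_s": 15622.926203302,
    },
    "pgo_debian": {  # ThinLTO+PGO toolchain, Debian 5.10.19 host kernel
        "cycles": 77325441657129,
        "instructions": 55178265045056,
        "elapsed_s": 11714.751645982,
    },
    "pgo_nops": {  # ThinLTO+PGO toolchain, 5.12-rc2 host kernel + NOPS patchset
        "cycles": 77031538570411,
        "instructions": 54592585444704,
        "elapsed_s": 11928.047003788,
    },
}


def insn_per_cycle(run):
    """Instructions retired per cycle, as shown in the 'insn per cycle' column."""
    return run["instructions"] / run["cycles"]


def minutes_between(slower, faster):
    """Wall-clock difference between two runs, in minutes."""
    return (slower["elapsed_s"] - faster["elapsed_s"]) / 60.0


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: {insn_per_cycle(run):.2f} insn per cycle")
    print("toolchain saving [min]:",
          round(minutes_between(RUNS["stage1_nops"], RUNS["pgo_debian"]), 1))
    print("host-kernel delta [min]:",
          round(minutes_between(RUNS["pgo_nops"], RUNS["pgo_debian"]), 1))
```

Note that the first comparison changes both the toolchain and the host kernel at once, so the two effects cannot be separated from these three runs alone.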


Attachments:
build-time_5.12.0-rc2-9-amd64-clang12-cfi.txt (1.31 kB)
build-time_5.12.0-rc2-10-amd64-clang12-cfi.txt (1.31 kB)
build-time_5.12.0-rc3-1-amd64-clang12-cfi.txt (1.37 kB)

2021-03-15 17:17:14

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
> Here are some fresh numbers:

Lemme paste my previous reply which still holds true here:

"There's a reason I have -s for silent in the build - printing output
during the build creates a *lot* of variance. And you have excessive
printing with V=1 and KBUILD_VERBOSE=1.

Also, you need to repeat those workloads a couple of times - one is not
enough. That's why I have --repeat 5 in there.

Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:

---
#!/bin/bash
echo $0

make -s clean
echo 3 > /proc/sys/vm/drop_caches
---

so that you can avoid pagecache influence."

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-03-15 17:21:53

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 6:15 PM Borislav Petkov <[email protected]> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
> > Here are some fresh numbers:
>
> Lemme paste my previous reply which still holds true here:
>
> "There's a reason I have -s for silent in the build - printing output
> during the build creates a *lot* of variance. And you have excessive
> printing with V=1 and KBUILD_VERBOSE=1.
>

I have this for diagnostic reasons.
Yes, I can drop V=1 and KBUILD_VERBOSE=1.
This is a good idea for a fast build.

> Also, you need to repeat those workloads a couple of times - one is not
> enough. That's why I have --repeat 5 in there.
>
> Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:
>
> ---
> #!/bin/bash
> echo $0
>
> make -s clean
> echo 3 > /proc/sys/vm/drop_caches
> ---
>
> so that you can avoid pagecache influence."
>

I will try to apply this with my next build.

- Sedat -

2021-03-15 18:40:16

by Borislav Petkov

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 06:19:34PM +0100, Sedat Dilek wrote:
> I will try to apply this with my next build.

Your perf tool command should look something like this:

perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 LLVM=1 LLVM_IAS=1 bzImage

Also, needless to say, your box needs to not run anything else during
the measurement.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
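
To illustrate why the repeated runs and the pre-hook matter, here is a minimal sketch of the same measurement discipline in plain Python. It is not a replacement for perf stat and does not read any hardware counters; `timed_runs` and `summarize` are hypothetical helper names, and the spread computed here is a plain sample standard deviation, which is not claimed to be the exact statistic perf prints.

```python
import statistics
import subprocess
import sys
import time


def timed_runs(cmd, repeat=5, pre=None):
    """Run `cmd` `repeat` times and return the wall-clock duration of each run.

    If `pre` is given, it is executed before every run, outside the timed
    region, in the spirit of a pre-hook that cleans the tree and drops caches.
    """
    durations = []
    for _ in range(repeat):
        if pre is not None:
            subprocess.run(pre, check=True)
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, relative sample standard deviation in percent)."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, 100.0 * stdev / mean


if __name__ == "__main__":
    # Trivial stand-in workload; a real measurement would time the kernel build.
    runs = timed_runs([sys.executable, "-c", "pass"], repeat=5)
    mean, rel = summarize(runs)
    print(f"{mean:.4f} s +- {rel:.2f}%")
```

A single run gives no spread at all, so a difference of a few minutes between two single builds cannot be told apart from run-to-run noise.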

2021-03-15 23:13:35

by Peter Zijlstra

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:

> make V=1 -j4 LLVM=1 LLVM_IAS=1

So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
when building with -march=sandybridge, they always use NOPL.

Furthermore, the kernel explicitly sets: -falign-jumps=1
-falign-loops=1, which, when not specified, default to 16 or so.

This means that your userspace is *littered* with NOPL, even when you
build your entire distro from source with -march=sandybridge.
(arch/gentoo FTW I suppose).

(The only good news is that recent LLVM has a pass that uses alternative
instruction encodings to grow a basic block in size, which minimizes the
amount of NOPs it needs to emit at the end to satisfy the jump/loop
alignment.)

So if you *really* deeply care about NOP performance on your SNB, start
by teaching LLVM about prefix NOPs and rebuild your complete userspace.
At that point, you can do some trivial patches to the kernel to make it
use -march=sandybridge and prefix NOPs too.

Until that time, the vast majority of NOPs your CPU will execute will be
NOPL.
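
As a rough illustration of where those NOPs come from, the sketch below computes the padding needed to align a jump target and fills it with multi-byte NOPs. The byte sequences are the recommended multi-byte NOP encodings from the Intel SDM; the greedy fill strategy and the helper names are assumptions made for this example and are not taken from GCC, LLVM or the patchset.

```python
# Recommended multi-byte NOP encodings (Intel SDM, NOP instruction).
# Lengths 1 and 2 are the plain NOP and its 0x66-prefixed form;
# lengths 3..8 are NOPL (0f 1f /0) with growing ModRM/SIB/displacement.
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
}


def padding_needed(addr, align):
    """Number of bytes required to advance `addr` to the next multiple of `align`."""
    return -addr % align


def nop_fill(length):
    """Fill `length` bytes with NOPs, greedily using the longest encoding first.

    The greedy choice is an assumption of this sketch; real assemblers may
    pick different sequences.
    """
    out = b""
    while length > 0:
        chunk = min(length, max(NOPS))
        out += NOPS[chunk]
        length -= chunk
    return out


if __name__ == "__main__":
    pad = padding_needed(0x1003, 16)  # a jump target aligned to 16 bytes
    print(pad, nop_fill(pad).hex())
    print(padding_needed(0x1003, 1))  # alignment of 1 never needs padding
```

With an alignment of 1, as in the kernel's -falign-jumps=1 -falign-loops=1, no padding is ever needed, whereas a 16-byte default can cost up to 15 bytes of NOPs per aligned target.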

2021-03-15 23:27:24

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 7:10 PM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
>
> > make V=1 -j4 LLVM=1 LLVM_IAS=1
>
> So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
> when building with -march=sandybridge, they always use NOPL.
>
> Furthermore, the kernel explicitly sets: -falign-jumps=1
> -falign-loops=1, which, when not specified, default to 16 or so.
>
> This means that your userspace is *littered* with NOPL, even when you
> build your entire distro from source with -march=sandybridge.
> (arch/gentoo FTW I suppose).
>

That reminds me of the Git repo of the wireguard maintainer.

"x86: enable additional cpu optimizations for gcc v9.1+" [1]

You mean something like that ^^?

- Sedat -

[1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

> (The only good news is that recent LLVM has a pass that uses alternative
> instruction encodings to grow a basic block in size, which minimizes the
> amount of NOPs it needs to emit at the end to satisfy the jump/loop
> alignment.)
>
> So if you *really* deeply care about NOP performance on your SNB, start
> by teaching LLVM about prefix NOPs and rebuild your complete userspace.
> At that point, you can do some trivial patches to the kernel to make it
> use -march=sandybridge and prefix NOPs too.
>
> Until that time, the vast majority of NOPs your CPU will execute will be
> NOPL.

2021-03-16 05:50:02

by Peter Zijlstra

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 07:23:29PM +0100, Sedat Dilek wrote:

> You mean something like that ^^?
>
> - Sedat -
>
> [1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

*shudder*, I was more thinking you'd simply add it to your CFLAGS when
building. I don't see any point in having that in Kconfig.

2021-03-16 13:04:57

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Mon, Mar 15, 2021 at 11:14 PM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Mar 15, 2021 at 07:23:29PM +0100, Sedat Dilek wrote:
>
> > You mean something like that ^^?
> >
> > - Sedat -
> >
> > [1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686
>
> *shudder*, I was more thinking you'd simply add it to your CFLAGS when
> building. I don't see any point in having that in Kconfig.

Simply adding the CFLAGS to arch/x86/Makefile.

If I forgot to mention:

Tested-by: Sedat Dilek <[email protected]>. # LLVM/Clang v12.0.0-rc3

- Sedat -

2021-03-27 12:09:33

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

Out of curiosity I tried, within my build-environment and testing-rules,
to get comparable numbers without passing "V=1" and "KBUILD_VERBOSE=1"
as make-options:

NOTE: Identical linux-config plus LLVM/Clang v12.0.0-rc3.

debian-5.10.19 as host-kernel:
11655.755564957 seconds time elapsed

dileks-5.12-rc3 plus x86-nops as host-kernel:
11941.439350080 seconds time elapsed

I compared the build-times only:
Approx. 04:45 [mm:ss] in the worst case.
( Brewing time of a strong Turkish tea-bag ~5mins. )

I will keep both make-options to see what's going on in my builds.

- Sedat -

2021-03-27 20:04:41

by Linus Torvalds

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 27, 2021 at 5:08 AM Sedat Dilek <[email protected]> wrote:
>
> debian-5.10.19 as host-kernel:
> 11655.755564957 seconds time elapsed
>
> dileks-5.12-rc3 plus x86-nops as host-kernel:
> 11941.439350080 seconds time elapsed

That's 2.5% - a huge difference. Particularly since kernel build times
shouldn't even be that kernel-intensive.

I think there's something else going on than the nops. Same config?
There are likely many other differences between 5.10.19 and 5.12-rc3.

So can you check just plain 5.12-rc3 and then 5.12-rc3 plus x86-nops,
with otherwise identical configuration?

Linus
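
For reference, the 2.5% figure follows directly from the two elapsed times quoted above. A small sketch of that arithmetic, with hypothetical helper names:

```python
def slowdown(baseline_s, patched_s):
    """Return (absolute delta in seconds, relative delta in percent of baseline)."""
    delta = patched_s - baseline_s
    return delta, 100.0 * delta / baseline_s


def mmss(seconds):
    """Format a duration as mm:ss, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Elapsed times from the two builds: Debian 5.10.19 vs. 5.12-rc3 + x86-nops.
    delta, percent = slowdown(11655.755564957, 11941.439350080)
    print(f"{mmss(delta)} longer, {percent:.2f}% of the baseline build time")
```

The computation only restates the difference between two single runs on different host kernels; it says nothing about which change caused it.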

2021-03-30 12:35:36

by Sedat Dilek

Subject: Re: [PATCH 0/2] x86: Remove ideal_nops[]

On Sat, Mar 27, 2021 at 9:02 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, Mar 27, 2021 at 5:08 AM Sedat Dilek <[email protected]> wrote:
> >
> > debian-5.10.19 as host-kernel:
> > 11655.755564957 seconds time elapsed
> >
> > dileks-5.12-rc3 plus x86-nops as host-kernel:
> > 11941.439350080 seconds time elapsed
>
> That's 2.5% - a huge difference. Particularly since kernel build times
> shouldn't even be that kernel-intensive.
>
> I think there's something else going on than the nops. Same config?
> There are likely many other differences between 5.10.19 and 5.12-rc3.
>
> So can you check just plain 5.12-rc3 and then 5.12-rc3 plus x86-nops,
> with otherwise identical configuration?
>

Hi Linus,

I re-checked my linux-config and custom patchset.

I had "kbuild: add CONFIG_VMLINUX_MAP expert option" [1] in my queue and
built with CONFIG_VMLINUX_MAP=y.
This option generated an approx. 30MiB big vmlinux.map file here.
I cannot say how many seconds this takes, but it could explain the
time-diff.
[ The above option is helpful for analyzing a recent Linux-kernel build
with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
So far I have always been able to build, but not to boot on bare metal,
with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
With an LLVM toolchain, of course. ]

( In the meantime Debian has released a 5.10.26 kernel - so if you
want I can re-test with Linux v5.12-rc5. )

Regards,
- Sedat -

[1] https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git/commit/?h=kbuild&id=babd8cd96d333cb83c9b8abf4f01ab1f161d6ec4