2018-05-23 22:09:16

by Nick Desaulniers

[permalink] [raw]
Subject: [clang] stack protector and f1f029c7bf

H. Peter,

It was reported [0] that compiling the Linux kernel with Clang +
CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(), due to
how GCC does not emit a stack guard for static inline functions (see
Alistair's excellent report in [1]) but Clang does.

When working with the LLVM release maintainers, Tom had suggested [2]
changing the inline assembly constraint in native_save_fl() from '=rm' to
'=r', and Alistair had verified the disassembly:

(good) code generated w/o -fstack-protector-strong:

native_save_fl:
pushfq
popq -8(%rsp)
movq -8(%rsp), %rax
retq

(good) code generated w/ =r input constraint:

native_save_fl:
pushfq
popq %rax
retq

(bad) code generated with -fstack-protector-strong:

native_save_fl:
subq $24, %rsp
movq %fs:40, %rax
movq %rax, 16(%rsp)
pushfq
popq 8(%rsp)
movq 8(%rsp), %rax
movq %fs:40, %rcx
cmpq 16(%rsp), %rcx
jne .LBB0_2
addq $24, %rsp
retq
.LBB0_2:
callq __stack_chk_fail

It looks like the sugguestion is actually a revert of your commit:
ab94fcf528d127fcb490175512a8910f37e5b346:
x86: allow "=rm" in native_save_fl()

It seemed like there was a question internally about why worry about pop
adjusting the stack if the stack could be avoided altogether.

I think Sedat can retest this, but I was curious as well about the commit
message in ab94fcf528d: "[ Impact: performance ]", but Alistair's analysis
of the disassembly seems to indicate there is no performance impact (in
fact, looks better as there's one less mov).

Is there a reason we should not revert ab94fcf528d12, or maybe a better
approach?

[0] https://lkml.org/lkml/2018/5/7/534
[1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
[2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22
--
Thanks,
~Nick Desaulniers


2018-05-24 10:36:41

by Sedat Dilek

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 12:08 AM, Nick Desaulniers
<[email protected]> wrote:
> H. Peter,
>
> It was reported [0] that compiling the Linux kernel with Clang +
> CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(), due to
> how GCC does not emit a stack guard for static inline functions (see
> Alistair's excellent report in [1]) but Clang does.
>
> When working with the LLVM release maintainers, Tom had suggested [2]
> changing the inline assembly constraint in native_save_fl() from '=rm' to
> '=r', and Alistair had verified the disassembly:
>
> (good) code generated w/o -fstack-protector-strong:
>
> native_save_fl:
> pushfq
> popq -8(%rsp)
> movq -8(%rsp), %rax
> retq
>
> (good) code generated w/ =r input constraint:
>
> native_save_fl:
> pushfq
> popq %rax
> retq
>
> (bad) code generated with -fstack-protector-strong:
>
> native_save_fl:
> subq $24, %rsp
> movq %fs:40, %rax
> movq %rax, 16(%rsp)
> pushfq
> popq 8(%rsp)
> movq 8(%rsp), %rax
> movq %fs:40, %rcx
> cmpq 16(%rsp), %rcx
> jne .LBB0_2
> addq $24, %rsp
> retq
> .LBB0_2:
> callq __stack_chk_fail
>
> It looks like the sugguestion is actually a revert of your commit:
> ab94fcf528d127fcb490175512a8910f37e5b346:
> x86: allow "=rm" in native_save_fl()
>
> It seemed like there was a question internally about why worry about pop
> adjusting the stack if the stack could be avoided altogether.
>
> I think Sedat can retest this, but I was curious as well about the commit
> message in ab94fcf528d: "[ Impact: performance ]", but Alistair's analysis
> of the disassembly seems to indicate there is no performance impact (in
> fact, looks better as there's one less mov).
>
> Is there a reason we should not revert ab94fcf528d12, or maybe a better
> approach?
>
> [0] https://lkml.org/lkml/2018/5/7/534
> [1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
> [2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22

[ CC Linux/x86 folks ]

Hi,

with reverting...

commit ab94fcf528d127fcb490175512a8910f37e5b346
"x86: allow "=rm" in native_save_fl()"

...I had success to boot into a paravirtualized/strong-stackprotected
Linux-kernel v4.14.43 on *bare metal* (Lenovo ThinkPad T470).

For my experiments I used LLVM/Clang version 7~svn332830 from <apt.llvm.org>.
My host runs Debian/testing AMD64.

My patchset is against linux v4.14.43.
The base was Matthias (mka) "v4.14_ext" Git branch (which bases on
Linux v4.14 vanilla) [2].

[1] Testing patch for x86 (x86/paravirt/stackprotector)...

Revert "x86: allow "=rm" in native_save_fl()"

[2] Additional patches when using HOSTCC in make-line and wanna-build
XEN (stolen from Linus tree)...

x86: xen: remove the use of VLAIS
kbuild: clang: remove crufty HOSTCFLAGS

[3] Revert kbuild-fixes included in v4.14.43 and revert clang3/clang4
as I use clang7...

Revert "UPSTREAM: kbuild: fix linker feature test macros when cross
compiling with Clang"
Revert "BACKPORT: kbuild: Set KBUILD_CFLAGS before incl. arch Makefile"
Revert "BACKPORT: kbuild: disable clang's default use of -fmerge-all-constants"
Revert "CLANG4: crypto: arm64/aes-ce: Explicitly pass through assembler options"
Revert "CLANG4: kbuild: Add -meabi gnu to the clang parameters"
Revert "CLANG4: arm64: prefetch: Use __builtin_arm_prefetch() for clang"
Revert "CLANG4: Disable lkdtm when ftrace is enabled"
Revert "CLANG4: futex: don't optimize futex_detect_cmpxchg() on ARM64"
Revert "CLANG3: core: clang: work around x86 regparm / intrinsics bug"

My kernel-config, dmesg-log and qemu-log are attached.

Hope this helps.

Regards,
- Sedat -

[1] https://bugs.llvm.org/show_bug.cgi?id=37512#c26
[2] https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/sandbox/mka/llvm/v4.14_ext


Attachments:
qemu-log.txt (19.70 kB)
config-4.14.43-2-iniza-llvmlinux (192.50 kB)
dmesg_4.14.43-2-iniza-llvmlinux.txt (66.43 kB)
Download all attachments

2018-05-25 02:36:50

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 3:33 AM Sedat Dilek <[email protected]> wrote:
> with reverting...
> commit ab94fcf528d127fcb490175512a8910f37e5b346
> "x86: allow "=rm" in native_save_fl()"

> ...I had success to boot into a paravirtualized/strong-stackprotected
> Linux-kernel v4.14.43 on *bare metal* (Lenovo ThinkPad T470).

preping a patch in
https://bugs.llvm.org/show_bug.cgi?id=37512

will send shortly.
--
Thanks,
~Nick Desaulniers

2018-05-25 02:37:57

by Alistair Strachan

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

(Resent plain text)

On Thu, May 24, 2018 at 11:24 AM Nick Desaulniers <[email protected]>
wrote:

> On Thu, May 24, 2018 at 11:20 AM <[email protected]> wrote:
> > A stack canary on an *inlined* function? That's bound to break things
> elsewhere too sooner or later.

> But it's *not* inlined by GCC or Clang.

FWIW, GCC can also insert a stack guard in an out-of-lined inline function,
it just doesn't for this one. The -fstack-protector and
-fstack-protector-strong flags are heuristic and the heuristic does not
match between gcc and clang for this function. It is working on GCC purely
by chance.

> While the function is marked `static inline`, it's not in
> arch/x86/kernel/paravirt.o due to:
> arch/x86/kernel/paravirt.c:326

> 325 __visible struct pv_irq_ops pv_irq_ops = {

> 326 .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),

> see comparison of disassembly attached in:
> https://bugs.llvm.org/attachment.cgi?id=20338
> --
> Thanks,
> ~Nick Desaulniers

2018-05-25 02:38:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 23, 2018 3:08:19 PM PDT, Nick Desaulniers <[email protected]> wrote:
>H. Peter,
>
>It was reported [0] that compiling the Linux kernel with Clang +
>CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(), due
>to
>how GCC does not emit a stack guard for static inline functions (see
>Alistair's excellent report in [1]) but Clang does.
>
>When working with the LLVM release maintainers, Tom had suggested [2]
>changing the inline assembly constraint in native_save_fl() from '=rm'
>to
>'=r', and Alistair had verified the disassembly:
>
>(good) code generated w/o -fstack-protector-strong:
>
>native_save_fl:
> pushfq
> popq -8(%rsp)
> movq -8(%rsp), %rax
> retq
>
>(good) code generated w/ =r input constraint:
>
>native_save_fl:
> pushfq
> popq %rax
> retq
>
>(bad) code generated with -fstack-protector-strong:
>
>native_save_fl:
> subq $24, %rsp
> movq %fs:40, %rax
> movq %rax, 16(%rsp)
> pushfq
> popq 8(%rsp)
> movq 8(%rsp), %rax
> movq %fs:40, %rcx
> cmpq 16(%rsp), %rcx
> jne .LBB0_2
> addq $24, %rsp
> retq
>.LBB0_2:
> callq __stack_chk_fail
>
>It looks like the sugguestion is actually a revert of your commit:
>ab94fcf528d127fcb490175512a8910f37e5b346:
>x86: allow "=rm" in native_save_fl()
>
>It seemed like there was a question internally about why worry about
>pop
>adjusting the stack if the stack could be avoided altogether.
>
>I think Sedat can retest this, but I was curious as well about the
>commit
>message in ab94fcf528d: "[ Impact: performance ]", but Alistair's
>analysis
>of the disassembly seems to indicate there is no performance impact (in
>fact, looks better as there's one less mov).
>
>Is there a reason we should not revert ab94fcf528d12, or maybe a better
>approach?
>
>[0] https://lkml.org/lkml/2018/5/7/534
>[1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
>[2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22

A stack canary on an *inlined* function? That's bound to break things elsewhere too sooner or later.

It feels like *once again* clang is asking for the Linux kernel to change to paper over technical or compatibility problems in clang/LLVM. This is not exactly helping the feeling that we should just rip out any and all clang hacks and call it a loss.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 02:38:32

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 11:20 AM <[email protected]> wrote:
> A stack canary on an *inlined* function? That's bound to break things
elsewhere too sooner or later.

But it's *not* inlined by GCC or Clang.

While the function is marked `static inline`, it's not in
arch/x86/kernel/paravirt.o due to:
arch/x86/kernel/paravirt.c:326

325 __visible struct pv_irq_ops pv_irq_ops = {

326 .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),

see comparison of disassembly attached in:
https://bugs.llvm.org/attachment.cgi?id=20338
--
Thanks,
~Nick Desaulniers

2018-05-25 02:39:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 23, 2018 3:08:19 PM PDT, Nick Desaulniers <[email protected]> wrote:
>H. Peter,
>
>It was reported [0] that compiling the Linux kernel with Clang +
>CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(), due
>to
>how GCC does not emit a stack guard for static inline functions (see
>Alistair's excellent report in [1]) but Clang does.
>
>When working with the LLVM release maintainers, Tom had suggested [2]
>changing the inline assembly constraint in native_save_fl() from '=rm'
>to
>'=r', and Alistair had verified the disassembly:
>
>(good) code generated w/o -fstack-protector-strong:
>
>native_save_fl:
> pushfq
> popq -8(%rsp)
> movq -8(%rsp), %rax
> retq
>
>(good) code generated w/ =r input constraint:
>
>native_save_fl:
> pushfq
> popq %rax
> retq
>
>(bad) code generated with -fstack-protector-strong:
>
>native_save_fl:
> subq $24, %rsp
> movq %fs:40, %rax
> movq %rax, 16(%rsp)
> pushfq
> popq 8(%rsp)
> movq 8(%rsp), %rax
> movq %fs:40, %rcx
> cmpq 16(%rsp), %rcx
> jne .LBB0_2
> addq $24, %rsp
> retq
>.LBB0_2:
> callq __stack_chk_fail
>
>It looks like the sugguestion is actually a revert of your commit:
>ab94fcf528d127fcb490175512a8910f37e5b346:
>x86: allow "=rm" in native_save_fl()
>
>It seemed like there was a question internally about why worry about
>pop
>adjusting the stack if the stack could be avoided altogether.
>
>I think Sedat can retest this, but I was curious as well about the
>commit
>message in ab94fcf528d: "[ Impact: performance ]", but Alistair's
>analysis
>of the disassembly seems to indicate there is no performance impact (in
>fact, looks better as there's one less mov).
>
>Is there a reason we should not revert ab94fcf528d12, or maybe a better
>approach?
>
>[0] https://lkml.org/lkml/2018/5/7/534
>[1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
>[2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22

Ok, this is the *second* thing about LLVM-originated bug reports that drives me batty. When you *do* identify a real problem, you propose a paper over and/or talk about it as an LLVM issue and don't consider the often far bigger picture.

Issue 1: Fundamentally, the compiler is doing The Wrong Thing if it generates worse code with a less constrained =rm than with =r. That is a compiler optimization bug, period. The whole point with the less constrained option is to give the compiler the freedom of action.

You are claiming it doesn't buy us anything, but you are only looking at the paravirt case which is kind of "special" (in the short bus kind of way), and only because the compiler in question makes an incredibly stupid decision.

Issue 2: What you are flagging seems to be a far more fundamental problem, which would affect *any* use of push/pop in inline assembly. If that is true, we need to pull in the gcc people too and create an interface to let the compiler know that online assembly needs a certain number of stack slots. We do a lot of push/pop in assembly. The other option is to turn stack canary explicitly off for all such functions.

Issue 3: Let's face it, reading and writing the flags should be builtins, exactly because it has to do stack operations, which really means the compiler should be involved.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 02:41:36

by Tom Stellard

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/24/2018 11:19 AM, [email protected] wrote:
> On May 23, 2018 3:08:19 PM PDT, Nick Desaulniers <[email protected]> wrote:
>> H. Peter,
>>
>> It was reported [0] that compiling the Linux kernel with Clang +
>> CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(), due
>> to
>> how GCC does not emit a stack guard for static inline functions (see
>> Alistair's excellent report in [1]) but Clang does.
>>
>> When working with the LLVM release maintainers, Tom had suggested [2]
>> changing the inline assembly constraint in native_save_fl() from '=rm'
>> to
>> '=r', and Alistair had verified the disassembly:
>>
>> (good) code generated w/o -fstack-protector-strong:
>>
>> native_save_fl:
>> pushfq
>> popq -8(%rsp)
>> movq -8(%rsp), %rax
>> retq
>>
>> (good) code generated w/ =r input constraint:
>>
>> native_save_fl:
>> pushfq
>> popq %rax
>> retq
>>
>> (bad) code generated with -fstack-protector-strong:
>>
>> native_save_fl:
>> subq $24, %rsp
>> movq %fs:40, %rax
>> movq %rax, 16(%rsp)
>> pushfq
>> popq 8(%rsp)
>> movq 8(%rsp), %rax
>> movq %fs:40, %rcx
>> cmpq 16(%rsp), %rcx
>> jne .LBB0_2
>> addq $24, %rsp
>> retq
>> .LBB0_2:
>> callq __stack_chk_fail
>>
>> It looks like the sugguestion is actually a revert of your commit:
>> ab94fcf528d127fcb490175512a8910f37e5b346:
>> x86: allow "=rm" in native_save_fl()
>>
>> It seemed like there was a question internally about why worry about
>> pop
>> adjusting the stack if the stack could be avoided altogether.
>>
>> I think Sedat can retest this, but I was curious as well about the
>> commit
>> message in ab94fcf528d: "[ Impact: performance ]", but Alistair's
>> analysis
>> of the disassembly seems to indicate there is no performance impact (in
>> fact, looks better as there's one less mov).
>>
>> Is there a reason we should not revert ab94fcf528d12, or maybe a better
>> approach?
>>
>> [0] https://lkml.org/lkml/2018/5/7/534
>> [1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
>> [2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22
>
> A stack canary on an *inlined* function? That's bound to break things elsewhere too sooner or later.
>
> It feels like *once again* clang is asking for the Linux kernel to change to paper over technical or compatibility problems in clang/LLVM. This is not exactly helping the feeling that we should just rip out any and all clang hacks and call it a loss.
>

In this case this fix is working-around a bug in the kernel. The problem
here is that the caller of native_save_fl() assumes that it won't
clobber any of the general purpose register even though the function
is defined as a normal c function which tells the compiler that it's
OK to clobber the registers.

This is something that really needs to be fixed in the kernel. Relying
on heuristics of internal compiler algorithms in order for code to work
correctly is always going to lead to problems like this.

-Tom


2018-05-25 02:43:13

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 1:26 PM Nick Desaulniers <[email protected]>
wrote:
> On Thu, May 24, 2018 at 11:59 AM <[email protected]> wrote:
> > Issue 3: Let's face it, reading and writing the flags should be
builtins,
> exactly because it has to do stack operations, which really means the
> compiler should be involved.

> I'm happy to propose that as a feature request to llvm+gcc.

Oh, looks like both clang and gcc have:
__builtin_ia32_readeflags_u64()

https://godbolt.org/g/SwPjhq

Maybe native_save_fl() and native_restore_fl() should be replaced in the
kernel with
__builtin_ia32_readeflags_u64() and __builtin_ia32_writeeflags_u64()?
--
Thanks,
~Nick Desaulniers

2018-05-25 02:43:28

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 11:59 AM <[email protected]> wrote:
> Ok, this is the *second* thing about LLVM-originated bug reports that
drives me batty. When you *do* identify a real problem, you propose a paper
over and/or talk about it as an LLVM issue and don't consider the often far
bigger picture.

Considering the bigger picture is why we have this thread going. You have
much more context about the kernel than I or the llvm maintainers do. We
can't consider the bigger picture until we ask.

> Issue 1: Fundamentally,

Fundamentally, the Linux kernel should not rely on GCC's heuristics for
adding or not adding a stack protector to functions with custom calling
conventions that are not marked in any way to let the compiler know.

Should GCC change their heuristic, deciding to insert stack guards for
non-inlined but marked static inline functions for
-fstack-protector-strong, or the kernel decide to use
-fstack-protector-all, the paravirt code will break in the same way.

> You are claiming it doesn't buy us anything, but you are only looking at
the paravirt case which is kind of "special" (in the short bus kind of way),

That's fair. Is another possible solution to have paravirt maybe not use
native_save_fl() then, but its own non-static-inline-without-m-constraint
implementation?

> Issue 2: ... The other option is to turn stack canary explicitly off for
all such functions.

We're looking to add the compiler attribute no_stack_protector. It's added
in mainline clang, the llvm bug cited earlier is about getting it
backported into clang-6.0.1 release. Sedat has tested/verified a set of
patches to the kernel that use this new feature in:
https://marc.info/?l=linux-kernel&m=152697630812366&w=2

> Issue 3: Let's face it, reading and writing the flags should be builtins,
exactly because it has to do stack operations, which really means the
compiler should be involved.

I'm happy to propose that as a feature request to llvm+gcc.
--
Thanks,
~Nick Desaulniers

2018-05-25 02:44:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 24, 2018 12:49:56 PM PDT, Tom Stellard <[email protected]> wrote:
>On 05/24/2018 11:19 AM, [email protected] wrote:
>> On May 23, 2018 3:08:19 PM PDT, Nick Desaulniers
><[email protected]> wrote:
>>> H. Peter,
>>>
>>> It was reported [0] that compiling the Linux kernel with Clang +
>>> CC_STACKPROTECTOR_STRONG was causing a crash in native_save_fl(),
>due
>>> to
>>> how GCC does not emit a stack guard for static inline functions (see
>>> Alistair's excellent report in [1]) but Clang does.
>>>
>>> When working with the LLVM release maintainers, Tom had suggested
>[2]
>>> changing the inline assembly constraint in native_save_fl() from
>'=rm'
>>> to
>>> '=r', and Alistair had verified the disassembly:
>>>
>>> (good) code generated w/o -fstack-protector-strong:
>>>
>>> native_save_fl:
>>> pushfq
>>> popq -8(%rsp)
>>> movq -8(%rsp), %rax
>>> retq
>>>
>>> (good) code generated w/ =r input constraint:
>>>
>>> native_save_fl:
>>> pushfq
>>> popq %rax
>>> retq
>>>
>>> (bad) code generated with -fstack-protector-strong:
>>>
>>> native_save_fl:
>>> subq $24, %rsp
>>> movq %fs:40, %rax
>>> movq %rax, 16(%rsp)
>>> pushfq
>>> popq 8(%rsp)
>>> movq 8(%rsp), %rax
>>> movq %fs:40, %rcx
>>> cmpq 16(%rsp), %rcx
>>> jne .LBB0_2
>>> addq $24, %rsp
>>> retq
>>> .LBB0_2:
>>> callq __stack_chk_fail
>>>
>>> It looks like the sugguestion is actually a revert of your commit:
>>> ab94fcf528d127fcb490175512a8910f37e5b346:
>>> x86: allow "=rm" in native_save_fl()
>>>
>>> It seemed like there was a question internally about why worry about
>>> pop
>>> adjusting the stack if the stack could be avoided altogether.
>>>
>>> I think Sedat can retest this, but I was curious as well about the
>>> commit
>>> message in ab94fcf528d: "[ Impact: performance ]", but Alistair's
>>> analysis
>>> of the disassembly seems to indicate there is no performance impact
>(in
>>> fact, looks better as there's one less mov).
>>>
>>> Is there a reason we should not revert ab94fcf528d12, or maybe a
>better
>>> approach?
>>>
>>> [0] https://lkml.org/lkml/2018/5/7/534
>>> [1] https://bugs.llvm.org/show_bug.cgi?id=37512#c15
>>> [2] https://bugs.llvm.org/show_bug.cgi?id=37512#c22
>>
>> A stack canary on an *inlined* function? That's bound to break things
>elsewhere too sooner or later.
>>
>> It feels like *once again* clang is asking for the Linux kernel to
>change to paper over technical or compatibility problems in clang/LLVM.
>This is not exactly helping the feeling that we should just rip out any
>and all clang hacks and call it a loss.
>>
>
>In this case this fix is working-around a bug in the kernel. The
>problem
>here is that the caller of native_save_fl() assumes that it won't
>clobber any of the general purpose register even though the function
>is defined as a normal c function which tells the compiler that it's
>OK to clobber the registers.
>
>This is something that really needs to be fixed in the kernel. Relying
>on heuristics of internal compiler algorithms in order for code to work
>correctly is always going to lead to problems like this.
>
>-Tom

Ok, yet another problem (this time with paravirtualization, no surprise there.). Good, let's find the actual problems and fix them where they should be fixed.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 02:44:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 24, 2018 1:52:16 PM PDT, Nick Desaulniers <[email protected]> wrote:
>On Thu, May 24, 2018 at 1:26 PM Nick Desaulniers
><[email protected]>
>wrote:
>> On Thu, May 24, 2018 at 11:59 AM <[email protected]> wrote:
>> > Issue 3: Let's face it, reading and writing the flags should be
>builtins,
>> exactly because it has to do stack operations, which really means the
>> compiler should be involved.
>
>> I'm happy to propose that as a feature request to llvm+gcc.
>
>Oh, looks like both clang and gcc have:
>__builtin_ia32_readeflags_u64()
>
>https://godbolt.org/g/SwPjhq
>
>Maybe native_save_fl() and native_restore_fl() should be replaced in
>the
>kernel with
>__builtin_ia32_readeflags_u64() and __builtin_ia32_writeeflags_u64()?

It should, at least when the compiler supports them (we keep support in for old compilers back to about gcc 3.4 or 4.1 or so for x86, but they are intently special cases.)

For the PV case, I think we should create it as an explicit assembly function, meaning it should be an extern inline in C code. That way we're fixing the kernel right.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 02:44:45

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 1:52 PM Nick Desaulniers <[email protected]>
wrote:
> On Thu, May 24, 2018 at 1:26 PM Nick Desaulniers <[email protected]>
> wrote:
> > On Thu, May 24, 2018 at 11:59 AM <[email protected]> wrote:
> > > Issue 3: Let's face it, reading and writing the flags should be
> builtins,
> > exactly because it has to do stack operations, which really means the
> > compiler should be involved.

> > I'm happy to propose that as a feature request to llvm+gcc.

> Oh, looks like both clang and gcc have:
> __builtin_ia32_readeflags_u64()

> https://godbolt.org/g/SwPjhq

> Maybe native_save_fl() and native_restore_fl() should be replaced in the
> kernel with
> __builtin_ia32_readeflags_u64() and __builtin_ia32_writeeflags_u64()?
> --
> Thanks,
> ~Nick Desaulniers

Looks like those builtins got added to GCC around the 4.9 timeframe:
https://godbolt.org/g/9VS2E9

Problematically, it seems that GCC does not have __has_builtin to do
feature detection:
https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros
https://godbolt.org/g/oku8ux
--
Thanks,
~Nick Desaulniers

2018-05-25 02:44:50

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 2:12 PM Nick Desaulniers <[email protected]>
wrote:
> On Thu, May 24, 2018 at 1:52 PM Nick Desaulniers <[email protected]>
> wrote:
> > On Thu, May 24, 2018 at 1:26 PM Nick Desaulniers <
[email protected]>
> > wrote:
> > > On Thu, May 24, 2018 at 11:59 AM <[email protected]> wrote:
> > > > Issue 3: Let's face it, reading and writing the flags should be
> > builtins,
> > > exactly because it has to do stack operations, which really means the
> > > compiler should be involved.

> > > I'm happy to propose that as a feature request to llvm+gcc.

> > Oh, looks like both clang and gcc have:
> > __builtin_ia32_readeflags_u64()

> > https://godbolt.org/g/SwPjhq

> > Maybe native_save_fl() and native_restore_fl() should be replaced in the
> > kernel with
> > __builtin_ia32_readeflags_u64() and __builtin_ia32_writeeflags_u64()?
> > --
> > Thanks,
> > ~Nick Desaulniers

> Looks like those builtins got added to GCC around the 4.9 timeframe:
> https://godbolt.org/g/9VS2E9

> Problematically, it seems that GCC does not have __has_builtin to do
> feature detection:

https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros
> https://godbolt.org/g/oku8ux

Is there a more canonical way the kernel does feature detection that looks
better than:

diff --git a/arch/x86/include/asm/irqflags.h
b/arch/x86/include/asm/irqflags.h
index 89f08955fff7..90974b5d023c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -13,6 +13,8 @@
* Interrupt control:
*/

+#ifdef GCC_VERSION
+#if GCC_VERSION < 40900
static inline unsigned long native_save_fl(void)
{
unsigned long flags;
@@ -38,6 +40,17 @@ static inline void native_restore_fl(unsigned long flags)
:"g" (flags)
:"memory", "cc");
}
+#else
+static inline unsigned long native_save_fl(void)
+{
+ return __builtin_ia32_readeflags_u64();
+}
+static inline void native_restore_fl(unsigned long flags)
+{
+ __builtin_ia32_writeeflags_u64(flags);
+}
+#endif
+

static inline void native_irq_disable(void)
{

--
Thanks,
~Nick Desaulniers

2018-05-25 02:45:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/24/18 14:12, Nick Desaulniers wrote:
>
> Looks like those builtins got added to GCC around the 4.9 timeframe:
> https://godbolt.org/g/9VS2E9
>
> Problematically, it seems that GCC does not have __has_builtin to do
> feature detection:
> https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros
> https://godbolt.org/g/oku8ux
>

We have support in the kernel build system for doing that, though.

I just added a concurring opinion to the gcc bug report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970

2018-05-25 02:45:35

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/24/18 14:27, Nick Desaulniers wrote:
> https://godbolt.org/g/oku8ux
>
> Is there a more canonical way the kernel does feature detection that looks
> better than:
>

Hm. I though we had, but it doesn't seem so. For cc we only seem to be
able to detect flags. For as we have rules that can detect new
instructions; something similar in Kbuild would be good.

> diff --git a/arch/x86/include/asm/irqflags.h
> b/arch/x86/include/asm/irqflags.h
> index 89f08955fff7..90974b5d023c 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -13,6 +13,8 @@
> * Interrupt control:
> */
>
> +#ifdef GCC_VERSION
> +#if GCC_VERSION < 40900

We don't actually test if GCC_VERSION is defined anywhere in the kernel.
We probably should explicitly define it to either 0 or some baseline
version if it is undefined. (Clang defines __GNUC__ so it will define
GCC_VERSION.)

-hpa

2018-05-25 02:46:03

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]> wrote:
> COMPILER AR: "=rm" should NEVER generate worse code than "=r". That is
> unequivocally a compiler bug.

Filed: https://bugs.llvm.org/show_bug.cgi?id=37583

> >> You are claiming it doesn't buy us anything, but you are only looking
at
> > the paravirt case which is kind of "special" (in the short bus kind of
way),
> >
> > That's fair. Is another possible solution to have paravirt maybe not
use
> > native_save_fl() then, but its own
non-static-inline-without-m-constraint
> > implementation?

> KERNEL AR: change native_save_fl() to an extern inline with an assembly
> out-of-line implementation, to satisfy the paravirt requirement that no
> GPRs other than %rax are clobbered.

i'm happy to add that, do you have a recommendation if it should go in an
existing .S file or a new one (and if so where/what shall I call it?).
--
Thanks,
~Nick Desaulniers

2018-05-25 02:46:49

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/24/18 13:26, Nick Desaulniers wrote:
>
> Considering the bigger picture is why we have this thread going. You have
> much more context about the kernel than I or the llvm maintainers do. We
> can't consider the bigger picture until we ask.
>

And this is correct. However, *every* LLVM request we get seem to be of
the form "if you apply this patch it will patch over one particular
instance of a problem with the kernel-LLVM."

There are two problems with that:

1. It focuses narrowly on an issue, which might (and more than a handful
of times have) mask much more fundamental, serious bugs, in both the
kernel and LLVM.

2. It makes it sound like an LLVM problem that wants to be hacked in the
kernel. More than a handful of times that is exactly what they have
been.

Look how many fundamental issues we have uncovered from this simple
issue, which would have gotten completely lost had we just applied this
patch.

>> Issue 1: Fundamentally,
>
> Fundamentally, the Linux kernel should not rely on GCC's heuristics for
> adding or not adding a stack protector to functions with custom calling
> conventions that are not marked in any way to let the compiler know.

This is true, although it is actually NOT the fundamental issue (see
below) but you also deleted the exact LLVM problem that I pointed out:

COMPILER AR: "=rm" should NEVER generate worse code than "=r". That is
unequivocally a compiler bug.

The other problem here is that omitting the stack canary is really not
the right thing to do, either. The right thing to do is to add a
mechanism for the compiler to know to reserve some number of stack
slots. Ideally this should work even if a function includes an inline
function or macro that contains the actual inline assembly.

COMPILER AR: Add a mechanism for the compiler to know that inline
assembly needs a certain amount of slack stack space. In the ideal
universe, this should ALSO mean the compiler offsets all "m" references
that contain references to the stack pointer.

KERNEL AR: Add those annotations to all cases of inline assembly that
modifies the stack pointer.

>> You are claiming it doesn't buy us anything, but you are only looking at
> the paravirt case which is kind of "special" (in the short bus kind of way),
>
> That's fair. Is another possible solution to have paravirt maybe not use
> native_save_fl() then, but its own non-static-inline-without-m-constraint
> implementation?

KERNEL AR: change native_save_fl() to an extern inline with an assembly
out-of-line implementation, to satisfy the paravirt requirement that no
GPRs other than %rax are clobbered.

>> Issue 3: Let's face it, reading and writing the flags should be builtins,
> exactly because it has to do stack operations, which really means the
> compiler should be involved.
>
> I'm happy to propose that as a feature request to llvm+gcc.

Being discussed elsewhere, of course.

-hpa



2018-05-25 02:47:25

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers <[email protected]> wrote:
>On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]> wrote:
>> COMPILER AR: "=rm" should NEVER generate worse code than "=r". That
>is
>> unequivocally a compiler bug.
>
>Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
>
>> >> You are claiming it doesn't buy us anything, but you are only
>looking
>at
>> > the paravirt case which is kind of "special" (in the short bus kind
>of
>way),
>> >
>> > That's fair. Is another possible solution to have paravirt maybe
>not
>use
>> > native_save_fl() then, but its own
>non-static-inline-without-m-constraint
>> > implementation?
>
>> KERNEL AR: change native_save_fl() to an extern inline with an
>assembly
>> out-of-line implementation, to satisfy the paravirt requirement that
>no
>> GPRs other than %rax are clobbered.
>
>i'm happy to add that, do you have a recommendation if it should go in
>an
>existing .S file or a new one (and if so where/what shall I call it?).

How about irqflags.c since that is what the .h file is called.

It should simply be:

push %rdi
popf
ret

pushf
pop %rax
ret

... but with all the regular assembly decorations, of course.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 08:24:38

by Sedat Dilek

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 10:26 PM, Nick Desaulniers
<[email protected]> wrote:
[...]
>> Issue 2: ... The other option is to turn stack canary explicitly off for
> all such functions.
>
> We're looking to add the compiler attribute no_stack_protector. It's added
> in mainline clang, the llvm bug cited earlier is about getting it
> backported into clang-6.0.1 release. Sedat has tested/verified a set of
> patches to the kernel that use this new feature in:
> https://marc.info/?l=linux-kernel&m=152697630812366&w=2
>
Hi Nick,

sorry, if I was not clear/precise on this.
You referenced the patches from my 1st tryouts which were wrong in the
sense of "did-not-compile".
I have attached the correct two patches.
The commit-bodies needs some more "massage" to quote Thomas Gleixner,
useful web-links and credits should be added also.
I appreciate your help here.

From my understanding...
The more correct approach in fixing the issue is to add Clang's
"no_stack_protector" function attribute support and mark
native_save_fl() accordingly.
The 2nd solution partly revert "x86: allow "=rm" in native_save_fl()"
is not favoured as the impact on other parts of the Linux-kernel are
not clear.
The 1st solution requires a Clang >= 7-svn331925.
Is that correct?

What does that mean in fixing the issue?
[ Linux-kernel side ]
I guess GCC and marking native_save_fl() should be OK?
After some rework, do you plan to push patches from the 1st solution
to Linus uptream?
[ LLVM/Clang side ]
What about backporting "no_stack_protector" to LLVM/Clang v6.0.1?

Thanks to all involved people.

Sunshiny greetings from North-West Germany,
- Sedat -


Attachments:
0001-compiler-clang.h-Add-no_stack_protector-function-att.patch (1.52 kB)
0002-x86-paravirt-Mark-native_save_fl-with-__nostackprote.patch (963.00 B)
Download all attachments

2018-05-25 16:28:19

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Thu, May 24, 2018 at 3:43 PM <[email protected]> wrote:
> On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers <[email protected]>
wrote:
> >On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]> wrote:
> >> COMPILER AR: "=rm" should NEVER generate worse code than "=r". That
> >is
> >> unequivocally a compiler bug.
> >
> >Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
> >
> >> >> You are claiming it doesn't buy us anything, but you are only
> >looking
> >at
> >> > the paravirt case which is kind of "special" (in the short bus kind
> >of
> >way),
> >> >
> >> > That's fair. Is another possible solution to have paravirt maybe
> >not
> >use
> >> > native_save_fl() then, but its own
> >non-static-inline-without-m-constraint
> >> > implementation?
> >
> >> KERNEL AR: change native_save_fl() to an extern inline with an
> >assembly
> >> out-of-line implementation, to satisfy the paravirt requirement that
> >no
> >> GPRs other than %rax are clobbered.
> >
> >i'm happy to add that, do you have a recommendation if it should go in
> >an
> >existing .S file or a new one (and if so where/what shall I call it?).

> How about irqflags.c since that is what the .h file is called.

> It should simply be:

> push %rdi
> popf
> ret

> pushf
> pop %rax
> ret

> ... but with all the regular assembly decorations, of course.

Something like the following?


diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
new file mode 100644
index 000000000000..59dc21bd3327
--- /dev/null
+++ b/arch/x86/kernel/irqflags.c
@@ -0,0 +1,24 @@
+#include <asm/asm.h>
+
+extern unsigned long native_save_fl(void);
+extern void native_restore_fl(unsigned long flags);
+
+asm(
+".pushsection .text;"
+".global native_save_fl;"
+".type native_save_fl, @function;"
+"native_save_fl:"
+"pushf;"
+"pop %" _ASM_AX ";"
+"ret;"
+".popsection");
+
+asm(
+".pushsection .text;"
+".global native_restore_fl;"
+".type native_restore_fl, @function;"
+"native_restore_fl:"
+"push %" _ASM_DI ";"
+"popf;"
+"ret;"
+".popsection");

And change the declaration in arch/x86/include/asm/irqflags.h to:
+extern inline unsigned long native_save_fl(void);
+extern inline void native_restore_fl(unsigned long flags);

This seems to work, but
1. arch/x86/boot/compressed/misc.o warns that native_save_fl() is never
defined (arch_local_save_flags() uses it). Does that mean
arch_local_save_flags(), and friends would also have to move to the newly
created .c file as well?
2. `extern inline` doesn't inline any instances (from what I can tell from
disassembling vmlinux). I think this is strictly worse. Don't we only want
pv_irq_ops.save_fl to be non-inlined in a way that no stack protector can
be added? If that's the case, should my assembly based implementation have
a different identifier (`native_save_fl_paravirt` or something). That would
also fix point #1 above. But now the paravirt code has its own copy of the
function.
--
Thanks,
~Nick Desaulniers

2018-05-25 16:34:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers <[email protected]> wrote:
>On Thu, May 24, 2018 at 3:43 PM <[email protected]> wrote:
>> On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers
><[email protected]>
>wrote:
>> >On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]>
>wrote:
>> >> COMPILER AR: "=rm" should NEVER generate worse code than "=r".
>That
>> >is
>> >> unequivocally a compiler bug.
>> >
>> >Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
>> >
>> >> >> You are claiming it doesn't buy us anything, but you are only
>> >looking
>> >at
>> >> > the paravirt case which is kind of "special" (in the short bus
>kind
>> >of
>> >way),
>> >> >
>> >> > That's fair. Is another possible solution to have paravirt
>maybe
>> >not
>> >use
>> >> > native_save_fl() then, but its own
>> >non-static-inline-without-m-constraint
>> >> > implementation?
>> >
>> >> KERNEL AR: change native_save_fl() to an extern inline with an
>> >assembly
>> >> out-of-line implementation, to satisfy the paravirt requirement
>that
>> >no
>> >> GPRs other than %rax are clobbered.
>> >
>> >i'm happy to add that, do you have a recommendation if it should go
>in
>> >an
>> >existing .S file or a new one (and if so where/what shall I call
>it?).
>
>> How about irqflags.c since that is what the .h file is called.
>
>> It should simply be:
>
>> push %rdi
>> popf
>> ret
>
>> pushf
>> pop %rax
>> ret
>
>> ... but with all the regular assembly decorations, of course.
>
>Something like the following?
>
>
>diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
>new file mode 100644
>index 000000000000..59dc21bd3327
>--- /dev/null
>+++ b/arch/x86/kernel/irqflags.c
>@@ -0,0 +1,24 @@
>+#include <asm/asm.h>
>+
>+extern unsigned long native_save_fl(void);
>+extern void native_restore_fl(unsigned long flags);
>+
>+asm(
>+".pushsection .text;"
>+".global native_save_fl;"
>+".type native_save_fl, @function;"
>+"native_save_fl:"
>+"pushf;"
>+"pop %" _ASM_AX ";"
>+"ret;"
>+".popsection");
>+
>+asm(
>+".pushsection .text;"
>+".global native_restore_fl;"
>+".type native_restore_fl, @function;"
>+"native_restore_fl:"
>+"push %" _ASM_DI ";"
>+"popf;"
>+"ret;"
>+".popsection");
>
>And change the declaration in arch/x86/include/asm/irqflags.h to:
>+extern inline unsigned long native_save_fl(void);
>+extern inline void native_restore_fl(unsigned long flags);
>
>This seems to work, but
>1. arch/x86/boot/compressed/misc.o warns that native_save_fl() is never
>defined (arch_local_save_flags() uses it). Does that mean
>arch_local_save_flags(), and friends would also have to move to the
>newly
>created .c file as well?
>2. `extern inline` doesn't inline any instances (from what I can tell
>from
>disassembling vmlinux). I think this is strictly worse. Don't we only
>want
>pv_irq_ops.save_fl to be non-inlined in a way that no stack protector
>can
>be added? If that's the case, should my assembly based implementation
>have
>a different identifier (`native_save_fl_paravirt` or something). That
>would
>also fix point #1 above. But now the paravirt code has its own copy of
>the
>function.

Sorry, I meant irqflags.S.

It still should be available as as inline, however, but now "extern inline".
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 16:35:40

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers <[email protected]> wrote:
>On Thu, May 24, 2018 at 3:43 PM <[email protected]> wrote:
>> On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers
><[email protected]>
>wrote:
>> >On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]>
>wrote:
>> >> COMPILER AR: "=rm" should NEVER generate worse code than "=r".
>That
>> >is
>> >> unequivocally a compiler bug.
>> >
>> >Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
>> >
>> >> >> You are claiming it doesn't buy us anything, but you are only
>> >looking
>> >at
>> >> > the paravirt case which is kind of "special" (in the short bus
>kind
>> >of
>> >way),
>> >> >
>> >> > That's fair. Is another possible solution to have paravirt
>maybe
>> >not
>> >use
>> >> > native_save_fl() then, but its own
>> >non-static-inline-without-m-constraint
>> >> > implementation?
>> >
>> >> KERNEL AR: change native_save_fl() to an extern inline with an
>> >assembly
>> >> out-of-line implementation, to satisfy the paravirt requirement
>that
>> >no
>> >> GPRs other than %rax are clobbered.
>> >
>> >i'm happy to add that, do you have a recommendation if it should go
>in
>> >an
>> >existing .S file or a new one (and if so where/what shall I call
>it?).
>
>> How about irqflags.c since that is what the .h file is called.
>
>> It should simply be:
>
>> push %rdi
>> popf
>> ret
>
>> pushf
>> pop %rax
>> ret
>
>> ... but with all the regular assembly decorations, of course.
>
>Something like the following?
>
>
>diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
>new file mode 100644
>index 000000000000..59dc21bd3327
>--- /dev/null
>+++ b/arch/x86/kernel/irqflags.c
>@@ -0,0 +1,24 @@
>+#include <asm/asm.h>
>+
>+extern unsigned long native_save_fl(void);
>+extern void native_restore_fl(unsigned long flags);
>+
>+asm(
>+".pushsection .text;"
>+".global native_save_fl;"
>+".type native_save_fl, @function;"
>+"native_save_fl:"
>+"pushf;"
>+"pop %" _ASM_AX ";"
>+"ret;"
>+".popsection");
>+
>+asm(
>+".pushsection .text;"
>+".global native_restore_fl;"
>+".type native_restore_fl, @function;"
>+"native_restore_fl:"
>+"push %" _ASM_DI ";"
>+"popf;"
>+"ret;"
>+".popsection");
>
>And change the declaration in arch/x86/include/asm/irqflags.h to:
>+extern inline unsigned long native_save_fl(void);
>+extern inline void native_restore_fl(unsigned long flags);
>
>This seems to work, but
>1. arch/x86/boot/compressed/misc.o warns that native_save_fl() is never
>defined (arch_local_save_flags() uses it). Does that mean
>arch_local_save_flags(), and friends would also have to move to the
>newly
>created .c file as well?
>2. `extern inline` doesn't inline any instances (from what I can tell
>from
>disassembling vmlinux). I think this is strictly worse. Don't we only
>want
>pv_irq_ops.save_fl to be non-inlined in a way that no stack protector
>can
>be added? If that's the case, should my assembly based implementation
>have
>a different identifier (`native_save_fl_paravirt` or something). That
>would
>also fix point #1 above. But now the paravirt code has its own copy of
>the
>function.

You keep compiling with paravirtualization enabled... that code will not do any inlining.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 16:48:35

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers <[email protected]>
wrote:
> >On Thu, May 24, 2018 at 3:43 PM <[email protected]> wrote:
> >> On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers
> ><[email protected]>
> >wrote:
> >> >On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]>
> >wrote:
> >> >> COMPILER AR: "=rm" should NEVER generate worse code than "=r".
> >That
> >> >is
> >> >> unequivocally a compiler bug.
> >> >
> >> >Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
> >> >
> >> >> >> You are claiming it doesn't buy us anything, but you are only
> >> >looking
> >> >at
> >> >> > the paravirt case which is kind of "special" (in the short bus
> >kind
> >> >of
> >> >way),
> >> >> >
> >> >> > That's fair. Is another possible solution to have paravirt
> >maybe
> >> >not
> >> >use
> >> >> > native_save_fl() then, but its own
> >> >non-static-inline-without-m-constraint
> >> >> > implementation?
> >> >
> >> >> KERNEL AR: change native_save_fl() to an extern inline with an
> >> >assembly
> >> >> out-of-line implementation, to satisfy the paravirt requirement
> >that
> >> >no
> >> >> GPRs other than %rax are clobbered.
> >> >
> >> >i'm happy to add that, do you have a recommendation if it should go
> >in
> >> >an
> >> >existing .S file or a new one (and if so where/what shall I call
> >it?).
> >
> >> How about irqflags.c since that is what the .h file is called.
> >
> >> It should simply be:
> >
> >> push %rdi
> >> popf
> >> ret
> >
> >> pushf
> >> pop %rax
> >> ret
> >
> >> ... but with all the regular assembly decorations, of course.
> >
> >Something like the following?
> >
> >
> >diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
> >new file mode 100644
> >index 000000000000..59dc21bd3327
> >--- /dev/null
> >+++ b/arch/x86/kernel/irqflags.c
> >@@ -0,0 +1,24 @@
> >+#include <asm/asm.h>
> >+
> >+extern unsigned long native_save_fl(void);
> >+extern void native_restore_fl(unsigned long flags);
> >+
> >+asm(
> >+".pushsection .text;"
> >+".global native_save_fl;"
> >+".type native_save_fl, @function;"
> >+"native_save_fl:"
> >+"pushf;"
> >+"pop %" _ASM_AX ";"
> >+"ret;"
> >+".popsection");
> >+
> >+asm(
> >+".pushsection .text;"
> >+".global native_restore_fl;"
> >+".type native_restore_fl, @function;"
> >+"native_restore_fl:"
> >+"push %" _ASM_DI ";"
> >+"popf;"
> >+"ret;"
> >+".popsection");
> >
> >And change the declaration in arch/x86/include/asm/irqflags.h to:
> >+extern inline unsigned long native_save_fl(void);
> >+extern inline void native_restore_fl(unsigned long flags);
> >
> >This seems to work, but
> >1. arch/x86/boot/compressed/misc.o warns that native_save_fl() is never
> >defined (arch_local_save_flags() uses it). Does that mean
> >arch_local_save_flags(), and friends would also have to move to the
> >newly
> >created .c file as well?
> >2. `extern inline` doesn't inline any instances (from what I can tell
> >from
> >disassembling vmlinux). I think this is strictly worse. Don't we only
> >want
> >pv_irq_ops.save_fl to be non-inlined in a way that no stack protector
> >can
> >be added? If that's the case, should my assembly based implementation
> >have
> >a different identifier (`native_save_fl_paravirt` or something). That
> >would
> >also fix point #1 above. But now the paravirt code has its own copy of
> >the
> >function.

> Sorry, I meant irqflags.S.

> It still should be available as as inline, however, but now "extern
inline".

Heh, ok I was confused. But in testing, I had also created:

arch/x86/lib/irqflags.S
/* SPDX-License-Identifier: GPL-2.0 */

#include <asm/asm.h>
#include <asm/export.h>
#include <linux/linkage.h>

/*
* unsigned long native_save_fl(void)
*/
ENTRY(native_save_fl)
pushf
pop %_ASM_AX
ret
ENDPROC(native_save_fl)
EXPORT_SYMBOL(native_save_fl)

/*
* void native_restore_fl(unsigned long flags)
* %rdi: flags
*/
ENTRY(native_restore_fl)
push %_ASM_DI
popf
ret
ENDPROC(native_restore_fl)
EXPORT_SYMBOL(native_restore_fl)

The issue is that this still has issues 1 & 2 listed earlier (and the
disassembly has a lot more trailing nops added).

When you say

> It still should be available as as inline, however, but now "extern
inline".

Am I understanding correctly that native_save_fl should be inlined into all
call sites (modulo the problematic pv_irq_ops.save_fl case)? Because for
these two assembly implementations, it's not, but maybe there's something
missing in my implementation?
--
Thanks,
~Nick Desaulniers

2018-05-25 16:55:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <[email protected]> wrote:
>On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
>> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
><[email protected]>
>wrote:
>> >On Thu, May 24, 2018 at 3:43 PM <[email protected]> wrote:
>> >> On May 24, 2018 3:31:05 PM PDT, Nick Desaulniers
>> ><[email protected]>
>> >wrote:
>> >> >On Thu, May 24, 2018 at 3:05 PM H. Peter Anvin <[email protected]>
>> >wrote:
>> >> >> COMPILER AR: "=rm" should NEVER generate worse code than "=r".
>> >That
>> >> >is
>> >> >> unequivocally a compiler bug.
>> >> >
>> >> >Filed: https://bugs.llvm.org/show_bug.cgi?id=37583
>> >> >
>> >> >> >> You are claiming it doesn't buy us anything, but you are
>only
>> >> >looking
>> >> >at
>> >> >> > the paravirt case which is kind of "special" (in the short
>bus
>> >kind
>> >> >of
>> >> >way),
>> >> >> >
>> >> >> > That's fair. Is another possible solution to have paravirt
>> >maybe
>> >> >not
>> >> >use
>> >> >> > native_save_fl() then, but its own
>> >> >non-static-inline-without-m-constraint
>> >> >> > implementation?
>> >> >
>> >> >> KERNEL AR: change native_save_fl() to an extern inline with an
>> >> >assembly
>> >> >> out-of-line implementation, to satisfy the paravirt requirement
>> >that
>> >> >no
>> >> >> GPRs other than %rax are clobbered.
>> >> >
>> >> >i'm happy to add that, do you have a recommendation if it should
>go
>> >in
>> >> >an
>> >> >existing .S file or a new one (and if so where/what shall I call
>> >it?).
>> >
>> >> How about irqflags.c since that is what the .h file is called.
>> >
>> >> It should simply be:
>> >
>> >> push %rdi
>> >> popf
>> >> ret
>> >
>> >> pushf
>> >> pop %rax
>> >> ret
>> >
>> >> ... but with all the regular assembly decorations, of course.
>> >
>> >Something like the following?
>> >
>> >
>> >diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
>> >new file mode 100644
>> >index 000000000000..59dc21bd3327
>> >--- /dev/null
>> >+++ b/arch/x86/kernel/irqflags.c
>> >@@ -0,0 +1,24 @@
>> >+#include <asm/asm.h>
>> >+
>> >+extern unsigned long native_save_fl(void);
>> >+extern void native_restore_fl(unsigned long flags);
>> >+
>> >+asm(
>> >+".pushsection .text;"
>> >+".global native_save_fl;"
>> >+".type native_save_fl, @function;"
>> >+"native_save_fl:"
>> >+"pushf;"
>> >+"pop %" _ASM_AX ";"
>> >+"ret;"
>> >+".popsection");
>> >+
>> >+asm(
>> >+".pushsection .text;"
>> >+".global native_restore_fl;"
>> >+".type native_restore_fl, @function;"
>> >+"native_restore_fl:"
>> >+"push %" _ASM_DI ";"
>> >+"popf;"
>> >+"ret;"
>> >+".popsection");
>> >
>> >And change the declaration in arch/x86/include/asm/irqflags.h to:
>> >+extern inline unsigned long native_save_fl(void);
>> >+extern inline void native_restore_fl(unsigned long flags);
>> >
>> >This seems to work, but
>> >1. arch/x86/boot/compressed/misc.o warns that native_save_fl() is
>never
>> >defined (arch_local_save_flags() uses it). Does that mean
>> >arch_local_save_flags(), and friends would also have to move to the
>> >newly
>> >created .c file as well?
>> >2. `extern inline` doesn't inline any instances (from what I can
>tell
>> >from
>> >disassembling vmlinux). I think this is strictly worse. Don't we
>only
>> >want
>> >pv_irq_ops.save_fl to be non-inlined in a way that no stack
>protector
>> >can
>> >be added? If that's the case, should my assembly based
>implementation
>> >have
>> >a different identifier (`native_save_fl_paravirt` or something).
>That
>> >would
>> >also fix point #1 above. But now the paravirt code has its own copy
>of
>> >the
>> >function.
>
>> Sorry, I meant irqflags.S.
>
>> It still should be available as as inline, however, but now "extern
>inline".
>
>Heh, ok I was confused. But in testing, I had also created:
>
>arch/x86/lib/irqflags.S
>/* SPDX-License-Identifier: GPL-2.0 */
>
>#include <asm/asm.h>
>#include <asm/export.h>
>#include <linux/linkage.h>
>
>/*
> * unsigned long native_save_fl(void)
> */
>ENTRY(native_save_fl)
>pushf
>pop %_ASM_AX
>ret
>ENDPROC(native_save_fl)
>EXPORT_SYMBOL(native_save_fl)
>
>/*
> * void native_restore_fl(unsigned long flags)
> * %rdi: flags
> */
>ENTRY(native_restore_fl)
>push %_ASM_DI
>popf
>ret
>ENDPROC(native_restore_fl)
>EXPORT_SYMBOL(native_restore_fl)
>
>The issue is that this still has issues 1 & 2 listed earlier (and the
>disassembly has a lot more trailing nops added).
>
>When you say
>
>> It still should be available as as inline, however, but now "extern
>inline".
>
>Am I understanding correctly that native_save_fl should be inlined into
>all
>call sites (modulo the problematic pv_irq_ops.save_fl case)? Because
>for
>these two assembly implementations, it's not, but maybe there's
>something
>missing in my implementation?

Yes, that's what "extern inline" means. Maybe it needs a must inline annotation, but that's really messed up.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 17:33:58

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 9:53 AM <[email protected]> wrote:
> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <[email protected]>
wrote:
> >On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
> >> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
> ><[email protected]> wrote:
> >When you say
> >
> >> It still should be available as as inline, however, but now "extern
> >inline".
> >
> >Am I understanding correctly that native_save_fl should be inlined into
> >all
> >call sites (modulo the problematic pv_irq_ops.save_fl case)? Because
> >for
> >these two assembly implementations, it's not, but maybe there's
> >something
> >missing in my implementation?

> Yes, that's what "extern inline" means. Maybe it needs a must inline
annotation, but that's really messed up.

I don't think it's possible to inline a function from an external
translation unit without something like LTO.

If I move the implementation of native_save_fl() to a separate .c (with out
of line assembly) or .S, neither clang nor gcc will inline that assembly to
any call sites, whether the declaration of native_save_fl() looks like:

extern inline unsigned long native_save_fl(void);

or

__attribute__((always_inline))

extern inline unsigned long native_save_fl(void);

I think an external copy is the best approach for the paravirt code:

diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
new file mode 100644
index 000000000000..e173ba8bee7b
--- /dev/null
+++ b/arch/x86/kernel/irqflags.c
@@ -0,0 +1,24 @@
+#include <asm/asm.h>
+
+extern unsigned long native_save_fl_no_stack_protector(void);
+extern void native_restore_fl_no_stack_protector(unsigned long flags);
+
+asm(
+".pushsection .text;"
+".global native_save_fl_no_stack_protector;"
+".type native_save_fl_no_stack_protector, @function;"
+"native_save_fl_no_stack_protector:"
+"pushf;"
+"pop %" _ASM_AX ";"
+"ret;"
+".popsection");
+
+asm(
+".pushsection .text;"
+".global native_restore_fl_no_stack_protector;"
+".type native_restore_fl_no_stack_protector, @function;"
+"native_restore_fl_no_stack_protector:"
+"push %" _ASM_DI ";"
+"popf;"
+"ret;"
+".popsection");
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..8824d01c0c35 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -61,6 +61,7 @@ obj-y += alternative.o i8253.o
hw_breakpoint.o
obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
obj-y += pci-iommu_table.o
obj-y += resource.o
+obj-y += irqflags.o

obj-y += process.o
obj-y += fpu/
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -322,9 +322,12 @@ struct pv_time_ops pv_time_ops = {
.steal_clock = native_steal_clock,
};

+extern unsigned long native_save_fl_no_stack_protector(void);
+extern void native_restore_fl_no_stack_protector(unsigned long flags);
+
__visible struct pv_irq_ops pv_irq_ops = {
- .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
- .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
+ .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl_no_stack_protector),
+ .restore_fl =
__PV_IS_CALLEE_SAVE(native_restore_fl_no_stack_protector),
.irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
.safe_halt = native_safe_halt,

Thoughts?
--
Thanks,
~Nick Desaulniers

2018-05-25 17:35:46

by Tom Stellard

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/25/2018 10:31 AM, Nick Desaulniers wrote:
> On Fri, May 25, 2018 at 9:53 AM <[email protected]> wrote:
>> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <[email protected]>
> wrote:
>>> On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
>>>> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
>>> <[email protected]> wrote:
>>> When you say
>>>
>>>> It still should be available as as inline, however, but now "extern
>>> inline".
>>>
>>> Am I understanding correctly that native_save_fl should be inlined into
>>> all
>>> call sites (modulo the problematic pv_irq_ops.save_fl case)? Because
>>> for
>>> these two assembly implementations, it's not, but maybe there's
>>> something
>>> missing in my implementation?
>
>> Yes, that's what "extern inline" means. Maybe it needs a must inline
> annotation, but that's really messed up.
>

What about doing something like suggested here:
https://bugs.llvm.org/show_bug.cgi?id=37512#c17

This would keep the definition in C and make it easier for compilers
to inline.

-Tom

> I don't think it's possible to inline a function from an external
> translation unit without something like LTO.
>
> If I move the implementation of native_save_fl() to a separate .c (with out
> of line assembly) or .S, neither clang nor gcc will inline that assembly to
> any call sites, whether the declaration of native_save_fl() looks like:
>
> extern inline unsigned long native_save_fl(void);
>
> or
>
> __attribute__((always_inline))
>
> extern inline unsigned long native_save_fl(void);
>
> I think an external copy is the best approach for the paravirt code:
>
> diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
> new file mode 100644
> index 000000000000..e173ba8bee7b
> --- /dev/null
> +++ b/arch/x86/kernel/irqflags.c
> @@ -0,0 +1,24 @@
> +#include <asm/asm.h>
> +
> +extern unsigned long native_save_fl_no_stack_protector(void);
> +extern void native_restore_fl_no_stack_protector(unsigned long flags);
> +
> +asm(
> +".pushsection .text;"
> +".global native_save_fl_no_stack_protector;"
> +".type native_save_fl_no_stack_protector, @function;"
> +"native_save_fl_no_stack_protector:"
> +"pushf;"
> +"pop %" _ASM_AX ";"
> +"ret;"
> +".popsection");
> +
> +asm(
> +".pushsection .text;"
> +".global native_restore_fl_no_stack_protector;"
> +".type native_restore_fl_no_stack_protector, @function;"
> +"native_restore_fl_no_stack_protector:"
> +"push %" _ASM_DI ";"
> +"popf;"
> +"ret;"
> +".popsection");
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..8824d01c0c35 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -61,6 +61,7 @@ obj-y += alternative.o i8253.o
> hw_breakpoint.o
> obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
> obj-y += pci-iommu_table.o
> obj-y += resource.o
> +obj-y += irqflags.o
>
> obj-y += process.o
> obj-y += fpu/
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -322,9 +322,12 @@ struct pv_time_ops pv_time_ops = {
> .steal_clock = native_steal_clock,
> };
>
> +extern unsigned long native_save_fl_no_stack_protector(void);
> +extern void native_restore_fl_no_stack_protector(unsigned long flags);
> +
> __visible struct pv_irq_ops pv_irq_ops = {
> - .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
> - .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
> + .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl_no_stack_protector),
> + .restore_fl =
> __PV_IS_CALLEE_SAVE(native_restore_fl_no_stack_protector),
> .irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
> .irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
> .safe_halt = native_safe_halt,
>
> Thoughts?
>


2018-05-25 17:50:19

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 10:35 AM Tom Stellard <[email protected]> wrote:
> On 05/25/2018 10:31 AM, Nick Desaulniers wrote:
> > On Fri, May 25, 2018 at 9:53 AM <[email protected]> wrote:
> >> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <
[email protected]>
> > wrote:
> >>> On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
> >>>> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
> >>> <[email protected]> wrote:
> >>> When you say
> >>>
> >>>> It still should be available as as inline, however, but now "extern
> >>> inline".
> >>>
> >>> Am I understanding correctly that native_save_fl should be inlined
into
> >>> all
> >>> call sites (modulo the problematic pv_irq_ops.save_fl case)? Because
> >>> for
> >>> these two assembly implementations, it's not, but maybe there's
> >>> something
> >>> missing in my implementation?
> >
> >> Yes, that's what "extern inline" means. Maybe it needs a must inline
> > annotation, but that's really messed up.
> >

> What about doing something like suggested here:
> https://bugs.llvm.org/show_bug.cgi?id=37512#c17

> This would keep the definition in C and make it easier for compilers
> to inline.

The GCC docs for __attribute__((naked)) seem to imply this is a machine
specific constraint (of which x86 is not listed):
https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html

But let's try with source:
https://godbolt.org/g/aJ4gZB

Clang errors:
<source>:3:3: error: non-ASM statement in naked function is not supported
unsigned long flags;
^

Is it valid to use assembly to place the results in %rax and mark the c
function somehow?

gcc doesn't support this attribute until 4.9 (but we can add a feature test
for attributes with gcc (unlike builtins)), but warns that:

warning: ‘naked’ attribute directive ignored [-Wattributes]

gcc 8.1 and trunk inserts a `ud2` instruction (what?!) (let me see if I can
repro outside of godbolt, and will file a bug report).
--
Thanks,
~Nick Desaulniers

2018-05-25 17:56:47

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 25, 2018 10:31:51 AM PDT, Nick Desaulniers <[email protected]> wrote:
>On Fri, May 25, 2018 at 9:53 AM <[email protected]> wrote:
>> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers
><[email protected]>
>wrote:
>> >On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
>> >> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
>> ><[email protected]> wrote:
>> >When you say
>> >
>> >> It still should be available as as inline, however, but now
>"extern
>> >inline".
>> >
>> >Am I understanding correctly that native_save_fl should be inlined
>into
>> >all
>> >call sites (modulo the problematic pv_irq_ops.save_fl case)?
>Because
>> >for
>> >these two assembly implementations, it's not, but maybe there's
>> >something
>> >missing in my implementation?
>
>> Yes, that's what "extern inline" means. Maybe it needs a must inline
>annotation, but that's really messed up.
>
>I don't think it's possible to inline a function from an external
>translation unit without something like LTO.
>
>If I move the implementation of native_save_fl() to a separate .c (with
>out
>of line assembly) or .S, neither clang nor gcc will inline that
>assembly to
>any call sites, whether the declaration of native_save_fl() looks like:
>
>extern inline unsigned long native_save_fl(void);
>
>or
>
>__attribute__((always_inline))
>
>extern inline unsigned long native_save_fl(void);
>
>I think an external copy is the best approach for the paravirt code:
>
>diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
>new file mode 100644
>index 000000000000..e173ba8bee7b
>--- /dev/null
>+++ b/arch/x86/kernel/irqflags.c
>@@ -0,0 +1,24 @@
>+#include <asm/asm.h>
>+
>+extern unsigned long native_save_fl_no_stack_protector(void);
>+extern void native_restore_fl_no_stack_protector(unsigned long flags);
>+
>+asm(
>+".pushsection .text;"
>+".global native_save_fl_no_stack_protector;"
>+".type native_save_fl_no_stack_protector, @function;"
>+"native_save_fl_no_stack_protector:"
>+"pushf;"
>+"pop %" _ASM_AX ";"
>+"ret;"
>+".popsection");
>+
>+asm(
>+".pushsection .text;"
>+".global native_restore_fl_no_stack_protector;"
>+".type native_restore_fl_no_stack_protector, @function;"
>+"native_restore_fl_no_stack_protector:"
>+"push %" _ASM_DI ";"
>+"popf;"
>+"ret;"
>+".popsection");
>diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>index 02d6f5cf4e70..8824d01c0c35 100644
>--- a/arch/x86/kernel/Makefile
>+++ b/arch/x86/kernel/Makefile
>@@ -61,6 +61,7 @@ obj-y += alternative.o i8253.o
>hw_breakpoint.o
> obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
> obj-y += pci-iommu_table.o
> obj-y += resource.o
>+obj-y += irqflags.o
>
> obj-y += process.o
> obj-y += fpu/
>--- a/arch/x86/kernel/paravirt.c
>+++ b/arch/x86/kernel/paravirt.c
>@@ -322,9 +322,12 @@ struct pv_time_ops pv_time_ops = {
> .steal_clock = native_steal_clock,
> };
>
>+extern unsigned long native_save_fl_no_stack_protector(void);
>+extern void native_restore_fl_no_stack_protector(unsigned long flags);
>+
> __visible struct pv_irq_ops pv_irq_ops = {
>- .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
>- .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
>+ .save_fl =
>__PV_IS_CALLEE_SAVE(native_save_fl_no_stack_protector),
>+ .restore_fl =
>__PV_IS_CALLEE_SAVE(native_restore_fl_no_stack_protector),
> .irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
> .irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
> .safe_halt = native_safe_halt,
>
>Thoughts?

You need the extern inline in the .h file and the out-of-line .S file both.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 17:58:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On May 25, 2018 10:49:28 AM PDT, Nick Desaulniers <[email protected]> wrote:
>On Fri, May 25, 2018 at 10:35 AM Tom Stellard <[email protected]>
>wrote:
>> On 05/25/2018 10:31 AM, Nick Desaulniers wrote:
>> > On Fri, May 25, 2018 at 9:53 AM <[email protected]> wrote:
>> >> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <
>[email protected]>
>> > wrote:
>> >>> On Fri, May 25, 2018 at 9:33 AM <[email protected]> wrote:
>> >>>> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
>> >>> <[email protected]> wrote:
>> >>> When you say
>> >>>
>> >>>> It still should be available as as inline, however, but now
>"extern
>> >>> inline".
>> >>>
>> >>> Am I understanding correctly that native_save_fl should be
>inlined
>into
>> >>> all
>> >>> call sites (modulo the problematic pv_irq_ops.save_fl case)?
>Because
>> >>> for
>> >>> these two assembly implementations, it's not, but maybe there's
>> >>> something
>> >>> missing in my implementation?
>> >
>> >> Yes, that's what "extern inline" means. Maybe it needs a must
>inline
>> > annotation, but that's really messed up.
>> >
>
>> What about doing something like suggested here:
>> https://bugs.llvm.org/show_bug.cgi?id=37512#c17
>
>> This would keep the definition in C and make it easier for compilers
>> to inline.
>
>The GCC docs for __attribute__((naked)) seem to imply this is a machine
>specific constraint (of which x86 is not listed):
>https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html
>
>But let's try with source:
>https://godbolt.org/g/aJ4gZB
>
>Clang errors:
><source>:3:3: error: non-ASM statement in naked function is not
>supported
> unsigned long flags;
> ^
>
>Is it valid to use assembly to place the results in %rax and mark the c
>function somehow?
>
>gcc doesn't support this attribute until 4.9 (but we can add a feature
>test
>for attributes with gcc (unlike builtins)), but warns that:
>
>warning: ‘naked’ attribute directive ignored [-Wattributes]
>
>gcc 8.1 and trunk inserts a `ud2` instruction (what?!) (let me see if I
>can
>repro outside of godbolt, and will file a bug report).

No, we found that the paravirt code can do the wrong thing for a C implementation.

Nick, could you forward the list of problems so we all have it?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-25 18:00:56

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 10:49 AM Nick Desaulniers <[email protected]>
wrote:
> gcc 8.1 and trunk inserts a `ud2` instruction (what?!) (let me see if I
can
> repro outside of godbolt, and will file a bug report).

Filed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927
--
Thanks,
~Nick Desaulniers

2018-05-25 20:37:08

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 10:56 AM <[email protected]> wrote:
> You need the extern inline in the .h file and the out-of-line .S file
both.

But the out-of-line .S file looks like:

...
10 ENTRY(native_save_fl)

11 pushf

12 pop %_ASM_AX

13 ret

14 ENDPROC(native_save_fl)

15 EXPORT_SYMBOL(native_save_fl)
...

I don't see how you can specify to the linker from assembly source that
this function should be treated as `extern inline`?

I assume you don't literally mean the C keywords `extern inline` but
whatever the equivalent incantation is needed in terms of assembler
directives (which I also don't know).

I'm beginning to think that what you'd like to see cannot be expressed (at
least via `extern inline`).
--
Thanks,
~Nick Desaulniers

2018-05-25 21:07:03

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On 05/25/18 13:36, Nick Desaulniers wrote:
> On Fri, May 25, 2018 at 10:56 AM <[email protected]> wrote:
>> You need the extern inline in the .h file and the out-of-line .S file
> both.
>
> But the out-of-line .S file looks like:
>
> ...
> 10 ENTRY(native_save_fl)
>
> 11 pushf
>
> 12 pop %_ASM_AX
>
> 13 ret
>
> 14 ENDPROC(native_save_fl)
>
> 15 EXPORT_SYMBOL(native_save_fl)
> ...
>
> I don't see how you can specify to the linker from assembly source that
> this function should be treated as `extern inline`?
>

"extern inline" is a C directive. In the header file you should provide
the inlinable implementation (which is already there.) It means that "if
you don't want to inline this there is an external implementation
available."


> I assume you don't literally mean the C keywords `extern inline` but
> whatever the equivalent incantation is needed in terms of assembler
> directives (which I also don't know).
>
> I'm beginning to think that what you'd like to see cannot be expressed (at
> least via `extern inline`).
>


2018-05-25 22:38:51

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 2:06 PM H. Peter Anvin <[email protected]> wrote:
> On 05/25/18 13:36, Nick Desaulniers wrote:
> > On Fri, May 25, 2018 at 10:56 AM <[email protected]> wrote:
> >> You need the extern inline in the .h file and the out-of-line .S file
> > I don't see how you can specify to the linker from assembly source that
> > this function should be treated as `extern inline`?
> "extern inline" is a C directive. In the header file you should provide
> the inlinable implementation (which is already there.) It means that "if
> you don't want to inline this there is an external implementation
> available."

oh you put `extern inline` on the definition, not the declaration?! What?!

http://m68hc11.serveftp.org/inline-1.php

in fact documents that trick. Wow, I've never seen that before. Seems
like there's not too many instances in the kernel.

This seems to inline native_save_fl() properly into the call sites, but the
boot code now fails to link due to multiple definitions when trying link
the objects from arch/x86/boot/compressed/ into vmlinux.

When disassembling the the arch/x86/boot/compressed/ objects, I can see
they're pulling in the `extern inline` c versions, but still outlining
them! (gcc and clang).

That seems to contradict the statement from the above link:
"In no case is the function compiled on its own, not even if you refer to
its address explicitly"

This is kind of funny; first we were wondering why native_save_fl was
getting outlined as a static inline function, which Alistair tracked down
to the incorrect use as a function pointer. Now I'm playing why is
native_save_fl getting outlined as an extern inline function.

If I move the .S file to be in arch/x86/kernel/boot/compressed, the boot
code now links but then paravirt.o fails to link. Referring to the .S from
both Makefiles results in multiple definitions that the linker doesn't
like. (and the boot code still containing outlines).

I don't understand how `extern inline` signals to the linker that "this
version should be taken if you find no others". From what I can tell with
`readelf -a` and `objdump -d`, a function marked `extern inline` vs not
look similar.

Then again, arch/x86/boot/compressed/Makefile completely resets
KBUILD_CFLAGS and KBUILD_AFLAGS, which may be problematic (if there's some
kind of command line flag that affects `extern inline` linkage).
--
Thanks,
~Nick Desaulniers

2018-05-31 06:52:11

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Fri, May 25, 2018 at 3:38 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Fri, May 25, 2018 at 2:06 PM H. Peter Anvin <[email protected]> wrote:
> > On 05/25/18 13:36, Nick Desaulniers wrote:
> > > On Fri, May 25, 2018 at 10:56 AM <[email protected]> wrote:
> > >> You need the extern inline in the .h file and the out-of-line .S file
> > > I don't see how you can specify to the linker from assembly source that
> > > this function should be treated as `extern inline`?
> > "extern inline" is a C directive. In the header file you should provide
> > the inlinable implementation (which is already there.) It means that "if
> > you don't want to inline this there is an external implementation
> > available."
>
> oh you put `extern inline` on the definition, not the declaration?! What?!
>
> http://m68hc11.serveftp.org/inline-1.php
>
> in fact documents that trick. Wow, I've never seen that before. Seems
> like there's not too many instances in the kernel.
>
> This seems to inline native_save_fl() properly into the call sites, but the
> boot code now fails to link due to multiple definitions when trying link
> the objects from arch/x86/boot/compressed/ into vmlinux.
>
> When disassembling the the arch/x86/boot/compressed/ objects, I can see
> they're pulling in the `extern inline` c versions, but still outlining
> them! (gcc and clang).
>
> That seems to contradict the statement from the above link:
> "In no case is the function compiled on its own, not even if you refer to
> its address explicitly"
>
> This is kind of funny; first we were wondering why native_save_fl was
> getting outlined as a static inline function, which Alistair tracked down
> to the incorrect use as a function pointer. Now I'm playing why is
> native_save_fl getting outlined as an extern inline function.
>
> If I move the .S file to be in arch/x86/kernel/boot/compressed, the boot
> code now links but then paravirt.o fails to link. Referring to the .S from
> both Makefiles results in multiple definitions that the linker doesn't
> like. (and the boot code still containing outlines).
>
> I don't understand how `extern inline` signals to the linker that "this
> version should be taken if you find no others". From what I can tell with
> `readelf -a` and `objdump -d`, a function marked `extern inline` vs not
> look similar.
>
> Then again, arch/x86/boot/compressed/Makefile completely resets
> KBUILD_CFLAGS and KBUILD_AFLAGS, which may be problematic (if there's some
> kind of command line flag that affects `extern inline` linkage).

Aha! This (wiping away CFLAGS) was it! The rules for `extern inline`
for gnu89 and c99 are exactly the opposite! [0][1]

c99: "a function defined extern inline will always, emit an externally
visible function."

but then

gnu89: "gnu89 semantics of inline and extern inline are essentially
the exact opposite of those in C99"

This is elaborated more concisely in [2].

If I add `-std=gnu89` to the KBUILD_CFLAGS for compilation units that
are missing that flag but include irqflags.h, I stop getting multiple
definition linkage errors and link the expected kernel image!

This is something to seriously consider for whoever might ever want to
change the kernel's C standard: that you'll end up flipping the
semantics of `extern inline`. Sounds like `-fgnu89-inline` compiler
flag or `gnu_inline` attribute could be used to correct such a
situation.

+ KBUILD maintainers and mailing list.

Completely overwriting the KBUILD_CFLAGS from sub directory Makefiles
(using the := operator) has been making me feel uneasy, since we very
carefully try to set the necessary flags for Clang in the top level
Makefile. In this case, the C standard is different for at least 2
subdirectories, which has implications for at least the above case
with the linkage of `extern inline` functions.

Will do further testing and cut a patch set tomorrow.

[0] https://en.wikipedia.org/wiki/Inline_function#C99
[1] https://en.wikipedia.org/wiki/Inline_function#gnu89
[2] http://blahg.josefsipek.net/?p=529
--
Thanks,
~Nick Desaulniers

2018-06-01 17:14:52

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [clang] stack protector and f1f029c7bf

On Wed, May 30, 2018 at 11:50 PM Nick Desaulniers
<[email protected]> wrote:
> Will do further testing and cut a patch set tomorrow.

https://lkml.org/lkml/2018/6/1/679
(sorry, should have re-cc-ed everyone from this thread, will try to
re-add when there's feedback)
--
Thanks,
~Nick Desaulniers