Building with UBSAN_DIV_ZERO with clang produces numerous fallthrough
warnings from objtool.
In the case of uncheck division, UBSAN_DIV_ZERO may introduce new
control flow to check for division by zero. Because the result of the
division is undefined, LLVM may optimize the control flow such that
after the call to __ubsan_handle_divrem_overflow doesn't matter. If
panic_on_warn was set, __ubsan_handle_divrem_overflow would panic. The
problem is is that panic_on_warn is run time configurable. If it's
disabled, then we cannot guarantee that we will be able to recover
safely. Disable this config for clang until we can come up with a
solution in LLVM.
Link: https://github.com/ClangBuiltLinux/linux/issues/1657
Link: https://github.com/llvm/llvm-project/issues/56289
Link: https://lore.kernel.org/lkml/CAHk-=wj1qhf7y3VNACEexyp5EbkNpdcu_542k-xZpzmYLOjiCg@mail.gmail.com/
Reported-by: Sudip Mukherjee <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
Linus,
I still think we should add explicit checks to gaurd against divide by
zero.
lib/Kconfig.ubsan | 3 +++
1 file changed, 3 insertions(+)
diff --git a/lib/Kconfig.ubsan b/lib/Kconfig.ubsan
index a9f7eb047768..fd15230a703b 100644
--- a/lib/Kconfig.ubsan
+++ b/lib/Kconfig.ubsan
@@ -84,6 +84,9 @@ config UBSAN_SHIFT
config UBSAN_DIV_ZERO
bool "Perform checking for integer divide-by-zero"
depends on $(cc-option,-fsanitize=integer-divide-by-zero)
+ # https://github.com/ClangBuiltLinux/linux/issues/1657
+ # https://github.com/llvm/llvm-project/issues/56289
+ depends on !CC_IS_CLANG
help
This option enables -fsanitize=integer-divide-by-zero which checks
for integer division by zero. This is effectively redundant with the
--
2.37.0.170.g444d1eabd0-goog
On Thu, Jul 14, 2022 at 01:56:43PM -0700, Nick Desaulniers wrote:
> Building with UBSAN_DIV_ZERO with clang produces numerous fallthrough
> warnings from objtool.
>
> In the case of uncheck division, UBSAN_DIV_ZERO may introduce new
> control flow to check for division by zero. Because the result of the
> division is undefined, LLVM may optimize the control flow such that
> after the call to __ubsan_handle_divrem_overflow doesn't matter. If
> panic_on_warn was set, __ubsan_handle_divrem_overflow would panic. The
> problem is is that panic_on_warn is run time configurable. If it's
> disabled, then we cannot guarantee that we will be able to recover
> safely. Disable this config for clang until we can come up with a
> solution in LLVM.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1657
> Link: https://github.com/llvm/llvm-project/issues/56289
> Link: https://lore.kernel.org/lkml/CAHk-=wj1qhf7y3VNACEexyp5EbkNpdcu_542k-xZpzmYLOjiCg@mail.gmail.com/
> Reported-by: Sudip Mukherjee <[email protected]>
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Nick Desaulniers <[email protected]>
Acked-by: Nathan Chancellor <[email protected]>
> ---
> Linus,
> I still think we should add explicit checks to gaurd against divide by
> zero.
>
> lib/Kconfig.ubsan | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/lib/Kconfig.ubsan b/lib/Kconfig.ubsan
> index a9f7eb047768..fd15230a703b 100644
> --- a/lib/Kconfig.ubsan
> +++ b/lib/Kconfig.ubsan
> @@ -84,6 +84,9 @@ config UBSAN_SHIFT
> config UBSAN_DIV_ZERO
> bool "Perform checking for integer divide-by-zero"
> depends on $(cc-option,-fsanitize=integer-divide-by-zero)
> + # https://github.com/ClangBuiltLinux/linux/issues/1657
> + # https://github.com/llvm/llvm-project/issues/56289
> + depends on !CC_IS_CLANG
> help
> This option enables -fsanitize=integer-divide-by-zero which checks
> for integer division by zero. This is effectively redundant with the
> --
> 2.37.0.170.g444d1eabd0-goog
>
>
On Thu, Jul 14, 2022 at 1:56 PM Nick Desaulniers
<[email protected]> wrote:
>
> Linus,
> I still think we should add explicit checks to gaurd against divide by
> zero.
I mean, that's what UBSAN_DIV_ZERO is supposed to do.
The fact that clang then messes it up, and turns "I found undefined
behavior" into "I just crashed the machine" is why it needs to be
disabled.
Please conmvince clang people to fix the sanitizer.
san·i·tize
/ˈsanəˌtīz/
verb
make clean and hygienic; disinfect.
note how "sanitize" is meant to clean things of undefined behavior.
The way you do that is by warning, and giving it defined behavior. It
really is that simple.
Clang seems to warn and then just turn it into ANOTHER - and much
worse - undefined behavior.
In other words, clang doesn't "sanitize" anything at all. It just
moves the mess around and makes it worse.
Linus
On Thu, Jul 14, 2022 at 2:25 PM Linus Torvalds
<[email protected]> wrote:
>
> The way you do that is by warning, and giving it defined behavior. It
> really is that simple.
int do_div (int dividend, int divisor) {
return dividend / divisor;
}
has UB should divisor ever be zero, not much different from:
int deref (int *foo) {
return *foo;
}
when foo is NULL. Should the two of those be:
int do_div (int dividend, int divisor) {
if (!divisor)
return -EOOPS;
return dividend / divisor;
}
int deref (int *foo) {
if (!foo)
return -EOOPS;
return *foo;
}
or keep the unchecked versions and wait for a report from a user or
bot with a sanitizer splat?
I get the sanitizer doesn't work as advertised. I _agree_ with you.
Hence this patch (which I _think_ works towards your point, shouldn't
you Ack it?). I feel like you're talking past me without addressing
my point, let me try rephrasing it:
I _additionally_ think we should be adding more checks to guard
against division by zero to the kernel sources. Or are we happy to
wait and find out if divisors are ever zero and fix them as they pop
up/become problematic?
--
Thanks,
~Nick Desaulniers
On Thu, Jul 14, 2022 at 2:38 PM Nick Desaulniers
<[email protected]> wrote:
>
> int do_div (int dividend, int divisor) {
> return dividend / divisor;
> }
>
> has UB should divisor ever be zero, not much different from:
So?
What about the million other '/' in the kernel?
Adding one check to 'do_div()' is just stupid. It's like using a
bottle-cap as an umbrella.
Nick, that's the whole _point_ of having compiler support for things
like this - automation. Because doing them manually one at a time is
just completely broken and stupid.
Linus
On Thu, Jul 14, 2022 at 1:56 PM Nick Desaulniers
<[email protected]> wrote:
>
> Building with UBSAN_DIV_ZERO with clang produces numerous fallthrough
> warnings from objtool.
Ok, with this applied, things are better.
There are still the "__ubsan_handle_load_invalid_value() with UACCESS
enabled" messages, but those are misfeatures of the kvm cmpxchg
implementation.
I'm not entirely sure why the clang build warns but gcc doesn't, but I
*think* it's because clang is just being silly. It *looks* like it
checks that a "bool" has a value range of 0/1, and will complain if
not.
And the reason I say that's silly is that if I read it correctly, then
that value has literally been generated by clang itself, using "setz"
instruction.
It's the __try_cmpxchg_user_asm() macro, and with clang-14 I have it's
that CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT case, and the C code uses
inline asm and does
asm_volatile_goto("\n" \
"1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
_ASM_EXTABLE_UA(1b, %l[label]) \
: CC_OUT(z) (success), \
where that CC_OUT() in this case turns into
# define CC_OUT(c) "=@cc" #c
and clang generates this code for it:
7d01e: f0 48 0f b1 4d 00 lock cmpxchg %rcx,0x0(%rbp)
7d024: 49 89 c5 mov %rax,%r13
7d027: 0f 94 c0 sete %al
7d02a: 41 88 c6 mov %al,%r14b
7d02d: bf 02 00 00 00 mov $0x2,%edi
7d032: 44 89 f6 mov %r14d,%esi
7d035: e8 00 00 00 00 call __sanitizer_cov_trace_const_cmp1
7d03a: 41 80 fe 01 cmp $0x1,%r14b
7d03e: 0f 87 af 01 00 00 ja 7d1f3
<emulator_cmpxchg_emulated+0x6b3>
where that last "ja 7d1f3" is the branch to the code that then
calls __ubsan_handle_load_invalid_value.
But look at that code: it's literally
sete %al
mov %al,%r14b
cmp $0x1,%r14b
where clang has generated that "sete itself, and then it verifies that
the result is "<= 1".
IOW, clang seems to be literally just checking that the "sete"
instruction works right.
That's silly.
Maybe I'm misreading this, but I think the reason the clang build
complains, but the gcc build does not, is simply because gcc isn't
doing crazy checks of how the CPU works.
Some mis-feature of the "asm with flag output" code, where clang
doesn't understand that it generated that code itself, and the "setcc"
instruction always returns 0/1?
The old issue with "memcpy/memset() leaves .noinstr.text section"
because clang has generated out-of-line functions for trivial copies
also remains, but whatever.
Linus
On Thu, Jul 14, 2022 at 4:15 PM Linus Torvalds
<[email protected]> wrote:
>
> There are still the "__ubsan_handle_load_invalid_value() with UACCESS
> enabled" messages, but those are misfeatures of the kvm cmpxchg
> implementation.
Ok, so it really is independent of the non-optimal kvm cmpxchg implementation.
The kvm code is fine - yes, it runs a bit too much code under the
CLAC/STAC pair, but that "too much code" really is just a few local
variable assignments.
So any reasonable compiler will keep them in registers, and even if
that's not the case, it would just be a stack spill or reload with AC
set, which is not really a problem.
But calling out to external functions within that SMAP region is
invalid, and objdump warns about it. Because we really want to
minimize the area where user accesses are ok, and it's a "don't do
this, you now lost the SMAP protection".
ANYWAY.
The reason the clang build warns - but the gcc build does not - is
simply that clang code generation is just nasty.
I decided to make a non-kernel test-case so that clang people can look
at it without having to worry about any kernel code issues or the
details of the SMAP rules, and it really shows how clang generates
horribly pointless and wrong code.
I'm not sure what the right thing to do is to get this sorted out, but
I created an "issue" on github for this:
https://github.com/llvm/llvm-project/issues/56568
in the hope that this can get fixed. Because it's very clearly a clang
misfeature, where clang basically generates insane code that violates
the rules we try to enforce with objdump.
Nick / Nathan / clang-built-linux people - if there are other better
ways than that github issue thing to make people aware of this, that
would be lovely, and please forward that issue to the right people.
Linus
On Sat, Jul 16, 2022 at 10:34 AM Linus Torvalds
<[email protected]> wrote:
>
> I decided to make a non-kernel test-case so that clang people can look
> at it without having to worry about any kernel code issues or the
> details of the SMAP rules, and it really shows how clang generates
> horribly pointless and wrong code.
Perfect, thank you. I was having a hard time reducing this one.
>
> I'm not sure what the right thing to do is to get this sorted out, but
> I created an "issue" on github for this:
>
> https://github.com/llvm/llvm-project/issues/56568
>
> in the hope that this can get fixed. Because it's very clearly a clang
> misfeature, where clang basically generates insane code that violates
> the rules we try to enforce with objdump.
>
> Nick / Nathan / clang-built-linux people - if there are other better
> ways than that github issue thing to make people aware of this, that
> would be lovely, and please forward that issue to the right people.
Perfect, thank you.
LLVM moved from bugzilla to github for issue tracking last year. It's
best to file issues there, or in our issue tracker (also github) which
is more specific to issues pertaining to the kernel:
https://github.com/ClangBuiltLinux/linux/issues
You can cc Nathan or myself in bugreports via mentioning @nathanchance
and @nickdesaulniers explicitly, or just forward us the link and we
can subscribe.
Putting out some other fires/regressions at the moment; it looks like
Yuanfang has already started chewing on your bug report. If they get
stuck, I'll dive in once these regressions are sorted. Should be an
optimization opportunity regardless of UBSAN when we know only certain
bits are set.
--
Thanks,
~Nick Desaulniers