Hi folks,
so I've been looking at a recent objtool noinstr warning from clang
builds:
vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
The issue is that clang generates a memcpy() call when a struct copy
happens:
if (regs != eregs)
*regs = *eregs;
see below for asm output.
While gcc does simply generate an actual "rep; movsq".
So, how hard would it be to make clang do that too pls?
Oh, and another thing while we're comparing asm: I'd love for clang's
-fverbose-asm to issue interleaved C source lines too, like gcc does.
That's it - no pink pony - just "normal" wishes. :-)
GCC:
====
sync_regs:
.LASANPC4246:
# arch/x86/kernel/traps.c:770: {
movq %rdi, %rsi # tmp91, eregs
# arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
#APP
# 771 "arch/x86/kernel/traps.c" 1
movq %gs:cpu_current_top_of_stack(%rip), %rax # cpu_current_top_of_stack, pfo_val__
# 0 "" 2
# arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
#NO_APP
subq $168, %rax #, <retval>
# arch/x86/kernel/traps.c:772: if (regs != eregs)
cmpq %rdi, %rax # eregs, <retval>
je .L387 #,
# arch/x86/kernel/traps.c:773: *regs = *eregs;
movl $21, %ecx #, tmp89
movq %rax, %rdi # <retval>, <retval>
rep movsq
.L387:
# arch/x86/kernel/traps.c:775: }
ret
CLANG:
======
.section .noinstr.text,"ax",@progbits
.globl sync_regs # -- Begin function sync_regs
.p2align 6, 0x90
.type sync_regs,@function
sync_regs: # @sync_regs
# %bb.0: # %entry
pushq %rbx
#APP
movq %gs:cpu_current_top_of_stack(%rip), %rbx
#NO_APP
addq $-168, %rbx
cmpq %rdi, %rbx
je .LBB19_2
# %bb.1: # %if.then
movq %rdi, %rsi
movl $168, %edx
movq %rbx, %rdi
callq memcpy@PLT
.LBB19_2: # %if.end
movq %rbx, %rax
popq %rbx
retq
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <[email protected]> wrote:
>
> Hi folks,
>
> so I've been looking at a recent objtool noinstr warning from clang
> builds:
>
> vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
>
> The issue is that clang generates a memcpy() call when a struct copy
> happens:
>
> if (regs != eregs)
> *regs = *eregs;
Specifically, this is copying one struct pt_regs to another. It looks
like the sizeof struct pt_regs is just large enough to have clang emit
the libcall.
https://godbolt.org/z/scx6aa8jq
Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
be raised so that we continue to inline the memcpy? *shrug*
Though, looking at the compiled memcpy (`llvm-objdump -D
--disassemble-symbols=memcpy vmlinux`), maybe we *should* try harder.
Filed
https://github.com/llvm/llvm-project/issues/54535.
>
> see below for asm output.
>
> While gcc does simply generate an actual "rep; movsq".
>
> So, how hard would it be to make clang do that too pls?
As Mark said in the sibling reply; I don't know of general ways to
inhibit libcall optimizations on the level you're looking for, short
of heavy handy methods of disabling optimizations entirely. There's
games that can be played with -fno-builtin-*, but they're not super
portable, and I think there's a handful of *blessed* functions that
must exist in any env, freestanding or not: memcpy, memmove, memset,
and memcmp for which you cannot yet express "these do not exist."
>
> Oh, and another thing while we're comparing asm: I'd love for clang's
> -fverbose-asm to issue interleaved C source lines too, like gcc does.
>
> That's it - no pink pony - just "normal" wishes. :-)
Looks like someone had started work on this in 2017, but it stalled out:
https://github.com/llvm/llvm-project/issues/17839#issuecomment-980923994
Linus has asked for this in the past, too. So it's something we'll add
to the TODO list to revisit. I highly suspect that clang has
discarded the AST by the time LLVM gets to asm emission, but I could
be wrong. https://github.com/llvm/llvm-project/issues/17839#issuecomment-1077933948
>
> GCC:
> ====
>
> sync_regs:
> .LASANPC4246:
> # arch/x86/kernel/traps.c:770: {
> movq %rdi, %rsi # tmp91, eregs
> # arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
> #APP
> # 771 "arch/x86/kernel/traps.c" 1
> movq %gs:cpu_current_top_of_stack(%rip), %rax # cpu_current_top_of_stack, pfo_val__
> # 0 "" 2
> # arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
> #NO_APP
> subq $168, %rax #, <retval>
> # arch/x86/kernel/traps.c:772: if (regs != eregs)
> cmpq %rdi, %rax # eregs, <retval>
> je .L387 #,
> # arch/x86/kernel/traps.c:773: *regs = *eregs;
> movl $21, %ecx #, tmp89
> movq %rax, %rdi # <retval>, <retval>
> rep movsq
> .L387:
> # arch/x86/kernel/traps.c:775: }
> ret
>
> CLANG:
> ======
>
> .section .noinstr.text,"ax",@progbits
> .globl sync_regs # -- Begin function sync_regs
> .p2align 6, 0x90
> .type sync_regs,@function
> sync_regs: # @sync_regs
> # %bb.0: # %entry
> pushq %rbx
> #APP
> movq %gs:cpu_current_top_of_stack(%rip), %rbx
> #NO_APP
> addq $-168, %rbx
> cmpq %rdi, %rbx
> je .LBB19_2
> # %bb.1: # %if.then
> movq %rdi, %rsi
> movl $168, %edx
> movq %rbx, %rdi
> callq memcpy@PLT
> .LBB19_2: # %if.end
> movq %rbx, %rax
> popq %rbx
> retq
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
--
Thanks,
~Nick Desaulniers
On Thu, Mar 24, 2022 at 12:19:19PM +0100, Borislav Petkov wrote:
> Hi folks,
Hi Boris,
> so I've been looking at a recent objtool noinstr warning from clang
> builds:
>
> vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
>
> The issue is that clang generates a memcpy() call when a struct copy
> happens:
>
> if (regs != eregs)
> *regs = *eregs;
>
> see below for asm output.
>
> While gcc does simply generate an actual "rep; movsq".
I think there's a more general soundness problem with noinstr here,
because with the options we pass today it's entirely legitimate for the
compiler to generate out-of-line calls to a number of support functions
(e.g. memcpy, but also memset and others), and we either need to inhibit
out-of-line calls to *any* of those, or ensure the out-of-line copies
used are never instrumented.
I'm not entirely sure how to prevent this on arm64 short of some
whole-compilation-unit shennanigans -- we don't have short sequence like
"rep movsq" that can be easily inlined, and we explicitly instrument
mem*() when certain KASAN options are selected.
I think we need more compiler help to make noinstr sound generally,
and/or may need to rethink the way we use noinstr.
Thanks,
Mark.
> So, how hard would it be to make clang do that too pls?
>
> Oh, and another thing while we're comparing asm: I'd love for clang's
> -fverbose-asm to issue interleaved C source lines too, like gcc does.
>
> That's it - no pink pony - just "normal" wishes. :-)
>
> GCC:
> ====
>
> sync_regs:
> .LASANPC4246:
> # arch/x86/kernel/traps.c:770: {
> movq %rdi, %rsi # tmp91, eregs
> # arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
> #APP
> # 771 "arch/x86/kernel/traps.c" 1
> movq %gs:cpu_current_top_of_stack(%rip), %rax # cpu_current_top_of_stack, pfo_val__
> # 0 "" 2
> # arch/x86/kernel/traps.c:771: struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
> #NO_APP
> subq $168, %rax #, <retval>
> # arch/x86/kernel/traps.c:772: if (regs != eregs)
> cmpq %rdi, %rax # eregs, <retval>
> je .L387 #,
> # arch/x86/kernel/traps.c:773: *regs = *eregs;
> movl $21, %ecx #, tmp89
> movq %rax, %rdi # <retval>, <retval>
> rep movsq
> .L387:
> # arch/x86/kernel/traps.c:775: }
> ret
>
> CLANG:
> ======
>
> .section .noinstr.text,"ax",@progbits
> .globl sync_regs # -- Begin function sync_regs
> .p2align 6, 0x90
> .type sync_regs,@function
> sync_regs: # @sync_regs
> # %bb.0: # %entry
> pushq %rbx
> #APP
> movq %gs:cpu_current_top_of_stack(%rip), %rbx
> #NO_APP
> addq $-168, %rbx
> cmpq %rdi, %rbx
> je .LBB19_2
> # %bb.1: # %if.then
> movq %rdi, %rsi
> movl $168, %edx
> movq %rbx, %rdi
> callq memcpy@PLT
> .LBB19_2: # %if.end
> movq %rbx, %rax
> popq %rbx
> retq
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
From: Nick Desaulniers
> Sent: 24 March 2022 18:44
>
> On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <[email protected]> wrote:
> >
> > Hi folks,
> >
> > so I've been looking at a recent objtool noinstr warning from clang
> > builds:
> >
> > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> >
> > The issue is that clang generates a memcpy() call when a struct copy
> > happens:
> >
> > if (regs != eregs)
> > *regs = *eregs;
>
> Specifically, this is copying one struct pt_regs to another. It looks
> like the sizeof struct pt_regs is just large enough to have clang emit
> the libcall.
> https://godbolt.org/z/scx6aa8jq
> Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
> be raised so that we continue to inline the memcpy? *shrug*
I've just looked at some instruction timings.
For 32 byte aligned copies it actually looks like 'rep movs'
(probably movsq) is actually reasonable for large buffers
on all mainstream Intel cpu since sandy brige.
On the more recent ones it runs at 32 bytes/clock.
It may not be that bad for shorter and non 32 byte aligned
buffers as well.
Certainly I can't see a reason for calling memcpy() for
large copies!
At least no one uses P4 any more, setup latency was something
like 186 clocks!
I was thinking that 'rep movsq' only made any sense with -Os.
But it seems to be better than I thought.
(I might even have measured it running 'fast' on Ivy bridge.)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Mar 24, 2022 at 11:43:46AM -0700, Nick Desaulniers wrote:
> On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <[email protected]> wrote:
> >
> > Hi folks,
> >
> > so I've been looking at a recent objtool noinstr warning from clang
> > builds:
> >
> > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> >
> > The issue is that clang generates a memcpy() call when a struct copy
> > happens:
> >
> > if (regs != eregs)
> > *regs = *eregs;
>
> Specifically, this is copying one struct pt_regs to another. It looks
> like the sizeof struct pt_regs is just large enough to have clang emit
> the libcall.
> https://godbolt.org/z/scx6aa8jq
> Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
> be raised so that we continue to inline the memcpy? *shrug*
>
> Though, looking at the compiled memcpy (`llvm-objdump -D
> --disassemble-symbols=memcpy vmlinux`), maybe we *should* try harder.
> Filed
> https://github.com/llvm/llvm-project/issues/54535.
>
> > see below for asm output.
> >
> > While gcc does simply generate an actual "rep; movsq".
> >
> > So, how hard would it be to make clang do that too pls?
>
> As Mark said in the sibling reply; I don't know of general ways to
> inhibit libcall optimizations on the level you're looking for, short
> of heavy handy methods of disabling optimizations entirely. There's
> games that can be played with -fno-builtin-*, but they're not super
> portable, and I think there's a handful of *blessed* functions that
> must exist in any env, freestanding or not: memcpy, memmove, memset,
> and memcmp for which you cannot yet express "these do not exist."
Talking with Peter on IRC, I think there's an oversight on the compiler
side here w.r.t. the expectations around these blessed functions, since
either:
a) The compiler expects the out-of-line implementations of functions
ARE NOT instrumented by address-sanitizer.
If this is the case, then it's legitimate for the compiler to call
these functions anywhere, and we should NOT instrument the kernel
implementations of these. If the compiler wants those instrumented it
needs to add the instrumentation in the caller.
b) The compiler expects the out-of-line implementations of functions
ARE instrumented by address-sanitizer.
If this is the case, the compiler MUST NOT generate implicit calls to
these "blessed" functions from functions marked with:
__attribute__((no_sanitize_address)).
... or the compiler is violating the premise of that attribute.
AFAICT The two options for the compiler here are:
1) Always inline an uninstrumented form of the function in this case
2) Have distinct instrumented/uninstrumented out-of-line
implementations, and call the uninstrumented form in this case.
To see what clang and GCC do today, I hacked the following in:
| diff --git a/init/main.c b/init/main.c
| index 65fa2e41a9c0..30406c472b5d 100644
| --- a/init/main.c
| +++ b/init/main.c
| @@ -1637,3 +1637,31 @@ static noinline void __init kernel_init_freeable(void)
|
| integrity_load_keys();
| }
| +
| +void
| +test_implicit_memcpy(struct task_struct *dest,
| + const struct task_struct *src)
| +{
| + *dest = *src;
| +}
| +
| +void
| +test_explicit_memcpy(struct task_struct *dest,
| + const struct task_struct *src)
| +{
| + memcpy(dest, src, sizeof(*dest));
| +}
| +
| +void __no_sanitize_address
| +test_implicit_memcpy_nokasan(struct task_struct *dest,
| + const struct task_struct *src)
| +{
| + *dest = *src;
| +}
| +
| +void __no_sanitize_address
| +test_explicit_memcpy_nokasan(struct task_struct *dest,
| + const struct task_struct *src)
| +{
| + memcpy(dest, src, sizeof(*dest));
| +}
For arm64, GCC 11.1.0, KASAN_OUTLINE I see:
| <test_implicit_memcpy>:
| d503245f bti c
| d503233f paciasp
| a9be7bfd stp x29, x30, [sp, #-32]!
| 910003fd mov x29, sp
| a90153f3 stp x19, x20, [sp, #16]
| aa0103f3 mov x19, x1
| aa0003f4 mov x20, x0
| d281c001 mov x1, #0xe00 // #3584
| 940b9534 bl ffff8000082f9d90 <__asan_storeN>
| aa1303e0 mov x0, x19
| d281c001 mov x1, #0xe00 // #3584
| 940b951e bl ffff8000082f9d44 <__asan_loadN>
| aa1303e1 mov x1, x19
| aa1403e0 mov x0, x20
| d281c002 mov x2, #0xe00 // #3584
| 940b98c5 bl ffff8000082fabf0 <memcpy>
| a94153f3 ldp x19, x20, [sp, #16]
| a8c27bfd ldp x29, x30, [sp], #32
| d50323bf autiasp
| d65f03c0 ret
|
| <test_explicit_memcpy>:
| d503245f bti c
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| d281c002 mov x2, #0xe00 // #3584
| 910003fd mov x29, sp
| 940b98bb bl ffff8000082fabf0 <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
|
| <test_implicit_memcpy_nokasan>:
| d503245f bti c
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| d281c002 mov x2, #0xe00 // #3584
| 910003fd mov x29, sp
| 940b98b2 bl ffff8000082fabf0 <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
|
| <test_explicit_memcpy_nokasan>:
| d503245f bti c
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| d281c002 mov x2, #0xe00 // #3584
| 910003fd mov x29, sp
| 940b98a7 bl ffff8000082fabf0 <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
For x86_64, GCC 11.1.0, KASAN_OUTLINE I see:
| <test_implicit_memcpy>:
| 41 54 push %r12
| 49 89 fc mov %rdi,%r12
| 55 push %rbp
| 48 89 f5 mov %rsi,%rbp
| be 40 1c 00 00 mov $0x1c40,%esi
| e8 0d 9a 32 00 call ffffffff8132b0f0 <__asan_storeN>
| 48 89 ef mov %rbp,%rdi
| be 40 1c 00 00 mov $0x1c40,%esi
| e8 f0 99 32 00 call ffffffff8132b0e0 <__asan_loadN>
| 4c 89 e7 mov %r12,%rdi
| 48 89 ee mov %rbp,%rsi
| b9 88 03 00 00 mov $0x388,%ecx
| f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
| 5d pop %rbp
| 41 5c pop %r12
| c3 ret
|
| <test_explicit_memcpy>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e9 b6 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
|
|
| <test_implicit_memcpy_nokasan>:
| b9 88 03 00 00 mov $0x388,%ecx
| f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
| c3 ret
|
| <test_explicit_memcpy_nokasan>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e9 96 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
So from those examples it seems GCC falls into bucket (a), and assumes the
blessed functions ARE NOT instrumented.
We can make this noinstr-safe AND get instrumentation for the first two cases
by removing the instrumentation from the out-of-line copies (always using
noinstr asm implementations) and using ifdeffery to make the explicit calls
target as distinct kasan_instrumented_memcpy() or similar...
For arm64, clang 13.0.0, KASAN_OUTLINE I see:
| <test_implicit_memcpy>:
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| 910003fd mov x29, sp
| 5281c002 mov w2, #0xe00 // #3584
| 940c0f66 bl ffff8000083185fc <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
|
| <test_explicit_memcpy>:
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| 910003fd mov x29, sp
| 5281c002 mov w2, #0xe00 // #3584
| 940c0f5e bl ffff8000083185fc <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
|
| <test_implicit_memcpy_nokasan>:
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| 910003fd mov x29, sp
| 5281c002 mov w2, #0xe00 // #3584
| 940c0f56 bl ffff8000083185fc <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
|
| <test_explicit_memcpy_nokasan>:
| d503233f paciasp
| a9bf7bfd stp x29, x30, [sp, #-16]!
| 910003fd mov x29, sp
| 5281c002 mov w2, #0xe00 // #3584
| 940c0f4e bl ffff8000083185fc <memcpy>
| a8c17bfd ldp x29, x30, [sp], #16
| d50323bf autiasp
| d65f03c0 ret
For x86_64, clang 13.0.0, KASAN_OUTLINE I see:
| <test_implicit_memcpy>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e8 d6 94 36 00 call ffffffff8136a830 <memcpy>
| c3 ret
|
| <test_explicit_memcpy>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e8 c6 94 36 00 call ffffffff8136a830 <memcpy>
| c3 ret
|
|
| <test_implicit_memcpy_nokasan>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e9 b6 94 36 00 jmp ffffffff8136a830 <memcpy>
|
|
| <test_explicit_memcpy_nokasan>:
| ba 40 1c 00 00 mov $0x1c40,%edx
| e9 a6 94 36 00 jmp ffffffff8136a830 <memcpy>
... for which the first two suggests clang thinks the blessed functions *are*
instrumented, which means that generating calls to those in the latter two
cases is a bug.
We can make this noinstr-safe as with the GCC case, but we'll lose the
desirable instrumentation for the test_implicit_memcpy() case.
I think something has to change on the compiler side here (e.g. as per
options above), and we should align GCC and clang on the same
approach...
Thanks,
Mark.
Hi!
On Fri, Mar 25, 2022 at 03:13:36PM +0100, Peter Zijlstra wrote:
>
> +linux-toolchains
>
> On Fri, Mar 25, 2022 at 12:15:28PM +0000, Mark Rutland wrote:
> > On Thu, Mar 24, 2022 at 11:43:46AM -0700, Nick Desaulniers wrote:
> > > On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <[email protected]> wrote:
> > > > The issue is that clang generates a memcpy() call when a struct copy
> > > > happens:
> > > >
> > > > if (regs != eregs)
> > > > *regs = *eregs;
> > >
> > > Specifically, this is copying one struct pt_regs to another. It looks
> > > like the sizeof struct pt_regs is just large enough to have clang emit
> > > the libcall.
> > > https://godbolt.org/z/scx6aa8jq
> > > Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> > > the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
> > > be raised so that we continue to inline the memcpy? *shrug*
I win't talk for LLVM, of course... all of what I'll write here is
assuming LLVM copied the GCC requirement that memcpy is the standard
function, even if freestanding (and also memmove, memset, memcmp).
It is valid to replace any call to memcpy with some open-coded machine
code, or conversely, insert calls to memcpy wherever its semantics are
wanted.
> > > As Mark said in the sibling reply; I don't know of general ways to
> > > inhibit libcall optimizations on the level you're looking for, short
> > > of heavy handy methods of disabling optimizations entirely. There's
> > > games that can be played with -fno-builtin-*, but they're not super
> > > portable, and I think there's a handful of *blessed* functions that
> > > must exist in any env, freestanding or not: memcpy, memmove, memset,
> > > and memcmp for which you cannot yet express "these do not exist."
The easy, fool-proof, and correct way to prevent a function ending in
a sibling call is to simply not let it end in a call at all. The best
way I know to do that is insert
asm("");
right before the end of the function.
> > a) The compiler expects the out-of-line implementations of functions
> > ARE NOT instrumented by address-sanitizer.
> >
> > If this is the case, then it's legitimate for the compiler to call
> > these functions anywhere, and we should NOT instrument the kernel
> > implementations of these. If the compiler wants those instrumented it
> > needs to add the instrumentation in the caller.
The compiler isn't assuming anything about asan. The compiler generates
its code without any consideration of what asan will or will not do.
The burden of making things work is on asan.
It is legitimate to call (or not call!) memcpy anywhere. memcpy always
is __builtin_memcpy, which either or not does a function call.
> > AFAICT The two options for the compiler here are:
> >
> > 1) Always inline an uninstrumented form of the function in this case
> >
> > 2) Have distinct instrumented/uninstrumented out-of-line
> > implementations, and call the uninstrumented form in this case.
The compiler should not do anything differently here if it uses asan.
The address sanitizer and the memcpy function implementation perhaps
have to cooperate somehow, or asan needs more smarts. This needs to
happen no matter what, to support other things calling memcpy, say,
assembler code.
> > So from those examples it seems GCC falls into bucket (a), and assumes the
> > blessed functions ARE NOT instrumented.
No, it doesn't show GCC assumes anything. No testing of this kind can
show anything alike.
> > I think something has to change on the compiler side here (e.g. as per
> > options above), and we should align GCC and clang on the same
> > approach...
GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
standard-specified semantics). This means that it will have the same
semantics as __builtin_memcpy always, and either or not be a call to an
external function. It can also create calls to it out of thin air.
All of this has been true for thirty years, and it won't change today
either.
Segher
+linux-toolchains
On Fri, Mar 25, 2022 at 12:15:28PM +0000, Mark Rutland wrote:
> On Thu, Mar 24, 2022 at 11:43:46AM -0700, Nick Desaulniers wrote:
> > On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <[email protected]> wrote:
> > >
> > > Hi folks,
> > >
> > > so I've been looking at a recent objtool noinstr warning from clang
> > > builds:
> > >
> > > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> > >
> > > The issue is that clang generates a memcpy() call when a struct copy
> > > happens:
> > >
> > > if (regs != eregs)
> > > *regs = *eregs;
> >
> > Specifically, this is copying one struct pt_regs to another. It looks
> > like the sizeof struct pt_regs is just large enough to have clang emit
> > the libcall.
> > https://godbolt.org/z/scx6aa8jq
> > Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> > the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
> > be raised so that we continue to inline the memcpy? *shrug*
> >
> > Though, looking at the compiled memcpy (`llvm-objdump -D
> > --disassemble-symbols=memcpy vmlinux`), maybe we *should* try harder.
> > Filed
> > https://github.com/llvm/llvm-project/issues/54535.
> >
> > > see below for asm output.
> > >
> > > While gcc does simply generate an actual "rep; movsq".
> > >
> > > So, how hard would it be to make clang do that too pls?
> >
> > As Mark said in the sibling reply; I don't know of general ways to
> > inhibit libcall optimizations on the level you're looking for, short
> > of heavy handy methods of disabling optimizations entirely. There's
> > games that can be played with -fno-builtin-*, but they're not super
> > portable, and I think there's a handful of *blessed* functions that
> > must exist in any env, freestanding or not: memcpy, memmove, memset,
> > and memcmp for which you cannot yet express "these do not exist."
>
> Talking with Peter on IRC, I think there's an oversight on the compiler
> side here w.r.t. the expectations around these blessed functions, since
> either:
>
> a) The compiler expects the out-of-line implementations of functions
> ARE NOT instrumented by address-sanitizer.
>
> If this is the case, then it's legitimate for the compiler to call
> these functions anywhere, and we should NOT instrument the kernel
> implementations of these. If the compiler wants those instrumented it
> needs to add the instrumentation in the caller.
>
> b) The compiler expects the out-of-line implementations of functions
> ARE instrumented by address-sanitizer.
>
> If this is the case, the compiler MUST NOT generate implicit calls to
> these "blessed" functions from functions marked with:
>
> __attribute__((no_sanitize_address)).
>
> ... or the compiler is violating the premise of that attribute.
>
> AFAICT The two options for the compiler here are:
>
> 1) Always inline an uninstrumented form of the function in this case
>
> 2) Have distinct instrumented/uninstrumented out-of-line
> implementations, and call the uninstrumented form in this case.
>
> To see what clang and GCC do today, I hacked the following in:
>
> | diff --git a/init/main.c b/init/main.c
> | index 65fa2e41a9c0..30406c472b5d 100644
> | --- a/init/main.c
> | +++ b/init/main.c
> | @@ -1637,3 +1637,31 @@ static noinline void __init kernel_init_freeable(void)
> |
> | integrity_load_keys();
> | }
> | +
> | +void
> | +test_implicit_memcpy(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + *dest = *src;
> | +}
> | +
> | +void
> | +test_explicit_memcpy(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + memcpy(dest, src, sizeof(*dest));
> | +}
> | +
> | +void __no_sanitize_address
> | +test_implicit_memcpy_nokasan(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + *dest = *src;
> | +}
> | +
> | +void __no_sanitize_address
> | +test_explicit_memcpy_nokasan(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + memcpy(dest, src, sizeof(*dest));
> | +}
>
>
> For arm64, GCC 11.1.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | d503245f bti c
> | d503233f paciasp
> | a9be7bfd stp x29, x30, [sp, #-32]!
> | 910003fd mov x29, sp
> | a90153f3 stp x19, x20, [sp, #16]
> | aa0103f3 mov x19, x1
> | aa0003f4 mov x20, x0
> | d281c001 mov x1, #0xe00 // #3584
> | 940b9534 bl ffff8000082f9d90 <__asan_storeN>
> | aa1303e0 mov x0, x19
> | d281c001 mov x1, #0xe00 // #3584
> | 940b951e bl ffff8000082f9d44 <__asan_loadN>
> | aa1303e1 mov x1, x19
> | aa1403e0 mov x0, x20
> | d281c002 mov x2, #0xe00 // #3584
> | 940b98c5 bl ffff8000082fabf0 <memcpy>
> | a94153f3 ldp x19, x20, [sp, #16]
> | a8c27bfd ldp x29, x30, [sp], #32
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98bb bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_implicit_memcpy_nokasan>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98b2 bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98a7 bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
>
> For x86_64, GCC 11.1.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | 41 54 push %r12
> | 49 89 fc mov %rdi,%r12
> | 55 push %rbp
> | 48 89 f5 mov %rsi,%rbp
> | be 40 1c 00 00 mov $0x1c40,%esi
> | e8 0d 9a 32 00 call ffffffff8132b0f0 <__asan_storeN>
> | 48 89 ef mov %rbp,%rdi
> | be 40 1c 00 00 mov $0x1c40,%esi
> | e8 f0 99 32 00 call ffffffff8132b0e0 <__asan_loadN>
> | 4c 89 e7 mov %r12,%rdi
> | 48 89 ee mov %rbp,%rsi
> | b9 88 03 00 00 mov $0x388,%ecx
> | f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
> | 5d pop %rbp
> | 41 5c pop %r12
> | c3 ret
> |
> | <test_explicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 b6 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
> |
> |
> | <test_implicit_memcpy_nokasan>:
> | b9 88 03 00 00 mov $0x388,%ecx
> | f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
> | c3 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 96 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
>
> So from those examples it seems GCC falls into bucket (a), and assumes the
> blessed functions ARE NOT instrumented.
>
> We can make this noinstr-safe AND get instrumentation for the first two cases
> by removing the instrumentation from the out-of-line copies (always using
> noinstr asm implementations) and using ifdeffery to make the explicit calls
> target as distinct kasan_instrumented_memcpy() or similar...
>
>
> For arm64, clang 13.0.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f66 bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f5e bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_implicit_memcpy_nokasan>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f56 bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f4e bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
>
> For x86_64, clang 13.0.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e8 d6 94 36 00 call ffffffff8136a830 <memcpy>
> | c3 ret
> |
> | <test_explicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e8 c6 94 36 00 call ffffffff8136a830 <memcpy>
> | c3 ret
> |
> |
> | <test_implicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 b6 94 36 00 jmp ffffffff8136a830 <memcpy>
> |
> |
> | <test_explicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 a6 94 36 00 jmp ffffffff8136a830 <memcpy>
>
> ... for which the first two suggests clang thinks the blessed functions *are*
> instrumented, which means that generating calls to those in the latter two
> cases is a bug.
>
> We can make this noinstr-safe as with the GCC case, but we'll lose the
> desirable instrumentation for the test_implicit_memcpy() case.
>
> I think something has to change on the compiler side here (e.g. as per
> options above), and we should align GCC and clang on the same
> approach...
>
> Thanks,
> Mark.
On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> I think we're talking past each other here, so let me be more precise. :)
>
> The key thing is that when the user passes `-fsantize=address`, instrumentation
> is added by (a part of) the compiler. That instrumentation is added under some
> assumptions as to how the compiler as a whole will behave.
>
> With that in mind, the question is how is __attribute__((no_sanitize_address))
> intended to work when considering all the usual expectations around how the
> compiler can play with memcpy and similar?
no_sanitize_address or lack thereof is whether the current function
shouldn't be or should be ASan instrumented, not on whether other functions
it calls are instrumented or not. memcpy/memmove/memset are just a tiny bit
special case because the compiler can add them on their own even if they
aren't present in the source (there are a few others the compiler can
pattern match too) and various builtins can be on the other side expanded
inline instead of called, so one then gets the sanitization status of the
function in which it is used rather than whether the out of line
implementation of the function is sanitized.
If coexistence of instrumented and non-instrumented memcpy etc. was the goal
(it clearly wasn't), then the sanitizer libraries wouldn't be overriding
memcpy calls, but instead the compiler would redirect calls to memcpy in
instrumented functions to say __asan_memcpy which then would be
instrumented.
> Given the standard doesn't say *anything* about instrumentation, what does GCC
> *require* instrumentation-wise of the memcpy implementation? What happens *in
> practice* today?
>
> For example, is the userspace implementation of memcpy() instrumented for
> AddressSanitizer, or not?
It is, for all functions, whether compiled with -fsanitize=address or not,
if user app is linked with -fsanitize=address, libasan is linked in and
overrides the libc memcpy with its instrumented version.
Note that the default shadow memory value is 0 which means accessible,
so even when memcpy is instrumented, when called from non-instrumented code
often it will not do anything beyond normal behavior, as non-instrumented
functions don't poison the paddings in between variable (there even doesn't
have to be any) etc. But e.g. malloc/operator new etc. is also overridden,
so buffer overflows/underflows on memory allocated that way from
uninstrumented code using memcpy etc. will be diagnosed.
Jakub
On Mon, Mar 28, 2022 at 06:16:37PM +0200, Peter Zijlstra wrote:
> On Mon, Mar 28, 2022 at 10:59:57AM -0500, Segher Boessenkool wrote:
[ Context added back: ]
> > > My argument is: if the compiler is permitted to implictly and
> > > arbitrarily add calls to instrumented functions within a function marked
> > > with `no_sanitize_address`, the `no_sanitize_address` attribute is
> > > effectively useless, and therefore *something* needs to change.
> I do not see how that follows. Maybe that is obvious from how you look
> > at your use case, but it is not from the viewpoint of people who just
> > want to do sanitation.
>
> It's a substitution issue:
>
> either:
>
> memcpy() equals: "asan instrumentation" + "memcpy implementation"
>
> or:
>
> memcpy() equals: "memcpy implementation"
>
> It can not be both, since they're not equivalent.
Equivalent in what sense? ASAN (like any other sanitizer) does not
change the semantics of valid programs *at all*. And invalid programs
do not have semantics, of course.
> So if the compiler does the substitution, it needs some sense of
> equivalence. All we're asking is that it be consistent (my preference is
> for the latter).
If you want to never do sanitation, there is -fno-sanitize=all. But
that obviously is not what you want either.
> > So what is the goal here? Why do you need to
> > prevent sanitation on anything called from this function, at all cost?
>
> Kernel entry code might not have reached a point where instrumentation
> assumptions are valid yet. Consider calling into C before the kernel
> page-tables are swapped in. KASAN instrumentation would insta-explode
> simply because the environment it expects (the shadow data etc..) isn't
> there.
Ah. Something like the proposed global boolean flag would work fine for
that, afaics? Have all the asan implementation functions just return
until the "I am ready now" flag is set. This is trivial overhead,
compared to having asan at all!
Segher
On Mon, Mar 28, 2022 at 12:20:39PM +0200, Jakub Jelinek wrote:
> On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> > I think we're talking past each other here, so let me be more precise. :)
> >
> > The key thing is that when the user passes `-fsantize=address`, instrumentation
> > is added by (a part of) the compiler. That instrumentation is added under some
> > assumptions as to how the compiler as a whole will behave.
> >
> > With that in mind, the question is how is __attribute__((no_sanitize_address))
> > intended to work when considering all the usual expectations around how the
> > compiler can play with memcpy and similar?
>
> no_sanitize_address or lack thereof is whether the current function
> shouldn't be or should be ASan instrumented, not on whether other functions
> it calls are instrumented or not.
I understand this. :)
> memcpy/memmove/memset are just a tiny bit special case because the
> compiler can add them on their own even if they aren't present in the
> source (there are a few others the compiler can pattern match too) and
> various builtins can be on the other side expanded inline instead of
> called, so one then gets the sanitization status of the function in
> which it is used rather than whether the out of line implementation of
> the function is sanitized.
Yep, and this is the point I'm getting at. If the compiler *implicitly*
generates a call to one of these from a function which was marked with
__attribute__((no_sanitize_address)), then either:
1) This renders __attribute__((no_sanitize_address)) useless, since
instrumentation is being added in a way the code author cannot
reliably prevent. If so, I'd argue that this is a compiler bug, and
that the transformation of inserting the call is unsound.
2) There's an expectation that those out-of-line implementations are
*NOT* instrumented, and this is fine so long as those are not
instrumented.
I appreciate that this isn't necessarily something which was considered
when the feature was originally designed, and maybe this is fine for
userspace, but for kernel usage we need to be able to reliably prevent
instrumentation.
> If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> memcpy calls, but instead the compiler would redirect calls to memcpy in
> instrumented functions to say __asan_memcpy which then would be
> instrumented.
FWIW, I think that approach would be fine for kernel usage.
> > Given the standard doesn't say *anything* about instrumentation, what does GCC
> > *require* instrumentation-wise of the memcpy implementation? What happens *in
> > practice* today?
> >
> > For example, is the userspace implementation of memcpy() instrumented for
> > AddressSanitizer, or not?
>
> It is, for all functions, whether compiled with -fsanitize=address or not,
> if user app is linked with -fsanitize=address, libasan is linked in and
> overrides the libc memcpy with its instrumented version.
Thanks for confirming! Just to check, how does libasan prevent recursing
within itself on implicitly generated calls to memcpy and friends? Is
anything special done to build the libasan code, is that all asm, or
something else?
> Note that the default shadow memory value is 0 which means accessible,
> so even when memcpy is instrumented, when called from non-instrumented code
> often it will not do anything beyond normal behavior
That might be true in userspace, but is not the case within the kernel
as the shadow might not be mapped (and is one reason we need to inhibit
instrumentation in the first place).
For example, consider the code which *creates* the page tables for the
shadow memory. Until that's run, accessing the shadow will result in a
(fatal) fault, and while the compilation unit with that code is compiled
*without* `-fsantize=address`, it may call into the common
implementation of memcpy() and friends shared by all kernel code.
Thanks,
Mark.
> , as non-instrumented
> functions don't poison the paddings in between variable (there even doesn't
> have to be any) etc. But e.g. malloc/operator new etc. is also overridden,
> so buffer overflows/underflows on memory allocated that way from
> uninstrumented code using memcpy etc. will be diagnosed.
>
> Jakub
>
On Mon, Mar 28, 2022 at 09:22:20AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> > On Fri, Mar 25, 2022 at 10:12:38AM -0500, Segher Boessenkool wrote:
> > > The compiler isn't assuming anything about asan. The compiler generates
> > > its code without any consideration of what asan will or will not do.
> > > The burden of making things work is on asan.
> >
> > I think we're talking past each other here, so let me be more precise. :)
> >
> > The key thing is that when the user passes `-fsantize=address`, instrumentation
> > is added by (a part of) the compiler. That instrumentation is added under some
> > assumptions as to how the compiler as a whole will behave.
> >
> > With that in mind, the question is how is __attribute__((no_sanitize_address))
> > intended to work when considering all the usual expectations around how the
> > compiler can play with memcpy and similar?
>
> The attribute is about how the *current* function is instrumented, not
> about anything called by this function. This is clearly documented:
> 'no_sanitize_address'
> 'no_address_safety_analysis'
> The 'no_sanitize_address' attribute on functions is used to inform
> the compiler that it should not instrument memory accesses in the
> function when compiling with the '-fsanitize=address' option. The
> 'no_address_safety_analysis' is a deprecated alias of the
> 'no_sanitize_address' attribute, new code should use
> 'no_sanitize_address'.
I understand this, and I have read the documentation.
I'm not claiming any *individual* semantic is wrong, just that in
combination this doesn't provide what people *need* (even if it strictly
matches what is documented).
My argument is: if the compiler is permitted to implictly and
arbitrarily add calls to instrumented functions within a function marked
with `no_sanitize_address`, the `no_sanitize_address` attribute is
effectively useless, and therefore *something* needs to change.
> > > The compiler should not do anything differently here if it uses asan.
> > > The address sanitizer and the memcpy function implementation perhaps
> > > have to cooperate somehow, or asan needs more smarts. This needs to
> > > happen no matter what, to support other things calling memcpy, say,
> > > assembler code.
> >
> > I appreciate where you're coming from here, but I think you're approaching the
> > problem sideways.
>
> I am stating facts, I am not trying to solve your problem there. It
> seemed to me (and still does) that you didn't grasp all facts here.
Sorry, but I think you're reading my replies uncharitably if you think
that.
> > We need to define *what the semantics are* so that we can actually solve the
> > problem, e.g. is a memcpy implementation expected to be instrumented or not?
>
> That is up to the memcpy implementation itself, of course.
Sorry, but that doesn't make sense to me. When the compiler instruments
a function with AddressSanitizer, it must have *some* assumption about
whether memcpy() itself will be instrumented, such that it won't miss
some necessary instrumentation (and ideally, for performance reasons
doesn't have redundant instrumentation).
If the story is "memcpy may or may not be instrumented", then the only
way to guarantee necessary instrumentation is for the compiler to
*always* place it in the caller (unless forbidden by
`no_sanitize_address`). If that were the case, the kernel can make
things work by simply not instrumenting memcpy and friends.
IIUC today those assumptions are not documented. Is the behaviour
consistent?
> > > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > > standard-specified semantics). This means that it will have the same
> > > semantics as __builtin_memcpy always, and either or not be a call to an
> > > external function. It can also create calls to it out of thin air.
> >
> > I understand all of that.
>
> And still you want us to do something that is impossible under those
> existing constraints :-(
If that's truly impossible, that's very unfortunate.
FWIW, I can believe this would require tremendous effort to change, even
if it's not truly impossible.
If that is the case, it means that kernel side we have to never
instrument our implementation of memcpy and friends for correctness
reasons, which has the unfortunate property of losing coverage in the
cases we *would* like to use an instrumented memcpy.
> If you want the external memcpy called by modules A, B, C to not be
> instrumented, you have to link A, B, and C against an uninstrumented
> memcpy. This is something the kernel will have to do, the compiler has
> no say in how the kernel is linked together.
Unfortunately that options doesn't really fix the `no_sanitize_address`
semantic, and forces us to move *all* uninstrumentable code out into
separate compilation units, etc.
This isn't *impossible*, but is *very* painful.
Thanks,
Mark.
On Mon, Mar 28, 2022 at 01:55:38PM +0100, Mark Rutland wrote:
> > If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> > (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> > memcpy calls, but instead the compiler would redirect calls to memcpy in
> > instrumented functions to say __asan_memcpy which then would be
> > instrumented.
>
> FWIW, I think that approach would be fine for kernel usage.
>
> > > Given the standard doesn't say *anything* about instrumentation, what does GCC
> > > *require* instrumentation-wise of the memcpy implementation? What happens *in
> > > practice* today?
> > >
> > > For example, is the userspace implementation of memcpy() instrumented for
> > > AddressSanitizer, or not?
> >
> > It is, for all functions, whether compiled with -fsanitize=address or not,
> > if user app is linked with -fsanitize=address, libasan is linked in and
> > overrides the libc memcpy with its instrumented version.
>
> Thanks for confirming! Just to check, how does libasan prevent recursing
> within itself on implicitly generated calls to memcpy and friends? Is
> anything special done to build the libasan code, is that all asm, or
> something else?
Generally, most of the libasan wrappers look like
do something
call the original libc function (address from dlsym/dlvsym)
do something
and the "do something" code isn't that complicated. The compiler doesn't
add calls to memcpy/memset etc. just to screw up users, they are added
for a reason, such as copying or clearing very large aggregates (including
for passing them by value), without -Os it will rarely use calls for smaller
sizes and will instead expand them inline.
For malloc and the like wrappers I think it uses some TLS
recursion counters so that say malloc called from dlsym doesn't cause
problems.
Now, one way for the kernel to make kasan work (more) reliably even with
existing compilers without special tweaks for this might be if those
calls to no_sanitize_address code aren't mixed with sanitized code all the
time might be set some per-thread flag when starting a "no sanitized" code
execution and clear it at the end of the region (or vice versa) and test
those flags in the kernel's memcpy/memset etc. implementation to decide if
they should be sanitized or not.
As KASAN is (hopefully) just a kernel debugging aid and not something meant
for production (in the userspace at least GCC strongly recommends against
using the sanitizers in production), perhaps allocating one per-thread bool
or int and changing it in a few spots and testing in the library functions
might be acceptable.
Jakub
Hi!
On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> On Fri, Mar 25, 2022 at 10:12:38AM -0500, Segher Boessenkool wrote:
> > The compiler isn't assuming anything about asan. The compiler generates
> > its code without any consideration of what asan will or will not do.
> > The burden of making things work is on asan.
>
> I think we're talking past each other here, so let me be more precise. :)
>
> The key thing is that when the user passes `-fsantize=address`, instrumentation
> is added by (a part of) the compiler. That instrumentation is added under some
> assumptions as to how the compiler as a whole will behave.
>
> With that in mind, the question is how is __attribute__((no_sanitize_address))
> intended to work when considering all the usual expectations around how the
> compiler can play with memcpy and similar?
The attribute is about how the *current* function is instrumented, not
about anything called by this function. This is clearly documented:
'no_sanitize_address'
'no_address_safety_analysis'
The 'no_sanitize_address' attribute on functions is used to inform
the compiler that it should not instrument memory accesses in the
function when compiling with the '-fsanitize=address' option. The
'no_address_safety_analysis' is a deprecated alias of the
'no_sanitize_address' attribute, new code should use
'no_sanitize_address'.
> > The compiler should not do anything differently here if it uses asan.
> > The address sanitizer and the memcpy function implementation perhaps
> > have to cooperate somehow, or asan needs more smarts. This needs to
> > happen no matter what, to support other things calling memcpy, say,
> > assembler code.
>
> I appreciate where you're coming from here, but I think you're approaching the
> problem sideways.
I am stating facts, I am not trying to solve your problem there. It
seemed to me (and still does) that you didn't grasp all facts here.
> We need to define *what the semantics are* so that we can actually solve the
> problem, e.g. is a memcpy implementation expected to be instrumented or not?
That is up to the memcpy implementation itself, of course.
> > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > standard-specified semantics). This means that it will have the same
> > semantics as __builtin_memcpy always, and either or not be a call to an
> > external function. It can also create calls to it out of thin air.
>
> I understand all of that.
And still you want us to do something that is impossible under those
existing constraints :-(
If you want the external memcpy called by modules A, B, C to not be
instrumented, you have to link A, B, and C against an uninstrumented
memcpy. This is something the kernel will have to do, the compiler has
no say in how the kernel is linked together.
Segher
On Mon, Mar 28, 2022 at 12:20:39PM +0200, Jakub Jelinek wrote:
> On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> > I think we're talking past each other here, so let me be more precise. :)
> >
> > The key thing is that when the user passes `-fsantize=address`, instrumentation
> > is added by (a part of) the compiler. That instrumentation is added under some
> > assumptions as to how the compiler as a whole will behave.
> >
> > With that in mind, the question is how is __attribute__((no_sanitize_address))
> > intended to work when considering all the usual expectations around how the
> > compiler can play with memcpy and similar?
>
> no_sanitize_address or lack thereof is whether the current function
> shouldn't be or should be ASan instrumented, not on whether other functions
> it calls are instrumented or not. memcpy/memmove/memset are just a tiny bit
> special case because the compiler can add them on their own even if they
> aren't present in the source (there are a few others the compiler can
> pattern match too) and various builtins can be on the other side expanded
> inline instead of called, so one then gets the sanitization status of the
> function in which it is used rather than whether the out of line
> implementation of the function is sanitized.
>
> If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> memcpy calls, but instead the compiler would redirect calls to memcpy in
> instrumented functions to say __asan_memcpy which then would be
> instrumented.
This then leaves us holding the pieces because this behaviour is
actively wrong.
A non-instrumented function *MUST*NOT* call an instrumented function,
ever. This very much violates how we use/expect
__attribute__((no_sanitize_address)) to work.
If we use that on a function, we expect/rely on that function (nor any
compiler tranformation thereof) to *NOT* have instrumentation. This is a
hard correctness requirement that cannot be argued with.
So there's two options:
A) compiler generates implicit mem*() calls with the knowledge that
mem*() is not instrumentet, and as such will also emit
instrumentation for it when so required (or calls mem*_asan() like
functions.
B) compiler knows mem*() are instrumented, at which point the implicit
mem*() calls are no longer equivalent under
__attribute__((no_sanitize_address)) and will no longer perform
these substitutions.
At some point this becomes a choice between being able to boot or having
KASAN, choice is simple.
On Mon, Mar 28, 2022 at 03:12:14PM +0200, Jakub Jelinek wrote:
> On Mon, Mar 28, 2022 at 01:55:38PM +0100, Mark Rutland wrote:
> > > If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> > > (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> > > memcpy calls, but instead the compiler would redirect calls to memcpy in
> > > instrumented functions to say __asan_memcpy which then would be
> > > instrumented.
> >
> > FWIW, I think that approach would be fine for kernel usage.
> >
> > > > Given the standard doesn't say *anything* about instrumentation, what does GCC
> > > > *require* instrumentation-wise of the memcpy implementation? What happens *in
> > > > practice* today?
> > > >
> > > > For example, is the userspace implementation of memcpy() instrumented for
> > > > AddressSanitizer, or not?
> > >
> > > It is, for all functions, whether compiled with -fsanitize=address or not,
> > > if user app is linked with -fsanitize=address, libasan is linked in and
> > > overrides the libc memcpy with its instrumented version.
> >
> > Thanks for confirming! Just to check, how does libasan prevent recursing
> > within itself on implicitly generated calls to memcpy and friends? Is
> > anything special done to build the libasan code, is that all asm, or
> > something else?
>
> Generally, most of the libasan wrappers look like
> do something
> call the original libc function (address from dlsym/dlvsym)
> do something
> and the "do something" code isn't that complicated.
I see; thanks!
> The compiler doesn't add calls to memcpy/memset etc. just to screw up
> users, they are added for a reason, such as copying or clearing very
> large aggregates (including for passing them by value), without -Os it
> will rarely use calls for smaller sizes and will instead expand them
> inline.
Sure; I understand that and (from my side at least) I'm not arguing that
there's malice or so on, just that I don't think we currently have the
tools for the kernel to be able to do the right thing reliably and
robustly. Thanks for helping! :)
> For malloc and the like wrappers I think it uses some TLS recursion
> counters so that say malloc called from dlsym doesn't cause problems.
>
> Now, one way for the kernel to make kasan work (more) reliably even with
> existing compilers without special tweaks for this might be if those
> calls to no_sanitize_address code aren't mixed with sanitized code all the
> time might be set some per-thread flag when starting a "no sanitized" code
> execution and clear it at the end of the region (or vice versa) and test
> those flags in the kernel's memcpy/memset etc. implementation to decide if
> they should be sanitized or not.
Unfortunately, I don't think the setting a flag is workable, since e.g.
we need to ensure the flag is set before any implicitly-generated calls,
and I don't think we have a reliable way to do that from C. There's also
a number of portions of uninstrumentable code, so from a maintainability
and robustness PoV this option isn't great.
For `noinstr` code specifically (which gets placed into a distinct
section and can be identified by virtual address) we could have the
out-of-line functions look at their return address, but that's not going
to cover the general case of `__attribute__((no_sanitize_address))` or
compilation units built without `-fsanitize=address`.
From my PoV, distinguishing instrumentable/uninstrumentable calls at
compile time would be ideal. That, or placing the instrumentation into
the caller (omitting it when instrumentation is disabled for that
caller), and expecting the out-of-line forms are never instrumented. I
appreciate that latter option may not be workable due to potential size
bloat, though.
Similar will apply to TSAN, MSAN, etc, and I'm not sure whether those
are mutually exclusive from the compiler's PoV.
> As KASAN is (hopefully) just a kernel debugging aid and not something meant
> for production (in the userspace at least GCC strongly recommends against
> using the sanitizers in production), perhaps allocating one per-thread bool
> or int and changing it in a few spots and testing in the library functions
> might be acceptable.
I'm not sure whether folk are using KASAN in production, but IIRC there
was a desire to do so.
Thanks,
Mark.
On Mon, Mar 28, 2022 at 03:58:10PM +0100, Mark Rutland wrote:
> On Mon, Mar 28, 2022 at 09:22:20AM -0500, Segher Boessenkool wrote:
> > The attribute is about how the *current* function is instrumented, not
> > about anything called by this function. This is clearly documented:
> > 'no_sanitize_address'
> > 'no_address_safety_analysis'
> > The 'no_sanitize_address' attribute on functions is used to inform
> > the compiler that it should not instrument memory accesses in the
> > function when compiling with the '-fsanitize=address' option. The
> > 'no_address_safety_analysis' is a deprecated alias of the
> > 'no_sanitize_address' attribute, new code should use
> > 'no_sanitize_address'.
>
> I understand this, and I have read the documentation.
>
> I'm not claiming any *individual* semantic is wrong, just that in
> combination this doesn't provide what people *need* (even if it strictly
> matches what is documented).
>
> My argument is: if the compiler is permitted to implictly and
> arbitrarily add calls to instrumented functions within a function marked
> with `no_sanitize_address`, the `no_sanitize_address` attribute is
> effectively useless, and therefore *something* needs to change.
I do not see how that follows. Maybe that is obvious from how you look
at your use case, but it is not from the viewpoint of people who just
want to do sanitation. So what is the goal here? Why do you need to
prevent sanitation on anything called from this function, at all cost?
> > > I appreciate where you're coming from here, but I think you're approaching the
> > > problem sideways.
> >
> > I am stating facts, I am not trying to solve your problem there. It
> > seemed to me (and still does) that you didn't grasp all facts here.
>
> Sorry, but I think you're reading my replies uncharitably if you think
> that.
Not at all. I just don't see what your problem is, and what you try to
achieve. I do know what you say you want, but that is clearly
impossible to do: the compiler cannot put restrictions on what some
external function will or won't do!
> > > We need to define *what the semantics are* so that we can actually solve the
> > > problem, e.g. is a memcpy implementation expected to be instrumented or not?
> >
> > That is up to the memcpy implementation itself, of course.
>
> Sorry, but that doesn't make sense to me. When the compiler instruments
> a function with AddressSanitizer, it must have *some* assumption about
> whether memcpy() itself will be instrumented, such that it won't miss
> some necessary instrumentation (and ideally, for performance reasons
> doesn't have redundant instrumentation).
Yes. It sets things up with an external memcpy that is sanitized. But
that happens at the linking stage: it is fine in general for user space,
but for kasan you need to do something similar manually.
> If the story is "memcpy may or may not be instrumented", then the only
> way to guarantee necessary instrumentation is for the compiler to
> *always* place it in the caller (unless forbidden by
> `no_sanitize_address`). If that were the case, the kernel can make
> things work by simply not instrumenting memcpy and friends.
It is *impossible* (in general) to put this in the caller, and it is not
how this stuff is designed either (of course).
> IIUC today those assumptions are not documented. Is the behaviour
> consistent?
The documentation I quoted above is simple and clear enough I hope.
Consistent? Consistent with what?
> > > > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > > > standard-specified semantics). This means that it will have the same
> > > > semantics as __builtin_memcpy always, and either or not be a call to an
> > > > external function. It can also create calls to it out of thin air.
> > >
> > > I understand all of that.
> >
> > And still you want us to do something that is impossible under those
> > existing constraints :-(
>
> If that's truly impossible, that's very unfortunate.
>
> FWIW, I can believe this would require tremendous effort to change, even
> if it's not truly impossible.
No. Truly and trivially impossible. You must want something else than
what you say, but I cannot figure out what.
To implement what you ask for we will have to build every function
twice, once with and once without instrumentation, and emit both
somewhere, somehow. This requires either some ABI extensions, or file
format extensions ("fat objects"), or at the minimum some copperation
between the app (the kernel here), the compiler, and the linker (and
more tools in general, but, kernel :-) )
> If that is the case, it means that kernel side we have to never
> instrument our implementation of memcpy and friends for correctness
> reasons, which has the unfortunate property of losing coverage in the
> cases we *would* like to use an instrumented memcpy.
What correctness reasons are that?
> > If you want the external memcpy called by modules A, B, C to not be
> > instrumented, you have to link A, B, and C against an uninstrumented
> > memcpy. This is something the kernel will have to do, the compiler has
> > no say in how the kernel is linked together.
>
> Unfortunately that options doesn't really fix the `no_sanitize_address`
> semantic, and forces us to move *all* uninstrumentable code out into
> separate compilation units, etc.
Yes. Which follows directly from you not wanting to call anything
instrumented from anything marked with such an attribute: you have to
divide the world into two parts, if you want the world to be divided
into two parts.
> This isn't *impossible*, but is *very* painful.
Yes. It is doable if there is just a handful of things that you do not
want instrumented. Like low-level interrupt handlers. Luckily those
things are generally written in assembler language for other reasons.
Segher
On Mon, Mar 28, 2022 at 10:59:57AM -0500, Segher Boessenkool wrote:
> I do not see how that follows. Maybe that is obvious from how you look
> at your use case, but it is not from the viewpoint of people who just
> want to do sanitation.
It's a substitution issue:
either:
memcpy() equals: "asan instrumentation" + "memcpy implementation"
or:
memcpy() equals: "memcpy implementation"
It can not be both, since they're not equivalent.
So if the compiler does the substitution, it needs some sense of
equivalence. All we're asking is that it be consistent (my preference is
for the latter).
> So what is the goal here? Why do you need to
> prevent sanitation on anything called from this function, at all cost?
Kernel entry code might not have reached a point where instrumentation
assumptions are valid yet. Consider calling into C before the kernel
page-tables are swapped in. KASAN instrumentation would insta-explode
simply because the environment it expects (the shadow data etc..) isn't
there.
On Fri, Mar 25, 2022 at 10:12:38AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Mar 25, 2022 at 03:13:36PM +0100, Peter Zijlstra wrote:
> >
> > +linux-toolchains
> >
> > On Fri, Mar 25, 2022 at 12:15:28PM +0000, Mark Rutland wrote:
> > > a) The compiler expects the out-of-line implementations of functions
> > > ARE NOT instrumented by address-sanitizer.
> > >
> > > If this is the case, then it's legitimate for the compiler to call
> > > these functions anywhere, and we should NOT instrument the kernel
> > > implementations of these. If the compiler wants those instrumented it
> > > needs to add the instrumentation in the caller.
>
> The compiler isn't assuming anything about asan. The compiler generates
> its code without any consideration of what asan will or will not do.
> The burden of making things work is on asan.
I think we're talking past each other here, so let me be more precise. :)
The key thing is that when the user passes `-fsantize=address`, instrumentation
is added by (a part of) the compiler. That instrumentation is added under some
assumptions as to how the compiler as a whole will behave.
With that in mind, the question is how is __attribute__((no_sanitize_address))
intended to work when considering all the usual expectations around how the
compiler can play with memcpy and similar?
I think the answer to that is "this hasn't been thought about in great detail",
which leads to the question of "how could/should this be made to work?", which
is what I'm on about below.
> It is legitimate to call (or not call!) memcpy anywhere. memcpy always
> is __builtin_memcpy, which either or not does a function call.
>
> > > AFAICT The two options for the compiler here are:
> > >
> > > 1) Always inline an uninstrumented form of the function in this case
> > >
> > > 2) Have distinct instrumented/uninstrumented out-of-line
> > > implementations, and call the uninstrumented form in this case.
>
> The compiler should not do anything differently here if it uses asan.
> The address sanitizer and the memcpy function implementation perhaps
> have to cooperate somehow, or asan needs more smarts. This needs to
> happen no matter what, to support other things calling memcpy, say,
> assembler code.
I appreciate where you're coming from here, but I think you're approaching the
problem sideways.
> > > So from those examples it seems GCC falls into bucket (a), and assumes the
> > > blessed functions ARE NOT instrumented.
>
> No, it doesn't show GCC assumes anything. No testing of this kind can
> show anything alike.
I appreciate that; hence "it seems".
What I'm getting at is that the *instrumentation* is added under some
assumptions (those of whoever wrote the instrumentation code), and those
assumptions might not match the behaviour of the compiler, or the behaviour we
expect for __attribute__((no_sanitize_address)).
We need to define *what the semantics are* so that we can actually solve the
problem, e.g. is a memcpy implementation expected to be instrumented or not?
> > > I think something has to change on the compiler side here (e.g. as per
> > > options above), and we should align GCC and clang on the same
> > > approach...
>
> GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> standard-specified semantics). This means that it will have the same
> semantics as __builtin_memcpy always, and either or not be a call to an
> external function. It can also create calls to it out of thin air.
I understand all of that.
Given the standard doesn't say *anything* about instrumentation, what does GCC
*require* instrumentation-wise of the memcpy implementation? What happens *in
practice* today?
For example, is the userspace implementation of memcpy() instrumented for
AddressSanitizer, or not?
Thanks,
Mark.
On Mon, Mar 28, 2022 at 02:44PM +0100, Mark Rutland wrote:
[...]
> From my PoV, distinguishing instrumentable/uninstrumentable calls at
> compile time would be ideal. That, or placing the instrumentation into
> the caller (omitting it when instrumentation is disabled for that
> caller), and expecting the out-of-line forms are never instrumented. I
> appreciate that latter option may not be workable due to potential size
> bloat, though.
That's what user space ASan with Clang already does:
https://godbolt.org/z/ro1Y8E59e
, where it prefixes mem*() calls with __asan_. Only
-fsanitize=kernel-address has (unfortunately) been taught to omit the
prefix. I don't have any info on the history of this, but given the
issues that have surfaced in this thread, it's clearly the wrong thing.
We'll be adding an option to undo this behaviour:
https://reviews.llvm.org/D122724
It's unfortunate yet another option is required, but it's the only way
to retain compatibility with older kernels that don't yet understand
__asan_mem*() functions.
For the kernel GCC then should also:
1. omit the prefix for `-fsanitize=kernel-address` (current behaviour)
2. add the prefix for `-fsanitize=kernel-address --param asan-kernel-mem-intrinsic-prefix`
But because the ASan user space runtime already understands the
__asan-prefixed versions, I imagine GCC can also do to align with Clang:
3. add the prefix for normal `-fsanitize=address`
Thanks,
-- Marco