2020-09-04 16:53:55

by Christoph Hellwig

Subject: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

Use get_unaligned and put_unaligned for the small constant size cases
in the generic uaccess routines. This ensures they can be used for
architectures that do not support unaligned loads and stores, while
being a no-op for those that do.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/asm-generic/uaccess.h | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index cc3b2c8b68fab4..768502bbfb154e 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -36,19 +36,17 @@ raw_copy_from_user(void *to, const void __user * from, unsigned long n)
if (__builtin_constant_p(n)) {
switch(n) {
case 1:
- *(u8 *)to = *(u8 __force *)from;
+ *(u8 *)to = get_unaligned((u8 __force *)from);
return 0;
case 2:
- *(u16 *)to = *(u16 __force *)from;
+ *(u16 *)to = get_unaligned((u16 __force *)from);
return 0;
case 4:
- *(u32 *)to = *(u32 __force *)from;
+ *(u32 *)to = get_unaligned((u32 __force *)from);
return 0;
-#ifdef CONFIG_64BIT
case 8:
- *(u64 *)to = *(u64 __force *)from;
+ *(u64 *)to = get_unaligned((u64 __force *)from);
return 0;
-#endif
}
}

@@ -62,19 +60,17 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n)
if (__builtin_constant_p(n)) {
switch(n) {
case 1:
- *(u8 __force *)to = *(u8 *)from;
+ put_unaligned(*(u8 *)from, (u8 __force *)to);
return 0;
case 2:
- *(u16 __force *)to = *(u16 *)from;
+ put_unaligned(*(u16 *)from, (u16 __force *)to);
return 0;
case 4:
- *(u32 __force *)to = *(u32 *)from;
+ put_unaligned(*(u32 *)from, (u32 __force *)to);
return 0;
-#ifdef CONFIG_64BIT
case 8:
- *(u64 __force *)to = *(u64 *)from;
+ put_unaligned(*(u64 *)from, (u64 __force *)to);
return 0;
-#endif
default:
break;
}
--
2.28.0


2020-09-04 18:06:05

by Arnd Bergmann

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Fri, Sep 4, 2020 at 6:52 PM Christoph Hellwig <[email protected]> wrote:
>
> Use get_unaligned and put_unaligned for the small constant size cases
> in the generic uaccess routines. This ensures they can be used for
> architectures that do not support unaligned loads and stores, while
> being a no-op for those that do.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> include/asm-generic/uaccess.h | 20 ++++++++------------
> 1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
> index cc3b2c8b68fab4..768502bbfb154e 100644
> --- a/include/asm-generic/uaccess.h
> +++ b/include/asm-generic/uaccess.h
> @@ -36,19 +36,17 @@ raw_copy_from_user(void *to, const void __user * from, unsigned long n)
> if (__builtin_constant_p(n)) {
> switch(n) {
> case 1:
> - *(u8 *)to = *(u8 __force *)from;
> + *(u8 *)to = get_unaligned((u8 __force *)from);
> return 0;
> case 2:
> - *(u16 *)to = *(u16 __force *)from;
> + *(u16 *)to = get_unaligned((u16 __force *)from);
> return 0;

The change looks correct and necessary, but I wonder if this could be done
in a way that is a little easier on the compiler than the nested switch/case.

If I see it right, __put_user() and __get_user() can probably
be reduced to a plain put_unaligned() and get_unaligned() here,
which would simplify these a lot.

In turn it seems that the generic raw_copy_to_user() can just be
a plain memcpy(), IIRC the optimization for small sizes should also
be done by modern compilers whenever they can.

Arnd

2020-09-04 18:07:52

by Al Viro

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Fri, Sep 04, 2020 at 06:52:11PM +0200, Christoph Hellwig wrote:
> Use get_unaligned and put_unaligned for the small constant size cases
> in the generic uaccess routines. This ensures they can be used for
> architectures that do not support unaligned loads and stores, while
> being a no-op for those that do.

Frankly, I would rather get rid of those constant-sized cases entirely;
sure, we'd need to adjust asm-generic/uaccess.h defaults for __get_user(),
but there that kind of stuff would make sense; in raw_copy_from_user()
it really doesn't.

2020-09-04 22:36:29

by Al Viro

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Fri, Sep 04, 2020 at 07:06:17PM +0100, Al Viro wrote:
> On Fri, Sep 04, 2020 at 06:52:11PM +0200, Christoph Hellwig wrote:
> > Use get_unaligned and put_unaligned for the small constant size cases
> > in the generic uaccess routines. This ensures they can be used for
> > architectures that do not support unaligned loads and stores, while
> > being a no-op for those that do.
>
> Frankly, I would rather get rid of those constant-sized cases entirely;
> sure, we'd need to adjust asm-generic/uaccess.h defaults for __get_user(),
> but there that kind of stuff would make sense; in raw_copy_from_user()
> it really doesn't.

FWIW, we have asm-generic/uaccess.h used by
arc
c6x
hexagon
riscv/!MMU
um
by direct includes from asm/uaccess.h
h8300
picked as default from asm-generic, in place of absent native uaccess.h

In asm-generic/uaccess.h we have
raw_copy_from_user(): CONFIG_UACCESS_MEMCPY
[h8300, riscv with your series]
raw_copy_to_user(): CONFIG_UACCESS_MEMCPY
[h8300, riscv with your series]
set_fs group: MAKE_MM_SEG KERNEL_DS USER_DS set_fs() get_fs() uaccess_kernel()
all, really
access_ok()/__access_ok() (unless overridden)
[c6x/!CONFIG_ACCESS_CHECK h8300 riscv]
__put_user()/put_user()
all, implemented via __put_user_fn()
__put_user_fn(): raw_copy_to_user(), unless overridden [all except arc]
__get_user()/get_user()
all, implemented via __get_user_fn()
__get_user_fn(): raw_copy_from_user(), unless overridden [all except arc]
__strncpy_from_user() (unless overridden) [c6x h8300 riscv]
strncpy_from_user()
__strnlen_user() (unless overridden) [c6x h8300 riscv]
strnlen_user()
__clear_user() (unless overridden) [c6x h8300 riscv]
clear_user()

__strncpy_from_user()/__strnlen_user()/__clear_user() are never used outside
of arch/*, and there almost all callers are non-__ variants of the same.
Exceptions:
arch/hexagon/include/asm/uaccess.h:76: long res = __strnlen_user(src, n);
racy implementation of __strncpy_from_user()
arch/c6x/kernel/signal.c:157: err |= __clear_user(&frame->uc, offsetof(struct ucontext, uc_mcontext));
arch/x86/include/asm/fpu/internal.h:367: err = __clear_user(&buf->header, sizeof(buf->header));
arch/x86/kernel/fpu/signal.c:138: if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))

and that's it.

Now, if you look at raw_copy_from_user() you'll see an interesting
picture: some architectures special-case the handling of small constant sizes.
Namely,
arc (any size; inlining in there is obscene, constant size or not),
c6x (1,4,8),
m68k/MMU (1,2,3,4,5,6,7,8,9,10,12)
ppc (1,2,4,8),
h8300 (1,2,4),
riscv (with your series)(1,2,4, 8 if 64bit).

If you look at the callers of e.g. raw_copy_from_user(), you'll
see this:
* default __get_user_fn() [relevant on c6x, h8300 and riscv - in
all cases it should be doing get_unaligned() instead]
* __copy_from_user_inatomic()
* __copy_from_user()
* copy_from_user() in case of INLINE_COPY_FROM_USER [relevant on
arc, c6x and m68k]
* lib/* callers, all with non-constant sizes.

Callers of __copy_from_user_inatomic() on relevant architectures, excluding the
ones with non-constant (or huge - several get PAGE_SIZE) sizes:
* [ppc] p9_hmi_special_emu() - 16 bytes read; not recognized as special
* [riscv] user_backtrace() - 2 words read; not recognized as special
* __copy_from_user_inatomic_nocache()
* arch_perf_out_copy_user()

All callers of __copy_from_user_inatomic_nocache() pass it non-constant sizes.
arch_perf_out_copy_user() is called (via layers of preprocessor abuse) via
__output_copy_user(), which gets non-constant size.

Callers of __copy_from_user() potentially hitting those:
arch/arc/kernel/signal.c:108: err = __copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set));
arch/c6x/kernel/signal.c:82: if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
arch/h8300/kernel/signal.c:114: if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
arch/m68k/kernel/signal.c:340: if (__copy_from_user(current->thread.fpcntl,
arch/m68k/kernel/signal.c:794: __copy_from_user(&set.sig[1], &frame->extramask,
arch/m68k/kernel/signal.c:817: if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
arch/powerpc/kernel/signal_64.c:688: if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
arch/powerpc/kernel/signal_64.c:719: if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
arch/powerpc/kvm/book3s_64_mmu_hv.c:1864: if (__copy_from_user(&hdr, buf, sizeof(hdr)))
arch/riscv/kernel/signal.c:113: if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2244: if (__copy_from_user(&fence, user++, sizeof(fence))) {
include/linux/regset.h:256: } else if (__copy_from_user(data, *ubuf, copy))

The last one is user_regset_copyin() and it's going to die.
A bunch of signal-related ones are in sigreturn variants, reading
sigset_t. Considering that shitloads of data get copied nearby for
each such call, I would be surprised if those would be worth bothering
with. Remaining ppc case is kvm_htab_write(), which just might be
hot enough to care; we are copying a 64bit structure, and it might
be worth reading it as a single 64bit. And i915 is reading 64bit
objects in a loop. Hell knows, might or might not be hot.

copy_from_user() callers on arc, c6x and m68k boil down to one case:
arch/arc/kernel/disasm.c:37: bytes_not_copied = copy_from_user(ins_buf,
8-byte read. And that's it.

IOW, there's a scattering of potentially valid uses that might be better
off with get_user(). And there's generic non-MMU variant that gets
used in get_user()/__get_user() on h8300 and riscv. This one *is*
valid, but I don't think that raw_copy_from_user() is the right layer
for that.

For raw_copy_to_user() the picture is similar. And I'd like to get
rid of that magical crap. Let's not make it harder...

2020-09-05 07:15:45

by Christoph Hellwig

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Fri, Sep 04, 2020 at 08:04:34PM +0200, Arnd Bergmann wrote:
> > if (__builtin_constant_p(n)) {
> > switch(n) {
> > case 1:
> > - *(u8 *)to = *(u8 __force *)from;
> > + *(u8 *)to = get_unaligned((u8 __force *)from);
> > return 0;
> > case 2:
> > - *(u16 *)to = *(u16 __force *)from;
> > + *(u16 *)to = get_unaligned((u16 __force *)from);
> > return 0;
>
> The change looks correct and necessary, but I wonder if this could be done
> in a way that is a little easier on the compiler than the nested switch/case.
>
> If I see it right, __put_user() and __get_user() can probably
> be reduced to a plain put_unaligned() and get_unaligned() here,
> which would simplify these a lot.
>
> In turn it seems that the generic raw_copy_to_user() can just be
> a plain memcpy(), IIRC the optimization for small sizes should also
> be done by modern compilers whenever they can.

Sure, I can look into that.

2020-09-05 14:44:03

by Al Viro

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Fri, Sep 04, 2020 at 11:35:18PM +0100, Al Viro wrote:

> Now, if you look at raw_copy_from_user() you'll see an interesting
> picture: some architectures special-case the handling of small constant sizes.
> Namely,
> arc (any size; inlining in there is obscene, constant size or not),
> c6x (1,4,8),
> m68k/MMU (1,2,3,4,5,6,7,8,9,10,12)
> ppc (1,2,4,8),
> h8300 (1,2,4),
> riscv (with your series)(1,2,4, 8 if 64bit).

FWIW, on the raw_copy_to_user() side the same set of constant sizes is
recognized by the same architectures and we have
* __put_user/put_user in asm-generic/uaccess.h make use of that
* arc, c6x, ppc and riscv using it to store sigset_t on sigframe
* 3 odd callers:
* arc stash_usr_regs(), inlined and unrolled large copy_to_user()
* ppc kvm_htab_read(), 64bit store.
* i915_gem_execbuffer_ioctl():
if (__copy_to_user(&user_exec_list[i].offset,
&exec2_list[i].offset,
sizeof(user_exec_list[i].offset)))
in a loop. 'offset' here is __u64.

That's it. IOW, asm-generic put_user() is the only real cause to have those
magic sizes recognized on raw_copy_to_user() side.

2020-09-07 08:11:00

by Arnd Bergmann

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from,to}_user

On Sat, Sep 5, 2020 at 12:35 AM Al Viro <[email protected]> wrote:
>
> On Fri, Sep 04, 2020 at 07:06:17PM +0100, Al Viro wrote:
> > On Fri, Sep 04, 2020 at 06:52:11PM +0200, Christoph Hellwig wrote:
> > > Use get_unaligned and put_unaligned for the small constant size cases
> > > in the generic uaccess routines. This ensures they can be used for
> > > architectures that do not support unaligned loads and stores, while
> > > being a no-op for those that do.
> >
> > Frankly, I would rather get rid of those constant-sized cases entirely;
> > sure, we'd need to adjust asm-generic/uaccess.h defaults for __get_user(),
> > but there that kind of stuff would make sense; in raw_copy_from_user()
> > it really doesn't.

Right. When I originally wrote that part of asm-generic/uaccess.h, the
idea was that its __get_user()/__put_user() would end up being used
across most architectures, which then would only have to implement
custom __copy_from_user()/__copy_to_user() with those special cases
to get the optimum behavior. It didn't work out in the end, since
copy_from_user() also has to deal with unaligned or partial copies
that prevent it from degrading into a single instruction on anything
other than the simplest NOMMU architectures.

I'd still hope we can eventually come up with a generic
__get_user()/__put_user() helper that avoids all the common
architecture specific bugs in them, with a simpler way for
an architecture to hook into with a set of inline functions
while leaving the macros to common code, but that can be
done another time.

> IOW, there's a scattering of potentially valid uses that might be better
> off with get_user(). And there's generic non-MMU variant that gets
> used in get_user()/__get_user() on h8300 and riscv. This one *is*
> valid, but I don't think that raw_copy_from_user() is the right layer
> for that.
>
> For raw_copy_to_user() the picture is similar. And I'd like to get
> rid of that magical crap. Let's not make it harder...

Agreed

Arnd

2020-09-07 19:02:09

by Sean Anderson

Subject: Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from, to}_user

> Re: [PATCH 3/8] asm-generic: fix unaligned access hamdling in raw_copy_{from, to}_user

nit: handling

--Sean