We have several primitives for bulk kernel<->userland copying.
That stuff lives in various asm/uaccess.h, with serious code duplication
_and_ seriously inconsistent semantics.
That code has grown a lot of cruft and more than a few bugs.
Some got caught and fixed last year, but some fairly unpleasant ones
still remain. A large part of the problem was that a lot of code used to
include <asm/uaccess.h> directly, so we had no single place to work
with. That finally got fixed in 4.10-rc1, when everything was
forcibly switched to including <linux/uaccess.h>. At that point it
became possible to start getting rid of boilerplate; I had hoped to deal
with that by 4.11-rc1, but things didn't work out and that work
has slipped to this cycle.
The patchset currently in vfs.git#work.uaccess is the result;
there's more work to do, but it takes care of a large part of the
problems. About 2.8KLoC removed, a lot of cruft is gone and the semantics
are hopefully in sync now. All but two architectures (ia64 and metag)
have been switched to the new mechanism; for those two I'm afraid that I'll
need serious help from the maintainers.
Currently we have 8 primitives - 6 on every architecture and 2 more
on biarch ones. All of them have the same calling conventions: arguments
are the same as for memcpy() (void *to, const void *from, unsigned long size)
and the same rules for return value.
If all loads and stores succeed, everything is obvious - the
'size' bytes starting at 'to' become equal to 'size' bytes starting at 'from'
and zero is returned. If some loads or stores fail, non-zero value should
be returned. If any of those primitives returns a positive value N,
* N should be no greater than size
* the values fetched out of from[0..size-N-1] should be stored into the
corresponding bytes of to[0..size-N-1]
* N should not be equal to size unless not a single byte could be
fetched or stored. As long as that restriction is satisfied, these
primitives are not required to squeeze out every possible byte when some
loads or stores fail. A minimal caller-side illustration of this convention
is sketched below.
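To make the convention concrete, here is a minimal caller-side sketch
(hypothetical names, kernel context assumed; copy_from_user() stands in for
any of the primitives, and most callers simply treat any non-zero return as
-EFAULT):

static ssize_t fill_from_user(void *kbuf, const void __user *ubuf, size_t size)
{
	unsigned long left = copy_from_user(kbuf, ubuf, size);

	/*
	 * Per the rules above: kbuf[0 .. size-left-1] now holds the data,
	 * and left == size only if nothing at all could be copied.
	 */
	if (left == size)
		return -EFAULT;
	return size - left;
}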
1) copy_from_user() - 'to' points to kernel memory, 'from' is
normally a userland pointer. This is used for copying structures from
userland in all kinds of ioctls, etc. No faults on access to destination are
allowed, faults on access to source lead to zero-padding the rest of
destination. Note that for architectures where kernel and userland share a
single address space, split between them (i.e. the ones that have a
non-trivial access_ok()), passing a kernel address instead of a userland one should be
treated as 'every access would fail'. In such cases the entire destination
should be zeroed (failure to do so was a fairly common bug).
Note that all these functions, including copy_from_user(), are
affected by set_fs() - when called under set_fs(KERNEL_DS), they expect
kernel pointers where normally a userland one would be given.
2) copy_to_user() - 'from' points to kernel memory, 'to' is
a userland pointer (subject to set_fs() effects, as usual). Again,
this is used by all kinds of code in all kinds of drivers, syscalls, etc.
No faults on access to source, fault on access to destination terminates
copying. No zero-padding, of course - the faults are going to be on store
here. Does not assume that access_ok() has been checked by the caller;
given a 'to'/'size' pair that fails access_ok(), it returns "nothing copied".
3) copy_in_user() - both 'from' and 'to' are in userland. Used
only by compat code that needs to repack 32bit data structures into native
64bit counterparts. As a result, it is provided only by biarch architectures.
Subject to set_fs(), but really should not be (and AFAICS isn't) used that way.
Some architectures tried to zero-pad, but did it inconsistently and it's
pointless anyway - destination is in userland memory, so no infoleaks would
happen.
4) __copy_from_user_inatomic() - similar to copy_from_user(),
except that
* the caller is presumed to have verified that the source range passes
access_ok() [note that this does not guarantee the lack of faults]
* most importantly, zero-padding SHOULD NOT happen on a short copy.
If an implementation fails to guarantee that, it's a bug and potentially
a bad one[1].
* it may be called with pagefaults disabled (of course, in
that case any pagefault results in a short copy). That's what 'inatomic'
in the name refers to. Note that actually disabling pagefaults is
up to the caller; blindly calling it e.g. from under a spinlock will just
get you a deadlock. Even more precautions are needed to call it from
something like an interrupt handler - you must do that under set_fs(),
etc. It's not "this variant is safe to call from atomic contexts", it's
"I know what I'm doing, don't worry if you see it in an atomic context".
5) __copy_to_user_inatomic(). A counterpart of
__copy_from_user_inatomic(), except for the direction of copying.
6) __copy_from_user(). Essentially the only difference from
__copy_from_user_inatomic() is that one isn't supposed to call it from
atomic contexts. It may be marginally faster than copy_from_user() (due
to skipped access_ok()), but these days the main costs are not in doing
fairly light arithmetic. In theory, you might do a single access_ok()
covering a large structure and then proceed to call __copy_from_user()
on various parts of it (see the sketch after this list). In practice, doing
many calls of that thing on small chunks of data is going to cost a lot on
current x86 boxen due to the STAC/CLAC pair inside each call. Has fewer
call sites than copy_from_user()
- copy_from_user() is in thousands, while this one has only 40 callers
outside of arch/, some fairly dubious. In arch there's about 170 callers
total, mostly in sigreturn instances.
7) __copy_to_user(). A counterpart of __copy_from_user(), with
pretty much the same considerations applied.
8) __copy_in_user(). Basically, copy_in_user() sans access_ok().
Biarch-only, with the grand total of 6 callers...
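As promised above, caller-side sketches for 4) and 6); all names here are
made up for illustration (kernel context assumed, not code from the series):

/* 4) __copy_from_user_inatomic(): disabling pagefaults is the caller's job;
 *    a fault then just yields a short copy, with no zero-padding.  The range
 *    is assumed to have passed access_ok() already. */
static unsigned long copy_chunk_nofault(void *dst, const void __user *usrc,
					unsigned long len)
{
	unsigned long left;

	pagefault_disable();
	left = __copy_from_user_inatomic(dst, usrc, len);
	pagefault_enable();
	return left;	/* non-zero: short copy, fall back to a sleeping path */
}

/* 6) __copy_from_user(): one access_ok() covering the whole structure,
 *    then the unchecked variant on its parts (each call still pays the
 *    STAC/CLAC pair on current x86). */
struct big_uarg {		/* hypothetical layout */
	u32 hdr[4];
	u64 body[16];
};

static int fetch_parts(struct big_uarg *karg, const struct big_uarg __user *uarg)
{
	if (!access_ok(VERIFY_READ, uarg, sizeof(*uarg)))
		return -EFAULT;
	if (__copy_from_user(&karg->hdr, &uarg->hdr, sizeof(karg->hdr)))
		return -EFAULT;
	if (__copy_from_user(&karg->body, &uarg->body, sizeof(karg->body)))
		return -EFAULT;
	return 0;
}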
What this series does is:
* convert architectures to fewer primitives (raw_copy_{to,from,in}_user(),
the last one only on biarch ones), switching to generic implementations
of the 8 primitives above via the raw_... ones. Those generic implementations
are in linux/uaccess.h (and lib/usercopy.c). Architecture provides
raw_... ones, selects ARCH_HAS_RAW_COPY_USER and it's done.
* all the object size check, KASAN, etc. instrumentation is taken care of
in linux/uaccess.h; no need to touch it in arch/*
* consistent semantics wrt zero-padding - none of the raw_... ones do any of
that; copy_from_user() does (outside of the fast path). A sketch of the
resulting generic wrapper is below.
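A simplified sketch of the shape of that generic wrapper (the real
linux/uaccess.h version also handles the object size checks, the KASAN hooks
and the INLINE_COPY_* selection):

static inline unsigned long
copy_from_user(void *to, const void __user *from, unsigned long n)
{
	unsigned long res = n;

	might_fault();
	if (likely(access_ok(VERIFY_READ, from, n)))
		res = raw_copy_from_user(to, from, n);	/* arch-provided */
	if (unlikely(res))
		memset(to + (n - res), 0, res);	/* zero-pad the tail, off the fast path */
	return res;
}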
At the moment I have that conversion done for everything except ia64 and
metag. Once everything is converted, I'll remove ARCH_HAS_RAW_COPY_USER
and make generic stuff unconditional; at the same point
HAVE_ARCH_HARDENED_USERCOPY will be gone (becoming unconditionally true).
The series lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
in #work.uaccess. It's based at 4.11-rc1. Infrastructure is in
#uaccess.stem, then it splits into per-architecture branches (uaccess.<arch>),
eventually merged into #work.uaccess. Some stuff (including a cherry-picked
mips build fix) is in #uaccess.misc, also merged into the final.
I hope the infrastructure part is stable enough to put into never-rebased
state. Some of the per-architecture branches might even be done right; however,
most of them have had no testing whatsoever, so any help with testing (as well
as "Al, for fuck sake, dump that garbage of yours, here's the correct patch"
from maintainers) would be very welcome. So would review, of course.
In particular, the fix in uaccess.parisc should be replaced with the stuff
Helge posted on parisc list, probably along with the get_user/put_user
patches. I've put my variant of the fix there as a stopgap; switching
pa_memcpy() to assembler is clearly the right way to solve it and I'll be
happy to switch to that as soon as the parisc folks settle on the final
version of that stuff.
For most of the oddball architectures I have no way to test that stuff, so
please treat the asm-affecting patches in there as a starting point for
doing it right. Some might even work as-is - stranger things have happened,
but don't count on it.
And again, metag and ia64 parts are simply not there - both architectures
zero-pad in __copy_from_user_inatomic() and that really needs fixing.
In case of metag there's __copy_to_user() breakage as well, AFAICS, and
I've been unable to find any documentation describing the architecture
wrt exceptions, and that part is apparently fairly weird. In case of
ia64... I can test mckinley side of things, but not the generic __copy_user()
and ia64 is about as weird as it gets. With no reliable emulator, at that...
So these two are up to respective maintainers.
Other things not there:
* unification of strncpy_from_user() and friends. Probably next
cycle.
* anything to do with uaccess_begin/unsafe accesses/uaccess_end
stuff. Definitely next cycle.
I'm not sure if mailbombing linux-arch would be a good idea; there are
90 patches in that pile, with total size nearly half a megabyte. If anyone
wants that posted, I'll do so, but it might be more convenient to just
use git.
Comments, review, testing, replacement patches, etc. are very welcome.
Al "hates assembers, dozens of them" Viro
[1] Nick Piggin spotted that bug back in the early 2000s, fixed it for
i386 and didn't bother to do anything about other architectures (including
amd64, for crying out loud!). Since then we have had inconsistent behaviour
between the architectures. The results of those bugs range from transient
bogus values observed in mmap() if you get memory pressure combined with bad
timing, to outright fs corruption if the timing is *really* bad. All
architectures used to have it; hopefully this series will take care of the
last stragglers.
On 03/28/2017 10:57 PM, Al Viro wrote:
> [...]
Hi Al,
Thx for taking this up. It seems ARC was missing the INLINE_COPY* switch, likely
because of the two existing variants (inline/out-of-line) we already have.
I've added a patch for that (attached too) - boot tested the series on ARC.
------->
From 29205ba126468986fcee0d12dba6b5f831506803 Mon Sep 17 00:00:00 2001
From: Vineet Gupta <[email protected]>
Date: Wed, 29 Mar 2017 11:53:33 -0700
Subject: [PATCH] ARC: uaccess: enable INLINE_COPY_{TO,FROM}_USER ...
... and switch to generic out of line version in lib/usercopy.c
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/uaccess.h | 16 ++++++----------
arch/arc/mm/extable.c | 14 --------------
2 files changed, 6 insertions(+), 24 deletions(-)
diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h
index c4d26e8a21b3..f35974ee7264 100644
--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -168,7 +168,7 @@
static inline unsigned long
-__arc_copy_from_user(void *to, const void __user *from, unsigned long n)
+raw_copy_from_user(void *to, const void __user *from, unsigned long n)
{
long res = 0;
char val;
@@ -395,7 +395,7 @@ __arc_copy_from_user(void *to, const void __user *from, unsigned long n)
}
static inline unsigned long
-__arc_copy_to_user(void __user *to, const void *from, unsigned long n)
+raw_copy_to_user(void __user *to, const void *from, unsigned long n)
{
long res = 0;
char val;
@@ -721,24 +721,20 @@ static inline long __arc_strnlen_user(const char __user *s, long n)
}
#ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
-#define raw_copy_from_user __arc_copy_from_user
-#define raw_copy_to_user __arc_copy_to_user
+
+#define INLINE_COPY_TO_USER
+#define INLINE_COPY_FROM_USER
+
#define __clear_user(d, n) __arc_clear_user(d, n)
#define __strncpy_from_user(d, s, n) __arc_strncpy_from_user(d, s, n)
#define __strnlen_user(s, n) __arc_strnlen_user(s, n)
#else
-extern long arc_copy_from_user_noinline(void *to, const void __user * from,
- unsigned long n);
-extern long arc_copy_to_user_noinline(void __user *to, const void *from,
- unsigned long n);
extern unsigned long arc_clear_user_noinline(void __user *to,
unsigned long n);
extern long arc_strncpy_from_user_noinline (char *dst, const char __user *src,
long count);
extern long arc_strnlen_user_noinline(const char __user *src, long n);
-#define raw_copy_from_user arc_copy_from_user_noinline
-#define raw_copy_to_user arc_copy_to_user_noinline
#define __clear_user(d, n) arc_clear_user_noinline(d, n)
#define __strncpy_from_user(d, s, n) arc_strncpy_from_user_noinline(d, s, n)
#define __strnlen_user(s, n) arc_strnlen_user_noinline(s, n)
diff --git a/arch/arc/mm/extable.c b/arch/arc/mm/extable.c
index c86906b41bfe..72125a34e780 100644
--- a/arch/arc/mm/extable.c
+++ b/arch/arc/mm/extable.c
@@ -28,20 +28,6 @@ int fixup_exception(struct pt_regs *regs)
#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
-long arc_copy_from_user_noinline(void *to, const void __user *from,
- unsigned long n)
-{
- return __arc_copy_from_user(to, from, n);
-}
-EXPORT_SYMBOL(arc_copy_from_user_noinline);
-
-long arc_copy_to_user_noinline(void __user *to, const void *from,
- unsigned long n)
-{
- return __arc_copy_to_user(to, from, n);
-}
-EXPORT_SYMBOL(arc_copy_to_user_noinline);
-
unsigned long arc_clear_user_noinline(void __user *to,
unsigned long n)
{
--
2.7.4
On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
> Hi Al,
>
> Thx for taking this up. It seems ARC was missing INLINE_COPY* switch likely due to
> existing 2 variants (inline/out-of-line) we already have.
> I've added a patch for that (attached too) - boot tested the series on ARC.
BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
It's probably arch-dependent and it would be nice if somebody compared
performance with and without inlining those... ARC, in particular, has
__arc_copy_{to,from}_user() inlining a whole lot, even in case of non-constant
size and your patch, AFAICS, will inline all of it in *all* cases. It might
end up being a win, but that's not a priori obvious... Do you have any
profiling results in that area?
On Wed, Mar 29, 2017 at 1:29 PM, Al Viro <[email protected]> wrote:
>
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
I would suggest against it.
The only part I think is worth inlining is the compile time size
checks for kasan - and that only because of the obvious "sizes are
constant only when inlining" issue.
We used to inline a *lot* of user accesses historically; pretty much
all of them were bogus.
The only ones that really want inlining are the non-checking ones that
people should never use directly, but that are just helper things used
by other routines (ie the "unsafe_copy_from_user()" kind of things
that are designed for strncpy_from_user()).
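For reference, the batched-access interface that already exists
(user_access_begin()/unsafe_get_user()) is used roughly like this - a sketch
with made-up names, not code from this series:

static int fetch_words(unsigned long *dst, const unsigned long __user *usrc,
		       int count)
{
	int i;

	if (!access_ok(VERIFY_READ, usrc, count * sizeof(*usrc)))
		return -EFAULT;
	user_access_begin();
	for (i = 0; i < count; i++)
		unsafe_get_user(dst[i], &usrc[i], efault);
	user_access_end();
	return 0;
efault:
	user_access_end();
	return -EFAULT;
}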
Once you start checking access ranges, and have might_fault debugging
etc, it shouldn't be inlined.
Linus
On Wed, Mar 29, 2017 at 01:37:30PM -0700, Linus Torvalds wrote:
> On Wed, Mar 29, 2017 at 1:29 PM, Al Viro <[email protected]> wrote:
> >
> > BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
>
> I would suggest against it.
>
> The only part I think is worth inlining is the compile time size
> checks for kasan - and that only because of the obvious "sizes are
> constant only when inlining" issue.
>
> We used to inline a *lot* of user accesses historically, pretty much
> all of them were bogus.
>
> The only ones that really want inlining are the non-checking ones that
> people should never use directly, but that are just helper things used
> by other routines (ie the "unsafe_copy_from_user()" kind of things
> that are designed for strncpy_from_user()).
>
> Once you start checking access ranges, and have might_fault debugging
> etc, it shouldn't be inlined.
FWIW, that's why I'd put those knobs (INLINE_COPY_{TO,FROM}_USER) in there;
if for some architectures making those inlined is really a win, they can
request the inlining; for now I'd mostly set them to match what architectures
had been doing, but I also strongly suspect that in a lot of cases that
inlining is counterproductive. Building it both ways is simply a matter of
deleting those two lines in the asm/uaccess.h in question, and if testing
shows that out-of-line works better on a given architecture, well...
I would expect that the final variant will have those remain only on a few
architectures. IMO decision whether to inline them or not is up to
architecture - it's not as if having the possibility to inline them
would really complicate the generic side of things...
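Concretely, the knob is just a pair of macros in the architecture's
asm/uaccess.h (a sketch; the architecture also provides raw_copy_{to,from}_user()
and selects ARCH_HAS_RAW_COPY_USER):

/* defining these asks linux/uaccess.h to inline the copy_{to,from}_user()
 * wrappers around raw_copy_{to,from}_user(); deleting them switches that
 * architecture to the out-of-line versions in lib/usercopy.c */
#define INLINE_COPY_TO_USER
#define INLINE_COPY_FROM_USER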
On 03/29/2017 01:29 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
>
>> Hi Al,
>>
>> Thx for taking this up. It seems ARC was missing INLINE_COPY* switch likely due to
>> existing 2 variants (inline/out-of-line) we already have.
>> I've added a patch for that (attached too) - boot tested the series on ARC.
>
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
Just to be clear, your series was doing this for everyone.
> It's probably arch-dependent and it would be nice if somebody compared
> performance with and without inlining those... ARC, in particular, has
> __arc_copy_{to,from}_user() inlining a whole lot, even in case of non-constant
> size and your patch, AFAICS, will inline all of it in *all* cases.
Yes we do inline all of it: the non-constant case is actually simpler, it is a
simple byte loop.
" mov.f lp_count, %0 \n"
" lpnz 3f \n"
" ldb.ab %1, [%3, 1] \n"
"1: stb.ab %1, [%2, 1] \n"
" sub %0, %0, 1 \n"
Doing it out of line (3 args) will be 4 instructions anyways.
For constant sizes, there's a laddered copy for blocks of 16 bytes plus the 1-15
straggler bytes. We do "manual" constant propagation there to optimize away the
straggler part at compile time. But yes, all of this is emitted inline.
> It might
> end up being a win, but that's not apriori obvious... Do you have any
> profiling results in that area?
Unfortunately not at the moment. The reason for adding the out-of-line variant was
not so much performance as improving the footprint for the -Os case (some customer,
I think).
On Wed, Mar 29, 2017 at 2:03 PM, Al Viro <[email protected]> wrote:
>
> it's not as if having the possibility to inline them
> would really complicate the generic side of things...
I disagree.
I think one of the biggest problems with our current uaccess.h mess is
just how illegible the header files are, and the
INLINE_COPY_{TO,FROM}_USER thing is not helping.
I think it would be much better if the header file just had
extern unsigned long _copy_from_user(void *, const void __user *,
unsigned long);
and nothing else. No unnecessary noise.
The same goes for things like [__]copy_in_user() - why is that thing
still inlined? If it was a *macro*, it might be useful due to the
might_fault() thing giving the caller information, but that's not even
the case here, so we'd actually be much better off without any of that
inlining stuff. Do it all in lib/usercopy.c, and move the
might_fault() in there too.
Linus
On Wed, Mar 29, 2017 at 02:24:37PM -0700, Linus Torvalds wrote:
> I think one of the biggest problems with our current uaccess.h mess is
> just how illegible the header files are, and the
> INLINE_COPY_{TO,FROM}_USER thing is not helping.
>
> I think it would be much better if the header file just had
>
> extern unsigned long _copy_from_user(void *, const void __user *,
> unsigned long);
>
> and nothing else. No unnecessary noise.
>
> The same goes for things like [__]copy_in_user() - why is that thing
> still inlined? If it was a *macro*, it might be useful due to the
> might_fault() thing giving the caller information, but that's not even
> the case here, so we'd actually be much better off without any of that
> inlining stuff. Do it all in lib/usercopy.c, and move the
> might_fault() in there too.
IMO that's a separate series. For now I would be bloody happy if we got
* arch-dependent asm fixes out of the way
* everything consolidated outside of arch/*
* arch/*/include/uaccess*.h simplified.
As for __copy_in_user()... I'm not sure we want to keep it in the long run -
drivers/gpu/drm/drm_ioc32.c:390: if (__copy_in_user(buf, argp, offsetof(drm_buf_desc32_t, agp_start))
drivers/gpu/drm/drm_ioc32.c:399: if (__copy_in_user(argp, buf, offsetof(drm_buf_desc32_t, agp_start))
drivers/gpu/drm/drm_ioc32.c:475: if (__copy_in_user(&to[i], &list[i],
drivers/gpu/drm/drm_ioc32.c:536: if (__copy_in_user(&list32[i], &list[i],
fs/compat_ioctl.c:753: if (__copy_in_user(&tdata->read_write, &udata->read_write, 2 * sizeof(u8)))
fs/compat_ioctl.c:755: if (__copy_in_user(&tdata->size, &udata->size, 2 * sizeof(u32)))
are all callers out there. And looking at those callers... fs/compat_ioctl.c
ones are ridiculous - they translate
struct i2c_smbus_ioctl_data {
__u8 read_write;
__u8 command;
__u32 size;
union i2c_smbus_data __user *data;
};
into
struct i2c_smbus_ioctl_data32 {
u8 read_write;
u8 command;
u32 size;
compat_caddr_t data; /* union i2c_smbus_data *data */
};
by doing
* 2 byte copy (read_write + command -> read_write + command)
* 8 byte copy (size + data -> size + half of data; WTF 8 and not 4?)
* 4 byte load (data)
* 8 byte store (data)
That gem went into the tree in 2003, apparently as a quick hack from
benh, and has never been touched since. IMO inlining is very far
down the list of, er, deficiencies there. If anything, it would be
better off with a single copy_from_user() into a local union, followed by
something like foo.native.data = compat_ptr(foo.compat.data) and
copy_to_user() into tdata. And that's assuming we won't be better off
with proper ->compat_ioctl() for that sucker - AFAICS, there's a bunch
of I2C_... stuff understood by fs/compat_ioctl.c, all for the sake of
one driver. I'll look into that tonight...
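Roughly, that single-copy variant would look like this (a sketch only, not
the actual fs/compat_ioctl.c code; it relies on the leading fields of the two
structures quoted above having identical layout, with 'udata' being the compat
user pointer and 'tdata' the native-layout area from compat_alloc_user_space()):

static int repack_i2c_smbus(struct i2c_smbus_ioctl_data32 __user *udata,
			    struct i2c_smbus_ioctl_data __user *tdata)
{
	union {
		struct i2c_smbus_ioctl_data native;
		struct i2c_smbus_ioctl_data32 compat;
	} foo;

	if (copy_from_user(&foo.compat, udata, sizeof(foo.compat)))
		return -EFAULT;
	/* read_write, command and size already line up; only the pointer widens */
	foo.native.data = compat_ptr(foo.compat.data);
	if (copy_to_user(tdata, &foo.native, sizeof(foo.native)))
		return -EFAULT;
	return 0;
}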
As for the drm ones, I don't see any reasons for them not to be copy_in_user().
If any of that is the hot path (the last two are in loops), we have worse
problems with STAC/CLAC anyway.
So I'm not sure if __copy_in_user() shouldn't just die. copy_in_user()
might be a good candidate for move to lib/usercopy.c; I'm somewhat worried
about sparc64, though. access_ok() is a no-op there, so on the builds
without lockdep where might_fault() is a no-op we get a pointless extra
jump for no good reason. I would like to see comments from davem on that
one...
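For reference, an out-of-line copy_in_user() in lib/usercopy.c would be a
sketch along these lines (built on the raw_copy_in_user() primitive from this
series; the sparc64 concern is exactly this access_ok()/might_fault() pair
degenerating into nothing but an extra call and jump):

unsigned long copy_in_user(void __user *to, const void __user *from,
			   unsigned long n)
{
	might_fault();
	if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n))
		n = raw_copy_in_user(to, from, n);
	return n;
}
EXPORT_SYMBOL(copy_in_user);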
On Wed, Mar 29, 2017 at 02:14:22PM -0700, Vineet Gupta wrote:
> > BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
>
> Just to be clear, your series was doing this for everyone.
Huh? It's just that most architectures *were* inlining that;
the arc change was unintentional (copy_from_user/copy_to_user went
uninlined, which your patch deals with), but it's not that I'm forcing
inlining on every architecture out there.
> > It might
> > end up being a win, but that's not apriori obvious... Do you have any
> > profiling results in that area?
>
> Unfortunately not at the moment. The reason for adding out-of-line variant was not
> so much as performance but to improve the footprint for -Os case (some customer I
> think).
Just to make it clear - I'm less certain than Linus that uninlined is uniformly
better, but I have a strong suspicion that on most architectures it *is*.
And not just in terms of kernel size - I would expect better speed as well.
The only reason why these knobs are there is that I want to separate the
"who should switch to uninlined" from this series and allow for the possibility
that for some architectures inlined will really turn out to be better.
I do _not_ expect that there'll be many of those; if it turns out that there's
none, I'll be only glad to make the guts of copy_{to,from}_user() always
out of line.
IOW your patch reverts an unintentional change of behaviour, but I really
wonder if that (out-of-line guts of copy_{to,from}_user) isn't an overall
win for arc. I've applied your patch, but it would be nice if you could
arrange for testing with and without inlining and post the results. The
same goes for all architectures; again, I would expect out-of-line to end up
a win on most of them.
On Wed, Mar 29, 2017 at 4:09 PM, Al Viro <[email protected]> wrote:
>
> IMO that's a separate series. For now I would be bloody happy if we got
> * arch-dependent asm fixes out of the way
> * everything consolidated outside of arch/*
> * arch/*/include/uaccess*.h simplified.
Sure, I agree.
At the same time, I just think that we really *should* aim for a
simpler uaccess.h in the long term, so I would prefer we not encourage
architectures to do things that simply won't matter.
> As for __copy_in_user()... I'm not sure we want to keep it in the long run -
I agree, it's probably not worth it at all.
In fact, I suspect none of the "__copy_.*_user()" versions are worth
it, and we should strive to remove them.
There aren't even that many users, and they _have_ caused security
issues when people have had some path that hasn't checked the range.
Linus
On 03/29/2017 04:42 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 02:14:22PM -0700, Vineet Gupta wrote:
>
>>> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
>>
>> Just to be clear, your series was doing this for everyone.
>
> Huh? It's just that most of architectures *were* inlining that;
> arc change was unintentional (copy_from_user/copy_to_user went
> uninlined, which your patch deals with), but it's not that I'm forcing
> inlining on every architecture out there.
That is correct - I didn't mean to say you changed it per se, but I saw
INLINE_COPY* all over the place but not for ARC :-)
>>> It might
>>> end up being a win, but that's not apriori obvious... Do you have any
>>> profiling results in that area?
>>
>> Unfortunately not at the moment. The reason for adding out-of-line variant was not
>> so much as performance but to improve the footprint for -Os case (some customer I
>> think).
>
> Just to make it clear - I'm less certain than Linus that uninlined is uniformly
> better, but I have a strong suspicion that on most architectures it *is*.
> And not just in terms of kernel size - I would expect better speed as well.
> The only reason why these knobs are there is that I want to separate the
> "who should switch to uninlined" from this series and allow for the possibility
> that for some architectures inlined will really turn out to be better.
> I do _not_ expect that there'll be many of those; if it turns out that there's
> none, I'll be only glad to make the guts of copy_{to,from}_user() always
> out of line.
>
> IOW your patch reverts an unintentional change of behaviour, but I really
> wonder if that (out-of-line guts of copy_{to,from}_user) isn't an overall
> win for arc. I've applied your patch, but it would be nice if you could
> arrange for testing with and without inlining and post the results. The
> same goes for all architectures; again, I would expect out-of-line to end up
> a win on most of them.
I guess I can in the next day or two - but mind you, the inline version for ARC is
kind of special vs. other arches. We have this "manual" constant propagation to elide
the unrolled LD/ST for 1-15 byte stragglers when @sz is constant. In the
out-of-line version, we lose all of that and the code needs to fall through all the
cases. We can possibly improve that by re-arranging the checks - e.g. exit early if
there are no stragglers, etc.
On Wed, Mar 29, 2017 at 5:02 PM, Vineet Gupta
<[email protected]> wrote:
>
> I guess I can in next day or two - but mind you the inline version for ARC is kind
> of special vs. other arches. We have this "manual" constant propagation to elide
> the unrolled LD/ST for 1-15 byte stragglers, when @sz is constant.
I don't think that's special. We do that on x86 too, and I suspect ARC
copied it from there (or from somebody else who did it).
But at least on x86 it is limited entirely to the "__" versions, and
it's almost entirely pointless. We actually removed some of that kind
of code because it was *so* pointless, and it had just been copied
around into the "atomic" versions too.
See for example commit bd28b14591b9 ("x86: remove more uaccess_32.h
complexity"), which did that.
The basic "__" versions still do that constant-size thing, but they
really are questionable. Exactly because it's just the "__" versions -
the *regular* "copy_to/from_user()" is an unconditional function call,
because inlining it isn't just the access operations, it's the size
check, and on modern x86 it's also the "set AC to mark the user access
as safe".
.. and many distros enable some of the might_sleep() debugging code
etc. With any of that, inlining is simply a *bad* choice.
Linus
On Wed, Mar 29, 2017 at 05:27:40PM -0700, Linus Torvalds wrote:
> The basic "__" versions still do that constant-size thing, but they
> really are questionable. Exactly because it's just the "__" versions -
> the *regular* "copy_to/from_user()" is an unconditional function call,
> because inlining it isn't just the access operations, it's the size
> check, and on modern x86 it's also the "set AC to mark the user access
> as safe".
Keep in mind that some architectures have __copy_from_user() (well,
raw_copy_from_user(), now) used in __get_user(). This is a bad idea
for a lot of reasons, and it needs to be taken care of, but I really
don't want to mix __get_user()/__put_user() stuff (there's a lot
of boilerplate in that area as well) into this series.
Infrastructure for that would have to go into the uaccess.stem, and that
would pretty much guarantee that it wouldn't get into no-rebase mode for
an extra couple of weeks. As it is, the uaccess.<arch> branches are on top of the no-rebase
branch, so once architecture maintainers are happy with what's in it,
we can put it in no-rebase mode and have it pulled into that architecture's
tree. That way we can avoid any merge conflicts; fighting the conflicts
between vfs.git and random growing set of architecture trees, all the way
through -next into the merge window... <shudder>
For even more fun, there's VFS (well, fs, actually - it's in ->write_end()
instances) work depending on __copy_from_user_inatomic() not zero-padding
anything on a short copy. With its own set of potential conflicts with
individual fs trees... ;-/
On Wed, 29 Mar 2017 06:57:06 +0100
Al Viro <[email protected]> wrote:
> The patchset currently in vfs.git#work.uaccess is the result;
> there's more work to do, but it takes care of a large part of the
> problems. About 2.8KLoc removed, a lot of cruft is gone and semantics
> is hopefully in sync now. All but two architectures (ia64 and metag)
> had been switched to new mechanism; for these two I'm afraid that I'll
> need serious help from maintainers.
I have tested the code in vfs.git#work.uaccess and in principle it works
for s390. I found one bug which would return an incorrect result
for copy_from_user if the access faults on the last page of the copy.
In that case the new code would return 0 instead of the remaining bytes.
This patch snippet should fix it, please just merge it into commit
"s390: get rid of zeroing, switch to RAW_COPY_USER"
diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c
index b55172c..1e5bb2b 100644
--- a/arch/s390/lib/uaccess.c
+++ b/arch/s390/lib/uaccess.c
@@ -35,7 +35,7 @@ static inline unsigned long copy_from_user_mvcos(void *x, const void __user *ptr
" nr %4,%3\n" /* %4 = (ptr + 4095) & -4096 */
" slgr %4,%1\n"
" clgr %0,%4\n" /* copy crosses next page boundary? */
- " jnh 4f\n"
+ " jnh 5f\n"
"3: .insn ss,0xc80000000000,0(%4,%2),0(%1),0\n"
"7: slgr %0,%4\n"
" j 5f\n"
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
[cc linux-kernel]
On Thu, Mar 30, 2017 at 4:25 PM, Alexey Dobriyan <[email protected]> wrote:
>> void *to, const void *from, unsigned long size
>
> Type of the last argument should be "unsigned int",
> for the following reasons:
> * on x86_64 actual copying is done as 32-bit: types flip to "unsigned"
> at some point and 32-bit registers are used in assembly
>
> * 4GB+ copies simply do not exist:
> kernel doesn't have data structures that big,
>
> huge copies are done with cond_resched() in between
> (with or without proxy pages),
>
> even if they exist, they do not work on x86_64 now and can be
> trivially wrapped as copy_from_user64() or something.
>
> VFS truncates everything at INT_MAX exactly to not deal with all
> the bugs associated with implicit type conversions.
>
> * Binary code becomes smaller: it doesn't matter for constant sizes,
> but people tend to maintain type "safety" by using size_t and
> do 64-bit arithmetic where it is unnecessary. I've sent some
> patches to networking people already and have much more
> which exterminate "size_t" from net/ and they give smaller code
> size overall.
>
> Same arguments apply to kmalloc(), strlen(), memcpy() and
> all functions which are "size_t len".
>
> There was one time when Xen(?) people asked for 64-bit memcpy(),
> I can't remember any other example.
>
> A
On Thu, Mar 30, 2017 at 02:32:12PM +0200, Martin Schwidefsky wrote:
> On Wed, 29 Mar 2017 06:57:06 +0100
> Al Viro <[email protected]> wrote:
>
> > The patchset currently in vfs.git#work.uaccess is the result;
> > there's more work to do, but it takes care of a large part of the
> > problems. About 2.8KLoc removed, a lot of cruft is gone and semantics
> > is hopefully in sync now. All but two architectures (ia64 and metag)
> > had been switched to new mechanism; for these two I'm afraid that I'll
> > need serious help from maintainers.
>
> I have tested the code in vfs.git#work.uaccess and in principle it works
> for s390. I found one bug which would return an incorrect result
> for copy_from_user if the access faults on the last page of the copy.
> In that case the new code would return 0 instead of the remaining bytes.
>
> This patch snippet should fix it, please just merge it into commit
> "s390: get rid of zeroing, switch to RAW_COPY_USER"
Done.
On Wed, Mar 29, 2017 at 04:43:05PM -0700, Linus Torvalds wrote:
> > As for __copy_in_user()... I'm not sure we want to keep it in the long run -
>
> I agree, it's probably not worth it at all.
>
> In fact, I suspect none of the "__copy_.*_user()" versions are worth
> it, and we should strive to remove them.
>
> There aren't even that many users, and they _have_ caused security
> issues when people have had some path that hasn't checked the range.
Actually, looking through those users shows some very odd places: for example,
sctp_setsockopt_bindx() does
/* Check the user passed a healthy pointer. */
if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size)))
return -EFAULT;
/* Alloc space for the address array in kernel memory. */
kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN);
if (unlikely(!kaddrs))
return -ENOMEM;
if (__copy_from_user(kaddrs, addrs, addrs_size)) {
kfree(kaddrs);
return -EFAULT;
}
The obvious question is "why not memdup_user()?" and rationale looks fishy:
* We don't use copy_from_user() for optimization: we first do the
* sanity checks (buffer size -fast- and access check-healthy
* pointer); if all of those succeed, then we can alloc the memory
* (expensive operation) needed to copy the data to kernel. Then we do
* the copying without checking the user space area
* (__copy_from_user()).
plus that:
sctp: use GFP_USER for user-controlled kmalloc
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.
This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
First of all, access_ok() for sanity checks on size is BS - on some
architectures it's constant 1 and on *all* architectures it allows a lot
more than what kmalloc() will. So it won't stop a malicious program from
getting to kmalloc() and wasting its cycles and it's not much help with
buggy ones. __GFP_NOWARN part is more interesting, but... the quoted
commit misses memdup_user() calls in other setsockopt cases on the same
sctp. So if we care about that one, we probably should care about the
rest of them as well, and I doubt that open-coding each is a good solution.
Something like kvmemdup_user(), perhaps? I.e. quiet fallback to vmalloc
on large sizes/in case when kmalloc barfs, with kvfree on the freeing side?
Or just a flat-out check for some reasonable upper limit on optlen at
the very beginning?
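Something like this, perhaps - a sketch only, no such helper exists at the
moment, and the name and the exact gfp flags are made up:

static void *kvmemdup_user(const void __user *src, size_t len)
{
	void *p;

	p = kmalloc(len, GFP_USER | __GFP_NOWARN);
	if (!p)
		p = vmalloc(len);	/* quiet fallback for large sizes */
	if (!p)
		return ERR_PTR(-ENOMEM);
	if (copy_from_user(p, src, len)) {
		kvfree(p);
		return ERR_PTR(-EFAULT);
	}
	return p;
}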
On Wed, Mar 29, 2017 at 06:57:06AM +0100, Al Viro wrote:
> Comments, review, testing, replacement patches, etc. are very welcome.
I've given this a spin, and it appears to work (in that the box boots).
Kernel size wise:
text data bss dec hex filename
8020229 3014220 10243276 21277725 144ac1d vmlinux.orig
8034741 3014388 10243276 21292405 144e575 vmlinux.uaccess
7976719 3014324 10243276 21234319 144028f vmlinux.noinline
Performance using hdparm -T (cached reads) to evaluate against a SSD
gives me the following results:
* original:
Timing cached reads: 580 MB in 2.00 seconds = 289.64 MB/sec
Timing cached reads: 580 MB in 2.00 seconds = 290.06 MB/sec
Timing cached reads: 580 MB in 2.00 seconds = 289.65 MB/sec
Timing cached reads: 582 MB in 2.00 seconds = 290.82 MB/sec
Timing cached reads: 578 MB in 2.00 seconds = 289.07 MB/sec
Average = 289.85MB/s
* uaccess:
Timing cached reads: 578 MB in 2.00 seconds = 288.36 MB/sec
Timing cached reads: 534 MB in 2.00 seconds = 266.68 MB/sec
Timing cached reads: 534 MB in 2.00 seconds = 267.07 MB/sec
Timing cached reads: 552 MB in 2.00 seconds = 275.45 MB/sec
Timing cached reads: 532 MB in 2.00 seconds = 266.08 MB/sec
Average = 272.73 MB/sec
* noinline:
Timing cached reads: 548 MB in 2.00 seconds = 274.16 MB/sec
Timing cached reads: 574 MB in 2.00 seconds = 287.19 MB/sec
Timing cached reads: 574 MB in 2.00 seconds = 286.47 MB/sec
Timing cached reads: 572 MB in 2.00 seconds = 286.20 MB/sec
Timing cached reads: 578 MB in 2.00 seconds = 288.86 MB/sec
Average = 284.58 MB/sec
I've run the test twice, and there's definitely a reproducible drop in
performance for some reason when switching between current and Al's
uaccess patches, which is partly recovered by switching to the out of
line versions.
The only difference that I can identify that could explain this is
the extra might_fault() checks in Al's version, which are missing
from the ARM version.
I'd suggest that we immediately switch to the uninlined versions on
ARM so that the impact of that change is reduced. We end up with
a 1.9% performance reduction rather than a 6% reduction with the
inlined versions.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
On Thu, Mar 30, 2017 at 05:22:41PM +0100, Russell King - ARM Linux wrote:
> On Wed, Mar 29, 2017 at 06:57:06AM +0100, Al Viro wrote:
> > Comments, review, testing, replacement patches, etc. are very welcome.
> [...]
> The only difference that I can identify that could explain this are
> the extra might_fault() checks in Al's version but which are missing
> from the ARM version.
How would the following affect things?
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e68604ae3ced..d24d338f0682 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -184,7 +184,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
kaddr = kmap(page);
from = kaddr + offset;
- left = __copy_to_user(buf, from, copy);
+ left = __copy_to_user_inatomic(buf, from, copy);
copy -= left;
skip += copy;
from += copy;
@@ -193,7 +193,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
iov++;
buf = iov->iov_base;
copy = min(bytes, iov->iov_len);
- left = __copy_to_user(buf, from, copy);
+ left = __copy_to_user_inatomic(buf, from, copy);
copy -= left;
skip = copy;
from += copy;
@@ -267,7 +267,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
kaddr = kmap(page);
to = kaddr + offset;
- left = __copy_from_user(to, buf, copy);
+ left = __copy_from_user_inatomic(to, buf, copy);
copy -= left;
skip += copy;
to += copy;
@@ -276,7 +276,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
iov++;
buf = iov->iov_base;
copy = min(bytes, iov->iov_len);
- left = __copy_from_user(to, buf, copy);
+ left = __copy_from_user_inatomic(to, buf, copy);
copy -= left;
skip = copy;
to += copy;
@@ -541,7 +541,7 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
if (unlikely(i->type & ITER_PIPE))
return copy_pipe_to_iter(addr, bytes, i);
iterate_and_advance(i, bytes, v,
- __copy_to_user(v.iov_base, (from += v.iov_len) - v.iov_len,
+ __copy_to_user_inatomic(v.iov_base, (from += v.iov_len) - v.iov_len,
v.iov_len),
memcpy_to_page(v.bv_page, v.bv_offset,
(from += v.bv_len) - v.bv_len, v.bv_len),
@@ -560,7 +560,7 @@ size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
return 0;
}
iterate_and_advance(i, bytes, v,
- __copy_from_user((to += v.iov_len) - v.iov_len, v.iov_base,
+ __copy_from_user_inatomic((to += v.iov_len) - v.iov_len, v.iov_base,
v.iov_len),
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
v.bv_offset, v.bv_len),
@@ -582,7 +582,7 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
return false;
iterate_all_kinds(i, bytes, v, ({
- if (__copy_from_user((to += v.iov_len) - v.iov_len,
+ if (__copy_from_user_inatomic((to += v.iov_len) - v.iov_len,
v.iov_base, v.iov_len))
return false;
0;}),
On Thu, Mar 30, 2017 at 9:43 AM, Al Viro <[email protected]> wrote:
> On Thu, Mar 30, 2017 at 05:22:41PM +0100, Russell King - ARM Linux wrote:
> How would the following affect things?
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index e68604ae3ced..d24d338f0682 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -184,7 +184,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>
> kaddr = kmap(page);
> from = kaddr + offset;
> - left = __copy_to_user(buf, from, copy);
> + left = __copy_to_user_inatomic(buf, from, copy);
This is all going in the wrong direction entirely.
That "__copy_to_user()" code was bad from the beginning: it should
never have had the double underscores. I objected to it at the time.
Now you're making it go from bad to insane. You're apparently
mis-using "inatomic" because of subtle issues that have nothing to do
with "inatomic" - you want to get rid of a might_sleep() warning, but
you don't actually want inatomic behavior, so the thing will still
sleep.
This is all very subtle and already depends on people having checked the
"struct iov_iter" beforehand. We should *remove* subtle stuff like
that, not add yet more layers of subtlety and possible future bugs
when somebody calls copy_page_to_iter() without having properly
validated the iter.
These are not theoretical issues. We've _had_ these exact bugs when
people didn't validate the stuff they created by hand and bypassed the
normal IO paths.
Trying to optimize away an access_ok() or a might_fault() is *not* a
valid reason to completely break our security model, and create code
that makes no sense (claiming it is atomic when it isn't).
Linus
On Thu, Mar 30, 2017 at 10:18:23AM -0700, Linus Torvalds wrote:
> This is all going in the wrong direction entirely.
This is not going into the tree - it's just a "let's check your
theory about might_fault() overhead being the source of slowdown
you are seeing" quick-and-dirty patch.
Speaking of the checks in there - if anything, might_fault() in those
suckers belongs outside of the loop; note that even on the kmap_atomic()
side of copy_page_to_iter_iovec() we do stuff like fault_in_pages_writeable().
BTW, ..._inatomic is a very unfortunate name, IMO - it's *not* safe
to use in atomic contexts as-is, to start with; the caller needs to take
care of pagefault_disable(). If anything, __copy_from_user_nofault() would
probably be better...
I really wonder about the low dispersion in those tests - IME on amd64
boxen it tends to be ~5% or so; what's normal for arm?
On Thu, Mar 30, 2017 at 07:48:24PM +0100, Al Viro wrote:
> BTW, ..._inatomic is a very unfortunate name, IMO - it's *not* safe
> to use in atomic contexts as-is, to start with; the caller needs to take
> care of pagefault_disable(). If anything, __copy_from_user_nofault() would
> probably be better...
Not even that - again, it will happily trigger page faults unless the
caller disables those. __copy_from_user_I_know_what_I_am_doing()?
On Thu, Mar 30, 2017 at 11:48 AM, Al Viro <[email protected]> wrote:
>
> This is not going into the tree - it's just a "let's check your
> theory about might_fault() overhead being the source of slowdown
> you are seeing" quick-and-dirty patch.
Note that for cached hdparm reads, I suspect a *much* bigger effect
than the fairly cheap might_fault() tests is just the random layout of
the data in the page cache.
Memory is just more expensive than CPU is.
The precise physical address that gets allocated for the page cache
entries ends up mattering, and is obviously fairly "sticky" within one
reboot (unless you have a huge working set and that flushes it, or you
use something like
echo 3 > /proc/sys/vm/drop_caches
to flush filesystem caches manually).
The reason things like page allocation matter for performance testing
is simply that the CPU caches are physically indexed (the L1 might not
be, but outer levels definitely are), and so page allocation ends up
impacting caching unless you have very high associativity.
And even if your workload doesn't fit in your CPU caches (I'd hope
that the "cached" hdparm is still doing a fairly big area), you'll
still see memory performance depend on physical addresses.
Doing kernel performance testing without rebooting several times is
generally very hard.
Linus
On Thu, Mar 30, 2017 at 11:54 AM, Al Viro <[email protected]> wrote:
>
> Not even that - again, it will happily trigger page faults unless the
> caller disables those. __copy_from_user_I_know_what_I_am_doing()?
That's a horrible name. Everybody always thinks they know what they are doing.
There's a reason I called the new odd user access functions
"unsafe_get/put_user()"
But regardless of that, I think you're being silly to even look at the
iovec code. That code simply *isn't* critical enough that one or two
extra instructions matter.
Show me profiles to the contrary. I dare you.
Those things shouldn't be using *anything* odd at all. They should be
using "copy_from_user()". Nothing else.
Linus
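(Editorial aside: a rough sketch, from memory, of how the unsafe_*_user()
primitives of this era are meant to be used - the caller does the
access_ok() check once and brackets the whole batch with
user_access_begin()/user_access_end(); each unsafe_put_user() jumps to the
error label on a fault. Helper name and details are illustrative:)

#include <linux/uaccess.h>

/* Write an array of ints to userspace with a single range check and a
 * single enable/disable of user access for the whole batch. */
static int put_words(int __user *uptr, const int *vals, unsigned int n)
{
        unsigned int i;

        if (!access_ok(VERIFY_WRITE, uptr, n * sizeof(*uptr)))
                return -EFAULT;

        user_access_begin();
        for (i = 0; i < n; i++)
                unsafe_put_user(vals[i], &uptr[i], efault);
        user_access_end();
        return 0;

efault:
        user_access_end();
        return -EFAULT;
}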
On Thu, Mar 30, 2017 at 11:59:16AM -0700, Linus Torvalds wrote:
> But regardless of that, I think you're being silly to even look at the
> iovec code. That code simply *isn't* critical enough that one or two
> extra instructions matter.
>
> Show me profiles to the contrary. I dare you.
>
> Those things shouldn't be using *anything* odd at all. They should be
> using "copy_from_user()". Nothing else.
That they very definitely should not. And not because of access_ok() or
might_fault() - this is one place where zero-padding is absolutely wrong.
So unless you are going to take it out of copy_from_user() and pray
that random shit ioctls in random shit drivers check the return value
properly, copy_from_user() is no-go here.
On Thu, Mar 30, 2017 at 12:10 PM, Al Viro <[email protected]> wrote:
>
> That they very definitely should not. And not because of access_ok() or
> might_fault() - this is one place where zero-padding is absolutely wrong.
> So unless you are going to take it out of copy_from_user() and pray
> that random shit ioctls in random shit drivers check the return value
> properly, copy_from_user() is no-go here.
Actually, that is a great example of why you should *not* use
__copy_from_user().
If the reason is lack of zero-padding, that doesn't mean that suddenly
we shouldn't check the range. And it doesn't mean that it shouldn't
document why it does it.
So dammit, just add something like this to lib/iovec.c:
static inline unsigned long
copy_from_user_nozero(void *to, const void __user *from, size_t len)
{
        if (!access_ok(from, len))
                return len;
        return __copy_from_user(to, from, len);
}
which now isn't insecure, and also magically documents *why* you don't
just use the plain copy_from_user().
Linus
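(Editorial aside: a hedged sketch of how such a helper might be used on the
iovec side - purely illustrative, not the actual lib/iov_iter.c code;
copy_from_user_nozero() is the helper from the message above and
gather_from_iovec() is a made-up name:)

#include <linux/kernel.h>
#include <linux/uio.h>

/* Gather from an iovec array, stopping at the first fault instead of
 * zero-padding; returns the number of bytes actually copied. */
static size_t gather_from_iovec(void *to, const struct iovec *iov,
                                size_t bytes)
{
        size_t done = 0;

        while (done < bytes) {
                size_t chunk = min_t(size_t, bytes - done, iov->iov_len);
                size_t left = copy_from_user_nozero(to + done,
                                                    iov->iov_base, chunk);

                done += chunk - left;
                if (left)
                        break;
                iov++;
        }
        return done;
}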
On 03/29/2017 05:27 PM, Linus Torvalds wrote:
> On Wed, Mar 29, 2017 at 5:02 PM, Vineet Gupta
> <[email protected]> wrote:
>>
>> I guess I can in next day or two - but mind you the inline version for ARC is kind
>> of special vs. other arches. We have this "manual" constant propagation to elide
>> the unrolled LD/ST for 1-15 byte stragglers, when @sz is constant.
>
> I don't think that's special. We do that on x86 too, and I suspect ARC
> copied it from there (or from somebody else who did it).
No, I (re)wrote that code and AFAIR didn't copy it from anyone, and AFAICS it is
certainly different from others, if not special. If you look closely at the ARC
uaccess.h it is not the trivial 1-2-4 constant-size check as in the commit you
referred to. It actually tries to eliminate hunks of the inline assembly at
compile time for constant @sz (so it is designed purely for the inlined variants;
whether that matters or not is a different story). Thing is, from the hardware
POV 4 LD/ST in flight is good (at least for ARC700 cores), so we wrap it up in a
zero-delay loop. That takes care of multiples of 16 bytes; the last 15 bytes are
the killer, which requires a bunch of conditionals, and that is what I try to
eliminate.
FWIW, I experimented with uaccess inlining on ARC
1. pristine 4.11-rc1 (all inline)
2. Inline + disabling the "smart" const propagation
3. Out of line only variants (which already existed/default on ARC for -Os, but
hacked for current -O3)
Numbers for LMBench FS latency (off of tmpfs to avoid any device related
perturbation). Note that LMBench already runs them several times itself and each
of below is obviously with a fresh reboot since kernels were different.
So it seems 0K file create/del gets worse without the smart inline, while 10K gets
better. mmap (16k) got worse as well. With out-of-line, some got better while some
got worse.
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0- 124.3 75.3 734.2 147.8 2200.0 6.205 10.9 87.6
170330-v4 Linux 4.11.0- 154.9 88.3 709.2 131.2 2494.0 4.056 11.0 91.1
170330-v4 Linux 4.11.0- 157.7 69.8 622.7 140.8 2168.0 5.654 10.8 91.0
Compare that to data against
1. pristine 4.11-rc1 (all inline)
2. Al's series + ARC forced inline
3. Al's series + ARC forced NOT inline
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0- 124.3 75.3 734.2 147.8 2200.0 6.205 10.9 87.6
170329-v4 Linux 4.11.0- 141.2 63.4 629.7 130.0 2172.0 5.796 10.8 90.0
170329-v4 Linux 4.11.0- 154.9 89.2 691.6 147.7 2323.0 4.922 10.8 92.3
So it's a mixed bag really. Maybe we need some better-directed test to really
drill down into it.
> But at least on x86 is is limited entirely to the "__" versions, and
> it's almost entirely pointless. We actually removed some of that kind
> of code because it was *so* pointless, and it had just been copied
> around into the "atomic" versions too.
>
> See for example commit bd28b14591b9 ("x86: remove more uaccess_32.h
> complexity"), which did that.
>
> The basic "__" versions still do that constant-size thing, but they
> really are questionable.
Perhaps because the scope of constant usage was pretty narrow - it would only
benefit if *copy_from_user() were called with a constant size of 1, 2 or 4, which
is relatively unlikely as we have __get_user() and friends for that already.
> Exactly because it's just the "__" versions -
> the *regular* "copy_to/from_user()" is an unconditional function call,
> because inlining it isn't just the access operations, it's the size
> check, and on modern x86 it's also the "set AC to mark the user access
> as safe".
So what you are saying is that it is relatively costly on x86 because of SMAP,
which may not be true for arches without such hardware support.
Note that I'm not arguing for/against inlining per se - it seems it doesn't matter.
-Vineet
On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
<[email protected]> wrote:
>
> So it's a mixed bag really. Maybe we need some better-directed test to really
> drill down into it.
As mentioned in the discussion about ARM, I seriously doubt that the
inlining will even be noticeable compared to other effects here.
The cases where I've seen this matter have been the small
constant-sized copies, and in every case the right fix was to use
"get_user()" instead of "copy_from_user()".
There are a couple of really special cases where a slightly bigger and
more complex copy is still meaningful: the most noticeable of those
being "stat()".
> So what you are saying is it is relatively costly on x86 because of SMAP which may
> not be true for arches w/o hardware support.
> Note that I'm not arguing for/against inlining per-se, it seems it doesn't matter
So on x86 (and ARM) we have the SMAP/UAO issue, and on other
architectures we have another expense entirely: maintenance and
testing.
(On ARM, hopefully the UAO bit is faster to set, but it's still
"another instruction before and after", so even if it's not as
expensive as clac/stac are on current x86 chips, it's an argument
against inlining)
And on the maintenance and testing side, it means that unless some
header organization makes sense on x86 or ARM (or power), it likely
doesn't make sense to try to tweak for any other architecture.
Or at the very least it would have to be a _really_ big and noticeable
issue, not something that might be hard/impossible to even measure.
Linus
On Thu, Mar 30, 2017 at 12:19:35PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 12:10 PM, Al Viro <[email protected]> wrote:
> >
> > That they very definitely should not. And not because of access_ok() or
> > might_fault() - this is one place where zero-padding is absolutely wrong.
> > So unless you are going to take it out of copy_from_user() and pray
> > that random shit ioctls in random shit drivers check the return value
> > properly, copy_from_user() is no-go here.
>
> Actually, that is a great example of why you should *not* use
> __copy_from_user().
>
> If the reason is lack of zero-padding, that doesn't mean that suddenly
> we shouldn't check the range. And it doesn't mean that it shouldn't
> document why it does it.
>
> So dammit, just add something like this to lib/iovec.c:
>
> static inline unsigned long copy_from_user_nozero(void *to, const
> void __user *from, size_t len)
> {
> if (!access_ok(from, len))
> return len;
> return __copy_from_user(to, from, len);
> }
>
> which now isn't insecure, and also magically documents *why* you don't
> just use the plain copy_from_user().
Maybe... However, we *do* have places where it's done under kmap_atomic()
in there. Let's leave that one until this round of uaccess consolidation is
finished, OK? lib/iov_iter.c is special and isolated enough; we can figure
out what to do with those primitives later.
As far as I'm concerned, lib/*.c and mm/*.c are a separate story; I would start
with getting rid of that stuff in random drivers. Here's what we have at the
moment:
there are only 3 irregular callers of __copy_to_user_inatomic():
arch/mips/kernel/unaligned.c:1276: res = __copy_to_user_inatomic(addr, fpr, sizeof(*fpr));
drivers/gpu/drm/i915/i915_gem.c:913: ret = __copy_to_user_inatomic(user_data, vaddr + offset, length);
drivers/gpu/drm/i915/i915_gem.c:983: unwritten = __copy_to_user_inatomic(user_data, vaddr + offset, length);
There are 32 irregular callers of __copy_from_user_inatomic(), majority in
perf/oprofile-related code. Leave those aside, only 8 are left:
arch/mips/kernel/unaligned.c:1242: res = __copy_from_user_inatomic(fpr, addr,
drivers/gpu/drm/i915/i915_gem.c:1324: ret = __copy_from_user_inatomic(vaddr + offset, user_data, len);
drivers/gpu/drm/i915/i915_gem_execbuffer.c:669: unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
drivers/gpu/drm/msm/msm_gem_submit.c:73: return __copy_from_user_inatomic(to, from, n);
kernel/trace/trace.c:5780: len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
kernel/trace/trace.c:5851: len = __copy_from_user_inatomic(&entry->id, ubuf, cnt);
kernel/trace/trace_kprobe.c:216: ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
virt/kvm/kvm_main.c:1832: r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
Ones in perf and oprofile code really smell like a missing helper,
along the lines of probe_kernel_read(), but for userland pointers.
Incidentally, metag, mips, openrisc and xtensa instances of that lack
pagefault_disable() - might be a bug, need to check that. powerpc and
sparc ones also lack it, but those have pagefault_disable() done in
caller. tile ones open-code access_ok(), AFAICS. Sorting that pile
out would already about halve the number of callers.
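(Editorial aside: a hedged sketch of the kind of helper being alluded to -
the name probe_user_read() and the exact semantics are illustrative, not
something that exists in the tree at this point:)

#include <linux/uaccess.h>

/*
 * Userland analogue of probe_kernel_read(): never faults pages in, never
 * zero-pads, usable from profiling/NMI-ish contexts regardless of the
 * caller's set_fs() state.
 */
static long probe_user_read(void *dst, const void __user *src, size_t size)
{
        long ret = -EFAULT;
        mm_segment_t old_fs = get_fs();

        set_fs(USER_DS);
        if (access_ok(VERIFY_READ, src, size)) {
                pagefault_disable();
                ret = __copy_from_user_inatomic(dst, src, size);
                pagefault_enable();
                ret = ret ? -EFAULT : 0;
        }
        set_fs(old_fs);

        return ret;
}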
Ho-hum... There's something odd about those - some of them seem to
assume that we are under set_fs(USER_DS), some do what access_ok()
would've done with USER_DS and proceed to __copy_from_user_inatomic().
And that includes the ones like sparc... Very strange.
Am I right assuming that perf_callchain_user() can't be called other than
with USER_DS, but oprofile ->backtrace() can? I'm not familiar enough
with oprofile guts... Folks?
On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
> <[email protected]> wrote:
> >
> > So it's a mixed bag really. Maybe we need some better-directed test to really
> > drill down into it.
>
> As mentioned in the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.
(Sorry to switch sub-threads.)
I'm running tests on that point, concentrating on hdparm -T and perfing
that. You're right in so far as perf identifies the hotspot as the
copy_to_user() function for that workload, rather than the inlined bits
- the top hits in perf of hdparm -T are:
+ 66.52% hdparm [k] __copy_to_user_std
+ 8.49% hdparm [k] generic_file_read_iter
+ 3.82% hdparm [k] lock_acquire
+ 2.80% hdparm [k] copy_page_to_iter
+ 2.49% hdparm [k] find_get_entry
+ 1.19% hdparm [k] lock_release
Note: perf on ARM is affected by IRQ-disabled regions, so hotspots
can be off.
The generic_file_read_iter() one is definitely affected by an IRQ-
disabled region in there.
Here's the average hdparm -T transfer rates and standard deviation over
20 samples:
Unpatched: Average=320.42 MB/s sigma=0.878657
Uaccess+inline: Average=318.77 MB/s sigma=1.003332
Uaccess+noinline: Average=319.40 MB/s sigma=1.088354
This pattern - where the noinline version sits between the inlined
version and the unpatched version - recurs in all the
measurements I've done so far, and it points to inlining that code
having a slight detrimental effect. What we don't know is whether
uninlining the code without Al's patch would see a slight boost,
but I'm not about to go there.
However, this all points towards there being a very slight advantage
to dropping the INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for
ARM, but I'd say it's really down in the noise - I'm not concerned.
> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)
The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, overall it's about 3% of the workload. Within
copy_page_to_iter(), it's the __put_user() based loop inside
fault_in_pages_writeable() which has the hotspot, due to the repeated
enable+disable sequence (or rather, the instruction barriers that it needs).
Perf reports that the barriers account for 8.33 and 17.59% of the
time spent within that function, so we're actually talking about
maybe .25% and .5% of this workload spent doing the UAO thing.
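(Editorial aside: a simplified sketch of the shape of that loop - not the
exact include/linux/pagemap.h code. Each __put_user() carries its own
user-access enable/disable pair, which is where the barriers measured above
come from:)

#include <linux/mm.h>
#include <linux/uaccess.h>

/* Touch one byte per page so the later kmap_atomic copy cannot fault. */
static int fault_in_writeable(char __user *uaddr, size_t size)
{
        char __user *end;

        if (unlikely(!size))
                return 0;
        end = uaddr + size - 1;

        /* each __put_user() is an enable-uaccess/store/disable-uaccess trip */
        while (uaddr <= end) {
                if (__put_user(0, uaddr))
                        return -EFAULT;
                uaddr += PAGE_SIZE;
        }
        /* the stride above may step over the page holding the final byte */
        return __put_user(0, end) ? -EFAULT : 0;
}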
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
On Tue, Mar 28, 2017 at 10:57 PM, Al Viro <[email protected]> wrote:
> We have several primitives for bulk kernel<->userland copying.
> That stuff lives in various asm/uaccess.h, with serious code duplication
> _and_ seriously inconsistent semantics.
>
> That code has grown a lot of cruft and more than a few bugs.
> Some got caught and fixed last year, but some fairly unpleasant ones
> still remain. A large part of problem was that a lot of code used to
> include <asm/uaccess.h> directly, so we had no single place to work
> with. That got finally fixed in 4.10-rc1, when everything had been
> forcibly switched to including <linux/uaccess.h>. At that point it
> became possible to start getting rid of boilerplate; I hoped to deal
> with that by 4.11-rc1, but the things didn't work out and that work
> has slipped to this cycle.
>
> The patchset currently in vfs.git#work.uaccess is the result;
> there's more work to do, but it takes care of a large part of the
> problems. About 2.8KLoc removed, a lot of cruft is gone and semantics
> is hopefully in sync now. All but two architectures (ia64 and metag)
> had been switched to new mechanism; for these two I'm afraid that I'll
> need serious help from maintainers.
FWIW, I tested this on x86 and ARM with the LKDTM tests I built for
CONFIG_HARDENED_USERCOPY and this branch (which includes the earlier
fixes I suggested privately) tests fine for me.
> Currently we have 8 primitives - 6 on every architecture and 2 more
> on biarch ones. All of them have the same calling conventions: arguments
> are the same as for memcpy() (void *to, const void *from, unsigned long size)
> and the same rules for return value.
> If all loads and stores succeed, everything is obvious - the
> 'size' bytes starting at 'to' become equal to 'size' bytes starting at 'from'
> and zero is returned. If some loads or stores fail, non-zero value should
> be returned. If any of those primitives returns a positive value N,
> * N should be no greater than size
> * the values fetched out of from[0..size-N-1] should be stored into the
> corresponding bytes of to[0..size-N-1]
> * N should not be equal to size unless not a single byte could have
> been fetched or stored. As long as that restriction is satisfied, these
> primitives are not required to squeeze every possible byte in case some
> loads or stores fail.
>
> 1) copy_from_user() - 'to' points to kernel memory, 'from' is
> normally a userland pointer. This is used for copying structures from
> [...]
> 8) __copy_in_user(). Basically, copy_in_user() sans access_ok().
> Biarch-only, with the grand total of 6 callers...
It seems to me like everything above here should end up in comments
for these functions. I think even after the unification, it's valuable
to have this actually in the source.
> What this series does is:
>
> * convert architectures to fewer primitives (raw_copy_{to,from,in}_user(),
> the last one only on biarch ones), switching to generic implementations
> of the 8 primitives aboves via raw_... ones. Those generic implementations
> are in linux/uaccess.h (and lib/usercopy.c). Architecture provides
> raw_... ones, selects ARCH_HAS_RAW_COPY_USER and it's done.
Bikeshed: I still prefer that the "raw_copy_*" functions be named
"arch_copy_*" or "__arch_copy_*" to match all the other arch-specific
functions in the kernel. This clearly marks them as arch-specific, and
in theory, the leading "__" would indicate that they're "internal" or
hint that they don't perform any of the checking done from the
standard interface functions.
Currently arm64 already uses the name __arch_copy_*, and arm's is
arm_copy_*. I just don't think "raw" is meaningful enough to avoid
people accidentally using it.
> * all object size check, kasan, etc. instrumentation is taken care of
> in linux/uaccess.h; no need to touch it in arch/*
>
> * consistent semantics wrt zero-padding - none of the raw_... do any of
> that, copy_from_user() does (outside of fast path).
>
> At the moment I have that conversion done for everything except ia64 and
> metag. Once everything is converted, I'll remove ARCH_HAS_RAW_COPY_USER
> and make generic stuff unconditional; at the same point
> HAVE_ARCH_HARDENED_USERCOPY will be gone (becoming unconditionally true).
Yay! :)
> The series lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
> in #work.uaccess. It's based at 4.11-rc1. Infrastructure is in
> #uaccess.stem, then it splits into per-architecture branches (uaccess.<arch>),
> eventually merged into #work.uaccess. Some stuff (including a cherry-picked
> mips build fix) is in #uaccess.misc, also merged into the final.
>
> I hope that infrastructure part is stable enough to put it into never-rebased
> state. Some of per-architecture branches might be even done right; however,
> most of them got no testing whatsoever, so any help with testing (as well
> as "Al, for fuck sake, dump that garbage of yours, here's the correct patch"
> from maintainers) would be very welcome. So would the review, of course.
>
> In particular, the fix in uaccess.parisc should be replaced with the stuff
> Helge posted on parisc list, probably along with the get_user/put_user
> patches. I've put my variant of fix there as a stopgap; switch of pa_memcpy()
> to assembler is clearly the right way to solve it and I'll be happy to
> switch to that as soon as parisc folks settle on the final version of that
> stuff.
>
> For most of the oddball architectures I have no way to test that stuff, so
> please treat the asm-affecting patches in there as a starting point for
> doing it right. Some might even work as is - stranger things had happened,
> but don't count on it.
>
> And again, metag and ia64 parts are simply not there - both architectures
> zero-pad in __copy_from_user_inatomic() and that really needs fixing.
> In case of metag there's __copy_to_user() breakage as well, AFAICS, and
> I've been unable to find any documentation describing the architecture
> wrt exceptions, and that part is apparently fairly weird. In case of
> ia64... I can test mckinley side of things, but not the generic __copy_user()
> and ia64 is about as weird as it gets. With no reliable emulator, at that...
> So these two are up to respective maintainers.
I would also call out lib/test_user_copy.c (CONFIG_TEST_USER_COPY) for
maintainers to see if things are working correctly. This tries to test
all the size-specific combinations of possible copies and checks for
zeroing, etc. (I'm sure the test could be improved, but it's already
caught tiny bugs in per-arch implementations in the past.)
> Other things not there:
> * unification of strncpy_from_user() and friends. Probably next
> cycle.
> * anything to do with uaccess_begin/unsafe accesses/uaccess_end
> stuff. Definitely next cycle.
>
> I'm not sure if mailbombing linux-arch would be a good idea; there are
> 90 patches in that pile, with total size nearly half a megabyte. If anyone
> wants that posted, I'll do so, but it might be more convenient to just
> use git.
>
> Comments, review, testing, replacement patches, etc. are very welcome.
>
> Al "hates assembers, dozens of them" Viro
>
>
> [1] Nick Piggin has spotted that bug back in early 2000s, fixed it for
> i386 and hadn't bothered to do anything about other architectures (including
> amd64, for crying out loud!). Since then we had inconsistent behaviour
> between the architectures. Results of those bugs range from transient bogus
> values observed in mmap() if you get memory pressure combined with bad timing
> to outright fs corruption, if the timing is *really* bad. All architectures
> used to have it, hopefully this series will take care of the last stragglers.
Thanks for working on this! I've wanted to see this done for a long
time; I'm glad you had the time for it!
-Kees
--
Kees Cook
Pixel Security
On Thu, Mar 30, 2017 at 05:21:32PM -0700, Kees Cook wrote:
> On Tue, Mar 28, 2017 at 10:57 PM, Al Viro <[email protected]> wrote:
> > At the moment I have that conversion done for everything except ia64 and
> > metag. Once everything is converted, I'll remove ARCH_HAS_RAW_COPY_USER
> > and make generic stuff unconditional; at the same point
> > HAVE_ARCH_HARDENED_USERCOPY will be gone (becoming unconditionally true).
>
> Yay! :)
In the meantime, should ARCH_HAS_RAW_COPY_USER select
HAVE_ARCH_HARDENED_USERCOPY?
FWIW I already have patches for metag to enable RAW_COPY_USER that just
need some more testing (mainly due to prerequisite user copy fixes), but
I don't know about ia64.
Cheers
James
On 29/03/17 06:57, Al Viro wrote:
> The series lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
> in #work.uaccess. It's based at 4.11-rc1. Infrastructure is in
> #uaccess.stem, then it splits into per-architecture branches (uaccess.<arch>),
> Comments, review, testing, replacement patches, etc. are very welcome.
For the two "arm64: " patches:
Reviewed-by: James Morse <[email protected]>
Thanks,
James
On Wed, Mar 29, 2017 at 06:57:06AM +0100, Al Viro wrote:
> I hope that infrastructure part is stable enough to put it into never-rebased
> state. Some of per-architecture branches might be even done right; however,
> most of them got no testing whatsoever, so any help with testing (as well
> as "Al, for fuck sake, dump that garbage of yours, here's the correct patch"
> from maintainers) would be very welcome. So would the review, of course.
For the xtensa part:
Tested-by: Max Filippov <[email protected]>
I believe that the xtensa part needs the following correction:
---8<---
From 4505d69c3514fb12405409a7943e45831d037960 Mon Sep 17 00:00:00 2001
From: Max Filippov <[email protected]>
Date: Tue, 4 Apr 2017 13:20:34 -0700
Subject: [PATCH] xtensa: fix prefetch in the raw_copy_to_user
'from' is the input buffer, it should be prefetched with prefetch, not
prefetchw.
Signed-off-by: Max Filippov <[email protected]>
---
arch/xtensa/include/asm/uaccess.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/xtensa/include/asm/uaccess.h b/arch/xtensa/include/asm/uaccess.h
index 8e93ed8..2e7bac0 100644
--- a/arch/xtensa/include/asm/uaccess.h
+++ b/arch/xtensa/include/asm/uaccess.h
@@ -245,7 +245,7 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 static inline unsigned long
 raw_copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-        prefetchw(from);
+        prefetch(from);
         return __xtensa_copy_user((__force void *)to, from, n);
 }
 #define INLINE_COPY_FROM_USER
---8<---
--
Thanks.
-- Max
On Tue, Apr 04, 2017 at 01:26:29PM -0700, Max Filippov wrote:
> On Wed, Mar 29, 2017 at 06:57:06AM +0100, Al Viro wrote:
> > I hope that infrastructure part is stable enough to put it into never-rebased
> > state. Some of per-architecture branches might be even done right; however,
> > most of them got no testing whatsoever, so any help with testing (as well
> > as "Al, for fuck sake, dump that garbage of yours, here's the correct patch"
> > from maintainers) would be very welcome. So would the review, of course.
>
> For the xtensa part:
> Tested-by: Max Filippov <[email protected]>
>
> I believe that the xtensa part needs the following correction:
Applied.
> ---8<---
> >From 4505d69c3514fb12405409a7943e45831d037960 Mon Sep 17 00:00:00 2001
> From: Max Filippov <[email protected]>
> Date: Tue, 4 Apr 2017 13:20:34 -0700
> Subject: [PATCH] xtensa: fix prefetch in the raw_copy_to_user
>
> 'from' is the input buffer, it should be prefetched with prefetch, not
> prefetchw.
>
> Signed-off-by: Max Filippov <[email protected]>
> ---
> arch/xtensa/include/asm/uaccess.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/xtensa/include/asm/uaccess.h b/arch/xtensa/include/asm/uaccess.h
> index 8e93ed8..2e7bac0 100644
> --- a/arch/xtensa/include/asm/uaccess.h
> +++ b/arch/xtensa/include/asm/uaccess.h
> @@ -245,7 +245,7 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
> static inline unsigned long
> raw_copy_to_user(void __user *to, const void *from, unsigned long n)
> {
> - prefetchw(from);
> + prefetch(from);
> return __xtensa_copy_user((__force void *)to, from, n);
> }
> #define INLINE_COPY_FROM_USER
> ---8<---
>
> --
> Thanks.
> -- Max
On Wed, Mar 29, 2017 at 06:57:06AM +0100, Al Viro wrote:
> And again, metag and ia64 parts are simply not there - both architectures
> zero-pad in __copy_from_user_inatomic() and that really needs fixing.
> In case of metag there's __copy_to_user() breakage as well, AFAICS, and
> I've been unable to find any documentation describing the architecture
> wrt exceptions, and that part is apparently fairly weird. In case of
> ia64... I can test mckinley side of things, but not the generic __copy_user()
> and ia64 is about as weird as it gets. With no reliable emulator, at that...
> So these two are up to respective maintainers.
Speaking of ia64: copy_user.S contains the following oddity:
2:
EX(.failure_in3,(p16) ld8 val1[0]=[src1],16)
(p16) ld8 val2[0]=[src2],16
src1 is 16-byte aligned, src2 is src1 + 8.
What guarantees that we can't race with e.g. TLB shootdown from a thread on
another CPU, ending up with the second insn taking a fault and oopsing?
AFAICS, other places where we have such pairs of loads or stores (e.g.
EX(.ex_handler, (p16) ld8 r34=[src0],16)
EK(.ex_handler, (p16) ld8 r38=[src1],16)
in the memcpy_mck.S counterpart of that code) both have exception table
entries associated with them.
Is that one intentional and correct for some subtle reason, or is it a very
narrow race on the hardware nobody gives a damn anymore? It is pre-mckinley
stuff, after all...
On Wed, Apr 05, 2017 at 06:05:08AM +0100, Al Viro wrote:
> Speaking of ia64: copy_user.S contains the following oddity:
> 2:
> EX(.failure_in3,(p16) ld8 val1[0]=[src1],16)
> (p16) ld8 val2[0]=[src2],16
>
> src1 is 16-byte aligned, src2 is src1 + 8.
>
> What guarantees that we can't race with e.g. TLB shootdown from a thread on
> another CPU, ending up with the second insn taking a fault and oopsing?
>
> AFAICS, other places where we have such pairs of loads or stores (e.g.
> EX(.ex_handler, (p16) ld8 r34=[src0],16)
> EK(.ex_handler, (p16) ld8 r38=[src1],16)
> in the memcpy_mck.S counterpart of that code) both have exception table
> entries associated with them.
>
> Is that one intentional and correct for some subtle reason, or is it a very
> narrow race on the hardware nobody gives a damn anymore? It is pre-mckinley
> stuff, after all...
Actually, the piece immediately after that one is worse. By that point,
we have
* checked that len is large enough to be worth bothering with word
copies. Fine.
* checked that src and dst have the same remainder modulo 8.
* copied until src is a multiple of 16, incrementing src and dst
by the same amount.
* prepared for copying in multiples of 16 bytes
* set src2 and dst2 8 bytes past src1 and dst1 resp.
and now we have a pipelined loop with
EX(.failure_in3,(p16) ld8 val1[0]=[src1],16)
(p16) ld8 val2[0]=[src2],16
EX(.failure_out, (EPI) st8 [dst1]=val1[PIPE_DEPTH-1],16)
(EPI) st8 [dst2]=val2[PIPE_DEPTH-1],16
for body. Now, consider the following case:
* to is 8 bytes before the end of user page, next page is unmapped
* from is at the beginning of kernel page
* len is simply PAGE_SIZE
and we call copy_to_user(). All the preparation work won't read or write
anything - all alignments are fine. src1 and src2 are kernel page and
kernel page + 8 resp.; dst1 is 8 bytes before the end of user page, dst2
is at the beginning of unmapped user page. No loads are going to fail;
the first store into dst1 won't fail either. The *second* store - one to
dst2 will not just fail, it'll oops.
<goes to test>
... and sure enough, on generic kernel (CONFIG_ITANIUM) that yields a nice
shiny oops at precisely that insn.
We really need tests for uaccess primitives. That's not a recent regression,
BTW - it had been that way since 2.3.48-pre2, as far as I can see.
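(Editorial aside: an illustrative reproducer for the case described above -
same idea as the larger test module posted later in the thread; names and
error handling are made up, and on a fixed kernel this should simply report
a short copy:)

#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

/*
 * Destination ends 8 bytes into a mapped user page, the next user page is
 * unmapped, source is kernel memory, length is PAGE_SIZE.  On the buggy
 * CONFIG_ITANIUM __copy_user() the second store of the unrolled pair has
 * no exception table entry, so this oopses instead of returning short.
 */
static int __init cfu_repro(void)
{
        void *kbuf = kzalloc(PAGE_SIZE, GFP_KERNEL);
        char __user *up;
        unsigned long uncopied;

        if (!kbuf)
                return -ENOMEM;
        up = (char __user *)vm_mmap(NULL, 0, 2 * PAGE_SIZE,
                                    PROT_READ | PROT_WRITE,
                                    MAP_ANONYMOUS | MAP_PRIVATE, 0);
        if (IS_ERR(up)) {
                kfree(kbuf);
                return -EAGAIN;
        }
        vm_munmap((unsigned long)up + PAGE_SIZE, PAGE_SIZE);

        uncopied = copy_to_user(up + PAGE_SIZE - 8, kbuf, PAGE_SIZE);
        pr_info("copy_to_user() returned, %lu bytes not copied\n", uncopied);

        vm_munmap((unsigned long)up, PAGE_SIZE);
        kfree(kbuf);
        return -EAGAIN;
}
module_init(cfu_repro);
MODULE_LICENSE("GPL");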
On Wed, Apr 5, 2017 at 1:08 AM, Al Viro <[email protected]> wrote:
> ... and sure enough, on generic kernel (CONFIG_ITANIUM) that yields a nice
> shiny oops at precisely that insn.
The right fix here might be to delete all the CONFIG_ITANIUM paths. I
doubt that anyone is still running upstream kernels on Merced CPUs
(and if they are, it might be a kindness to them to make them stop).
> We really need tests for uaccess primitives. That's not a recent regression,
> BTW - it had been that way since 2.3.48-pre2, as far as I can see.
Probably be handy for new architectures to test all the corner cases.
-Tony
On Wed, Apr 05, 2017 at 11:44:23AM -0700, Tony Luck wrote:
> On Wed, Apr 5, 2017 at 1:08 AM, Al Viro <[email protected]> wrote:
> > ... and sure enough, on generic kernel (CONFIG_ITANIUM) that yields a nice
> > shiny oops at precisely that insn.
>
> The right fix here might be to delete all the CONFIG_ITANIUM paths. I
> doubt that anyone is still running upstream kernels on Merced CPUs
> (and if they are, it might be a kindness to them to make them stop).
Frankly, I would be surprised if it turned out that more Merced boxen are
running the current kernels than there had been 386 and 486DLC ones doing
the same in 2012. Granted, the latter bunch had been much older by that
point, but comparing the total amounts sold...
> > We really need tests for uaccess primitives. That's not a recent regression,
> > BTW - it had been that way since 2.3.48-pre2, as far as I can see.
>
> Probably be handy for new architectures to test all the corner cases.
I wouldn't be too optimistic about the existing ones, to be honest. Bitrot
happens, and slight modifications of exception table handling, etc. can
bugger some cases without anyone noticing.
FWIW, I'm running fairly exhaustive tests for mckinley __copy_user()
(after removing zero-padding part), giving it 0..4096 bytes available
until fault, asking to copy 0..4096 bytes and running it with all possible
values of (unsigned long)to % 16. The outermost loop is by number of bytes available;
so far it has got through 351 iterations with no problems found yet. Takes about 7s
per outer loop iteration; that'll go a bit slower as the distance to fault
increases, but not dramatically so - scaffolding includes 8Kb memcmp and a pair
of 8Kb memsets per combination, so the growing cost of __copy_user() (and
matching memcpy()) shouldn't increase it too much. That's just user-to-kernel
side, though...
For the record, the tests being run are as below (c_f_u() is a renamed
copy of __copy_user() with zero-padding taken out):
#define pr_fmt(fmt) "cfu test: %s " fmt, __func__
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/mman.h>

extern unsigned long c_f_u(void *to, const void __user *from, unsigned long n);

static char pat[PAGE_SIZE];
static char cmp[PAGE_SIZE * 2];
static char *kp;
static char __user *up;

/*
 * 'avail' bytes are mapped before the fault point, 'asked' bytes are
 * requested, 'off' is the offset of the kernel destination within kp.
 */
static int run_test(int avail, int asked, int off)
{
        int copied;
        char *p;

        memset(kp, 1, 2 * PAGE_SIZE);
        memset(cmp, 1, 2 * PAGE_SIZE);
        copied = asked - c_f_u(kp + off, up + PAGE_SIZE - avail, asked);
        if (copied < 0 || copied > asked) {
                pr_err("impossible return value: %d not between 0 and %d\n",
                        copied, asked);
                return -1;
        }
        if (avail && asked && !copied) {
                pr_err("no progress (%d available, %d asked, nothing copied)\n",
                        avail, asked);
                return -10;
        }
        if (asked <= avail && copied < asked) {
                pr_err("bogus fault (%d available, %d asked, %d copied)\n",
                        avail, asked, copied);
                return -2;
        }
        if (copied > avail) {
                pr_err("claims to have copied %d with only %d available\n",
                        copied, avail);
                return -3;
        }
        memcpy(cmp + off, pat + PAGE_SIZE - avail, copied);
        if (likely(!memcmp(kp, cmp, 2 * PAGE_SIZE)))
                return 0;
        if (memcmp(kp, cmp, off)) {
                pr_err("modified memory _below_ 'to' (%d, %d, %d => %d)\n",
                        off, avail, asked, copied);
                return -4;
        }
        if (memcmp(kp + off, cmp + off, copied)) {
                char *p;

                pr_err("crap in copy (%d, %d, %d => %d)",
                        off, avail, asked, copied);
                p = memchr(kp + off, 1, copied);
                if (p) {
                        int n = p - (kp + off);

                        memset(cmp + off + n, 1, copied - n);
                        if (!memcmp(kp, cmp, 2 * PAGE_SIZE)) {
                                pr_cont(" only %d copied\n", n);
                                return -5;
                        }
                }
                pr_cont("\n");
                return -6;
        }
        /* must be after the copy... */
        p = kp + off + copied;
        if (!*p) {
                int i, n;

                n = 2 * PAGE_SIZE - off - copied;
                for (i = 0; i < n && !p[i]; i++)
                        ;
                pr_err("crap after copy (%d, %d, %d => %d)",
                        off, avail, asked, copied);
                pr_cont(" padded with %d zeroes\n", i);
                return 0;
        }
        pr_err("crap after copy (%d, %d, %d => %d)\n",
                off, avail, asked, copied);
        return -8;
}

static int __init cfu_test(void)
{
        int i;

        kp = kmalloc(PAGE_SIZE * 2, GFP_KERNEL);
        if (!kp)
                return -EAGAIN;
        /* two user pages; the second gets unmapped to provide the fault */
        up = (char __user *)vm_mmap(NULL, 0, 2 * PAGE_SIZE,
                                    PROT_READ | PROT_WRITE | PROT_EXEC,
                                    MAP_ANONYMOUS | MAP_PRIVATE, 0);
        if (IS_ERR(up)) {
                pr_err("Failed to allocate user memory\n");
                kfree(kp);
                return -EAGAIN;
        }
        vm_munmap((unsigned long)up + PAGE_SIZE, PAGE_SIZE);
        for (i = 0; i < PAGE_SIZE; i++)
                pat[i] = 128 | i;
        if (copy_to_user(up, pat, PAGE_SIZE)) {
                pr_err("failed to copy to user memory\n");
                goto out;
        }
        for (i = 0; i <= 4096; i++) {
                int j;

                pr_err("trying %d\n", i);
                for (j = 0; j <= 4096; j++) {
                        int k;

                        for (k = 0; k < 16; k++) {
                                if (run_test(i, j, k) < 0)
                                        break;
                        }
                }
        }
out:
        vm_munmap((unsigned long)up, PAGE_SIZE);
        kfree(kp);
        return -EAGAIN;
}
module_init(cfu_test);
MODULE_LICENSE("GPL");
Updates since v1:
* metag conversion (based on fixes from James Hogan) added. Result
tested by the aforementioned metag maintainer.
* xtensa fix added, result tested.
* arm, arm64, amd64 tested.
* s390 fix folded, result tested.
* arc fix added, result tested.
* parisc fix replaced with backmerge of the variant in mainline,
result tested.
* ia64 conversion for CONFIG_MCKINLEY added; appears to work.
CONFIG_ITANIUM *not* converted; the current mainline has all kinds
of bugs in that config, including a user-triggerable oops with one
hell of a DoS potential. That one needs to be fixed in -stable, at least
to the point where it wouldn't allow any user to leave the box in a state
when any lookup in /tmp hangs unkillably, but as for the mainline...
Frankly, I suspect that we have fewer Merced boxen running mainline
kernels now than we had 386 and 486DLC ones doing the same five years ago,
when CONFIG_M386 finally got killed. IOW, maybe it's time to put it
out of its misery.
* backmerges of mainline fixes (on ia64, mips, powerpc and parisc
branches) added.
* conversion made unconditional
* HAVE_ARCH_HARDENED_USERCOPY removed (universally true now)
* no object size checks remain in arch/*
* ibmvnic bugs spotted and fixed; that'll get fed into net-next
ASAP.
* balance is at -3KLoC now (OK, -2984LoC)
* the thing is included into #for-next.
The series lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
in #work.uaccess. It's still based at 4.11-rc1 and topology is unchanged,
except for backmerges into arch branches instead of cherry-picking the mainline
fixes into them + a couple of followup commits after the place where branches
converge (making stuff unconditional). Infrastructure part hadn't been
rebased or modified in any way since the previous version; if you are OK
with your architecture branch (uaccess.<arch>) you can say so and it'll
be put into never-rebased mode as well, making it safe to pull into your
tree. Alternatively, if you want to cherry-pick stuff from that branch,
just put it into never-rebased branch in your tree and tell me to pull
it.
As before, comments, review, testing, replacement patches, etc. are very
welcome. Folks, if you don't yell, it will get pushed come next cycle.
I don't believe that a 104-piece mailbomb with a total size of 0.5 megabytes is
a good idea for public lists, but if somebody wants one, just say so.
Or just use git...
FWIW, the current stats are:
Al Viro (100):
uaccess: move VERIFY_{READ,WRITE} definitions to linux/uaccess.h
uaccess: drop duplicate includes from asm/uaccess.h
uaccess: drop pointless ifdefs
add asm-generic/extable.h
new helper: uaccess_kernel()
asm-generic/uaccess.h: don't mess with __copy_{to,from}_user
asm-generic: zero in __get_user(), not __get_user_fn()
generic ...copy_..._user primitives
alpha: switch __copy_user() and __do_clear_user() to normal calling conventions
alpha: add asm/extable.h
alpha: get rid of 'segment' argument of __{get,put}_user_check()
alpha: don't bother with __access_ok() in traps.c
alpha: kill the 'segment' argument of __access_ok()
alpha: add a helper for emitting exception table entries
alpha: switch to RAW_COPY_USER
arc: get rid of unused declaration
arm: switch to generic extable.h
arm: switch to RAW_COPY_USER
arm64: add extable.h
avr32: switch to generic extable.h
arm64: switch to RAW_COPY_USER
avr32: switch to RAW_COPY_USER
blackfin: switch to generic extable.h
bfin: switch to RAW_COPY_USER
c6x: remove duplicate definition of __access_ok
c6x: switch to RAW_COPY_USER
cris: switch to generic extable.h
cris: don't rely upon __copy_user_zeroing() zeroing the tail
cris: get rid of zeroing in __asm_copy_from_user_N for N > 4
cris: get rid of zeroing
cris: rename __copy_user_zeroing to __copy_user_in
cris: switch to RAW_COPY_USER
frv: switch to use of fixup_exception()
frv: switch to RAW_COPY_USER
h8300: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
m32r: switch to generic extable.h
m32r: get rid of zeroing
m68k: switch to generic extable.h
m68k: get rid of zeroing
m68k: switch to RAW_COPY_USER
metag: switch to generic extable.h
metag: kill verify_area()
microblaze: switch to generic extable.h
microblaze: switch to RAW_COPY_USER
mn10300: switch to generic extable.h
mn10300: get rid of zeroing
mn10300: switch to RAW_COPY_USER
nios2: switch to generic extable.h
nios2: switch to RAW_COPY_USER
openrisc: switch to generic extable.h
openrisc: switch to RAW_COPY_USER
powerpc: switch to extable.h
s390: switch to extable.h
score: switch to generic extable.h
score: it's "VERIFY_WRITE", not "VERFITY_WRITE"...
score: switch to RAW_COPY_USER
sh: switch to extable.h
sh: switch to RAW_COPY_USER
sparc32: kill __ret_efault()
tile: switch to generic extable.h
tile: get rid of zeroing, switch to RAW_COPY_USER
um: switch to RAW_COPY_USER
amd64: get rid of zeroing
unicore32: get rid of zeroing and switch to RAW_COPY_USER
kill __copy_from_user_nocache()
xtensa: switch to generic extable.h
xtensa: get rid of zeroing, use RAW_COPY_USER
arc: switch to RAW_COPY_USER
m32r: switch to RAW_COPY_USER
x86: don't wank with magical size in __copy_in_user()
x86: switch to RAW_COPY_USER
s390: get rid of zeroing, switch to RAW_COPY_USER
Merge branch 'parisc-4.11-3' of git://git.kernel.org/.../deller/parisc-linux into uaccess.parisc
parisc: switch to RAW_COPY_USER
sparc: switch to RAW_COPY_USER
Merge branch 'fixes' of git://git.kernel.org/.../jhogan/metag into uaccess.metag
Merge commit 'fc69910f329d' into uaccess.mips
mips: sanitize __access_ok()
mips: consolidate __invoke_... wrappers
mips: clean and reorder the forest of macros...
mips: make copy_from_user() zero tail explicitly
mips: get rid of tail-zeroing in primitives
mips: switch to RAW_COPY_USER
don't open-code kernel_setsockopt()
alpha: fix stack smashing in old_adjtimex(2)
esas2r: don't open-code memdup_user()
ibmvnic: fix kstrtoul, copy_from_user and copy_to_user misuse
Merge commit 'a7d2475af7aedcb9b5c6343989a8bfadbf84429b' into uaccess.powerpc
powerpc: get rid of zeroing, switch to RAW_COPY_USER
Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64
ia64: add extable.h
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: sanitize __access_ok()
ia64: get rid of copy_in_user()
get rid of padding, switch to RAW_COPY_USER
Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
James Hogan (8):
metag/usercopy: Drop unused macros
metag/usercopy: Fix alignment error checking
metag/usercopy: Add early abort to copy_to_user
metag/usercopy: Zero rest of buffer from copy_from_user
metag/usercopy: Set flags before ADDZ
metag/usercopy: Fix src fixup in from user rapf loops
metag/usercopy: Add missing fixups
metag/usercopy: Switch to RAW_COPY_USER
Max Filippov (1):
xtensa: fix prefetch in the raw_copy_to_user
Vineet Gupta (1):
ARC: uaccess: enable INLINE_COPY_{TO,FROM}_USER ...
arch/alpha/include/asm/extable.h | 55 ++++
arch/alpha/include/asm/futex.h | 16 +-
arch/alpha/include/asm/uaccess.h | 305 +++++---------------
arch/alpha/kernel/osf_sys.c | 2 +-
arch/alpha/kernel/traps.c | 152 +++-------
arch/alpha/lib/clear_user.S | 66 ++---
arch/alpha/lib/copy_user.S | 82 +++---
arch/alpha/lib/csum_partial_copy.c | 10 +-
arch/alpha/lib/ev6-clear_user.S | 84 +++---
arch/alpha/lib/ev6-copy_user.S | 104 +++----
arch/arc/include/asm/Kbuild | 1 +
arch/arc/include/asm/uaccess.h | 25 +-
arch/arc/mm/extable.c | 14 -
arch/arm/Kconfig | 1 -
arch/arm/include/asm/Kbuild | 1 +
arch/arm/include/asm/uaccess.h | 87 ++----
arch/arm/lib/uaccess_with_memcpy.c | 4 +-
arch/arm64/Kconfig | 1 -
arch/arm64/include/asm/extable.h | 25 ++
arch/arm64/include/asm/uaccess.h | 83 +-----
arch/arm64/kernel/arm64ksyms.c | 2 +-
arch/arm64/lib/copy_in_user.S | 4 +-
arch/avr32/include/asm/Kbuild | 1 +
arch/avr32/include/asm/uaccess.h | 39 +--
arch/avr32/kernel/avr32_ksyms.c | 2 -
arch/avr32/lib/copy_user.S | 15 -
arch/blackfin/include/asm/Kbuild | 1 +
arch/blackfin/include/asm/uaccess.h | 47 +---
arch/blackfin/kernel/process.c | 2 +-
arch/c6x/include/asm/Kbuild | 1 +
arch/c6x/include/asm/uaccess.h | 19 +-
arch/c6x/kernel/sys_c6x.c | 2 +-
arch/cris/arch-v10/lib/usercopy.c | 31 +--
arch/cris/arch-v32/lib/usercopy.c | 30 +-
arch/cris/include/arch-v10/arch/uaccess.h | 46 ++-
arch/cris/include/arch-v32/arch/uaccess.h | 54 ++--
arch/cris/include/asm/Kbuild | 1 +
arch/cris/include/asm/uaccess.h | 77 +----
arch/frv/include/asm/Kbuild | 1 +
arch/frv/include/asm/uaccess.h | 84 ++----
arch/frv/kernel/traps.c | 7 +-
arch/frv/mm/extable.c | 27 +-
arch/frv/mm/fault.c | 6 +-
arch/h8300/include/asm/Kbuild | 2 +-
arch/h8300/include/asm/uaccess.h | 54 ++++
arch/hexagon/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/uaccess.h | 18 +-
arch/hexagon/kernel/hexagon_ksyms.c | 4 +-
arch/hexagon/mm/copy_from_user.S | 2 +-
arch/hexagon/mm/copy_to_user.S | 2 +-
arch/ia64/Kconfig | 1 -
arch/ia64/include/asm/extable.h | 11 +
arch/ia64/include/asm/uaccess.h | 102 ++-----
arch/ia64/lib/memcpy_mck.S | 13 +-
arch/ia64/mm/extable.c | 5 +-
arch/m32r/include/asm/Kbuild | 1 +
arch/m32r/include/asm/uaccess.h | 189 +------------
arch/m32r/kernel/m32r_ksyms.c | 2 -
arch/m32r/lib/usercopy.c | 21 --
arch/m68k/include/asm/Kbuild | 1 +
arch/m68k/include/asm/processor.h | 10 -
arch/m68k/include/asm/uaccess.h | 1 +
arch/m68k/include/asm/uaccess_mm.h | 103 ++++---
arch/m68k/include/asm/uaccess_no.h | 43 +--
arch/m68k/kernel/signal.c | 2 +-
arch/m68k/kernel/traps.c | 9 +-
arch/m68k/lib/uaccess.c | 12 +-
arch/m68k/mm/fault.c | 2 +-
arch/metag/include/asm/Kbuild | 1 +
arch/metag/include/asm/uaccess.h | 63 +----
arch/metag/lib/usercopy.c | 318 ++++++++-------------
arch/microblaze/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/uaccess.h | 62 +----
arch/mips/Kconfig | 1 -
arch/mips/cavium-octeon/octeon-memcpy.S | 31 +--
arch/mips/include/asm/checksum.h | 4 +-
arch/mips/include/asm/r4kcache.h | 4 +-
arch/mips/include/asm/uaccess.h | 449 ++++--------------------------
arch/mips/kernel/mips-r2-to-r6-emul.c | 24 +-
arch/mips/kernel/syscall.c | 2 +-
arch/mips/kernel/unaligned.c | 10 +-
arch/mips/lib/memcpy.S | 49 ----
arch/mips/oprofile/backtrace.c | 2 +-
arch/mn10300/include/asm/Kbuild | 1 +
arch/mn10300/include/asm/uaccess.h | 187 +------------
arch/mn10300/kernel/mn10300_ksyms.c | 2 -
arch/mn10300/lib/usercopy.c | 18 --
arch/nios2/include/asm/Kbuild | 1 +
arch/nios2/include/asm/uaccess.h | 55 +---
arch/nios2/mm/uaccess.c | 16 +-
arch/openrisc/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/uaccess.h | 53 +---
arch/parisc/Kconfig | 1 -
arch/parisc/include/asm/futex.h | 2 +-
arch/parisc/include/asm/uaccess.h | 69 +----
arch/parisc/lib/memcpy.c | 16 +-
arch/powerpc/Kconfig | 1 -
arch/powerpc/include/asm/extable.h | 29 ++
arch/powerpc/include/asm/uaccess.h | 96 +------
arch/powerpc/lib/Makefile | 2 +-
arch/powerpc/lib/copy_32.S | 14 -
arch/powerpc/lib/copyuser_64.S | 35 +--
arch/powerpc/lib/usercopy_64.c | 41 ---
arch/s390/Kconfig | 1 -
arch/s390/include/asm/extable.h | 28 ++
arch/s390/include/asm/uaccess.h | 153 +---------
arch/s390/lib/uaccess.c | 68 ++---
arch/score/include/asm/Kbuild | 1 +
arch/score/include/asm/extable.h | 11 -
arch/score/include/asm/uaccess.h | 59 +---
arch/sh/include/asm/extable.h | 10 +
arch/sh/include/asm/uaccess.h | 64 +----
arch/sparc/Kconfig | 1 -
arch/sparc/include/asm/uaccess.h | 2 +-
arch/sparc/include/asm/uaccess_32.h | 44 +--
arch/sparc/include/asm/uaccess_64.h | 44 +--
arch/sparc/kernel/head_32.S | 7 -
arch/sparc/lib/GENcopy_from_user.S | 2 +-
arch/sparc/lib/GENcopy_to_user.S | 2 +-
arch/sparc/lib/GENpatch.S | 4 +-
arch/sparc/lib/NG2copy_from_user.S | 2 +-
arch/sparc/lib/NG2copy_to_user.S | 2 +-
arch/sparc/lib/NG2patch.S | 4 +-
arch/sparc/lib/NG4copy_from_user.S | 2 +-
arch/sparc/lib/NG4copy_to_user.S | 2 +-
arch/sparc/lib/NG4patch.S | 4 +-
arch/sparc/lib/NGcopy_from_user.S | 2 +-
arch/sparc/lib/NGcopy_to_user.S | 2 +-
arch/sparc/lib/NGpatch.S | 4 +-
arch/sparc/lib/U1copy_from_user.S | 4 +-
arch/sparc/lib/U1copy_to_user.S | 4 +-
arch/sparc/lib/U3copy_to_user.S | 2 +-
arch/sparc/lib/U3patch.S | 4 +-
arch/sparc/lib/copy_in_user.S | 6 +-
arch/sparc/lib/copy_user.S | 16 +-
arch/tile/include/asm/Kbuild | 1 +
arch/tile/include/asm/uaccess.h | 166 +----------
arch/tile/lib/exports.c | 7 +-
arch/tile/lib/memcpy_32.S | 41 +--
arch/tile/lib/memcpy_user_64.c | 15 +-
arch/um/include/asm/Kbuild | 1 +
arch/um/include/asm/uaccess.h | 13 +-
arch/um/kernel/skas/uaccess.c | 18 +-
arch/unicore32/include/asm/Kbuild | 1 +
arch/unicore32/include/asm/uaccess.h | 15 +-
arch/unicore32/kernel/ksyms.c | 4 +-
arch/unicore32/kernel/process.c | 2 +-
arch/unicore32/lib/copy_from_user.S | 16 +-
arch/unicore32/lib/copy_to_user.S | 6 +-
arch/x86/Kconfig | 1 -
arch/x86/include/asm/uaccess.h | 70 +----
arch/x86/include/asm/uaccess_32.h | 127 +--------
arch/x86/include/asm/uaccess_64.h | 128 +--------
arch/x86/lib/usercopy.c | 54 +---
arch/x86/lib/usercopy_32.c | 288 +------------------
arch/x86/lib/usercopy_64.c | 13 -
arch/xtensa/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/asm-uaccess.h | 3 -
arch/xtensa/include/asm/uaccess.h | 67 +----
arch/xtensa/lib/usercopy.S | 116 ++++----
block/bsg.c | 2 +-
drivers/net/ethernet/ibm/ibmvnic.c | 100 +++----
drivers/scsi/esas2r/esas2r_ioctl.c | 25 +-
drivers/scsi/sg.c | 2 +-
fs/ocfs2/cluster/tcp.c | 25 +-
include/asm-generic/extable.h | 26 ++
include/asm-generic/uaccess.h | 135 +--------
include/linux/uaccess.h | 197 ++++++++++++-
include/rdma/ib.h | 2 +-
kernel/trace/bpf_trace.c | 2 +-
lib/Makefile | 2 +-
lib/iov_iter.c | 6 +-
lib/usercopy.c | 26 ++
mm/memory.c | 2 +-
net/rds/tcp.c | 5 +-
net/rds/tcp_send.c | 8 +-
security/Kconfig | 9 -
security/tomoyo/network.c | 2 +-
178 files changed, 1608 insertions(+), 4592 deletions(-)
create mode 100644 arch/alpha/include/asm/extable.h
create mode 100644 arch/arm64/include/asm/extable.h
create mode 100644 arch/h8300/include/asm/uaccess.h
create mode 100644 arch/ia64/include/asm/extable.h
create mode 100644 arch/powerpc/include/asm/extable.h
delete mode 100644 arch/powerpc/lib/usercopy_64.c
create mode 100644 arch/s390/include/asm/extable.h
delete mode 100644 arch/score/include/asm/extable.h
create mode 100644 arch/sh/include/asm/extable.h
create mode 100644 include/asm-generic/extable.h
create mode 100644 lib/usercopy.c
On Fri, Apr 07, 2017 at 01:24:24AM +0100, Al Viro wrote:
> * ibmvnic bugs spotted and fixed; that'll get fed into net-next
> ASAP.
... except that net-next had them fixed in a much better way - by removing
the crap in question completely. Commit dropped.