2019-10-06 22:23:23

by Guenter Roeck

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sat, May 21, 2016 at 09:59:07PM -0700, Linus Torvalds wrote:
> We really should avoid the "__{get,put}_user()" functions entirely,
> because they can easily be mis-used and the original intent of being
> used for simple direct user accesses no longer holds in a post-SMAP/PAN
> world.
>
> Manually optimizing away the user access range check makes no sense any
> more, when the range check is generally much cheaper than the "enable
> user accesses" code that the __{get,put}_user() functions still need.
>
> So instead of __put_user(), use the unsafe_put_user() interface with
> user_access_{begin,end}() that really does generate better code these
> days, and which is generally a nicer interface. Under some loads, the
> multiple user writes that filldir() does are actually quite noticeable.
>
> This also makes the dirent name copy use unsafe_put_user() with a couple
> of macros. We do not want to make function calls with SMAP/PAN
> disabled, and the code this generates is quite good when the
> architecture uses "asm goto" for unsafe_put_user() like x86 does.
>
> Note that this doesn't bother with the legacy cases. Nobody should use
> them anyway, so performance doesn't really matter there.
>
> Signed-off-by: Linus Torvalds <[email protected]>

Linus,

this patch causes all my sparc64 emulations to stall during boot. It causes
all alpha emulations to crash with [1a] and [1b] when booting from a virtual
disk, and one of the xtensa emulations to crash with [2].

Reverting this patch fixes the problem.

Guenter

---
[1a]

Unable to handle kernel paging request at virtual address 0000000000000004
rcS(47): Oops -1
pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Not tainted
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 = 0000000000000000 t0 = 0000000000000000 t1 = 0000000120117e8b
t2 = 646e617275303253 t3 = 646e617275300000 t4 = 0000000000007fe8
t5 = 0000000120117e78 t6 = 0000000000000000 t7 = fffffc0007ec8000
s0 = fffffc0007dbca56 s1 = 000000000000000a s2 = 0000000000000020
s3 = fffffc0007ecbec8 s4 = 0000000000000008 s5 = 0000000000000021
s6 = 1cd2631fe897bf5a
a0 = fffffc0007dbca56 a1 = 2f2f2f2f2f2f2f2f a2 = 000000000000000a
a3 = 1cd2631fe897bf5a a4 = 0000000000000021 a5 = 0000000000000008
t8 = 0000000000000020 t9 = 0000000000000000 t10= fffffc0007dbca60
t11= 0000000000000001 pv = fffffc0000b9a810 at = 0000000000000001
gp = fffffc0000f03930 sp = (____ptrval____)
Disabling lock debugging due to kernel taint
Trace:
[<fffffc00004e7a08>] call_filldir+0xe8/0x1b0
[<fffffc00004e8684>] ext4_readdir+0x924/0xa70
[<fffffc0000ba3088>] _raw_spin_unlock+0x18/0x30
[<fffffc00003f751c>] __handle_mm_fault+0x9fc/0xc30
[<fffffc0000450c68>] iterate_dir+0x198/0x240
[<fffffc0000450b2c>] iterate_dir+0x5c/0x240
[<fffffc00004518b8>] ksys_getdents64+0xa8/0x160
[<fffffc0000451990>] sys_getdents64+0x20/0x40
[<fffffc0000451280>] filldir64+0x0/0x320
[<fffffc0000311634>] entSys+0xa4/0xc0

---
[1b]

Unable to handle kernel paging request at virtual address 0000000000000004
reboot(50): Oops -1
pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Tainted: G D
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 = 0000000000000000 t0 = 0000000067736d6b t1 = 000000012011445b
t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000007ef8
t5 = 0000000120114448 t6 = 0000000000000000 t7 = fffffc0007eec000
s0 = fffffc000792b5c3 s1 = 0000000000000004 s2 = 0000000000000018
s3 = fffffc0007eefec8 s4 = 0000000000000008 s5 = 00000000f00000a3
s6 = 000000000000000b
a0 = fffffc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0000000000000004
a3 = 000000000000000b a4 = 00000000f00000a3 a5 = 0000000000000008
t8 = 0000000000000018 t9 = 0000000000000000 t10= 0000000022e1d02a
t11= 000000011f8fd3b8 pv = fffffc0000b9a810 at = 0000000022e1ccf8
gp = fffffc0000f03930 sp = (____ptrval____)
Trace:
[<fffffc00004ccba0>] proc_readdir_de+0x170/0x300
[<fffffc0000451280>] filldir64+0x0/0x320
[<fffffc00004c565c>] proc_root_readdir+0x3c/0x80
[<fffffc0000450c68>] iterate_dir+0x198/0x240
[<fffffc00004518b8>] ksys_getdents64+0xa8/0x160
[<fffffc0000451990>] sys_getdents64+0x20/0x40
[<fffffc0000451280>] filldir64+0x0/0x320
[<fffffc0000311634>] entSys+0xa4/0xc0

---
[2]

Unable to handle kernel paging request at virtual address 0000000000000004
reboot(50): Oops -1
pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Tainted: G D
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 = 0000000000000000 t0 = 0000000067736d6b t1 = 000000012011445b
t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000007ef8
t5 = 0000000120114448 t6 = 0000000000000000 t7 = fffffc0007eec000
s0 = fffffc000792b5c3 s1 = 0000000000000004 s2 = 0000000000000018
s3 = fffffc0007eefec8 s4 = 0000000000000008 s5 = 00000000f00000a3
s6 = 000000000000000b
a0 = fffffc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0000000000000004
a3 = 000000000000000b a4 = 00000000f00000a3 a5 = 0000000000000008
t8 = 0000000000000018 t9 = 0000000000000000 t10= 0000000022e1d02a
t11= 000000011fd6f3b8 pv = fffffc0000b9a810 at = 0000000022e1ccf8
gp = fffffc0000f03930 sp = (____ptrval____)
Trace:
[<fffffc00004ccba0>] proc_readdir_de+0x170/0x300
[<fffffc0000451280>] filldir64+0x0/0x320
[<fffffc00004c565c>] proc_root_readdir+0x3c/0x80
[<fffffc0000450c68>] iterate_dir+0x198/0x240
[<fffffc00004518b8>] ksys_getdents64+0xa8/0x160
[<fffffc0000451990>] sys_getdents64+0x20/0x40
[<fffffc0000451280>] filldir64+0x0/0x320
[<fffffc0000311634>] entSys+0xa4/0xc0

Code:
00000000
00063301
000007a3
00001111
00003f64

Segmentation fault


2019-10-06 23:07:31

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck <[email protected]> wrote:
>
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

Ho humm. I've run variations of that patch over a few years on x86,
but obviously not on alpha/sparc.

At least I should still be able to read alpha assembly, even after all
these years. Would you mind sending me the result of

make fs/readdir.s

on alpha with the broken config? I'd hope that the sparc issue is the same.

Actually, could you also do

make fs/readdir.o

and then send me the "objdump --disassemble" of that? That way I get
the instruction offsets without having to count by hand.

> Unable to handle kernel paging request at virtual address 0000000000000004
> rcS(47): Oops -1
> pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Not tainted
> pc is at 0x4

That is _funky_. I'm not seeing how it could possibly jump to 0x4, but
it clearly does.

That said, are you sure it's _that_ commit? Because this pattern:

> a0 = fffffc0007dbca56 a1 = 2f2f2f2f2f2f2f2f a2 = 000000000000000a

implicates the memchr('/') call in the next one. That's a word full of
'/' characters.

Of course, it could just be left-over register contents from that
memchr(), but it makes me wonder. Particularly since it seems to
happen early in filldir64():

> ra is at filldir64+0x64/0x320

which is just a fairly small handful of instructions in, and I
wouldn't be shocked if that's the return address for the call to
memchr.

Linus

2019-10-06 23:37:28

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 4:06 PM Linus Torvalds
<[email protected]> wrote:
>
> Ho humm. I've run variations of that patch over a few years on x86,
> but obviously not on alpha/sparc.

Oooh.

I wonder... This may be the name string copy loop. And it's special in
that the result may not be aligned.

Now, a "__put_user()" with an unaligned address _should_ work - it's
very easy to trigger that from user space by just giving an unaligned
address to any system call that then writes a single word.

But alpha does

#define __put_user_32(x, addr)					\
	__asm__ __volatile__("1:	stl %r2,%1\n"		\
			     "2:\n"				\
			     EXC(1b,2b,$31,%0)			\
			     : "=r"(__pu_err)		\
			     : "m"(__m(addr)), "rJ"(x), "0"(__pu_err))

iow it implements that 32-bit __put_user() as a 'stl'.

Which will trap if it's not aligned.

And I wonder how much testing that has ever gotten. Nobody really does
unaligned accesses on alpha.

We need to do that memcpy unrolling on x86, because x86 actually uses
"user_access_begin()" and we have magic rules about what is inside
that region.

But on alpha (and sparc) it might be better to just do "__copy_to_user()".

Anyway, this does look like a possible latent bug where the alpha
unaligned trap doesn't then handle the case of exceptions. I know it
_tries_, but I doubt it's gotten a whole lot of testing.

Anyway, let me think about this, but just for testing, does the
attached patch make any difference? It's not the right thing in
general (and most definitely not on x86), but for testing whether this
is about unaligned accesses it might work.

It's entirely untested, and in fact on x86 it should cause objtool to
complain about a function call with AC set. But I think that on alpha
and sparc, using __copy_to_user() for the name copy should work, and
would work around the unaligned issue.

That said, if it *is* the unaligned issue, then that just means that
we have a serious bug elsewhere in the alpha port. Maybe nobody cares.

Linus


Attachments:
patch.diff (658.00 B)

2019-10-07 00:05:06

by Guenter Roeck

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On 10/6/19 4:35 PM, Linus Torvalds wrote:
[ ... ]

> Anyway, let me think about this, but just for testing, does the
> attached patch make any difference? It's not the right thing in
> general (and most definitely not on x86), but for testing whether this
> is about unaligned accesses it might work.
>

All my alpha, sparc64, and xtensa tests pass with the attached patch
applied on top of v5.4-rc2. I didn't test any others.

I'll (try to) send you some disassembly next.

Guenter

2019-10-07 00:24:01

by Guenter Roeck

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 06, 2019 at 04:06:16PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck <[email protected]> wrote:
> >
> > this patch causes all my sparc64 emulations to stall during boot. It causes
> > all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> > disk, and one of the xtensa emulations to crash with [2].
>
> Ho humm. I've run variations of that patch over a few years on x86,
> but obviously not on alpha/sparc.
>
> At least I should still be able to read alpha assembly, even after all
> these years. Would you mind sending me the result of
>
> make fs/readdir.s
>
> on alpha with the broken config? I'd hope that the sparc issue is the same.
>
> Actually, could you also do
>
> make fs/readdir.o
>
> and then send me the "objdump --disassemble" of that? That way I get
> the instruction offsets without having to count by hand.
>

Both attached for alpha.

> > Unable to handle kernel paging request at virtual address 0000000000000004
> > rcS(47): Oops -1
> > pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Not tainted
> > pc is at 0x4
>
> That is _funky_. I'm not seeing how it could possibly jump to 0x4, but
> it clearly does.
>
> That said, are you sure it's _that_ commit? Because this pattern:
>
Bisect on sparc pointed to this commit, and re-running the tests with
the commit reverted passed for all architectures. I didn't check any
further.

Please let me know if you need anything else at this point.

Thanks,
Guenter


Attachments:
(No filename) (1.54 kB)
readdir.s (65.86 kB)
readdir.s.objdump (32.19 kB)

2019-10-07 01:18:18

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck <[email protected]> wrote:
>
> All my alpha, sparc64, and xtensa tests pass with the attached patch
> applied on top of v5.4-rc2. I didn't test any others.

Okay... I really wish my guess had been wrong.

Because fixing filldir64 isn't the problem. I can come up with
multiple ways to avoid the unaligned issues if that was the problem.

But it does look to me like the fundamental problem is that unaligned
__put_user() calls might just be broken on alpha (and likely sparc
too). Because that looks to be the only difference between the
__copy_to_user() approach and using unsafe_put_user() in a loop.

Now, I should have handled unaligned things differently in the first
place, and in that sense I think commit 9f79b78ef744 ("Convert
filldir[64]() from __put_user() to unsafe_put_user()") really is
non-optimal on architectures with alignment issues.

And I'll fix it.

But at the same time, the fact that "non-optimal" turns into "doesn't
work" is a fairly nasty issue.

> I'll (try to) send you some disassembly next.

Thanks, verified.

The "ra is at filldir64+0x64/0x320" is indeed right at the return
point of the "jsr verify_dirent_name".

But the problem isn't there - that's just left-over state. I'm pretty
sure that function worked fine, and returned.

The problem is that "pc is at 0x4" and the page fault that then
happens at that address as a result.

And that seems to be due to this:

8c0: 00 00 29 2c ldq_u t0,0(s0)
8c4: 07 00 89 2c ldq_u t3,7(s0)
8c8: 03 04 e7 47 mov t6,t2
8cc: c1 06 29 48 extql t0,s0,t0
8d0: 44 0f 89 48 extqh t3,s0,t3
8d4: 01 04 24 44 or t0,t3,t0
8d8: 00 00 22 b4 stq t0,0(t1)

that's the "get_unaligned((type *)src)" (the first six instructions)
followed by the "unsafe_put_user()" done with a single "stq". That's
the guts of the unsafe_copy_loop() as part of
unsafe_copy_dirent_name()

And what I think happens is that it is writing to user memory that is

(a) unaligned

(b) not currently mapped in user space

so then the do_entUna() function tries to handle the unaligned trap,
but then it takes an exception while doing that (due to the unmapped
page), and then something in that nested exception mess causes it to
mess up badly and corrupt the register contents on stack, and it
returns with garbage in 'pc', and then you finally die with that

Unable to handle kernel paging request at virtual address 0000000000000004
pc is at 0x4

thing.

And yes, I'll fix that name copy loop in filldir to align the
destination first, *but* if I'm right, it means that something like
this should also likely cause issues:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	void *mymap;
	uid_t *bad_ptr = (void *) 0x01;

	/* Create unpopulated memory area */
	mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Unaligned uid pointer in that memory area */
	bad_ptr = mymap + 1;

	/* Make the kernel do put_user() on it */
	return getresuid(bad_ptr, bad_ptr + 1, bad_ptr + 2);
}

because that simple user mode program should cause that same "page
fault on unaligned put_user()" behavior as far as I can tell.

Mind humoring me and trying that on your alpha machine (or emulator,
or whatever)?

Linus

2019-10-07 01:26:11

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 06, 2019 at 06:17:02PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck <[email protected]> wrote:
> >
> > All my alpha, sparc64, and xtensa tests pass with the attached patch
> > applied on top of v5.4-rc2. I didn't test any others.
>
> Okay... I really wish my guess had been wrong.
>
> Because fixing filldir64 isn't the problem. I can come up with
> multiple ways to avoid the unaligned issues if that was the problem.
>
> But it does look to me like the fundamental problem is that unaligned
> __put_user() calls might just be broken on alpha (and likely sparc
> too). Because that looks to be the only difference between the
> __copy_to_user() approach and using unsafe_put_user() in a loop.
>
> Now, I should have handled unaligned things differently in the first
> place, and in that sense I think commit 9f79b78ef744 ("Convert
> filldir[64]() from __put_user() to unsafe_put_user()") really is
> non-optimal on architectures with alignment issues.
>
> And I'll fix it.

Ugh... I wonder if it would be better to lift STAC/CLAC out of
raw_copy_to_user(), rather than trying to reinvent its guts
in readdir.c...

2019-10-07 02:07:09

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 6:24 PM Al Viro <[email protected]> wrote:
>
> Ugh... I wonder if it would be better to lift STAC/CLAC out of
> raw_copy_to_user(), rather than trying to reinvent its guts
> in readdir.c...

Yeah, I suspect that's the best option.

Do something like

- lift STAC/CLAC out of raw_copy_to_user

- rename it to unsafe_copy_to_user

- create a new raw_copy_to_user that is just unsafe_copy_to_user()
with the STAC/CLAC around it.

and the end result would actually be cleaner than what we have now
(which duplicates that STAC/CLAC for each size case etc).

And then for the "architecture doesn't have user_access_begin/end()"
fallback case, we just do

#define unsafe_copy_to_user raw_copy_to_user

and the only slight pain point is that we need to deal with that
copy_user_generic() case too.

We'd have to mark it uaccess_safe in objtool (but we already have that
for __memcpy_mcsafe and csum_partial_copy_generic, so it all makes sense),
and we'd have to make all the other copy_user_generic() cases then do
the CLAC/STAC dance too or something.

ANYWAY. As mentioned, I'm not actually all that worried about this all.

I could easily also just see the filldir() copy do an extra

#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	if (len && (1 & (uintptr_t)dst)) .. copy byte ..
	if (len > 1 && (2 & (uintptr_t)dst)) .. copy word ..
	if (len > 3 && (4 & (uintptr_t)dst) && sizeof(unsigned long) > 4)
		.. copy dword ..
#endif

at the start to align the destination.

The filldir code is actually somewhat unusual in that it deals with
pretty small strings on average, so just doing this might be more
efficient anyway.

So that doesn't worry me. Multiple ways to solve that part.

The "uhhuh, unaligned accesses cause more than performance problems" -
that's what worries me.

Linus

2019-10-07 02:33:54

by Guenter Roeck

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On 10/6/19 6:17 PM, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck <[email protected]> wrote:
[ ... ]
> And yes, I'll fix that name copy loop in filldir to align the
> destination first, *but* if I'm right, it means that something like
> this should also likely cause issues:
>
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <sys/mman.h>
>
> int main(int argc, char **argv)
> {
> 	void *mymap;
> 	uid_t *bad_ptr = (void *) 0x01;
>
> 	/* Create unpopulated memory area */
> 	mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE,
> 		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> 	/* Unaligned uid pointer in that memory area */
> 	bad_ptr = mymap + 1;
>
> 	/* Make the kernel do put_user() on it */
> 	return getresuid(bad_ptr, bad_ptr + 1, bad_ptr + 2);
> }
>
> because that simple user mode program should cause that same "page
> fault on unaligned put_user()" behavior as far as I can tell.
>
> Mind humoring me and trying that on your alpha machine (or emulator,
> or whatever)?
>

Here you are. This is with v5.4-rc2 and your previous patch applied
on top.

/ # ./mmtest
Unable to handle kernel paging request at virtual address 0000000000000004
mmtest(75): Oops -1
pc = [<0000000000000004>] ra = [<fffffc0000311584>] ps = 0000 Not tainted
pc is at 0x4
ra is at entSys+0xa4/0xc0
v0 = fffffffffffffff2 t0 = 0000000000000000 t1 = 0000000000000000
t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000000000
t5 = 000000000000fffe t6 = 0000000000000000 t7 = fffffc0007edc000
s0 = 0000000000000000 s1 = 00000001200006f0 s2 = 00000001200df19f
s3 = 00000001200ea0b9 s4 = 0000000120114630 s5 = 00000001201145d8
s6 = 000000011f955c50
a0 = 000002000002c001 a1 = 000002000002c005 a2 = 000002000002c009
a3 = 0000000000000000 a4 = ffffffffffffffff a5 = 0000000000000000
t8 = 0000000000000000 t9 = fffffc0000000000 t10= 0000000000000000
t11= 000000011f955788 pv = fffffc0000349450 at = 00000000f8db54d3
gp = fffffc0000f2a160 sp = 00000000ab237c72
Disabling lock debugging due to kernel taint
Trace:

Code:
00000000
00063301
000007b6
00001111
00003f8d

Segmentation fault

Guenter

2019-10-07 02:52:27

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 06, 2019 at 07:06:19PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 6:24 PM Al Viro <[email protected]> wrote:
> >
> > Ugh... I wonder if it would be better to lift STAC/CLAC out of
> > raw_copy_to_user(), rather than trying to reinvent its guts
> > in readdir.c...
>
> Yeah, I suspect that's the best option.
>
> Do something like
>
> - lift STAC/CLAC out of raw_copy_to_user
>
> - rename it to unsafe_copy_to_user
>
> - create a new raw_copy_to_user that is just unsafe_copy_to_user()
> with the STAC/CLAC around it.
>
> and the end result would actually be cleanert than what we have now
> (which duplicates that STAC/CLAC for each size case etc).
>
> And then for the "architecture doesn't have user_access_begin/end()"
> fallback case, we just do
>
> #define unsafe_copy_to_user raw_copy_to_user

Callers of raw_copy_to_user():
arch/hexagon/mm/uaccess.c:27: uncleared = raw_copy_to_user(dest, &empty_zero_page, PAGE_SIZE);
arch/hexagon/mm/uaccess.c:34: count = raw_copy_to_user(dest, &empty_zero_page, count);
arch/powerpc/kvm/book3s_64_mmu_radix.c:68: ret = raw_copy_to_user(to, from, n);
arch/s390/include/asm/uaccess.h:150: size = raw_copy_to_user(ptr, x, size);
include/asm-generic/uaccess.h:145: return unlikely(raw_copy_to_user(ptr, x, size)) ? -EFAULT : 0;
include/linux/uaccess.h:93: return raw_copy_to_user(to, from, n);
include/linux/uaccess.h:102: return raw_copy_to_user(to, from, n);
include/linux/uaccess.h:131: n = raw_copy_to_user(to, from, n);
lib/iov_iter.c:142: n = raw_copy_to_user(to, from, n);
lib/usercopy.c:28: n = raw_copy_to_user(to, from, n);


Out of those, only __copy_to_user_inatomic(), __copy_to_user(),
_copy_to_user() and iov_iter.c:copyout() can be called on
any architecture.

The last two should just do user_access_begin()/user_access_end()
instead of access_ok(). __copy_to_user_inatomic() has very few callers as well:

arch/mips/kernel/unaligned.c:1307: res = __copy_to_user_inatomic(addr, fpr, sizeof(*fpr));
drivers/gpu/drm/i915/i915_gem.c:345: unwritten = __copy_to_user_inatomic(user_data,
lib/test_kasan.c:471: unused = __copy_to_user_inatomic(usermem, kmem, size + 1);
mm/maccess.c:98: ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);

So few, in fact, that I wonder if we want to keep it at all; the only
thing stopping me from "let's remove it" is that I don't understand
the i915 side of things. Where does it do an equivalent of access_ok()?

And mm/maccess.c one is __probe_kernel_write(), so presumably we don't
want stac/clac there at all...

So do we want to bother with separation between raw_copy_to_user() and
unsafe_copy_to_user()? After all, __copy_to_user() also has only few
callers, most of them in arch/*

I'll take a look into that tomorrow - half-asleep right now...

2019-10-07 03:13:33

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 7:50 PM Al Viro <[email protected]> wrote:
>
> Out of those, only __copy_to_user_inatomic(), __copy_to_user(),
> _copy_to_user() and iov_iter.c:copyout() can be called on
> any architecture.
>
> The last two should just do user_access_begin()/user_access_end()
> instead of access_ok(). __copy_to_user_inatomic() has very few callers as well:

Yeah, good points.

It looks like it would be better to just change over semantics
entirely to the unsafe_copy_user() model.

> So few, in fact, that I wonder if we want to keep it at all; the only
> thing stopping me from "let's remove it" is that I don't understand
> the i915 side of things. Where does it do an equivalent of access_ok()?

Honestly, if you have to ask, I think the answer is: just add one.

Every single time we've had people who optimized things to try to
avoid the access_ok(), they just caused bugs and problems.

In this case, I think it's done a few callers up in i915_gem_pread_ioctl():

if (!access_ok(u64_to_user_ptr(args->data_ptr),
args->size))
return -EFAULT;

but honestly, trying to optimize away another "access_ok()" is just
not worth it. I'd rather have an extra one than miss one.

> And mm/maccess.c one is __probe_kernel_write(), so presumably we don't
> want stac/clac there at all...

Yup.

> So do we want to bother with separation between raw_copy_to_user() and
> unsafe_copy_to_user()? After all, __copy_to_user() also has only few
> callers, most of them in arch/*

No, you're right. Just switch over.

> I'll take a look into that tomorrow - half-asleep right now...

Thanks. No huge hurry.

Linus

2019-10-07 03:15:28

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 7:30 PM Guenter Roeck <[email protected]> wrote:
>
> > Mind humoring me and trying that on your alpha machine (or emulator,
> > or whatever)?
>
> Here you are. This is with v5.4-rc2 and your previous patch applied
> on top.
>
> / # ./mmtest
> Unable to handle kernel paging request at virtual address 0000000000000004

Oookay.

Well, that's what I expected, but it's good to just have it confirmed.

Well, not "good" in this case. Bad bad bad.

The fs/readdir.c changes clearly exposed a pre-existing bug on alpha.
Not making excuses for it, but at least it explains why code that
_looks_ correct ends up causing that kind of issue.

I guess the other 'strict alignment' architectures should be checking
that test program too. I'll post my test program to the arch
maintainers list.

Linus

2019-10-07 04:13:01

by Max Filippov

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck <[email protected]> wrote:
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

[...]

> [2]
>
> Unable to handle kernel paging request at virtual address 0000000000000004
> reboot(50): Oops -1
> pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Tainted: G D
> pc is at 0x4
> ra is at filldir64+0x64/0x320
> v0 = 0000000000000000 t0 = 0000000067736d6b t1 = 000000012011445b
> t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000007ef8
> t5 = 0000000120114448 t6 = 0000000000000000 t7 = fffffc0007eec000
> s0 = fffffc000792b5c3 s1 = 0000000000000004 s2 = 0000000000000018
> s3 = fffffc0007eefec8 s4 = 0000000000000008 s5 = 00000000f00000a3
> s6 = 000000000000000b
> a0 = fffffc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0000000000000004
> a3 = 000000000000000b a4 = 00000000f00000a3 a5 = 0000000000000008
> t8 = 0000000000000018 t9 = 0000000000000000 t10= 0000000022e1d02a
> t11= 000000011fd6f3b8 pv = fffffc0000b9a810 at = 0000000022e1ccf8
> gp = fffffc0000f03930 sp = (____ptrval____)
> Trace:
> [<fffffc00004ccba0>] proc_readdir_de+0x170/0x300
> [<fffffc0000451280>] filldir64+0x0/0x320
> [<fffffc00004c565c>] proc_root_readdir+0x3c/0x80
> [<fffffc0000450c68>] iterate_dir+0x198/0x240
> [<fffffc00004518b8>] ksys_getdents64+0xa8/0x160
> [<fffffc0000451990>] sys_getdents64+0x20/0x40
> [<fffffc0000451280>] filldir64+0x0/0x320
> [<fffffc0000311634>] entSys+0xa4/0xc0

This doesn't look like a dump from an xtensa core.
v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace
doesn't work well, because all directories appear to be empty.

__put_user/__get_user don't do unaligned access on xtensa,
they check address alignment and return -EFAULT if it's bad.

--
Thanks.
-- Max

2019-10-07 12:20:42

by Guenter Roeck

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

Hi Max,

On 10/6/19 9:04 PM, Max Filippov wrote:
> On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck <[email protected]> wrote:
>> this patch causes all my sparc64 emulations to stall during boot. It causes
>> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
>> disk, and one of the xtensa emulations to crash with [2].
>
> [...]
>
>> [2]
>>
>> Unable to handle kernel paging request at virtual address 0000000000000004
>> reboot(50): Oops -1
>> pc = [<0000000000000004>] ra = [<fffffc00004512e4>] ps = 0000 Tainted: G D
>> pc is at 0x4
>> ra is at filldir64+0x64/0x320
>> v0 = 0000000000000000 t0 = 0000000067736d6b t1 = 000000012011445b
>> t2 = 0000000000000000 t3 = 0000000000000000 t4 = 0000000000007ef8
>> t5 = 0000000120114448 t6 = 0000000000000000 t7 = fffffc0007eec000
>> s0 = fffffc000792b5c3 s1 = 0000000000000004 s2 = 0000000000000018
>> s3 = fffffc0007eefec8 s4 = 0000000000000008 s5 = 00000000f00000a3
>> s6 = 000000000000000b
>> a0 = fffffc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0000000000000004
>> a3 = 000000000000000b a4 = 00000000f00000a3 a5 = 0000000000000008
>> t8 = 0000000000000018 t9 = 0000000000000000 t10= 0000000022e1d02a
>> t11= 000000011fd6f3b8 pv = fffffc0000b9a810 at = 0000000022e1ccf8
>> gp = fffffc0000f03930 sp = (____ptrval____)
>> Trace:
>> [<fffffc00004ccba0>] proc_readdir_de+0x170/0x300
>> [<fffffc0000451280>] filldir64+0x0/0x320
>> [<fffffc00004c565c>] proc_root_readdir+0x3c/0x80
>> [<fffffc0000450c68>] iterate_dir+0x198/0x240
>> [<fffffc00004518b8>] ksys_getdents64+0xa8/0x160
>> [<fffffc0000451990>] sys_getdents64+0x20/0x40
>> [<fffffc0000451280>] filldir64+0x0/0x320
>> [<fffffc0000311634>] entSys+0xa4/0xc0
>
> This doesn't look like a dump from xtensa core.
> v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace
> doesn't work well, because all directories appear to be empty.
>
> __put_user/__get_user don't do unaligned access on xtensa,
> they check address alignment and return -EFAULT if it's bad.
>
You are right, sorry; I must have mixed that up. xtensa doesn't crash.
The boot stalls, similar to sparc64. This is only seen with my nommu
test (de212:kc705-nommu:nommu_kc705_defconfig). xtensa mmu tests are fine,
at least for me, but then I only run tests with initrd (which for some
reason doesn't crash on alpha either).

Guenter

2019-10-07 15:43:57

by David Laight

Subject: RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

> From: Linus Torvalds
> Sent: 07 October 2019 04:12
...
> In this case, I think it's done a few callers up in i915_gem_pread_ioctl():
>
> if (!access_ok(u64_to_user_ptr(args->data_ptr),
> args->size))
> return -EFAULT;
>
> but honestly, trying to optimize away another "access_ok()" is just
> not worth it. I'd rather have an extra one than miss one.

You don't really want an extra access_ok() for every 'word' of a copy.
Some copies have to be done a word at a time.

And the checks someone added to copy_to/from_user() to detect kernel
buffer overruns must kill performance when the buffers are way down the stack
or in kmalloc()ed space.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2019-10-07 17:35:08

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 06, 2019 at 08:11:42PM -0700, Linus Torvalds wrote:

> > So do we want to bother with separation between raw_copy_to_user() and
> > unsafe_copy_to_user()? After all, __copy_to_user() also has only a few
> > callers, most of them in arch/*
>
> No, you're right. Just switch over.
>
> > I'll take a look into that tomorrow - half-asleep right now...
>
> Thanks. No huge hurry.

Tangentially related: copy_regset_to_user() and copy_regset_from_user().
That's where we do access_ok(), followed by calls of ->get() and
->set() resp. Those tend to either use user_regset_copy{out,in}(),
or open-code those. The former variant tends to lead to few calls
of __copy_{to,from}_user(); the latter... On x86 it ends up doing
this:
static int genregs_get(struct task_struct *target,
                       const struct user_regset *regset,
                       unsigned int pos, unsigned int count,
                       void *kbuf, void __user *ubuf)
{
        if (kbuf) {
                unsigned long *k = kbuf;
                while (count >= sizeof(*k)) {
                        *k++ = getreg(target, pos);
                        count -= sizeof(*k);
                        pos += sizeof(*k);
                }
        } else {
                unsigned long __user *u = ubuf;
                while (count >= sizeof(*u)) {
                        if (__put_user(getreg(target, pos), u++))
                                return -EFAULT;
                        count -= sizeof(*u);
                        pos += sizeof(*u);
                }
        }

        return 0;
}

Potentially doing arseloads of stac/clac as it goes. OTOH, getreg()
(and setreg()) in there are not entirely trivial, so blanket
user_access_begin()/user_access_end() over the entire loop might be
a bad idea...

How hot is that codepath? I know that arch/um used to rely on it
(== PTRACE_[GS]ETREGS) quite a bit...

2019-10-07 18:14:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 8:40 AM David Laight <[email protected]> wrote:
>
> You don't really want an extra access_ok() for every 'word' of a copy.

Yes you do.

> Some copies have to be done a word at a time.

Completely immaterial. If you can't see the access_ok() close to the
__get/put_user(), you have a bug.

Plus the access_ok() is cheap. The real cost is the STAC/CLAC.

So stop with the access_ok() "optimizations". They are broken garbage.

Really.

I've been very close to just removing __get_user/__put_user several
times, exactly because people do completely the wrong thing with them
- not speeding code up, but making it unsafe and buggy.

The new "user_access_begin/end()" model is much better, but it also
has actual STATIC checking that there are no function calls etc inside
the region, so it forces you to do the loop properly and tightly, and
not the incorrect "I checked the range somewhere else, now I'm doing
an unsafe copy".

And it actually speeds things up, unlike the access_ok() games.

Linus
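
A minimal userspace sketch of the pattern described above: one range check and one STAC/CLAC pair around a tight loop, instead of one pair per word. The macros here are illustrative stand-ins, not the real x86 implementation (the real user_access_begin() does the range check plus STAC, and unsafe_put_user() branches to the error label on a fault via "asm goto").

```c
#include <stddef.h>

static int stac_count, clac_count;      /* count the expensive transitions */

#define user_access_begin(ptr, len)     (stac_count++, 1)
#define user_access_end()               (clac_count++)
#define unsafe_put_user(val, ptr, label) \
        do { if (0) goto label; *(ptr) = (val); } while (0)

/* One STAC/CLAC pair covers the whole loop, not one pair per word. */
static int copy_words_to_user(unsigned long *dst, const unsigned long *src,
                              size_t n)
{
        size_t i;

        if (!user_access_begin(dst, n * sizeof(*dst)))
                return -1;              /* -EFAULT in the kernel */
        for (i = 0; i < n; i++)
                unsafe_put_user(src[i], &dst[i], efault);
        user_access_end();
        return 0;
efault:
        user_access_end();
        return -1;
}
```

The point of the shape is that stac_count/clac_count stay at 1 regardless of how many words are copied, which is where the speedup over per-word __put_user() comes from.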

2019-10-07 18:14:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 10:34 AM Al Viro <[email protected]> wrote:
>
> Tangentially related: copy_regset_to_user() and copy_regset_from_user().

Not a worry. It's not performance-critical code, and if it ever is, it
needs to be rewritten anyway.


> The former variant tends to lead to few calls
> of __copy_{to,from}_user(); the latter... On x86 it ends up doing
> this:

Just replace the __put_user() with a put_user() and be done with it.
That code isn't acceptable, and if somebody ever complains about
performance it's not the lack of __put_user that is the problem.

Linus

2019-10-07 18:23:03

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 11:13:27AM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 10:34 AM Al Viro <[email protected]> wrote:
> >
> > Tangentially related: copy_regset_to_user() and copy_regset_from_user().
>
> Not a worry. It's not performance-critical code, and if it ever is, it
> needs to be rewritten anyway.
>
> > The former variant tends to lead to few calls
> > of __copy_{to,from}_user(); the latter... On x86 it ends up doing
> > this:
>
> Just replace the __put_user() with a put_user() and be done with it.
> That code isn't acceptable, and if somebody ever complains about
> performance it's not the lack of __put_user that is the problem.

I wonder if it would be better off switching to several "copy in bulk" helpers
like e.g. ppc does. That boilerplate with parallel "to/from kernel"
and "to/from userland" loops is asking for bugs - the calling
conventions like "pass kbuf and ubuf; exactly one must be NULL"
tend to be trouble, IME; I'm not saying we should just pass
struct iov_iter * instead of count+pos+kbuf+ubuf to ->get() and
->set(), but it might clean the things up nicely.

Let me look into that zoo a bit more...

2019-10-07 18:27:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds
<[email protected]> wrote:
>
> >
> > The last two should just do user_access_begin()/user_access_end()
> > instead of access_ok(). __copy_to_user_inatomic() has very few callers as well:
>
> Yeah, good points.

Looking at it some more this morning, I think it's actually pretty painful.

The good news is that right now x86 is the only architecture that does
that user_access_begin(), so we don't need to worry about anything
else. Apparently the ARM people haven't had enough performance
problems with the PAN bit for them to care.

We can have a fallback wrapper for unsafe_copy_to_user() for other
architectures that just does the __copy_to_user().

But on x86, if we move the STAC/CLAC out of the low-level copy
routines and into the callers, we'll have a _lot_ of churn. I thought
it would be mostly a "teach objtool" thing, but we have lots of
different versions of it. Not just the 32-bit vs 64-bit, it's embedded
in all the low-level asm implementations.

And we don't want the regular "copy_to/from_user()" to then have to
add the STAC/CLAC at the call-site. So then we'd want to un-inline
copy_to_user() entirely.

Which all sounds like a really good idea, don't get me wrong. I think
we inline it way too aggressively now. But it's a _big_ job.

So we probably _should_

- remove INLINE_COPY_TO/FROM_USER

- remove all the "small constant size special cases".

- make "raw_copy_to/from_user()" have the "unsafe" semantics and make
the out-of-line copy in lib/usercopy.c be the only real interface

- get rid of a _lot_ of oddities

but looking at just how much churn this is, I suspect that for 5.4
it's a bit late to do quite that much cleanup.

I hope you prove me wrong. But I'll look at a smaller change to just
make x86 use the current special copy loop (as
"unsafe_copy_to_user()") and have everybody else do the trivial
wrapper.

Because we definitely should do that cleanup (it also fixes the whole
"atomic copy in kernel space" issue that you pointed to that doesn't
actually want STAC/CLAC at all), but it just looks fairly massive to
me.

Linus

2019-10-07 18:37:28

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 11:28 AM Linus Torvalds
<[email protected]> wrote:
>
> On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > >
> > > The last two should just do user_access_begin()/user_access_end()
> > > instead of access_ok(). __copy_to_user_inatomic() has very few callers as well:
> >
> > Yeah, good points.
>
> Looking at it some more this morning, I think it's actually pretty painful.

Late to this party ... but my ia64 console today is full of:

irqbalance(5244): unaligned access to 0x2000000800042f9b, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042fbb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042fdb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042ffb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x200000080004301b, ip=0xa0000001002fef90
ia64_handle_unaligned: 95 callbacks suppressed
irqbalance(5244): unaligned access to 0x2000000800042f9b, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042fbb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042fdb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x2000000800042ffb, ip=0xa0000001002fef90
irqbalance(5244): unaligned access to 0x200000080004301b, ip=0xa0000001002fef90
ia64_handle_unaligned: 95 callbacks suppressed

Those ip's point into filldir64()

-Tony

2019-10-07 19:10:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 11:36 AM Tony Luck <[email protected]> wrote:
>
> Late to this party ... but my ia64 console today is full of:

Hmm? I thought ia64 did unaligneds ok.

But regardless, this is my current (as yet untested) patch. This is
not the big user access cleanup that I hope Al will do, this is just a
"ok, x86 is the only one who wants a special unsafe_copy_to_user()
right now" patch.

Linus


Attachments:
patch.diff (4.50 kB)

2019-10-07 19:22:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck <[email protected]> wrote:
>
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

So I think your alpha emulation environment may be broken, because
Michael Cree reports that it works for him on real hardware, but he
does see the kernel unaligned count being high.

But regardless, this is my current fairly minimal patch that I think
should fix the unaligned issue, while still giving the behavior we
want on x86. I hope Al can do something nicer, but I think this is
"acceptable".

I'm running this now on x86, and I verified that x86-32 code
generation looks sane too, but it would be good to verify that this
makes the alignment issue go away on other architectures.

Linus


Attachments:
patch.diff (4.50 kB)

2019-10-07 19:50:05

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 12:09 PM Linus Torvalds
<[email protected]> wrote:
> Hmm? I thought ia64 did unaligneds ok.

If PSR.ac is set, we trap. If it isn't set, behavior is model specific
(though all implementations will trap for an unaligned access that
crosses a 4K boundary).

Linux sets PSR.ac. Applications can use prctl(PR_SET_UNALIGN) to choose whether
they want the kernel to silently fix things or to send SIGBUS.

Kernel always noisily (rate limited) fixes up unaligned access.

Your patch does make all the messages go away.

Tested-by: Tony Luck <[email protected]>

2019-10-07 20:05:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 12:49 PM Tony Luck <[email protected]> wrote:
>
> If PSR.ac is set, we trap. If it isn't set, then model specific
> (though all implementations will
> trap for an unaligned access that crosses a 4K boundary).

Ok. At that point, setting AC unconditionally is the better model just
to get test coverage for "it will trap occasionally anyway".

Odd "almost-but-not-quite x86" both in naming and in behavior (AC was
a no-op in kernel-mode until SMAP).

> Your patch does make all the messages go away.
>
> Tested-by: Tony Luck <[email protected]>

Ok, I'll commit it, and we'll see what Al can come up with that might
be a bigger cleanup.

Linus

2019-10-07 20:29:42

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 12:21:25PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck <[email protected]> wrote:
> >
> > this patch causes all my sparc64 emulations to stall during boot. It causes
> > all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> > disk, and one of the xtensa emulations to crash with [2].
>
> So I think your alpha emulation environment may be broken, because
> Michael Cree reports that it works for him on real hardware, but he
> does see the kernel unaligned count being high.
>
Yes, that possibility always exists, unfortunately.

> But regardless, this is my current fairly minimal patch that I think
> should fix the unaligned issue, while still giving the behavior we
> want on x86. I hope Al can do something nicer, but I think this is
> "acceptable".
>
> I'm running this now on x86, and I verified that x86-32 code
> generation looks sane too, but it would be good to verify that this
> makes the alignment issue go away on other architectures.
>
> Linus

I started a complete test run with the patch applied. I'll let you know
how it went after it is complete - it should be done in a couple of hours.

Guenter

2019-10-07 23:27:45

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On 10/7/19 12:21 PM, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck <[email protected]> wrote:
>>
>> this patch causes all my sparc64 emulations to stall during boot. It causes
>> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
>> disk, and one of the xtensa emulations to crash with [2].
>
> So I think your alpha emulation environment may be broken, because
> Michael Cree reports that it works for him on real hardware, but he
> does see the kernel unaligned count being high.
>
> But regardless, this is my current fairly minimal patch that I think
> should fix the unaligned issue, while still giving the behavior we
> want on x86. I hope Al can do something nicer, but I think this is
> "acceptable".
>
> I'm running this now on x86, and I verified that x86-32 code
> generation looks sane too, but it would be good to verify that this
> makes the alignment issue go away on other architectures.
>
> Linus
>

Test results look good. Feel free to add
Tested-by: Guenter Roeck <[email protected]>
to your patch.

Build results:
total: 158 pass: 154 fail: 4
Failed builds:
arm:allmodconfig
m68k:defconfig
mips:allmodconfig
sparc64:allmodconfig
Qemu test results:
total: 391 pass: 390 fail: 1
Failed tests:
ppc64:mpc8544ds:ppc64_e5500_defconfig:nosmp:initrd

This is with "regulator: fixed: Prevent NULL pointer dereference when !CONFIG_OF"
applied as well. The other failures are unrelated.

arm:

arch/arm/crypto/aes-ce-core.S:299: Error:
selected processor does not support `movw ip,:lower16:.Lcts_permute_table' in ARM mode

Fix is pending in crypto tree.

m68k:

c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported'

I don't know the status.

mips:

drivers/staging/octeon/ethernet-defines.h:30:38: error:
'CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE' undeclared
and other similar errors. I don't know the status.

ppc64:

powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to `node_reclaim_distance'

Reported against offending patch earlier today.

sparc64:

drivers/watchdog/cpwd.c:500:19: error: 'compat_ptr_ioctl' undeclared here

Oops. I'll need to look into that. Looks like the patch to use a new
infrastructure made it into the kernel but the infrastructure itself
didn't make it after all.

Guenter

2019-10-08 03:29:45

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote:

> But on x86, if we move the STAC/CLAC out of the low-level copy
> routines and into the callers, we'll have a _lot_ of churn. I thought
> it would be mostly a "teach objtool" thing, but we have lots of
> different versions of it. Not just the 32-bit vs 64-bit, it's embedded
> in all the low-level asm implementations.
>
> And we don't want the regular "copy_to/from_user()" to then have to
> add the STAC/CLAC at the call-site. So then we'd want to un-inline
> copy_to_user() entirely.

For x86? Sure, why not... Note, BTW, that for short constant-sized
copies we *do* STAC/CLAC at the call site - see those
__uaccess_begin_nospec();
in raw_copy_{from,to}_user() in the switches...

> Which all sounds like a really good idea, don't get me wrong. I think
> we inline it way too aggressively now. But it'sa _big_ job.
>
> So we probably _should_
>
> - remove INLINE_COPY_TO/FROM_USER
>
> - remove all the "small constant size special cases".
>
> - make "raw_copy_to/from_user()" have the "unsafe" semantics and make
> the out-of-line copy in lib/usercopy.c be the only real interface
>
> - get rid of a _lot_ of oddities

Not that many, really. All we need is a temporary cross-architecture
__uaccess_begin_nospec(), so that __copy_{to,from}_user() could have
that used, instead of having it done in (x86) raw_copy_..._...().

Other callers of raw_copy_...() would simply wrap it into user_access_begin()/
user_access_end() pairs; this kludge is needed only in __copy_from_user()
and __copy_to_user(), and only until we kill their callers outside of
arch/*. Which we can do, in a cycle or two. _ANY_ use of
that temporary kludge outside of those two functions will be grepped
for and LARTed into the ground.

> I hope you prove me wrong. But I'll look at a smaller change to just
> make x86 use the current special copy loop (as
> "unsafe_copy_to_user()") and have everybody else do the trivial
> wrapper.
>
> Because we definitely should do that cleanup (it also fixes the whole
> "atomic copy in kernel space" issue that you pointed to that doesn't
> actually want STAC/CLAC at all), but it just looks fairly massive to
> me.

AFAICS, it splits nicely.

1) cross-architecture user_access_begin_dont_use(): on everything
except x86 it's empty, on x86 - __uaccess_begin_nospec().

2) stac/clac lifted into x86 raw_copy_..._user() out of
copy_user_generic_unrolled(), copy_user_generic_string() and
copy_user_enhanced_fast_string(). Similar lift out of
__copy_user_nocache().

3) lifting that thing as user_access_begin_dont_use() into
__copy_..._user...() and as user_access_begin() into other
generic callers, consuming access_ok() in the latter.
__copy_to_user_inatomic() can die at the same stage.

4) possibly uninlining on x86 (and yes, killing the special
size handling off). We don't need to touch the inlining
decisions for any other architectures.

At that point raw_copy_to_user() is available for e.g.
readdir.c to play with.

And up to that point only x86 sees any kind of code changes,
so we don't have to worry about other architectures.

5) kill the __copy_...() users outside of arch/*, along with
quite a few other weird shits in there. A cycle or two,
with the final goal being to kill the damn things off.

6) arch/* users get arch-by-arch treatment - mostly
it's sigframe handling. Won't affect the generic code
and would be independent for different architectures.
Can happen in parallel with (5), actually.

7) ... at that point user_access_begin_dont_use() gets
removed and thrown into the pile of mangled fingers of
those who'd ignored all warnings and used it somewhere
else.

I don't believe that (5) would be doable entirely in
this cycle, but quite a few bits might be.

On a somewhat related note, do you see any problems with

void *copy_mount_options(const void __user *data)
{
        unsigned offs, size;
        char *copy;

        if (!data)
                return NULL;

        copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!copy)
                return ERR_PTR(-ENOMEM);

        offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);

        if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
                kfree(copy);
                return ERR_PTR(-EFAULT);
        }
        if (offs) {
                if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
                        memset(copy + PAGE_SIZE - offs, 0, offs);
        }
        return copy;
}

on the theory that any fault halfway through a page means a race with
munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
And to hell with exact_copy_from_user(), byte-by-byte copying, etc.
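
The split-copy arithmetic in the function above can be checked in isolation. This small userspace sketch (PAGE_SIZE and the helper names are illustrative, not kernel API) shows that the two copy_from_user() lengths always sum to exactly PAGE_SIZE, with the first copy running to the end of the page containing data:

```c
#define PAGE_SIZE 4096UL

/* Length of the first copy: up to the end of the page containing data. */
static unsigned long first_copy_len(unsigned long data)
{
        return PAGE_SIZE - (data & (PAGE_SIZE - 1));
}

/* Length of the optional second copy: the remainder in the next page. */
static unsigned long second_copy_len(unsigned long data)
{
        return data & (PAGE_SIZE - 1);
}
```

So a fault in the second copy is only possible past a page boundary, which is what justifies treating it as a munmap/mprotect race and zero-filling instead of failing.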

2019-10-08 04:10:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 8:29 PM Al Viro <[email protected]> wrote:
>
> For x86? Sure, why not... Note, BTW, that for short constant-sized
> copies we *do* STAC/CLAC at the call site - see those
> __uaccess_begin_nospec();
> in raw_copy_{from,to}_user() in the switches...

Yeah, and that code almost never actually triggers in practice. The
code is pointless and dead.

The thing is, it's only ever used for the double-underscore versions,
and the ones that do have it are almost never constant sizes in
the first place.

And yes, there's like a couple of cases in the whole kernel.

Just remove those constant size cases. They are pointless and just
complicate our headers and slow down the compile for no good reason.

Try the attached patch, and then count the number of "rorx"
instructions in the kernel. Hint: not many. On my personal config,
this triggers 15 times in the whole kernel build (not counting
modules).

It's not worth it. The "speedup" from using __copy_{to,from}_user()
with the fancy inlining is negligible. All the cost is in the
STAC/CLAC anyway, the code might as well be deleted.

> 1) cross-architecture user_access_begin_dont_use(): on everything
> except x86 it's empty, on x86 - __uaccess_begin_nospec().

No, just do a proper range check, and use user_access_begin()

Stop trying to optimize that range check away. It's a couple of fast
instructions.

The only ones who don't want the range check are the actual kernel
copy ones, but they don't want the user_access_begin() either.

> void *copy_mount_options(const void __user *data)
> {
>         unsigned offs, size;
>         char *copy;
>
>         if (!data)
>                 return NULL;
>
>         copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
>         if (!copy)
>                 return ERR_PTR(-ENOMEM);
>
>         offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);
>
>         if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
>                 kfree(copy);
>                 return ERR_PTR(-EFAULT);
>         }
>         if (offs) {
>                 if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
>                         memset(copy + PAGE_SIZE - offs, 0, offs);
>         }
>         return copy;
> }
>
> on the theory that any fault halfway through a page means a race with
> munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
> And to hell with exact_copy_from_user(), byte-by-byte copying, etc.

Looks reasonable.

Linus


Attachments:
patch.diff (2.90 kB)

2019-10-08 04:15:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
<[email protected]> wrote:
>
> Try the attached patch, and then count the number of "rorx"
> instructions in the kernel. Hint: not many. On my personal config,
> this triggers 15 times in the whole kernel build (not counting
> modules).

So here's a serious patch that doesn't just mark things for counting -
it just removes the cases entirely.

Doesn't this look nice:

2 files changed, 2 insertions(+), 133 deletions(-)

and it is one less thing to worry about when doing further cleanup.

Seriously, if any of those __copy_{to,from}_user() constant cases were
a big deal, we can turn them into get_user/put_user calls. But only
after they show up as an actual performance issue.

Linus


Attachments:
patch.diff (4.79 kB)

2019-10-08 04:25:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
<[email protected]> wrote:
>
> Try the attached patch, and then count the number of "rorx"
> instructions in the kernel. Hint: not many. On my personal config,
> this triggers 15 times in the whole kernel build (not counting
> modules).

.. and four of them are in perf_callchain_user(), and are due to those
"__copy_from_user_nmi()" with either 4-byte or 8-byte copies.

It might as well just use __get_user() instead.

The point being that the silly code in the header files is just
pointless. We shouldn't do it.

Linus

2019-10-08 05:00:32

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 09:09:14PM -0700, Linus Torvalds wrote:

> > 1) cross-architecture user_access_begin_dont_use(): on everything
> > except x86 it's empty, on x86 - __uaccess_begin_nospec().
>
> No, just do a proper range check, and use user_access_begin()
>
> Stop trying to optimize that range check away. It's a couple of fast
> instructions.
>
> The only ones who don't want the range check are the actual kernel
> copy ones, but they don't want the user_access_begin() either.

Not at the first step. Sure, in the end we want exactly that, and
we want it ASAP. However, the main reason it grows into a tangled
mess that would be over the top for this cycle is the impacts in
arseload of places all over arch/*.

That way we can untangle those. The initial segment that would
allow to use raw_copy_to_user() cleanly in readdir.c et.al.
could be done with provably zero impact on anything in arch/*
outside of arch/x86 usercopy-related code.

Moreover, it will be fairly small. And after that the rest can
be done in any order, independent from each other. I want to
kill __copy_... completely, and I believe we'll be able to do
just that in a cycle or two.

Once that is done, the helper disappears along with __copy_...().
And it will be documented as a temporary kludge, don't use
anywhere else, no matter what from the very beginning. For
all the couple of cycles it'll take.

I'm serious about getting rid of __copy_...() in that timeframe.
There's not that much left.

The reason I don't want to do a blanket search-and-replace turning
them all into copy_...() is simply that their use is a good indicator
of code in need of serious beating^Wamount of careful review.

And hell, we might end up doing just that on case-by-case basis.
Often enough we will, by what I'd seen there...

Again, this kludge is only a splitup aid - by the end of the series
it's gone. All it allows is to keep it easier to review.

Note, BTW, that bits and pieces converting a given pointless use
of __copy_...() to copy_...() can be reordered freely at any point
of the sequence - I've already got several. _Some_ of (5) will
be conversions a-la readdir.c one and that has to follow (4), but
most of it won't be like that.

> > void *copy_mount_options(const void __user *data)
> > {
> >         unsigned offs, size;
> >         char *copy;
> >
> >         if (!data)
> >                 return NULL;
> >
> >         copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >         if (!copy)
> >                 return ERR_PTR(-ENOMEM);
> >
> >         offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);
> >
> >         if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
> >                 kfree(copy);
> >                 return ERR_PTR(-EFAULT);
> >         }
> >         if (offs) {
> >                 if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
> >                         memset(copy + PAGE_SIZE - offs, 0, offs);
> >         }
> >         return copy;
> > }
> >
> > on the theory that any fault halfway through a page means a race with
> > munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
> > And to hell with exact_copy_from_user(), byte-by-byte copying, etc.
>
> Looks reasonable.

OK... BTW, do you agree that the use of access_ok() in
drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early
cutoff, so we don't bother waiting if user has passed an obviously bogus
address. copy_to_user() is used for actual copying there...

There are other places following that pattern and IMO they are all
wrong. Another variety is a half-arsed filter trying to prevent warnings
from too large (and user-controllable) kmalloc() of buffer we'll be
copying to. Which is worth very little, since kmalloc() will scream and
fail well before access_ok() limits will trip. Those need explicit capping
of the size, IMO...

2019-10-08 05:06:44

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 09:14:51PM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Try the attached patch, and then count the number of "rorx"
> > instructions in the kernel. Hint: not many. On my personal config,
> > this triggers 15 times in the whole kernel build (not counting
> > modules).
>
> So here's a serious patch that doesn't just mark things for counting -
> it just removes the cases entirely.
>
> Doesn't this look nice:
>
> 2 files changed, 2 insertions(+), 133 deletions(-)
>
> and it is one less thing to worry about when doing further cleanup.
>
> Seriously, if any of those __copy_{to,from}_user() constant cases were
> a big deal, we can turn them into get_user/put_user calls. But only
> after they show up as an actual performance issue.

Makes sense. I'm not arguing against doing that. Moreover, I suspect
that other architectures will be similar, at least once the
sigframe-related code for given architecture is dealt with. But that's
more of a "let's look at that later" thing (hopefully with maintainers
of architectures getting involved).

2019-10-08 06:29:17

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 8, 2019 at 1:30 AM Guenter Roeck <[email protected]> wrote:
> m68k:
>
> c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported'
>
> I don't know the status.

Fall-out from the (non)inline optimization. Patch available:
https://lore.kernel.org/lkml/[email protected]/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-10-08 10:00:44

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

From: Linus Torvalds <[email protected]>
> Sent: 07 October 2019 19:11
...
> I've been very close to just removing __get_user/__put_user several
> times, exactly because people do completely the wrong thing with them
> - not speeding code up, but making it unsafe and buggy.

They could do the very simple check that 'user_ptr+size < kernel_base'
rather than the full window check under the assumption that access_ok()
has been called and that the likely errors are just overruns.

> The new "user_access_begin/end()" model is much better, but it also
> has actual STATIC checking that there are no function calls etc inside
> the region, so it forces you to do the loop properly and tightly, and
> not the incorrect "I checked the range somewhere else, now I'm doing
> an unsafe copy".
>
> And it actually speeds things up, unlike the access_ok() games.

I've code that does:
        if (!access_ok(...))
                return -EFAULT;
        ...
        for (...) {
                if (__get_user(tmp_u64, user_ptr++))
                        return -EFAULT;
                writeq(tmp_u64, io_ptr++);
        }
(Although the code is more complex because not all transfers are multiples of 8 bytes.)

With user_access_begin/end() I'd probably want to put the copy loop
inside a function (which will probably get inlined) to avoid convoluted
error processing.
So you end up with:
        if (!user_access_ok())
                return -EFAULT;
        user_access_begin();
        rval = do_copy_code(...);
        user_access_end();
        return rval;
Which, at the source level (at least) breaks your 'no function calls' rule.
The writeq() might also break it as well.
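
One way to keep the writeq() out of the user-access region is to bounce a bounded chunk through a stack buffer inside the region, end the region, then do the MMIO writes. The sketch below uses illustrative userspace stubs for access_ok()/user_access_begin()/unsafe_get_user()/writeq() (not the real kernel API), so only the control-flow shape is being shown:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel primitives used below. */
#define access_ok(ptr, len)             (1)
#define user_access_begin(ptr, len)     (1)
#define user_access_end()               ((void)0)
#define unsafe_get_user(x, ptr, label) \
        do { if (0) goto label; (x) = *(ptr); } while (0)

static void writeq(uint64_t val, volatile uint64_t *addr)
{
        *addr = val;                    /* real writeq targets MMIO space */
}

#define CHUNK 8                         /* words bounced per user-access region */

static int copy_user_to_io(volatile uint64_t *io, const uint64_t *uptr,
                           size_t nwords)
{
        uint64_t buf[CHUNK];
        size_t i, n;

        if (!access_ok(uptr, nwords * sizeof(*uptr)))
                return -1;              /* -EFAULT in the kernel */

        while (nwords) {
                n = nwords < CHUNK ? nwords : CHUNK;

                if (!user_access_begin(uptr, n * sizeof(*uptr)))
                        return -1;
                for (i = 0; i < n; i++) /* tight loop, no function calls */
                        unsafe_get_user(buf[i], &uptr[i], efault);
                user_access_end();

                for (i = 0; i < n; i++) /* function calls are fine out here */
                        writeq(buf[i], io++);
                uptr += n;
                nwords -= n;
        }
        return 0;
efault:
        user_access_end();
        return -1;
}
```

The bounce buffer trades one extra copy for a region that contains only unsafe accesses, which is what the static checking wants.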

David


2019-10-08 13:17:22

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
>
> OK... BTW, do you agree that the use of access_ok() in
> drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early
> cutoff, so we don't bother waiting if user has passed an obviously bogus
> address. copy_to_user() is used for actual copying there...

Yes, it's wrong, and not needed. I'll go rip it out unless you want to?

thanks,

greg k-h

2019-10-08 15:31:51

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> >
> > OK... BTW, do you agree that the use of access_ok() in
> > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early
> > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > address. copy_to_user() is used for actual copying there...
>
> Yes, it's wrong, and not needed. I'll go rip it out unless you want to?

I'll throw it into misc queue for now; it has no prereqs and nothing is going
to depend upon it.

While looking for more of the same pattern: usb_device_read(). Frankly,
usb_device_dump() calling conventions look ugly - it smells like it
would be much happier as seq_file. Iterator would take some massage,
but that seems to be doable. Anyway, that's a separate story...

2019-10-08 15:39:06

by Greg KH

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote:
> On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> > >
> > > OK... BTW, do you agree that the use of access_ok() in
> > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early
> > > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > > address. copy_to_user() is used for actual copying there...
> >
> > Yes, it's wrong, and not needed. I'll go rip it out unless you want to?
>
> I'll throw it into misc queue for now; it has no prereqs and nothing is going
> to depend upon it.

Great, thanks.

> While looking for more of the same pattern: usb_device_read(). Frankly,
> usb_device_dump() calling conventions look ugly - it smells like it
> would be much happier as seq_file. Iterator would take some massage,
> but that seems to be doable. Anyway, that's a separate story...

That's just a debugfs file, and yes, it should be moved to seq_file. I
think I tried it a long time ago, but given it's just a debugging thing,
I gave up as it wasn't worth it.

But yes, the access_ok() there also seems odd, and should be dropped.

thanks,

greg k-h

2019-10-08 17:07:44

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 05:38:31PM +0200, Greg KH wrote:
> On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote:
> > On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> > > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> > > >
> > > > OK... BTW, do you agree that the use of access_ok() in
> > > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early
> > > > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > > > address. copy_to_user() is used for actual copying there...
> > >
> > > Yes, it's wrong, and not needed. I'll go rip it out unless you want to?
> >
> > I'll throw it into misc queue for now; it has no prereqs and nothing is going
> > to depend upon it.
>
> Great, thanks.
>
> > While looking for more of the same pattern: usb_device_read(). Frankly,
> > usb_device_dump() calling conventions look ugly - it smells like it
> > would be much happier as seq_file. Iterator would take some massage,
> > but that seems to be doable. Anyway, that's a separate story...
>
> That's just a debugfs file, and yes, it should be moved to seq_file. I
> think I tried it a long time ago, but given it's just a debugging thing,
> I gave up as it wasn't worth it.
>
> But yes, the access_ok() there also seems odd, and should be dropped.

I'm almost tempted to keep it there as a reminder/grep fodder ;-)

Seriously, though, it might be useful to have a way of marking the places
in need of gentle repair of retrocranial inversions _without_ attracting
the "checkpatch warning of the week" crowd...

2019-10-08 19:59:39

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote:

> The good news is that right now x86 is the only architecture that does
> that user_access_begin(), so we don't need to worry about anything
> else. Apparently the ARM people haven't had enough performance
> problems with the PAN bit for them to care.

Take a look at this:
static inline unsigned long raw_copy_from_user(void *to,
		const void __user *from, unsigned long n)
{
	unsigned long ret;

	if (__builtin_constant_p(n) && (n <= 8)) {
		ret = 1;

		switch (n) {
		case 1:
			barrier_nospec();
			__get_user_size(*(u8 *)to, from, 1, ret);
			break;
		case 2:
			barrier_nospec();
			__get_user_size(*(u16 *)to, from, 2, ret);
			break;
		case 4:
			barrier_nospec();
			__get_user_size(*(u32 *)to, from, 4, ret);
			break;
		case 8:
			barrier_nospec();
			__get_user_size(*(u64 *)to, from, 8, ret);
			break;
		}
		if (ret == 0)
			return 0;
	}

	barrier_nospec();
	allow_read_from_user(from, n);
	ret = __copy_tofrom_user((__force void __user *)to, from, n);
	prevent_read_from_user(from, n);
	return ret;
}

That's powerpc. And while the constant-sized bits are probably pretty
useless there as well, note the allow_read_from_user()/prevent_read_from_user()
part. Looks suspiciously similar to user_access_begin()/user_access_end()...

The difference is, they have separate "for read" and "for write" primitives
and they want the range in their user_access_end() analogue. Separating
the read and write isn't a problem for callers (we want them close to
the actual memory accesses). Passing the range to user_access_end() just
might be tolerable, unless it makes you throw up...

2019-10-08 20:17:16

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote:

> That's powerpc. And while the constant-sized bits are probably pretty
> useless there as well, note the allow_read_from_user()/prevent_read_from_user()
> part. Looks suspiciously similar to user_access_begin()/user_access_end()...
>
> The difference is, they have separate "for read" and "for write" primitives
> and they want the range in their user_access_end() analogue. Separating
> the read and write isn't a problem for callers (we want them close to
> the actual memory accesses). Passing the range to user_access_end() just
> might be tolerable, unless it makes you throw up...

BTW, another related cleanup is futex_atomic_op_inuser() and
arch_futex_atomic_op_inuser(). In the former we have
	if (!access_ok(uaddr, sizeof(u32)))
		return -EFAULT;

	ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
	if (ret)
		return ret;
and in the latter we've got STAC/CLAC pairs stuck into inlined bits
on x86. As well as allow_write_to_user(uaddr, sizeof(*uaddr)) on
ppc...

I don't see anything in x86 one objtool would've barfed if we pulled
STAC/CLAC out and turned access_ok() into user_access_begin(),
with matching user_access_end() right after the call of
arch_futex_atomic_op_inuser(). Everything is inlined there and
no scary memory accesses would get into the scope (well, we do
have
	if (!ret)
		*oval = oldval;
in the very end of arch_futex_atomic_op_inuser() there, but oval
is the address of a local variable in the sole caller; if we run
with kernel stack on ring 3 page, we are deeply fucked *and*
wouldn't have survived that far into futex_atomic_op_inuser() anyway ;-)

2019-10-08 20:36:45

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote:

> The difference is, they have separate "for read" and "for write" primitives
> and they want the range in their user_access_end() analogue. Separating
> the read and write isn't a problem for callers (we want them close to
> the actual memory accesses). Passing the range to user_access_end() just
> might be tolerable, unless it makes you throw up...

NOTE: I'm *NOT* suggesting to bring back the VERIFY_READ/VERIFY_WRITE
argument to access_ok(). We'd gotten rid of it, and for a very good
reason (and decades overdue).

The main difference between access_ok() and user_access_begin() is that
the latter is right next to actual memory access, with user_access_end()
on the other side, also very close. And most of those guys would be
concentrated in a few functions, where we bloody well know which
direction we are copying.

Even if we try and map ppc allow_..._to_user() on user_access_begin(),
access_ok() remains as it is (and I hope we'll get rid of the majority
of its callers in the process).

2019-10-10 20:00:04

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Mon, Oct 07, 2019 at 09:24:17PM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Try the attached patch, and then count the number of "rorx"
> > instructions in the kernel. Hint: not many. On my personal config,
> > this triggers 15 times in the whole kernel build (not counting
> > modules).
>
> .. and four of them are in perf_callchain_user(), and are due to those
> "__copy_from_user_nmi()" with either 4-byte or 8-byte copies.
>
> It might as well just use __get_user() instead.
>
> The point being that the silly code in the header files is just
> pointless. We shouldn't do it.

FWIW, the one that looks the most potentially sensitive in that bunch is
arch/x86/kvm/paging_tmpl.h:388: if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte))))
in the bowels of KVM page fault handling. I would be very surprised if
the rest would be detectable...

Anyway, another question your way: what do you think of try/catch approaches
to __get_user() blocks, like e.g. restore_sigcontext() is doing?

Should that be available outside of arch/*? For that matter, would
it be a good idea to convert get_user_ex() users in arch/x86 to
unsafe_get_user()?

2019-10-10 22:13:45

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Thu, Oct 10, 2019 at 12:55 PM Al Viro <[email protected]> wrote:
>
> Anyway, another question your way: what do you think of try/catch approaches
> to __get_user() blocks, like e.g. restore_sigcontext() is doing?

I'd rather have them converted to our unsafe_get/put_user() instead.

We don't generate great code for the "get" case (because of how gcc
doesn't allow us to mix "asm goto" and outputs), but I really despise
the x86-specific "{get,put}_user_ex()" machinery. It's not actually
doing a real try/catch at all, and will just keep taking faults if one
happens.

But I've not gotten around to rewriting those disgusting sequences to
the unsafe_get/put_user() model. I did look at it, and it requires
some changes exactly *because* the _ex() functions are broken and
continue, but also because the current code ends up also doing other
things inside the try/catch region that you're not supposed to do in a
user_access_begin/end() region.

> Should that be available outside of arch/*? For that matter, would
> it be a good idea to convert get_user_ex() users in arch/x86 to
> unsafe_get_user()?

See above: yes, it would be a good idea to convert to
unsafe_get/put_user(), and no, we don't want to expose the horrid
*_ex() model to other architectures.

Linus

2019-10-11 00:11:48

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2019 at 12:55 PM Al Viro <[email protected]> wrote:
> >
> > Anyway, another question your way: what do you think of try/catch approaches
> > to __get_user() blocks, like e.g. restore_sigcontext() is doing?
>
> I'd rather have them converted to our unsafe_get/put_user() instead.
>
> We don't generate great code for the "get" case (because of how gcc
> doesn't allow us to mix "asm goto" and outputs), but I really despise
> the x86-specific "{get,put}_user_ex()" machinery. It's not actually
> doing a real try/catch at all, and will just keep taking faults if one
> happens.
>
> But I've not gotten around to rewriting those disgusting sequences to
> the unsafe_get/put_user() model. I did look at it, and it requires
> some changes exactly *because* the _ex() functions are broken and
> continue, but also because the current code ends up also doing other
> things inside the try/catch region that you're not supposed to do in a
> user_access_begin/end() region.

Hmm... Which one was that? AFAICS, we have
	do_sys_vm86: only get_user_ex()
	restore_sigcontext(): get_user_ex(), set_user_gs()
	ia32_restore_sigcontext(): get_user_ex()

So at least get_user_try/get_user_ex/get_user_catch should be killable.
The other side...
	save_v86_state(): put_user_ex()
	setup_sigcontext(): put_user_ex()
	__setup_rt_frame(): put_user_ex(), static_cpu_has()
	another one in __setup_rt_frame(): put_user_ex()
	x32_setup_rt_frame(): put_user_ex()
	ia32_setup_sigcontext(): put_user_ex()
	ia32_setup_frame(): put_user_ex()
	another one in ia32_setup_frame(): put_user_ex(), static_cpu_has()

IDGI... Is static_cpu_has() not allowed in there? Looks like it's all inlines
and doesn't do any potentially risky memory accesses... What am I missing?

As for the try/catch model... How about
	if (!user_access_begin())
		sod off
	...
	unsafe_get_user(..., l);
	...
	unsafe_get_user_nojump();
	...
	unsafe_get_user_nojump();
	...
	if (user_access_did_fail())
		goto l;

	user_access_end()
	...
	return 0;
l:
	...
	user_access_end()
	return -EFAULT;

making it clear that we are delaying the check for failures until it's
more convenient. And *not* trying to trick C parser into enforcing
anything - let objtool do it and to hell with do { and } while (0) in
magic macros. Could be mixed with the normal unsafe_..._user() without
any problems, AFAICS...

2019-10-11 00:32:12

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Thu, Oct 10, 2019 at 5:11 PM Al Viro <[email protected]> wrote:
>
> On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote:
>
> > But I've not gotten around to rewriting those disgusting sequences to
> > the unsafe_get/put_user() model. I did look at it, and it requires
> > some changes exactly *because* the _ex() functions are broken and
> > continue, but also because the current code ends up also doing other
> > things inside the try/catch region that you're not supposed to do in a
> > user_access_begin/end() region.
>
> Hmm... Which one was that? AFAICS, we have
> do_sys_vm86: only get_user_ex()
> restore_sigcontext(): get_user_ex(), set_user_gs()
> ia32_restore_sigcontext(): get_user_ex()

Try this patch.

It works fine (well, it worked fine the last time I tried this, I
might have screwed something up just now: I re-created the patch since
I hadn't saved it).

It's nice and clean, and does

1 file changed, 9 insertions(+), 91 deletions(-)

by just deleting all the nasty *_ex() macros entirely, replacing them
with unsafe_get/put_user() calls.

And now those try/catch regions actually work like try/catch regions,
and a fault branches to the catch.

BUT.

It does change semantics, and you get warnings like

arch/x86/ia32/ia32_signal.c: In function ‘ia32_restore_sigcontext’:
arch/x86/ia32/ia32_signal.c:114:9: warning: ‘buf’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
114 | err |= fpu__restore_sig(buf, 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/ia32/ia32_signal.c:64:27: warning: ‘ds’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
64 | unsigned int pre = (seg) | 3; \
| ^
arch/x86/ia32/ia32_signal.c:74:18: note: ‘ds’ was declared here
...
arch/x86/kernel/signal.c: In function ‘restore_sigcontext’:
arch/x86/kernel/signal.c:152:9: warning: ‘buf’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
152 | err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

because it's true: those things really may not be initialized, because
the catch thing could have jumped out.

So the code actually needs to properly return the error early, or
initialize the segments that didn't get loaded to 0, or something.

And when I posted that, Luto said "just get rid of the get_user_ex()
entirely, instead of changing semantics of the existing ones to be
sane."

Which is probably right. There aren't that many.

I *thought* there were also cases of us doing some questionable things
inside the get_user_try sections, but those seem to have gotten fixed
already independently, so it's really just the "make try/catch really
try/catch" change that needs some editing of our current broken stuff
that depends on it not actually *catching* exceptions, but on just
continuing on to the next one.

Linus


Attachments:
patch.diff (5.34 kB)

2019-10-13 18:15:17

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Thu, Oct 10, 2019 at 05:31:13PM -0700, Linus Torvalds wrote:

> So the code actually needs to properly return the error early, or
> initialize the segments that didn't get loaded to 0, or something.
>
> And when I posted that, Luto said "just get rid of the get_user_ex()
> entirely, instead of changing semantics of the existing ones to be
> sane."
>
> Which is probably right. There aren't that many.
>
> I *thought* there were also cases of us doing some questionable things
> inside the get_user_try sections, but those seem to have gotten fixed
> already independently, so it's really just the "make try/catch really
> try/catch" change that needs some editing of our current broken stuff
> that depends on it not actually *catching* exceptions, but on just
> continuing on to the next one.

Umm... TBH, I wonder if we would be better off if restore_sigcontext()
(i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
entire[*] struct sigcontext into a local variable and then copied fields
to pt_regs... The thing is small enough for not blowing the stack (256
bytes max. and it's on a shallow stack) and big enough to make "fancy
memcpy + let the compiler think how to combine in-kernel copies"
potentially better than hardwired sequence of 64bit loads/stores...

[*] OK, sans ->reserved part in the very end on 64bit. 192 bytes to
copy.

Same for do_sys_vm86(), perhaps - we want regs/flags/cpu_type and
screen_bitmap there, i.e. the beginning of struct vm86plus_struct
and of struct vm86_struct... 24*32bit. IOW, 96-byte memcpy +
gcc-visible field-by-field copying vs. hardwired sequence of
32bit loads (with some 16bit ones thrown in, for extra fun) and
compiler told not to reorder anything.

And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
are the only get_user_ex() users anywhere...

2019-10-13 18:45:48

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 11:13 AM Al Viro <[email protected]> wrote:
>
> Umm... TBH, I wonder if we would be better off if restore_sigcontext()
> (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
> entire[*] struct sigcontext into a local variable and then copied fields
> to pt_regs...

Probably ok. We've generally tried to avoid state that big on the
stack, but you're right that it's shallow.

> Same for do_sys_vm86(), perhaps.
>
> And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
> are the only get_user_ex() users anywhere...

Yeah, that sounds like a solid strategy for getting rid of them.

Particularly since we can't really make get_user_ex() generate
particularly good code (at least for now).

Now, put_user_ex() is a different thing - converting it to
unsafe_put_user() actually does make it generate very good code - much
better than copying data twice.

Linus

2019-10-13 19:14:27

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 11:43:57AM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 11:13 AM Al Viro <[email protected]> wrote:
> >
> > Umm... TBH, I wonder if we would be better off if restore_sigcontext()
> > (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
> > entire[*] struct sigcontext into a local variable and then copied fields
> > to pt_regs...
>
> Probably ok. We've generally tried to avoid state that big on the
> stack, but you're right that it's shallow.
>
> > Same for do_sys_vm86(), perhaps.
> >
> > And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
> > are the only get_user_ex() users anywhere...
>
> Yeah, that sounds like a solid strategy for getting rid of them.
>
> Particularly since we can't really make get_user_ex() generate
> particularly good code (at least for now).
>
> Now, put_user_ex() is a different thing - converting it to
> unsafe_put_user() actually does make it generate very good code - much
> better than copying data twice.

No arguments re put_user_ex side of things... Below is a completely
untested patch for get_user_ex elimination (it seems to build, but that's
it); in any case, I would really like to see comments from x86 folks
before it goes anywhere.

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 1cee10091b9f..28a32ccc32de 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -46,59 +46,38 @@
#define get_user_seg(seg) ({ unsigned int v; savesegment(seg, v); v; })
#define set_user_seg(seg, v) loadsegment_##seg(v)

-#define COPY(x) { \
- get_user_ex(regs->x, &sc->x); \
-}
-
-#define GET_SEG(seg) ({ \
- unsigned short tmp; \
- get_user_ex(tmp, &sc->seg); \
- tmp; \
-})
+#define COPY(x) regs->x = sc.x

-#define COPY_SEG_CPL3(seg) do { \
- regs->seg = GET_SEG(seg) | 3; \
-} while (0)
+#define COPY_SEG_CPL3(seg) regs->seg = sc.seg | 3

#define RELOAD_SEG(seg) { \
- unsigned int pre = (seg) | 3; \
+ unsigned int pre = sc.seg | 3; \
unsigned int cur = get_user_seg(seg); \
if (pre != cur) \
set_user_seg(seg, pre); \
}

static int ia32_restore_sigcontext(struct pt_regs *regs,
- struct sigcontext_32 __user *sc)
+ struct sigcontext_32 __user *usc)
{
- unsigned int tmpflags, err = 0;
- u16 gs, fs, es, ds;
- void __user *buf;
- u32 tmp;
+ struct sigcontext_32 sc;

/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;

- get_user_try {
- gs = GET_SEG(gs);
- fs = GET_SEG(fs);
- ds = GET_SEG(ds);
- es = GET_SEG(es);
-
- COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
- COPY(dx); COPY(cx); COPY(ip); COPY(ax);
- /* Don't touch extended registers */
+ if (unlikely(__copy_from_user(&sc, usc, sizeof(sc))))
+ goto Efault;

- COPY_SEG_CPL3(cs);
- COPY_SEG_CPL3(ss);
+ COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
+ COPY(dx); COPY(cx); COPY(ip); COPY(ax);
+ /* Don't touch extended registers */

- get_user_ex(tmpflags, &sc->flags);
- regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
- /* disable syscall checks */
- regs->orig_ax = -1;
+ COPY_SEG_CPL3(cs);
+ COPY_SEG_CPL3(ss);

- get_user_ex(tmp, &sc->fpstate);
- buf = compat_ptr(tmp);
- } get_user_catch(err);
+ regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+ /* disable syscall checks */
+ regs->orig_ax = -1;

/*
* Reload fs and gs if they have changed in the signal
@@ -111,11 +90,15 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
RELOAD_SEG(ds);
RELOAD_SEG(es);

- err |= fpu__restore_sig(buf, 1);
+ if (unlikely(fpu__restore_sig(compat_ptr(sc.fpstate), 1)))
+ goto Efault;

force_iret();
+ return 0;

- return err;
+Efault:
+ force_iret();
+ return -EFAULT;
}

asmlinkage long sys32_sigreturn(void)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61d93f062a36..ac81f06f8358 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -335,12 +335,9 @@ do { \
"i" (errret), "0" (retval)); \
})

-#define __get_user_asm_ex_u64(x, ptr) (x) = __get_user_bad()
#else
#define __get_user_asm_u64(x, ptr, retval, errret) \
__get_user_asm(x, ptr, retval, "q", "", "=r", errret)
-#define __get_user_asm_ex_u64(x, ptr) \
- __get_user_asm_ex(x, ptr, "q", "", "=r")
#endif

#define __get_user_size(x, ptr, size, retval, errret) \
@@ -390,41 +387,6 @@ do { \
: "=r" (err), ltype(x) \
: "m" (__m(addr)), "i" (errret), "0" (err))

-/*
- * This doesn't do __uaccess_begin/end - the exception handling
- * around it must do that.
- */
-#define __get_user_size_ex(x, ptr, size) \
-do { \
- __chk_user_ptr(ptr); \
- switch (size) { \
- case 1: \
- __get_user_asm_ex(x, ptr, "b", "b", "=q"); \
- break; \
- case 2: \
- __get_user_asm_ex(x, ptr, "w", "w", "=r"); \
- break; \
- case 4: \
- __get_user_asm_ex(x, ptr, "l", "k", "=r"); \
- break; \
- case 8: \
- __get_user_asm_ex_u64(x, ptr); \
- break; \
- default: \
- (x) = __get_user_bad(); \
- } \
-} while (0)
-
-#define __get_user_asm_ex(x, addr, itype, rtype, ltype) \
- asm volatile("1: mov"itype" %1,%"rtype"0\n" \
- "2:\n" \
- ".section .fixup,\"ax\"\n" \
- "3:xor"itype" %"rtype"0,%"rtype"0\n" \
- " jmp 2b\n" \
- ".previous\n" \
- _ASM_EXTABLE_EX(1b, 3b) \
- : ltype(x) : "m" (__m(addr)))
-
#define __put_user_nocheck(x, ptr, size) \
({ \
__label__ __pu_label; \
@@ -552,22 +514,6 @@ struct __large_struct { unsigned long buf[100]; };
#define __put_user(x, ptr) \
__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))

-/*
- * {get|put}_user_try and catch
- *
- * get_user_try {
- * get_user_ex(...);
- * } get_user_catch(err)
- */
-#define get_user_try uaccess_try_nospec
-#define get_user_catch(err) uaccess_catch(err)
-
-#define get_user_ex(x, ptr) do { \
- unsigned long __gue_val; \
- __get_user_size_ex((__gue_val), (ptr), (sizeof(*(ptr)))); \
- (x) = (__force __typeof__(*(ptr)))__gue_val; \
-} while (0)
-
#define put_user_try uaccess_try
#define put_user_catch(err) uaccess_catch(err)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193e158d..301d34b256c6 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -47,23 +47,9 @@
#include <asm/sigframe.h>
#include <asm/signal.h>

-#define COPY(x) do { \
- get_user_ex(regs->x, &sc->x); \
-} while (0)
-
-#define GET_SEG(seg) ({ \
- unsigned short tmp; \
- get_user_ex(tmp, &sc->seg); \
- tmp; \
-})
-
-#define COPY_SEG(seg) do { \
- regs->seg = GET_SEG(seg); \
-} while (0)
-
-#define COPY_SEG_CPL3(seg) do { \
- regs->seg = GET_SEG(seg) | 3; \
-} while (0)
+#define COPY(x) regs->x = sc.x
+#define COPY_SEG(seg) regs->seg = sc.seg
+#define COPY_SEG_CPL3(seg) regs->seg = sc.seg | 3

#ifdef CONFIG_X86_64
/*
@@ -95,50 +81,53 @@ static void force_valid_ss(struct pt_regs *regs)
#endif

static int restore_sigcontext(struct pt_regs *regs,
- struct sigcontext __user *sc,
+ struct sigcontext __user *usc,
unsigned long uc_flags)
{
- unsigned long buf_val;
void __user *buf;
- unsigned int tmpflags;
- unsigned int err = 0;
+ struct sigcontext sc;
+ enum {
+#ifdef CONFIG_X86_32
+ To_copy = sizeof(struct sigcontext),
+#else
+ To_copy = offsetof(struct sigcontext, reserved1),
+#endif
+ };

/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;

- get_user_try {
+ if (unlikely(__copy_from_user(&sc, usc, To_copy)))
+ goto Efault;

#ifdef CONFIG_X86_32
- set_user_gs(regs, GET_SEG(gs));
- COPY_SEG(fs);
- COPY_SEG(es);
- COPY_SEG(ds);
+ set_user_gs(regs, sc.gs);
+ COPY_SEG(fs);
+ COPY_SEG(es);
+ COPY_SEG(ds);
#endif /* CONFIG_X86_32 */

- COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
- COPY(dx); COPY(cx); COPY(ip); COPY(ax);
+ COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
+ COPY(dx); COPY(cx); COPY(ip); COPY(ax);

#ifdef CONFIG_X86_64
- COPY(r8);
- COPY(r9);
- COPY(r10);
- COPY(r11);
- COPY(r12);
- COPY(r13);
- COPY(r14);
- COPY(r15);
+ COPY(r8);
+ COPY(r9);
+ COPY(r10);
+ COPY(r11);
+ COPY(r12);
+ COPY(r13);
+ COPY(r14);
+ COPY(r15);
#endif /* CONFIG_X86_64 */

- COPY_SEG_CPL3(cs);
- COPY_SEG_CPL3(ss);
+ COPY_SEG_CPL3(cs);
+ COPY_SEG_CPL3(ss);

- get_user_ex(tmpflags, &sc->flags);
- regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
- regs->orig_ax = -1; /* disable syscall checks */
+ regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+ regs->orig_ax = -1; /* disable syscall checks */

- get_user_ex(buf_val, &sc->fpstate);
- buf = (void __user *)buf_val;
- } get_user_catch(err);
+ buf = (void __user *)sc.fpstate;

#ifdef CONFIG_X86_64
/*
@@ -149,11 +138,14 @@ static int restore_sigcontext(struct pt_regs *regs,
force_valid_ss(regs);
#endif

- err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
-
+ if (unlikely(fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32))))
+ goto Efault;
force_iret();
+ return 0;

- return err;
+Efault:
+ force_iret();
+ return -EFAULT;
}

int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index a76c12b38e92..2b5183f8eb48 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -242,6 +242,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
struct vm86 *vm86 = tsk->thread.vm86;
struct kernel_vm86_regs vm86regs;
struct pt_regs *regs = current_pt_regs();
+ struct vm86_struct v;
unsigned long err = 0;

err = security_mmap_addr(0);
@@ -283,34 +284,32 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
sizeof(struct vm86plus_struct)))
return -EFAULT;

+ if (unlikely(__copy_from_user(&v, user_vm86,
+ offsetof(struct vm86_struct, int_revectored))))
+ return -EFAULT;
+
memset(&vm86regs, 0, sizeof(vm86regs));
- get_user_try {
- unsigned short seg;
- get_user_ex(vm86regs.pt.bx, &user_vm86->regs.ebx);
- get_user_ex(vm86regs.pt.cx, &user_vm86->regs.ecx);
- get_user_ex(vm86regs.pt.dx, &user_vm86->regs.edx);
- get_user_ex(vm86regs.pt.si, &user_vm86->regs.esi);
- get_user_ex(vm86regs.pt.di, &user_vm86->regs.edi);
- get_user_ex(vm86regs.pt.bp, &user_vm86->regs.ebp);
- get_user_ex(vm86regs.pt.ax, &user_vm86->regs.eax);
- get_user_ex(vm86regs.pt.ip, &user_vm86->regs.eip);
- get_user_ex(seg, &user_vm86->regs.cs);
- vm86regs.pt.cs = seg;
- get_user_ex(vm86regs.pt.flags, &user_vm86->regs.eflags);
- get_user_ex(vm86regs.pt.sp, &user_vm86->regs.esp);
- get_user_ex(seg, &user_vm86->regs.ss);
- vm86regs.pt.ss = seg;
- get_user_ex(vm86regs.es, &user_vm86->regs.es);
- get_user_ex(vm86regs.ds, &user_vm86->regs.ds);
- get_user_ex(vm86regs.fs, &user_vm86->regs.fs);
- get_user_ex(vm86regs.gs, &user_vm86->regs.gs);
-
- get_user_ex(vm86->flags, &user_vm86->flags);
- get_user_ex(vm86->screen_bitmap, &user_vm86->screen_bitmap);
- get_user_ex(vm86->cpu_type, &user_vm86->cpu_type);
- } get_user_catch(err);
- if (err)
- return err;
+
+ vm86regs.pt.bx = v.regs.ebx;
+ vm86regs.pt.cx = v.regs.ecx;
+ vm86regs.pt.dx = v.regs.edx;
+ vm86regs.pt.si = v.regs.esi;
+ vm86regs.pt.di = v.regs.edi;
+ vm86regs.pt.bp = v.regs.ebp;
+ vm86regs.pt.ax = v.regs.eax;
+ vm86regs.pt.ip = v.regs.eip;
+ vm86regs.pt.cs = v.regs.cs;
+ vm86regs.pt.flags = v.regs.eflags;
+ vm86regs.pt.sp = v.regs.esp;
+ vm86regs.pt.ss = v.regs.ss;
+ vm86regs.es = v.regs.es;
+ vm86regs.ds = v.regs.ds;
+ vm86regs.fs = v.regs.fs;
+ vm86regs.gs = v.regs.gs;
+
+ vm86->flags = v.flags;
+ vm86->screen_bitmap = v.screen_bitmap;
+ vm86->cpu_type = v.cpu_type;

if (copy_from_user(&vm86->int_revectored,
&user_vm86->int_revectored,

2019-10-13 19:25:54

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 12:10 PM Al Viro <[email protected]> wrote:
>
> No arguments re put_user_ex side of things... Below is a completely
> untested patch for get_user_ex elimination (it seems to build, but that's
> it); in any case, I would really like to see comments from x86 folks
> before it goes anywhere.

Please don't do this:

> + if (unlikely(__copy_from_user(&sc, usc, sizeof(sc))))
> + goto Efault;

Why would you use __copy_from_user()? Just don't.

> + if (unlikely(__copy_from_user(&v, user_vm86,
> + offsetof(struct vm86_struct, int_revectored))))

Same here.

There's no excuse for __copy_from_user().

Linus

2019-10-13 20:02:28

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 12:10 PM Al Viro <[email protected]> wrote:
> >
> > No arguments re put_user_ex side of things... Below is a completely
> > untested patch for get_user_ex elimination (it seems to build, but that's
> > it); in any case, I would really like to see comments from x86 folks
> > before it goes anywhere.
>
> Please don't do this:
>
> > + if (unlikely(__copy_from_user(&sc, usc, sizeof(sc))))
> > + goto Efault;
>
> Why would you use __copy_from_user()? Just don't.
>
> > + if (unlikely(__copy_from_user(&v, user_vm86,
> > + offsetof(struct vm86_struct, int_revectored))))
>
> Same here.
>
> There's no excuse for __copy_from_user().

Probably... Said that, vm86 one is preceded by
	if (!access_ok(user_vm86, plus ?
		       sizeof(struct vm86_struct) :
		       sizeof(struct vm86plus_struct)))
		return -EFAULT;
so I didn't want to bother. We'll need to eliminate most of
access_ok() anyway, and I figured that conversion to plain copy_from_user()
would go there as well.

Again, this is not a patch submission - just an illustration of what I meant
re getting rid of get_user_ex(). IOW, the whole thing is still in the
plotting stage.

Re plotting: how strongly would you object against passing the range to
user_access_end()? Powerpc folks have a very close analogue of stac/clac,
currently buried inside their __get_user()/__put_user()/etc. - the same
places where x86 does, including futex.h and friends.

And there it's even costlier than on x86. It would obviously be nice
to lift it at least out of unsafe_get_user()/unsafe_put_user() and
move into user_access_begin()/user_access_end(); unfortunately, in
one subarchitecture they really want the range on the user_access_end()
side as well. That's obviously not fatal (they can bloody well save those
into thread_info at user_access_begin()), but right now we have relatively
few user_access_end() callers, so the interface changes are still possible.

Other architectures with similar stuff are riscv (no arguments, same
as for stac/clac), arm (uaccess_save_and_enable() on the way in,
return value passed to uaccess_restore() on the way out) and s390
(similar to arm, but there it's needed only to deal with nesting,
and I'm not sure it actually can happen).

It would be nice to settle the API while there are not too many users
outside of arch/x86; changing it later will be a PITA and we definitely
have architectures that do potentially costly things around the userland
memory access; user_access_begin()/user_access_end() is in the right
place to try and see if they fit there...

2019-10-13 20:27:34

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 12:59 PM Al Viro <[email protected]> wrote:
>
> Re plotting: how strongly would you object against passing the range to
> user_access_end()? Powerpc folks have a very close analogue of stac/clac,
> currently buried inside their __get_user()/__put_user()/etc. - the same
> places where x86 does, including futex.h and friends.
>
> And there it's even costlier than on x86. It would obviously be nice
> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
> move into user_access_begin()/user_access_end(); unfortunately, in
> one subarchitecture they really want the range on the user_access_end()
> side as well.

Hmm. I'm ok with that.

Do they want the actual range, or would they prefer some kind of opaque
cookie that user_access_begin() returns (where 0 would mean "failure"
of course)?

I'm thinking like a local_irq_save/restore thing, which might be the
case on yet other architectures.

Linus

2019-10-15 05:37:15

by Michael Ellerman

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

Linus Torvalds <[email protected]> writes:
> On Sun, Oct 13, 2019 at 12:59 PM Al Viro <[email protected]> wrote:
>> Re plotting: how strongly would you object against passing the range to
>> user_access_end()? Powerpc folks have a very close analogue of stac/clac,
>> currently buried inside their __get_user()/__put_user()/etc. - the same
>> places where x86 does, including futex.h and friends.
>>
>> And there it's even costlier than on x86. It would obviously be nice
>> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
>> move into user_access_begin()/user_access_end(); unfortunately, in
>> one subarchitecture they really want the range on the user_access_end()
>> side as well.
>
> Hmm. I'm ok with that.
>
> Do they want the actual range, or would they prefer some kind of opaque
> cookie that user_access_begin() returns (where 0 would mean "failure"
> of course)?

The code does want the actual range, or at least the range rounded to a
segment boundary (256MB).

But it can get that already from a value it stashes in current->thread,
it was just more natural to pass the addr/size with the way the code is
currently structured.

It seems to generate slightly better code to pass addr/size vs loading
it from current->thread, but it's probably in the noise vs everything
else that's going on.

So a cookie would work fine, we could return the encoded addr/size in
the cookie and that might generate better code than loading it back from
current->thread. Equally we could just use the value in current->thread
and not have any cookie at all.

cheers

2019-10-15 21:23:28

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

[futex folks and linux-arch Cc'd]

On Sun, Oct 13, 2019 at 08:59:49PM +0100, Al Viro wrote:

> Re plotting: how strongly would you object against passing the range to
> user_access_end()? Powerpc folks have a very close analogue of stac/clac,
> currently buried inside their __get_user()/__put_user()/etc. - the same
> places where x86 does, including futex.h and friends.
>
> And there it's even costlier than on x86. It would obviously be nice
> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
> move into user_access_begin()/user_access_end(); unfortunately, in
> one subarchitecture they really want the range on the user_access_end()
> side as well. That's obviously not fatal (they can bloody well save those
> into thread_info at user_access_begin()), but right now we have relatively
> few user_access_end() callers, so the interface changes are still possible.
>
> Other architectures with similar stuff are riscv (no arguments, same
> as for stac/clac), arm (uaccess_save_and_enable() on the way in,
> return value passed to uaccess_restore() on the way out) and s390
> (similar to arm, but there it's needed only to deal with nesting,
> and I'm not sure it actually can happen).
>
> It would be nice to settle the API while there are not too many users
> outside of arch/x86; changing it later will be a PITA and we definitely
> have architectures that do potentially costly things around the userland
> memory access; user_access_begin()/user_access_end() is in the right
> place to try and see if they fit there...

Another question: right now we have
	if (!access_ok(uaddr, sizeof(u32)))
		return -EFAULT;

	ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
	if (ret)
		return ret;
in kernel/futex.c. Would there be any objections to moving access_ok()
inside the instances and moving pagefault_disable()/pagefault_enable() outside?

Reasons:
* on x86 that would allow folding access_ok() with STAC into
user_access_begin(). The same would be doable on other usual suspects
(arm, arm64, ppc, riscv, s390), bringing access_ok() next to their
STAC counterparts.
* pagefault_disable()/pagefault_enable() pair is universal on
all architectures, really meant to be by the nature of the beast, and
lifting it into kernel/futex.c would get the same situation as with
futex_atomic_cmpxchg_inatomic(). Which also does access_ok() inside
the primitive (also foldable into user_access_begin(), at that).
* access_ok() would be closer to actual memory access (and
out of the generic code).

Comments?

2019-10-15 21:31:57

by Linus Torvalds

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 15, 2019 at 11:08 AM Al Viro <[email protected]> wrote:
>
> Another question: right now we have
> if (!access_ok(uaddr, sizeof(u32)))
> return -EFAULT;
>
> ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
> if (ret)
> return ret;
> in kernel/futex.c. Would there be any objections to moving access_ok()
> inside the instances and moving pagefault_disable()/pagefault_enable() outside?

I think we should remove all the "atomic" versions, and just make the
rule be that if you want atomic, you surround it with
pagefault_disable()/pagefault_enable().

That covers not just the futex ops (where "atomic" is actually
somewhat ambiguous - the ops themselves are atomic too, so the naming
might stay, although arguably the "futex" part makes that pointless
too), but also copy_to_user_inatomic() and the powerpc version of
__get_user_inatomic().

So we'd aim to get rid of all the "inatomic" ones entirely.

Same ultimately probably goes for the NMI versions. We should just
make it be a rule that we can use all of the user access functions
with pagefault_{dis,en}able() around them, and they'll be "safe" to
use in atomic context.

One issue with the NMI versions is that they actually want to avoid
the current value of set_fs(). So copy_from_user_nmi() (at least on
x86) is special in that it does

	if (__range_not_ok(from, n, TASK_SIZE))
		return n;

instead of access_ok() because of that issue.

NMI also has some other issues (nmi_uaccess_okay() on x86, at least),
but those *probably* could be handled at page fault time instead.

Anyway, NMI is so special that I'd suggest leaving it for later, but
the non-NMI atomic accesses I would suggest you clean up at the same
time.

I think the *only* reason we have the "inatomic()" versions is that
the regular ones do that "might_fault()" testing unconditionally, and
might_fault() _used_ to be just a might_sleep() - so it's not about
functionality per se, it's about "we have this sanity check that we
need to undo".

We've already made "might_fault()" look at pagefault_disabled(), so I
think a lot of the reasons for inatomic are entirely historical.

Linus

2019-10-16 02:25:38

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 15, 2019 at 12:00:34PM -0700, Linus Torvalds wrote:
> On Tue, Oct 15, 2019 at 11:08 AM Al Viro <[email protected]> wrote:
> >
> > Another question: right now we have
> > if (!access_ok(uaddr, sizeof(u32)))
> > return -EFAULT;
> >
> > ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
> > if (ret)
> > return ret;
> > in kernel/futex.c. Would there be any objections to moving access_ok()
> > inside the instances and moving pagefault_disable()/pagefault_enable() outside?
>
> I think we should remove all the "atomic" versions, and just make the
> rule be that if you want atomic, you surround it with
> pagefault_disable()/pagefault_enable().

Umm... I thought about that, but ended up with "it documents the intent" -
pagefault_disable() might be implicit (e.g. done by kmap_atomic()) or
several levels up the call chain. Not sure.

> That covers not just the futex ops (where "atomic" is actually
> somewhat ambiguous - the ops themselves are atomic too, so the naming
> might stay, although arguably the "futex" part makes that pointless
> too), but also copy_to_user_inatomic() and the powerpc version of
> __get_user_inatomic().

Eh? copy_to_user_inatomic() doesn't exist; __copy_to_user_inatomic()
does, but...

arch/mips/kernel/unaligned.c:1307: res = __copy_to_user_inatomic(addr, fpr, sizeof(*fpr));
drivers/gpu/drm/i915/i915_gem.c:313: unwritten = __copy_to_user_inatomic(user_data,
lib/test_kasan.c:510: unused = __copy_to_user_inatomic(usermem, kmem, size + 1);
mm/maccess.c:98: ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);

these are all callers it has left anywhere and I'm certainly going to kill it.
Now, __copy_from_user_inatomic() has a lot more callers left... Frankly,
the messier part of the API is the nocache side of things. Consider e.g. this:

	/* platform specific: cacheless copy */
	static void cacheless_memcpy(void *dst, void *src, size_t n)
	{
		/*
		 * Use the only available X64 cacheless copy.  Add a __user cast
		 * to quiet sparse.  The src argument is already in the kernel so
		 * there are no security issues.  The extra fault recovery machinery
		 * is not invoked.
		 */
		__copy_user_nocache(dst, (void __user *)src, n, 0);
	}
or this
	static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
	{
	#ifdef ARCH_HAS_NOCACHE_UACCESS
		/*
		 * Using non-temporal mov to improve performance on non-cached
		 * writes, even though we aren't actually copying from user space.
		 */
		__copy_from_user_inatomic_nocache(offset, entry->buf, entry->len);
	#else
		memcpy_toio(offset, entry->buf, entry->len);
	#endif

		/* Ensure that the data is fully copied out before setting the flags */
		wmb();

		ntb_tx_copy_callback(entry, NULL);
	}
"user" part is bollocks in both cases; moreover, I really wonder about that
ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment*
and it just so happens that ..._toio() doesn't require anything special on
x86. Have e.g. arm grow nocache stuff and the things will suddenly break,
won't they?

2019-10-16 03:38:13

by Al Viro

Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Tue, Oct 15, 2019 at 08:40:12PM +0100, Al Viro wrote:
> or this
> static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
> {
> #ifdef ARCH_HAS_NOCACHE_UACCESS
> /*
> * Using non-temporal mov to improve performance on non-cached
> * writes, even though we aren't actually copying from user space.
> */
> __copy_from_user_inatomic_nocache(offset, entry->buf, entry->len);
> #else
> memcpy_toio(offset, entry->buf, entry->len);
> #endif
>
> /* Ensure that the data is fully copied out before setting the flags */
> wmb();
>
> ntb_tx_copy_callback(entry, NULL);
> }
> "user" part is bollocks in both cases; moreover, I really wonder about that
> ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment*
> and it just so happens that ..._toio() doesn't require anything special on
> x86. Have e.g. arm grow nocache stuff and the things will suddenly break,
> won't they?

Incidentally, there are two callers of __copy_from_user_inatomic_nocache() in
generic code:
lib/iov_iter.c:792: __copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len,
lib/iov_iter.c:849: if (__copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len,

Neither is done under pagefault_disable(), AFAICS. This one
drivers/gpu/drm/qxl/qxl_ioctl.c:189: unwritten = __copy_from_user_inatomic_nocache
probably is - it has something called qxl_bo_kmap_atomic_page() called just prior,
which would seem to imply kmap_atomic() somewhere in it.
The same goes for
drivers/gpu/drm/i915/i915_gem.c:500: unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset,

So we have 5 callers anywhere. Two are not "inatomic" in any sense; source is
in userspace and we want nocache behaviour. Two _are_ done into a page that
had been fed through kmap_atomic(); the source is, again, in userland. And
the last one is complete BS - it wants memcpy_toio_nocache() and abuses this
thing.

Incidentally, in case of fault i915 caller ends up unmapping the page,
mapping it non-atomic (with kmap?) and doing plain copy_from_user(),
nocache be damned. qxl, OTOH, whines and fails all the way to userland...

2019-10-16 15:19:29

by Al Viro

Subject: [RFC] change of calling conventions for arch_futex_atomic_op_inuser()

On Tue, Oct 15, 2019 at 07:08:46PM +0100, Al Viro wrote:
> [futex folks and linux-arch Cc'd]

> Another question: right now we have
> if (!access_ok(uaddr, sizeof(u32)))
> return -EFAULT;
>
> ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
> if (ret)
> return ret;
> in kernel/futex.c. Would there be any objections to moving access_ok()
> inside the instances and moving pagefault_disable()/pagefault_enable() outside?
>
> Reasons:
> * on x86 that would allow folding access_ok() with STAC into
> user_access_begin(). The same would be doable on other usual suspects
> (arm, arm64, ppc, riscv, s390), bringing access_ok() next to their
> STAC counterparts.
> * pagefault_disable()/pagefault_enable() pair is universal on
> all architectures, really meant to be by the nature of the beast, and
> lifting it into kernel/futex.c would get the same situation as with
> futex_atomic_cmpxchg_inatomic(). Which also does access_ok() inside
> the primitive (also foldable into user_access_begin(), at that).
> * access_ok() would be closer to actual memory access (and
> out of the generic code).
>
> Comments?

FWIW, completely untested patch follows; just the (semimechanical) conversion
of calling conventions, no per-architecture followups included. Could futex
folks ACK/NAK that in principle?

commit 7babb6ad28cb3e80977fb6bd0405e3f81a943161
Author: Al Viro <[email protected]>
Date: Tue Oct 15 16:54:41 2019 -0400

arch_futex_atomic_op_inuser(): move access_ok() in and pagefault_disable() - out

Signed-off-by: Al Viro <[email protected]>

diff --git a/arch/alpha/include/asm/futex.h b/arch/alpha/include/asm/futex.h
index bfd3c01038f8..da67afd578fd 100644
--- a/arch/alpha/include/asm/futex.h
+++ b/arch/alpha/include/asm/futex.h
@@ -31,7 +31,8 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -53,8 +54,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/arc/include/asm/futex.h b/arch/arc/include/asm/futex.h
index 9d0d070e6c22..607d1c16d4dd 100644
--- a/arch/arc/include/asm/futex.h
+++ b/arch/arc/include/asm/futex.h
@@ -75,10 +75,12 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
{
int oldval = 0, ret;

+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
+
#ifndef CONFIG_ARC_HAS_LLSC
preempt_disable(); /* to guarantee atomic r-m-w of futex op */
#endif
- pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -101,7 +103,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
#ifndef CONFIG_ARC_HAS_LLSC
preempt_enable();
#endif
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 83c391b597d4..e133da303a98 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -134,10 +134,12 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret, tmp;

+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
+
#ifndef CONFIG_SMP
preempt_disable();
#endif
- pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -159,7 +161,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
#ifndef CONFIG_SMP
preempt_enable();
#endif
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 6cc26a127819..97f6a63810ec 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -48,7 +48,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
int oldval = 0, ret, tmp;
u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);

- pagefault_disable();
+ if (!access_ok(_uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -75,8 +76,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/hexagon/include/asm/futex.h b/arch/hexagon/include/asm/futex.h
index cb635216a732..8693dc5ae9ec 100644
--- a/arch/hexagon/include/asm/futex.h
+++ b/arch/hexagon/include/asm/futex.h
@@ -36,7 +36,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -62,8 +63,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/ia64/include/asm/futex.h b/arch/ia64/include/asm/futex.h
index 2e106d462196..1db26b432d8c 100644
--- a/arch/ia64/include/asm/futex.h
+++ b/arch/ia64/include/asm/futex.h
@@ -50,7 +50,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -74,8 +75,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/microblaze/include/asm/futex.h b/arch/microblaze/include/asm/futex.h
index 8c90357e5983..86131ed84c9a 100644
--- a/arch/microblaze/include/asm/futex.h
+++ b/arch/microblaze/include/asm/futex.h
@@ -34,7 +34,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -56,8 +57,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h
index b83b0397462d..86f224548651 100644
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -88,7 +88,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -115,8 +116,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/nds32/include/asm/futex.h b/arch/nds32/include/asm/futex.h
index 5213c65c2e0b..60b7ab74ed92 100644
--- a/arch/nds32/include/asm/futex.h
+++ b/arch/nds32/include/asm/futex.h
@@ -66,8 +66,9 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

- pagefault_disable();
switch (op) {
case FUTEX_OP_SET:
__futex_atomic_op("move %0, %3", ret, oldval, tmp, uaddr,
@@ -93,8 +94,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/openrisc/include/asm/futex.h b/arch/openrisc/include/asm/futex.h
index fe894e6331ae..865e9cd0d97b 100644
--- a/arch/openrisc/include/asm/futex.h
+++ b/arch/openrisc/include/asm/futex.h
@@ -35,7 +35,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -57,8 +58,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
index 50662b6cb605..6e2e4d10e3c8 100644
--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ -40,11 +40,10 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
u32 tmp;

_futex_spin_lock_irqsave(uaddr, &flags);
- pagefault_disable();

ret = -EFAULT;
if (unlikely(get_user(oldval, uaddr) != 0))
- goto out_pagefault_enable;
+ goto out_unlock;

ret = 0;
tmp = oldval;
@@ -72,8 +71,7 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
if (ret == 0 && unlikely(put_user(tmp, uaddr) != 0))
ret = -EFAULT;

-out_pagefault_enable:
- pagefault_enable();
+out_unlock:
_futex_spin_unlock_irqrestore(uaddr, &flags);

if (!ret)
diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index eea28ca679db..d6e32b32f452 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,8 +35,9 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
{
int oldval = 0, ret;

+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
allow_write_to_user(uaddr, sizeof(*uaddr));
- pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -58,8 +59,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
-
*oval = oldval;

prevent_write_to_user(uaddr, sizeof(*uaddr));
diff --git a/arch/riscv/include/asm/futex.h b/arch/riscv/include/asm/futex.h
index 4ad6409c4647..84574acfb927 100644
--- a/arch/riscv/include/asm/futex.h
+++ b/arch/riscv/include/asm/futex.h
@@ -40,7 +40,8 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
{
int oldval = 0, ret = 0;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -67,8 +68,6 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/s390/include/asm/futex.h b/arch/s390/include/asm/futex.h
index 5e97a4353147..3c18a48baf44 100644
--- a/arch/s390/include/asm/futex.h
+++ b/arch/s390/include/asm/futex.h
@@ -28,8 +28,10 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
int oldval = 0, newval, ret;
mm_segment_t old_fs;

+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
+
old_fs = enable_sacf_uaccess();
- pagefault_disable();
switch (op) {
case FUTEX_OP_SET:
__futex_atomic_op("lr %2,%5\n",
@@ -54,7 +56,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
default:
ret = -ENOSYS;
}
- pagefault_enable();
disable_sacf_uaccess(old_fs);

if (!ret)
diff --git a/arch/sh/include/asm/futex.h b/arch/sh/include/asm/futex.h
index 3190ec89df81..b39cda09fb95 100644
--- a/arch/sh/include/asm/futex.h
+++ b/arch/sh/include/asm/futex.h
@@ -34,8 +34,6 @@ static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval,
u32 oldval, newval, prev;
int ret;

- pagefault_disable();
-
do {
ret = get_user(oldval, uaddr);

@@ -67,8 +65,6 @@ static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval,
ret = futex_atomic_cmpxchg_inatomic(&prev, uaddr, oldval, newval);
} while (!ret && prev != oldval);

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/sparc/include/asm/futex_64.h b/arch/sparc/include/asm/futex_64.h
index 0865ce77ec00..72de967318d7 100644
--- a/arch/sparc/include/asm/futex_64.h
+++ b/arch/sparc/include/asm/futex_64.h
@@ -38,8 +38,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (unlikely((((unsigned long) uaddr) & 0x3UL)))
return -EINVAL;

- pagefault_disable();
-
switch (op) {
case FUTEX_OP_SET:
__futex_cas_op("mov\t%4, %1", ret, oldval, uaddr, oparg);
@@ -60,8 +58,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/x86/include/asm/futex.h b/arch/x86/include/asm/futex.h
index 13c83fe97988..6bcd1c1486d9 100644
--- a/arch/x86/include/asm/futex.h
+++ b/arch/x86/include/asm/futex.h
@@ -47,7 +47,8 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
{
int oldval = 0, ret, tem;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -70,8 +71,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/arch/xtensa/include/asm/futex.h b/arch/xtensa/include/asm/futex.h
index 0c4457ca0a85..271cfcf8a841 100644
--- a/arch/xtensa/include/asm/futex.h
+++ b/arch/xtensa/include/asm/futex.h
@@ -72,7 +72,8 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
#if XCHAL_HAVE_S32C1I || XCHAL_HAVE_EXCLUSIVE
int oldval = 0, ret;

- pagefault_disable();
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;

switch (op) {
case FUTEX_OP_SET:
@@ -99,8 +100,6 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
ret = -ENOSYS;
}

- pagefault_enable();
-
if (!ret)
*oval = oldval;

diff --git a/include/asm-generic/futex.h b/include/asm-generic/futex.h
index 02970b11f71f..f4c3470480c7 100644
--- a/include/asm-generic/futex.h
+++ b/include/asm-generic/futex.h
@@ -34,7 +34,6 @@ arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr)
u32 tmp;

preempt_disable();
- pagefault_disable();

ret = -EFAULT;
if (unlikely(get_user(oldval, uaddr) != 0))
@@ -67,7 +66,6 @@ arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr)
ret = -EFAULT;

out_pagefault_enable:
- pagefault_enable();
preempt_enable();

if (ret == 0)
diff --git a/kernel/futex.c b/kernel/futex.c
index bd18f60e4c6c..2cc8a35109da 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1662,10 +1662,9 @@ static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
oparg = 1 << oparg;
}

- if (!access_ok(uaddr, sizeof(u32)))
- return -EFAULT;
-
+ pagefault_disable();
ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
+ pagefault_enable();
if (ret)
return ret;

2019-10-16 15:29:44

by Thomas Gleixner

Subject: Re: [RFC] change of calling conventions for arch_futex_atomic_op_inuser()

On Wed, 16 Oct 2019, Al Viro wrote:
> On Tue, Oct 15, 2019 at 07:08:46PM +0100, Al Viro wrote:
> > [futex folks and linux-arch Cc'd]
>
> > Another question: right now we have
> > if (!access_ok(uaddr, sizeof(u32)))
> > return -EFAULT;
> >
> > ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
> > if (ret)
> > return ret;
> > in kernel/futex.c. Would there be any objections to moving access_ok()
> > inside the instances and moving pagefault_disable()/pagefault_enable() outside?
> >
> > Reasons:
> > * on x86 that would allow folding access_ok() with STAC into
> > user_access_begin(). The same would be doable on other usual suspects
> > (arm, arm64, ppc, riscv, s390), bringing access_ok() next to their
> > STAC counterparts.
> > * pagefault_disable()/pagefault_enable() pair is universal on
> > all architectures, really meant to be by the nature of the beast, and
> > lifting it into kernel/futex.c would get the same situation as with
> > futex_atomic_cmpxchg_inatomic(). Which also does access_ok() inside
> > the primitive (also foldable into user_access_begin(), at that).
> > * access_ok() would be closer to actual memory access (and
> > out of the generic code).
> >
> > Comments?
>
> FWIW, completely untested patch follows; just the (semimechanical) conversion
> of calling conventions, no per-architecture followups included. Could futex
> folks ACK/NAK that in principle?

Makes sense and does not change any of the futex semantics. Go wild.

Thanks,

tglx
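The calling-convention change Al proposes above can be sketched in plain C. This is a userspace model under stated assumptions, not the kernel code: `range_ok()` is a hypothetical stand-in for access_ok(), the pagefault_disable()/pagefault_enable() pair is modeled as comments, and only a FUTEX_OP_ADD-style operation is shown.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for access_ok(): NULL models an invalid
 * user range. */
static int range_ok(const uint32_t *uaddr) { return uaddr != 0; }

/* New convention: the arch primitive does the range check itself,
 * so an arch can fold it into its user_access_begin()/STAC sequence. */
static int arch_futex_atomic_op_inuser(int op, uint32_t oparg,
                                       uint32_t *oldval, uint32_t *uaddr)
{
    (void)op;                       /* model FUTEX_OP_ADD only */
    if (!range_ok(uaddr))
        return -EFAULT;
    *oldval = *uaddr;
    *uaddr += oparg;
    return 0;
}

/* Generic code: the pagefault_disable()/enable() pair lifted out of
 * the arch code (modeled as comments), no access_ok() before the call. */
static int futex_atomic_op_inuser(int op, uint32_t oparg,
                                  uint32_t *oldval, uint32_t *uaddr)
{
    int ret;
    /* pagefault_disable(); */
    ret = arch_futex_atomic_op_inuser(op, oparg, oldval, uaddr);
    /* pagefault_enable(); */
    return ret;
}
```

The point of the shape: the generic caller no longer performs a separate range check, so on x86 and similar architectures the check and the STAC can be emitted as one sequence inside the primitive.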

2019-10-17 13:36:38

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 12:10 PM Al Viro <[email protected]> wrote:
> >
> > No arguments re put_user_ex side of things... Below is a completely
> > untested patch for get_user_ex elimination (it seems to build, but that's
> > it); in any case, I would really like to see comments from x86 folks
> > before it goes anywhere.
>
> Please don't do this:
>
> > + if (unlikely(__copy_from_user(&sc, usc, sizeof(sc))))
> > + goto Efault;
>
> Why would you use __copy_from_user()? Just don't.
>
> > + if (unlikely(__copy_from_user(&v, user_vm86,
> > + offsetof(struct vm86_struct, int_revectored))))
>
> Same here.
>
> There's no excuse for __copy_from_user().

FWIW, callers of __copy_from_user() remaining in the generic code:

1) regset.h:user_regset_copyin(). Switch to copy_from_user(); the calling
conventions of regset ->set() (as well as the method name) are atrocious,
but there are too many instances to mix any work in that direction into
this series. Yes, nominally it's an inline, but IRL it's too large and
has many callers in the same file(s), so any optimizations of inlining
__copy_from_user() will be lost and there's more than enough work done
there to make access_ok() a noise. And in this case it doesn't pay
to try and lift user_access_begin() into the callers - the work done
between the calls is often too non-trivial to be done in such area.
The same goes for other regset.h stuff; eventually we might want to
try and come up with saner API, but that's a separate story.

2) default csum_partial_copy_from_user(). What we need to do is
turn it into default csum_and_copy_from_user(). This
#ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
static inline
__wsum csum_and_copy_from_user (const void __user *src, void *dst,
int len, __wsum sum, int *err_ptr)
{
if (access_ok(src, len))
return csum_partial_copy_from_user(src, dst, len, sum, err_ptr);

if (len)
*err_ptr = -EFAULT;

return sum;
}
#endif
in checksum.h is the only thing that calls that sucker and we can bloody
well combine them and make the users of lib/checksum.h define
_HAVE_ARCH_COPY_AND_CSUM_FROM_USER. That puts us reasonably close
to having _HAVE_ARCH_COPY_AND_CSUM_FROM_USER unconditional and in any
case, __copy_from_user() in lib/checksum.h turns into copy_from_user().

3) firewire ioctl_queue_iso(). Convert to copy_from_user(), lose the
access_ok() before the loop. Definitely not an unsafe_... situation
(we call fw_iso_context_queue() after each chunk; _not_ something
we want under user_access_begin()/user_access_end()) and it's really
not worth trying to save on access_ok() checks there.

4) pstore persistent_ram_update_user(). Obvious copy_from_user(); definitely
lose access_ok() in the caller (persistent_ram_write_user()), along with
the one in write_pmsg() (several calls back by the callchain).

5) test_kasan: lose the function, lose the tests...

6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise
(i.e. fold with immediately preceding kmalloc()s). sg_new_write() -
fold with access_ok() into copy_from_user() (for both call sites).
sg_write() - lose access_ok(), use copy_from_user() (both call sites)
and get_user() (instead of the solitary __get_user() there).

7) i915 ones are, frankly, terrifying. Consider e.g. this one:
relocs = kvmalloc_array(size, 1, GFP_KERNEL);
if (!relocs) {
err = -ENOMEM;
goto err;
}

/* copy_from_user is limited to < 4GiB */
copied = 0;
do {
unsigned int len =
min_t(u64, BIT_ULL(31), size - copied);

if (__copy_from_user((char *)relocs + copied,
(char __user *)urelocs + copied,
len))
goto end;

copied += len;
} while (copied < size);
Is that for real? Are they *really* trying to allocate and copy >2Gb of
userland data? That's eb_copy_relocations() and that crap is itself in
a loop. Sizes come from user-supplied data. WTF? It's some weird
kmemdup lookalike and I'd rather heard from maintainers of that thing
before doing anything with it.

8) vhost_copy_from_user(). Need comments from mst - it's been a while since I crawled
through that code and I'd need his ACK anyway. The logic behind the positioning
of access_ok() in there is non-trivial and I'm not sure how much of it serves
as early input validation and how much can be taken out and replaced by use of
plain copy_from_user() and friends.

9) KVM. There I'm not sure that access_ok() would be the right thing to
do. kvm_is_error_hva() tends to serve as the range check in that and similar
places; it's not the same situation as with NMI, but...

And that's it - everything else is in arch/*. Looking at arch/x86, we have
* insanity in math_emu (unchecked return value, for example)
* a bunch of sigframe-related code. Some want to use unsafe_...
(or raw_...) variant, some should probably go for copy_from_user().
FPU-related stuff is particularly interesting in that respect - there
we have several inline functions nearby that contain nothing but
stac + instruction + clac + exception handling. And in quite a few
cases it would've been cleaner to lift stac/clac into the callers, since
they combine nicely.
* regset_tls_set(): use copy_from_user().
* one in kvm walk_addr_generic stuff. If nothing else,
that one smells like __get_user() - we seem to be copying a single
PTE. And again, it's using kvm_is_error_hva().
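The i915 chunking pattern quoted in item 7 can be modeled in userspace C. This is a sketch under assumptions: `fake_copy_from_user()` is a hypothetical memcpy-based stand-in for copy_from_user() (returning bytes not copied, 0 on success), and the 2 GiB cap mirrors the BIT_ULL(31) limit in the quoted loop.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for copy_from_user(): returns the number of bytes NOT
 * copied (0 on success), like the kernel helper. */
static unsigned long fake_copy_from_user(void *dst, const void *src,
                                         size_t len)
{
    if (!src)
        return len;
    memcpy(dst, src, len);
    return 0;
}

#define CHUNK (1ULL << 31)      /* BIT_ULL(31): at most 2 GiB per call */

/* Copy `size` bytes in sub-2GiB chunks, mirroring the quoted
 * eb_copy_relocations() loop. */
static int chunked_copy(char *dst, const char *src, uint64_t size)
{
    uint64_t copied = 0;

    do {
        uint64_t len = size - copied;

        if (len > CHUNK)
            len = CHUNK;
        if (fake_copy_from_user(dst + copied, src + copied, len))
            return -1;
        copied += len;
    } while (copied < size);
    return 0;
}
```

For any `size` below 2 GiB the loop runs exactly once, which is why Al's question is really about whether multi-gigabyte user-supplied sizes should ever reach this path at all.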

2019-10-18 22:16:52

by Al Viro

[permalink] [raw]
Subject: [RFC][PATCHES] drivers/scsi/sg.c uaccess cleanups/fixes

On Wed, Oct 16, 2019 at 09:25:40PM +0100, Al Viro wrote:

> FWIW, callers of __copy_from_user() remaining in the generic code:

> 6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise
> (i.e. fold with immediately preceding kmalloc()s). sg_new_write() -
> fold with access_ok() into copy_from_user() (for both call sites).
> sg_write() - lose access_ok(), use copy_from_user() (both call sites)
> and get_user() (instead of the solitary __get_user() there).

Turns out that there'd been outright redundant access_ok() calls (not
even warranted by __copy_...) *and* several __put_user()/__get_user()
with no checking of return value (access_ok() was there, handling of
unmapped addresses wasn't). The latter go back at least to 2.1.early...

I've got a series that presumably fixes and cleans the things up
in that area; it didn't get any serious testing (the kernel builds
and boots, smartctl works as well as it used to, but that's not
worth much - all it says is that SG_IO doesn't fail terribly;
I don't have any test setup for really working with /dev/sg*).

IOW, it needs more review and testing - this is _not_ a pull request.
It's in vfs.git#work.sg; individual patches are in followups.
Shortlog/diffstat:
Al Viro (8):
sg_ioctl(): fix copyout handling
sg_new_write(): replace access_ok() + __copy_from_user() with copy_from_user()
sg_write(): __get_user() can fail...
sg_read(): simplify reading ->pack_id of userland sg_io_hdr_t
sg_new_write(): don't bother with access_ok
sg_read(): get rid of access_ok()/__copy_..._user()
sg_write(): get rid of access_ok()/__copy_from_user()/__get_user()
SG_IO: get rid of access_ok()

drivers/scsi/sg.c | 98 ++++++++++++++++++++++++++++++++----------------------------------------------------------------
1 file changed, 32 insertions(+), 66 deletions(-)

2019-10-18 22:17:04

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 1/8] sg_ioctl(): fix copyout handling

From: Al Viro <[email protected]>

First of all, __put_user() can fail with access_ok() succeeding.
And access_ok() + __copy_to_user() is spelled copy_to_user()...

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 43 ++++++++++++++++---------------------------
1 file changed, 16 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index cce757506383..634460421ce4 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -963,26 +963,21 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
case SG_GET_LOW_DMA:
return put_user((int) sdp->device->host->unchecked_isa_dma, ip);
case SG_GET_SCSI_ID:
- if (!access_ok(p, sizeof (sg_scsi_id_t)))
- return -EFAULT;
- else {
- sg_scsi_id_t __user *sg_idp = p;
+ {
+ sg_scsi_id_t v;

if (atomic_read(&sdp->detaching))
return -ENODEV;
- __put_user((int) sdp->device->host->host_no,
- &sg_idp->host_no);
- __put_user((int) sdp->device->channel,
- &sg_idp->channel);
- __put_user((int) sdp->device->id, &sg_idp->scsi_id);
- __put_user((int) sdp->device->lun, &sg_idp->lun);
- __put_user((int) sdp->device->type, &sg_idp->scsi_type);
- __put_user((short) sdp->device->host->cmd_per_lun,
- &sg_idp->h_cmd_per_lun);
- __put_user((short) sdp->device->queue_depth,
- &sg_idp->d_queue_depth);
- __put_user(0, &sg_idp->unused[0]);
- __put_user(0, &sg_idp->unused[1]);
+ memset(&v, 0, sizeof(v));
+ v.host_no = sdp->device->host->host_no;
+ v.channel = sdp->device->channel;
+ v.scsi_id = sdp->device->id;
+ v.lun = sdp->device->lun;
+ v.scsi_type = sdp->device->type;
+ v.h_cmd_per_lun = sdp->device->host->cmd_per_lun;
+ v.d_queue_depth = sdp->device->queue_depth;
+ if (copy_to_user(p, &v, sizeof(sg_scsi_id_t)))
+ return -EFAULT;
return 0;
}
case SG_SET_FORCE_PACK_ID:
@@ -992,20 +987,16 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
sfp->force_packid = val ? 1 : 0;
return 0;
case SG_GET_PACK_ID:
- if (!access_ok(ip, sizeof (int)))
- return -EFAULT;
read_lock_irqsave(&sfp->rq_list_lock, iflags);
list_for_each_entry(srp, &sfp->rq_list, entry) {
if ((1 == srp->done) && (!srp->sg_io_owned)) {
read_unlock_irqrestore(&sfp->rq_list_lock,
iflags);
- __put_user(srp->header.pack_id, ip);
- return 0;
+ return put_user(srp->header.pack_id, ip);
}
}
read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
- __put_user(-1, ip);
- return 0;
+ return put_user(-1, ip);
case SG_GET_NUM_WAITING:
read_lock_irqsave(&sfp->rq_list_lock, iflags);
val = 0;
@@ -1073,9 +1064,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
val = (sdp->device ? 1 : 0);
return put_user(val, ip);
case SG_GET_REQUEST_TABLE:
- if (!access_ok(p, SZ_SG_REQ_INFO * SG_MAX_QUEUE))
- return -EFAULT;
- else {
+ {
sg_req_info_t *rinfo;

rinfo = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
@@ -1085,7 +1074,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
read_lock_irqsave(&sfp->rq_list_lock, iflags);
sg_fill_request_table(sfp, rinfo);
read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
- result = __copy_to_user(p, rinfo,
+ result = copy_to_user(p, rinfo,
SZ_SG_REQ_INFO * SG_MAX_QUEUE);
result = result ? -EFAULT : 0;
kfree(rinfo);
--
2.11.0
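The transformation in this patch - build the reply in a zeroed local struct, then push it to userspace with one checked copy - can be sketched in userspace C. The struct layout and field values here are illustrative only (the real sg_scsi_id_t lives in scsi/sg.h), and `fake_copy_to_user()` is a hypothetical memcpy-based stand-in for copy_to_user().

```c
#include <string.h>

/* Illustrative layout only - the real sg_scsi_id_t lives in scsi/sg.h. */
typedef struct {
    int host_no, channel, scsi_id, lun, scsi_type;
    short h_cmd_per_lun, d_queue_depth;
    int unused[2];
} sg_scsi_id_demo_t;

/* Stand-in for copy_to_user(): returns bytes NOT copied (0 == success). */
static unsigned long fake_copy_to_user(void *dst, const void *src, size_t n)
{
    if (!dst)
        return n;
    memcpy(dst, src, n);
    return 0;
}

/* Build the reply in a zeroed kernel-side struct, then push it out with
 * one checked copy - replacing nine unchecked __put_user() calls. */
static int fill_scsi_id(sg_scsi_id_demo_t *p)
{
    sg_scsi_id_demo_t v;

    memset(&v, 0, sizeof(v));   /* also zeroes unused[] and any padding */
    v.host_no = 1;
    v.channel = 0;
    v.scsi_id = 3;
    if (fake_copy_to_user(p, &v, sizeof(v)))
        return -14;             /* -EFAULT */
    return 0;
}
```

Besides being checked, the memset also avoids leaking uninitialized padding bytes to userspace, which the per-field __put_user() version never wrote at all.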

2019-10-18 22:17:07

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 3/8] sg_write(): __get_user() can fail...

From: Al Viro <[email protected]>

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 026628aa556d..4c62237cdf37 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -640,13 +640,15 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
if (count < (SZ_SG_HEADER + 6))
return -EIO; /* The minimum scsi command length is 6 bytes. */

+ buf += SZ_SG_HEADER;
+ if (__get_user(opcode, buf))
+ return -EFAULT;
+
if (!(srp = sg_add_request(sfp))) {
SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sdp,
"sg_write: queue full\n"));
return -EDOM;
}
- buf += SZ_SG_HEADER;
- __get_user(opcode, buf);
mutex_lock(&sfp->f_mutex);
if (sfp->next_cmd_len > 0) {
cmd_size = sfp->next_cmd_len;
--
2.11.0

2019-10-18 22:17:08

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 6/8] sg_read(): get rid of access_ok()/__copy_..._user()

From: Al Viro <[email protected]>

Use copy_..._user() instead, both in sg_read() and in sg_read_oxfer().
And don't open-code memdup_user()...

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3702f66493f7..9f6534a025cd 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -429,16 +429,10 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
"sg_read: count=%d\n", (int) count));

- if (!access_ok(buf, count))
- return -EFAULT;
if (sfp->force_packid && (count >= SZ_SG_HEADER)) {
- old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
- if (!old_hdr)
- return -ENOMEM;
- if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
- retval = -EFAULT;
- goto free_old_hdr;
- }
+ old_hdr = memdup_user(buf, SZ_SG_HEADER);
+ if (IS_ERR(old_hdr))
+ return PTR_ERR(old_hdr);
if (old_hdr->reply_len < 0) {
if (count >= SZ_SG_IO_HDR) {
sg_io_hdr_t __user *p = (void __user *)buf;
@@ -529,7 +523,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)

/* Now copy the result back to the user buffer. */
if (count >= SZ_SG_HEADER) {
- if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
+ if (copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
retval = -EFAULT;
goto free_old_hdr;
}
@@ -1960,12 +1954,12 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
num = 1 << (PAGE_SHIFT + schp->page_order);
for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
if (num > num_read_xfer) {
- if (__copy_to_user(outp, page_address(schp->pages[k]),
+ if (copy_to_user(outp, page_address(schp->pages[k]),
num_read_xfer))
return -EFAULT;
break;
} else {
- if (__copy_to_user(outp, page_address(schp->pages[k]),
+ if (copy_to_user(outp, page_address(schp->pages[k]),
num))
return -EFAULT;
num_read_xfer -= num;
--
2.11.0

2019-10-18 22:17:10

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 8/8] SG_IO: get rid of access_ok()

From: Al Viro <[email protected]>

simply not needed there - neither sg_new_read() nor sg_new_write() need
it.

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index f3d090b93cdf..0940abd91d3c 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -896,8 +896,6 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
return -ENODEV;
if (!scsi_block_when_processing_errors(sdp->device))
return -ENXIO;
- if (!access_ok(p, SZ_SG_IO_HDR))
- return -EFAULT;
result = sg_new_write(sfp, filp, p, SZ_SG_IO_HDR,
1, read_only, 1, &srp);
if (result < 0)
--
2.11.0

2019-10-18 22:17:16

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 4/8] sg_read(): simplify reading ->pack_id of userland sg_io_hdr_t

From: Al Viro <[email protected]>

We don't need to allocate a temporary buffer and read the entire
structure in it, only to fetch a single field and free what we'd
allocated. Just use get_user() and be done with it...

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 4c62237cdf37..2d30e89075e9 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -441,17 +441,8 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
}
if (old_hdr->reply_len < 0) {
if (count >= SZ_SG_IO_HDR) {
- sg_io_hdr_t *new_hdr;
- new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL);
- if (!new_hdr) {
- retval = -ENOMEM;
- goto free_old_hdr;
- }
- retval =__copy_from_user
- (new_hdr, buf, SZ_SG_IO_HDR);
- req_pack_id = new_hdr->pack_id;
- kfree(new_hdr);
- if (retval) {
+ sg_io_hdr_t __user *p = (void __user *)buf;
+ if (get_user(req_pack_id, &p->pack_id)) {
retval = -EFAULT;
goto free_old_hdr;
}
--
2.11.0
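The simplification above - fetch one field through its offset instead of copying the whole struct - looks like this as a userspace sketch. The struct is a minimal stand-in for sg_io_hdr_t (only pack_id matters here), and `fake_get_user()` is a hypothetical model of get_user() that treats NULL as a fault.

```c
#include <stddef.h>

/* Minimal model of the userland header: only pack_id matters here. */
typedef struct {
    int pack_id;
    char rest[60];
} sg_io_hdr_demo_t;

/* Stand-in for get_user(): reads one word from a "user" pointer. */
static int fake_get_user(int *dst, const int *uptr)
{
    if (!uptr)
        return -14;             /* -EFAULT */
    *dst = *uptr;
    return 0;
}

/* Fetch just ->pack_id, instead of kmalloc() + copying the whole
 * struct and then throwing everything but one field away. */
static int read_pack_id(const sg_io_hdr_demo_t *ubuf, int *req_pack_id)
{
    return fake_get_user(req_pack_id, &ubuf->pack_id);
}
```

This removes an allocation, a failure path (-ENOMEM), and a full-struct copy, and leaves a single checked word-sized access.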

2019-10-18 22:17:18

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 5/8] sg_new_write(): don't bother with access_ok

From: Al Viro <[email protected]>

... just use copy_from_user(). We copy only SZ_SG_IO_HDR bytes,
so that would, strictly speaking, loosen the check. However,
for call chains via ->write() the caller has actually checked
the entire range and SG_IO passes exactly SZ_SG_IO_HDR for count.
So no visible behaviour changes happen if we check only what
we really need for copyin.

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 2d30e89075e9..3702f66493f7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -717,8 +717,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,

if (count < SZ_SG_IO_HDR)
return -EINVAL;
- if (!access_ok(buf, count))
- return -EFAULT; /* protects following copy_from_user()s + get_user()s */

sfp->cmd_q = 1; /* when sg_io_hdr seen, set command queuing on */
if (!(srp = sg_add_request(sfp))) {
@@ -728,7 +726,7 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
}
srp->sg_io_owned = sg_io_owned;
hp = &srp->header;
- if (__copy_from_user(hp, buf, SZ_SG_IO_HDR)) {
+ if (copy_from_user(hp, buf, SZ_SG_IO_HDR)) {
sg_remove_request(sfp, srp);
return -EFAULT;
}
--
2.11.0

2019-10-18 22:19:15

by Al Viro

[permalink] [raw]
Subject: [RFC PATCH 7/8] sg_write(): get rid of access_ok()/__copy_from_user()/__get_user()

From: Al Viro <[email protected]>

Just use plain copy_from_user() and get_user(). Note that while
a buf-derived pointer gets stored into ->dxferp, all places that
actually use the resulting value feed it either to import_iovec()
or to import_single_range(), and both will do validation.

Signed-off-by: Al Viro <[email protected]>
---
drivers/scsi/sg.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 9f6534a025cd..f3d090b93cdf 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -612,11 +612,9 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
scsi_block_when_processing_errors(sdp->device)))
return -ENXIO;

- if (!access_ok(buf, count))
- return -EFAULT; /* protects following copy_from_user()s + get_user()s */
if (count < SZ_SG_HEADER)
return -EIO;
- if (__copy_from_user(&old_hdr, buf, SZ_SG_HEADER))
+ if (copy_from_user(&old_hdr, buf, SZ_SG_HEADER))
return -EFAULT;
blocking = !(filp->f_flags & O_NONBLOCK);
if (old_hdr.reply_len < 0)
@@ -626,7 +624,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
return -EIO; /* The minimum scsi command length is 6 bytes. */

buf += SZ_SG_HEADER;
- if (__get_user(opcode, buf))
+ if (get_user(opcode, buf))
return -EFAULT;

if (!(srp = sg_add_request(sfp))) {
@@ -676,7 +674,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
hp->flags = input_size; /* structure abuse ... */
hp->pack_id = old_hdr.pack_id;
hp->usr_ptr = NULL;
- if (__copy_from_user(cmnd, buf, cmd_size))
+ if (copy_from_user(cmnd, buf, cmd_size))
return -EFAULT;
/*
* SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
--
2.11.0

2019-10-18 22:21:05

by Douglas Gilbert

[permalink] [raw]
Subject: Re: [RFC][PATCHES] drivers/scsi/sg.c uaccess cleanups/fixes

On 2019-10-17 9:36 p.m., Al Viro wrote:
> On Wed, Oct 16, 2019 at 09:25:40PM +0100, Al Viro wrote:
>
>> FWIW, callers of __copy_from_user() remaining in the generic code:
>
>> 6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise
>> (i.e. fold with immediately preceding kmalloc()s). sg_new_write() -
>> fold with access_ok() into copy_from_user() (for both call sites).
>> sg_write() - lose access_ok(), use copy_from_user() (both call sites)
>> and get_user() (instead of the solitary __get_user() there).
>
> Turns out that there'd been outright redundant access_ok() calls (not
> even warranted by __copy_...) *and* several __put_user()/__get_user()
> with no checking of return value (access_ok() was there, handling of
> unmapped addresses wasn't). The latter go back at least to 2.1.early...
>
> I've got a series that presumably fixes and cleans the things up
> in that area; it didn't get any serious testing (the kernel builds
> and boots, smartctl works as well as it used to, but that's not
> worth much - all it says is that SG_IO doesn't fail terribly;
> I don't have any test setup for really working with /dev/sg*).
>
> IOW, it needs more review and testing - this is _not_ a pull request.
> It's in vfs.git#work.sg; individual patches are in followups.
> Shortlog/diffstat:
> Al Viro (8):
> sg_ioctl(): fix copyout handling
> sg_new_write(): replace access_ok() + __copy_from_user() with copy_from_user()
> sg_write(): __get_user() can fail...
> sg_read(): simplify reading ->pack_id of userland sg_io_hdr_t
> sg_new_write(): don't bother with access_ok
> sg_read(): get rid of access_ok()/__copy_..._user()
> sg_write(): get rid of access_ok()/__copy_from_user()/__get_user()
> SG_IO: get rid of access_ok()
>
> drivers/scsi/sg.c | 98 ++++++++++++++++++++++++++++++++----------------------------------------------------------------
> 1 file changed, 32 insertions(+), 66 deletions(-)

Al,
I am aware of these and have a 23-part patchset on the linux-scsi list
for review (see https://marc.info/?l=linux-scsi&m=157052102631490&w=2 )
that amongst other things fixes all of these. It also re-adds the
functionality removed from the bsg driver last year. Unfortunately that
review process is going very slowly, so I have no objections if you
apply these now.

It is unlikely that these changes will introduce any bugs (they didn't in
my testing). If you want to do more testing you may find the sg3_utils
package helpful, especially in the testing directory:
https://github.com/hreinecke/sg3_utils

Doug Gilbert


2019-10-18 22:24:16

by Al Viro

[permalink] [raw]
Subject: [RFC] csum_and_copy_from_user() semantics

On Wed, Oct 16, 2019 at 09:25:40PM +0100, Al Viro wrote:

> 2) default csum_partial_copy_from_user(). What we need to do is
> turn it into default csum_and_copy_from_user(). This
> #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
> static inline
> __wsum csum_and_copy_from_user (const void __user *src, void *dst,
> int len, __wsum sum, int *err_ptr)
> {
> if (access_ok(src, len))
> return csum_partial_copy_from_user(src, dst, len, sum, err_ptr);
>
> if (len)
> *err_ptr = -EFAULT;
>
> return sum;
> }
> #endif
> in checksum.h is the only thing that calls that sucker and we can bloody
> well combine them and make the users of lib/checksum.h define
> _HAVE_ARCH_COPY_AND_CSUM_FROM_USER. That puts us reasonably close
> to having _HAVE_ARCH_COPY_AND_CSUM_FROM_USER unconditional and in any
> case, __copy_from_user() in lib/checksum.h turns into copy_from_user().

Actually, that gets interesting. First of all, csum_partial_copy_from_user()
has almost no callers other than csum_and_copy_from_user() - the only
exceptions are alpha and itanic, where csum_partial_copy_nocheck() instances
are using it.

Everything else goes through csum_and_copy_from_user(). And _that_ has
only two callers - csum_and_copy_from_iter() and csum_and_copy_from_iter_full().
Both treat any failures as "discard the thing", for a good reason. Namely,
neither csum_and_copy_from_user() nor csum_partial_copy_from_user() have any
means to tell the caller *where* the fault happened. So anything
that calls them has to treat a fault as "nothing copied". That, of course,
goes both for data and csum.

Moreover, behaviour of instances on different architectures differs -
some zero the uncopied-over part of destination, some do not, some
just keep going treating every failed fetch as "got zero" (and returning
the error in the end).

We could, in theory, teach that thing to report the exact amount
copied, so that new users (when and if such appear) could make use
of that. However, it means a lot of unpleasant work on e.g. sparc.
For raw_copy_from_user() we had to do that, but here I don't see
the point.

As it is, it's only suitable for "discard if anything fails, treat
the entire destination area as garbage in such case" uses. Which is
all we have for it at the moment.

IOW, it might make sense to get rid of all the "memset the tail to
zero on failure" logics in there - it's not consistently done and
the callers have no way to make use of it anyway.

In any case, there's no point keeping csum_and_copy_from_user()
separate from csum_partial_copy_from_user(). As it is, the
only real difference is that the former does access_ok(), while
the latter might not (some instances do, in which case there's
no difference at all).

Questions from reviewing the instances:
* mips csum_and_partial_copy_from_user() tries to check
if we are under KERNEL_DS, in which case it goes for kernel-to-kernel
copy. That's pointless - the callers are reading from an
iovec-backed iov_iter, which can't be created under KERNEL_DS.
So we would have to have set iovec-backed iov_iter while under
USER_DS, then do set_fs(KERNEL_DS), then pass that iov_iter to
->sendmsg(). Which doesn't happen. IOW, the calls of
__csum_partial_copy_kernel() never happen - neither for
csum_and_copy_from_kernel() nor for csum_and_copy_to_kernel().

* ppc does something odd:
csum = csum_partial_copy_generic((void __force *)src, dst,
len, sum, err_ptr, NULL);

if (unlikely(*err_ptr)) {
int missing = __copy_from_user(dst, src, len);

if (missing) {
memset(dst + len - missing, 0, missing);
*err_ptr = -EFAULT;
} else {
*err_ptr = 0;
}

csum = csum_partial(dst, len, sum);
}
and since that happens under their stac equivalent, we get it nested -
__copy_from_user() takes and drops it. I would've said "don't bother
trying to be smart on failures", if I'd been certain that it's not
a fallback for e.g. csum_and_partial_copy_from_user() in misaligned
case. Could ppc folks clarify that?
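The one-pass "checksum while copying" idea being discussed can be modeled in userspace C. This is a deliberately simplified sketch: real implementations use 32/64-bit loads, handle odd lengths and alignment, and are usually per-arch assembly; here even `len` is assumed, big-endian 16-bit words are summed, and a NULL source models the "fault, discard everything" behavior (the error code carries no fault location, which is exactly the limitation Al describes).

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator down to 16 bits, one's-complement style. */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum the data while copying it, in one pass over the source -
 * the point of csum_and_copy_from_user(). Even `len` assumed for
 * brevity; on a (modeled) fault *err is set but no location is
 * reported, so callers must discard the whole destination. */
static uint32_t csum_and_copy(const uint8_t *src, uint8_t *dst,
                              size_t len, uint32_t sum, int *err)
{
    size_t i;

    if (!src) {                 /* the "discard everything" case */
        if (len)
            *err = -14;         /* -EFAULT */
        return sum;
    }
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += (uint32_t)((src[i] << 8) | src[i + 1]);
    }
    return sum;
}
```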

2019-10-25 20:31:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

Al,

On Sun, 13 Oct 2019, Al Viro wrote:
> On Sun, Oct 13, 2019 at 11:43:57AM -0700, Linus Torvalds wrote:
> > > And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
> > > are the only get_user_ex() users anywhere...
> >
> > Yeah, that sounds like a solid strategy for getting rid of them.
> >
> > Particularly since we can't really make get_user_ex() generate
> > particularly good code (at least for now).
> >
> > Now, put_user_ex() is a different thing - converting it to
> > unsafe_put_user() actually does make it generate very good code - much
> > better than copying data twice.
>
> No arguments re put_user_ex side of things... Below is a completely
> untested patch for get_user_ex elimination (it seems to build, but that's
> it); in any case, I would really like to see comments from x86 folks
> before it goes anywhere.

I'm fine with the approach, but I'd like to see the macro mess gone as
well. Reworked patch below.

Can you please split that up into several patches (signal, ia32/signal,
vm86 and removal) ?

Thanks,

tglx

8<------------
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 1cee10091b9f..00bf8ac1d42a 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -35,70 +35,57 @@
#include <asm/sighandling.h>
#include <asm/smap.h>

+static inline void reload_segments(struct sigcontext_32 *sc)
+{
+ unsigned int cur;
+
+ savesegment(gs, cur);
+ if ((sc->gs | 0x03) != cur)
+ load_gs_index(sc->gs | 0x03);
+ savesegment(fs, cur);
+ if ((sc->fs | 0x03) != cur)
+ loadsegment(fs, sc->fs | 0x03);
+ savesegment(ds, cur);
+ if ((sc->ds | 0x03) != cur)
+ loadsegment(ds, sc->ds | 0x03);
+ savesegment(es, cur);
+ if ((sc->es | 0x03) != cur)
+ loadsegment(es, sc->es | 0x03);
+}
+
/*
* Do a signal return; undo the signal stack.
*/
-#define loadsegment_gs(v) load_gs_index(v)
-#define loadsegment_fs(v) loadsegment(fs, v)
-#define loadsegment_ds(v) loadsegment(ds, v)
-#define loadsegment_es(v) loadsegment(es, v)
-
-#define get_user_seg(seg) ({ unsigned int v; savesegment(seg, v); v; })
-#define set_user_seg(seg, v) loadsegment_##seg(v)
-
-#define COPY(x) { \
- get_user_ex(regs->x, &sc->x); \
-}
-
-#define GET_SEG(seg) ({ \
- unsigned short tmp; \
- get_user_ex(tmp, &sc->seg); \
- tmp; \
-})
-
-#define COPY_SEG_CPL3(seg) do { \
- regs->seg = GET_SEG(seg) | 3; \
-} while (0)
-
-#define RELOAD_SEG(seg) { \
- unsigned int pre = (seg) | 3; \
- unsigned int cur = get_user_seg(seg); \
- if (pre != cur) \
- set_user_seg(seg, pre); \
-}
-
static int ia32_restore_sigcontext(struct pt_regs *regs,
- struct sigcontext_32 __user *sc)
+ struct sigcontext_32 __user *usc)
{
- unsigned int tmpflags, err = 0;
- u16 gs, fs, es, ds;
- void __user *buf;
- u32 tmp;
+ struct sigcontext_32 sc;
+ int ret = -EFAULT;

/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;

- get_user_try {
- gs = GET_SEG(gs);
- fs = GET_SEG(fs);
- ds = GET_SEG(ds);
- es = GET_SEG(es);
+ if (unlikely(__copy_from_user(&sc, usc, sizeof(sc))))
+ goto out;

- COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
- COPY(dx); COPY(cx); COPY(ip); COPY(ax);
- /* Don't touch extended registers */
+ /* Get only the ia32 registers. */
+ regs->bx = sc.bx;
+ regs->cx = sc.cx;
+ regs->dx = sc.dx;
+ regs->si = sc.si;
+ regs->di = sc.di;
+ regs->bp = sc.bp;
+ regs->ax = sc.ax;
+ regs->sp = sc.sp;
+ regs->ip = sc.ip;

- COPY_SEG_CPL3(cs);
- COPY_SEG_CPL3(ss);
+ /* Get CS/SS and force CPL3 */
+ regs->cs = sc.cs | 0x03;
+ regs->ss = sc.ss | 0x03;

- get_user_ex(tmpflags, &sc->flags);
- regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
- /* disable syscall checks */
- regs->orig_ax = -1;
-
- get_user_ex(tmp, &sc->fpstate);
- buf = compat_ptr(tmp);
- } get_user_catch(err);
+ regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+ /* disable syscall checks */
+ regs->orig_ax = -1;

/*
* Reload fs and gs if they have changed in the signal
@@ -106,16 +93,12 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
* the handler, but does not clobber them at least in the
* normal case.
*/
- RELOAD_SEG(gs);
- RELOAD_SEG(fs);
- RELOAD_SEG(ds);
- RELOAD_SEG(es);
-
- err |= fpu__restore_sig(buf, 1);
+ reload_segments(&sc);

+ ret = fpu__restore_sig(compat_ptr(sc.fpstate), 1);
+out:
force_iret();
-
- return err;
+ return ret;
}

asmlinkage long sys32_sigreturn(void)
@@ -176,6 +159,8 @@ asmlinkage long sys32_rt_sigreturn(void)
* Set up a signal frame.
*/

+#define get_user_seg(seg) ({ unsigned int v; savesegment(seg, v); v; })
+
static int ia32_setup_sigcontext(struct sigcontext_32 __user *sc,
void __user *fpstate,
struct pt_regs *regs, unsigned int mask)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61d93f062a36..ac81f06f8358 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -335,12 +335,9 @@ do { \
"i" (errret), "0" (retval)); \
})

-#define __get_user_asm_ex_u64(x, ptr) (x) = __get_user_bad()
#else
#define __get_user_asm_u64(x, ptr, retval, errret) \
__get_user_asm(x, ptr, retval, "q", "", "=r", errret)
-#define __get_user_asm_ex_u64(x, ptr) \
- __get_user_asm_ex(x, ptr, "q", "", "=r")
#endif

#define __get_user_size(x, ptr, size, retval, errret) \
@@ -390,41 +387,6 @@ do { \
: "=r" (err), ltype(x) \
: "m" (__m(addr)), "i" (errret), "0" (err))

-/*
- * This doesn't do __uaccess_begin/end - the exception handling
- * around it must do that.
- */
-#define __get_user_size_ex(x, ptr, size) \
-do { \
- __chk_user_ptr(ptr); \
- switch (size) { \
- case 1: \
- __get_user_asm_ex(x, ptr, "b", "b", "=q"); \
- break; \
- case 2: \
- __get_user_asm_ex(x, ptr, "w", "w", "=r"); \
- break; \
- case 4: \
- __get_user_asm_ex(x, ptr, "l", "k", "=r"); \
- break; \
- case 8: \
- __get_user_asm_ex_u64(x, ptr); \
- break; \
- default: \
- (x) = __get_user_bad(); \
- } \
-} while (0)
-
-#define __get_user_asm_ex(x, addr, itype, rtype, ltype) \
- asm volatile("1: mov"itype" %1,%"rtype"0\n" \
- "2:\n" \
- ".section .fixup,\"ax\"\n" \
- "3:xor"itype" %"rtype"0,%"rtype"0\n" \
- " jmp 2b\n" \
- ".previous\n" \
- _ASM_EXTABLE_EX(1b, 3b) \
- : ltype(x) : "m" (__m(addr)))
-
#define __put_user_nocheck(x, ptr, size) \
({ \
__label__ __pu_label; \
@@ -552,22 +514,6 @@ struct __large_struct { unsigned long buf[100]; };
#define __put_user(x, ptr) \
__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))

-/*
- * {get|put}_user_try and catch
- *
- * get_user_try {
- * get_user_ex(...);
- * } get_user_catch(err)
- */
-#define get_user_try uaccess_try_nospec
-#define get_user_catch(err) uaccess_catch(err)
-
-#define get_user_ex(x, ptr) do { \
- unsigned long __gue_val; \
- __get_user_size_ex((__gue_val), (ptr), (sizeof(*(ptr)))); \
- (x) = (__force __typeof__(*(ptr)))__gue_val; \
-} while (0)
-
#define put_user_try uaccess_try
#define put_user_catch(err) uaccess_catch(err)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193e158d..c5c24cee3868 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -47,24 +47,6 @@
#include <asm/sigframe.h>
#include <asm/signal.h>

-#define COPY(x) do { \
- get_user_ex(regs->x, &sc->x); \
-} while (0)
-
-#define GET_SEG(seg) ({ \
- unsigned short tmp; \
- get_user_ex(tmp, &sc->seg); \
- tmp; \
-})
-
-#define COPY_SEG(seg) do { \
- regs->seg = GET_SEG(seg); \
-} while (0)
-
-#define COPY_SEG_CPL3(seg) do { \
- regs->seg = GET_SEG(seg) | 3; \
-} while (0)
-
#ifdef CONFIG_X86_64
/*
* If regs->ss will cause an IRET fault, change it. Otherwise leave it
@@ -92,53 +74,59 @@ static void force_valid_ss(struct pt_regs *regs)
ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
regs->ss = __USER_DS;
}
+# define CONTEXT_COPY_SIZE offsetof(struct sigcontext, reserved1)
+#else
+# define CONTEXT_COPY_SIZE sizeof(struct sigcontext)
#endif

static int restore_sigcontext(struct pt_regs *regs,
- struct sigcontext __user *sc,
+ struct sigcontext __user *usc,
unsigned long uc_flags)
{
- unsigned long buf_val;
- void __user *buf;
- unsigned int tmpflags;
- unsigned int err = 0;
+ struct sigcontext sc;
+ int ret = -EFAULT;

/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;

- get_user_try {
+ if (unlikely(__copy_from_user(&sc, usc, CONTEXT_COPY_SIZE)))
+ goto out;

#ifdef CONFIG_X86_32
- set_user_gs(regs, GET_SEG(gs));
- COPY_SEG(fs);
- COPY_SEG(es);
- COPY_SEG(ds);
+ set_user_gs(regs, sc.gs);
+ regs->fs = sc.fs;
+ regs->es = sc.es;
+ regs->ds = sc.ds;
#endif /* CONFIG_X86_32 */

- COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
- COPY(dx); COPY(cx); COPY(ip); COPY(ax);
+ regs->bx = sc.bx;
+ regs->cx = sc.cx;
+ regs->dx = sc.dx;
+ regs->si = sc.si;
+ regs->di = sc.di;
+ regs->bp = sc.bp;
+ regs->ax = sc.ax;
+ regs->sp = sc.sp;
+ regs->ip = sc.ip;

#ifdef CONFIG_X86_64
- COPY(r8);
- COPY(r9);
- COPY(r10);
- COPY(r11);
- COPY(r12);
- COPY(r13);
- COPY(r14);
- COPY(r15);
+ regs->r8 = sc.r8;
+ regs->r9 = sc.r9;
+ regs->r10 = sc.r10;
+ regs->r11 = sc.r11;
+ regs->r12 = sc.r12;
+ regs->r13 = sc.r13;
+ regs->r14 = sc.r14;
+ regs->r15 = sc.r15;
#endif /* CONFIG_X86_64 */

- COPY_SEG_CPL3(cs);
- COPY_SEG_CPL3(ss);
+ /* Get CS/SS and force CPL3 */
+ regs->cs = sc.cs | 0x03;
+ regs->ss = sc.ss | 0x03;

- get_user_ex(tmpflags, &sc->flags);
- regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
- regs->orig_ax = -1; /* disable syscall checks */
-
- get_user_ex(buf_val, &sc->fpstate);
- buf = (void __user *)buf_val;
- } get_user_catch(err);
+ regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+ /* disable syscall checks */
+ regs->orig_ax = -1;

#ifdef CONFIG_X86_64
/*
@@ -149,11 +137,11 @@ static int restore_sigcontext(struct pt_regs *regs,
force_valid_ss(regs);
#endif

- err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
-
+ ret = fpu__restore_sig((void __user *)sc.fpstate,
+ IS_ENABLED(CONFIG_X86_32));
+out:
force_iret();
-
- return err;
+ return ret;
}

int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index a76c12b38e92..11ef9d3c5387 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -243,6 +243,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
struct kernel_vm86_regs vm86regs;
struct pt_regs *regs = current_pt_regs();
unsigned long err = 0;
+ struct vm86_struct v;

err = security_mmap_addr(0);
if (err) {
@@ -283,34 +284,32 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
sizeof(struct vm86plus_struct)))
return -EFAULT;

+ if (unlikely(__copy_from_user(&v, user_vm86,
+ offsetof(struct vm86_struct, int_revectored))))
+ return -EFAULT;
+
memset(&vm86regs, 0, sizeof(vm86regs));
- get_user_try {
- unsigned short seg;
- get_user_ex(vm86regs.pt.bx, &user_vm86->regs.ebx);
- get_user_ex(vm86regs.pt.cx, &user_vm86->regs.ecx);
- get_user_ex(vm86regs.pt.dx, &user_vm86->regs.edx);
- get_user_ex(vm86regs.pt.si, &user_vm86->regs.esi);
- get_user_ex(vm86regs.pt.di, &user_vm86->regs.edi);
- get_user_ex(vm86regs.pt.bp, &user_vm86->regs.ebp);
- get_user_ex(vm86regs.pt.ax, &user_vm86->regs.eax);
- get_user_ex(vm86regs.pt.ip, &user_vm86->regs.eip);
- get_user_ex(seg, &user_vm86->regs.cs);
- vm86regs.pt.cs = seg;
- get_user_ex(vm86regs.pt.flags, &user_vm86->regs.eflags);
- get_user_ex(vm86regs.pt.sp, &user_vm86->regs.esp);
- get_user_ex(seg, &user_vm86->regs.ss);
- vm86regs.pt.ss = seg;
- get_user_ex(vm86regs.es, &user_vm86->regs.es);
- get_user_ex(vm86regs.ds, &user_vm86->regs.ds);
- get_user_ex(vm86regs.fs, &user_vm86->regs.fs);
- get_user_ex(vm86regs.gs, &user_vm86->regs.gs);
-
- get_user_ex(vm86->flags, &user_vm86->flags);
- get_user_ex(vm86->screen_bitmap, &user_vm86->screen_bitmap);
- get_user_ex(vm86->cpu_type, &user_vm86->cpu_type);
- } get_user_catch(err);
- if (err)
- return err;
+
+ vm86regs.pt.bx = v.regs.ebx;
+ vm86regs.pt.cx = v.regs.ecx;
+ vm86regs.pt.dx = v.regs.edx;
+ vm86regs.pt.si = v.regs.esi;
+ vm86regs.pt.di = v.regs.edi;
+ vm86regs.pt.bp = v.regs.ebp;
+ vm86regs.pt.ax = v.regs.eax;
+ vm86regs.pt.ip = v.regs.eip;
+ vm86regs.pt.cs = v.regs.cs;
+ vm86regs.pt.flags = v.regs.eflags;
+ vm86regs.pt.sp = v.regs.esp;
+ vm86regs.pt.ss = v.regs.ss;
+ vm86regs.es = v.regs.es;
+ vm86regs.ds = v.regs.ds;
+ vm86regs.fs = v.regs.fs;
+ vm86regs.gs = v.regs.gs;
+
+ vm86->flags = v.flags;
+ vm86->screen_bitmap = v.screen_bitmap;
+ vm86->cpu_type = v.cpu_type;

if (copy_from_user(&vm86->int_revectored,
&user_vm86->int_revectored,
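The pattern this patch adopts throughout — one bulk `__copy_from_user()` into a kernel-local struct, then plain assignments, instead of many per-field `get_user_ex()` calls inside a try/catch region — can be sketched in plain userspace C. Everything here is a stand-in (`mock_copy_from_user`, the simplified struct layouts, and the `FIX_EFLAGS_SIM` mask are illustrative, not the kernel's actual definitions); the point is the shape of the code, not the exact fields:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures involved. */
struct sigcontext_sim { unsigned long cs, ss, ip, sp, flags; };
struct pt_regs_sim    { unsigned long cs, ss, ip, sp, flags; };

/* Placeholder flags mask, not the kernel's FIX_EFLAGS value. */
#define FIX_EFLAGS_SIM 0x0dd1UL

/* Userspace stand-in for __copy_from_user(): returns 0 on success. */
static unsigned long mock_copy_from_user(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * One bulk copy into a local struct, then ordinary assignments.
 * User memory is touched exactly once, so there is a single fault
 * point and no need for per-field exception handling (and no long
 * stretch of code running with SMAP/PAN protections disabled).
 */
static int restore_sigcontext_sim(struct pt_regs_sim *regs,
				  const struct sigcontext_sim *usc)
{
	struct sigcontext_sim sc;

	if (mock_copy_from_user(&sc, usc, sizeof(sc)))
		return -14; /* -EFAULT */

	regs->ip = sc.ip;
	regs->sp = sc.sp;
	/* Force CPL3 so a forged context cannot claim kernel privilege. */
	regs->cs = sc.cs | 0x03;
	regs->ss = sc.ss | 0x03;
	/* Only let user-controllable flag bits through the mask. */
	regs->flags = (regs->flags & ~FIX_EFLAGS_SIM)
		    | (sc.flags & FIX_EFLAGS_SIM);
	return 0;
}
```

The validation (`| 0x03` on the segment selectors, the flags mask) happens on the kernel-side copy after the one user access, which is what lets the per-field `get_user_try`/`get_user_catch` machinery go away.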

2019-11-05 04:57:57

by Martin K. Petersen

Subject: Re: [RFC][PATCHES] drivers/scsi/sg.c uaccess cleanups/fixes


Hi Al!

> I've got a series that presumably fixes and cleans the things up
> in that area; it didn't get any serious testing (the kernel builds
> and boots, smartctl works as well as it used to, but that's not
> worth much - all it says is that SG_IO doesn't fail terribly;
> I don't have any test setup for really working with /dev/sg*).

I tested this last week without noticing any problems.

What's your plan for this series? Want me to queue it up for 5.5?

--
Martin K. Petersen Oracle Linux Engineering

2019-11-05 05:27:04

by Al Viro

Subject: Re: [RFC][PATCHES] drivers/scsi/sg.c uaccess cleanups/fixes

On Mon, Nov 04, 2019 at 11:54:20PM -0500, Martin K. Petersen wrote:
>
> Hi Al!
>
> > I've got a series that presumably fixes and cleans the things up
> > in that area; it didn't get any serious testing (the kernel builds
> > and boots, smartctl works as well as it used to, but that's not
> > worth much - all it says is that SG_IO doesn't fail terribly;
> > I don't have any test setup for really working with /dev/sg*).
>
> I tested this last week without noticing any problems.
>
> What's your plan for this series? Want me to queue it up for 5.5?

I can put it into vfs.git into a never-rebased branch or you could put it
into scsi tree - up to you...

2019-11-06 04:30:54

by Martin K. Petersen

Subject: Re: [RFC][PATCHES] drivers/scsi/sg.c uaccess cleanups/fixes


Al,

>> What's your plan for this series? Want me to queue it up for 5.5?
>
> I can put it into vfs.git into a never-rebased branch or you could put
> it into scsi tree - up to you...

Applied to 5.5/scsi-queue with Doug's Acked-by. Thanks!

--
Martin K. Petersen Oracle Linux Engineering