2018-12-15 19:21:06

by Andy Lutomirski

[permalink] [raw]
Subject: Fixing MIPS delay slot emulation weakness?

Hi all-

Some security researchers pointed out that writing to the delay slot
emulation page is a great exploit technique on MIPS. It was
introduced in:

commit 432c6bacbd0c16ec210c43da411ccc3855c4c010
Author: Paul Burton <[email protected]>
Date: Fri Jul 8 11:06:19 2016 +0100

MIPS: Use per-mm page to execute branch delay slot instructions

With my vDSO hat on, I hereby offer a couple of straightforward
suggestions for fixing it. The offending code is:

base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
VM_READ|VM_WRITE|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
0, NULL);

VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address.

The really simple but possibly suboptimal fix is to get rid of
VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it.

A possibly nicer way to accomplish more or less the same thing would
be to allocate the area with _install_special_mapping() and arrange to
keep a reference to the struct page around.

The really nice but less compatible fix would be to let processes or
even the whole system opt out by promising not to put anything in FPU
branch delay slots, of course.

--Andy


2018-12-15 21:27:53

by Paul Burton

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

Hi Andy,

On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote:
> Some security researchers pointed out that writing to the delay slot
> emulation page is a great exploit technique on MIPS. It was
> introduced in:
>
> commit 432c6bacbd0c16ec210c43da411ccc3855c4c010
> Author: Paul Burton <[email protected]>
> Date: Fri Jul 8 11:06:19 2016 +0100
>
> MIPS: Use per-mm page to execute branch delay slot instructions

Are there any further details you can share? You'd still need to
persuade a program to both write to & jump to the page, right? We're
talking purely about this providing writable+executable memory?

For the record prior to this patch we had to keep the user's stack
executable & write instructions there, so this didn't make things any
worse.

> With my vDSO hat on, I hereby offer a couple of straightforward
> suggestions for fixing it. The offending code is:
>
> base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> VM_READ|VM_WRITE|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> 0, NULL);
>
> VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address.
>
> The really simple but possibly suboptimal fix is to get rid of
> VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it.
>
> A possibly nicer way to accomplish more or less the same thing would
> be to allocate the area with _install_special_mapping() and arrange to
> keep a reference to the struct page around.

Right, I can look into that.

> The really nice but less compatible fix would be to let processes or
> even the whole system opt out by promising not to put anything in FPU
> branch delay slots, of course.

The ultimate fix comes with a switch to the nanoMIPS ISA which has no
delay slots :)

Thanks,
Paul

2018-12-15 22:51:55

by Rich Felker

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote:
> Hi all-
>
> Some security researchers pointed out that writing to the delay slot
> emulation page is a great exploit technique on MIPS. It was
> introduced in:
>
> commit 432c6bacbd0c16ec210c43da411ccc3855c4c010
> Author: Paul Burton <[email protected]>
> Date: Fri Jul 8 11:06:19 2016 +0100
>
> MIPS: Use per-mm page to execute branch delay slot instructions
>
> With my vDSO hat on, I hereby offer a couple of straightforward
> suggestions for fixing it. The offending code is:
>
> base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> VM_READ|VM_WRITE|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> 0, NULL);
>
> VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address.
>
> The really simple but possibly suboptimal fix is to get rid of
> VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it.
>
> A possibly nicer way to accomplish more or less the same thing would
> be to allocate the area with _install_special_mapping() and arrange to
> keep a reference to the struct page around.
>
> The really nice but less compatible fix would be to let processes or
> even the whole system opt out by promising not to put anything in FPU
> branch delay slots, of course.

As I noted on Twitter when Mudge brought this topic back up, there's a
much more compatible, elegant, and safe fix possible that does not
involve any W+X memory. Emulate the delay slot in kernel-space. This
is trivial to do safely for pretty much everything but loads/stores.
For loads/stores, where you want them to execute with user privilege
level, what you do is compute the effective address in kernel-space,
then return to a fixed instruction in the vdso page that performs a
generic load/store using the register the kernel put the effective
address result in, then restores registers off the stack and jumps to
the branch destination.

Rich

2018-12-16 02:16:50

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sat, 15 Dec 2018, Rich Felker wrote:

> > A possibly nicer way to accomplish more or less the same thing would
> > be to allocate the area with _install_special_mapping() and arrange to
> > keep a reference to the struct page around.
> >
> > The really nice but less compatible fix would be to let processes or
> > even the whole system opt out by promising not to put anything in FPU
> > branch delay slots, of course.
>
> As I noted on Twitter when Mudge brought this topic back up, there's a
> much more compatible, elegant, and safe fix possible that does not
> involve any W+X memory. Emulate the delay slot in kernel-space. This
> is trivial to do safely for pretty much everything but loads/stores.

I think "trivial" is an understatement, you at least need to decode the
delay-slot instruction enough to tell privileged and user instructions
apart and send SIGILL where appropriate. Some user instructions send
exceptions too and you need to handle them accordingly.

OTOH, for things like ADDIUPC you need to interpret the instruction
anyway, as the value of the PC used for calculation will be wrong except
in the original location.

> For loads/stores, where you want them to execute with user privilege
> level, what you do is compute the effective address in kernel-space,
> then return to a fixed instruction in the vdso page that performs a
> generic load/store using the register the kernel put the effective
> address result in, then restores registers off the stack and jumps to
> the branch destination.

What about all the odd and especially vendor-specific load/store
instructions like ASET, SAA or SWAPW? Would we need to have all the
possible encodings provided in the VDSO?

Maciej

2018-12-16 02:35:05

by Rich Felker

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, Dec 16, 2018 at 02:15:38AM +0000, Maciej W. Rozycki wrote:
> On Sat, 15 Dec 2018, Rich Felker wrote:
>
> > > A possibly nicer way to accomplish more or less the same thing would
> > > be to allocate the area with _install_special_mapping() and arrange to
> > > keep a reference to the struct page around.
> > >
> > > The really nice but less compatible fix would be to let processes or
> > > even the whole system opt out by promising not to put anything in FPU
> > > branch delay slots, of course.
> >
> > As I noted on Twitter when Mudge brought this topic back up, there's a
> > much more compatible, elegant, and safe fix possible that does not
> > involve any W+X memory. Emulate the delay slot in kernel-space. This
> > is trivial to do safely for pretty much everything but loads/stores.
>
> I think "trivial" is an understatement, you at least need to decode the
> delay-slot instruction enough to tell privileged and user instructions
> apart and send SIGILL where appropriate. Some user instructions send
> exceptions too and you need to handle them accordingly.

I meant simply that making them safe is trivial if they're not
accessing memory, only modifying the regisster file in the signal
context. Not that emulating them is trivial.

On the other hand it might be cleaner, safer, and easier to simply
write a full mips ISA emulator, put it in the vdso, and have the
kernel immediately return-to-userspace on hitting floating point
instructions and let the emulator code there take care of it all and
then return to the normal flow of execution.

> OTOH, for things like ADDIUPC you need to interpret the instruction
> anyway, as the value of the PC used for calculation will be wrong except
> in the original location.

Indeed. Assuming arbitrary ISA extensions including stuff that does
PC-relative arithmetic, there's no way to execute it out-of-place
without knowing how to interpret it.

> > For loads/stores, where you want them to execute with user privilege
> > level, what you do is compute the effective address in kernel-space,
> > then return to a fixed instruction in the vdso page that performs a
> > generic load/store using the register the kernel put the effective
> > address result in, then restores registers off the stack and jumps to
> > the branch destination.
>
> What about all the odd and especially vendor-specific load/store
> instructions like ASET, SAA or SWAPW? Would we need to have all the
> possible encodings provided in the VDSO?

Can all kinds of weird stuff like this go in delay slots? I'm more
familiar with SH delay slots where lots of instructions are
slot-illegal. If so perhaps the full-emulator-in-userspace approach is
better.

Rich

2018-12-16 13:54:29

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sat, 15 Dec 2018, Rich Felker wrote:

> > I think "trivial" is an understatement, you at least need to decode the
> > delay-slot instruction enough to tell privileged and user instructions
> > apart and send SIGILL where appropriate. Some user instructions send
> > exceptions too and you need to handle them accordingly.
>
> I meant simply that making them safe is trivial if they're not
> accessing memory, only modifying the regisster file in the signal
> context. Not that emulating them is trivial.

OK, fair enough.

> On the other hand it might be cleaner, safer, and easier to simply
> write a full mips ISA emulator, put it in the vdso, and have the
> kernel immediately return-to-userspace on hitting floating point
> instructions and let the emulator code there take care of it all and
> then return to the normal flow of execution.

The problem is matching hardware being run on and then maintaining that
stuff. I'd call that a maintenance nightmare, I'm afraid. See what pain
we have to go through already to get FPU emulation right, and there's much
less variation.

> > OTOH, for things like ADDIUPC you need to interpret the instruction
> > anyway, as the value of the PC used for calculation will be wrong except
> > in the original location.
>
> Indeed. Assuming arbitrary ISA extensions including stuff that does
> PC-relative arithmetic, there's no way to execute it out-of-place
> without knowing how to interpret it.

Well, ADDIUPC is a standard microMIPS instruction. Then R6 has more
stuff like that in the regular MIPS instruction set, e.g. AUIPC, LWPC.

> > What about all the odd and especially vendor-specific load/store
> > instructions like ASET, SAA or SWAPW? Would we need to have all the
> > possible encodings provided in the VDSO?
>
> Can all kinds of weird stuff like this go in delay slots? I'm more
> familiar with SH delay slots where lots of instructions are
> slot-illegal. If so perhaps the full-emulator-in-userspace approach is
> better.

I've double-checked and ASET is actually not allowed in a delay slot,
because it uses multiple bus cycles for data access. This is also why
LWP, LWM, etc. are not allowed either. Also control transfer instructions
are not allowed (unlike with SPARC), such as branches, ERET or YIELD (not
that the two latter instructions matter for us). Most of stuff is allowed
in delay slots though.

It doesn't help that information about that is scattered across many
documents. You can check for the NODS flag in the opcodes library from
binutils though, which is almost 100% accurate, except for the SYNC
instructions, for semantic reasons (i.e. they are allowed, but we don't
want GAS to reorder them). Most of the disallowed stuff is in the
microMIPS instruction set, due to encodings that execute as hardware
macros.

Maciej

2018-12-16 18:13:59

by Rich Felker

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sat, Dec 15, 2018 at 09:26:45PM +0000, Paul Burton wrote:
> > The really nice but less compatible fix would be to let processes or
> > even the whole system opt out by promising not to put anything in FPU
> > branch delay slots, of course.
>
> The ultimate fix comes with a switch to the nanoMIPS ISA which has no
> delay slots :)

I don't understand the MIPS position that introducing new ISAs
(including silently-new like r6) and not supporting or deprecating
support for the old one is a solution to anything. If one doesn't care
about the ability to run existing binaries for your platform, one
might as well switch to RISC-V or ARM or whatever. The whole advantage
of an ISA as a "platform" is the ability to run existing software and
use existing tooling (not just compilers; think also JITs, FFI
frameworks, etc), not any particular design advantage the ISA has.

Rich

2018-12-16 18:16:14

by Rich Felker

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote:
> On Sat, 15 Dec 2018, Rich Felker wrote:
>
> > > I think "trivial" is an understatement, you at least need to decode the
> > > delay-slot instruction enough to tell privileged and user instructions
> > > apart and send SIGILL where appropriate. Some user instructions send
> > > exceptions too and you need to handle them accordingly.
> >
> > I meant simply that making them safe is trivial if they're not
> > accessing memory, only modifying the regisster file in the signal
> > context. Not that emulating them is trivial.
>
> OK, fair enough.
>
> > On the other hand it might be cleaner, safer, and easier to simply
> > write a full mips ISA emulator, put it in the vdso, and have the
> > kernel immediately return-to-userspace on hitting floating point
> > instructions and let the emulator code there take care of it all and
> > then return to the normal flow of execution.
>
> The problem is matching hardware being run on and then maintaining that
> stuff. I'd call that a maintenance nightmare, I'm afraid. See what pain
> we have to go through already to get FPU emulation right, and there's much
> less variation.
>
> > > OTOH, for things like ADDIUPC you need to interpret the instruction
> > > anyway, as the value of the PC used for calculation will be wrong except
> > > in the original location.
> >
> > Indeed. Assuming arbitrary ISA extensions including stuff that does
> > PC-relative arithmetic, there's no way to execute it out-of-place
> > without knowing how to interpret it.
>
> Well, ADDIUPC is a standard microMIPS instruction. Then R6 has more
> stuff like that in the regular MIPS instruction set, e.g. AUIPC, LWPC.
>
> > > What about all the odd and especially vendor-specific load/store
> > > instructions like ASET, SAA or SWAPW? Would we need to have all the
> > > possible encodings provided in the VDSO?
> >
> > Can all kinds of weird stuff like this go in delay slots? I'm more
> > familiar with SH delay slots where lots of instructions are
> > slot-illegal. If so perhaps the full-emulator-in-userspace approach is
> > better.
>
> I've double-checked and ASET is actually not allowed in a delay slot,
> because it uses multiple bus cycles for data access. This is also why
> LWP, LWM, etc. are not allowed either. Also control transfer instructions
> are not allowed (unlike with SPARC), such as branches, ERET or YIELD (not
> that the two latter instructions matter for us). Most of stuff is allowed
> in delay slots though.
>
> It doesn't help that information about that is scattered across many
> documents. You can check for the NODS flag in the opcodes library from
> binutils though, which is almost 100% accurate, except for the SYNC
> instructions, for semantic reasons (i.e. they are allowed, but we don't
> want GAS to reorder them). Most of the disallowed stuff is in the
> microMIPS instruction set, due to encodings that execute as hardware
> macros.

I think it suffices to emulate what compilers generate in delay slots,
which should be fairly minimal and stable. At the very least we could
enumerate everything GCC and LLVM already emit there, and get them to
upstream a policy of not adding new insns as fpu-delay-slot-allowed.
If someone is writing asm by hand to do ridiculous things in the delay
slot with random ISA extensions, they shouldn't expect it to work.

Rich

2018-12-16 18:58:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, Dec 16, 2018 at 1:22 AM Paul Burton <[email protected]> wrote:
>
> Hi Andy,
>
> On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote:
> > Some security researchers pointed out that writing to the delay slot
> > emulation page is a great exploit technique on MIPS. It was
> > introduced in:
> >
> > commit 432c6bacbd0c16ec210c43da411ccc3855c4c010
> > Author: Paul Burton <[email protected]>
> > Date: Fri Jul 8 11:06:19 2016 +0100
> >
> > MIPS: Use per-mm page to execute branch delay slot instructions
>
> Are there any further details you can share? You'd still need to
> persuade a program to both write to & jump to the page, right? We're
> talking purely about this providing writable+executable memory?

Yes, exactly. You need a bug in order to take advantage of it. The
RWX page at a known location just makes exploitation considerably
easier.

I should also note that, on x86 at least, emulating loads and stores
is not so bad. The x86 vsyscall emulation code does it and has almost
fully correct fault semantics. (I say "almost" because I didn't
bother getting the semantics exactly right for non-canonical addresses
and kernel addresses.)

2018-12-16 19:06:10

by Andy Lutomirski

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, Dec 16, 2018 at 10:13 AM Rich Felker <[email protected]> wrote:
>
> On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote:
> > On Sat, 15 Dec 2018, Rich Felker wrote:
> >
> >
> > It doesn't help that information about that is scattered across many
> > documents. You can check for the NODS flag in the opcodes library from
> > binutils though, which is almost 100% accurate, except for the SYNC
> > instructions, for semantic reasons (i.e. they are allowed, but we don't
> > want GAS to reorder them). Most of the disallowed stuff is in the
> > microMIPS instruction set, due to encodings that execute as hardware
> > macros.
>
> I think it suffices to emulate what compilers generate in delay slots,
> which should be fairly minimal and stable. At the very least we could
> enumerate everything GCC and LLVM already emit there, and get them to
> upstream a policy of not adding new insns as fpu-delay-slot-allowed.
> If someone is writing asm by hand to do ridiculous things in the delay
> slot with random ISA extensions, they shouldn't expect it to work.
>

I feel like I have to ask: the real thing preventing emulation is that
new nonstandard instructions might get used in FPU delay slots on
non-FPU-supporting hardware? This seems utterly nuts. If you're
using custom ISA extensions, why on Earth are you also using emulated
floating point instructions? You're targetting a specific known CPU
if you do this, so you should use only instructions that actually work
on that CPU.

2018-12-16 19:47:25

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, 16 Dec 2018, Andy Lutomirski wrote:

> > I think it suffices to emulate what compilers generate in delay slots,
> > which should be fairly minimal and stable. At the very least we could
> > enumerate everything GCC and LLVM already emit there, and get them to
> > upstream a policy of not adding new insns as fpu-delay-slot-allowed.
> > If someone is writing asm by hand to do ridiculous things in the delay
> > slot with random ISA extensions, they shouldn't expect it to work.
> >
>
> I feel like I have to ask: the real thing preventing emulation is that
> new nonstandard instructions might get used in FPU delay slots on
> non-FPU-supporting hardware? This seems utterly nuts. If you're
> using custom ISA extensions, why on Earth are you also using emulated
> floating point instructions? You're targetting a specific known CPU
> if you do this, so you should use only instructions that actually work
> on that CPU.

The FPU is a part of the MIPS/Linux psABI and as far as CPU hardware is
concerned it is typically an RTL option for the customer to control when
synthesising hardware, just like say the sizes of the caches. IOW you'll
have some hardware with FPU and some without that is otherwise identical,
and maintaining two sets of binaries for what is a part of the psABI
anyway is often seen as not technically or commercially justified.

E.g. the (somewhat dated now) 24KEf and 24KEc are complementing standard
MIPS32r2+DSP processor cores with and without the FPU respectively. Of
course you can stick to the soft-float ABI instead, but then you'll be
wasting the FPU resource on FPU cores, so using the hard-float ABI and
having instructions emulated on non-FPU cores is usually considered a good
compromise.

Of course the FPU emulator should have been left to the userland rather
than put in the kernel, but that mistake was made many years ago and we
need to maintain compatibility. Also someone would actually have to
implement that userland emulator.

FAOD both GCC and GAS will happily schedule delay slots themselves as
long as the candidate instruction is recognised as valid in a delay slot,
so there's no need for anyone to do anything manually for the less common
instructions to end up in a delay slot. They just need to appear right
before a branch or a jump for that to happen. I can't speak for LLVM.

Maciej

2018-12-17 01:11:39

by Rich Felker

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, Dec 16, 2018 at 10:59:19AM -0800, Andy Lutomirski wrote:
> On Sun, Dec 16, 2018 at 10:13 AM Rich Felker <[email protected]> wrote:
> >
> > On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote:
> > > On Sat, 15 Dec 2018, Rich Felker wrote:
> > >
> > >
> > > It doesn't help that information about that is scattered across many
> > > documents. You can check for the NODS flag in the opcodes library from
> > > binutils though, which is almost 100% accurate, except for the SYNC
> > > instructions, for semantic reasons (i.e. they are allowed, but we don't
> > > want GAS to reorder them). Most of the disallowed stuff is in the
> > > microMIPS instruction set, due to encodings that execute as hardware
> > > macros.
> >
> > I think it suffices to emulate what compilers generate in delay slots,
> > which should be fairly minimal and stable. At the very least we could
> > enumerate everything GCC and LLVM already emit there, and get them to
> > upstream a policy of not adding new insns as fpu-delay-slot-allowed.
> > If someone is writing asm by hand to do ridiculous things in the delay
> > slot with random ISA extensions, they shouldn't expect it to work.
>
> I feel like I have to ask: the real thing preventing emulation is that
> new nonstandard instructions might get used in FPU delay slots on
> non-FPU-supporting hardware? This seems utterly nuts. If you're
> using custom ISA extensions, why on Earth are you also using emulated
> floating point instructions? You're targetting a specific known CPU
> if you do this, so you should use only instructions that actually work
> on that CPU.

Floating point is in the standard ABI, despite some models not having
fpu. This is what mandates floating point emulation. The reason you
have to be able to emulate or execute-out-of-line other instructions
is that there are floating point branch instructions bc1f and bc1t
(maybe others too?) with a delay slot, and if the branch is being
taken, you need some mechanism to cause the instruction in the delay
slot to still get executed. (If the branch is not taken you can just
increment PC and let it happen as a non-delay-slot.)

So in theory it's possible that there's a cpu model with fancy new
core instructions but no fpu. In this case, you would need the
capability to emulate or execute-out-of-line these instructions. But I
have no idea if such cpu models actually exist. If not, the concern
can probably be ignored and it suffices to emulate just the parts of
the base ISA that are valid in delay slots.

Rich

2018-12-17 01:57:30

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Sun, 16 Dec 2018, Rich Felker wrote:

> So in theory it's possible that there's a cpu model with fancy new
> core instructions but no fpu. In this case, you would need the
> capability to emulate or execute-out-of-line these instructions. But I
> have no idea if such cpu models actually exist. If not, the concern
> can probably be ignored and it suffices to emulate just the parts of
> the base ISA that are valid in delay slots.

What do you call "a cpu model with fancy new core instructions"?

We've gone through 4 legacy MIPS base ISA revisions (I to IV) and then 4
modern ones that matter (R1 to R5; R4 was left out and R6 actually does
not have FPU branch delay slots), plus a bunch of ASEs (Application
Specific Extensions), such as DSP, MDMX, MIPS-3D, MSA, etc., each defining
further instructions. And then the microMIPS R3 and R5 ISAs (R6 uses a
different instruction encoding and does not have delay slots at all).
The MIPS16 ISA does not count however, even though it has delay slots and
we support it, because it does not have FPU instructions, let alone ones
that require delay slot emulation.

Some of the ASEs do not matter, e.g. we don't support MDMX in Linux as it
has user state we don't handle with context switches, and MIPS-3D and MSA
both imply an FPU, so software making use of them won't run with our FPU
emulation anyway as these ASEs' instructions are not emulated. Anything
else is potentially required.

As to actual implementations I believe all the Cavium Octeon line CPUs
(David, please correct me if I am wrong) have no FPU and they have vendor
extensions beyond the base ISA + ASE instruction set. Arguably you could
say that their additional instructions should not be scheduled into FPU
branch delay slots then, however the toolchain will happily do that, as I
wrote before.

I don't fully remember what the situation is WRT NetLogic/Broadcom XLR
and XLP chips. They do have vendor extensions, though IIRC they do have
an FPU too.

But then we have the "nofpu" kernel parameter anyway, which forces FPU
emulation for any hardware, so we need to emulate delay slots in that mode
with any hardware.

I'm afraid the problem is complex to solve overall, which is why we still
have issues, 18 years on from the inclusion of the FPU emulator:

commit 4c55adaa6d06e5533aebaceea7640ecf10952231
Author: Ralf Baechle <[email protected]>
Date: Sat Nov 25 04:49:46 2000 +0000

Kernel FPU emulator, chain saw edition.

(in the LMO GIT repo) and I think actually running the delay-slot
instruction (with a possible exception for things like ADDIUPC) rather
than interpreting it is the only feasible solution.

I'm not involved with MIPS architecture development anymore though and at
this point I only care about the few legacy platforms I have been taking
care of since forever, such as the DECstation port, for which our current
emulation solution suffices, so I am not going to commit myself to making
any inventions in this area. I hope my input is valuable though and will
help someone working on this.

Maciej

2018-12-18 01:14:54

by Aaro Koskinen

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

Hi,

On Mon, Dec 17, 2018 at 01:55:28AM +0000, Maciej W. Rozycki wrote:
> As to actual implementations I believe all the Cavium Octeon line CPUs
> (David, please correct me if I am wrong) have no FPU and they have vendor
> extensions beyond the base ISA + ASE instruction set. Arguably you could
> say that their additional instructions should not be scheduled into FPU
> branch delay slots then, however the toolchain will happily do that, as I
> wrote before.

Octeon III added/introduced FPU.

A.

2018-12-19 05:17:19

by Paul Burton

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

Hello,

On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote:
> The really simple but possibly suboptimal fix is to get rid of
> VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it.

I actually wound up trying this route because it seemed like it would
produce a nice small patch that would be simple to backport, and we
could clean up mainline afterwards.

Unfortunately though things fail because get_user_pages() returns
-EFAULT for the delay slot emulation page, due to the !is_cow_mapping()
check in check_vma_flags(). This was introduced by commit cda540ace6a1
("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a
little confused as to its behaviour...

is_cow_mapping() returns true if the VM_MAYWRITE flag is set and
VM_SHARED is not set - this suggests a private & potentially-writable
area, right? That fits in nicely with an area we'd want to COW. Why then
does check_vma_flags() use the inverse of this to indicate a shared
area? This fails if we have a private mapping where VM_MAYWRITE is not
set, but where FOLL_FORCE would otherwise provide a means of writing to
the memory.

If I remove this check in check_vma_flags() then I have a nice simple
patch which seems to work well, leaving the user mapping of the delay
slot emulation page non-writeable. I'm not sure I'm following the mm
innards here though - is there something I should change about the delay
slot page instead? Should I be marking it shared, even though it isn't
really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should
set that - would that allow a user to use mprotect() to make the region
writeable..?

The work-in-progress patch can be seen below if it's helpful (and yes, I
realise that the modified condition in check_vma_flags() became
impossible & that removing it would be equivalent).

Or perhaps this is only confusing because it's 4:25am & I'm massively
jetlagged... :)

> A possibly nicer way to accomplish more or less the same thing would
> be to allocate the area with _install_special_mapping() and arrange to
> keep a reference to the struct page around.

I looked at this, but it ends up being a much bigger patch. Perhaps it
could be something to look into as a follow-on cleanup, though it
complicates things a little because we need to actually allocate the
page, preferrably only on demand, which is handled for us with the
current mmap_region() code.

Thanks,
Paul

---
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 48a9c6b90e07..9476efb54d18 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)

/* Map delay slot emulation page */
base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
- VM_READ|VM_WRITE|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,
0, NULL);
if (IS_ERR_VALUE(base)) {
ret = base;
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 5450f4d1c920..3aa8e3b90efb 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -67,11 +67,6 @@ struct emuframe {

static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe);

-static inline __user struct emuframe *dsemul_page(void)
-{
- return (__user struct emuframe *)STACK_TOP;
-}
-
static int alloc_emuframe(void)
{
mm_context_t *mm_ctx = &current->mm->context;
@@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm)

static bool within_emuframe(struct pt_regs *regs)
{
- unsigned long base = (unsigned long)dsemul_page();
+ unsigned long base = STACK_TOP;

if (regs->cp0_epc < base)
return false;
@@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk)

bool dsemul_thread_rollback(struct pt_regs *regs)
{
- struct emuframe __user *fr;
- int fr_idx;
+ struct emuframe fr;
+ int fr_idx, ret;

/* Do nothing if we're not executing from a frame */
if (!within_emuframe(regs))
@@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs)
fr_idx = atomic_read(&current->thread.bd_emu_frame);
if (fr_idx == BD_EMUFRAME_NONE)
return false;
- fr = &dsemul_page()[fr_idx];
+
+ ret = access_process_vm(current,
+ STACK_TOP + (fr_idx * sizeof(fr)),
+ &fr, sizeof(fr), FOLL_FORCE);
+ if (WARN_ON(ret != sizeof(fr)))
+ return false;

/*
* If the PC is at the emul instruction, roll back to the branch. If
@@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs)
* then something is amiss & the user has branched into some other area
* of the emupage - we'll free the allocated frame anyway.
*/
- if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul)
+ if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul)
regs->cp0_epc = current->thread.bd_emu_branch_pc;
- else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst)
+ else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst)
regs->cp0_epc = current->thread.bd_emu_cont_pc;

atomic_set(&current->thread.bd_emu_frame, BD_EMUFRAME_NONE);
@@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
{
int isa16 = get_isa16_mode(regs->cp0_epc);
mips_instruction break_math;
- struct emuframe __user *fr;
- int err, fr_idx;
+ struct emuframe fr;
+ int fr_idx, ret;

/* NOP is easy */
if (ir == 0)
@@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
fr_idx = alloc_emuframe();
if (fr_idx == BD_EMUFRAME_NONE)
return SIGBUS;
- fr = &dsemul_page()[fr_idx];

/* Retrieve the appropriately encoded break instruction */
break_math = BREAK_MATH(isa16);

/* Write the instructions to the frame */
if (isa16) {
- err = __put_user(ir >> 16,
- (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff,
- (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(break_math >> 16,
- (u16 __user *)(&fr->badinst));
- err |= __put_user(break_math & 0xffff,
- (u16 __user *)((long)(&fr->badinst) + 2));
+ union mips_instruction _emul = {
+ .halfword = { ir >> 16, ir }
+ };
+ union mips_instruction _badinst = {
+ .halfword = { break_math >> 16, break_math }
+ };
+
+ fr.emul = _emul.word;
+ fr.badinst = _badinst.word;
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user(break_math, &fr->badinst);
+ fr.emul = ir;
+ fr.badinst = break_math;
}

- if (unlikely(err)) {
+ /* Write the frame to user memory */
+ ret = access_process_vm(current,
+ STACK_TOP + (fr_idx * sizeof(fr)),
+ &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE);
+ if (WARN_ON(ret != sizeof(fr))) {
MIPS_FPU_EMU_INC_STATS(errors);
free_emuframe(fr_idx, current->mm);
return SIGBUS;
@@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
atomic_set(&current->thread.bd_emu_frame, fr_idx);

/* Change user register context to execute the frame */
- regs->cp0_epc = (unsigned long)&fr->emul | isa16;
-
- /* Ensure the icache observes our newly written frame */
- flush_cache_sigtramp((unsigned long)&fr->emul);
+ regs->cp0_epc = (unsigned long)&fr.emul | isa16;

return 0;
}
diff --git a/mm/gup.c b/mm/gup.c
index f76e77a2d34b..9a1bc941dcb9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* Anon pages in shared mappings are surprising: now
* just reject it.
*/
- if (!is_cow_mapping(vm_flags))
+ if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags))
return -EFAULT;
}
} else if (!(vm_flags & VM_READ)) {

2018-12-19 22:16:26

by Hugh Dickins

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

On Wed, 19 Dec 2018, Paul Burton wrote:
> On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote:
> > The really simple but possibly suboptimal fix is to get rid of
> > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it.
>
> I actually wound up trying this route because it seemed like it would
> produce a nice small patch that would be simple to backport, and we
> could clean up mainline afterwards.
>
> Unfortunately though things fail because get_user_pages() returns
> -EFAULT for the delay slot emulation page, due to the !is_cow_mapping()
> check in check_vma_flags(). This was introduced by commit cda540ace6a1
> ("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a
> little confused as to its behaviour...
>
> is_cow_mapping() returns true if the VM_MAYWRITE flag is set and
> VM_SHARED is not set - this suggests a private & potentially-writable
> area, right? That fits in nicely with an area we'd want to COW. Why then
> does check_vma_flags() use the inverse of this to indicate a shared
> area? This fails if we have a private mapping where VM_MAYWRITE is not
> set, but where FOLL_FORCE would otherwise provide a means of writing to
> the memory.
>
> If I remove this check in check_vma_flags() then I have a nice simple
> patch which seems to work well, leaving the user mapping of the delay
> slot emulation page non-writeable. I'm not sure I'm following the mm
> innards here though - is there something I should change about the delay
> slot page instead? Should I be marking it shared, even though it isn't
> really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should
> set that - would that allow a user to use mprotect() to make the region
> writeable..?

Exactly, in that last sentence above you come to the right understanding
of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think
your issue in setting up the mmap, is that you're (rightly) doing it with
VM_flags to mmap_region(), but giving it a combination of flags that an
mmap() syscall from userspace would never arrive at, so does not match
expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c:
you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then
removing it just from the case of a MAP_SHARED without FMODE_WRITE.

>
> The work-in-progress patch can be seen below if it's helpful (and yes, I
> realise that the modified condition in check_vma_flags() became
> impossible & that removing it would be equivalent).
>
> Or perhaps this is only confusing because it's 4:25am & I'm massively
> jetlagged... :)
>
> > A possibly nicer way to accomplish more or less the same thing would
> > be to allocate the area with _install_special_mapping() and arrange to
> > keep a reference to the struct page around.
>
> I looked at this, but it ends up being a much bigger patch. Perhaps it
> could be something to look into as a follow-on cleanup, though it
> complicates things a little because we need to actually allocate the
> page, preferrably only on demand, which is handled for us with the
> current mmap_region() code.
>
> Thanks,
> Paul
>
> ---
> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> index 48a9c6b90e07..9476efb54d18 100644
> --- a/arch/mips/kernel/vdso.c
> +++ b/arch/mips/kernel/vdso.c
> @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>
> /* Map delay slot emulation page */
> base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> - VM_READ|VM_WRITE|VM_EXEC|
> - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,

So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE.

> 0, NULL);
> if (IS_ERR_VALUE(base)) {
> ret = base;
> diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
> index 5450f4d1c920..3aa8e3b90efb 100644
> --- a/arch/mips/math-emu/dsemul.c
> +++ b/arch/mips/math-emu/dsemul.c
> @@ -67,11 +67,6 @@ struct emuframe {
>
> static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe);
>
> -static inline __user struct emuframe *dsemul_page(void)
> -{
> - return (__user struct emuframe *)STACK_TOP;
> -}
> -
> static int alloc_emuframe(void)
> {
> mm_context_t *mm_ctx = &current->mm->context;
> @@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm)
>
> static bool within_emuframe(struct pt_regs *regs)
> {
> - unsigned long base = (unsigned long)dsemul_page();
> + unsigned long base = STACK_TOP;
>
> if (regs->cp0_epc < base)
> return false;
> @@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk)
>
> bool dsemul_thread_rollback(struct pt_regs *regs)
> {
> - struct emuframe __user *fr;
> - int fr_idx;
> + struct emuframe fr;
> + int fr_idx, ret;
>
> /* Do nothing if we're not executing from a frame */
> if (!within_emuframe(regs))
> @@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs)
> fr_idx = atomic_read(&current->thread.bd_emu_frame);
> if (fr_idx == BD_EMUFRAME_NONE)
> return false;
> - fr = &dsemul_page()[fr_idx];
> +
> + ret = access_process_vm(current,
> + STACK_TOP + (fr_idx * sizeof(fr)),
> + &fr, sizeof(fr), FOLL_FORCE);
> + if (WARN_ON(ret != sizeof(fr)))
> + return false;
>
> /*
> * If the PC is at the emul instruction, roll back to the branch. If
> @@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs)
> * then something is amiss & the user has branched into some other area
> * of the emupage - we'll free the allocated frame anyway.
> */
> - if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul)
> + if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul)
> regs->cp0_epc = current->thread.bd_emu_branch_pc;
> - else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst)
> + else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst)
> regs->cp0_epc = current->thread.bd_emu_cont_pc;
>
> atomic_set(&current->thread.bd_emu_frame, BD_EMUFRAME_NONE);
> @@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
> {
> int isa16 = get_isa16_mode(regs->cp0_epc);
> mips_instruction break_math;
> - struct emuframe __user *fr;
> - int err, fr_idx;
> + struct emuframe fr;
> + int fr_idx, ret;
>
> /* NOP is easy */
> if (ir == 0)
> @@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
> fr_idx = alloc_emuframe();
> if (fr_idx == BD_EMUFRAME_NONE)
> return SIGBUS;
> - fr = &dsemul_page()[fr_idx];
>
> /* Retrieve the appropriately encoded break instruction */
> break_math = BREAK_MATH(isa16);
>
> /* Write the instructions to the frame */
> if (isa16) {
> - err = __put_user(ir >> 16,
> - (u16 __user *)(&fr->emul));
> - err |= __put_user(ir & 0xffff,
> - (u16 __user *)((long)(&fr->emul) + 2));
> - err |= __put_user(break_math >> 16,
> - (u16 __user *)(&fr->badinst));
> - err |= __put_user(break_math & 0xffff,
> - (u16 __user *)((long)(&fr->badinst) + 2));
> + union mips_instruction _emul = {
> + .halfword = { ir >> 16, ir }
> + };
> + union mips_instruction _badinst = {
> + .halfword = { break_math >> 16, break_math }
> + };
> +
> + fr.emul = _emul.word;
> + fr.badinst = _badinst.word;
> } else {
> - err = __put_user(ir, &fr->emul);
> - err |= __put_user(break_math, &fr->badinst);
> + fr.emul = ir;
> + fr.badinst = break_math;
> }
>
> - if (unlikely(err)) {
> + /* Write the frame to user memory */
> + ret = access_process_vm(current,
> + STACK_TOP + (fr_idx * sizeof(fr)),
> + &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE);
> + if (WARN_ON(ret != sizeof(fr))) {
> MIPS_FPU_EMU_INC_STATS(errors);
> free_emuframe(fr_idx, current->mm);
> return SIGBUS;
> @@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
> atomic_set(&current->thread.bd_emu_frame, fr_idx);
>
> /* Change user register context to execute the frame */
> - regs->cp0_epc = (unsigned long)&fr->emul | isa16;
> -
> - /* Ensure the icache observes our newly written frame */
> - flush_cache_sigtramp((unsigned long)&fr->emul);
> + regs->cp0_epc = (unsigned long)&fr.emul | isa16;
>
> return 0;
> }
> diff --git a/mm/gup.c b/mm/gup.c
> index f76e77a2d34b..9a1bc941dcb9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> * Anon pages in shared mappings are surprising: now
> * just reject it.
> */
> - if (!is_cow_mapping(vm_flags))
> + if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags))

Then please drop this patch to mm/gup.c: does the result then work
for you? (I won't pretend to have reviewed the rest of the patch.)

Hugh

> return -EFAULT;
> }
> } else if (!(vm_flags & VM_READ)) {

2018-12-21 00:30:47

by Paul Burton

[permalink] [raw]
Subject: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

Mapping the delay slot emulation page as both writeable & executable
presents a security risk, in that if an exploit can write to & jump into
the page then it can be used as an easy way to execute arbitrary code.

Prevent this by mapping the page read-only for userland, and using
access_process_vm() with the FOLL_FORCE flag to write to it from
mips_dsemul().

This will likely be less efficient due to copy_to_user_page() performing
cache maintenance on a whole page, rather than a single line as in the
previous use of flush_cache_sigtramp(). However this delay slot
emulation code ought not to be running in any performance critical paths
anyway so this isn't really a problem, and we can probably do better in
copy_to_user_page() anyway in future.

A major advantage of this approach is that the fix is small & simple to
backport to stable kernels.

Reported-by: Andy Lutomirski <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: [email protected] # v4.8+
---
arch/mips/kernel/vdso.c | 4 ++--
arch/mips/math-emu/dsemul.c | 38 +++++++++++++++++++------------------
2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 48a9c6b90e07..9df3ebdc7b0f 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -126,8 +126,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)

/* Map delay slot emulation page */
base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
- VM_READ|VM_WRITE|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_READ | VM_EXEC |
+ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
0, NULL);
if (IS_ERR_VALUE(base)) {
ret = base;
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 5450f4d1c920..e2d46cb93ca9 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -214,8 +214,9 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
{
int isa16 = get_isa16_mode(regs->cp0_epc);
mips_instruction break_math;
- struct emuframe __user *fr;
- int err, fr_idx;
+ unsigned long fr_uaddr;
+ struct emuframe fr;
+ int fr_idx, ret;

/* NOP is easy */
if (ir == 0)
@@ -250,27 +251,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
fr_idx = alloc_emuframe();
if (fr_idx == BD_EMUFRAME_NONE)
return SIGBUS;
- fr = &dsemul_page()[fr_idx];

/* Retrieve the appropriately encoded break instruction */
break_math = BREAK_MATH(isa16);

/* Write the instructions to the frame */
if (isa16) {
- err = __put_user(ir >> 16,
- (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff,
- (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(break_math >> 16,
- (u16 __user *)(&fr->badinst));
- err |= __put_user(break_math & 0xffff,
- (u16 __user *)((long)(&fr->badinst) + 2));
+ union mips_instruction _emul = {
+ .halfword = { ir >> 16, ir }
+ };
+ union mips_instruction _badinst = {
+ .halfword = { break_math >> 16, break_math }
+ };
+
+ fr.emul = _emul.word;
+ fr.badinst = _badinst.word;
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user(break_math, &fr->badinst);
+ fr.emul = ir;
+ fr.badinst = break_math;
}

- if (unlikely(err)) {
+ /* Write the frame to user memory */
+ fr_uaddr = (unsigned long)&dsemul_page()[fr_idx];
+ ret = access_process_vm(current, fr_uaddr, &fr, sizeof(fr),
+ FOLL_FORCE | FOLL_WRITE);
+ if (unlikely(ret != sizeof(fr))) {
MIPS_FPU_EMU_INC_STATS(errors);
free_emuframe(fr_idx, current->mm);
return SIGBUS;
@@ -282,10 +287,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir,
atomic_set(&current->thread.bd_emu_frame, fr_idx);

/* Change user register context to execute the frame */
- regs->cp0_epc = (unsigned long)&fr->emul | isa16;
-
- /* Ensure the icache observes our newly written frame */
- flush_cache_sigtramp((unsigned long)&fr->emul);
+ regs->cp0_epc = fr_uaddr | isa16;

return 0;
}
--
2.20.0


2018-12-21 00:33:08

by Paul Burton

[permalink] [raw]
Subject: Re: Fixing MIPS delay slot emulation weakness?

Hi Hugh,

On Wed, Dec 19, 2018 at 01:12:58PM -0800, Hugh Dickins wrote:
> > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and
> > VM_SHARED is not set - this suggests a private & potentially-writable
> > area, right? That fits in nicely with an area we'd want to COW. Why then
> > does check_vma_flags() use the inverse of this to indicate a shared
> > area? This fails if we have a private mapping where VM_MAYWRITE is not
> > set, but where FOLL_FORCE would otherwise provide a means of writing to
> > the memory.
> >
> > If I remove this check in check_vma_flags() then I have a nice simple
> > patch which seems to work well, leaving the user mapping of the delay
> > slot emulation page non-writeable. I'm not sure I'm following the mm
> > innards here though - is there something I should change about the delay
> > slot page instead? Should I be marking it shared, even though it isn't
> > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should
> > set that - would that allow a user to use mprotect() to make the region
> > writeable..?
>
> Exactly, in that last sentence above you come to the right understanding
> of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think
> your issue in setting up the mmap, is that you're (rightly) doing it with
> VM_flags to mmap_region(), but giving it a combination of flags that an
> mmap() syscall from userspace would never arrive at, so does not match
> expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c:
> you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then
> removing it just from the case of a MAP_SHARED without FMODE_WRITE.
>
> > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > index 48a9c6b90e07..9476efb54d18 100644
> > --- a/arch/mips/kernel/vdso.c
> > +++ b/arch/mips/kernel/vdso.c
> > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> >
> > /* Map delay slot emulation page */
> > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> > - VM_READ|VM_WRITE|VM_EXEC|
> > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,
>
> So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE.

Thanks Hugh - it works fine when I leave in VM_MAYWRITE.

My 4am self had become convinced that it would be problematic if a user
program could mprotect() the memory & make it writable... But in reality
if a user program wants to do that then by all means, the kernel isn't
going to be able to prevent it doing silly things.

For anyone looking for the outcome, the patch I wound up with is here:

https://lore.kernel.org/linux-mips/[email protected]/

Thanks,
Paul

2018-12-23 09:15:16

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

Hi Sasha,

On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions.
>
> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146,

Neat! I like the idea of this automation :)

> v4.19.10: Build OK!
> v4.14.89: Build OK!
> v4.9.146: Failed to apply! Possible dependencies:
> 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request")
> 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS")
> 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success")
> 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event")
> 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event")
> 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps")
> 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
> d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE")

This list includes the correct soft dependency - commit 897ab3e0c49e
("userfaultfd: non-cooperative: add event for memory unmaps") which
added an extra argument to mmap_region().

> How should we proceed with this patch?

The backport to v4.9 should simply drop the last argument (NULL) in the
call to mmap_region().

Is there some way I can indicate this sort of thing in future patches so
that the automation can spot that I already know it won't apply cleanly
to a particular range of kernel versions? Or even better, that I could
indicate what needs to be changed when backporting to those kernel
versions?

Thanks,
Paul

2018-12-23 16:29:28

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

On Fri, Dec 21, 2018 at 09:16:37PM +0000, Paul Burton wrote:
>Hi Sasha,
>
>On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote:
>> Hi,
>>
>> [This is an automated email]
>>
>> This commit has been processed because it contains a "Fixes:" tag,
>> fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions.
>>
>> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146,
>
>Neat! I like the idea of this automation :)

Thank you! :)

>> v4.19.10: Build OK!
>> v4.14.89: Build OK!
>> v4.9.146: Failed to apply! Possible dependencies:
>> 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request")
>> 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS")
>> 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success")
>> 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event")
>> 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event")
>> 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps")
>> 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor")
>> d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE")
>
>This list includes the correct soft dependency - commit 897ab3e0c49e
>("userfaultfd: non-cooperative: add event for memory unmaps") which
>added an extra argument to mmap_region().
>
>> How should we proceed with this patch?
>
>The backport to v4.9 should simply drop the last argument (NULL) in the
>call to mmap_region().
>
>Is there some way I can indicate this sort of thing in future patches so
>that the automation can spot that I already know it won't apply cleanly
>to a particular range of kernel versions? Or even better, that I could
>indicate what needs to be changed when backporting to those kernel
>versions?

The "official" way of doing that is described here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst#n101

However, most people just either add a comment in the commit message, or
reply to the patch mail (or the "FAILED:" mail from Greg) saying how to
fix this. Either way works really.

Greg will also usually look up these automated mails when he adds stuff
to -stable, so if you describe how to deal with it here (like you did
above) is enough as well.

--
Thanks,
Sasha

2018-12-23 16:45:51

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

Hello,

Paul Burton wrote:
> Mapping the delay slot emulation page as both writeable & executable
> presents a security risk, in that if an exploit can write to & jump into
> the page then it can be used as an easy way to execute arbitrary code.
>
> Prevent this by mapping the page read-only for userland, and using
> access_process_vm() with the FOLL_FORCE flag to write to it from
> mips_dsemul().
>
> This will likely be less efficient due to copy_to_user_page() performing
> cache maintenance on a whole page, rather than a single line as in the
> previous use of flush_cache_sigtramp(). However this delay slot
> emulation code ought not to be running in any performance critical paths
> anyway so this isn't really a problem, and we can probably do better in
> copy_to_user_page() anyway in future.
>
> A major advantage of this approach is that the fix is small & simple to
> backport to stable kernels.
>
> Reported-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Paul Burton <[email protected]>
> Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
> Cc: [email protected] # v4.8+

Applied to mips-fixes.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]