2018-12-30 20:24:43

by Sergei Trofimovich

Subject: [PATCH] alpha: fix page fault handling for r16-r18 targets

Fix the page fault handling code to fix up the r16-r18 registers.
Before this patch, dpf_reg() had an off-by-two bug for these registers.
It caused the fixup to overwrite the saved pc, gp and r16 slots instead
of the intended r16, r17 and r18 (see `struct pt_regs`).

More details:

Dmitry initially noticed the kernel bug as a failure in the strace
test suite. The test passes an unmapped userspace pointer to
io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>

int main(void)
{
	unsigned long ctx = 0;	/* aio_context_t */

	if (syscall(__NR_io_setup, 1, &ctx))
		err(1, "io_setup");

	/* Map two pages and unmap them again to obtain an unmapped address. */
	const size_t page_size = sysconf(_SC_PAGESIZE);
	const size_t size = page_size * 2;
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == ptr)
		err(1, "mmap(%zu)", size);
	if (munmap(ptr, size))
		err(1, "munmap");

	/* The iocb pointer array is unmapped, so io_submit() should fail with EFAULT. */
	syscall(__NR_io_submit, ctx, 1, (char *)ptr + page_size);
	syscall(__NR_io_destroy, ctx);
	return 0;
}
```

Running this test crashes the kernel while it handles the page fault:

```
Unable to handle kernel paging request at virtual address ffffffffffff9468
CPU 3
aio(26027): Oops 0
pc = [<fffffc00004eddf8>] ra = [<fffffc00004edd5c>] ps = 0000 Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000
t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0
t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000
s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000
s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2
s6 = fffffc00c58e6300
a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000
a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001
t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440
t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000
gp = 0000000000000000 sp = 00000000265fd174
Disabling lock debugging due to kernel taint
Trace:
[<fffffc0000311404>] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by the page fault
fixup for the following code in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

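After the page fault, the fixup should set `t0` to -EFAULT and `a1` to 0.
Instead, the zero meant for `a1` was written to `gp`, which is why the
dump above shows `gp = 0000000000000000` while `t0` holds -EFAULT
(fffffffffffffff2).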

This happens because of an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time because `gp` is treated
as a scratch register: any kernel function call recalculates `gp`.

CC: Dmitry V. Levin <[email protected]>
CC: Richard Henderson <[email protected]>
CC: Ivan Kokshaysky <[email protected]>
CC: Matt Turner <[email protected]>
CC: [email protected]
CC: [email protected]
Reported-by: Dmitry V. Levin
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich <[email protected]>
---
arch/alpha/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
/* Macro for exception fixup code to access integer registers. */
#define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 : \
- (r) <= 18 ? (r)+8 : (r)-10])
+ (r) <= 18 ? (r)+10 : (r)-10])

asmlinkage void
do_page_fault(unsigned long address, unsigned long mmcsr,
--
2.20.1



2018-12-31 01:48:50

by Dmitry V. Levin

Subject: Re: [PATCH] alpha: fix page fault handling for r16-r18 targets

Hi,

On Sun, Dec 30, 2018 at 08:23:12PM +0000, Sergei Trofimovich wrote:

Thanks, that's impressive!

According to the git history, the off-by-two bug was introduced in Linux
2.1.32, when the trap_a{0,1,2} fields were inserted into struct pt_regs on
alpha without a matching dpf_reg() update.

Before 2.1.32 (back to 2.1.7 when dpf_reg() was introduced)
there was another off-by-one bug in dpf_reg(): r16 was written
into struct pt_regs.r17.

In other words, the bug is quite old indeed.

You can add
Reported-and-reviewed-by: "Dmitry V. Levin" <[email protected]>
Cc: [email protected] # v2.1.32+

> CC: Dmitry V. Levin <[email protected]>

This is a technical address, please remove it.

--
ldv



2018-12-31 11:56:44

by Sergei Trofimovich

Subject: [PATCH v2] alpha: fix page fault handling for r16-r18 targets

Fix the page fault handling code to fix up the r16-r18 registers.
Before this patch, dpf_reg() had an off-by-two bug for these registers.
It caused the fixup to overwrite the saved pc, gp and r16 slots instead
of the intended r16, r17 and r18 (see `struct pt_regs`).

More details:

Dmitry initially noticed the kernel bug as a failure in the strace
test suite. The test passes an unmapped userspace pointer to
io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>

int main(void)
{
	unsigned long ctx = 0;	/* aio_context_t */

	if (syscall(__NR_io_setup, 1, &ctx))
		err(1, "io_setup");

	/* Map two pages and unmap them again to obtain an unmapped address. */
	const size_t page_size = sysconf(_SC_PAGESIZE);
	const size_t size = page_size * 2;
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == ptr)
		err(1, "mmap(%zu)", size);
	if (munmap(ptr, size))
		err(1, "munmap");

	/* The iocb pointer array is unmapped, so io_submit() should fail with EFAULT. */
	syscall(__NR_io_submit, ctx, 1, (char *)ptr + page_size);
	syscall(__NR_io_destroy, ctx);
	return 0;
}
```

Running this test crashes the kernel while it handles the page fault:

```
Unable to handle kernel paging request at virtual address ffffffffffff9468
CPU 3
aio(26027): Oops 0
pc = [<fffffc00004eddf8>] ra = [<fffffc00004edd5c>] ps = 0000 Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000
t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0
t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000
s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000
s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2
s6 = fffffc00c58e6300
a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000
a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001
t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440
t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000
gp = 0000000000000000 sp = 00000000265fd174
Disabling lock debugging due to kernel taint
Trace:
[<fffffc0000311404>] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by the page fault
fixup for the following code in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

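After the page fault, the fixup should set `t0` to -EFAULT and `a1` to 0.
Instead, the zero meant for `a1` was written to `gp`, which is why the
dump above shows `gp = 0000000000000000` while `t0` holds -EFAULT
(fffffffffffffff2).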

This happens because of an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time because `gp` is treated
as a scratch register: any kernel function call recalculates `gp`.

Dmitry traced the bug back to kernel version 2.1.32, where the
trap_a{0,1,2} fields were inserted into struct pt_regs. Even before
that, `dpf_reg()` contained an off-by-one error.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reported-and-reviewed-by: "Dmitry V. Levin" <[email protected]>
Cc: [email protected] # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich <[email protected]>
---
Changes since V1:
- described the bug origin tracked down by Dmitry
- added Dmitry's proper email address and Reported-and-reviewed-by tag

arch/alpha/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
/* Macro for exception fixup code to access integer registers. */
#define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 : \
- (r) <= 18 ? (r)+8 : (r)-10])
+ (r) <= 18 ? (r)+10 : (r)-10])

asmlinkage void
do_page_fault(unsigned long address, unsigned long mmcsr,
--
2.20.1


2019-01-22 07:38:26

by Matt Turner

Subject: Re: [PATCH v2] alpha: fix page fault handling for r16-r18 targets

On Mon, Dec 31, 2018 at 3:54 AM Sergei Trofimovich <[email protected]> wrote:
> Dmitry traced the bug back to kernel version 2.1.32, where the
> trap_a{0,1,2} fields were inserted into struct pt_regs. Even before
> that, `dpf_reg()` contained an off-by-one error.

Wow, nice work.

I've vacuumed the patch up and will include it in my next pull req.