2020-08-28 03:14:56

by Muchun Song

[permalink] [raw]
Subject: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.

CPU0: CPU1:
proc_sys_write
hugetlb_sysctl_handler proc_sys_call_handler
hugetlb_sysctl_handler_common hugetlb_sysctl_handler
table->data = &tmp; hugetlb_sysctl_handler_common
table->data = &tmp;
proc_doulongvec_minmax
do_proc_doulongvec_minmax sysctl_head_finish
__do_proc_doulongvec_minmax unuse_table
i = table->data;
*i = val; // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.

The following oops was seen:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Code: Bad RIP value.
...
Call Trace:
? set_max_huge_pages+0x3da/0x4f0
? alloc_pool_huge_page+0x150/0x150
? proc_doulongvec_minmax+0x46/0x60
? hugetlb_sysctl_handler_common+0x1c7/0x200
? nr_hugepages_store+0x20/0x20
? copy_fd_bitmaps+0x170/0x170
? hugetlb_sysctl_handler+0x1e/0x20
? proc_sys_call_handler+0x2f1/0x300
? unregister_sysctl_table+0xb0/0xb0
? __fd_install+0x78/0x100
? proc_sys_write+0x14/0x20
? __vfs_write+0x4d/0x90
? vfs_write+0xef/0x240
? ksys_write+0xc0/0x160
? __ia32_sys_read+0x50/0x50
? __close_fd+0x129/0x150
? __x64_sys_write+0x43/0x50
? do_syscall_64+0x6c/0x200
? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <[email protected]>
---
chagelogs in v2:
1. Add more details about how the race happened to the commit message.
2. Remove unnecessary assignment of table->maxlen.

mm/hugetlb.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..4c2a2620eeed 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3454,6 +3454,22 @@ static unsigned int allowed_mems_nr(struct hstate *h)
}

#ifdef CONFIG_SYSCTL
+static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
+ void *buffer, size_t *length,
+ loff_t *ppos, unsigned long *out)
+{
+ struct ctl_table dup_table;
+
+ /*
+ * In order to avoid races with __do_proc_doulongvec_minmax(), we
+ * can duplicate the @table and alter the duplicate of it.
+ */
+ dup_table = *table;
+ dup_table.data = out;
+
+ return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
+}
+
static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
struct ctl_table *table, int write,
void *buffer, size_t *length, loff_t *ppos)
@@ -3465,9 +3481,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
if (!hugepages_supported())
return -EOPNOTSUPP;

- table->data = &tmp;
- table->maxlen = sizeof(unsigned long);
- ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+ ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+ &tmp);
if (ret)
goto out;

@@ -3510,9 +3525,8 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
if (write && hstate_is_gigantic(h))
return -EINVAL;

- table->data = &tmp;
- table->maxlen = sizeof(unsigned long);
- ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+ ret = proc_hugetlb_doulongvec_minmax(table, write, buffer, length, ppos,
+ &tmp);
if (ret)
goto out;

--
2.11.0


2020-08-28 14:47:46

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

On 8/27/20 8:11 PM, Muchun Song wrote:
> There is a race between the assignment of `table->data` and write value
> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> the other thread.
>
> CPU0: CPU1:
> proc_sys_write
> hugetlb_sysctl_handler proc_sys_call_handler
> hugetlb_sysctl_handler_common hugetlb_sysctl_handler
> table->data = &tmp; hugetlb_sysctl_handler_common
> table->data = &tmp;
> proc_doulongvec_minmax
> do_proc_doulongvec_minmax sysctl_head_finish
> __do_proc_doulongvec_minmax unuse_table
> i = table->data;
> *i = val; // corrupt CPU1's stack
>
> Fix this by duplicating the `table`, and only update the duplicate of
> it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
> simplify the code.
>
> The following oops was seen:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor instruction fetch in kernel mode
> #PF: error_code(0x0010) - not-present page
> Code: Bad RIP value.
> ...
> Call Trace:
> ? set_max_huge_pages+0x3da/0x4f0
> ? alloc_pool_huge_page+0x150/0x150
> ? proc_doulongvec_minmax+0x46/0x60
> ? hugetlb_sysctl_handler_common+0x1c7/0x200
> ? nr_hugepages_store+0x20/0x20
> ? copy_fd_bitmaps+0x170/0x170
> ? hugetlb_sysctl_handler+0x1e/0x20
> ? proc_sys_call_handler+0x2f1/0x300
> ? unregister_sysctl_table+0xb0/0xb0
> ? __fd_install+0x78/0x100
> ? proc_sys_write+0x14/0x20
> ? __vfs_write+0x4d/0x90
> ? vfs_write+0xef/0x240
> ? ksys_write+0xc0/0x160
> ? __ia32_sys_read+0x50/0x50
> ? __close_fd+0x129/0x150
> ? __x64_sys_write+0x43/0x50
> ? do_syscall_64+0x6c/0x200
> ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
> Signed-off-by: Muchun Song <[email protected]>

Thank you!

Reviewed-by: Mike Kravetz <[email protected]>

--
Mike Kravetz

2020-08-28 16:55:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")

I believe the Fixes line is still not correct. The original patch
didn't have that problem. Please identify which patch added
the problematic code.

-Andi

2020-08-28 17:17:42

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

On 8/28/20 7:59 AM, Andi Kleen wrote:
>> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
>
> I believe the Fixes line is still not correct. The original patch
> didn't have that problem. Please identify which patch added
> the problematic code.

Hi Andi,

I agree with Muchun's assessment that the issue was caused by e5ff215941d5.
Why?

Commit e5ff215941d5 changed initialization of the .data field in the
ctl_table structure for hugetlb_sysctl_handler. This was required because
the commit introduced multiple hstates. As a result, it can not be known
until runtime the value for .data. This code was added to
hugetlb_sysctl_handler to set the value at runtime.

@@ -1037,8 +1109,19 @@ int hugetlb_sysctl_handler(struct ctl_table *table, int write,
struct file *file, void __user *buffer,
size_t *length, loff_t *ppos)
{
+ struct hstate *h = &default_hstate;
+ unsigned long tmp;
+
+ if (!write)
+ tmp = h->max_huge_pages;
+
+ table->data = &tmp;
+ table->maxlen = sizeof(unsigned long);

The problem is that two threads running in parallel can modify table->data
in the global data structure without any synchronization. Worse yet, is
that that value is local to their stacks. That was the root cause of the
issue addressed by Muchun's patch.

Does that analysis make sense? Or, are we missing something.
--
Mike Kravetz

2020-08-28 20:56:02

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

On Fri, Aug 28, 2020 at 10:14:16AM -0700, Mike Kravetz wrote:
> On 8/28/20 7:59 AM, Andi Kleen wrote:
> >> Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
> >
> > I believe the Fixes line is still not correct. The original patch
> > didn't have that problem. Please identify which patch added
> > the problematic code.
>
> Hi Andi,
>
> I agree with Muchun's assessment that the issue was caused by e5ff215941d5.
> Why?

Yes when checking the code again I agree. It was introduced with that
patch. Patches look ok to me now.

-Andi