2017-03-29 21:09:20

by Mike Kravetz

[permalink] [raw]
Subject: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

Resending because of typo in Andrew's e-mail when first sent

Changes to hugetlbfs reservation maps is a two step process. The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change. This should be followed by a call to call to
region_add to commit the change, or region_abort to abort the change.

The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg. As a result, the adds_in_progress counter
in the reservation map is off by 1. This is caught by a VM_BUG_ON
in resv_map_release when the reservation map is freed.

syzkaller fuzzer found this bug, that resulted in the following:

kernel BUG at mm/hugetlb.c:742!
Call Trace:
hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
evict+0x481/0x920 fs/inode.c:553
iput_final fs/inode.c:1515 [inline]
iput+0x62b/0xa20 fs/inode.c:1542
hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
newseg+0x422/0xd30 ipc/shm.c:575
ipcget_new ipc/util.c:285 [inline]
ipcget+0x21e/0x580 ipc/util.c:639
SYSC_shmget ipc/shm.c:673 [inline]
SyS_shmget+0x158/0x230 ipc/shm.c:657
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c7025c1..c65d45c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
return 0;
out_err:
if (!vma || vma->vm_flags & VM_MAYSHARE)
- region_abort(resv_map, from, to);
+ /* Don't call region_abort if region_chg failed */
+ if (chg >= 0)
+ region_abort(resv_map, from, to);
if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
kref_put(&resv_map->refs, resv_map_release);
return ret;
--
2.7.4


2017-03-29 21:17:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <[email protected]> wrote:

> Resending because of typo in Andrew's e-mail when first sent
>
> Changes to hugetlbfs reservation maps is a two step process. The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change. This should be followed by a call to call to
> region_add to commit the change, or region_abort to abort the change.
>
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg. As a result, the adds_in_progress counter
> in the reservation map is off by 1. This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
>
> syzkaller fuzzer found this bug, that resulted in the following:

I'll change the above to

: syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
: that resulted in the following:

it's important, because this bug won't be triggered (at all easily, at
least) in real-world workloads.

> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
> hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
> evict+0x481/0x920 fs/inode.c:553
> iput_final fs/inode.c:1515 [inline]
> iput+0x62b/0xa20 fs/inode.c:1542
> hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
> newseg+0x422/0xd30 ipc/shm.c:575
> ipcget_new ipc/util.c:285 [inline]
> ipcget+0x21e/0x580 ipc/util.c:639
> SYSC_shmget ipc/shm.c:673 [inline]
> SyS_shmget+0x158/0x230 ipc/shm.c:657
> entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

2017-03-30 12:29:22

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
<[email protected]> wrote:
> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <[email protected]> wrote:
>
>> Resending because of typo in Andrew's e-mail when first sent
>>
>> Changes to hugetlbfs reservation maps is a two step process. The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change. This should be followed by a call to call to
>> region_add to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg. As a result, the adds_in_progress counter
>> in the reservation map is off by 1. This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>
> I'll change the above to
>
> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
> : that resulted in the following:
>
> it's important, because this bug won't be triggered (at all easily, at
> least) in real-world workloads.

I wonder if memory-constrained cgroups make such bugs much easier to trigger.

2017-03-30 20:20:28

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

On 03/30/2017 05:28 AM, Dmitry Vyukov wrote:
> On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
> <[email protected]> wrote:
>> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <[email protected]> wrote:
>>
>>>
>>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>> I'll change the above to
>>
>> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
>> : that resulted in the following:
>>
>> it's important, because this bug won't be triggered (at all easily, at
>> least) in real-world workloads.
>
> I wonder if memory-constrained cgroups make such bugs much easier to trigger.
>

I think you might expose some bugs with memory-constrained cgroups. However,
it is unlikely you could trigger this bug using that method.

In this bug the injected kmalloc failure was for a 32 byte allocation. My
guess is that it would be very very unlikely/lucky to have the allocations
done by other routines on the stack succeed, and have this 32 byte allocation
fail.

--
Mike Kravetz

2017-04-10 21:38:17

by Vegard Nossum

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

On 29 March 2017 at 23:08, Mike Kravetz <[email protected]> wrote:
> Changes to hugetlbfs reservation maps is a two step process. The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change. This should be followed by a call to call to
> region_add to commit the change, or region_abort to abort the change.
>
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg. As a result, the adds_in_progress counter
> in the reservation map is off by 1. This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
>
> syzkaller fuzzer found this bug, that resulted in the following:
>
> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
> hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
> evict+0x481/0x920 fs/inode.c:553
> iput_final fs/inode.c:1515 [inline]
> iput+0x62b/0xa20 fs/inode.c:1542
> hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
> newseg+0x422/0xd30 ipc/shm.c:575
> ipcget_new ipc/util.c:285 [inline]
> ipcget+0x21e/0x580 ipc/util.c:639
> SYSC_shmget ipc/shm.c:673 [inline]
> SyS_shmget+0x158/0x230 ipc/shm.c:657
> entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>
> Reported-by: Dmitry Vyukov <[email protected]>
> Signed-off-by: Mike Kravetz <[email protected]>
> Acked-by: Hillf Danton <[email protected]>
> ---
> mm/hugetlb.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..c65d45c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
> return 0;
> out_err:
> if (!vma || vma->vm_flags & VM_MAYSHARE)
> - region_abort(resv_map, from, to);
> + /* Don't call region_abort if region_chg failed */
> + if (chg >= 0)
> + region_abort(resv_map, from, to);
> if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> kref_put(&resv_map->refs, resv_map_release);
> return ret;

Hi guys,

I'm running into this on latest linus/master:

kernel BUG at mm/hugetlb.c:742!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880064f30dc0 task.stack: ffff880065b38000
RIP: 0010:resv_map_release+0x1cb/0x270
RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
FS: 00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
Call Trace:
hugetlbfs_evict_inode+0x80/0xa0
? hugetlbfs_setattr+0x3c0/0x3c0
evict+0x24a/0x620
iput+0x48f/0x8c0
dentry_unlink_inode+0x31f/0x4d0
__dentry_kill+0x292/0x5e0
dput+0x730/0x830
__fput+0x438/0x720
____fput+0x1a/0x20
task_work_run+0xfe/0x180
exit_to_usermode_loop+0x133/0x150
syscall_return_slowpath+0x184/0x1c0
entry_SYSCALL_64_fastpath+0xab/0xad

To reproduce:

mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Curiously enough, it's the patch from this thread (i.e. commit
ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
according to git bisect. Reverting the commit from linus/master fixes
the problem.

Also found by syzcaller (no fault injections this time).


Vegard

2017-04-10 23:38:50

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails

On 04/10/2017 02:38 PM, Vegard Nossum wrote:
> On 29 March 2017 at 23:08, Mike Kravetz <[email protected]> wrote:
>> Changes to hugetlbfs reservation maps is a two step process. The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change. This should be followed by a call to call to
>> region_add to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg. As a result, the adds_in_progress counter
>> in the reservation map is off by 1. This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>> kernel BUG at mm/hugetlb.c:742!
>> Call Trace:
>> hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>> evict+0x481/0x920 fs/inode.c:553
>> iput_final fs/inode.c:1515 [inline]
>> iput+0x62b/0xa20 fs/inode.c:1542
>> hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>> newseg+0x422/0xd30 ipc/shm.c:575
>> ipcget_new ipc/util.c:285 [inline]
>> ipcget+0x21e/0x580 ipc/util.c:639
>> SYSC_shmget ipc/shm.c:673 [inline]
>> SyS_shmget+0x158/0x230 ipc/shm.c:657
>> entry_SYSCALL_64_fastpath+0x1f/0xc2
>> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>>
>> Reported-by: Dmitry Vyukov <[email protected]>
>> Signed-off-by: Mike Kravetz <[email protected]>
>> Acked-by: Hillf Danton <[email protected]>
>> ---
>> mm/hugetlb.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c7025c1..c65d45c 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
>> return 0;
>> out_err:
>> if (!vma || vma->vm_flags & VM_MAYSHARE)
>> - region_abort(resv_map, from, to);
>> + /* Don't call region_abort if region_chg failed */
>> + if (chg >= 0)
>> + region_abort(resv_map, from, to);
>> if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>> kref_put(&resv_map->refs, resv_map_release);
>> return ret;
>
> Hi guys,
>
> I'm running into this on latest linus/master:
>
> kernel BUG at mm/hugetlb.c:742!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff880064f30dc0 task.stack: ffff880065b38000
> RIP: 0010:resv_map_release+0x1cb/0x270
> RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
> RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
> RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
> RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
> FS: 00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
> Call Trace:
> hugetlbfs_evict_inode+0x80/0xa0
> ? hugetlbfs_setattr+0x3c0/0x3c0
> evict+0x24a/0x620
> iput+0x48f/0x8c0
> dentry_unlink_inode+0x31f/0x4d0
> __dentry_kill+0x292/0x5e0
> dput+0x730/0x830
> __fput+0x438/0x720
> ____fput+0x1a/0x20
> task_work_run+0xfe/0x180
> exit_to_usermode_loop+0x133/0x150
> syscall_return_slowpath+0x184/0x1c0
> entry_SYSCALL_64_fastpath+0xab/0xad
>
> To reproduce:
>
> mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);
>
> Curiously enough, it's the patch from this thread (i.e. commit
> ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
> according to git bisect. Reverting the commit from linus/master fixes
> the problem.

Thanks for finding this.

I do not think commit ff8c0c53 is the root cause of this BUG/issue.

Due to the very high offset (0x8000000000000000ULL) passed to mmap, there
is some overflow and/or truncation of values happening before getting to
the hugetlbfs reservation code. The routine hugetlb_reserve_pages() is
passed a negative page offset value (from=4398046511104, to=-4398046511103).
Bad!!! The routine region_chg() takes these values to determine how many
reservations are needed and calculates/returns a negative value. This
appears as an error. So, the code from commit ff8c0c53 prevents the call
to region_abort(), adds_in_progress does not get decremented and we hit the
BUG.

We should have never calculated and acted upon negative page offsets. It
was just 'lucky' that things appeared to work before this commit. I have
not yet determined all the things that could have gone wrong when passing
around these incorrect values.

I believe commit ff8c0c53 should remain. I will start working on a fix
to this overflow and/or truncation of page offsets.

--
Mike Kravetz

>
> Also found by syzcaller (no fault injections this time).
>
>
> Vegard
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>