2018-02-01 01:32:33

by Eric Biggers

Subject: [PATCH] KVM/x86: remove WARN_ON() for when vm_munmap() fails

From: Eric Biggers <[email protected]>

On x86, special KVM memslots such as the TSS region have anonymous
memory mappings created on behalf of userspace, and these mappings are
removed when the VM is destroyed.

It is however possible for removing these mappings via vm_munmap() to
fail. This can most easily happen if the thread receives SIGKILL while
it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
in __x86_set_memory_region(). syzkaller was able to hit this, using
'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
results in the mapping not being removed immediately, it is not leaked
forever but rather will be freed when the process exits.

It's not really possible to handle this failure properly, so almost
every other caller of vm_munmap() doesn't check the return value. It's
a limitation of having the kernel manage these mappings rather than
userspace.

So just remove the WARN_ON() so that users can't spam the kernel log
with this warning.

Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298dfbf50..53b57f18baec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8272,10 +8272,8 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 			return r;
 	}

-	if (!size) {
-		r = vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
-		WARN_ON(r < 0);
-	}
+	if (!size)
+		vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);

 	return 0;
 }
--
2.16.0.rc1.238.g530d649a79-goog
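
The failure mode described in the commit message can be modeled in userspace. The sketch below is not kernel code: model_vm_munmap(), old_teardown(), new_teardown() and MODEL_WARN_ON() are hypothetical stand-ins. The only behavior it assumes from the thread is that vm_munmap() gives up with -EINTR when a fatal signal arrives while it waits for ->mmap_sem.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of times the modeled WARN_ON() fired. */
static int warn_count;

/* Hypothetical stand-in for the kernel's WARN_ON(): count and log. */
#define MODEL_WARN_ON(cond)                                     \
	do {                                                    \
		if (cond) {                                     \
			warn_count++;                           \
			fprintf(stderr, "WARNING: %s\n", #cond); \
		}                                               \
	} while (0)

/*
 * Hypothetical model of vm_munmap(): fails with -EINTR if a fatal signal
 * is pending while waiting for the address-space lock, else succeeds.
 */
static int model_vm_munmap(bool fatal_signal_pending)
{
	if (fatal_signal_pending)
		return -EINTR;
	return 0;
}

/* Teardown as it was before the patch: result checked with WARN_ON(). */
static void old_teardown(bool fatal_signal_pending)
{
	int r = model_vm_munmap(fatal_signal_pending);

	MODEL_WARN_ON(r < 0);
}

/* Teardown after the patch: the result is deliberately ignored. */
static void new_teardown(bool fatal_signal_pending)
{
	(void)model_vm_munmap(fatal_signal_pending);
}
```

In this model, old_teardown(true) increments warn_count, while new_teardown(true) leaves it unchanged, which is the whole effect of the patch.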



2018-02-01 15:34:07

by Radim Krčmář

Subject: Re: [PATCH] KVM/x86: remove WARN_ON() for when vm_munmap() fails

2018-01-31 17:30-0800, Eric Biggers:
> From: Eric Biggers <[email protected]>
>
> On x86, special KVM memslots such as the TSS region have anonymous
> memory mappings created on behalf of userspace, and these mappings are
> removed when the VM is destroyed.
>
> It is however possible for removing these mappings via vm_munmap() to
> fail. This can most easily happen if the thread receives SIGKILL while
> it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
> in __x86_set_memory_region(). syzkaller was able to hit this, using
> 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
> results in the mapping not being removed immediately, it is not leaked
> forever but rather will be freed when the process exits.
>
> It's not really possible to handle this failure properly, so almost

We could check "r < 0 && r != -EINTR" to get rid of the easily
triggerable warning.

> every other caller of vm_munmap() doesn't check the return value. It's
> a limitation of having the kernel manage these mappings rather than
> userspace.
>
> So just remove the WARN_ON() so that users can't spam the kernel log
> with this warning.
>
> Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Eric Biggers <[email protected]>
> ---

Removing it altogether doesn't sound that bad, though ...
Queued, thanks.

> arch/x86/kvm/x86.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c53298dfbf50..53b57f18baec 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8272,10 +8272,8 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  			return r;
>  	}
>
> -	if (!size) {
> -		r = vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
> -		WARN_ON(r < 0);
> -	}
> +	if (!size)
> +		vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
>
>  	return 0;
>  }
> --
> 2.16.0.rc1.238.g530d649a79-goog
>

2018-02-01 17:13:18

by Paolo Bonzini

Subject: Re: [PATCH] KVM/x86: remove WARN_ON() for when vm_munmap() fails

On 01/02/2018 10:33, Radim Krčmář wrote:
> 2018-01-31 17:30-0800, Eric Biggers:
>> From: Eric Biggers <[email protected]>
>>
>> On x86, special KVM memslots such as the TSS region have anonymous
>> memory mappings created on behalf of userspace, and these mappings are
>> removed when the VM is destroyed.
>>
>> It is however possible for removing these mappings via vm_munmap() to
>> fail. This can most easily happen if the thread receives SIGKILL while
>> it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
>> in __x86_set_memory_region(). syzkaller was able to hit this, using
>> 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
>> results in the mapping not being removed immediately, it is not leaked
>> forever but rather will be freed when the process exits.
>>
>> It's not really possible to handle this failure properly, so almost
>
> We could check "r < 0 && r != -EINTR" to get rid of the easily
> triggerable warning.

Considering that vm_munmap() uses down_write_killable(), that would be
preferable I think.

Paolo

>> every other caller of vm_munmap() doesn't check the return value. It's
>> a limitation of having the kernel manage these mappings rather than
>> userspace.
>>
>> So just remove the WARN_ON() so that users can't spam the kernel log
>> with this warning.
>>
>> Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
>> Reported-by: syzbot <[email protected]>
>> Signed-off-by: Eric Biggers <[email protected]>
>> ---
>
> Removing it altogether doesn't sound that bad, though ...
> Queued, thanks.
>
>> arch/x86/kvm/x86.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c53298dfbf50..53b57f18baec 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8272,10 +8272,8 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>>  			return r;
>>  	}
>>
>> -	if (!size) {
>> -		r = vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
>> -		WARN_ON(r < 0);
>> -	}
>> +	if (!size)
>> +		vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
>>
>>  	return 0;
>>  }
>> --
>> 2.16.0.rc1.238.g530d649a79-goog
>>


2018-02-01 20:08:50

by Eric Biggers

Subject: Re: [PATCH] KVM/x86: remove WARN_ON() for when vm_munmap() fails

On Thu, Feb 01, 2018 at 12:12:00PM -0500, Paolo Bonzini wrote:
> On 01/02/2018 10:33, Radim Krčmář wrote:
> > 2018-01-31 17:30-0800, Eric Biggers:
> >> From: Eric Biggers <[email protected]>
> >>
> >> On x86, special KVM memslots such as the TSS region have anonymous
> >> memory mappings created on behalf of userspace, and these mappings are
> >> removed when the VM is destroyed.
> >>
> >> It is however possible for removing these mappings via vm_munmap() to
> >> fail. This can most easily happen if the thread receives SIGKILL while
> >> it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
> >> in __x86_set_memory_region(). syzkaller was able to hit this, using
> >> 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
> >> results in the mapping not being removed immediately, it is not leaked
> >> forever but rather will be freed when the process exits.
> >>
> >> It's not really possible to handle this failure properly, so almost
> >
> > We could check "r < 0 && r != -EINTR" to get rid of the easily
> > triggerable warning.
>
> Considering that vm_munmap uses down_write_killable, that would be
> preferable I think.
>

Don't be so sure that vm_munmap() can't fail for other reasons as well :-)
Remember, userspace can mess around with its address space.

And indeed, looking closer, I see there was a previous report of this same WARN
on an older kernel which in vm_munmap() still had down_write() instead of
down_write_killable(). The reproducer in that case concurrently called
personality(ADDR_LIMIT_3GB) to reduce its address limit after the mapping was
already created above 3 GiB. The vm_munmap() then returned -EINVAL since
'start > TASK_SIZE'.

So I don't think we should check for specific error codes. We could make it a
pr_warn_ratelimited() though, if we still want some notification that there was
a problem without implying it is a kernel bug as WARN_ON() does.

- Eric
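
To make the trade-off in this thread concrete, the following userspace sketch compares the three reporting policies discussed for a failed vm_munmap(): the original 'WARN_ON(r < 0)', the suggested "r < 0 && r != -EINTR" filter, and a rate-limited message. All function and struct names are hypothetical, and the limiter is a simple counter-based model, not the kernel's time-based pr_warn_ratelimited().

```c
#include <errno.h>
#include <stdbool.h>

/* Original check: any failure is reported as if it were a kernel bug. */
static bool warns_unconditional(int r)
{
	return r < 0;
}

/* Suggested filter: stay silent only for the killed-while-waiting case. */
static bool warns_unless_eintr(int r)
{
	return r < 0 && r != -EINTR;
}

/* Hypothetical burst limiter: at most `burst` messages get through. */
struct model_ratelimit {
	int burst;      /* messages allowed before suppression starts */
	int printed;    /* messages that were let through */
	int suppressed; /* messages that were dropped */
};

/* Returns true if a message for error code r would be printed. */
static bool model_warn_ratelimited(struct model_ratelimit *rl, int r)
{
	if (r >= 0)
		return false;
	if (rl->printed >= rl->burst) {
		rl->suppressed++;
		return false;
	}
	rl->printed++;
	return true;
}
```

For -EINVAL, which is what the personality(ADDR_LIMIT_3GB) reproducer produced, warns_unless_eintr() still reports a failure, so filtering on a single error code does not make the warning unreachable from userspace. The limiter keeps a notification while bounding how much log output repeated failures can generate.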