2019-01-03 21:13:57

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 0/3] mm/vmalloc: fix size check and few cleanups

The first patch in series fixes size check for remap_vmalloc_range_partial()
which can lead to kernel crash by unfaithful userspace mapping. Others two
are minor cleanups.

Roman Penyaev (3):
mm/vmalloc: fix size check for remap_vmalloc_range_partial()
mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()

mm/vmalloc.c | 48 ++++++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 28 deletions(-)

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: [email protected]
Cc: [email protected]
--
2.19.1



2019-01-03 21:13:40

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()

vmalloc_user*() calls differ from normal vmalloc() only in that they
set VM_USERMAP flags for the area. During the whole history of
vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
directly to __vmalloc_node_range() call instead of finding the area
(which obviously takes time) after the allocation.

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
mm/vmalloc.c | 30 ++++++++----------------------
1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dc6a62bca503..83fa4c642f5e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1865,18 +1865,10 @@ EXPORT_SYMBOL(vzalloc);
*/
void *vmalloc_user(unsigned long size)
{
- struct vm_struct *area;
- void *ret;
-
- ret = __vmalloc_node(size, SHMLBA,
- GFP_KERNEL | __GFP_ZERO,
- PAGE_KERNEL, NUMA_NO_NODE,
- __builtin_return_address(0));
- if (ret) {
- area = find_vm_area(ret);
- area->flags |= VM_USERMAP;
- }
- return ret;
+ return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+ GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+ VM_USERMAP, NUMA_NO_NODE,
+ __builtin_return_address(0));
}
EXPORT_SYMBOL(vmalloc_user);

@@ -1970,16 +1962,10 @@ EXPORT_SYMBOL(vmalloc_32);
*/
void *vmalloc_32_user(unsigned long size)
{
- struct vm_struct *area;
- void *ret;
-
- ret = __vmalloc_node(size, 1, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
- NUMA_NO_NODE, __builtin_return_address(0));
- if (ret) {
- area = find_vm_area(ret);
- area->flags |= VM_USERMAP;
- }
- return ret;
+ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+ VM_USERMAP, NUMA_NO_NODE,
+ __builtin_return_address(0));
}
EXPORT_SYMBOL(vmalloc_32_user);

--
2.19.1


2019-01-03 21:14:00

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

area->size can include adjacent guard page but get_vm_area_size()
returns actual size of the area.

This fixes possible kernel crash when userspace tries to map area
on 1 page bigger: size check passes but the following vmalloc_to_page()
returns NULL on last guard (non-existing) page.

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 871e41c55e23..2cd24186ba84 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
if (!(area->flags & VM_USERMAP))
return -EINVAL;

- if (kaddr + size > area->addr + area->size)
+ if (kaddr + size > area->addr + get_vm_area_size(area))
return -EINVAL;

do {
--
2.19.1


2019-01-03 21:31:57

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory

__vmalloc_area_node() calls vfree() on error path, which in turn calls
kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
mm/vmalloc.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2cd24186ba84..dc6a62bca503 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
__vfree_deferred(addr);
}

+static void __vfree(const void *addr)
+{
+ if (unlikely(in_interrupt()))
+ __vfree_deferred(addr);
+ else
+ __vunmap(addr, 1);
+}
+
/**
* vfree - release memory allocated by vmalloc()
* @addr: memory base address
@@ -1591,10 +1599,8 @@ void vfree(const void *addr)

if (!addr)
return;
- if (unlikely(in_interrupt()))
- __vfree_deferred(addr);
- else
- __vunmap(addr, 1);
+
+ __vfree(addr);
}
EXPORT_SYMBOL(vfree);

@@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
warn_alloc(gfp_mask, NULL,
"vmalloc: allocation failure, allocated %ld of %ld bytes",
(area->nr_pages*PAGE_SIZE), area->size);
- vfree(area->addr);
+ __vfree(area->addr);
return NULL;
}

--
2.19.1


2019-01-03 21:46:52

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
>
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.

Can this actually happen? I am not really familiar with all the callers
of this API but VM_NO_GUARD is not really used wildly in the kernel.
All I can see is kasan na arm64 which doesn't really seem to use it
for vmalloc.

So is the problem real or this is a mere cleanup?

> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
> if (!(area->flags & VM_USERMAP))
> return -EINVAL;
>
> - if (kaddr + size > area->addr + area->size)
> + if (kaddr + size > area->addr + get_vm_area_size(area))
> return -EINVAL;
>
> do {
> --
> 2.19.1

--
Michal Hocko
SUSE Labs

2019-01-03 21:46:55

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()

On Thu 03-01-19 15:59:54, Roman Penyaev wrote:
> vmalloc_user*() calls differ from normal vmalloc() only in that they
> set VM_USERMAP flags for the area. During the whole history of
> vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
> directly to __vmalloc_node_range() call instead of finding the area
> (which obviously takes time) after the allocation.

Yes, this looks correct and a nice cleanup

> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]

Acked-by: Michal Hocko <[email protected]>
> ---
> mm/vmalloc.c | 30 ++++++++----------------------
> 1 file changed, 8 insertions(+), 22 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dc6a62bca503..83fa4c642f5e 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1865,18 +1865,10 @@ EXPORT_SYMBOL(vzalloc);
> */
> void *vmalloc_user(unsigned long size)
> {
> - struct vm_struct *area;
> - void *ret;
> -
> - ret = __vmalloc_node(size, SHMLBA,
> - GFP_KERNEL | __GFP_ZERO,
> - PAGE_KERNEL, NUMA_NO_NODE,
> - __builtin_return_address(0));
> - if (ret) {
> - area = find_vm_area(ret);
> - area->flags |= VM_USERMAP;
> - }
> - return ret;
> + return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
> + GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
> + VM_USERMAP, NUMA_NO_NODE,
> + __builtin_return_address(0));
> }
> EXPORT_SYMBOL(vmalloc_user);
>
> @@ -1970,16 +1962,10 @@ EXPORT_SYMBOL(vmalloc_32);
> */
> void *vmalloc_32_user(unsigned long size)
> {
> - struct vm_struct *area;
> - void *ret;
> -
> - ret = __vmalloc_node(size, 1, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
> - NUMA_NO_NODE, __builtin_return_address(0));
> - if (ret) {
> - area = find_vm_area(ret);
> - area->flags |= VM_USERMAP;
> - }
> - return ret;
> + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> + GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
> + VM_USERMAP, NUMA_NO_NODE,
> + __builtin_return_address(0));
> }
> EXPORT_SYMBOL(vmalloc_32_user);
>
> --
> 2.19.1

--
Michal Hocko
SUSE Labs

2019-01-04 00:03:29

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On 2019-01-03 16:13, Michal Hocko wrote:
> On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> area->size can include adjacent guard page but get_vm_area_size()
>> returns actual size of the area.
>>
>> This fixes possible kernel crash when userspace tries to map area
>> on 1 page bigger: size check passes but the following
>> vmalloc_to_page()
>> returns NULL on last guard (non-existing) page.
>
> Can this actually happen? I am not really familiar with all the callers
> of this API but VM_NO_GUARD is not really used wildly in the kernel.

Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
thus the area->size will be bigger. The bug is not reproduced if
VM_NO_GUARD is set.

> All I can see is kasan na arm64 which doesn't really seem to use it
> for vmalloc.
>
> So is the problem real or this is a mere cleanup?

This is the real problem, try this hunk for any file descriptor which
provides
mapping, or say modify epoll as example:

--------------------------------
diff --git a/fs/eventpoll.c b/fs/eventpoll.c

+static int ep_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ void *mem;
+
+ mem = vmalloc_user(4096);
+ BUG_ON(!mem);
+ /* Do not care about mem leak */
+
+ return remap_vmalloc_range(vma, mem, 0);
+}
+
/* File callbacks that implement the eventpoll file behaviour */
static const struct file_operations eventpoll_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = ep_show_fdinfo,
#endif
+ .mmap = ep_mmap,
.release = ep_eventpoll_release,
--------------------------------

and the following code from userspace, which maps 2 pages,
instead of 1:

--------------------------------
epfd = epoll_create1(0);
assert(epfd >= 0);

p = mmap(NULL, 2<<12, PROT_WRITE|PROT_READ, MAP_PRIVATE, epfd, 0);
assert(p != MAP_FAILED);
--------------------------------

You immediately get the following oops:

[ 38.894571] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 38.899048] #PF error: [normal kernel read fault]
[ 38.901487] PGD 0 P4D 0
[ 38.902801] Oops: 0000 [#1] PREEMPT SMP PTI
[ 38.904984] CPU: 2 PID: 399 Comm: mmap-epoll Not tainted 4.20.0-1
#238
[ 38.914064] RIP: 0010:vm_insert_page+0x3b/0x1d0
[ 38.941181] Call Trace:
[ 38.941656] remap_vmalloc_range_partial+0x8d/0xd0
[ 38.942417] mmap_region+0x3c7/0x630
[ 38.942982] do_mmap+0x38d/0x560
[ 38.943479] vm_mmap_pgoff+0x9a/0xf0
[ 38.944028] ksys_mmap_pgoff+0x18e/0x220
[ 38.944554] do_syscall_64+0x48/0xf0
[ 38.945076] entry_SYSCALL_64_after_hwframe+0x44/0xa9

--
Roman


2019-01-04 00:24:38

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
>
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
>
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Forgot to add
Acked-by: Michal Hocko <[email protected]>
Although I am not really sure the stable backport is really needed as I
haven't seen any explicit example of a buggy kernel code to trigger the
issue.

> ---
> mm/vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
> if (!(area->flags & VM_USERMAP))
> return -EINVAL;
>
> - if (kaddr + size > area->addr + area->size)
> + if (kaddr + size > area->addr + get_vm_area_size(area))
> return -EINVAL;
>
> do {
> --
> 2.19.1

--
Michal Hocko
SUSE Labs

2019-01-04 00:24:38

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> On 2019-01-03 16:13, Michal Hocko wrote:
> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > area->size can include adjacent guard page but get_vm_area_size()
> > > returns actual size of the area.
> > >
> > > This fixes possible kernel crash when userspace tries to map area
> > > on 1 page bigger: size check passes but the following
> > > vmalloc_to_page()
> > > returns NULL on last guard (non-existing) page.
> >
> > Can this actually happen? I am not really familiar with all the callers
> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
>
> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> thus the area->size will be bigger. The bug is not reproduced if
> VM_NO_GUARD is set.
>
> > All I can see is kasan na arm64 which doesn't really seem to use it
> > for vmalloc.
> >
> > So is the problem real or this is a mere cleanup?
>
> This is the real problem, try this hunk for any file descriptor which
> provides
> mapping, or say modify epoll as example:

OK, my response was more confusing than I intended. I meant to say. Is
there any in kernel code that would allow the bug have had in mind?
In other words can userspace trick any existing code?

--
Michal Hocko
SUSE Labs

2019-01-04 02:38:12

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On 2019-01-03 20:40, Michal Hocko wrote:
> On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
>> On 2019-01-03 16:13, Michal Hocko wrote:
>> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> > > area->size can include adjacent guard page but get_vm_area_size()
>> > > returns actual size of the area.
>> > >
>> > > This fixes possible kernel crash when userspace tries to map area
>> > > on 1 page bigger: size check passes but the following
>> > > vmalloc_to_page()
>> > > returns NULL on last guard (non-existing) page.
>> >
>> > Can this actually happen? I am not really familiar with all the callers
>> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
>>
>> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
>> thus the area->size will be bigger. The bug is not reproduced if
>> VM_NO_GUARD is set.
>>
>> > All I can see is kasan na arm64 which doesn't really seem to use it
>> > for vmalloc.
>> >
>> > So is the problem real or this is a mere cleanup?
>>
>> This is the real problem, try this hunk for any file descriptor which
>> provides
>> mapping, or say modify epoll as example:
>
> OK, my response was more confusing than I intended. I meant to say. Is
> there any in kernel code that would allow the bug have had in mind?
> In other words can userspace trick any existing code?

In theory any existing caller of remap_vmalloc_range() which does
not have an explicit size check should trigger an oops, e.g. this is
a good candidate:

*** drivers/media/usb/stkwebcam/stk-webcam.c:
v4l_stk_mmap[789] ret = remap_vmalloc_range(vma,
sbuf->buffer, 0);

According to the code no explicit size check, should be easy to
reproduce:
mmap the frame buffer and you are done.

Other callers are not so easy to follow. But wait, here is another
example:

(drivers/video/fbdev/core/fbmem.c)
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
...
res = fb->fb_mmap(info, vma);

(drivers/video/fbdev/vfb.c)
static int vfb_mmap(struct fb_info *info,
struct vm_area_struct *vma)
{
return remap_vmalloc_range(vma, (void *)info->fix.smem_start,
vma->vm_pgoff);
}

No checks, naked calls, should be also the candidate.


--
Roman




2019-01-04 11:39:39

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On Thu 03-01-19 21:31:58, Roman Penyaev wrote:
> On 2019-01-03 20:40, Michal Hocko wrote:
> > On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> > > On 2019-01-03 16:13, Michal Hocko wrote:
> > > > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > > > area->size can include adjacent guard page but get_vm_area_size()
> > > > > returns actual size of the area.
> > > > >
> > > > > This fixes possible kernel crash when userspace tries to map area
> > > > > on 1 page bigger: size check passes but the following
> > > > > vmalloc_to_page()
> > > > > returns NULL on last guard (non-existing) page.
> > > >
> > > > Can this actually happen? I am not really familiar with all the callers
> > > > of this API but VM_NO_GUARD is not really used wildly in the kernel.
> > >
> > > Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> > > thus the area->size will be bigger. The bug is not reproduced if
> > > VM_NO_GUARD is set.
> > >
> > > > All I can see is kasan na arm64 which doesn't really seem to use it
> > > > for vmalloc.
> > > >
> > > > So is the problem real or this is a mere cleanup?
> > >
> > > This is the real problem, try this hunk for any file descriptor which
> > > provides
> > > mapping, or say modify epoll as example:
> >
> > OK, my response was more confusing than I intended. I meant to say. Is
> > there any in kernel code that would allow the bug have had in mind?
> > In other words can userspace trick any existing code?
>
> In theory any existing caller of remap_vmalloc_range() which does
> not have an explicit size check should trigger an oops, e.g. this is
> a good candidate:
>
> *** drivers/media/usb/stkwebcam/stk-webcam.c:
> v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer,
> 0);

Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
that is indeed not enforced to be <= length AFAICS. So you are right!

Can we have an example in the changelog please?

Thanks!
--
Michal Hocko
SUSE Labs

2019-01-04 12:37:06

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On Fri 04-01-19 11:21:39, Roman Penyaev wrote:
> On 2019-01-04 10:38, Michal Hocko wrote:
>
> [...]
>
> > > >
> > > > OK, my response was more confusing than I intended. I meant to say. Is
> > > > there any in kernel code that would allow the bug have had in mind?
> > > > In other words can userspace trick any existing code?
> > >
> > > In theory any existing caller of remap_vmalloc_range() which does
> > > not have an explicit size check should trigger an oops, e.g. this is
> > > a good candidate:
> > >
> > > *** drivers/media/usb/stkwebcam/stk-webcam.c:
> > > v4l_stk_mmap[789] ret = remap_vmalloc_range(vma,
> > > sbuf->buffer,
> > > 0);
> >
> > Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> > buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> > that is indeed not enforced to be <= length AFAICS. So you are right!
> >
> > Can we have an example in the changelog please?
>
> You mean to resend this particular patch with the list of possible
> candidates for oops in a comment message? Sure thing.

I would just reply to the original patch with an updated changelog
wording (to include the above example and explain how the vma setup is
completely independent on the buffer allocation and ask Andrew to update
the changelog of the patch that is already in the mmotm tree).

Thanks!
--
Michal Hocko
SUSE Labs

2019-01-04 12:37:30

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

On 2019-01-04 10:38, Michal Hocko wrote:

[...]

>> >
>> > OK, my response was more confusing than I intended. I meant to say. Is
>> > there any in kernel code that would allow the bug have had in mind?
>> > In other words can userspace trick any existing code?
>>
>> In theory any existing caller of remap_vmalloc_range() which does
>> not have an explicit size check should trigger an oops, e.g. this is
>> a good candidate:
>>
>> *** drivers/media/usb/stkwebcam/stk-webcam.c:
>> v4l_stk_mmap[789] ret = remap_vmalloc_range(vma,
>> sbuf->buffer,
>> 0);
>
> Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> that is indeed not enforced to be <= length AFAICS. So you are right!
>
> Can we have an example in the changelog please?

You mean to resend this particular patch with the list of possible
candidates for oops in a comment message? Sure thing.

--
Roman


2019-01-04 13:10:04

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()

Hi Andrew,

Could you please update the commit message with something as
the following (I hope it will become a bit clearer now):

-------------
When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used,
but not area->size.

This fixes possible kernel oops when userspace tries to mmap an area
on 1 page bigger than was allocated by vmalloc_user() call: the size
check inside remap_vmalloc_range_partial() accounts non-existing
guard page also, so check successfully passes but vmalloc_to_page()
returns NULL (guard page does not physically exist).

The following code pattern example should trigger an oops:

static int oops_mmap(struct file *file, struct vm_area_struct *vma)
{
void *mem;

mem = vmalloc_user(4096);
BUG_ON(!mem);
/* Do not care about mem leak */

return remap_vmalloc_range(vma, mem, 0);
}

And userspace simply mmaps size + PAGE_SIZE:

mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);

Possible candidates for oops which do not have any explicit size
checks:

*** drivers/media/usb/stkwebcam/stk-webcam.c:
v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);

Or the following one:

*** drivers/video/fbdev/core/fbmem.c
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
...
res = fb->fb_mmap(info, vma);

Where fb_mmap callback calls remap_vmalloc_range() directly without
any explicit checks:

*** drivers/video/fbdev/vfb.c
static int vfb_mmap(struct fb_info *info,
struct vm_area_struct *vma)
{
return remap_vmalloc_range(vma, (void *)info->fix.smem_start,
vma->vm_pgoff);
}
--------

--
Roman


On 2019-01-03 15:59, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
>
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
>
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct
> vm_area_struct *vma, unsigned long uaddr,
> if (!(area->flags & VM_USERMAP))
> return -EINVAL;
>
> - if (kaddr + size > area->addr + area->size)
> + if (kaddr + size > area->addr + get_vm_area_size(area))
> return -EINVAL;
>
> do {


2019-01-11 21:02:30

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> __vmalloc_area_node() calls vfree() on error path, which in turn calls
> kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().
>
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/vmalloc.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2cd24186ba84..dc6a62bca503 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
> __vfree_deferred(addr);
> }
>
> +static void __vfree(const void *addr)
> +{
> + if (unlikely(in_interrupt()))
> + __vfree_deferred(addr);
> + else
> + __vunmap(addr, 1);
> +}
> +
> /**
> * vfree - release memory allocated by vmalloc()
> * @addr: memory base address
> @@ -1591,10 +1599,8 @@ void vfree(const void *addr)
>
> if (!addr)
> return;
> - if (unlikely(in_interrupt()))
> - __vfree_deferred(addr);
> - else
> - __vunmap(addr, 1);
> +
> + __vfree(addr);
> }
> EXPORT_SYMBOL(vfree);
>
> @@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> warn_alloc(gfp_mask, NULL,
> "vmalloc: allocation failure, allocated %ld of %ld bytes",
> (area->nr_pages*PAGE_SIZE), area->size);
> - vfree(area->addr);
> + __vfree(area->addr);

This can't be an interrupt context for a several reasons. One of them is BUG_ON(in_interrupt()) in __get_vm_area_node()
which is called right before __vmalloc_are_node().

So you can just do __vunmap(area->addr, 1); instead of __vfree().


> return NULL;
> }
>
>

2019-01-11 21:47:47

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
>
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
>
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---

Fixes: e69e9d4aee71 ("vmalloc: introduce remap_vmalloc_range_partial")
Acked-by: Andrey Ryabinin <[email protected]>


2019-01-11 21:47:51

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> vmalloc_user*() calls differ from normal vmalloc() only in that they
> set VM_USERMAP flags for the area. During the whole history of
> vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
> directly to __vmalloc_node_range() call instead of finding the area
> (which obviously takes time) after the allocation.
>
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: "Luis R. Rodriguez" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]

Acked-by: Andrey Ryabinin <[email protected]>

2019-01-12 18:02:08

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory

On 2019-01-11 20:26, Andrey Ryabinin wrote:
> On 1/3/19 5:59 PM, Roman Penyaev wrote:
>> __vmalloc_area_node() calls vfree() on error path, which in turn calls
>> kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().
>>
>> Signed-off-by: Roman Penyaev <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Andrey Ryabinin <[email protected]>
>> Cc: Joe Perches <[email protected]>
>> Cc: "Luis R. Rodriguez" <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> mm/vmalloc.c | 16 +++++++++++-----
>> 1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 2cd24186ba84..dc6a62bca503 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
>> __vfree_deferred(addr);
>> }
>>
>> +static void __vfree(const void *addr)
>> +{
>> + if (unlikely(in_interrupt()))
>> + __vfree_deferred(addr);
>> + else
>> + __vunmap(addr, 1);
>> +}
>> +
>> /**
>> * vfree - release memory allocated by vmalloc()
>> * @addr: memory base address
>> @@ -1591,10 +1599,8 @@ void vfree(const void *addr)
>>
>> if (!addr)
>> return;
>> - if (unlikely(in_interrupt()))
>> - __vfree_deferred(addr);
>> - else
>> - __vunmap(addr, 1);
>> +
>> + __vfree(addr);
>> }
>> EXPORT_SYMBOL(vfree);
>>
>> @@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct
>> vm_struct *area, gfp_t gfp_mask,
>> warn_alloc(gfp_mask, NULL,
>> "vmalloc: allocation failure, allocated %ld of %ld bytes",
>> (area->nr_pages*PAGE_SIZE), area->size);
>> - vfree(area->addr);
>> + __vfree(area->addr);
>
> This can't be an interrupt context for a several reasons. One of them
> is BUG_ON(in_interrupt()) in __get_vm_area_node()
> which is called right before __vmalloc_are_node().
>
> So you can just do __vunmap(area->addr, 1); instead of __vfree().

Thanks, I missed that BUG_ON and could not prove, that we can call only
from a task context, thus decided not to make it strict. Of course
simple __vunmap() is much better. The other reason is that we call a
spin_lock without disabling the interrupts. Now I see.

Andrew, may I resend just an updated version of this patch?

--
Roman