2013-09-14 23:46:07

by Wanpeng Li

[permalink] [raw]
Subject: [RESEND PATCH v5 1/4] mm/vmalloc: don't set area->caller twice

Changelog:
*v1 -> v2: rebase against mmotm tree

The caller address has already been set in set_vmalloc_vm(), there's no need
to set it again in __vmalloc_area_node.

Reviewed-by: Zhang Yanfei <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/vmalloc.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1074543..d78d117 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1566,7 +1566,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
pages = kmalloc_node(array_size, nested_gfp, node);
}
area->pages = pages;
- area->caller = caller;
if (!area->pages) {
remove_vm_area(area->addr);
kfree(area);
--
1.8.1.2


2013-09-14 23:45:57

by Wanpeng Li

[permalink] [raw]
Subject: [RESEND PATCH v5 3/4] mm/vmalloc: revert "mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info"

Changelog:
*v2 -> v3: revert commit d157a558 directly

The VM_UNINITIALIZED/VM_UNLIST flag introduced by commit f5252e00(mm: avoid
null pointer access in vm_struct via /proc/vmallocinfo) is used to avoid
accessing the pages field with unallocated page when show_numa_info() is
called. This patch move the check just before show_numa_info in order that
some messages still can be dumped via /proc/vmallocinfo. This patch revert
commit d157a558 (mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead
of show_numa_info);

Reviewed-by: Zhang Yanfei <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/vmalloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e3ec8b4..5368b17 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2562,6 +2562,11 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
if (!counters)
return;

+ /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+ smp_rmb();
+ if (v->flags & VM_UNINITIALIZED)
+ return;
+
memset(counters, 0, nr_node_ids * sizeof(unsigned int));

for (nr = 0; nr < v->nr_pages; nr++)
@@ -2590,11 +2595,6 @@ static int s_show(struct seq_file *m, void *p)

v = va->vm;

- /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
- smp_rmb();
- if (v->flags & VM_UNINITIALIZED)
- return 0;
-
seq_printf(m, "0x%pK-0x%pK %7ld",
v->addr, v->addr + v->size, v->size);

--
1.8.1.2

2013-09-14 23:46:31

by Wanpeng Li

[permalink] [raw]
Subject: [RESEND PATCH v5 4/4] mm/vmalloc: fix show vmap_area information race with vmap_area tear down

Changelog:
*v4 -> v5: return directly for !VM_VM_AREA case and remove (VM_LAZY_FREE | VM_LAZY_FREEING) check

There is a race window between vmap_area tear down and show vmap_area information.

A B

remove_vm_area
spin_lock(&vmap_area_lock);
va->vm = NULL;
va->flags &= ~VM_VM_AREA;
spin_unlock(&vmap_area_lock);
spin_lock(&vmap_area_lock);
if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEZING))
return 0;
if (!(va->flags & VM_VM_AREA)) {
seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
(void *)va->va_start, (void *)va->va_end,
va->va_end - va->va_start);
return 0;
}
free_unmap_vmap_area(va);
flush_cache_vunmap
free_unmap_vmap_area_noflush
unmap_vmap_area
free_vmap_area_noflush
va->flags |= VM_LAZY_FREE

The assumption !VM_VM_AREA represents vm_map_ram allocation is introduced by
commit: d4033afd(mm, vmalloc: iterate vmap_area_list, instead of vmlist, in
vmallocinfo()). However, !VM_VM_AREA also represents vmap_area is being tear
down in race window mentioned above. This patch fix it by don't dump any
information for !VM_VM_AREA case and also remove (VM_LAZY_FREE | VM_LAZY_FREEING)
check since they are not possible for !VM_VM_AREA case.

Suggested-by: Joonsoo Kim <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/vmalloc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5368b17..9b75028 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2582,16 +2582,13 @@ static int s_show(struct seq_file *m, void *p)
{
struct vmap_area *va = p;
struct vm_struct *v;
-
- if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
- return 0;
-
- if (!(va->flags & VM_VM_AREA)) {
- seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
- (void *)va->va_start, (void *)va->va_end,
- va->va_end - va->va_start);
+
+ /*
+ * s_show can encounter race with remove_vm_area, !VM_VM_AREA on
+ * behalf of vmap area is being tear down or vm_map_ram allocation.
+ */
+ if (!(va->flags & VM_VM_AREA))
return 0;
- }

v = va->vm;

--
1.8.1.2

2013-09-14 23:46:28

by Wanpeng Li

[permalink] [raw]
Subject: [RESEND PATCH v5 2/4] mm/vmalloc: revert "mm/vmalloc.c: emit the failure message before return"

Changelog:
*v2 -> v3: revert commit 46c001a2 directly

Don't warning twice in __vmalloc_area_node and __vmalloc_node_range if
__vmalloc_area_node allocation failure. This patch revert commit 46c001a2
(mm/vmalloc.c: emit the failure message before return).

Reviewed-by: Zhang Yanfei <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d78d117..e3ec8b4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1635,7 +1635,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,

addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
if (!addr)
- goto fail;
+ return NULL;

/*
* In this function, newly allocated vm_struct has VM_UNINITIALIZED
--
1.8.1.2

2013-09-16 21:28:05

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 4/4] mm/vmalloc: fix show vmap_area information race with vmap_area tear down

On 9/14/2013 7:45 PM, Wanpeng Li wrote:
> Changelog:
> *v4 -> v5: return directly for !VM_VM_AREA case and remove (VM_LAZY_FREE | VM_LAZY_FREEING) check
>
> There is a race window between vmap_area tear down and show vmap_area information.
>
> A B
>
> remove_vm_area
> spin_lock(&vmap_area_lock);
> va->vm = NULL;
> va->flags &= ~VM_VM_AREA;
> spin_unlock(&vmap_area_lock);
> spin_lock(&vmap_area_lock);
> if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEZING))
> return 0;
> if (!(va->flags & VM_VM_AREA)) {
> seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
> (void *)va->va_start, (void *)va->va_end,
> va->va_end - va->va_start);
> return 0;
> }
> free_unmap_vmap_area(va);
> flush_cache_vunmap
> free_unmap_vmap_area_noflush
> unmap_vmap_area
> free_vmap_area_noflush
> va->flags |= VM_LAZY_FREE
>
> The assumption !VM_VM_AREA represents vm_map_ram allocation is introduced by
> commit: d4033afd(mm, vmalloc: iterate vmap_area_list, instead of vmlist, in
> vmallocinfo()). However, !VM_VM_AREA also represents vmap_area is being tear
> down in race window mentioned above. This patch fix it by don't dump any
> information for !VM_VM_AREA case and also remove (VM_LAZY_FREE | VM_LAZY_FREEING)
> check since they are not possible for !VM_VM_AREA case.
>
> Suggested-by: Joonsoo Kim <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>

Acked-by: KOSAKI Motohiro <[email protected]>

2013-09-16 21:28:15

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 1/4] mm/vmalloc: don't set area->caller twice

On 9/14/2013 7:45 PM, Wanpeng Li wrote:
> Changelog:
> *v1 -> v2: rebase against mmotm tree
>
> The caller address has already been set in set_vmalloc_vm(), there's no need

setup_vmalloc_vm()

> to set it again in __vmalloc_area_node.
>
> Reviewed-by: Zhang Yanfei <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/vmalloc.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1074543..d78d117 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1566,7 +1566,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> pages = kmalloc_node(array_size, nested_gfp, node);
> }
> area->pages = pages;
> - area->caller = caller;
> if (!area->pages) {
> remove_vm_area(area->addr);
> kfree(area);

Then, __vmalloc_area_node() no longer need "caller" argument. It can use area->caller instead.

2013-09-16 21:28:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 3/4] mm/vmalloc: revert "mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info"

On 9/14/2013 7:45 PM, Wanpeng Li wrote:
> Changelog:
> *v2 -> v3: revert commit d157a558 directly
>
> The VM_UNINITIALIZED/VM_UNLIST flag introduced by commit f5252e00(mm: avoid
> null pointer access in vm_struct via /proc/vmallocinfo) is used to avoid
> accessing the pages field with unallocated page when show_numa_info() is
> called. This patch move the check just before show_numa_info in order that
> some messages still can be dumped via /proc/vmallocinfo. This patch revert
> commit d157a558 (mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead
> of show_numa_info);

Both d157a558 and your patch don't explain why your one is better. Yes, some
messages _can_ be dumped. But why should we do so?

And No. __get_vm_area_node() doesn't use __GFP_ZERO for allocating vm_area_struct.
dumped partial dump is not only partial, but also may be garbage.

I wonder why we need to call setup_vmalloc_vm() _after_ insert_vmap_area.

2013-09-16 21:28:39

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 2/4] mm/vmalloc: revert "mm/vmalloc.c: emit the failure message before return"

On 9/14/2013 7:45 PM, Wanpeng Li wrote:
> Changelog:
> *v2 -> v3: revert commit 46c001a2 directly
>
> Don't warning twice in __vmalloc_area_node and __vmalloc_node_range if
> __vmalloc_area_node allocation failure. This patch revert commit 46c001a2
> (mm/vmalloc.c: emit the failure message before return).
>
> Reviewed-by: Zhang Yanfei <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> mm/vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d78d117..e3ec8b4 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1635,7 +1635,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>
> addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
> if (!addr)
> - goto fail;
> + return NULL;

This is not right fix. Now we have following call stack.

__vmalloc_node
__vmalloc_node_range
__vmalloc_node

Even if we remove a warning of __vmalloc_node_range, we still be able to see double warning
because we call __vmalloc_node recursively.

I haven't catch your point why twice warning is unacceptable though.








2013-09-17 05:39:38

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 2/4] mm/vmalloc: revert "mm/vmalloc.c: emit the failure message before return"

On Mon, Sep 16, 2013 at 7:41 PM, Wanpeng Li <[email protected]> wrote:
> Hi KOSAKI,
> On Mon, Sep 16, 2013 at 04:15:29PM -0400, KOSAKI Motohiro wrote:
>>On 9/14/2013 7:45 PM, Wanpeng Li wrote:
>>> Changelog:
>>> *v2 -> v3: revert commit 46c001a2 directly
>>>
>>> Don't warning twice in __vmalloc_area_node and __vmalloc_node_range if
>>> __vmalloc_area_node allocation failure. This patch revert commit 46c001a2
>>> (mm/vmalloc.c: emit the failure message before return).
>>>
>>> Reviewed-by: Zhang Yanfei <[email protected]>
>>> Signed-off-by: Wanpeng Li <[email protected]>
>>> ---
>>> mm/vmalloc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index d78d117..e3ec8b4 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -1635,7 +1635,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>>>
>>> addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
>>> if (!addr)
>>> - goto fail;
>>> + return NULL;
>
> The goto fail is introduced by commit (mm/vmalloc.c: emit the failure message
> before return), and the commit author ignore there has already have warning in
> __vmalloc_area_node.
>
> http://marc.info/?l=linux-mm&m=137818671125209&w=2

But, module_alloc() directly calls __vmalloc_node_range(). Your fix
makes another regression.


>>This is not right fix. Now we have following call stack.
>>
>> __vmalloc_node
>> __vmalloc_node_range
>> __vmalloc_node
>>
>>Even if we remove a warning of __vmalloc_node_range, we still be able to see double warning
>>because we call __vmalloc_node recursively.
>
> Different size allocation failure in your example actually.

But, when we can not allocate small size memory, almost always we
can't allocate large size too.

You need some refactoring and make right fix.

2013-09-17 05:44:28

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 3/4] mm/vmalloc: revert "mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info"

On Mon, Sep 16, 2013 at 8:18 PM, Wanpeng Li <[email protected]> wrote:
> Hi KOSAKI,
> On Mon, Sep 16, 2013 at 05:23:32PM -0400, KOSAKI Motohiro wrote:
>>On 9/14/2013 7:45 PM, Wanpeng Li wrote:
>>> Changelog:
>>> *v2 -> v3: revert commit d157a558 directly
>>>
>>> The VM_UNINITIALIZED/VM_UNLIST flag introduced by commit f5252e00(mm: avoid
>>> null pointer access in vm_struct via /proc/vmallocinfo) is used to avoid
>>> accessing the pages field with unallocated page when show_numa_info() is
>>> called. This patch move the check just before show_numa_info in order that
>>> some messages still can be dumped via /proc/vmallocinfo. This patch revert
>>> commit d157a558 (mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead
>>> of show_numa_info);
>>
>>Both d157a558 and your patch don't explain why your one is better. Yes, some
>>messages _can_ be dumped. But why should we do so?
>
> More messages can be dumped and original commit f5252e00(mm: avoid null pointer
> access in vm_struct via /proc/vmallocinfo) do that.
>
>>And No. __get_vm_area_node() doesn't use __GFP_ZERO for allocating vm_area_struct.
>>dumped partial dump is not only partial, but also may be garbage.
>
> vm_struct is allocated by kzalloc_node.

Oops, you are right. Then, your code _intentionally_ show amazing
zero. Heh, nice.
More message is pointless. zero is just zero. It doesn't have any information.


>>I wonder why we need to call setup_vmalloc_vm() _after_ insert_vmap_area.
>
> I think it's another topic.

Why?


> Fill vm_struct and set VM_VM_AREA flag. If I misunderstand your
> question?

VM_VM_AREA doesn't help. we have race between insert_vmap_area and
setup_vmalloc_vm.

2013-09-23 23:18:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 1/4] mm/vmalloc: don't set area->caller twice

On Tue, 17 Sep 2013 07:29:41 +0800 Wanpeng Li <[email protected]> wrote:

> >> to set it again in __vmalloc_area_node.
> >>
> >> Reviewed-by: Zhang Yanfei <[email protected]>
> >> Signed-off-by: Wanpeng Li <[email protected]>
> >> ---
> >> mm/vmalloc.c | 1 -
> >> 1 file changed, 1 deletion(-)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index 1074543..d78d117 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -1566,7 +1566,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >> pages = kmalloc_node(array_size, nested_gfp, node);
> >> }
> >> area->pages = pages;
> >> - area->caller = caller;
> >> if (!area->pages) {
> >> remove_vm_area(area->addr);
> >> kfree(area);
> >
> >Then, __vmalloc_area_node() no longer need "caller" argument. It can use area->caller instead.
> >
>
> Thanks for pointing out, I will update it in next version.

I've seen so many versions of this patchset that my head has spun right
off. I'm not at all confident that I have the latest version and I'm
certainly not confident that I've kept up with the ack/nack trail.

So I think I'll drop everything and will await that "next version".
Please be careful to Cc everyone who was involved and that the
acked/reviewed-by paperwork is up to date.

Thanks.