2009-06-17 00:21:38

by KOSAKI Motohiro

Subject: Re: + page_alloc-oops-when-setting-percpu_pagelist_fraction.patch added to -mm tree

(switch to lkml)

Sorry for the late review.

>
> The patch titled
> page_alloc: Oops when setting percpu_pagelist_fraction
> has been added to the -mm tree. Its filename is
> page_alloc-oops-when-setting-percpu_pagelist_fraction.patch
>
> Before you just go and hit "reply", please:
> a) Consider who else should be cc'ed
> b) Prefer to cc a suitable mailing list as well
> c) Ideally: find the original patch on the mailing list and do a
> reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
> out what to do about this
>
> The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
>
> ------------------------------------------------------
> Subject: page_alloc: Oops when setting percpu_pagelist_fraction
> From: Dimitri Sivanich <[email protected]>
>
> After downing/upping a cpu, an attempt to set
> /proc/sys/vm/percpu_pagelist_fraction results in an oops in
> percpu_pagelist_fraction_sysctl_handler().
>
> To reproduce this:
> localhost:/sys/devices/system/cpu/cpu6 # echo 0 >online
> localhost:/sys/devices/system/cpu/cpu6 # echo 1 >online
> localhost:/sys/devices/system/cpu/cpu6 # cd /proc/sys/vm
> localhost:/proc/sys/vm # echo 100000 >percpu_pagelist_fraction
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [<ffffffff80286946>] percpu_pagelist_fraction_sysctl_handler+0x4a/0x96
>
> This is because the zone->pageset[cpu] value has not been set when the cpu
> has been brought back up for unpopulated zones (the "Movable" zone in the
> case I'm running into). Prior to downing/upping the cpu it had been set
> to &boot_pageset[cpu].
>
> There are two possible fixes that come to mind. One is to check for an
> unpopulated zone or NULL zone pageset for that cpu in
> percpu_pagelist_fraction_sysctl_handler(), and simply not set a pagelist
> highmark for that zone/cpu combination.
>
> The other, and the one I'm proposing here, is to set the zone's pageset
> back to the boot_pageset when the cpu is brought back up if the zone is
> unpopulated.
>
> Signed-off-by: Dimitri Sivanich <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/page_alloc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff -puN mm/page_alloc.c~page_alloc-oops-when-setting-percpu_pagelist_fraction mm/page_alloc.c
> --- a/mm/page_alloc.c~page_alloc-oops-when-setting-percpu_pagelist_fraction
> +++ a/mm/page_alloc.c
> @@ -2806,7 +2806,11 @@ static int __cpuinit process_zones(int c
>
>  	node_set_state(node, N_CPU);	/* this node has a cpu */
>
> -	for_each_populated_zone(zone) {
> +	for_each_zone(zone) {
> +		if (!populated_zone(zone)) {
> +			zone_pcp(zone, cpu) = &boot_pageset[cpu];
> +			continue;
> +		}
>  		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
>  					 GFP_KERNEL, node);
>  		if (!zone_pcp(zone, cpu))

I don't think this code works.
The pcp is protected only by local_irq_save(), not by a spinlock. That assumes
each cpu has its own pcp, but this patch breaks that assumption:
now boot_pageset can be shared by multiple cpus.




> _
>
> Patches currently in -mm which might be from [email protected] are
>
> page_alloc-oops-when-setting-percpu_pagelist_fraction.patch
>



2009-06-17 14:01:00

by Dimitri Sivanich

Subject: Re: + page_alloc-oops-when-setting-percpu_pagelist_fraction.patch added to -mm tree

On Wed, Jun 17, 2009 at 09:21:27AM +0900, KOSAKI Motohiro wrote:
> (switch to lkml)
>
> Sorry for the late review.
>
> > mm/page_alloc.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff -puN mm/page_alloc.c~page_alloc-oops-when-setting-percpu_pagelist_fraction mm/page_alloc.c
> > --- a/mm/page_alloc.c~page_alloc-oops-when-setting-percpu_pagelist_fraction
> > +++ a/mm/page_alloc.c
> > @@ -2806,7 +2806,11 @@ static int __cpuinit process_zones(int c
> >
> >  	node_set_state(node, N_CPU);	/* this node has a cpu */
> >
> > -	for_each_populated_zone(zone) {
> > +	for_each_zone(zone) {
> > +		if (!populated_zone(zone)) {
> > +			zone_pcp(zone, cpu) = &boot_pageset[cpu];
> > +			continue;
> > +		}
> >  		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
> >  					 GFP_KERNEL, node);
> >  		if (!zone_pcp(zone, cpu))
>
> I don't think this code works.
> The pcp is protected only by local_irq_save(), not by a spinlock. That assumes
> each cpu has its own pcp, but this patch breaks that assumption:
> now boot_pageset can be shared by multiple cpus.
>

I don't quite understand what you mean.

Prior to the cpu going down, each unpopulated zone pointed to the boot_pageset (per_cpu_pageset) for its cpu (its array element), so things had already been set up this way. I could be missing something, but why would restoring this be a risk?
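
To make the sequence concrete, here is a small user-space model of it. This is a sketch, not kernel code: the names model_boot(), model_cpu_down(), model_cpu_up() and model_count_null_slots() are made up for illustration, a static array stands in for kmalloc_node(), and the model assumes the offline path clears the cpu's pageset pointer in every zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS  2
#define NR_ZONES 2	/* zone 0 is populated, zone 1 is not (like "Movable") */

struct pageset { int high; };

struct zone_model {
	bool populated;
	struct pageset *pageset[NR_CPUS];
};

static struct pageset boot_pageset[NR_CPUS];
/* Stand-in for pagesets that the real code would allocate dynamically. */
static struct pageset heap_pageset[NR_ZONES][NR_CPUS];
static struct zone_model zones[NR_ZONES] = {
	{ .populated = true },
	{ .populated = false },
};

/* Boot: every zone/cpu slot starts out at that cpu's boot pageset. */
static void model_boot(void)
{
	for (int z = 0; z < NR_ZONES; z++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			zones[z].pageset[cpu] = &boot_pageset[cpu];
}

/* Offline (assumption): the cpu's slot is cleared in every zone. */
static void model_cpu_down(int cpu)
{
	for (int z = 0; z < NR_ZONES; z++)
		zones[z].pageset[cpu] = NULL;
}

/*
 * Online. With restore_boot == false, unpopulated zones are skipped and
 * keep their NULL slot. With restore_boot == true, they are pointed back
 * at boot_pageset[cpu], which is what the patch proposes.
 */
static void model_cpu_up(int cpu, bool restore_boot)
{
	for (int z = 0; z < NR_ZONES; z++) {
		if (!zones[z].populated) {
			if (restore_boot)
				zones[z].pageset[cpu] = &boot_pageset[cpu];
			continue;
		}
		zones[z].pageset[cpu] = &heap_pageset[z][cpu];
	}
}

/* Number of zone/cpu slots that a walk over all zones would find NULL. */
static int model_count_null_slots(void)
{
	int n = 0;

	for (int z = 0; z < NR_ZONES; z++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (zones[z].pageset[cpu] == NULL)
				n++;
	return n;
}
```

In this model, a down/up cycle of cpu 1 without the restore leaves one NULL slot (the unpopulated zone), which is the slot a handler walking every zone and every online cpu would dereference. With the restore, the slot holds &boot_pageset[1] again, the same value it had before the cpu went down.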

2009-06-17 17:34:56

by Christoph Lameter

Subject: Re: + page_alloc-oops-when-setting-percpu_pagelist_fraction.patch added to -mm tree

On Wed, 17 Jun 2009, Dimitri Sivanich wrote:

> > The pcp is protected only by local_irq_save(), not by a spinlock. That assumes
> > each cpu has its own pcp, but this patch breaks that assumption:
> > now boot_pageset can be shared by multiple cpus.
> >
>
> I don't quite understand what you mean.
>
> Prior to the cpu going down, each unpopulated zone pointed to the boot_pageset (per_cpu_pageset) for its cpu (its array element), so things had already been set up this way. I could be missing something, but why would restoring this be a risk?

The boot_pageset is supposed to be per cpu, and this patch preserves that.

However, all zones for a cpu share just a single boot pageset. Maybe that
was what threw off Kosaki?
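
The distinction can be checked with a small user-space sketch. It is only a model: zone_pcp here is a plain pointer table rather than the kernel macro, and point_all_at_boot(), shared_across_cpus() and shared_across_zones() are hypothetical helpers. It shows that indexing boot_pageset by cpu never makes one pageset reachable from two cpus, which is what protection by disabling local interrupts relies on, while several zones can still point at the same pageset for one cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS  4
#define NR_ZONES 3

struct pageset { int count; };

static struct pageset boot_pageset[NR_CPUS];
/* Model of the per-zone, per-cpu pageset pointers. */
static struct pageset *zone_pcp[NR_ZONES][NR_CPUS];

/* Point every zone's slot for a cpu at that cpu's own boot pageset. */
static void point_all_at_boot(void)
{
	for (int z = 0; z < NR_ZONES; z++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			zone_pcp[z][cpu] = &boot_pageset[cpu];
}

/*
 * True if some pageset is reachable from two different cpus. That would
 * break a scheme that only disables interrupts on the local cpu.
 */
static bool shared_across_cpus(void)
{
	for (int za = 0; za < NR_ZONES; za++)
		for (int ca = 0; ca < NR_CPUS; ca++)
			for (int zb = 0; zb < NR_ZONES; zb++)
				for (int cb = 0; cb < NR_CPUS; cb++)
					if (ca != cb && zone_pcp[za][ca] != NULL &&
					    zone_pcp[za][ca] == zone_pcp[zb][cb])
						return true;
	return false;
}

/* True if two different zones use the same pageset for the given cpu. */
static bool shared_across_zones(int cpu)
{
	for (int za = 0; za < NR_ZONES; za++)
		for (int zb = za + 1; zb < NR_ZONES; zb++)
			if (zone_pcp[za][cpu] != NULL &&
			    zone_pcp[za][cpu] == zone_pcp[zb][cpu])
				return true;
	return false;
}
```

After point_all_at_boot(), shared_across_cpus() is false and shared_across_zones(cpu) is true for every cpu. Only a deliberately wrong assignment such as zone_pcp[0][1] = &boot_pageset[0] makes shared_across_cpus() return true.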