2005-03-05 04:35:48

by Andrew Morton

[permalink] [raw]
Subject: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix



If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a
divide-by-zero.

Prevent that, and fiddle with some whitespace too.

Signed-off-by: Andrew Morton <[email protected]>
---

25-akpm/mm/page_alloc.c | 21 +++++++++++++++------
1 files changed, 15 insertions(+), 6 deletions(-)

diff -puN mm/page_alloc.c~setup_per_zone_lowmem_reserve-oops-fix mm/page_alloc.c
--- 25/mm/page_alloc.c~setup_per_zone_lowmem_reserve-oops-fix 2005-03-04 13:16:10.000000000 -0800
+++ 25-akpm/mm/page_alloc.c 2005-03-04 13:16:10.000000000 -0800
@@ -37,13 +37,17 @@
#include <asm/tlbflush.h>
#include "internal.h"

-/* MCD - HACK: Find somewhere to initialize this EARLY, or make this initializer cleaner */
+/*
+ * MCD - HACK: Find somewhere to initialize this EARLY, or make this
+ * initializer cleaner
+ */
nodemask_t node_online_map = { { [0] = 1UL } };
nodemask_t node_possible_map = NODE_MASK_ALL;
struct pglist_data *pgdat_list;
unsigned long totalram_pages;
unsigned long totalhigh_pages;
long nr_swap_pages;
+
/*
* results with 256, 32 in the lowmem_reserve sysctl:
* 1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -1924,15 +1928,20 @@ static void setup_per_zone_lowmem_reserv

for_each_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone * zone = pgdat->node_zones + j;
+ struct zone *zone = pgdat->node_zones + j;
unsigned long present_pages = zone->present_pages;

zone->lowmem_reserve[j] = 0;

for (idx = j-1; idx >= 0; idx--) {
- struct zone * lower_zone = pgdat->node_zones + idx;
+ struct zone *lower_zone;
+
+ if (sysctl_lowmem_reserve_ratio[idx] < 1)
+ sysctl_lowmem_reserve_ratio[idx] = 1;

- lower_zone->lowmem_reserve[j] = present_pages / sysctl_lowmem_reserve_ratio[idx];
+ lower_zone = pgdat->node_zones + idx;
+ lower_zone->lowmem_reserve[j] = present_pages /
+ sysctl_lowmem_reserve_ratio[idx];
present_pages += lower_zone->present_pages;
}
}
@@ -2039,7 +2048,7 @@ module_init(init_per_zone_pages_min)
* changes.
*/
int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
- struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
+ struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
{
proc_dointvec(table, write, file, buffer, length, ppos);
setup_per_zone_pages_min();
@@ -2056,7 +2065,7 @@ int min_free_kbytes_sysctl_handler(ctl_t
* if in function of the boot time zone sizes.
*/
int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
- struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
+ struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
{
proc_dointvec_minmax(table, write, file, buffer, length, ppos);
setup_per_zone_lowmem_reserve();
_


2005-03-07 06:10:38

by Greg KH

[permalink] [raw]
Subject: Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

On Fri, Mar 04, 2005 at 01:16:55PM -0800, [email protected] wrote:
>
>
> If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a
> divide-by-zero.
>
> Prevent that, and fiddle with some whitespace too.

Due to the whitespace fiddling, I'd say no to this patch, based on the
"criteria".

thanks,

greg k-h

2005-03-07 08:10:15

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

[email protected] wrote:
> If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a
> divide-by-zero.
>
> Prevent that, and fiddle with some whitespace too.
>
> Signed-off-by: Andrew Morton <[email protected]>

Can we instead have a patch that makes the value zero turn off the
lowmem reserve entirely if it is set to zero?

Just now I was just testing, and found no easy way to do this other
than to make the value large enough that the reserve is insignificant.

So the loop would be something like:

for (idx = j-1; idx >= 0; idx--) {
struct zone *lower_zone;
lower_zone = pgdat->node_zones + idx;

lower_zone->lowmem_reserve[j] = 0;
if (sysctl_lowmem_reserve_ratio[idx] > 0)
lower_zone->lowmem_reserve[j] = present_pages /
sysctl_lowmem_reserve_ratio[idx];

present_pages += lower_zone->present_pages;
}


2005-03-07 08:21:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

Nick Piggin <[email protected]> wrote:
>
> [email protected] wrote:
> > If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a
> > divide-by-zero.
> >
> > Prevent that, and fiddle with some whitespace too.
> >
> > Signed-off-by: Andrew Morton <[email protected]>
>
> Can we instead have a patch that makes the value zero turn off the
> lowmem reserve entirely if it is set to zero?

That would make sense, I guess.

> Just now I was just testing, and found no easy way to do this other
> than to make the value large enough that the reserve is insignificant.

Me too. I use 1000 to get my 50MB of pagecache back.

I haven't thought about it yet, but there must be some way to avoid leaving
huge amounts of lowmem free. It should be OK to allow lowmem to be fully
used, as long as there's sufficent reclaimable stuff in there - slab,
blockdev pagecache, etc. (Assuming nothing sane mmaps blockdevs. INND
does). Dunno....

2005-03-07 13:01:06

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

On Mon, Mar 07, 2005 at 07:10:05PM +1100, Nick Piggin wrote:
> [email protected] wrote:
> >If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a
> >divide-by-zero.
> >
> >Prevent that, and fiddle with some whitespace too.
> >
> >Signed-off-by: Andrew Morton <[email protected]>
>
> Can we instead have a patch that makes the value zero turn off the
> lowmem reserve entirely if it is set to zero?
>
> Just now I was just testing, and found no easy way to do this other
> than to make the value large enough that the reserve is insignificant.
>
> So the loop would be something like:
>
> for (idx = j-1; idx >= 0; idx--) {
> struct zone *lower_zone;
> lower_zone = pgdat->node_zones + idx;
>
> lower_zone->lowmem_reserve[j] = 0;
> if (sysctl_lowmem_reserve_ratio[idx] > 0)
> lower_zone->lowmem_reserve[j] =
> present_pages /
> sysctl_lowmem_reserve_ratio[idx];
>
> present_pages += lower_zone->present_pages;
> }

Looks good to me. I noticed the divide by zero myself once, and I
also considered changing it so that zero disables it. Could you send a
full patch to Andrew? Thanks.

2005-03-07 13:02:34

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

On Mon, Mar 07, 2005 at 12:20:48AM -0800, Andrew Morton wrote:
> I haven't thought about it yet, but there must be some way to avoid leaving
> huge amounts of lowmem free. It should be OK to allow lowmem to be fully
> used, as long as there's sufficent reclaimable stuff in there - slab,
> blockdev pagecache, etc. (Assuming nothing sane mmaps blockdevs. INND
> does). Dunno....

Then mlock will have to unmap and migrate the cache, it's just much more
complicated, but it's certainly doable.