2007-08-01 19:05:27

by Lee Schermerhorn

Subject: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

Fix missing numa_zonelist_order sysctl config

Against 2.6.23-rc1-mm1.

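Found this while testing Mel Gorman's patch for the issue with
"policy_zone" and ZONE_MOVABLE.

A misplaced #endif hides the numa_zonelist_order sysctl when
CONFIG_SECURITY is not set: the #endif closing the preceding
CONFIG_SECURITY block sits after the CONFIG_NUMA entry rather
than before it.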

[But, maybe reordering the zonelists is not such a good idea
when ZONE_MOVABLE is populated?]

Signed-off-by: Lee Schermerhorn <[email protected]>

kernel/sysctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: Linux/kernel/sysctl.c
===================================================================
--- Linux.orig/kernel/sysctl.c 2007-07-25 09:29:50.000000000 -0400
+++ Linux/kernel/sysctl.c 2007-08-01 13:29:18.000000000 -0400
@@ -1068,6 +1068,7 @@ static ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &proc_doulongvec_minmax,
},
+#endif
#ifdef CONFIG_NUMA
{
.ctl_name = CTL_UNNUMBERED,
@@ -1079,7 +1080,6 @@ static ctl_table vm_table[] = {
.strategy = &sysctl_string,
},
#endif
-#endif
#if defined(CONFIG_X86_32) || \
(defined(CONFIG_SUPERH) && defined(CONFIG_VSYSCALL))
{
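The effect of the misplaced #endif can be reproduced with a small user-space sketch. It mirrors the two layouts from the diff using a simplified, hypothetical `struct demo_entry` in place of the kernel's `ctl_table`. `security_only_entry` is a made-up placeholder for the CONFIG_SECURITY entry, and CONFIG_SECURITY is left undefined to model the !SECURITY build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ctl_table: only the procname field is modelled. */
struct demo_entry {
	const char *procname;
};

#define CONFIG_NUMA 1
/* CONFIG_SECURITY is deliberately NOT defined: the !SECURITY build. */

/* Before the patch: the CONFIG_SECURITY #endif sits below the NUMA entry. */
static const struct demo_entry buggy_table[] = {
#ifdef CONFIG_SECURITY
	{ "security_only_entry" },	/* hypothetical placeholder */
#ifdef CONFIG_NUMA
	{ "numa_zonelist_order" },
#endif
#endif
	{ NULL }	/* terminator */
};

/* After the patch: each entry is guarded only by its own config option. */
static const struct demo_entry fixed_table[] = {
#ifdef CONFIG_SECURITY
	{ "security_only_entry" },	/* hypothetical placeholder */
#endif
#ifdef CONFIG_NUMA
	{ "numa_zonelist_order" },
#endif
	{ NULL }	/* terminator */
};

/* Return 1 if the NULL-terminated table contains an entry called name. */
static int has_entry(const struct demo_entry *table, const char *name)
{
	assert(table != NULL && name != NULL);
	for (; table->procname != NULL; table++)
		if (strcmp(table->procname, name) == 0)
			return 1;
	return 0;
}
```

With CONFIG_SECURITY undefined, `has_entry(buggy_table, "numa_zonelist_order")` returns 0 while the same lookup on `fixed_table` returns 1; defining CONFIG_SECURITY makes both tables contain both entries.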



2007-08-02 00:43:31

by Kamezawa Hiroyuki

Subject: Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

On Wed, 01 Aug 2007 15:02:51 -0400
Lee Schermerhorn <[email protected]> wrote:
> [But, maybe reordering the zonelists is not such a good idea
> when ZONE_MOVABLE is populated?]
>

It's case-by-case, I think. In the zone-order case with ZONE_MOVABLE,
a user's page cache will not use ZONE_NORMAL until ZONE_MOVABLE on all nodes
is exhausted. This is expected behavior, I think.

I think the real problem is the scheme for "how to set the ZONE_MOVABLE size
to an appropriate value for the system". This needs more study and
documentation (though it may depend on the system configuration to some extent).

Thanks,
-Kame
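A toy model of the two zonelist orderings may make Kame's point concrete. This is a sketch under simplifying assumptions, not kernel code: two nodes, only ZONE_NORMAL and ZONE_MOVABLE, no watermarks, fallback nodes visited by node number rather than by distance, and every request treated as a movable (e.g. page cache) allocation that may use any zone. The `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: two zone types only (real systems also have DMA/DMA32). */
enum demo_zone_type { DEMO_NORMAL, DEMO_MOVABLE, DEMO_NR_ZONES };

#define DEMO_NR_NODES 2
#define DEMO_ZONELIST_LEN (DEMO_NR_NODES * DEMO_NR_ZONES)

struct demo_zone {
	int node;
	enum demo_zone_type type;
	long free_pages;
};

/* zones[node][type]; every zone starts with the same number of free pages */
static struct demo_zone zones[DEMO_NR_NODES][DEMO_NR_ZONES];

static void init_zones(long free_per_zone)
{
	for (int n = 0; n < DEMO_NR_NODES; n++)
		for (int t = 0; t < DEMO_NR_ZONES; t++) {
			zones[n][t].node = n;
			zones[n][t].type = (enum demo_zone_type)t;
			zones[n][t].free_pages = free_per_zone;
		}
}

/* Node order: every zone of one node, highest type first, then the next node. */
static void build_node_order(int local, struct demo_zone *list[])
{
	int i = 0;

	for (int k = 0; k < DEMO_NR_NODES; k++)
		for (int t = DEMO_NR_ZONES - 1; t >= 0; t--)
			list[i++] = &zones[(local + k) % DEMO_NR_NODES][t];
}

/* Zone order: one zone type on every node before falling back to a lower type. */
static void build_zone_order(int local, struct demo_zone *list[])
{
	int i = 0;

	for (int t = DEMO_NR_ZONES - 1; t >= 0; t--)
		for (int k = 0; k < DEMO_NR_NODES; k++)
			list[i++] = &zones[(local + k) % DEMO_NR_NODES][t];
}

/* Take one page from the first zone in the list that still has free pages. */
static struct demo_zone *demo_alloc_page(struct demo_zone *list[])
{
	assert(list != NULL);
	for (int i = 0; i < DEMO_ZONELIST_LEN; i++)
		if (list[i]->free_pages > 0) {
			list[i]->free_pages--;
			return list[i];
		}
	return NULL;	/* every zone in the list is exhausted */
}
```

With two free pages per zone and node 0 as the local node, the third allocation lands in node 1's ZONE_MOVABLE under zone order but in node 0's ZONE_NORMAL under node order, which is the trade-off discussed in this thread.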

2007-08-02 15:19:19

by Lee Schermerhorn

Subject: Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

On Thu, 2007-08-02 at 09:44 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 01 Aug 2007 15:02:51 -0400
> Lee Schermerhorn <[email protected]> wrote:
> > [But, maybe reordering the zonelists is not such a good idea
> > when ZONE_MOVABLE is populated?]
> >
>
> It's case-by-case, I think. In the zone-order case with ZONE_MOVABLE,
> a user's page cache will not use ZONE_NORMAL until ZONE_MOVABLE on all nodes
> is exhausted. This is expected behavior, I think.
>
> I think the real problem is the scheme for "how to set the ZONE_MOVABLE size
> to an appropriate value for the system". This needs more study and
> documentation (though it may depend on the system configuration to some extent).

Yes. Having thought about it a bit more, maybe zone order IS what we
want if we desire the remainder of the zone from which it was taken
[ZONE_MOVABLE-1] to be reserved for non-movable kernel use as long as
possible--similar to the DMA zone. I had made the non-movable zone very
large for testing, so that I could create a segment that used all of the
movable zones on all the nodes and then dip into the non-movable/normal
zone. If I used a more reasonable [much smaller] amount of kernelcore,
the interleave would have worked as "expected".

Of course, I don't have any idea what a "reasonable amount" is.
Guess I could look at non-movable zone memory usage in a system at
typical or peak load to get an idea. Anyone have any data in this
regard?

Lee
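Lee's observation about kernelcore can be sketched numerically. The model below assumes equal-sized nodes, kernelcore spread evenly across them with the remainder of each node becoming ZONE_MOVABLE, and a segment interleaved page by page across the nodes. The function names are hypothetical, and the real zone-sizing code also has to cope with unevenly sized nodes, which this ignores.

```c
#include <assert.h>

/*
 * Simplified model (assumption): nr_nodes equal-sized nodes, kernelcore
 * split evenly between them, and the rest of each node becoming
 * ZONE_MOVABLE. All sizes are in pages.
 */
static long movable_pages_per_node(long node_pages, long kernelcore, int nr_nodes)
{
	long core_per_node;

	assert(nr_nodes > 0 && node_pages >= 0 && kernelcore >= 0);
	core_per_node = kernelcore / nr_nodes;
	return core_per_node >= node_pages ? 0 : node_pages - core_per_node;
}

/*
 * Interleave a segment of seg_pages round-robin over the nodes and return
 * how many of its pages do not fit in ZONE_MOVABLE, assuming the movable
 * zones start out empty. Cross-node fallback is not modelled; because the
 * per-node shares differ by at most one page and the nodes are equal,
 * it would not change the total.
 */
static long pages_spilled_to_normal(long seg_pages, long node_pages,
				    long kernelcore, int nr_nodes)
{
	long movable = movable_pages_per_node(node_pages, kernelcore, nr_nodes);
	long spilled = 0;

	assert(seg_pages >= 0);
	for (int n = 0; n < nr_nodes; n++) {
		/* node n receives pages n, n + nr_nodes, n + 2 * nr_nodes, ... */
		long share = seg_pages / nr_nodes + (n < seg_pages % nr_nodes ? 1 : 0);

		if (share > movable)
			spilled += share - movable;
	}
	return spilled;
}
```

For example, with four nodes of 1000 pages each, a kernelcore of 3200 pages leaves 200 movable pages per node, so a 1000-page interleaved segment spills 200 pages into the non-movable zones; with a kernelcore of 400 pages nothing spills, matching the "expected" interleave Lee describes.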


2007-08-02 16:14:51

by mel

Subject: Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

On (02/08/07 09:44), KAMEZAWA Hiroyuki didst pronounce:
> On Wed, 01 Aug 2007 15:02:51 -0400
> Lee Schermerhorn <[email protected]> wrote:
> > [But, maybe reordering the zonelists is not such a good idea
> > when ZONE_MOVABLE is populated?]
> >
>
> It's case-by-case, I think. In the zone-order case with ZONE_MOVABLE,
> a user's page cache will not use ZONE_NORMAL until ZONE_MOVABLE on all nodes
> is exhausted. This is expected behavior, I think.
>

This is expected behaviour. I see no reason for lower zones to be used
for allocations that can be satisfied from a higher zone with free memory.

> I think the real problem is the scheme for "how to set the ZONE_MOVABLE size
> to an appropriate value for the system". This needs more study and
> documentation (though it may depend on the system configuration to some extent).
>

It depends on the system configuration and the workload requirements.
Right now, there isn't exact information available on what size the zone
should be. It'll need to be studied over a period of time.

--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab

2007-08-02 16:18:39

by mel

Subject: Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

On (02/08/07 17:14), Mel Gorman didst pronounce:
> On (02/08/07 09:44), KAMEZAWA Hiroyuki didst pronounce:
> > On Wed, 01 Aug 2007 15:02:51 -0400
> > Lee Schermerhorn <[email protected]> wrote:
> > > [But, maybe reordering the zonelists is not such a good idea
> > > when ZONE_MOVABLE is populated?]
> > >
> >
> > It's case-by-case, I think. In the zone-order case with ZONE_MOVABLE,
> > a user's page cache will not use ZONE_NORMAL until ZONE_MOVABLE on all nodes
> > is exhausted. This is expected behavior, I think.
> >
>
> This is expected behaviour. I see no reason for lower zones to be used
> for allocations that can be satisfied from a higher zone with free memory.
>

Bah. I should have thought this through better.

If you are using ZONE_MOVABLE and the zonelist is in zone order, allocations
will use memory from remote nodes even when suitable local memory is available.
I don't have a quick answer on how this should be handled. The answer may be
something like:

o When ordering zonelists by nodes, order them so that the movable zone
is paired with the next highest zone in a zonelist before moving to the
next node.

> > I think the real problem is the scheme for "how to set the ZONE_MOVABLE size
> > to an appropriate value for the system". This needs more study and
> > documentation (though it may depend on the system configuration to some extent).
> >
>
> It depends on the system configuration and the workload requirements.
> Right now, there isn't exact information available on what size the zone
> should be. It'll need to be studied over a period of time.

--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab

2007-08-03 00:26:43

by Kamezawa Hiroyuki

Subject: Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

On Thu, 02 Aug 2007 11:07:38 -0400
Lee Schermerhorn <[email protected]> wrote:

> Of course, I don't have any idea what a "reasonable amount" is.
> Guess I could look at non-movable zone memory usage in a system at
> typical or peak load to get an idea. Anyone have any data in this
> regard?
>
I'm sorry, but I have no data or ideas.
ZONE_MOVABLE is too new to have been used under business workloads...

This is just my feeling...
Considering i686, which divides memory into NORMAL and HIGHMEM, 4GB to 8GB
servers look stable under various workloads in my experience.

So, at a minimum, 12.5% to 25% of "Total Memory - Hugepages" should be
in ZONE_NORMAL (on i686, ZONE_NORMAL is a little under 1GB, i.e. roughly 1/8
of an 8GB machine and 1/4 of a 4GB one). But this is from 32-bit/SMP
experience :(

Thanks,
-Kame
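Kame's rule of thumb turns into a simple calculation. The sketch below encodes only that heuristic, which Kame himself bases on 32-bit/SMP experience; it is not a kernel rule, and the function and type names are hypothetical. Sizes are in bytes.

```c
#include <assert.h>

/* Suggested ZONE_NORMAL size range in bytes (hypothetical helper type). */
struct demo_range {
	unsigned long long min_bytes;
	unsigned long long max_bytes;
};

/*
 * Heuristic from this thread (not a kernel rule): keep 12.5% to 25%,
 * i.e. 1/8 to 1/4, of (total memory - hugepage memory) in ZONE_NORMAL.
 */
static struct demo_range suggested_normal_range(unsigned long long total_bytes,
						unsigned long long hugepage_bytes)
{
	struct demo_range r;
	unsigned long long usable;

	assert(hugepage_bytes <= total_bytes);
	usable = total_bytes - hugepage_bytes;
	r.min_bytes = usable / 8;	/* 12.5% */
	r.max_bytes = usable / 4;	/* 25% */
	return r;
}
```

For instance, a 16GB machine with 8GB reserved for hugepages would get a suggested ZONE_NORMAL range of 1GB to 2GB, which could serve as a starting point for kernelcore= when experimenting.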