2008-03-08 01:26:19

by David Rientjes

[permalink] [raw]
Subject: [patch -mm 1/2] mempolicy: disallow static or relative flags for local preferred mode

MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
MPOL_PREFERRED policies that were created with an empty nodemask (for
purely local allocations). They'll never be invalidated because the
allowed mems of a task changes or need to be rebound relative to a
cpuset's placement.

Also fixes a bug identified by Lee Schermerhorn that disallowed empty
nodemasks to be passed to MPOL_PREFERRED to specify local allocations.

Cc: Paul Jackson <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
Documentation/vm/numa_memory_policy.txt | 16 ++++++++++++++--
mm/mempolicy.c | 17 ++++++++++++-----
2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -210,6 +210,12 @@ Components of Memory Policies
local allocation for a specific range of addresses--i.e. for
VMA policies.

+ It is possible for the user to specify that local allocation is
+ always preferred by passing an empty nodemask with this mode.
+ If an empty nodemask is passed, the policy cannot use the
+ MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described
+ below.
+
MPOL_INTERLEAVED: This mode specifies that page allocations be
interleaved, on a page granularity, across the nodes specified in
the policy. This mode also behaves slightly differently, based on
@@ -259,7 +265,10 @@ Components of Memory Policies
occurs over that node. If no nodes from the user's nodemask are
now allowed, the Default behavior is used.

- MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES.
+ MPOL_F_STATIC_NODES cannot be combined with the
+ MPOL_F_RELATIVE_NODES flag. It also cannot be used for
+ MPOL_PREFERRED policies that were created with an empty nodemask
+ (local allocation).

MPOL_F_RELATIVE_NODES: This flag specifies that the nodemask passed
by the user will be mapped relative to the set of the task or VMA's
@@ -306,7 +315,10 @@ Components of Memory Policies
set of memory nodes allowed by the task's cpuset, as that may
change over time.

- MPOL_F_RELATIVE_NODES cannot be used with MPOL_F_STATIC_NODES.
+ MPOL_F_RELATIVE_NODES cannot be combined with the
+ MPOL_F_STATIC_NODES flag. It also cannot be used for
+ MPOL_PREFERRED policies that were created with an empty nodemask
+ (local allocation).

MEMORY POLICY APIs

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -151,9 +151,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)

static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
{
- if (nodes_empty(*nodes))
- return -EINVAL;
- pol->v.preferred_node = first_node(*nodes);
+ pol->v.preferred_node = !nodes_empty(*nodes) ? first_node(*nodes) : -1;
return 0;
}

@@ -176,7 +174,16 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
pr_debug("setting mode %d flags %d nodes[0] %lx\n",
mode, flags, nodes ? nodes_addr(*nodes)[0] : -1);

- if (nodes && nodes_empty(*nodes) && mode != MPOL_PREFERRED)
+ /*
+ * MPOL_PREFERRED cannot be used with MPOL_F_STATIC_NODES or
+ * MPOL_F_RELATIVE_NODES if the nodemask is empty (local allocation).
+ * All other modes require a valid pointer to a non-empty nodemask.
+ */
+ if (mode == MPOL_PREFERRED) {
+ if (nodes_empty(*nodes) && ((flags & MPOL_F_STATIC_NODES) ||
+ (flags & MPOL_F_RELATIVE_NODES)))
+ return ERR_PTR(-EINVAL);
+ } else if (nodes && nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
if (mode == MPOL_DEFAULT)
return NULL;
@@ -250,7 +257,7 @@ static void mpol_rebind_preferred(struct mempolicy *pol,
} else if (pol->flags & MPOL_F_RELATIVE_NODES) {
mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
pol->v.preferred_node = first_node(tmp);
- } else {
+ } else if (pol->v.preferred_node != -1) {
pol->v.preferred_node = node_remap(pol->v.preferred_node,
pol->w.cpuset_mems_allowed,
*nodes);


2008-03-08 01:25:55

by David Rientjes

[permalink] [raw]
Subject: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

default_policy is defined in mm/mempolicy.c as the default system policy:

struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1),
.policy = MPOL_DEFAULT,
};

So instead of checking for comparisons against a mempolicy's mode to
MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
we should use the mode that is defined in this struct.

Cc: Paul Jackson <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
mm/mempolicy.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -185,7 +185,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
return ERR_PTR(-EINVAL);
} else if (nodes && nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
- if (mode == MPOL_DEFAULT)
+ if (mode == default_policy.policy)
return NULL;
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
if (!policy)
@@ -1232,7 +1232,7 @@ static struct mempolicy * get_vma_policy(struct task_struct *task,
pol = vma->vm_ops->get_policy(vma, addr);
shared_pol = 1; /* if pol non-NULL, add ref below */
} else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ vma->vm_policy->policy != default_policy.policy)
pol = vma->vm_policy;
}
if (!pol)
@@ -1588,7 +1588,7 @@ void __mpol_free(struct mempolicy *p)
{
if (!atomic_dec_and_test(&p->refcnt))
return;
- p->policy = MPOL_DEFAULT;
+ p->policy = default_policy.policy;
kmem_cache_free(policy_cache, p);
}

@@ -1752,10 +1752,10 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy,
info->root = RB_ROOT;
spin_lock_init(&info->lock);

- if (policy != MPOL_DEFAULT) {
+ if (policy != default_policy.policy) {
struct mempolicy *newpol;

- /* Falls back to MPOL_DEFAULT on any error */
+ /* Falls back to default_policy on any error */
newpol = mpol_new(policy, flags, policy_nodes);
if (!IS_ERR(newpol)) {
/* Create pseudo-vma that contains just the policy */
@@ -1860,7 +1860,7 @@ void __init numa_policy_init(void)
/* Reset policy of current process to default */
void numa_default_policy(void)
{
- do_set_mempolicy(MPOL_DEFAULT, 0, NULL);
+ do_set_mempolicy(default_policy.policy, 0, NULL);
}

/*

2008-03-08 01:29:03

by Paul Jackson

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

David wrote:
> So instead of checking for comparisons against a mempolicy's mode to
> MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
> we should use the mode that is defined in this struct.

Is this just a stylistic choice? That is, is the same machine
code produced either way?

If that's the case, could you briefly explain why you prefer
one style (default_policy.policy) over the other (MPOL_DEFAULT)?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-03-08 01:33:34

by David Rientjes

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

On Fri, 7 Mar 2008, Paul Jackson wrote:

> > So instead of checking for comparisons against a mempolicy's mode to
> > MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
> > we should use the mode that is defined in this struct.
>
> Is this just a stylistic choice? That is, is the same machine
> code produced either way?
>
> If that's the case, could you briefly explain why you prefer
> one style (default_policy.policy) over the other (MPOL_DEFAULT)?
>

Very fast response!

Yes, the same code is generated since default_policy.policy is statically
declared as MPOL_DEFAULT.

I chose this because checks in mpol_new() to return NULL if the mode is
MPOL_DEFAULT is purely based on the fact that, as the default system
policy, policy task or VMA pointers are set to &default_policy.

2008-03-08 01:35:54

by Paul Jackson

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

David wrote:
> I chose this because checks in mpol_new() to return NULL if the mode is
> MPOL_DEFAULT is purely based on the fact that, as the default system
> policy, policy task or VMA pointers are set to &default_policy.

You're just as fast!

Reasonable answer - thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-03-08 02:00:15

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

And where is the patch to set the system (or cpuset....) default policy?
;-)

Note that the system default policy changes during bootup. Could that be
done with the default policy? We had some issues a while back with
processes spawned at boot inheriting the interleave policy. If they could
refer to default_policy instead then a change of default_policy would
switch all spawned processes to use MPOL_DEFAULT?

2008-03-08 02:15:23

by David Rientjes

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

On Fri, 7 Mar 2008, Christoph Lameter wrote:

> And where is the patch to set the system (or cpuset....) default policy?
> ;-)
>

The system default policy is inherently defined in get_vma_policy() and
alloc_pages_current().

> Note that the system default policy changes during bootup. Could that be
> done with the default policy? We had some issues a while back with
> processes spawned at boot inheriting the interleave policy. If they could
> refer to default_policy instead then a change of default_policy would
> switch all spawned processes to use MPOL_DEFAULT?
>

With the patch, get_vma_policy() would return the VMA's policy if it has
its own get_policy() function that returns a valid policy or differs from
the default, otherwise it would refer to whatever default_policy is
statically allocated to represent.

For task mempolicies, alloc_pages_current() has always used default_policy
if current->mempolicy is NULL. So you can change default_policy.policy to
be anything you want (and its v.nodes or v.preferred_node members) and
that will be the effected policy for any task that doesn't have a valid
->mempolicy.

2008-03-08 19:13:18

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

On Fri, 2008-03-07 at 17:33 -0800, David Rientjes wrote:
> On Fri, 7 Mar 2008, Paul Jackson wrote:
>
> > > So instead of checking for comparisons against a mempolicy's mode to
> > > MPOL_DEFAULT or falling back stricly to MPOL_DEFAULT throughout the code,
> > > we should use the mode that is defined in this struct.
> >
> > Is this just a stylistic choice? That is, is the same machine
> > code produced either way?
> >
> > If that's the case, could you briefly explain why you prefer
> > one style (default_policy.policy) over the other (MPOL_DEFAULT)?
> >
>
> Very fast response!
>
> Yes, the same code is generated since default_policy.policy is statically
> declared as MPOL_DEFAULT.
>
> I chose this because checks in mpol_new() to return NULL if the mode is
> MPOL_DEFAULT is purely based on the fact that, as the default system
> policy, policy task or VMA pointers are set to &default_policy.

Do I understand that you're saying that when mode is specified as
MPOL_DEFAULT, the current task policy [in the case of set_mempolicy()]
or the indicated vma policy [in the case of mbind()] is set to
&default_policy? If that's not what you're saying, please disregard the
rest of this message.

If that is what you're saying: This is not the case. When mode is
specified as MPOL_DEFAULT, mpol_new() returns NULL and
do_set_mempolicy() or do_mbind() [via it's helper functions] replaces
the existing task or vma policy with the NULL, freeing [releasing a
reference on] the existing policy, if any. This results in the target
policy--task or vma--"falling back" to the surrounding policy scope.
This seems to have been the behavior as far back as I have looked.

In fact, the only place that MPOL_DEFAULT occurs in the policy [mode]
member of struct mempolicy is in the system default_policy. [As
Christoph noted, this is not the case during system initialization.] In
this context, it means "local allocation". This requires us to have the
MPOL_DEFAULT case in the switches throughout mempolicy.c.

I have a patch queued up, waiting for things to settle down in
mempolicy.c, to replace the policy/mode in default_policy with
MPOL_PREFERRED with preferred_node = -1. Then, we can remove all of the
MPOL_DEFAULT cases out of the switches in the allocation paths and
"clean up" the documentation, including man pages. MPOL_DEFAULT becomes
simply an API mechanism to request fall back to the surrounding policy
scope which, to my mind, is what "default policy" means.

I don't have a problem with this patch in the context of the current
implementation, altho' it does seem a bit pointless to me, unless you
have something further that you're trying to to accomplish.

I'll move the aforementioned patch to the head of my queue and send it
out early this coming week for review atop whatever mempolicy patches
Andrew has added to the tree at that time.

Later,
Lee

2008-03-08 22:20:43

by David Rientjes

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

On Sat, 8 Mar 2008, Lee Schermerhorn wrote:

> I have a patch queued up, waiting for things to settle down in
> mempolicy.c, to replace the policy/mode in default_policy with
> MPOL_PREFERRED with preferred_node = -1. Then, we can remove all of the
> MPOL_DEFAULT cases out of the switches in the allocation paths and
> "clean up" the documentation, including man pages. MPOL_DEFAULT becomes
> simply an API mechanism to request fall back to the surrounding policy
> scope which, to my mind, is what "default policy" means.
>

Ok, I'll await your patch that switches default_policy.policy to
MPOL_PREFERRED.

Using MPOL_DEFAULT purely for falling back to the task or system-wide
policy, however, seems confusing. The semantics seem to indicate that
MPOL_DEFAULT represents the system-wide default policy without any
preferred node or set of nodes to bind or interleave. So if a VMA has a
policy of MPOL_DEFAULT then, to me, it seems like that indicates the
absence of a specific policy, not a mandate to fallback to the task
policy.

2008-03-08 23:19:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT


> Using MPOL_DEFAULT purely for falling back to the task or system-wide
> policy, however, seems confusing. The semantics seem to indicate that
> MPOL_DEFAULT represents the system-wide default policy without any
> preferred node or set of nodes to bind or interleave. So if a VMA has a
> policy of MPOL_DEFAULT then, to me, it seems like that indicates the
> absence of a specific policy, not a mandate to fallback to the task
> policy.

I designed MPOL_DEFAULT on vma originally to be a fallback to the task policy.

Absence of specific policy would be MPOL_PREFERRED with -1 node.

-Andi


2008-03-10 13:48:20

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: [patch -mm 2/2] mempolicy: use default_policy mode instead of MPOL_DEFAULT

On Sun, 2008-03-09 at 00:19 +0100, Andi Kleen wrote:
> > Using MPOL_DEFAULT purely for falling back to the task or system-wide
> > policy, however, seems confusing. The semantics seem to indicate that
> > MPOL_DEFAULT represents the system-wide default policy without any
> > preferred node or set of nodes to bind or interleave. So if a VMA has a
> > policy of MPOL_DEFAULT then, to me, it seems like that indicates the
> > absence of a specific policy, not a mandate to fallback to the task
> > policy.
>
> I designed MPOL_DEFAULT on vma originally to be a fallback to the task policy.
>
> Absence of specific policy would be MPOL_PREFERRED with -1 node.

Not sure what you mean here, Andi.

"MPOL_PREFERRED with -1 node" == "local allocation", right?

Whereas, in the task mempolicy or vma policy or shared policy, the lack
of a specific policy--i.e., a null mempolicy pointer, or no policy at
that offset in a shared policy rbtree--means fall back to surrounding
context, right? As far back as I've looked, mempolicy.c implemented
MPOL_DEFAULT, passed to set_mempolicy() or mbind(), by deleting the
target policy, resulting in "fall back".

The only place that MPOL_DEFAULT actually occurs in a struct mempolicy
is in the system default policy. I think we can simplify the code and
documentation--not have to explain the context dependent meaning of
MPOL_DEFAULT--by making it simply an API request to remove the target
policy and establish "default behavior" for that context--i.e.,
fallback.

Lee
>
> -Andi
>
>
>

2008-03-10 19:10:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch -mm 1/2] mempolicy: disallow static or relative flags for local preferred mode

On Fri, 7 Mar 2008 17:24:15 -0800 (PST)
David Rientjes <[email protected]> wrote:

> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
> MPOL_PREFERRED policies that were created with an empty nodemask (for
> purely local allocations). They'll never be invalidated because the
> allowed mems of a task changes or need to be rebound relative to a
> cpuset's placement.
>
> Also fixes a bug identified by Lee Schermerhorn that disallowed empty
> nodemasks to be passed to MPOL_PREFERRED to specify local allocations.

I get a significant-looking reject from this. Can you please redo and
resend?


I put my current rollup (against -rc5) at
http://userweb.kernel.org/~akpm/dr.gz and the broken-out tree is, as always
at http://userweb.kernel.org/~akpm/mmotm/

It would be better for you to get set up for using mmotm - it is my usual
way of publishing the -mm queue between releases.