2012-10-31 09:18:36

by Wen Congyang

[permalink] [raw]
Subject: [PART6 Patch] memory-hotplug: bugfix for movable node

This patch is part6 of the following patchset:
https://lkml.org/lkml/2012/10/29/319

The patchset is based on Linus's tree with these three patches already applied:
https://lkml.org/lkml/2012/10/24/151
https://lkml.org/lkml/2012/10/26/150

Part1 is here:
https://lkml.org/lkml/2012/10/31/30

Part2 is here:
http://marc.info/?l=linux-kernel&m=135166705909544&w=2

Part3 is here:
http://marc.info/?l=linux-kernel&m=135167050510527&w=2

Part4 is here:
http://marc.info/?l=linux-kernel&m=135167344211401&w=2

Part5 is here:
http://marc.info/?l=linux-kernel&m=135167497312063&w=2

You can apply this patch without the other parts.

Issues):

mempolicy(M_BIND) don't act well when the nodemask has movable nodes only,
the kernel allocation will fail and the task can't create new task or other
kernel objects.

So we change the strategy/policy
when the bound nodemask has movable node(s) only, we only
apply mempolicy for userspace allocation, don't apply it
for kernel allocation.

CPUSET also has the same problem, but the code spread in page_alloc.c,
and we doesn't fix it yet, we can/will change allocation strategy to one of
these 3 strategies:
1) the same strategy as mempolicy
2) change cpuset, make nodemask always has at least a normal node
3) split nodemask: nodemask_user and nodemask_kernel

This patchset only fixes issue1.

Lai Jiangshan (1):
mempolicy: fix is_valid_nodemask()

mm/mempolicy.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

--
1.8.0


2012-10-31 09:18:46

by Wen Congyang

[permalink] [raw]
Subject: [PART6 Patch] mempolicy: fix is_valid_nodemask()

From: Lai Jiangshan <[email protected]>

is_valid_nodemask() is introduced by 19770b32. but it does not match
its comments, because it does not check the zone which > policy_zone.

Also in b377fd, this commits told us, if highest zone is ZONE_MOVABLE,
we should also apply memory policies to it. so ZONE_MOVABLE should be valid zone
for policies. is_valid_nodemask() need to be changed to match it.

Fix: check all zones, even its zoneid > policy_zone.
Use nodes_intersects() instead open code to check it.

Signed-off-by: Lai Jiangshan <[email protected]>
Reported-by: Wen Congyang <[email protected]>
---
mm/mempolicy.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..de5aa24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -140,19 +140,7 @@ static const struct mempolicy_operations {
/* Check that the nodemask contains at least one populated zone */
static int is_valid_nodemask(const nodemask_t *nodemask)
{
- int nd, k;
-
- for_each_node_mask(nd, *nodemask) {
- struct zone *z;
-
- for (k = 0; k <= policy_zone; k++) {
- z = &NODE_DATA(nd)->node_zones[k];
- if (z->present_pages > 0)
- return 1;
- }
- }
-
- return 0;
+ return nodes_intersects(*nodemask, node_states[N_MEMORY]);
}

static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
@@ -1572,6 +1560,26 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
return pol;
}

+static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
+{
+ enum zone_type dynamic_policy_zone = policy_zone;
+
+ BUG_ON(dynamic_policy_zone == ZONE_MOVABLE);
+
+ /*
+ * if policy->v.nodes has movable memory only,
+ * we apply policy when gfp_zone(gfp) = ZONE_MOVABLE only.
+ *
+ * policy->v.nodes is intersect with node_states[N_MEMORY].
+ * so if the following test faile, it implies
+ * policy->v.nodes has movable memory only.
+ */
+ if (!nodes_intersects(policy->v.nodes, node_states[N_HIGH_MEMORY]))
+ dynamic_policy_zone = ZONE_MOVABLE;
+
+ return zone >= dynamic_policy_zone;
+}
+
/*
* Return a nodemask representing a mempolicy for filtering nodes for
* page allocation
@@ -1580,7 +1588,7 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
{
/* Lower zones don't get a nodemask applied for MPOL_BIND */
if (unlikely(policy->mode == MPOL_BIND) &&
- gfp_zone(gfp) >= policy_zone &&
+ apply_policy_zone(policy, gfp_zone(gfp)) &&
cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
return &policy->v.nodes;

--
1.8.0

2012-10-31 18:21:39

by David Rientjes

[permalink] [raw]
Subject: Re: [PART6 Patch] mempolicy: fix is_valid_nodemask()

On Wed, 31 Oct 2012, Wen Congyang wrote:

> From: Lai Jiangshan <[email protected]>
>
> is_valid_nodemask() is introduced by 19770b32. but it does not match
> its comments, because it does not check the zone which > policy_zone.
>
> Also in b377fd, this commits told us, if highest zone is ZONE_MOVABLE,
> we should also apply memory policies to it. so ZONE_MOVABLE should be valid zone
> for policies. is_valid_nodemask() need to be changed to match it.
>
> Fix: check all zones, even its zoneid > policy_zone.
> Use nodes_intersects() instead open code to check it.
>

This changes the semantics of MPOL_BIND to be considerably different than
what it is today: slab allocations are no longer bound by such a policy
which isn't consistent with what userspace expects or is specified by
set_mempolicy() and there's no way, with your patch, to actually specify
that we don't care about ZONE_MOVABLE and that the slab allocations
_should_ actually be allocated on movable-only zones. You have to respect
cases where people aren't interested in node hotplug and not cause a
regression.

2012-11-02 06:05:39

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PART6 Patch] mempolicy: fix is_valid_nodemask()

(2012/11/01 3:21), David Rientjes wrote:
> On Wed, 31 Oct 2012, Wen Congyang wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> is_valid_nodemask() is introduced by 19770b32. but it does not match
>> its comments, because it does not check the zone which > policy_zone.
>>
>> Also in b377fd, this commits told us, if highest zone is ZONE_MOVABLE,
>> we should also apply memory policies to it. so ZONE_MOVABLE should be valid zone
>> for policies. is_valid_nodemask() need to be changed to match it.
>>
>> Fix: check all zones, even its zoneid > policy_zone.
>> Use nodes_intersects() instead open code to check it.
>>
>
> This changes the semantics of MPOL_BIND to be considerably different than
> what it is today: slab allocations are no longer bound by such a policy
> which isn't consistent with what userspace expects or is specified by
> set_mempolicy() and there's no way, with your patch, to actually specify
> that we don't care about ZONE_MOVABLE and that the slab allocations
> _should_ actually be allocated on movable-only zones. You have to respect
> cases where people aren't interested in node hotplug and not cause a
> regression.
>

I'm sorry if I misunderstand somehing....
I think people doesn't insterested in node-hotplug will never have MOVABLE_ZONE.
What causes regression ?

Thanks,
-Kame



2012-11-02 06:22:21

by Wen Congyang

[permalink] [raw]
Subject: Re: [PART6 Patch] mempolicy: fix is_valid_nodemask()

At 11/01/2012 02:21 AM, David Rientjes Wrote:
> On Wed, 31 Oct 2012, Wen Congyang wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> is_valid_nodemask() is introduced by 19770b32. but it does not match
>> its comments, because it does not check the zone which > policy_zone.
>>
>> Also in b377fd, this commits told us, if highest zone is ZONE_MOVABLE,
>> we should also apply memory policies to it. so ZONE_MOVABLE should be valid zone
>> for policies. is_valid_nodemask() need to be changed to match it.
>>
>> Fix: check all zones, even its zoneid > policy_zone.
>> Use nodes_intersects() instead open code to check it.
>>
>
> This changes the semantics of MPOL_BIND to be considerably different than
> what it is today: slab allocations are no longer bound by such a policy
> which isn't consistent with what userspace expects or is specified by
> set_mempolicy() and there's no way, with your patch, to actually specify
> that we don't care about ZONE_MOVABLE and that the slab allocations
> _should_ actually be allocated on movable-only zones. You have to respect
> cases where people aren't interested in node hotplug and not cause a
> regression.
>

Should we allow the user to bind a task to a node which has only ZONE_MOVABLE memory?

Thanks
Wen Congyang