2022-01-11 01:03:08

by Wei Yang

[permalink] [raw]
Subject: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

Instead of use "-1", let's use NUMA_NO_NODE for consistency.

Signed-off-by: Wei Yang <[email protected]>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ed5f2a0879d..11715f7323c0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
* function.
*/
if (!node_state(node, N_NORMAL_MEMORY))
- tmp = -1;
+ tmp = NUMA_NO_NODE;
pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
if (!pn)
return 1;
--
2.33.1



2022-01-11 01:03:10

by Wei Yang

[permalink] [raw]
Subject: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

kzalloc_node() would set data to 0, so it's not necessary to set it
again.

Signed-off-by: Wei Yang <[email protected]>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 11715f7323c0..a504616f904a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
}

lruvec_init(&pn->lruvec);
- pn->usage_in_excess = 0;
- pn->on_tree = false;
pn->memcg = memcg;

memcg->nodeinfo[node] = pn;
--
2.33.1


2022-01-11 01:03:13

by Wei Yang

[permalink] [raw]
Subject: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

The parent we get from page_counter is correct, while this is two
different hierarchy.

Let's retrieve the parent memcg from css.parent just like parent_cs(),
blkcg_parent(), etc.

Signed-off-by: Wei Yang <[email protected]>
---
include/linux/memcontrol.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0c5c403f4be6..12bf443f7b14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -886,9 +886,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
*/
static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
{
- if (!memcg->memory.parent)
- return NULL;
- return mem_cgroup_from_counter(memcg->memory.parent, memory);
+ return mem_cgroup_from_css(memcg->css.parent);
}

static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
--
2.33.1


2022-01-11 01:03:14

by Wei Yang

[permalink] [raw]
Subject: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation

mem_cgroup_threshold_ary->current_threshold points to the last entry
who's threshold is less or equal to usage.

Instead of iterating entries to get the correct index, we can leverage
primary->current_threshold to get it. If the threshold added is less or
equal to usage, current_threshold should increase by one. Otherwise, it
doesn't change.

Signed-off-by: Wei Yang <[email protected]>
---
mm/memcontrol.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a504616f904a..ce7060907df2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4161,7 +4161,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
struct mem_cgroup_threshold_ary *new;
unsigned long threshold;
unsigned long usage;
- int i, size, ret;
+ int size, ret;

ret = page_counter_memparse(args, "-1", &threshold);
if (ret)
@@ -4193,9 +4193,13 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
new->size = size;

/* Copy thresholds (if any) to new array */
- if (thresholds->primary)
+ if (thresholds->primary) {
memcpy(new->entries, thresholds->primary->entries,
flex_array_size(new, entries, size - 1));
+ new->current_threshold = thresholds->primary->current_threshold;
+ } else {
+ new->current_threshold = -1;
+ }

/* Add new threshold */
new->entries[size - 1].eventfd = eventfd;
@@ -4205,18 +4209,17 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
sort(new->entries, size, sizeof(*new->entries),
compare_thresholds, NULL);

- /* Find current threshold */
- new->current_threshold = -1;
- for (i = 0; i < size; i++) {
- if (new->entries[i].threshold <= usage) {
- /*
- * new->current_threshold will not be used until
- * rcu_assign_pointer(), so it's safe to increment
- * it here.
- */
- ++new->current_threshold;
- } else
- break;
+ /*
+ * If the threshold added here is less or equal to usage, this means
+ * current_threshold need to increase by one.
+ */
+ if (threshold <= usage) {
+ /*
+ * new->current_threshold will not be used until
+ * rcu_assign_pointer(), so it's safe to increment
+ * it here.
+ */
+ new->current_threshold++;
}

/* Free old spare buffer and save old primary buffer as spare */
--
2.33.1


2022-01-11 02:24:36

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <[email protected]> wrote:
>
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.

2022-01-11 02:26:15

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <[email protected]> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.

2022-01-11 03:13:22

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <[email protected]> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.

2022-01-11 08:40:25

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Tue 11-01-22 01:02:59, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <[email protected]>

I am not really sure this is worth it. After the merge window I plan to
post http://lkml.kernel.org/r/[email protected].
With that in place we can drop the check and a node rewrite so the net
result will be a less and more straightforward code. If you agree I can
add this with your s-o-b into my series:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- int tmp = node;
- /*
- * This routine is called against possible nodes.
- * But it's BUG to call kmalloc() against offline node.
- *
- * TODO: this routine can waste much memory for nodes which will
- * never be onlined. It's better to use memory hotplug callback
- * function.
- */
- if (!node_state(node, N_NORMAL_MEMORY))
- tmp = -1;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+ pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
if (!pn)
return 1;

--
Michal Hocko
SUSE Labs

2022-01-11 08:41:36

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

On Tue 11-01-22 01:03:00, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memcontrol.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 11715f7323c0..a504616f904a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> }
>
> lruvec_init(&pn->lruvec);
> - pn->usage_in_excess = 0;
> - pn->on_tree = false;
> pn->memcg = memcg;
>
> memcg->nodeinfo[node] = pn;
> --
> 2.33.1

--
Michal Hocko
SUSE Labs

2022-01-11 08:43:11

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

On Tue 11-01-22 01:03:01, Wei Yang wrote:
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> include/linux/memcontrol.h | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0c5c403f4be6..12bf443f7b14 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -886,9 +886,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
> */
> static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
> {
> - if (!memcg->memory.parent)
> - return NULL;
> - return mem_cgroup_from_counter(memcg->memory.parent, memory);
> + return mem_cgroup_from_css(memcg->css.parent);
> }
>
> static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
> --
> 2.33.1

--
Michal Hocko
SUSE Labs

2022-01-11 08:44:49

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation

On Tue 11-01-22 01:03:02, Wei Yang wrote:
> mem_cgroup_threshold_ary->current_threshold points to the last entry
> who's threshold is less or equal to usage.
>
> Instead of iterating entries to get the correct index, we can leverage
> primary->current_threshold to get it. If the threshold added is less or
> equal to usage, current_threshold should increase by one. Otherwise, it
> doesn't change.

Why do we want/need this change?

> Signed-off-by: Wei Yang <[email protected]>
> ---
> mm/memcontrol.c | 31 +++++++++++++++++--------------
> 1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a504616f904a..ce7060907df2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4161,7 +4161,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
> struct mem_cgroup_threshold_ary *new;
> unsigned long threshold;
> unsigned long usage;
> - int i, size, ret;
> + int size, ret;
>
> ret = page_counter_memparse(args, "-1", &threshold);
> if (ret)
> @@ -4193,9 +4193,13 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
> new->size = size;
>
> /* Copy thresholds (if any) to new array */
> - if (thresholds->primary)
> + if (thresholds->primary) {
> memcpy(new->entries, thresholds->primary->entries,
> flex_array_size(new, entries, size - 1));
> + new->current_threshold = thresholds->primary->current_threshold;
> + } else {
> + new->current_threshold = -1;
> + }
>
> /* Add new threshold */
> new->entries[size - 1].eventfd = eventfd;
> @@ -4205,18 +4209,17 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
> sort(new->entries, size, sizeof(*new->entries),
> compare_thresholds, NULL);
>
> - /* Find current threshold */
> - new->current_threshold = -1;
> - for (i = 0; i < size; i++) {
> - if (new->entries[i].threshold <= usage) {
> - /*
> - * new->current_threshold will not be used until
> - * rcu_assign_pointer(), so it's safe to increment
> - * it here.
> - */
> - ++new->current_threshold;
> - } else
> - break;
> + /*
> + * If the threshold added here is less or equal to usage, this means
> + * current_threshold need to increase by one.
> + */
> + if (threshold <= usage) {
> + /*
> + * new->current_threshold will not be used until
> + * rcu_assign_pointer(), so it's safe to increment
> + * it here.
> + */
> + new->current_threshold++;
> }
>
> /* Free old spare buffer and save old primary buffer as spare */
> --
> 2.33.1

--
Michal Hocko
SUSE Labs

2022-01-11 18:06:56

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <[email protected]>

If the patch won't be dropped for Michal's work, please,
feel free to add
Acked-by: Roman Gushchin <[email protected]>
Thanks!

2022-01-11 18:08:10

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

On Tue, Jan 11, 2022 at 01:03:00AM +0000, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Roman Gushchin <[email protected]>

Thanks!

2022-01-11 18:10:23

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.

Does it bring any benefits except consistency?

>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Roman Gushchin <[email protected]>

Thanks!

2022-01-11 18:24:11

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation

On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
> mem_cgroup_threshold_ary->current_threshold points to the last entry
> who's threshold is less or equal to usage.
>
> Instead of iterating entries to get the correct index, we can leverage
> primary->current_threshold to get it. If the threshold added is less or
> equal to usage, current_threshold should increase by one. Otherwise, it
> doesn't change.

How big is usually an array of thresholds? If it's not huge, likely
any savings won't be really noticeable (it's not a hot path and there
is an rc_synchronize() below).

So I agree with Michal that a better justification is really needed.

Thanks!

2022-01-12 00:24:27

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

On Tue, Jan 11, 2022 at 10:09:52AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
>> The parent we get from page_counter is correct, while this is two
>> different hierarchy.
>>
>> Let's retrieve the parent memcg from css.parent just like parent_cs(),
>> blkcg_parent(), etc.
>
>Does it bring any benefits except consistency?
>

I am afraid no.

>>
>> Signed-off-by: Wei Yang <[email protected]>
>
>Reviewed-by: Roman Gushchin <[email protected]>
>
>Thanks!

--
Wei Yang
Help you, Help me

2022-01-12 00:25:56

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation

On Tue, Jan 11, 2022 at 10:23:41AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
>> mem_cgroup_threshold_ary->current_threshold points to the last entry
>> who's threshold is less or equal to usage.
>>
>> Instead of iterating entries to get the correct index, we can leverage
>> primary->current_threshold to get it. If the threshold added is less or
>> equal to usage, current_threshold should increase by one. Otherwise, it
>> doesn't change.
>
>How big is usually an array of thresholds? If it's not huge, likely
>any savings won't be really noticeable (it's not a hot path and there
>is an rc_synchronize() below).

Usually the size is small I think.

>
>So I agree with Michal that a better justification is really needed.

Yep.

>
>Thanks!

--
Wei Yang
Help you, Help me

2022-01-12 00:46:42

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>>
>> Signed-off-by: Wei Yang <[email protected]>
>
>I am not really sure this is worth it. After the merge window I plan to
>post http://lkml.kernel.org/r/[email protected].

Give me some time to understand it :-)


2022-01-12 08:56:22

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Wed 12-01-22 00:46:34, Wei Yang wrote:
> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >>
> >> Signed-off-by: Wei Yang <[email protected]>
> >
> >I am not really sure this is worth it. After the merge window I plan to
> >post http://lkml.kernel.org/r/[email protected].
>
> Give me some time to understand it :-)

Just for the record, here is what I have put on top of that series:
---
From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
From: Wei Yang <[email protected]>
Date: Tue, 11 Jan 2022 09:45:25 +0100
Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info

alloc_mem_cgroup_per_node_info is allocated for each possible node and
this used to be a problem because not !node_online nodes didn't have
appropriate data structure allocated. This has changed by "mm: handle
uninitialized numa nodes gracefully" so we can drop the special casing
here.

Signed-off-by: Wei Yang <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- int tmp = node;
- /*
- * This routine is called against possible nodes.
- * But it's BUG to call kmalloc() against offline node.
- *
- * TODO: this routine can waste much memory for nodes which will
- * never be onlined. It's better to use memory hotplug callback
- * function.
- */
- if (!node_state(node, N_NORMAL_MEMORY))
- tmp = -1;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+ pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
if (!pn)
return 1;

--
2.30.2


--
Michal Hocko
SUSE Labs

2022-01-14 00:29:46

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >>
>> >> Signed-off-by: Wei Yang <[email protected]>
>> >
>> >I am not really sure this is worth it. After the merge window I plan to
>> >post http://lkml.kernel.org/r/[email protected].
>>
>> Give me some time to understand it :-)
>
>Just for the record, here is what I have put on top of that series:

Ok, I got what you try to resolve. I am ok with the following change except
one point.

>---
>>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>From: Wei Yang <[email protected]>
>Date: Tue, 11 Jan 2022 09:45:25 +0100
>Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
>alloc_mem_cgroup_per_node_info is allocated for each possible node and
>this used to be a problem because not !node_online nodes didn't have
>appropriate data structure allocated. This has changed by "mm: handle
>uninitialized numa nodes gracefully" so we can drop the special casing
>here.
>
>Signed-off-by: Wei Yang <[email protected]>
>Signed-off-by: Michal Hocko <[email protected]>
>---
> mm/memcontrol.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 781605e92015..ed19a21ee14e 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> {
> struct mem_cgroup_per_node *pn;
>- int tmp = node;
>- /*
>- * This routine is called against possible nodes.
>- * But it's BUG to call kmalloc() against offline node.
>- *
>- * TODO: this routine can waste much memory for nodes which will
>- * never be onlined. It's better to use memory hotplug callback
>- * function.
>- */

Do you think this TODO is not related to this change?

>- if (!node_state(node, N_NORMAL_MEMORY))
>- tmp = -1;
>- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>+
>+ pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
> if (!pn)
> return 1;
>
>--
>2.30.2
>
>
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2022-01-14 08:51:43

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Fri 14-01-22 00:29:37, Wei Yang wrote:
> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >> >>
> >> >> Signed-off-by: Wei Yang <[email protected]>
> >> >
> >> >I am not really sure this is worth it. After the merge window I plan to
> >> >post http://lkml.kernel.org/r/[email protected].
> >>
> >> Give me some time to understand it :-)
> >
> >Just for the record, here is what I have put on top of that series:
>
> Ok, I got what you try to resolve. I am ok with the following change except
> one point.
>
> >---
> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> >From: Wei Yang <[email protected]>
> >Date: Tue, 11 Jan 2022 09:45:25 +0100
> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
> >
> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
> >this used to be a problem because not !node_online nodes didn't have
> >appropriate data structure allocated. This has changed by "mm: handle
> >uninitialized numa nodes gracefully" so we can drop the special casing
> >here.
> >
> >Signed-off-by: Wei Yang <[email protected]>
> >Signed-off-by: Michal Hocko <[email protected]>
> >---
> > mm/memcontrol.c | 14 ++------------
> > 1 file changed, 2 insertions(+), 12 deletions(-)
> >
> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >index 781605e92015..ed19a21ee14e 100644
> >--- a/mm/memcontrol.c
> >+++ b/mm/memcontrol.c
> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> > {
> > struct mem_cgroup_per_node *pn;
> >- int tmp = node;
> >- /*
> >- * This routine is called against possible nodes.
> >- * But it's BUG to call kmalloc() against offline node.
> >- *
> >- * TODO: this routine can waste much memory for nodes which will
> >- * never be onlined. It's better to use memory hotplug callback
> >- * function.
> >- */
>
> Do you think this TODO is not related to this change?

It is not really related but I am not sure how useful it is. Essentially
any allocation that is per-possible node is in the same situation and if
we really need to deal with large and sparse possible nodes masks.

If you want me to keep the TODO I will do it though.

--
Michal Hocko
SUSE Labs

2022-01-14 11:07:17

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Mike Rapoport <[email protected]>

> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2ed5f2a0879d..11715f7323c0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> * function.
> */
> if (!node_state(node, N_NORMAL_MEMORY))
> - tmp = -1;
> + tmp = NUMA_NO_NODE;
> pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
> if (!pn)
> return 1;
> --
> 2.33.1
>
>

--
Sincerely yours,
Mike.

2022-01-14 11:08:39

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

On Tue, Jan 11, 2022 at 01:03:00AM +0000, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Mike Rapoport <[email protected]>

> ---
> mm/memcontrol.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 11715f7323c0..a504616f904a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> }
>
> lruvec_init(&pn->lruvec);
> - pn->usage_in_excess = 0;
> - pn->on_tree = false;
> pn->memcg = memcg;
>
> memcg->nodeinfo[node] = pn;
> --
> 2.33.1
>
>

--
Sincerely yours,
Mike.

2022-01-16 16:23:25

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Fri, Jan 14, 2022 at 09:51:31AM +0100, Michal Hocko wrote:
>On Fri 14-01-22 00:29:37, Wei Yang wrote:
>> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >> >>
>> >> >> Signed-off-by: Wei Yang <[email protected]>
>> >> >
>> >> >I am not really sure this is worth it. After the merge window I plan to
>> >> >post http://lkml.kernel.org/r/[email protected].
>> >>
>> >> Give me some time to understand it :-)
>> >
>> >Just for the record, here is what I have put on top of that series:
>>
>> Ok, I got what you try to resolve. I am ok with the following change except
>> one point.
>>
>> >---
>> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>> >From: Wei Yang <[email protected]>
>> >Date: Tue, 11 Jan 2022 09:45:25 +0100
>> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>> >
>> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
>> >this used to be a problem because not !node_online nodes didn't have
>> >appropriate data structure allocated. This has changed by "mm: handle
>> >uninitialized numa nodes gracefully" so we can drop the special casing
>> >here.
>> >
>> >Signed-off-by: Wei Yang <[email protected]>
>> >Signed-off-by: Michal Hocko <[email protected]>
>> >---
>> > mm/memcontrol.c | 14 ++------------
>> > 1 file changed, 2 insertions(+), 12 deletions(-)
>> >
>> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >index 781605e92015..ed19a21ee14e 100644
>> >--- a/mm/memcontrol.c
>> >+++ b/mm/memcontrol.c
>> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
>> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>> > {
>> > struct mem_cgroup_per_node *pn;
>> >- int tmp = node;
>> >- /*
>> >- * This routine is called against possible nodes.
>> >- * But it's BUG to call kmalloc() against offline node.
>> >- *
>> >- * TODO: this routine can waste much memory for nodes which will
>> >- * never be onlined. It's better to use memory hotplug callback
>> >- * function.
>> >- */
>>
>> Do you think this TODO is not related to this change?
>
>It is not really related but I am not sure how useful it is. Essentially
>any allocation that is per-possible node is in the same situation and if
>we really need to deal with large and sparse possible nodes masks.
>

Sounds reasonable :-)

>If you want me to keep the TODO I will do it though.
>
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2022-01-17 09:42:59

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Wed, Jan 12, 2022 at 12:56 AM Michal Hocko <[email protected]> wrote:
>
> On Wed 12-01-22 00:46:34, Wei Yang wrote:
> > On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> > >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> > >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> > >>
> > >> Signed-off-by: Wei Yang <[email protected]>
> > >
> > >I am not really sure this is worth it. After the merge window I plan to
> > >post http://lkml.kernel.org/r/[email protected].
> >
> > Give me some time to understand it :-)
>
> Just for the record, here is what I have put on top of that series:
> ---
> From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> From: Wei Yang <[email protected]>
> Date: Tue, 11 Jan 2022 09:45:25 +0100
> Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
> alloc_mem_cgroup_per_node_info is allocated for each possible node and
> this used to be a problem because not !node_online nodes didn't have
> appropriate data structure allocated. This has changed by "mm: handle
> uninitialized numa nodes gracefully" so we can drop the special casing
> here.
>
> Signed-off-by: Wei Yang <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2022-01-17 09:43:21

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <[email protected]> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2022-01-17 10:28:48

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <[email protected]> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2022-02-01 15:16:49

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

Hi, Andrew

Would you pick up this patch set, or prefer me to send a v2?

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
>Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
>Signed-off-by: Wei Yang <[email protected]>
>---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 2ed5f2a0879d..11715f7323c0 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> * function.
> */
> if (!node_state(node, N_NORMAL_MEMORY))
>- tmp = -1;
>+ tmp = NUMA_NO_NODE;
> pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
> if (!pn)
> return 1;
>--
>2.33.1

--
Wei Yang
Help you, Help me

2022-02-01 20:52:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <[email protected]> wrote:

> Hi, Andrew
>
> Would you pick up this patch set, or prefer me to send a v2?
>

It's unclear to me what's happening with [4/4]. At least a new
changelog with more justification is expected?

So yes, please resend?

2022-02-02 00:05:48

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node

On Mon, Jan 31, 2022 at 02:36:20PM -0800, Andrew Morton wrote:
>On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <[email protected]> wrote:
>
>> Hi, Andrew
>>
>> Would you pick up this patch set, or prefer me to send a v2?
>>
>
>It's unclear to me what's happening with [4/4]. At least a new
>changelog with more justification is expected?
>

As Michal and Roman suggested, this is not a hot-path. I would drop this one.

>So yes, please resend?

Sure.

--
Wei Yang
Help you, Help me