2022-08-19 08:33:39

by Muchun Song

[permalink] [raw]
Subject: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal

The following commit offload per-node sysfs creation and removal to a kworker and
did not say why it is needed. And it also said "I don't know that this is
absolutely required". It seems like the author was not sure as well. Since it
only complicates the code, this patch will revert the changes to simplify the code.

39da08cb074c ("hugetlb: offload per node attribute registrations")

We could use memory hotplug notifier to do per-node sysfs creation and removal
instead of inserting those operations to node registration and unregistration.
Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
simplify the code.

Signed-off-by: Muchun Song <[email protected]>
---

Cc Andi per Greg.

drivers/base/node.c | 139 ++-------------------------------------------------
include/linux/node.h | 24 ++-------
mm/hugetlb.c | 45 +++++++++++------
3 files changed, 36 insertions(+), 172 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index eb0f43784c2b..ed391cb09999 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -587,64 +587,9 @@ static const struct attribute_group *node_dev_groups[] = {
NULL
};

-#ifdef CONFIG_HUGETLBFS
-/*
- * hugetlbfs per node attributes registration interface:
- * When/if hugetlb[fs] subsystem initializes [sometime after this module],
- * it will register its per node attributes for all online nodes with
- * memory. It will also call register_hugetlbfs_with_node(), below, to
- * register its attribute registration functions with this node driver.
- * Once these hooks have been initialized, the node driver will call into
- * the hugetlb module to [un]register attributes for hot-plugged nodes.
- */
-static node_registration_func_t __hugetlb_register_node;
-static node_registration_func_t __hugetlb_unregister_node;
-
-static inline bool hugetlb_register_node(struct node *node)
-{
- if (__hugetlb_register_node &&
- node_state(node->dev.id, N_MEMORY)) {
- __hugetlb_register_node(node);
- return true;
- }
- return false;
-}
-
-static inline void hugetlb_unregister_node(struct node *node)
-{
- if (__hugetlb_unregister_node)
- __hugetlb_unregister_node(node);
-}
-
-void register_hugetlbfs_with_node(node_registration_func_t doregister,
- node_registration_func_t unregister)
-{
- __hugetlb_register_node = doregister;
- __hugetlb_unregister_node = unregister;
-}
-#else
-static inline void hugetlb_register_node(struct node *node) {}
-
-static inline void hugetlb_unregister_node(struct node *node) {}
-#endif
-
static void node_device_release(struct device *dev)
{
- struct node *node = to_node(dev);
-
-#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
- /*
- * We schedule the work only when a memory section is
- * onlined/offlined on this node. When we come here,
- * all the memory on this node has been offlined,
- * so we won't enqueue new work to this work.
- *
- * The work is using node->node_work, so we should
- * flush work before freeing the memory.
- */
- flush_work(&node->node_work);
-#endif
- kfree(node);
+ kfree(to_node(dev));
}

/*
@@ -665,11 +610,9 @@ static int register_node(struct node *node, int num)

if (error)
put_device(&node->dev);
- else {
- hugetlb_register_node(node);
-
+ else
compaction_register_node(node);
- }
+
return error;
}

@@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
void unregister_node(struct node *node)
{
compaction_unregister_node(node);
- hugetlb_unregister_node(node); /* no-op, if memoryless node */
node_remove_accesses(node);
node_remove_caches(node);
device_unregister(&node->dev);
@@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
(void *)&nid, func);
return;
}
-
-#ifdef CONFIG_HUGETLBFS
-/*
- * Handle per node hstate attribute [un]registration on transistions
- * to/from memoryless state.
- */
-static void node_hugetlb_work(struct work_struct *work)
-{
- struct node *node = container_of(work, struct node, node_work);
-
- /*
- * We only get here when a node transitions to/from memoryless state.
- * We can detect which transition occurred by examining whether the
- * node has memory now. hugetlb_register_node() already check this
- * so we try to register the attributes. If that fails, then the
- * node has transitioned to memoryless, try to unregister the
- * attributes.
- */
- if (!hugetlb_register_node(node))
- hugetlb_unregister_node(node);
-}
-
-static void init_node_hugetlb_work(int nid)
-{
- INIT_WORK(&node_devices[nid]->node_work, node_hugetlb_work);
-}
-
-static int node_memory_callback(struct notifier_block *self,
- unsigned long action, void *arg)
-{
- struct memory_notify *mnb = arg;
- int nid = mnb->status_change_nid;
-
- switch (action) {
- case MEM_ONLINE:
- case MEM_OFFLINE:
- /*
- * offload per node hstate [un]registration to a work thread
- * when transitioning to/from memoryless state.
- */
- if (nid != NUMA_NO_NODE)
- schedule_work(&node_devices[nid]->node_work);
- break;
-
- case MEM_GOING_ONLINE:
- case MEM_GOING_OFFLINE:
- case MEM_CANCEL_ONLINE:
- case MEM_CANCEL_OFFLINE:
- default:
- break;
- }
-
- return NOTIFY_OK;
-}
-#endif /* CONFIG_HUGETLBFS */
#endif /* CONFIG_MEMORY_HOTPLUG */

-#if !defined(CONFIG_MEMORY_HOTPLUG) || !defined(CONFIG_HUGETLBFS)
-static inline int node_memory_callback(struct notifier_block *self,
- unsigned long action, void *arg)
-{
- return NOTIFY_OK;
-}
-
-static void init_node_hugetlb_work(int nid) { }
-
-#endif
-
int __register_one_node(int nid)
{
int error;
@@ -991,8 +867,6 @@ int __register_one_node(int nid)
}

INIT_LIST_HEAD(&node_devices[nid]->access_list);
- /* initialize work queue for memory hot plug */
- init_node_hugetlb_work(nid);
node_init_caches(nid);

return error;
@@ -1063,13 +937,8 @@ static const struct attribute_group *cpu_root_attr_groups[] = {
NULL,
};

-#define NODE_CALLBACK_PRI 2 /* lower than SLAB */
void __init node_dev_init(void)
{
- static struct notifier_block node_memory_callback_nb = {
- .notifier_call = node_memory_callback,
- .priority = NODE_CALLBACK_PRI,
- };
int ret, i;

BUILD_BUG_ON(ARRAY_SIZE(node_state_attr) != NR_NODE_STATES);
@@ -1079,8 +948,6 @@ void __init node_dev_init(void)
if (ret)
panic("%s() failed to register subsystem: %d\n", __func__, ret);

- register_hotmemory_notifier(&node_memory_callback_nb);
-
/*
* Create all node devices, which will properly link the node
* to applicable memory block devices and already created cpu devices.
diff --git a/include/linux/node.h b/include/linux/node.h
index 40d641a8bfb0..ea817b507f54 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -2,15 +2,15 @@
/*
* include/linux/node.h - generic node definition
*
- * This is mainly for topological representation. We define the
- * basic 'struct node' here, which can be embedded in per-arch
+ * This is mainly for topological representation. We define the
+ * basic 'struct node' here, which can be embedded in per-arch
* definitions of processors.
*
* Basic handling of the devices is done in drivers/base/node.c
- * and system devices are handled in drivers/base/sys.c.
+ * and system devices are handled in drivers/base/sys.c.
*
* Nodes are exported via driverfs in the class/node/devices/
- * directory.
+ * directory.
*/
#ifndef _LINUX_NODE_H_
#define _LINUX_NODE_H_
@@ -18,7 +18,6 @@
#include <linux/device.h>
#include <linux/cpumask.h>
#include <linux/list.h>
-#include <linux/workqueue.h>

/**
* struct node_hmem_attrs - heterogeneous memory performance attributes
@@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
struct node {
struct device dev;
struct list_head access_list;
-
-#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
- struct work_struct node_work;
-#endif
#ifdef CONFIG_HMEM_REPORTING
struct list_head cache_attrs;
struct device *cache_dev;
@@ -96,7 +91,6 @@ struct node {

struct memory_block;
extern struct node *node_devices[];
-typedef void (*node_registration_func_t)(struct node *);

#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
@@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
extern int register_memory_node_under_compute_node(unsigned int mem_nid,
unsigned int cpu_nid,
unsigned access);
-
-#ifdef CONFIG_HUGETLBFS
-extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
- node_registration_func_t unregister);
-#endif
#else
static inline void node_dev_init(void)
{
@@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
}
-
-static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
- node_registration_func_t unreg)
-{
-}
#endif

#define to_node(device) container_of(device, struct node, dev)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 536a52c29035..9a72499486c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -33,6 +33,7 @@
#include <linux/migrate.h>
#include <linux/nospec.h>
#include <linux/delayacct.h>
+#include <linux/memory.h>

#include <asm/page.h>
#include <asm/pgalloc.h>
@@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
* Register hstate attributes for a single node device.
* No-op if attributes already registered.
*/
-static void hugetlb_register_node(struct node *node)
+static int hugetlb_register_node(struct node *node)
{
struct hstate *h;
struct node_hstate *nhs = &node_hstates[node->dev.id];
int err;

if (nhs->hugepages_kobj)
- return; /* already allocated */
+ return 0; /* already allocated */

nhs->hugepages_kobj = kobject_create_and_add("hugepages",
&node->dev.kobj);
if (!nhs->hugepages_kobj)
- return;
+ return -ENOMEM;

for_each_hstate(h) {
err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
@@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
h->name, node->dev.id);
hugetlb_unregister_node(node);
- break;
+ return -ENOMEM;
}
}
+ return 0;
+}
+
+static int __meminit hugetlb_memory_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ int ret = 0;
+ struct memory_notify *mnb = arg;
+ int nid = mnb->status_change_nid;
+
+ if (nid == NUMA_NO_NODE)
+ return NOTIFY_DONE;
+
+ if (action == MEM_GOING_ONLINE)
+ ret = hugetlb_register_node(node_devices[nid]);
+ else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
+ hugetlb_unregister_node(node_devices[nid]);
+
+ return notifier_from_errno(ret);
}

/*
@@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
{
int nid;

- for_each_node_state(nid, N_MEMORY) {
- struct node *node = node_devices[nid];
- if (node->dev.id == nid)
- hugetlb_register_node(node);
- }
-
- /*
- * Let the node device driver know we're here so it can
- * [un]register hstate attributes on node hotplug.
- */
- register_hugetlbfs_with_node(hugetlb_register_node,
- hugetlb_unregister_node);
+ get_online_mems();
+ hotplug_memory_notifier(hugetlb_memory_callback, 0);
+ for_each_node_state(nid, N_MEMORY)
+ hugetlb_register_node(node_devices[nid]);
+ put_online_mems();
}
#else /* !CONFIG_NUMA */

--
2.11.0


2022-08-20 00:31:44

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal

On 08/19/22 16:00, Muchun Song wrote:
> The following commit offload per-node sysfs creation and removal to a kworker and
> did not say why it is needed. And it also said "I don't know that this is
> absolutely required". It seems like the author was not sure as well. Since it
> only complicates the code, this patch will revert the changes to simplify the code.
>
> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>
> We could use memory hotplug notifier to do per-node sysfs creation and removal
> instead of inserting those operations to node registration and unregistration.
> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
> simplify the code.
>
> Signed-off-by: Muchun Song <[email protected]>
> ---

Looking at commit 39da08cb074c, it mentions that other memory register/unregister
handlers use work queues. That seemed to be true at the time, but those handlers
have been rewritten or no longer exist. With a quick look, I did not find any
memory notifiers doing so today.

I certainly like decoupling node driver and hugetlb code. By no means am I an
expert in the hotplug interfaces. I could not find any issues in moving the
functionality to memory hotplug notification. FWICT, functionality should be

>
> Cc Andi per Greg.
>
> drivers/base/node.c | 139 ++-------------------------------------------------
> include/linux/node.h | 24 ++-------
> mm/hugetlb.c | 45 +++++++++++------
> 3 files changed, 36 insertions(+), 172 deletions(-)

Acked-by: Mike Kravetz <[email protected]>
--
Mike Kravetz

2022-08-23 15:31:47

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal

On 19.08.22 10:00, Muchun Song wrote:
> The following commit offload per-node sysfs creation and removal to a kworker and
> did not say why it is needed. And it also said "I don't know that this is
> absolutely required". It seems like the author was not sure as well. Since it
> only complicates the code, this patch will revert the changes to simplify the code.
>
> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>
> We could use memory hotplug notifier to do per-node sysfs creation and removal
> instead of inserting those operations to node registration and unregistration.
> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
> simplify the code.
>
> Signed-off-by: Muchun Song <[email protected]>


[...]

> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
> void unregister_node(struct node *node)
> {
> compaction_unregister_node(node);
> - hugetlb_unregister_node(node); /* no-op, if memoryless node */
> node_remove_accesses(node);
> node_remove_caches(node);
> device_unregister(&node->dev);
> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
> (void *)&nid, func);
> return;
> }

[...]

> /*
> * Create all node devices, which will properly link the node
> * to applicable memory block devices and already created cpu devices.
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 40d641a8bfb0..ea817b507f54 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -2,15 +2,15 @@
> /*
> * include/linux/node.h - generic node definition
> *
> - * This is mainly for topological representation. We define the
> - * basic 'struct node' here, which can be embedded in per-arch
> + * This is mainly for topological representation. We define the
> + * basic 'struct node' here, which can be embedded in per-arch
> * definitions of processors.
> *
> * Basic handling of the devices is done in drivers/base/node.c
> - * and system devices are handled in drivers/base/sys.c.
> + * and system devices are handled in drivers/base/sys.c.
> *
> * Nodes are exported via driverfs in the class/node/devices/
> - * directory.
> + * directory.

Unrelated changes.

> */
> #ifndef _LINUX_NODE_H_
> #define _LINUX_NODE_H_
> @@ -18,7 +18,6 @@
> #include <linux/device.h>
> #include <linux/cpumask.h>
> #include <linux/list.h>
> -#include <linux/workqueue.h>
>
> /**
> * struct node_hmem_attrs - heterogeneous memory performance attributes
> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
> struct node {
> struct device dev;
> struct list_head access_list;
> -
> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
> - struct work_struct node_work;
> -#endif
> #ifdef CONFIG_HMEM_REPORTING
> struct list_head cache_attrs;
> struct device *cache_dev;
> @@ -96,7 +91,6 @@ struct node {
>
> struct memory_block;
> extern struct node *node_devices[];
> -typedef void (*node_registration_func_t)(struct node *);
>
> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
> void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> unsigned int cpu_nid,
> unsigned access);
> -
> -#ifdef CONFIG_HUGETLBFS
> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
> - node_registration_func_t unregister);
> -#endif
> #else
> static inline void node_dev_init(void)
> {
> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
> }
> -
> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
> - node_registration_func_t unreg)
> -{
> -}
> #endif
>
> #define to_node(device) container_of(device, struct node, dev)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 536a52c29035..9a72499486c1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
> #include <linux/migrate.h>
> #include <linux/nospec.h>
> #include <linux/delayacct.h>
> +#include <linux/memory.h>
>
> #include <asm/page.h>
> #include <asm/pgalloc.h>
> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
> * Register hstate attributes for a single node device.
> * No-op if attributes already registered.
> */
> -static void hugetlb_register_node(struct node *node)
> +static int hugetlb_register_node(struct node *node)
> {
> struct hstate *h;
> struct node_hstate *nhs = &node_hstates[node->dev.id];
> int err;
>
> if (nhs->hugepages_kobj)
> - return; /* already allocated */
> + return 0; /* already allocated */
>
> nhs->hugepages_kobj = kobject_create_and_add("hugepages",
> &node->dev.kobj);
> if (!nhs->hugepages_kobj)
> - return;
> + return -ENOMEM;
>
> for_each_hstate(h) {
> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
> pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
> h->name, node->dev.id);
> hugetlb_unregister_node(node);
> - break;
> + return -ENOMEM;
> }
> }
> + return 0;
> +}
> +
> +static int __meminit hugetlb_memory_callback(struct notifier_block *self,
> + unsigned long action, void *arg)
> +{
> + int ret = 0;
> + struct memory_notify *mnb = arg;
> + int nid = mnb->status_change_nid;
> +
> + if (nid == NUMA_NO_NODE)
> + return NOTIFY_DONE;
> +
> + if (action == MEM_GOING_ONLINE)
> + ret = hugetlb_register_node(node_devices[nid]);
> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
> + hugetlb_unregister_node(node_devices[nid]);
> +
> + return notifier_from_errno(ret);
> }
>
> /*
> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
> {
> int nid;
>
> - for_each_node_state(nid, N_MEMORY) {
> - struct node *node = node_devices[nid];
> - if (node->dev.id == nid)
> - hugetlb_register_node(node);
> - }
> -
> - /*
> - * Let the node device driver know we're here so it can
> - * [un]register hstate attributes on node hotplug.
> - */
> - register_hugetlbfs_with_node(hugetlb_register_node,
> - hugetlb_unregister_node);
> + get_online_mems();
> + hotplug_memory_notifier(hugetlb_memory_callback, 0);
> + for_each_node_state(nid, N_MEMORY)
> + hugetlb_register_node(node_devices[nid]);
> + put_online_mems();
> }
> #else /* !CONFIG_NUMA */

Do we really *need* the memory hotplug notifier and the added complexity
due for handling memory-less nodes?

Why can't we simply register/unregister sysfs entries in
register_node/unregister_node and call it a day?

TBH, we should just have sysfs entries for memory-less nodes and not
care about such (corner) cases.


--
Thanks,

David / dhildenb

2022-08-24 04:12:14

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal



> On Aug 23, 2022, at 18:21, David Hildenbrand <[email protected]> wrote:
>
> On 19.08.22 10:00, Muchun Song wrote:
>> The following commit offload per-node sysfs creation and removal to a kworker and
>> did not say why it is needed. And it also said "I don't know that this is
>> absolutely required". It seems like the author was not sure as well. Since it
>> only complicates the code, this patch will revert the changes to simplify the code.
>>
>> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>>
>> We could use memory hotplug notifier to do per-node sysfs creation and removal
>> instead of inserting those operations to node registration and unregistration.
>> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
>> simplify the code.
>>
>> Signed-off-by: Muchun Song <[email protected]>
>
>
> [...]
>
>> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
>> void unregister_node(struct node *node)
>> {
>> compaction_unregister_node(node);
>> - hugetlb_unregister_node(node); /* no-op, if memoryless node */
>> node_remove_accesses(node);
>> node_remove_caches(node);
>> device_unregister(&node->dev);
>> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>> (void *)&nid, func);
>> return;
>> }
>
> [...]
>
>> /*
>> * Create all node devices, which will properly link the node
>> * to applicable memory block devices and already created cpu devices.
>> diff --git a/include/linux/node.h b/include/linux/node.h
>> index 40d641a8bfb0..ea817b507f54 100644
>> --- a/include/linux/node.h
>> +++ b/include/linux/node.h
>> @@ -2,15 +2,15 @@
>> /*
>> * include/linux/node.h - generic node definition
>> *
>> - * This is mainly for topological representation. We define the
>> - * basic 'struct node' here, which can be embedded in per-arch
>> + * This is mainly for topological representation. We define the
>> + * basic 'struct node' here, which can be embedded in per-arch
>> * definitions of processors.
>> *
>> * Basic handling of the devices is done in drivers/base/node.c
>> - * and system devices are handled in drivers/base/sys.c.
>> + * and system devices are handled in drivers/base/sys.c.
>> *
>> * Nodes are exported via driverfs in the class/node/devices/
>> - * directory.
>> + * directory.
>
> Unrelated changes.

Yep, a minor cleanup BTW.

>
>> */
>> #ifndef _LINUX_NODE_H_
>> #define _LINUX_NODE_H_
>> @@ -18,7 +18,6 @@
>> #include <linux/device.h>
>> #include <linux/cpumask.h>
>> #include <linux/list.h>
>> -#include <linux/workqueue.h>
>>
>> /**
>> * struct node_hmem_attrs - heterogeneous memory performance attributes
>> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
>> struct node {
>> struct device dev;
>> struct list_head access_list;
>> -
>> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
>> - struct work_struct node_work;
>> -#endif
>> #ifdef CONFIG_HMEM_REPORTING
>> struct list_head cache_attrs;
>> struct device *cache_dev;
>> @@ -96,7 +91,6 @@ struct node {
>>
>> struct memory_block;
>> extern struct node *node_devices[];
>> -typedef void (*node_registration_func_t)(struct node *);
>>
>> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
>> void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>> unsigned int cpu_nid,
>> unsigned access);
>> -
>> -#ifdef CONFIG_HUGETLBFS
>> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>> - node_registration_func_t unregister);
>> -#endif
>> #else
>> static inline void node_dev_init(void)
>> {
>> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>> {
>> }
>> -
>> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>> - node_registration_func_t unreg)
>> -{
>> -}
>> #endif
>>
>> #define to_node(device) container_of(device, struct node, dev)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 536a52c29035..9a72499486c1 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -33,6 +33,7 @@
>> #include <linux/migrate.h>
>> #include <linux/nospec.h>
>> #include <linux/delayacct.h>
>> +#include <linux/memory.h>
>>
>> #include <asm/page.h>
>> #include <asm/pgalloc.h>
>> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
>> * Register hstate attributes for a single node device.
>> * No-op if attributes already registered.
>> */
>> -static void hugetlb_register_node(struct node *node)
>> +static int hugetlb_register_node(struct node *node)
>> {
>> struct hstate *h;
>> struct node_hstate *nhs = &node_hstates[node->dev.id];
>> int err;
>>
>> if (nhs->hugepages_kobj)
>> - return; /* already allocated */
>> + return 0; /* already allocated */
>>
>> nhs->hugepages_kobj = kobject_create_and_add("hugepages",
>> &node->dev.kobj);
>> if (!nhs->hugepages_kobj)
>> - return;
>> + return -ENOMEM;
>>
>> for_each_hstate(h) {
>> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
>> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
>> pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
>> h->name, node->dev.id);
>> hugetlb_unregister_node(node);
>> - break;
>> + return -ENOMEM;
>> }
>> }
>> + return 0;
>> +}
>> +
>> +static int __meminit hugetlb_memory_callback(struct notifier_block *self,
>> + unsigned long action, void *arg)
>> +{
>> + int ret = 0;
>> + struct memory_notify *mnb = arg;
>> + int nid = mnb->status_change_nid;
>> +
>> + if (nid == NUMA_NO_NODE)
>> + return NOTIFY_DONE;
>> +
>> + if (action == MEM_GOING_ONLINE)
>> + ret = hugetlb_register_node(node_devices[nid]);
>> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
>> + hugetlb_unregister_node(node_devices[nid]);
>> +
>> + return notifier_from_errno(ret);
>> }
>>
>> /*
>> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
>> {
>> int nid;
>>
>> - for_each_node_state(nid, N_MEMORY) {
>> - struct node *node = node_devices[nid];
>> - if (node->dev.id == nid)
>> - hugetlb_register_node(node);
>> - }
>> -
>> - /*
>> - * Let the node device driver know we're here so it can
>> - * [un]register hstate attributes on node hotplug.
>> - */
>> - register_hugetlbfs_with_node(hugetlb_register_node,
>> - hugetlb_unregister_node);
>> + get_online_mems();
>> + hotplug_memory_notifier(hugetlb_memory_callback, 0);
>> + for_each_node_state(nid, N_MEMORY)
>> + hugetlb_register_node(node_devices[nid]);
>> + put_online_mems();
>> }
>> #else /* !CONFIG_NUMA */
>
> Do we really *need* the memory hotplug notifier and the added complexity
> due for handling memory-less nodes?

I have found the commit introduced this mechanism, see commit:

4faf8d950ec4 ("hugetlb: handle memory hot-plug events")

From the commit message, I think it is a suggestion from David Rientjes.
I didn’t see any reasons why we need it. So Cc David Rientjes (Maybe
he knew more context). The committer Lee and the reviewer Andi’s email
is invalid (don’t Cc them)

>
> Why can't we simply register/unregister sysfs entries in
> register_node/unregister_node and call it a day?
>

At least, I agree with you. Before I change to this way, let’s wait for
some potential comments from David Rientjes.


Thanks.

> TBH, we should just have sysfs entries for memory-less nodes and not
> care about such (corner) cases.
>
>
> --
> Thanks,
>
> David / dhildenb

2022-09-01 07:19:28

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal



> On Aug 24, 2022, at 11:23, Muchun Song <[email protected]> wrote:
>
>
>
>> On Aug 23, 2022, at 18:21, David Hildenbrand <[email protected]> wrote:
>>
>> On 19.08.22 10:00, Muchun Song wrote:
>>> The following commit offload per-node sysfs creation and removal to a kworker and
>>> did not say why it is needed. And it also said "I don't know that this is
>>> absolutely required". It seems like the author was not sure as well. Since it
>>> only complicates the code, this patch will revert the changes to simplify the code.
>>>
>>> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>>>
>>> We could use memory hotplug notifier to do per-node sysfs creation and removal
>>> instead of inserting those operations to node registration and unregistration.
>>> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
>>> simplify the code.
>>>
>>> Signed-off-by: Muchun Song <[email protected]>
>>
>>
>> [...]
>>
>>> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
>>> void unregister_node(struct node *node)
>>> {
>>> compaction_unregister_node(node);
>>> - hugetlb_unregister_node(node); /* no-op, if memoryless node */
>>> node_remove_accesses(node);
>>> node_remove_caches(node);
>>> device_unregister(&node->dev);
>>> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>> (void *)&nid, func);
>>> return;
>>> }
>>
>> [...]
>>
>>> /*
>>> * Create all node devices, which will properly link the node
>>> * to applicable memory block devices and already created cpu devices.
>>> diff --git a/include/linux/node.h b/include/linux/node.h
>>> index 40d641a8bfb0..ea817b507f54 100644
>>> --- a/include/linux/node.h
>>> +++ b/include/linux/node.h
>>> @@ -2,15 +2,15 @@
>>> /*
>>> * include/linux/node.h - generic node definition
>>> *
>>> - * This is mainly for topological representation. We define the
>>> - * basic 'struct node' here, which can be embedded in per-arch
>>> + * This is mainly for topological representation. We define the
>>> + * basic 'struct node' here, which can be embedded in per-arch
>>> * definitions of processors.
>>> *
>>> * Basic handling of the devices is done in drivers/base/node.c
>>> - * and system devices are handled in drivers/base/sys.c.
>>> + * and system devices are handled in drivers/base/sys.c.
>>> *
>>> * Nodes are exported via driverfs in the class/node/devices/
>>> - * directory.
>>> + * directory.
>>
>> Unrelated changes.
>
> Yep, a minor cleanup BTW.
>
>>
>>> */
>>> #ifndef _LINUX_NODE_H_
>>> #define _LINUX_NODE_H_
>>> @@ -18,7 +18,6 @@
>>> #include <linux/device.h>
>>> #include <linux/cpumask.h>
>>> #include <linux/list.h>
>>> -#include <linux/workqueue.h>
>>>
>>> /**
>>> * struct node_hmem_attrs - heterogeneous memory performance attributes
>>> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
>>> struct node {
>>> struct device dev;
>>> struct list_head access_list;
>>> -
>>> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
>>> - struct work_struct node_work;
>>> -#endif
>>> #ifdef CONFIG_HMEM_REPORTING
>>> struct list_head cache_attrs;
>>> struct device *cache_dev;
>>> @@ -96,7 +91,6 @@ struct node {
>>>
>>> struct memory_block;
>>> extern struct node *node_devices[];
>>> -typedef void (*node_registration_func_t)(struct node *);
>>>
>>> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
>>> void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>>> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>>> unsigned int cpu_nid,
>>> unsigned access);
>>> -
>>> -#ifdef CONFIG_HUGETLBFS
>>> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>>> - node_registration_func_t unregister);
>>> -#endif
>>> #else
>>> static inline void node_dev_init(void)
>>> {
>>> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>>> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>>> {
>>> }
>>> -
>>> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>>> - node_registration_func_t unreg)
>>> -{
>>> -}
>>> #endif
>>>
>>> #define to_node(device) container_of(device, struct node, dev)
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 536a52c29035..9a72499486c1 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -33,6 +33,7 @@
>>> #include <linux/migrate.h>
>>> #include <linux/nospec.h>
>>> #include <linux/delayacct.h>
>>> +#include <linux/memory.h>
>>>
>>> #include <asm/page.h>
>>> #include <asm/pgalloc.h>
>>> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
>>> * Register hstate attributes for a single node device.
>>> * No-op if attributes already registered.
>>> */
>>> -static void hugetlb_register_node(struct node *node)
>>> +static int hugetlb_register_node(struct node *node)
>>> {
>>> struct hstate *h;
>>> struct node_hstate *nhs = &node_hstates[node->dev.id];
>>> int err;
>>>
>>> if (nhs->hugepages_kobj)
>>> - return; /* already allocated */
>>> + return 0; /* already allocated */
>>>
>>> nhs->hugepages_kobj = kobject_create_and_add("hugepages",
>>> &node->dev.kobj);
>>> if (!nhs->hugepages_kobj)
>>> - return;
>>> + return -ENOMEM;
>>>
>>> for_each_hstate(h) {
>>> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
>>> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
>>> pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
>>> h->name, node->dev.id);
>>> hugetlb_unregister_node(node);
>>> - break;
>>> + return -ENOMEM;
>>> }
>>> }
>>> + return 0;
>>> +}
>>> +
>>> +static int __meminit hugetlb_memory_callback(struct notifier_block *self,
>>> + unsigned long action, void *arg)
>>> +{
>>> + int ret = 0;
>>> + struct memory_notify *mnb = arg;
>>> + int nid = mnb->status_change_nid;
>>> +
>>> + if (nid == NUMA_NO_NODE)
>>> + return NOTIFY_DONE;
>>> +
>>> + if (action == MEM_GOING_ONLINE)
>>> + ret = hugetlb_register_node(node_devices[nid]);
>>> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
>>> + hugetlb_unregister_node(node_devices[nid]);
>>> +
>>> + return notifier_from_errno(ret);
>>> }
>>>
>>> /*
>>> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
>>> {
>>> int nid;
>>>
>>> - for_each_node_state(nid, N_MEMORY) {
>>> - struct node *node = node_devices[nid];
>>> - if (node->dev.id == nid)
>>> - hugetlb_register_node(node);
>>> - }
>>> -
>>> - /*
>>> - * Let the node device driver know we're here so it can
>>> - * [un]register hstate attributes on node hotplug.
>>> - */
>>> - register_hugetlbfs_with_node(hugetlb_register_node,
>>> - hugetlb_unregister_node);
>>> + get_online_mems();
>>> + hotplug_memory_notifier(hugetlb_memory_callback, 0);
>>> + for_each_node_state(nid, N_MEMORY)
>>> + hugetlb_register_node(node_devices[nid]);
>>> + put_online_mems();
>>> }
>>> #else /* !CONFIG_NUMA */
>>
>> Do we really *need* the memory hotplug notifier and the added complexity
>> due for handling memory-less nodes?

Hi David,

After some tries, I think it may not reduce the complexity. node_dev_init()
is called at early stage before hugetlb_register_all_nodes(). So we need to
add a mechanism to detect if the hugetlb subsystem finishes initialization
in node_dev_init() so that it can determine to help hugetlb create /sysfs
files, the mechanism is similar with the changes in drivers/base/node.c of
commit 9a30523066cd ("hugetlb: add per node hstate attributes”). This approach
may add more code than the memory-notify-based approach like this patch
implemented. And it also add the code coupling between node.c and hugetlb.c.
So I tend to use memory hotplug notifier. What’s your opinion?

Thanks,
Muchun

>
> I have found the commit introduced this mechanism, see commit:
>
> 4faf8d950ec4 ("hugetlb: handle memory hot-plug events")
>
> From the commit message, I think it is a suggestion from David Rientjes.
> I didn’t see any reasons why we need it. So Cc David Rientjes (Maybe
> he knew more context). The committer Lee and the reviewer Andi’s email
> is invalid (don’t Cc them)
>
>>
>> Why can't we simply register/unregister sysfs entries in
>> register_node/unregister_node and call it a day?
>>
>
> At least, I agree with you. Before I change to this way, let’s wait for
> some potential comments from David Rientjes.
>
>
> Thanks.
>
>> TBH, we should just have sysfs entries for memory-less nodes and not
>> care about such (corner) cases.
>>
>>
>> --
>> Thanks,
>>
>> David / dhildenb

2022-09-01 07:42:25

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal

On 01.09.22 08:35, Muchun Song wrote:
>
>
>> On Aug 24, 2022, at 11:23, Muchun Song <[email protected]> wrote:
>>
>>
>>
>>> On Aug 23, 2022, at 18:21, David Hildenbrand <[email protected]> wrote:
>>>
>>> On 19.08.22 10:00, Muchun Song wrote:
>>>> The following commit offload per-node sysfs creation and removal to a kworker and
>>>> did not say why it is needed. And it also said "I don't know that this is
>>>> absolutely required". It seems like the author was not sure as well. Since it
>>>> only complicates the code, this patch will revert the changes to simplify the code.
>>>>
>>>> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>>>>
>>>> We could use memory hotplug notifier to do per-node sysfs creation and removal
>>>> instead of inserting those operations to node registration and unregistration.
>>>> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
>>>> simplify the code.
>>>>
>>>> Signed-off-by: Muchun Song <[email protected]>
>>>
>>>
>>> [...]
>>>
>>>> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
>>>> void unregister_node(struct node *node)
>>>> {
>>>> compaction_unregister_node(node);
>>>> - hugetlb_unregister_node(node); /* no-op, if memoryless node */
>>>> node_remove_accesses(node);
>>>> node_remove_caches(node);
>>>> device_unregister(&node->dev);
>>>> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>>> (void *)&nid, func);
>>>> return;
>>>> }
>>>
>>> [...]
>>>
>>>> /*
>>>> * Create all node devices, which will properly link the node
>>>> * to applicable memory block devices and already created cpu devices.
>>>> diff --git a/include/linux/node.h b/include/linux/node.h
>>>> index 40d641a8bfb0..ea817b507f54 100644
>>>> --- a/include/linux/node.h
>>>> +++ b/include/linux/node.h
>>>> @@ -2,15 +2,15 @@
>>>> /*
>>>> * include/linux/node.h - generic node definition
>>>> *
>>>> - * This is mainly for topological representation. We define the
>>>> - * basic 'struct node' here, which can be embedded in per-arch
>>>> + * This is mainly for topological representation. We define the
>>>> + * basic 'struct node' here, which can be embedded in per-arch
>>>> * definitions of processors.
>>>> *
>>>> * Basic handling of the devices is done in drivers/base/node.c
>>>> - * and system devices are handled in drivers/base/sys.c.
>>>> + * and system devices are handled in drivers/base/sys.c.
>>>> *
>>>> * Nodes are exported via driverfs in the class/node/devices/
>>>> - * directory.
>>>> + * directory.
>>>
>>> Unrelated changes.
>>
>> Yep, a minor cleanup BTW.
>>
>>>
>>>> */
>>>> #ifndef _LINUX_NODE_H_
>>>> #define _LINUX_NODE_H_
>>>> @@ -18,7 +18,6 @@
>>>> #include <linux/device.h>
>>>> #include <linux/cpumask.h>
>>>> #include <linux/list.h>
>>>> -#include <linux/workqueue.h>
>>>>
>>>> /**
>>>> * struct node_hmem_attrs - heterogeneous memory performance attributes
>>>> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
>>>> struct node {
>>>> struct device dev;
>>>> struct list_head access_list;
>>>> -
>>>> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
>>>> - struct work_struct node_work;
>>>> -#endif
>>>> #ifdef CONFIG_HMEM_REPORTING
>>>> struct list_head cache_attrs;
>>>> struct device *cache_dev;
>>>> @@ -96,7 +91,6 @@ struct node {
>>>>
>>>> struct memory_block;
>>>> extern struct node *node_devices[];
>>>> -typedef void (*node_registration_func_t)(struct node *);
>>>>
>>>> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
>>>> void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>>> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>>>> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>>>> unsigned int cpu_nid,
>>>> unsigned access);
>>>> -
>>>> -#ifdef CONFIG_HUGETLBFS
>>>> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>>>> - node_registration_func_t unregister);
>>>> -#endif
>>>> #else
>>>> static inline void node_dev_init(void)
>>>> {
>>>> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>>>> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>>>> {
>>>> }
>>>> -
>>>> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>>>> - node_registration_func_t unreg)
>>>> -{
>>>> -}
>>>> #endif
>>>>
>>>> #define to_node(device) container_of(device, struct node, dev)
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 536a52c29035..9a72499486c1 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -33,6 +33,7 @@
>>>> #include <linux/migrate.h>
>>>> #include <linux/nospec.h>
>>>> #include <linux/delayacct.h>
>>>> +#include <linux/memory.h>
>>>>
>>>> #include <asm/page.h>
>>>> #include <asm/pgalloc.h>
>>>> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
>>>> * Register hstate attributes for a single node device.
>>>> * No-op if attributes already registered.
>>>> */
>>>> -static void hugetlb_register_node(struct node *node)
>>>> +static int hugetlb_register_node(struct node *node)
>>>> {
>>>> struct hstate *h;
>>>> struct node_hstate *nhs = &node_hstates[node->dev.id];
>>>> int err;
>>>>
>>>> if (nhs->hugepages_kobj)
>>>> - return; /* already allocated */
>>>> + return 0; /* already allocated */
>>>>
>>>> nhs->hugepages_kobj = kobject_create_and_add("hugepages",
>>>> &node->dev.kobj);
>>>> if (!nhs->hugepages_kobj)
>>>> - return;
>>>> + return -ENOMEM;
>>>>
>>>> for_each_hstate(h) {
>>>> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
>>>> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
>>>> pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
>>>> h->name, node->dev.id);
>>>> hugetlb_unregister_node(node);
>>>> - break;
>>>> + return -ENOMEM;
>>>> }
>>>> }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int __meminit hugetlb_memory_callback(struct notifier_block *self,
>>>> + unsigned long action, void *arg)
>>>> +{
>>>> + int ret = 0;
>>>> + struct memory_notify *mnb = arg;
>>>> + int nid = mnb->status_change_nid;
>>>> +
>>>> + if (nid == NUMA_NO_NODE)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + if (action == MEM_GOING_ONLINE)
>>>> + ret = hugetlb_register_node(node_devices[nid]);
>>>> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
>>>> + hugetlb_unregister_node(node_devices[nid]);
>>>> +
>>>> + return notifier_from_errno(ret);
>>>> }
>>>>
>>>> /*
>>>> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
>>>> {
>>>> int nid;
>>>>
>>>> - for_each_node_state(nid, N_MEMORY) {
>>>> - struct node *node = node_devices[nid];
>>>> - if (node->dev.id == nid)
>>>> - hugetlb_register_node(node);
>>>> - }
>>>> -
>>>> - /*
>>>> - * Let the node device driver know we're here so it can
>>>> - * [un]register hstate attributes on node hotplug.
>>>> - */
>>>> - register_hugetlbfs_with_node(hugetlb_register_node,
>>>> - hugetlb_unregister_node);
>>>> + get_online_mems();
>>>> + hotplug_memory_notifier(hugetlb_memory_callback, 0);
>>>> + for_each_node_state(nid, N_MEMORY)
>>>> + hugetlb_register_node(node_devices[nid]);
>>>> + put_online_mems();
>>>> }
>>>> #else /* !CONFIG_NUMA */
>>>
>>> Do we really *need* the memory hotplug notifier and the added complexity
>>> due for handling memory-less nodes?
>
> Hi David,
>

Hi,

thanks for playing with the idea.

> After some tries, I think it may not reduce the complexity. node_dev_init()
> is called at early stage before hugetlb_register_all_nodes(). So we need to
> add a mechanism to detect if the hugetlb subsystem finishes initialization
> in node_dev_init() so that it can determine to help hugetlb create /sysfs
> files, the mechanism is similar with the changes in drivers/base/node.c of
> commit 9a30523066cd ("hugetlb: add per node hstate attributes”). This approach

If I'm not wrong, all you need is a single call from node_dev_init()
into hugetlb code.

There, you create the hugetlb sysfs if hugetlb was already initialized,
otherwise it's a NOP as you initialize when hugetlb gets initialized.

When initializing hugetlb, you go over all added nodes and create
hugetlb sysfs.

Testing/remembering if hugetlb was initialized should be easy, no?

What's the complicated part I am missing?

> may add more code than the memory-notify-based approach like this patch
> implemented. And it also add the code coupling between node.c and hugetlb.c.
> So I tend to use memory hotplug notifier. What’s your opinion?

We have hugetlb special casing all over the place, it's an integral MM
part -- not some random driver where we'd really want decoupling.

So I don't see why the decouling would be beneficial here and how using
the memory notifier is any better then a simple callback.


But again, I did not look into the details of the necessary implementation.

Thanks!


--
Thanks,

David / dhildenb

2022-09-01 08:39:31

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH RESEND] mm: hugetlb: simplify per-node sysfs creation and removal



> On Sep 1, 2022, at 14:57, David Hildenbrand <[email protected]> wrote:
>
> On 01.09.22 08:35, Muchun Song wrote:
>>
>>
>>> On Aug 24, 2022, at 11:23, Muchun Song <[email protected]> wrote:
>>>
>>>
>>>
>>>> On Aug 23, 2022, at 18:21, David Hildenbrand <[email protected]> wrote:
>>>>
>>>> On 19.08.22 10:00, Muchun Song wrote:
>>>>> The following commit offload per-node sysfs creation and removal to a kworker and
>>>>> did not say why it is needed. And it also said "I don't know that this is
>>>>> absolutely required". It seems like the author was not sure as well. Since it
>>>>> only complicates the code, this patch will revert the changes to simplify the code.
>>>>>
>>>>> 39da08cb074c ("hugetlb: offload per node attribute registrations")
>>>>>
>>>>> We could use memory hotplug notifier to do per-node sysfs creation and removal
>>>>> instead of inserting those operations to node registration and unregistration.
>>>>> Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can
>>>>> simplify the code.
>>>>>
>>>>> Signed-off-by: Muchun Song <[email protected]>
>>>>
>>>>
>>>> [...]
>>>>
>>>>> @@ -683,7 +626,6 @@ static int register_node(struct node *node, int num)
>>>>> void unregister_node(struct node *node)
>>>>> {
>>>>> compaction_unregister_node(node);
>>>>> - hugetlb_unregister_node(node); /* no-op, if memoryless node */
>>>>> node_remove_accesses(node);
>>>>> node_remove_caches(node);
>>>>> device_unregister(&node->dev);
>>>>> @@ -905,74 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>>>> (void *)&nid, func);
>>>>> return;
>>>>> }
>>>>
>>>> [...]
>>>>
>>>>> /*
>>>>> * Create all node devices, which will properly link the node
>>>>> * to applicable memory block devices and already created cpu devices.
>>>>> diff --git a/include/linux/node.h b/include/linux/node.h
>>>>> index 40d641a8bfb0..ea817b507f54 100644
>>>>> --- a/include/linux/node.h
>>>>> +++ b/include/linux/node.h
>>>>> @@ -2,15 +2,15 @@
>>>>> /*
>>>>> * include/linux/node.h - generic node definition
>>>>> *
>>>>> - * This is mainly for topological representation. We define the
>>>>> - * basic 'struct node' here, which can be embedded in per-arch
>>>>> + * This is mainly for topological representation. We define the
>>>>> + * basic 'struct node' here, which can be embedded in per-arch
>>>>> * definitions of processors.
>>>>> *
>>>>> * Basic handling of the devices is done in drivers/base/node.c
>>>>> - * and system devices are handled in drivers/base/sys.c.
>>>>> + * and system devices are handled in drivers/base/sys.c.
>>>>> *
>>>>> * Nodes are exported via driverfs in the class/node/devices/
>>>>> - * directory.
>>>>> + * directory.
>>>>
>>>> Unrelated changes.
>>>
>>> Yep, a minor cleanup BTW.
>>>
>>>>
>>>>> */
>>>>> #ifndef _LINUX_NODE_H_
>>>>> #define _LINUX_NODE_H_
>>>>> @@ -18,7 +18,6 @@
>>>>> #include <linux/device.h>
>>>>> #include <linux/cpumask.h>
>>>>> #include <linux/list.h>
>>>>> -#include <linux/workqueue.h>
>>>>>
>>>>> /**
>>>>> * struct node_hmem_attrs - heterogeneous memory performance attributes
>>>>> @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid,
>>>>> struct node {
>>>>> struct device dev;
>>>>> struct list_head access_list;
>>>>> -
>>>>> -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
>>>>> - struct work_struct node_work;
>>>>> -#endif
>>>>> #ifdef CONFIG_HMEM_REPORTING
>>>>> struct list_head cache_attrs;
>>>>> struct device *cache_dev;
>>>>> @@ -96,7 +91,6 @@ struct node {
>>>>>
>>>>> struct memory_block;
>>>>> extern struct node *node_devices[];
>>>>> -typedef void (*node_registration_func_t)(struct node *);
>>>>>
>>>>> #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
>>>>> void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>>>>> @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>>>>> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>>>>> unsigned int cpu_nid,
>>>>> unsigned access);
>>>>> -
>>>>> -#ifdef CONFIG_HUGETLBFS
>>>>> -extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>>>>> - node_registration_func_t unregister);
>>>>> -#endif
>>>>> #else
>>>>> static inline void node_dev_init(void)
>>>>> {
>>>>> @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>>>>> static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>>>>> {
>>>>> }
>>>>> -
>>>>> -static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>>>>> - node_registration_func_t unreg)
>>>>> -{
>>>>> -}
>>>>> #endif
>>>>>
>>>>> #define to_node(device) container_of(device, struct node, dev)
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 536a52c29035..9a72499486c1 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -33,6 +33,7 @@
>>>>> #include <linux/migrate.h>
>>>>> #include <linux/nospec.h>
>>>>> #include <linux/delayacct.h>
>>>>> +#include <linux/memory.h>
>>>>>
>>>>> #include <asm/page.h>
>>>>> #include <asm/pgalloc.h>
>>>>> @@ -3967,19 +3968,19 @@ static void hugetlb_unregister_node(struct node *node)
>>>>> * Register hstate attributes for a single node device.
>>>>> * No-op if attributes already registered.
>>>>> */
>>>>> -static void hugetlb_register_node(struct node *node)
>>>>> +static int hugetlb_register_node(struct node *node)
>>>>> {
>>>>> struct hstate *h;
>>>>> struct node_hstate *nhs = &node_hstates[node->dev.id];
>>>>> int err;
>>>>>
>>>>> if (nhs->hugepages_kobj)
>>>>> - return; /* already allocated */
>>>>> + return 0; /* already allocated */
>>>>>
>>>>> nhs->hugepages_kobj = kobject_create_and_add("hugepages",
>>>>> &node->dev.kobj);
>>>>> if (!nhs->hugepages_kobj)
>>>>> - return;
>>>>> + return -ENOMEM;
>>>>>
>>>>> for_each_hstate(h) {
>>>>> err = hugetlb_sysfs_add_hstate(h, nhs->hugepages_kobj,
>>>>> @@ -3989,9 +3990,28 @@ static void hugetlb_register_node(struct node *node)
>>>>> pr_err("HugeTLB: Unable to add hstate %s for node %d\n",
>>>>> h->name, node->dev.id);
>>>>> hugetlb_unregister_node(node);
>>>>> - break;
>>>>> + return -ENOMEM;
>>>>> }
>>>>> }
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int __meminit hugetlb_memory_callback(struct notifier_block *self,
>>>>> + unsigned long action, void *arg)
>>>>> +{
>>>>> + int ret = 0;
>>>>> + struct memory_notify *mnb = arg;
>>>>> + int nid = mnb->status_change_nid;
>>>>> +
>>>>> + if (nid == NUMA_NO_NODE)
>>>>> + return NOTIFY_DONE;
>>>>> +
>>>>> + if (action == MEM_GOING_ONLINE)
>>>>> + ret = hugetlb_register_node(node_devices[nid]);
>>>>> + else if (action == MEM_CANCEL_ONLINE || action == MEM_OFFLINE)
>>>>> + hugetlb_unregister_node(node_devices[nid]);
>>>>> +
>>>>> + return notifier_from_errno(ret);
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -4003,18 +4023,11 @@ static void __init hugetlb_register_all_nodes(void)
>>>>> {
>>>>> int nid;
>>>>>
>>>>> - for_each_node_state(nid, N_MEMORY) {
>>>>> - struct node *node = node_devices[nid];
>>>>> - if (node->dev.id == nid)
>>>>> - hugetlb_register_node(node);
>>>>> - }
>>>>> -
>>>>> - /*
>>>>> - * Let the node device driver know we're here so it can
>>>>> - * [un]register hstate attributes on node hotplug.
>>>>> - */
>>>>> - register_hugetlbfs_with_node(hugetlb_register_node,
>>>>> - hugetlb_unregister_node);
>>>>> + get_online_mems();
>>>>> + hotplug_memory_notifier(hugetlb_memory_callback, 0);
>>>>> + for_each_node_state(nid, N_MEMORY)
>>>>> + hugetlb_register_node(node_devices[nid]);
>>>>> + put_online_mems();
>>>>> }
>>>>> #else /* !CONFIG_NUMA */
>>>>
>>>> Do we really *need* the memory hotplug notifier and the added complexity
>>>> due for handling memory-less nodes?
>>
>> Hi David,
>>
>
> Hi,
>
> thanks for playing with the idea.
>
>> After some tries, I think it may not reduce the complexity. node_dev_init()
>> is called at early stage before hugetlb_register_all_nodes(). So we need to
>> add a mechanism to detect if the hugetlb subsystem finishes initialization
>> in node_dev_init() so that it can determine to help hugetlb create /sysfs
>> files, the mechanism is similar with the changes in drivers/base/node.c of
>> commit 9a30523066cd ("hugetlb: add per node hstate attributes”). This approach
>
> If I'm not wrong, all you need is a single call from node_dev_init()
> into hugetlb code.
>
> There, you create the hugetlb sysfs if hugetlb was already initialized,
> otherwise it's a NOP as you initialize when hugetlb gets initialized.

Thanks for your reminder, I know how to handle it now. I’ll send a new
patch later.

Thanks,
Muchun

>
> When initializing hugetlb, you go over all added nodes and create
> hugetlb sysfs.
>
> Testing/remembering if hugetlb was initialized should be easy, no?
>
> What's the complicated part I am missing?
>
>> may add more code than the memory-notify-based approach like this patch
>> implemented. And it also add the code coupling between node.c and hugetlb.c.
>> So I tend to use memory hotplug notifier. What’s your opinion?
>
> We have hugetlb special casing all over the place, it's an integral MM
> part -- not some random driver where we'd really want decoupling.
>
> So I don't see why the decouling would be beneficial here and how using
> the memory notifier is any better then a simple callback.
>
>
> But again, I did not look into the details of the necessary implementation.
>
> Thanks!
>
>
> --
> Thanks,
>
> David / dhildenb
>