From: Oscar Salvador <[email protected]>
v3 -> v4:
- Make nodemask_t a stack variable
- Added Reviewed-by from David and Pavel
v2 -> v3:
- NODEMASK_FREE can deal with NULL pointers, so do not
make it conditional (by David).
- Split up node_online's check patch (David's suggestion)
- Added Reviewed-by from Andrew and David
- Fix checkpath.pl warnings
This patchset does some cleanups and refactoring in the memory-hotplug code.
The first and the second patch are pretty straightforward, as they
only remove unused arguments/checks.
The third patch refactors unregister_mem_sect_under_nodes a bit by re-defining
nodemask_t as a stack variable. (More details in Patch3's changelog)
The fourth patch removes a node_online check. (More details in Patch4's changelog)
Since this change has a patch for itself, we could quickly revert it
if we notice that something is wrong with it, or drop it if people
are concerned about it.
Oscar Salvador (4):
mm/memory-hotplug: Drop unused args from remove_memory_section
mm/memory_hotplug: Drop mem_blk check from
unregister_mem_sect_under_nodes
mm/memory_hotplug: Define nodemask_t as a stack variable
mm/memory_hotplug: Drop node_online check in
unregister_mem_sect_under_nodes
drivers/base/memory.c | 5 ++---
drivers/base/node.c | 22 ++++++----------------
include/linux/node.h | 5 ++---
3 files changed, 10 insertions(+), 22 deletions(-)
--
2.13.6
From: Oscar Salvador <[email protected]>
Currently, unregister_mem_sect_under_nodes() tries to allocate a nodemask_t
in order to check whithin the loop which nodes have already been unlinked,
so we do not repeat the operation on them.
NODEMASK_ALLOC calls kmalloc() if NODES_SHIFT > 8, otherwise
it just declares a nodemask_t variable whithin the stack.
Since kmalloc() can fail, we actually check whether NODEMASK_ALLOC failed
or not, and we return -ENOMEM accordingly.
remove_memory_section() does not check for the return value though.
It is pretty rare that such a tiny allocation can fail, but if it does,
we will be left with dangled symlinks under /sys/devices/system/node/,
since the mem_blk's directories will be removed no matter what
unregister_mem_sect_under_nodes() returns.
One way to solve this is to check whether unlinked_nodes is NULL or not.
In the case it is not, we can just use it as before, but if it is NULL,
we can just skip the node_test_and_set check, and call sysfs_remove_link()
unconditionally.
This is harmless as sysfs_remove_link() backs off somewhere down the chain
in case the link has already been removed.
This method was presented in v3 of the path [1].
But since the maximum number of nodes we can have is 1024,
when NODES_SHIFT = 10, that gives us a nodemask_t of 128 bytes.
Although everything depends on how deep the stack is, I think we can afford
to define the nodemask_t variable whithin the stack.
That simplifies the code, and we do not need to worry about untested error
code paths.
If we see that this causes troubles with the stack, we can always return to [1].
[1] https://patchwork.kernel.org/patch/10566673/
Signed-off-by: Oscar Salvador <[email protected]>
---
drivers/base/node.c | 16 ++++++----------
include/linux/node.h | 5 ++---
2 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index dd3bdab230b2..6b8c9b4537c9 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -449,35 +449,31 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
}
/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
unsigned long phys_index)
{
- NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
+ nodemask_t unlinked_nodes;
unsigned long pfn, sect_start_pfn, sect_end_pfn;
- if (!unlinked_nodes)
- return -ENOMEM;
- nodes_clear(*unlinked_nodes);
+ nodes_clear(unlinked_nodes);
sect_start_pfn = section_nr_to_pfn(phys_index);
sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
- int nid;
+ int nid = get_nid_for_pfn(pfn);
- nid = get_nid_for_pfn(pfn);
if (nid < 0)
continue;
if (!node_online(nid))
continue;
- if (node_test_and_set(nid, *unlinked_nodes))
+ if (node_test_and_set(nid, unlinked_nodes))
continue;
+
sysfs_remove_link(&node_devices[nid]->dev.kobj,
kobject_name(&mem_blk->dev.kobj));
sysfs_remove_link(&mem_blk->dev.kobj,
kobject_name(&node_devices[nid]->dev.kobj));
}
- NODEMASK_FREE(unlinked_nodes);
- return 0;
}
int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 257bb3d6d014..1203378e596a 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -72,7 +72,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int register_mem_sect_under_node(struct memory_block *mem_blk,
void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+extern void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
unsigned long phys_index);
#ifdef CONFIG_HUGETLBFS
@@ -105,10 +105,9 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
{
return 0;
}
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+static inline void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
unsigned long phys_index)
{
- return 0;
}
static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
--
2.13.6
From: Oscar Salvador <[email protected]>
unregister_memory_section() calls remove_memory_section()
with three arguments:
* node_id
* section
* phys_device
Neither node_id nor phys_device are used.
Let us drop them from the function.
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
drivers/base/memory.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..2c622a9a7490 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -752,8 +752,7 @@ unregister_memory(struct memory_block *memory)
device_unregister(&memory->dev);
}
-static int remove_memory_section(unsigned long node_id,
- struct mem_section *section, int phys_device)
+static int remove_memory_section(struct mem_section *section)
{
struct memory_block *mem;
@@ -785,7 +784,7 @@ int unregister_memory_section(struct mem_section *section)
if (!present_section(section))
return -EINVAL;
- return remove_memory_section(0, section, 0);
+ return remove_memory_section(section);
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.13.6
From: Oscar Salvador <[email protected]>
Before calling to unregister_mem_sect_under_nodes(),
remove_memory_section() already checks if we got a valid memory_block.
No need to check that again in unregister_mem_sect_under_nodes().
If more functions start using unregister_mem_sect_under_nodes() in the
future, we can always place a WARN_ON to catch null mem_blk's so we can
safely back off.
For now, let us keep the check in remove_memory_section() since it is the
only function that uses it.
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
---
drivers/base/node.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 1ac4c36e13bb..dd3bdab230b2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -455,10 +455,6 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
- if (!mem_blk) {
- NODEMASK_FREE(unlinked_nodes);
- return -EFAULT;
- }
if (!unlinked_nodes)
return -ENOMEM;
nodes_clear(*unlinked_nodes);
--
2.13.6
From: Oscar Salvador <[email protected]>
We are getting the nid from the pages that are not yet removed,
but a node can only be offline when its memory/cpu's have been removed.
Therefore, we know that the node is still online.
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
drivers/base/node.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 6b8c9b4537c9..01e9190be010 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -464,8 +464,6 @@ void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
if (nid < 0)
continue;
- if (!node_online(nid))
- continue;
if (node_test_and_set(nid, unlinked_nodes))
continue;
--
2.13.6
On 17.08.2018 11:00, Oscar Salvador wrote:
> From: Oscar Salvador <[email protected]>
>
> Currently, unregister_mem_sect_under_nodes() tries to allocate a nodemask_t
> in order to check whithin the loop which nodes have already been unlinked,
> so we do not repeat the operation on them.
>
> NODEMASK_ALLOC calls kmalloc() if NODES_SHIFT > 8, otherwise
> it just declares a nodemask_t variable whithin the stack.
>
> Since kmalloc() can fail, we actually check whether NODEMASK_ALLOC failed
> or not, and we return -ENOMEM accordingly.
> remove_memory_section() does not check for the return value though.
> It is pretty rare that such a tiny allocation can fail, but if it does,
> we will be left with dangled symlinks under /sys/devices/system/node/,
> since the mem_blk's directories will be removed no matter what
> unregister_mem_sect_under_nodes() returns.
>
> One way to solve this is to check whether unlinked_nodes is NULL or not.
> In the case it is not, we can just use it as before, but if it is NULL,
> we can just skip the node_test_and_set check, and call sysfs_remove_link()
> unconditionally.
> This is harmless as sysfs_remove_link() backs off somewhere down the chain
> in case the link has already been removed.
> This method was presented in v3 of the path [1].
>
> But since the maximum number of nodes we can have is 1024,
> when NODES_SHIFT = 10, that gives us a nodemask_t of 128 bytes.
> Although everything depends on how deep the stack is, I think we can afford
> to define the nodemask_t variable whithin the stack.
>
> That simplifies the code, and we do not need to worry about untested error
> code paths.
>
> If we see that this causes troubles with the stack, we can always return to [1].
>
> [1] https://patchwork.kernel.org/patch/10566673/
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> drivers/base/node.c | 16 ++++++----------
> include/linux/node.h | 5 ++---
> 2 files changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index dd3bdab230b2..6b8c9b4537c9 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -449,35 +449,31 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> }
>
> /* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> +void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> unsigned long phys_index)
I am a friend of fixing up alignment of other parameters.
> {
> - NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
> + nodemask_t unlinked_nodes;
> unsigned long pfn, sect_start_pfn, sect_end_pfn;
>
> - if (!unlinked_nodes)
> - return -ENOMEM;
> - nodes_clear(*unlinked_nodes);
> + nodes_clear(unlinked_nodes);
>
> sect_start_pfn = section_nr_to_pfn(phys_index);
> sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
> - int nid;
> + int nid = get_nid_for_pfn(pfn);
>
> - nid = get_nid_for_pfn(pfn);
> if (nid < 0)
> continue;
> if (!node_online(nid))
> continue;
> - if (node_test_and_set(nid, *unlinked_nodes))
> + if (node_test_and_set(nid, unlinked_nodes))
> continue;
> +
> sysfs_remove_link(&node_devices[nid]->dev.kobj,
> kobject_name(&mem_blk->dev.kobj));
> sysfs_remove_link(&mem_blk->dev.kobj,
> kobject_name(&node_devices[nid]->dev.kobj));
> }
> - NODEMASK_FREE(unlinked_nodes);
> - return 0;
> }
>
> int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..1203378e596a 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -72,7 +72,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> +extern void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> unsigned long phys_index);
dito
>
> #ifdef CONFIG_HUGETLBFS
> @@ -105,10 +105,9 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
> {
> return 0;
> }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> +static inline void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> unsigned long phys_index)
dito
> {
> - return 0;
> }
>
> static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>
We'll find out if we have enough stack :) But this is definitely simpler.
--
Thanks,
David / dhildenb
On Fri, Aug 17, 2018 at 11:00:13AM +0200, Oscar Salvador wrote:
> From: Oscar Salvador <[email protected]>
>
> v3 -> v4:
> - Make nodemask_t a stack variable
> - Added Reviewed-by from David and Pavel
>
> v2 -> v3:
> - NODEMASK_FREE can deal with NULL pointers, so do not
> make it conditional (by David).
> - Split up node_online's check patch (David's suggestion)
> - Added Reviewed-by from Andrew and David
> - Fix checkpath.pl warnings
Andrew, will you pick up this patchset?
I saw that v3 was removed from the -mm tree because it was about
to be updated with a new version (this one), but I am not sure if
anything wrong happened.
Thanks
--
Oscar Salvador
SUSE L3
On Tue, 21 Aug 2018 18:21:22 +0200 Oscar Salvador <[email protected]> wrote:
> On Fri, Aug 17, 2018 at 11:00:13AM +0200, Oscar Salvador wrote:
> > From: Oscar Salvador <[email protected]>
> >
> > v3 -> v4:
> > - Make nodemask_t a stack variable
> > - Added Reviewed-by from David and Pavel
> >
> > v2 -> v3:
> > - NODEMASK_FREE can deal with NULL pointers, so do not
> > make it conditional (by David).
> > - Split up node_online's check patch (David's suggestion)
> > - Added Reviewed-by from Andrew and David
> > - Fix checkpath.pl warnings
>
> Andrew, will you pick up this patchset?
> I saw that v3 was removed from the -mm tree because it was about
> to be updated with a new version (this one), but I am not sure if
> anything wrong happened.
Yes, things are still changing and we're late in the merge window. I
decided to park it and shall take it up again after 4.19-rc1.
On Fri, Aug 17, 2018 at 11:00:16AM +0200, Oscar Salvador wrote:
> Signed-off-by: Oscar Salvador <[email protected]>
Pavel, could you please review this?
AFAIK, the change made sense to you.
Andrew was about to take the patchset after the merge window,
but I think that a Reviewed-by would still make sense.
Thanks
--
Oscar Salvador
SUSE L3
On 8/17/18 5:00 AM, Oscar Salvador wrote:
> From: Oscar Salvador <[email protected]>
>
> Currently, unregister_mem_sect_under_nodes() tries to allocate a nodemask_t
> in order to check whithin the loop which nodes have already been unlinked,
> so we do not repeat the operation on them.
>
> NODEMASK_ALLOC calls kmalloc() if NODES_SHIFT > 8, otherwise
> it just declares a nodemask_t variable whithin the stack.
>
> Since kmalloc() can fail, we actually check whether NODEMASK_ALLOC failed
> or not, and we return -ENOMEM accordingly.
> remove_memory_section() does not check for the return value though.
> It is pretty rare that such a tiny allocation can fail, but if it does,
> we will be left with dangled symlinks under /sys/devices/system/node/,
> since the mem_blk's directories will be removed no matter what
> unregister_mem_sect_under_nodes() returns.
>
> One way to solve this is to check whether unlinked_nodes is NULL or not.
> In the case it is not, we can just use it as before, but if it is NULL,
> we can just skip the node_test_and_set check, and call sysfs_remove_link()
> unconditionally.
> This is harmless as sysfs_remove_link() backs off somewhere down the chain
> in case the link has already been removed.
> This method was presented in v3 of the path [1].
>
> But since the maximum number of nodes we can have is 1024,
> when NODES_SHIFT = 10, that gives us a nodemask_t of 128 bytes.
> Although everything depends on how deep the stack is, I think we can afford
> to define the nodemask_t variable whithin the stack.
>
> That simplifies the code, and we do not need to worry about untested error
> code paths.
>
> If we see that this causes troubles with the stack, we can always return to [1].
>
> Signed-off-by: Oscar Salvador <[email protected]>
LGTM:
Reviewed-by: Pavel Tatashin <[email protected]>
Thank you,
Pavel
On Tue, Aug 21, 2018 at 01:43:18PM -0700, Andrew Morton wrote:
> On Tue, 21 Aug 2018 18:21:22 +0200 Oscar Salvador <[email protected]> wrote:
>
> > On Fri, Aug 17, 2018 at 11:00:13AM +0200, Oscar Salvador wrote:
> > > From: Oscar Salvador <[email protected]>
> > >
> > > v3 -> v4:
> > > - Make nodemask_t a stack variable
> > > - Added Reviewed-by from David and Pavel
> > >
> > > v2 -> v3:
> > > - NODEMASK_FREE can deal with NULL pointers, so do not
> > > make it conditional (by David).
> > > - Split up node_online's check patch (David's suggestion)
> > > - Added Reviewed-by from Andrew and David
> > > - Fix checkpath.pl warnings
> >
> > Andrew, will you pick up this patchset?
> > I saw that v3 was removed from the -mm tree because it was about
> > to be updated with a new version (this one), but I am not sure if
> > anything wrong happened.
>
> Yes, things are still changing and we're late in the merge window. I
> decided to park it and shall take it up again after 4.19-rc1.
Hi Andrew,
I just got the Reviewed-by from Pavel for patch3.
So you may consider it for -mm when you got some time.
Thanks
--
Oscar Salvador
SUSE L3