Function kswapd_stop() will be called to destroy the kswapd work thread
when all memory of a NUMA node has been offlined. But kswapd_stop() only
terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
The stale pointer will prevent kswapd_run() from creating a new work thread
when adding memory to the memory-less NUMA node again. Eventually the stale
pointer may cause invalid memory access.
Signed-off-by: Xishi Qiu <[email protected]>
Signed-off-by: Jiang Liu <[email protected]>
---
An example stack dump as below. It's reproduced with 2.6.32, but latest
kernel has the same issue.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/memory/memory391/state
CPU 11
Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
RIP: 0010:[<ffffffff81051a94>] [<ffffffff81051a94>] exit_creds+0x12/0x78
RSP: 0018:ffff8806044f1d78 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
FS: 00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
Stack:
ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
<0> ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
<0> 0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
Call Trace:
[<ffffffff8103aac5>] __put_task_struct+0x5d/0x97
[<ffffffff8104d21e>] kthread_stop+0x50/0x58
[<ffffffff810bd5b1>] offline_pages+0x324/0x3da
[<ffffffff8121111f>] memory_block_change_state+0x179/0x1db
[<ffffffff8121121f>] store_mem_state+0x9e/0xbb
[<ffffffff8111a1f1>] sysfs_write_file+0xd0/0x107
[<ffffffff810c7fe0>] vfs_write+0xad/0x169
[<ffffffff810c8158>] sys_write+0x45/0x6e
[<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
[<00007fbc0344df60>] 0x7fbc0344df60
Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
RIP [<ffffffff81051a94>] exit_creds+0x12/0x78
RSP <ffff8806044f1d78>
CR2: 0000000000000000
---[ end trace 75959287252338a5 ]---
---
mm/vmscan.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeb3bc9..7585101 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2961,8 +2961,10 @@ void kswapd_stop(int nid)
{
struct task_struct *kswapd = NODE_DATA(nid)->kswapd;
- if (kswapd)
+ if (kswapd) {
kthread_stop(kswapd);
+ NODE_DATA(nid)->kswapd = NULL;
+ }
}
static int __init kswapd_init(void)
--
1.7.1
(2012/06/14 12:44), Jiang Liu wrote:
> Function kswapd_stop() will be called to destroy the kswapd work thread
> when all memory of a NUMA node has been offlined. But kswapd_stop() only
> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
> The stale pointer will prevent kswapd_run() from creating a new work thread
> when adding memory to the memory-less NUMA node again. Eventually the stale
> pointer may cause invalid memory access.
>
> Signed-off-by: Xishi Qiu<[email protected]>
> Signed-off-by: Jiang Liu<[email protected]>
>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Hi,
On 06/14/2012 12:44 PM, Jiang Liu wrote:
> Function kswapd_stop() will be called to destroy the kswapd work thread
> when all memory of a NUMA node has been offlined. But kswapd_stop() only
> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
> The stale pointer will prevent kswapd_run() from creating a new work thread
> when adding memory to the memory-less NUMA node again. Eventually the stale
> pointer may cause invalid memory access.
>
> Signed-off-by: Xishi Qiu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Nitpick:
I saw kswapd_run and doubt why following line is there.
if (pgdat->kswapd)
return 0;
As looking thorough hotplug, I realized one can hotplug pages which are within different zones but same node.
Because kswapd live in per-node, that code is for checking kswapd already run. Right?
IMHO, better readable code is following as
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b967eda..9425c0e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -299,6 +299,7 @@ static inline void scan_unevictable_unregister_node(struct node *node)
}
#endif
+extern bool is_kswapd_running(int nid);
extern int kswapd_run(int nid);
extern void kswapd_stop(int nid);
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d7e3ec..60f9155 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
init_per_zone_wmark_min();
if (onlined_pages) {
- kswapd_run(zone_to_nid(zone));
+ if (!is_kswapd_running(zone_to_nid(zone))
+ kswapd_run(zone_to_nid(zone));
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeb3bc9..f331904 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2932,6 +2932,14 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
return NOTIFY_OK;
}
+bool is_kswapd_running(int nid)
+{
+ pg_data_t *pgdat = NODE_DATA(nid);
+ if (pgdat->kswapd)
+ return true;
+ return false;
+}
+
/*
* This kswapd start function will be called by init and node-hot-add.
* On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
@@ -2941,9 +2949,6 @@ int kswapd_run(int nid)
pg_data_t *pgdat = NODE_DATA(nid);
int ret = 0;
- if (pgdat->kswapd)
- return 0;
-
pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
if (IS_ERR(pgdat->kswapd)) {
/* failure at boot is fatal */
Anyway, it's a preference and trivial but I hope you fix that, too if you don't mind
Of course, my nitpick shouldn't prevent merging your good fix.
If you mind it, I don't care of it. :)
Thanks.
--
Kind regards,
Minchan Kim
On Wed, Jun 13, 2012 at 11:44 PM, Jiang Liu <[email protected]> wrote:
> Function kswapd_stop() will be called to destroy the kswapd work thread
> when all memory of a NUMA node has been offlined. But kswapd_stop() only
> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
> The stale pointer will prevent kswapd_run() from creating a new work thread
> when adding memory to the memory-less NUMA node again. Eventually the stale
> pointer may cause invalid memory access.
>
> Signed-off-by: Xishi Qiu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Hi Minchan,
Thanks for comments and will send out a separate patch for
readability soon based on your version.
Thanks!
Gerry
On 2012-6-14 13:31, Minchan Kim wrote:
> Hi,
>
> On 06/14/2012 12:44 PM, Jiang Liu wrote:
>
>> Function kswapd_stop() will be called to destroy the kswapd work thread
>> when all memory of a NUMA node has been offlined. But kswapd_stop() only
>> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
>> The stale pointer will prevent kswapd_run() from creating a new work thread
>> when adding memory to the memory-less NUMA node again. Eventually the stale
>> pointer may cause invalid memory access.
>>
>> Signed-off-by: Xishi Qiu <[email protected]>
>> Signed-off-by: Jiang Liu <[email protected]>
>
>
> Reviewed-by: Minchan Kim <[email protected]>
>
> Nitpick:
>
> I saw kswapd_run and doubt why following line is there.
>
> if (pgdat->kswapd)
> return 0;
>
> As looking thorough hotplug, I realized one can hotplug pages which are within different zones but same node.
> Because kswapd live in per-node, that code is for checking kswapd already run. Right?
Yes, I think so. We could also add new memory pages to existing zones too.
>
> IMHO, better readable code is following as
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b967eda..9425c0e 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -299,6 +299,7 @@ static inline void scan_unevictable_unregister_node(struct node *node)
> }
> #endif
>
> +extern bool is_kswapd_running(int nid);
> extern int kswapd_run(int nid);
> extern void kswapd_stop(int nid);
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0d7e3ec..60f9155 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
> init_per_zone_wmark_min();
>
> if (onlined_pages) {
> - kswapd_run(zone_to_nid(zone));
> + if (!is_kswapd_running(zone_to_nid(zone))
> + kswapd_run(zone_to_nid(zone));
> node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
> }
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eeb3bc9..f331904 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2932,6 +2932,14 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
> return NOTIFY_OK;
> }
>
> +bool is_kswapd_running(int nid)
> +{
> + pg_data_t *pgdat = NODE_DATA(nid);
> + if (pgdat->kswapd)
> + return true;
> + return false;
> +}
> +
> /*
> * This kswapd start function will be called by init and node-hot-add.
> * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
> @@ -2941,9 +2949,6 @@ int kswapd_run(int nid)
> pg_data_t *pgdat = NODE_DATA(nid);
> int ret = 0;
>
> - if (pgdat->kswapd)
> - return 0;
> -
> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
> if (IS_ERR(pgdat->kswapd)) {
> /* failure at boot is fatal */
>
> Anyway, it's a preference and trivial but I hope you fix that, too if you don't mind
> Of course, my nitpick shouldn't prevent merging your good fix.
> If you mind it, I don't care of it. :)
>
> Thanks.
Add kswapd_is_running() to check whether the kswapd worker thread is already
running before calling kswapd_run() when onlining memory pages.
It's based on a draft version from Minchan Kim <[email protected]>.
Signed-off-by: Jiang Liu <[email protected]>
---
include/linux/swap.h | 5 +++++
mm/memory_hotplug.c | 3 ++-
mm/vmscan.c | 3 +--
3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index c84ec68..36249d5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -301,6 +301,11 @@ static inline void scan_unevictable_unregister_node(struct node *node)
extern int kswapd_run(int nid);
extern void kswapd_stop(int nid);
+static inline bool kswapd_is_running(int nid)
+{
+ return !!(NODE_DATA(nid)->kswapd);
+}
+
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
extern int mem_cgroup_swappiness(struct mem_cgroup *mem);
#else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d7e3ec..88e479d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
init_per_zone_wmark_min();
if (onlined_pages) {
- kswapd_run(zone_to_nid(zone));
+ if (!kswapd_is_running(zone_to_nid(zone)))
+ kswapd_run(zone_to_nid(zone));
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7585101..3dafdbe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2941,8 +2941,7 @@ int kswapd_run(int nid)
pg_data_t *pgdat = NODE_DATA(nid);
int ret = 0;
- if (pgdat->kswapd)
- return 0;
+ BUG_ON(pgdat->kswapd);
pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
if (IS_ERR(pgdat->kswapd)) {
--
1.7.1
(2012/06/14 17:49), Jiang Liu wrote:
> Add kswapd_is_running() to check whether the kswapd worker thread is already
> running before calling kswapd_run() when onlining memory pages.
>
> It's based on a draft version from Minchan Kim<[email protected]>.
>
> Signed-off-by: Jiang Liu<[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
On Thu, Jun 14, 2012 at 4:49 AM, Jiang Liu <[email protected]> wrote:
> Add kswapd_is_running() to check whether the kswapd worker thread is already
> running before calling kswapd_run() when onlining memory pages.
>
> It's based on a draft version from Minchan Kim <[email protected]>.
>
> Signed-off-by: Jiang Liu <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
On Thu, Jun 14, 2012 at 04:49:36PM +0800, Jiang Liu wrote:
> Add kswapd_is_running() to check whether the kswapd worker thread is already
> running before calling kswapd_run() when onlining memory pages.
>
> It's based on a draft version from Minchan Kim <[email protected]>.
>
> Signed-off-by: Jiang Liu <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Thanks!
On Thu, Jun 14, 2012 at 11:44:51AM +0800, Jiang Liu wrote:
> Function kswapd_stop() will be called to destroy the kswapd work thread
> when all memory of a NUMA node has been offlined. But kswapd_stop() only
> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
> The stale pointer will prevent kswapd_run() from creating a new work thread
> when adding memory to the memory-less NUMA node again. Eventually the stale
> pointer may cause invalid memory access.
>
> Signed-off-by: Xishi Qiu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
>
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs
On Thu, 14 Jun 2012, Jiang Liu wrote:
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index c84ec68..36249d5 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -301,6 +301,11 @@ static inline void scan_unevictable_unregister_node(struct node *node)
>
> extern int kswapd_run(int nid);
> extern void kswapd_stop(int nid);
> +static inline bool kswapd_is_running(int nid)
> +{
> + return !!(NODE_DATA(nid)->kswapd);
> +}
> +
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> extern int mem_cgroup_swappiness(struct mem_cgroup *mem);
> #else
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0d7e3ec..88e479d 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
> init_per_zone_wmark_min();
>
> if (onlined_pages) {
> - kswapd_run(zone_to_nid(zone));
> + if (!kswapd_is_running(zone_to_nid(zone)))
> + kswapd_run(zone_to_nid(zone));
> node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
> }
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7585101..3dafdbe 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2941,8 +2941,7 @@ int kswapd_run(int nid)
> pg_data_t *pgdat = NODE_DATA(nid);
> int ret = 0;
>
> - if (pgdat->kswapd)
> - return 0;
> + BUG_ON(pgdat->kswapd);
>
> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
> if (IS_ERR(pgdat->kswapd)) {
This isn't better, there's no functional change and you've just added a
second conditional for no reason and an unnecessary kswapd_is_running()
function.
More concerning is that online_pages() doesn't check the return value of
kswapd_run(). We should probably fail the memory hotplug operation that
onlines a new node and doesn't have a kswapd running and cleanup after
ourselves in online_pages() with some sane error handling.
On Thu, 14 Jun 2012, Jiang Liu wrote:
> Function kswapd_stop() will be called to destroy the kswapd work thread
> when all memory of a NUMA node has been offlined. But kswapd_stop() only
> terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL.
> The stale pointer will prevent kswapd_run() from creating a new work thread
> when adding memory to the memory-less NUMA node again. Eventually the stale
> pointer may cause invalid memory access.
>
> Signed-off-by: Xishi Qiu <[email protected]>
> Signed-off-by: Jiang Liu <[email protected]>
Acked-by: David Rientjes <[email protected]>
On 06/17/2012 11:19 AM, David Rientjes wrote:
> On Thu, 14 Jun 2012, Jiang Liu wrote:
>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index c84ec68..36249d5 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -301,6 +301,11 @@ static inline void scan_unevictable_unregister_node(struct node *node)
>>
>> extern int kswapd_run(int nid);
>> extern void kswapd_stop(int nid);
>> +static inline bool kswapd_is_running(int nid)
>> +{
>> + return !!(NODE_DATA(nid)->kswapd);
>> +}
>> +
>> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>> extern int mem_cgroup_swappiness(struct mem_cgroup *mem);
>> #else
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0d7e3ec..88e479d 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>> init_per_zone_wmark_min();
>>
>> if (onlined_pages) {
>> - kswapd_run(zone_to_nid(zone));
>> + if (!kswapd_is_running(zone_to_nid(zone)))
>> + kswapd_run(zone_to_nid(zone));
>> node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
>> }
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 7585101..3dafdbe 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2941,8 +2941,7 @@ int kswapd_run(int nid)
>> pg_data_t *pgdat = NODE_DATA(nid);
>> int ret = 0;
>>
>> - if (pgdat->kswapd)
>> - return 0;
>> + BUG_ON(pgdat->kswapd);
>>
>> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
>> if (IS_ERR(pgdat->kswapd)) {
>
> This isn't better, there's no functional change and you've just added a
> second conditional for no reason and an unnecessary kswapd_is_running()
> function.
Tend to agree.
Now that I think about it, it's enough to add comment.
>
> More concerning is that online_pages() doesn't check the return value of
> kswapd_run(). We should probably fail the memory hotplug operation that
> onlines a new node and doesn't have a kswapd running and cleanup after
> ourselves in online_pages() with some sane error handling.
Yeb. It's more valuable.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
--
Kind regards,
Minchan Kim
(6/17/12 9:12 PM), Minchan Kim wrote:
> On 06/17/2012 11:19 AM, David Rientjes wrote:
>
>> On Thu, 14 Jun 2012, Jiang Liu wrote:
>>
>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>> index c84ec68..36249d5 100644
>>> --- a/include/linux/swap.h
>>> +++ b/include/linux/swap.h
>>> @@ -301,6 +301,11 @@ static inline void scan_unevictable_unregister_node(struct node *node)
>>>
>>> extern int kswapd_run(int nid);
>>> extern void kswapd_stop(int nid);
>>> +static inline bool kswapd_is_running(int nid)
>>> +{
>>> + return !!(NODE_DATA(nid)->kswapd);
>>> +}
>>> +
>>> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>>> extern int mem_cgroup_swappiness(struct mem_cgroup *mem);
>>> #else
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 0d7e3ec..88e479d 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -522,7 +522,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>> init_per_zone_wmark_min();
>>>
>>> if (onlined_pages) {
>>> - kswapd_run(zone_to_nid(zone));
>>> + if (!kswapd_is_running(zone_to_nid(zone)))
>>> + kswapd_run(zone_to_nid(zone));
>>> node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
>>> }
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 7585101..3dafdbe 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -2941,8 +2941,7 @@ int kswapd_run(int nid)
>>> pg_data_t *pgdat = NODE_DATA(nid);
>>> int ret = 0;
>>>
>>> - if (pgdat->kswapd)
>>> - return 0;
>>> + BUG_ON(pgdat->kswapd);
>>>
>>> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
>>> if (IS_ERR(pgdat->kswapd)) {
>>
>> This isn't better, there's no functional change and you've just added a
>> second conditional for no reason and an unnecessary kswapd_is_running()
>> function.
>
> Tend to agree.
> Now that I think about it, it's enough to add comment.
Ok, I'd like to handle this issue because now I have some mem-hotplug related tirivial
fixes. So, to add one more patch is not big bother to me.
> This isn't better, there's no functional change and you've just added a
> second conditional for no reason and an unnecessary kswapd_is_running()
> function.
>
> More concerning is that online_pages() doesn't check the return value of
> kswapd_run(). We should probably fail the memory hotplug operation that
> onlines a new node and doesn't have a kswapd running and cleanup after
> ourselves in online_pages() with some sane error handling.
Hi David,
Good points! Is it feasible to use schedule_delayed_work_on() to
retry kswapd_run() instead of ralling back the online operation in case
kswapd_run() failed to create the work thread?
Thank!
Gerry