2014-11-11 09:46:35

by Kirill Tkhai

Subject: [PATCH] sched/numa: Init numa balancing fields of init_task


We do not initialize init_task.numa_preferred_nid,
but this value is inherited by the userspace "init"
process:

rest_init()->kernel_thread(kernel_init)->do_fork(CLONE_VM);

static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
{
	...
	if (clone_flags & CLONE_VM)
		p->numa_preferred_nid = current->numa_preferred_nid;
	else
		p->numa_preferred_nid = -1;
	...
}

kernel_init() then becomes the userspace "init" process.

So we propagate a nid that was never explicitly set to userspace,
where it may be used during numa balancing.

We currently have no reports of this causing a problem, but it
seems we should set the field explicitly to be sure.

Even though init_task.numa_preferred_nid is zero (fields omitted
from a static initializer are zero-filled), we may meet a weird
configuration without nid#0. On sparc64, where processors are
numbered physically, I saw a machine without cpu#1 while cpu#2
existed; something similar may be possible with numa nodes.
So let's initialize it explicitly and be sure we're safe.

Signed-off-by: Kirill Tkhai <[email protected]>
---
include/linux/init_task.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 77fc43f..5f30ac8 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -166,6 +166,15 @@ extern struct task_group root_task_group;
 # define INIT_RT_MUTEXES(tsk)
 #endif
 
+#ifdef CONFIG_NUMA_BALANCING
+# define INIT_NUMA_BALANCING(tsk) \
+	.numa_preferred_nid = -1, \
+	.numa_group = NULL, \
+	.numa_faults = NULL,
+#else
+# define INIT_NUMA_BALANCING(tsk)
+#endif
+
 /*
  * INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -237,6 +246,7 @@ extern struct task_group root_task_group;
 	INIT_CPUSET_SEQ(tsk) \
 	INIT_RT_MUTEXES(tsk) \
 	INIT_VTIME(tsk) \
+	INIT_NUMA_BALANCING(tsk) \
 }



Subject: [tip:sched/core] sched/numa: Init numa balancing fields of init_task

Commit-ID: d8b163c4c657478ef33c082cff78d03a4ca07bb2
Gitweb: http://git.kernel.org/tip/d8b163c4c657478ef33c082cff78d03a4ca07bb2
Author: Kirill Tkhai <[email protected]>
AuthorDate: Tue, 11 Nov 2014 12:46:29 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 16 Nov 2014 10:59:01 +0100

sched/numa: Init numa balancing fields of init_task

Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Sergey Dyasly <[email protected]>
Link: http://lkml.kernel.org/r/1415699189.15631.6.camel@tkhai
Signed-off-by: Ingo Molnar <[email protected]>