2019-12-19 12:40:49

by Kirill Tkhai

Subject: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

In kernel/sched/Makefile, the files describing the different sched classes
already go in order from the lowest priority class to the highest:

idle.o fair.o rt.o deadline.o stop_task.o

The GNU linker documentation says that sections appear in the order in
which they are seen during the link (see [1]):

>Normally, the linker will place files and sections matched by wildcards
>in the order in which they are seen during the link. You can change this
>by using the SORT keyword, which appears before a wildcard pattern
>in parentheses (e.g., SORT(.text*)).

So we may expect that const variables from idle.o will go before ro
variables from fair.o in the RO_DATA section, that ro variables from fair.o
will go before ro variables from rt.o, and so on.

(Also, it looks like this linking order is already relied upon in the
kernel, e.g. in drivers/md/Makefile.)
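
For illustration (not part of the patch), a minimal userspace sketch of the
same assumption, using hypothetical files a.c and b.c that each define one
const marker:

	/* a.c */    const int a_marker = 1;
	/* b.c */    const int b_marker = 2;

	/* main.c */
	#include <stdio.h>

	extern const int a_marker;
	extern const int b_marker;

	int main(void)
	{
		/* Built as "cc -o test main.c a.c b.c" with no section
		 * sorting, the expectation is &a_marker < &b_marker,
		 * mirroring idle_sched_class vs fair_sched_class. */
		printf("a before b: %d\n", &a_marker < &b_marker);
		return 0;
	}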

Thus, we may introduce an optimization based on xxx_sched_class addresses
in these two hot scheduler functions: pick_next_task() and check_preempt_curr().

One more result of the patch is that the object file becomes a little
smaller (excluding the added BUG_ON(), which goes into the __init section):

$size kernel/sched/core.o
text data bss dec hex filename
before: 66446 18957 676 86079 1503f kernel/sched/core.o
after: 66398 18957 676 86031 1500f kernel/sched/core.o

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html

Signed-off-by: Kirill Tkhai <[email protected]>
---
kernel/sched/Makefile | 2 ++
kernel/sched/core.c | 24 +++++++++---------------
2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 5fc9c9b70862..f78f177c660a 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -23,6 +23,8 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
endif

obj-y += core.o loadavg.o clock.o cputime.o
+# Order is significant: a class xxx with higher priority is described by the
+# variable xxx_sched_class with a bigger address. See BUG_ON() in sched_init().
obj-y += idle.o fair.o rt.o deadline.o
obj-y += wait.o wait_bit.o swait.o completion.o

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15508c202bf5..befdd7158b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1416,20 +1416,10 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
- const struct sched_class *class;
-
- if (p->sched_class == rq->curr->sched_class) {
+ if (p->sched_class == rq->curr->sched_class)
rq->curr->sched_class->check_preempt_curr(rq, p, flags);
- } else {
- for_each_class(class) {
- if (class == rq->curr->sched_class)
- break;
- if (class == p->sched_class) {
- resched_curr(rq);
- break;
- }
- }
- }
+ else if (p->sched_class > rq->curr->sched_class)
+ resched_curr(rq);

/*
* A queue event has occurred, and we're going to schedule. In
@@ -3914,8 +3904,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* higher scheduling class, because otherwise those loose the
* opportunity to pull in more work from other CPUs.
*/
- if (likely((prev->sched_class == &idle_sched_class ||
- prev->sched_class == &fair_sched_class) &&
+ if (likely(prev->sched_class <= &fair_sched_class &&
rq->nr_running == rq->cfs.h_nr_running)) {

p = pick_next_task_fair(rq, prev, rf);
@@ -6569,6 +6558,11 @@ void __init sched_init(void)
unsigned long ptr = 0;
int i;

+ BUG_ON(&idle_sched_class > &fair_sched_class ||
+ &fair_sched_class > &rt_sched_class ||
+ &rt_sched_class > &dl_sched_class ||
+ &dl_sched_class > &stop_sched_class);
+
wait_bit_init();

#ifdef CONFIG_FAIR_GROUP_SCHED



2019-12-19 13:13:52

by Peter Zijlstra

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> In kernel/sched/Makefile files, describing different sched classes, already
> go in the order from the lowest priority class to the highest priority class:
>
> idle.o fair.o rt.o deadline.o stop_task.o
>
> The documentation of GNU linker says, that section appears in the order
> they are seen during link time (see [1]):
>
> >Normally, the linker will place files and sections matched by wildcards
> >in the order in which they are seen during the link. You can change this
> >by using the SORT keyword, which appears before a wildcard pattern
> >in parentheses (e.g., SORT(.text*)).
>
> So, we may expect const variables from idle.o will go before ro variables
> from fair.o in RO_DATA section, while ro variables from fair.o will go
> before ro variables from rt.o, etc.
>
> (Also, it looks like the linking order is already used in kernel, e.g.
> in drivers/md/Makefile)
>
> Thus, we may introduce an optimization based on xxx_sched_class addresses
> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>
> One more result of the patch is that size of object file becomes a little
> less (excluding added BUG_ON(), which goes in __init section):
>
> $size kernel/sched/core.o
> text data bss dec hex filename
> before: 66446 18957 676 86079 1503f kernel/sched/core.o
> after: 66398 18957 676 86031 1500f kernel/sched/core.o

Does LTO preserve this behaviour? I've never quite dared do this exact
optimization.

2019-12-19 13:51:51

by Steven Rostedt

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 15:39:14 +0300
Kirill Tkhai <[email protected]> wrote:

> @@ -6569,6 +6558,11 @@ void __init sched_init(void)
> unsigned long ptr = 0;
> int i;
>
> + BUG_ON(&idle_sched_class > &fair_sched_class ||
> + &fair_sched_class > &rt_sched_class ||
> + &rt_sched_class > &dl_sched_class ||
> + &dl_sched_class > &stop_sched_class);
> +

Can this be a BUILD_BUG_ON? These addresses should all be constants.

-- Steve



> wait_bit_init();
>

2019-12-19 13:54:54

by Kirill Tkhai

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 16:50, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 15:39:14 +0300
> Kirill Tkhai <[email protected]> wrote:
>
>> @@ -6569,6 +6558,11 @@ void __init sched_init(void)
>> unsigned long ptr = 0;
>> int i;
>>
>> + BUG_ON(&idle_sched_class > &fair_sched_class ||
>> + &fair_sched_class > &rt_sched_class ||
>> + &rt_sched_class > &dl_sched_class ||
>> + &dl_sched_class > &stop_sched_class);
>> +
>
> Can this be a BUILD_BUG_ON? These address should all be constants.

BUILD_BUG_ON() is a compile-time check, while the addresses are assigned
at link time, aren't they?!

Anyway, plain BUILD_BUG_ON() fails here with the following:

In file included from ./arch/x86/include/asm/current.h:5,
from ./include/linux/sched.h:12,
from kernel/sched/sched.h:5,
from kernel/sched/core.c:9:
kernel/sched/core.c: In function ‘sched_init’:
./include/linux/compiler.h:394:38: error: call to ‘__compiletime_assert_6561’ declared with attribute error: BUILD_BUG_ON failed: &idle_sched_class > &fair_sched_class || &fair_sched_class > &rt_sched_class || &rt_sched_class > &dl_sched_class || &dl_sched_class > &stop_sched_class
394 | _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
| ^
./include/linux/compiler.h:375:4: note: in definition of macro ‘__compiletime_assert’
375 | prefix ## suffix(); \
| ^~~~~~
./include/linux/compiler.h:394:2: note: in expansion of macro ‘_compiletime_assert’
394 | _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
| ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
kernel/sched/core.c:6561:2: note: in expansion of macro ‘BUILD_BUG_ON’
6561 | BUILD_BUG_ON(&idle_sched_class > &fair_sched_class ||
| ^~~~~~~~~~~~


> -- Steve
>
>
>
>> wait_bit_init();
>>

2019-12-19 13:55:36

by Peter Zijlstra

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, Dec 19, 2019 at 08:50:42AM -0500, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 15:39:14 +0300
> Kirill Tkhai <[email protected]> wrote:
>
> > @@ -6569,6 +6558,11 @@ void __init sched_init(void)
> > unsigned long ptr = 0;
> > int i;
> >
> > + BUG_ON(&idle_sched_class > &fair_sched_class ||
> > + &fair_sched_class > &rt_sched_class ||
> > + &rt_sched_class > &dl_sched_class ||
> > + &dl_sched_class > &stop_sched_class);
> > +
>
> Can this be a BUILD_BUG_ON? These address should all be constants.

Nope, BUILD_BUG_ON() is for compile time constants, these are link time
constants.
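
For illustration, the same distinction in standalone form (a userspace sketch
with hypothetical marker symbols; C11 _Static_assert has a compile-time
constant requirement similar to what BUILD_BUG_ON() relies on):

	extern const int idle_marker;
	extern const int fair_marker;

	/* Fine: sizeof() is a compile-time constant. */
	_Static_assert(sizeof(int) > 0, "compile-time constant");

	/* Rejected by the compiler: a relational comparison of two
	 * external addresses is only resolved at link time, so it is
	 * not a constant expression the compiler can evaluate:
	 *
	 * _Static_assert(&idle_marker < &fair_marker, "link-time value");
	 */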

2019-12-19 14:05:19

by Peter Zijlstra

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> > In kernel/sched/Makefile files, describing different sched classes, already
> > go in the order from the lowest priority class to the highest priority class:
> >
> > idle.o fair.o rt.o deadline.o stop_task.o
> >
> > The documentation of GNU linker says, that section appears in the order
> > they are seen during link time (see [1]):
> >
> > >Normally, the linker will place files and sections matched by wildcards
> > >in the order in which they are seen during the link. You can change this
> > >by using the SORT keyword, which appears before a wildcard pattern
> > >in parentheses (e.g., SORT(.text*)).
> >
> > So, we may expect const variables from idle.o will go before ro variables
> > from fair.o in RO_DATA section, while ro variables from fair.o will go
> > before ro variables from rt.o, etc.
> >
> > (Also, it looks like the linking order is already used in kernel, e.g.
> > in drivers/md/Makefile)
> >
> > Thus, we may introduce an optimization based on xxx_sched_class addresses
> > in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
> >
> > One more result of the patch is that size of object file becomes a little
> > less (excluding added BUG_ON(), which goes in __init section):
> >
> > $size kernel/sched/core.o
> > text data bss dec hex filename
> > before: 66446 18957 676 86079 1503f kernel/sched/core.o
> > after: 66398 18957 676 86031 1500f kernel/sched/core.o
>
> Does LTO preserve this behaviour? I've never quite dared do this exact
> optimization.

Also, ld.lld seems a popular option.

2019-12-19 14:06:29

by Kirill Tkhai

Subject: [Q] ld: Does LTO reorder ro variables in two files?

CC: [email protected]

Hi, gcc guys,

this thread starts here: https://lkml.org/lkml/2019/12/19/403

There are two const variables:

struct sched_class idle_sched_class
and
struct sched_class fair_sched_class,

which are declared in two files idle.c and fair.c.

1)In Makefile the order is: idle.o fair.o
2)the variables go to the same ro section
3)there is no SORT(.*) keyword in linker script.

Is it always true, that after linkage &idle_sched_class < &fair_sched_class?

Thanks!
Kirill

On 19.12.2019 16:12, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>> In kernel/sched/Makefile files, describing different sched classes, already
>> go in the order from the lowest priority class to the highest priority class:
>>
>> idle.o fair.o rt.o deadline.o stop_task.o
>>
>> The documentation of GNU linker says, that section appears in the order
>> they are seen during link time (see [1]):
>>
>>> Normally, the linker will place files and sections matched by wildcards
>>> in the order in which they are seen during the link. You can change this
>>> by using the SORT keyword, which appears before a wildcard pattern
>>> in parentheses (e.g., SORT(.text*)).
>>
>> So, we may expect const variables from idle.o will go before ro variables
>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>> before ro variables from rt.o, etc.
>>
>> (Also, it looks like the linking order is already used in kernel, e.g.
>> in drivers/md/Makefile)
>>
>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>
>> One more result of the patch is that size of object file becomes a little
>> less (excluding added BUG_ON(), which goes in __init section):
>>
>> $size kernel/sched/core.o
>> text data bss dec hex filename
>> before: 66446 18957 676 86079 1503f kernel/sched/core.o
>> after: 66398 18957 676 86031 1500f kernel/sched/core.o
>
> Does LTO preserve this behaviour? I've never quite dared do this exact
> optimization.

2019-12-19 14:27:20

by Kirill Tkhai

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 17:02, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
>> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>>> In kernel/sched/Makefile files, describing different sched classes, already
>>> go in the order from the lowest priority class to the highest priority class:
>>>
>>> idle.o fair.o rt.o deadline.o stop_task.o
>>>
>>> The documentation of GNU linker says, that section appears in the order
>>> they are seen during link time (see [1]):
>>>
>>>> Normally, the linker will place files and sections matched by wildcards
>>>> in the order in which they are seen during the link. You can change this
>>>> by using the SORT keyword, which appears before a wildcard pattern
>>>> in parentheses (e.g., SORT(.text*)).
>>>
>>> So, we may expect const variables from idle.o will go before ro variables
>>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>>> before ro variables from rt.o, etc.
>>>
>>> (Also, it looks like the linking order is already used in kernel, e.g.
>>> in drivers/md/Makefile)
>>>
>>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>>
>>> One more result of the patch is that size of object file becomes a little
>>> less (excluding added BUG_ON(), which goes in __init section):
>>>
>>> $size kernel/sched/core.o
>>> text data bss dec hex filename
>>> before: 66446 18957 676 86079 1503f kernel/sched/core.o
>>> after: 66398 18957 676 86031 1500f kernel/sched/core.o
>>
>> Does LTO preserve this behaviour? I've never quite dared do this exact
>> optimization.
>
> Also, ld.lld seems a popular option.

I asked on their IRC. It looks like there is no way to rely on this.

About the link: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html

(17:19:25) nbjoerg: but it is not guarenteed behavior
(17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections

2019-12-19 14:33:37

by Kirill Tkhai

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 17:25, Kirill Tkhai wrote:
> On 19.12.2019 17:02, Peter Zijlstra wrote:
>> On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
>>> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>>>> In kernel/sched/Makefile files, describing different sched classes, already
>>>> go in the order from the lowest priority class to the highest priority class:
>>>>
>>>> idle.o fair.o rt.o deadline.o stop_task.o
>>>>
>>>> The documentation of GNU linker says, that section appears in the order
>>>> they are seen during link time (see [1]):
>>>>
>>>>> Normally, the linker will place files and sections matched by wildcards
>>>>> in the order in which they are seen during the link. You can change this
>>>>> by using the SORT keyword, which appears before a wildcard pattern
>>>>> in parentheses (e.g., SORT(.text*)).
>>>>
>>>> So, we may expect const variables from idle.o will go before ro variables
>>>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>>>> before ro variables from rt.o, etc.
>>>>
>>>> (Also, it looks like the linking order is already used in kernel, e.g.
>>>> in drivers/md/Makefile)
>>>>
>>>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>>>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>>>
>>>> One more result of the patch is that size of object file becomes a little
>>>> less (excluding added BUG_ON(), which goes in __init section):
>>>>
>>>> $size kernel/sched/core.o
>>>> text data bss dec hex filename
>>>> before: 66446 18957 676 86079 1503f kernel/sched/core.o
>>>> after: 66398 18957 676 86031 1500f kernel/sched/core.o
>>>
>>> Does LTO preserve this behaviour? I've never quite dared do this exact
>>> optimization.
>>
>> Also, ld.lld seems a popular option.
>
> I asked on their IRC. Oh, it looks like no way is for this.
>
> About the link: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html
>
> (17:19:25) nbjoerg: but it is not guarenteed behavior
> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections

Introducing a sched_class::id instead of this patch's approach does not make
much sense, since it would help only in check_preempt_curr(), and it requires
too many new lines of code.

2019-12-19 14:44:45

by Steven Rostedt

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 17:25:40 +0300
Kirill Tkhai <[email protected]> wrote:

> (17:19:25) nbjoerg: but it is not guarenteed behavior
> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections

Which appears to work. I tried this patch on top of yours:

Not sure how this does with locality though.

-- Steve

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..ff12a422ff19 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,13 @@
#define SBSS_MAIN .sbss
#endif

+#define SCHED_DATA \
+ *(__idle_sched_class) \
+ *(__fair_sched_class) \
+ *(__rt_sched_class) \
+ *(__dl_sched_class) \
+ *(__stop_sched_class)
+
/*
* Align to a 32 byte boundary equal to the
* alignment gcc 4.5 uses for a struct
@@ -308,6 +315,7 @@
#define DATA_DATA \
*(.xiptext) \
*(DATA_MAIN) \
+ SCHED_DATA \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
MEM_KEEP(init.data*) \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
}
}

-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+ __attribute__((section("__dl_sched_class"))) = {
.next = &rt_sched_class,
.enqueue_task = enqueue_task_dl,
.dequeue_task = dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..e745fe0e0cd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
/*
* All the scheduling class methods:
*/
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+ __attribute__((section("__fair_sched_class"))) = {
.next = &idle_sched_class,
.enqueue_task = enqueue_task_fair,
.dequeue_task = dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU idle tasks:
*/
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+ __attribute__((section("__idle_sched_class"))) = {
/* .next is NULL */
/* no enqueue/yield_task for idle tasks */

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
return 0;
}

-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+ __attribute__((section("__rt_sched_class"))) = {
.next = &fair_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU stop tasks:
*/
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+ __attribute__((section("__stop_sched_class"))) = {
.next = &dl_sched_class,

.enqueue_task = enqueue_task_stop,

2019-12-19 14:47:51

by Kirill Tkhai

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 17:43, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 17:25:40 +0300
> Kirill Tkhai <[email protected]> wrote:
>
>> (17:19:25) nbjoerg: but it is not guarenteed behavior
>> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections
>
> Which appears to work. I tried this patch on top of yours:
>
> Not sure how this does with locality though.

Hm, I'm not sure, but AFAIR some (all?) sections are aligned at 4K.
Will this bring holes of (4K - sizeof(struct sched_class)) into the
address space?


> -- Steve
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
> #define SBSS_MAIN .sbss
> #endif
>
> +#define SCHED_DATA \
> + *(__idle_sched_class) \
> + *(__fair_sched_class) \
> + *(__rt_sched_class) \
> + *(__dl_sched_class) \
> + *(__stop_sched_class)
> +
> /*
> * Align to a 32 byte boundary equal to the
> * alignment gcc 4.5 uses for a struct
> @@ -308,6 +315,7 @@
> #define DATA_DATA \
> *(.xiptext) \
> *(DATA_MAIN) \
> + SCHED_DATA \
> *(.ref.data) \
> *(.data..shared_aligned) /* percpu related */ \
> MEM_KEEP(init.data*) \
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 43323f875cb9..5abdbe569f93 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> }
> }
>
> -const struct sched_class dl_sched_class = {
> +const struct sched_class dl_sched_class
> + __attribute__((section("__dl_sched_class"))) = {
> .next = &rt_sched_class,
> .enqueue_task = enqueue_task_dl,
> .dequeue_task = dequeue_task_dl,
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 08a233e97a01..e745fe0e0cd3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
> /*
> * All the scheduling class methods:
> */
> -const struct sched_class fair_sched_class = {
> +const struct sched_class fair_sched_class
> + __attribute__((section("__fair_sched_class"))) = {
> .next = &idle_sched_class,
> .enqueue_task = enqueue_task_fair,
> .dequeue_task = dequeue_task_fair,
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index ffa959e91227..700a9c826f0e 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
> /*
> * Simple, special scheduling class for the per-CPU idle tasks:
> */
> -const struct sched_class idle_sched_class = {
> +const struct sched_class idle_sched_class
> + __attribute__((section("__idle_sched_class"))) = {
> /* .next is NULL */
> /* no enqueue/yield_task for idle tasks */
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index e591d40fd645..5d3f9bcddaeb 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> return 0;
> }
>
> -const struct sched_class rt_sched_class = {
> +const struct sched_class rt_sched_class
> + __attribute__((section("__rt_sched_class"))) = {
> .next = &fair_sched_class,
> .enqueue_task = enqueue_task_rt,
> .dequeue_task = dequeue_task_rt,
> diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> index 4c9e9975684f..03bc7530ff75 100644
> --- a/kernel/sched/stop_task.c
> +++ b/kernel/sched/stop_task.c
> @@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
> /*
> * Simple, special scheduling class for the per-CPU stop tasks:
> */
> -const struct sched_class stop_sched_class = {
> +const struct sched_class stop_sched_class
> + __attribute__((section("__stop_sched_class"))) = {
> .next = &dl_sched_class,
>
> .enqueue_task = enqueue_task_stop,
>

2019-12-19 15:03:12

by Steven Rostedt

Subject: Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 17:46:37 +0300
Kirill Tkhai <[email protected]> wrote:

> On 19.12.2019 17:43, Steven Rostedt wrote:
> > On Thu, 19 Dec 2019 17:25:40 +0300
> > Kirill Tkhai <[email protected]> wrote:
> >
> >> (17:19:25) nbjoerg: but it is not guarenteed behavior
> >> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections
> >
> > Which appears to work. I tried this patch on top of yours:
> >
> > Not sure how this does with locality though.
>
> Hm, I'm not sure, but AFAIR some (all?) sections are aligned at 4K.
> Will this bring holes (4K-sizeof(struct sched_class)) in address
> space?

I believe only if you set the align attribute in the linker script.
With this and your patch:

# grep sched_class /proc/kallsyms
ffffffff8e760900 D idle_sched_class
ffffffff8e7609e0 D fair_sched_class
ffffffff8e760ac0 D rt_sched_class
ffffffff8e760ba0 D dl_sched_class
ffffffff8e760c80 D stop_sched_class
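
The five descriptors come out a constant 0xe0 (224) bytes apart, so the named
input sections add no extra padding here. A minimal linker-script sketch of
the point (illustration only, not the kernel script): input sections are
placed back to back at their natural alignment unless the script explicitly
asks for more, e.g.:

	.rodata : {
		*(__idle_sched_class)
		*(__fair_sched_class)
		. = ALIGN(4096);	/* only an explicit ALIGN() like this creates a hole */
		*(__rt_sched_class)
	}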

-- Steve

>
>
> > -- Steve
> >
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index e00f41aa8ec4..ff12a422ff19 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -108,6 +108,13 @@
> > #define SBSS_MAIN .sbss
> > #endif
> >
> > +#define SCHED_DATA \
> > + *(__idle_sched_class) \
> > + *(__fair_sched_class) \
> > + *(__rt_sched_class) \
> > + *(__dl_sched_class) \
> > + *(__stop_sched_class)
> > +
> > /*
> > * Align to a 32 byte boundary equal to the
> > * alignment gcc 4.5 uses for a struct
> > @@ -308,6 +315,7 @@
> > #define DATA_DATA \
> > *(.xiptext) \
> > *(DATA_MAIN) \
> > + SCHED_DATA \
> > *(.ref.data) \
> > *(.data..shared_aligned) /* percpu related */ \
> > MEM_KEEP(init.data*) \
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 43323f875cb9..5abdbe569f93 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> > }
> > }
> >
> > -const struct sched_class dl_sched_class = {
> > +const struct sched_class dl_sched_class
> > + __attribute__((section("__dl_sched_class"))) = {
> > .next = &rt_sched_class,
> > .enqueue_task = enqueue_task_dl,
> > .dequeue_task = dequeue_task_dl,
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 08a233e97a01..e745fe0e0cd3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
> > /*
> > * All the scheduling class methods:
> > */
> > -const struct sched_class fair_sched_class = {
> > +const struct sched_class fair_sched_class
> > + __attribute__((section("__fair_sched_class"))) = {
> > .next = &idle_sched_class,
> > .enqueue_task = enqueue_task_fair,
> > .dequeue_task = dequeue_task_fair,
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index ffa959e91227..700a9c826f0e 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
> > /*
> > * Simple, special scheduling class for the per-CPU idle tasks:
> > */
> > -const struct sched_class idle_sched_class = {
> > +const struct sched_class idle_sched_class
> > + __attribute__((section("__idle_sched_class"))) = {
> > /* .next is NULL */
> > /* no enqueue/yield_task for idle tasks */
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index e591d40fd645..5d3f9bcddaeb 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> > return 0;
> > }
> >
> > -const struct sched_class rt_sched_class = {
> > +const struct sched_class rt_sched_class
> > + __attribute__((section("__rt_sched_class"))) = {
> > .next = &fair_sched_class,
> > .enqueue_task = enqueue_task_rt,
> > .dequeue_task = dequeue_task_rt,
> > diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> > index 4c9e9975684f..03bc7530ff75 100644
> > --- a/kernel/sched/stop_task.c
> > +++ b/kernel/sched/stop_task.c
> > @@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
> > /*
> > * Simple, special scheduling class for the per-CPU stop tasks:
> > */
> > -const struct sched_class stop_sched_class = {
> > +const struct sched_class stop_sched_class
> > + __attribute__((section("__stop_sched_class"))) = {
> > .next = &dl_sched_class,
> >
> > .enqueue_task = enqueue_task_stop,
> >

2019-12-19 15:23:26

by Kirill Tkhai

Subject: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

From: Kirill Tkhai <[email protected]>

This introduces an optimization based on xxx_sched_class addresses
in two hot scheduler functions: pick_next_task() and check_preempt_curr().

After this patch, it will be possible to compare pointers to sched classes
to check which of them has a higher priority, instead of the current
iteration using for_each_class().
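
For reference, a sketch of the iteration being replaced, roughly what
for_each_class() looks like in kernel/sched/sched.h at this point
(simplified; the ->next chain is visible in the per-class definitions below):

	#define sched_class_highest	(&stop_sched_class)

	#define for_each_class(class) \
		for (class = sched_class_highest; class; class = class->next)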

One more result of the patch is that the object file becomes a little
smaller (excluding the added BUG_ON(), which goes into the __init section):

$size kernel/sched/core.o
text data bss dec hex filename
before: 66446 18957 676 86079 1503f kernel/sched/core.o
after: 66398 18957 676 86031 1500f kernel/sched/core.o

The SCHED_DATA improvements guaranteeing the order of the sched classes were
made by Steven Rostedt <[email protected]>.

Signed-off-by: Kirill Tkhai <[email protected]>

v2: Steven's data sections ordering. Hunk with comment in Makefile is removed.
---
include/asm-generic/vmlinux.lds.h | 8 ++++++++
kernel/sched/core.c | 24 +++++++++---------------
kernel/sched/deadline.c | 3 ++-
kernel/sched/fair.c | 3 ++-
kernel/sched/idle.c | 3 ++-
kernel/sched/rt.c | 3 ++-
kernel/sched/stop_task.c | 3 ++-
7 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..ff12a422ff19 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,13 @@
#define SBSS_MAIN .sbss
#endif

+#define SCHED_DATA \
+ *(__idle_sched_class) \
+ *(__fair_sched_class) \
+ *(__rt_sched_class) \
+ *(__dl_sched_class) \
+ *(__stop_sched_class)
+
/*
* Align to a 32 byte boundary equal to the
* alignment gcc 4.5 uses for a struct
@@ -308,6 +315,7 @@
#define DATA_DATA \
*(.xiptext) \
*(DATA_MAIN) \
+ SCHED_DATA \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
MEM_KEEP(init.data*) \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15508c202bf5..befdd7158b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1416,20 +1416,10 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
- const struct sched_class *class;
-
- if (p->sched_class == rq->curr->sched_class) {
+ if (p->sched_class == rq->curr->sched_class)
rq->curr->sched_class->check_preempt_curr(rq, p, flags);
- } else {
- for_each_class(class) {
- if (class == rq->curr->sched_class)
- break;
- if (class == p->sched_class) {
- resched_curr(rq);
- break;
- }
- }
- }
+ else if (p->sched_class > rq->curr->sched_class)
+ resched_curr(rq);

/*
* A queue event has occurred, and we're going to schedule. In
@@ -3914,8 +3904,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* higher scheduling class, because otherwise those loose the
* opportunity to pull in more work from other CPUs.
*/
- if (likely((prev->sched_class == &idle_sched_class ||
- prev->sched_class == &fair_sched_class) &&
+ if (likely(prev->sched_class <= &fair_sched_class &&
rq->nr_running == rq->cfs.h_nr_running)) {

p = pick_next_task_fair(rq, prev, rf);
@@ -6569,6 +6558,11 @@ void __init sched_init(void)
unsigned long ptr = 0;
int i;

+ BUG_ON(&idle_sched_class > &fair_sched_class ||
+ &fair_sched_class > &rt_sched_class ||
+ &rt_sched_class > &dl_sched_class ||
+ &dl_sched_class > &stop_sched_class);
+
wait_bit_init();

#ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
}
}

-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+ __attribute__((section("__dl_sched_class"))) = {
.next = &rt_sched_class,
.enqueue_task = enqueue_task_dl,
.dequeue_task = dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8da0222924cf..9379b3804582 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10760,7 +10760,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
/*
* All the scheduling class methods:
*/
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+ __attribute__((section("__fair_sched_class"))) = {
.next = &idle_sched_class,
.enqueue_task = enqueue_task_fair,
.dequeue_task = dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU idle tasks:
*/
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+ __attribute__((section("__idle_sched_class"))) = {
/* .next is NULL */
/* no enqueue/yield_task for idle tasks */

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
return 0;
}

-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+ __attribute__((section("__rt_sched_class"))) = {
.next = &fair_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU stop tasks:
*/
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+ __attribute__((section("__stop_sched_class"))) = {
.next = &dl_sched_class,

.enqueue_task = enqueue_task_stop,

2019-12-19 15:23:27

by Jeff Law

Subject: Re: [Q] ld: Does LTO reorder ro variables in two files?

On Thu, 2019-12-19 at 17:04 +0300, Kirill Tkhai wrote:
> CC: [email protected]
>
> Hi, gcc guys,
>
> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>
> There are two const variables:
>
> struct sched_class idle_sched_class
> and
> struct sched_class fair_sched_class,
>
> which are declared in two files idle.c and fair.c.
>
> 1)In Makefile the order is: idle.o fair.o
> 2)the variables go to the same ro section
> 3)there is no SORT(.*) keyword in linker script.
>
> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?
I certainly wouldn't depend on it. The first and most obvious problem
is symbol sorting by the linker. Longer term I'd be worried about LTO
reordering things.

In the end I'm pretty sure it'd be well outside what I'd be comfortable
depending on.

jeff
>

2019-12-19 15:32:28

by Kirill Tkhai

Subject: Re: [Q] ld: Does LTO reorder ro variables in two files?

On 19.12.2019 18:21, Jeff Law wrote:
> On Thu, 2019-12-19 at 17:04 +0300, Kirill Tkhai wrote:
>> CC: [email protected]
>>
>> Hi, gcc guys,
>>
>> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>>
>> There are two const variables:
>>
>> struct sched_class idle_sched_class
>> and
>> struct sched_class fair_sched_class,
>>
>> which are declared in two files idle.c and fair.c.
>>
>> 1)In Makefile the order is: idle.o fair.o
>> 2)the variables go to the same ro section
>> 3)there is no SORT(.*) keyword in linker script.
>>
>> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?
> I certainly wouldn't depend on it. The first and most obvious problem
> is symbol sorting by the linker. Longer term I'd be worried about LTO
> reordering things.
>
> In the end I'm pretty sure it'd be well outside what I'd be comfortable
> depending on.

Ok, I wouldn't be comfortable depending on it either :) Thanks for the clarification, Jeff.

2019-12-19 15:41:31

by Steven Rostedt

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 18:20:58 +0300
Kirill Tkhai <[email protected]> wrote:

> From: Kirill Tkhai <[email protected]>
>
> This introduces an optimization based on xxx_sched_class addresses
> in two hot scheduler functions: pick_next_task() and check_preempt_curr().
>
> After this patch, it will be possible to compare pointers to sched classes
> to check, which of them has a higher priority, instead of current iterations
> using for_each_class().
>
> One more result of the patch is that size of object file becomes a little
> less (excluding added BUG_ON(), which goes in __init section):
>
> $size kernel/sched/core.o
> text data bss dec hex filename
> before: 66446 18957 676 86079 1503f kernel/sched/core.o
> after: 66398 18957 676 86031 1500f kernel/sched/core.o
>
> SCHED_DATA improvements guaranteeing order of sched classes are made
> by Steven Rostedt <[email protected]>

For the above changes, you can add:

Signed-off-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

>
> Signed-off-by: Kirill Tkhai <[email protected]>
>
> v2: Steven's data sections ordering. Hunk with comment in Makefile is removed.
> ---
> include/asm-generic/vmlinux.lds.h | 8 ++++++++
> kernel/sched/core.c | 24 +++++++++---------------
> kernel/sched/deadline.c | 3 ++-
> kernel/sched/fair.c | 3 ++-
> kernel/sched/idle.c | 3 ++-
> kernel/sched/rt.c | 3 ++-
> kernel/sched/stop_task.c | 3 ++-
> 7 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
> #define SBSS_MAIN .sbss
> #endif
>

2019-12-19 15:56:36

by Alexander Monakov

Subject: Re: [Q] ld: Does LTO reorder ro variables in two files?

[adding Jan Hubicka, GCC LTO maintainer]

On Thu, 19 Dec 2019, Kirill Tkhai wrote:

> CC: [email protected]
>
> Hi, gcc guys,
>
> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>
> There are two const variables:
>
> struct sched_class idle_sched_class
> and
> struct sched_class fair_sched_class,
>
> which are declared in two files idle.c and fair.c.
>
> 1)In Makefile the order is: idle.o fair.o
> 2)the variables go to the same ro section
> 3)there is no SORT(.*) keyword in linker script.
>
> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?

No, with LTO you don't have that guarantee. For functions it's more obvious,
GCC wants to analyze functions in reverse topological order so callees are
generally optimized before callers, and it will emit assembly as it goes, so
function ordering with LTO does not give much care to translation unit
boundaries. For variables it's a bit more subtle: GCC partitions all variables
and functions so it can hand them off to multiple compiler processes while doing
LTO. There are no guarantees about the order of variables that end up in
different partitions.

There's __attribute__((no_reorder)) that is intended to enforce ordering even
with LTO (it's documented under "Common function attributes" but works for
global variables as well).
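
For illustration, a minimal sketch of applying the attribute (hypothetical
marker variables in a single translation unit for brevity; not what the
kernel thread ended up merging):

	struct marker { int prio; };

	/* no_reorder asks GCC, including under -flto, not to reorder these
	 * globals relative to each other, so the address relation below is
	 * intended to stay stable: */
	const struct marker low_marker  __attribute__((no_reorder)) = { 0 };
	const struct marker high_marker __attribute__((no_reorder)) = { 1 };

	/* expectation: &low_marker < &high_marker */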

Alexander

> Thanks!
> Kirill
>
> On 19.12.2019 16:12, Peter Zijlstra wrote:
> > On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> >> In kernel/sched/Makefile files, describing different sched classes, already
> >> go in the order from the lowest priority class to the highest priority class:
> >>
> >> idle.o fair.o rt.o deadline.o stop_task.o
> >>
> >> The documentation of GNU linker says, that section appears in the order
> >> they are seen during link time (see [1]):
> >>
> >>> Normally, the linker will place files and sections matched by wildcards
> >>> in the order in which they are seen during the link. You can change this
> >>> by using the SORT keyword, which appears before a wildcard pattern
> >>> in parentheses (e.g., SORT(.text*)).
> >>
> >> So, we may expect const variables from idle.o will go before ro variables
> >> from fair.o in RO_DATA section, while ro variables from fair.o will go
> >> before ro variables from rt.o, etc.
> >>
> >> (Also, it looks like the linking order is already used in kernel, e.g.
> >> in drivers/md/Makefile)
> >>
> >> Thus, we may introduce an optimization based on xxx_sched_class addresses
> >> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
> >>
> >> One more result of the patch is that size of object file becomes a little
> >> less (excluding added BUG_ON(), which goes in __init section):
> >>
> >> $size kernel/sched/core.o
> >> text data bss dec hex filename
> >> before: 66446 18957 676 86079 1503f kernel/sched/core.o
> >> after: 66398 18957 676 86031 1500f kernel/sched/core.o
> >
> > Does LTO preserve this behaviour? I've never quite dared do this exact
> > optimization.
>
>

2019-12-19 16:06:52

by Steven Rostedt

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 18:20:58 +0300
Kirill Tkhai <[email protected]> wrote:

> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
> #define SBSS_MAIN .sbss
> #endif

I would also add a comment here:

/*
* The order of the sched class addresses are important, as they are
* used to determine the order of the priority of each sched class in
* relation to each other.
*/

>
> +#define SCHED_DATA \
> + *(__idle_sched_class) \
> + *(__fair_sched_class) \
> + *(__rt_sched_class) \
> + *(__dl_sched_class) \
> + *(__stop_sched_class)
> +
> /*
> * Align to a 32 byte boundary equal to the
> * alignment gcc 4.5 uses for a struct
> @@ -308,6 +315,7 @@
> #define DATA_DATA \
> *(.xiptext) \
> *(DATA_MAIN) \
> + SCHED_DATA \
> *(.ref.data) \
> *(.data..shared_aligned) /* percpu related */ \
> MEM_KEEP(init.data*) \

2019-12-19 16:11:45

by Kirill Tkhai

Subject: Re: [Q] ld: Does LTO reorder ro variables in two files?

On 19.12.2019 18:45, Alexander Monakov wrote:
> [adding Jan Hubicka, GCC LTO maintainer]
>
> On Thu, 19 Dec 2019, Kirill Tkhai wrote:
>
>> CC: [email protected]
>>
>> Hi, gcc guys,
>>
>> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>>
>> There are two const variables:
>>
>> struct sched_class idle_sched_class
>> and
>> struct sched_class fair_sched_class,
>>
>> which are declared in two files idle.c and fair.c.
>>
>> 1)In Makefile the order is: idle.o fair.o
>> 2)the variables go to the same ro section
>> 3)there is no SORT(.*) keyword in linker script.
>>
>> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?
>
> No, with LTO you don't have that guarantee. For functions it's more obvious,
> GCC wants to analyze functions in reverse topological order so callees are
> generally optimized before callers, and it will emit assembly as it goes, so
> function ordering with LTO does not give much care to translation unit
> boundaries. For variables it's a bit more subtle, GCC partitions all variables
> and functions so it can hand them off to multiple compiler processes while doing
> LTO. There's no guarantees about order of variables that end up in different
> partitions.
>
> There's __attribute__((no_reorder)) that is intended to enforce ordering even
> with LTO (it's documented under "Common function attributes" but works for
> global variables as well).

Thanks, Alexander!

Kirill

2019-12-19 16:12:18

by Kirill Tkhai

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 18:40, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 18:20:58 +0300
> Kirill Tkhai <[email protected]> wrote:
>
>> From: Kirill Tkhai <[email protected]>
>>
>> This introduces an optimization based on xxx_sched_class addresses
>> in two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>
>> After this patch, it will be possible to compare pointers to sched classes
>> to check, which of them has a higher priority, instead of current iterations
>> using for_each_class().
>>
>> One more result of the patch is that size of object file becomes a little
>> less (excluding added BUG_ON(), which goes in __init section):
>>
>> $size kernel/sched/core.o
>> text data bss dec hex filename
>> before: 66446 18957 676 86079 1503f kernel/sched/core.o
>> after: 66398 18957 676 86031 1500f kernel/sched/core.o
>>
>> SCHED_DATA improvements guaranteeing order of sched classes are made
>> by Steven Rostedt <[email protected]>
>
> For the above changes, you can add:
>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Should I resend this as two patches, with your changes in a separate?

>
>>
>> Signed-off-by: Kirill Tkhai <[email protected]>
>>
>> v2: Steven's data sections ordering. Hunk with comment in Makefile is removed.
>> ---
>> include/asm-generic/vmlinux.lds.h | 8 ++++++++
>> kernel/sched/core.c | 24 +++++++++---------------
>> kernel/sched/deadline.c | 3 ++-
>> kernel/sched/fair.c | 3 ++-
>> kernel/sched/idle.c | 3 ++-
>> kernel/sched/rt.c | 3 ++-
>> kernel/sched/stop_task.c | 3 ++-
>> 7 files changed, 27 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>> index e00f41aa8ec4..ff12a422ff19 100644
>> --- a/include/asm-generic/vmlinux.lds.h
>> +++ b/include/asm-generic/vmlinux.lds.h
>> @@ -108,6 +108,13 @@
>> #define SBSS_MAIN .sbss
>> #endif
>>

2019-12-19 16:23:21

by Steven Rostedt

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 19:08:05 +0300
Kirill Tkhai <[email protected]> wrote:

> Should I resend this as two patches, with your changes in a separate?

You don't have to; you can include multiple SOBs if a patch was written
by two people.

But perhaps it will be better to do so; that way people will know who to
blame when the linker breaks ;-)

I'll send you a patch that you can apply just before your change. That
may be the cleanest way.

-- Steve

2019-12-19 20:18:12

by Kirill Tkhai

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On 19.12.2019 19:22, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 19:08:05 +0300
> Kirill Tkhai <[email protected]> wrote:
>
>> Should I resend this as two patches, with your changes in a separate?
>
> You don't have to, you can include multiple SOBs if a patch was written
> by two people.
>
> But perhaps it will better to do so, that way people will know who to
> blame when the linker breaks ;-)
>
> I'll send you a patch that you can apply just before your change. That
> may be the cleanest way.

Two small patches look better than one huge one, so I prefer to send a patch
on top of yours :)

2019-12-19 20:22:54

by Steven Rostedt

Subject: Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()

On Thu, 19 Dec 2019 20:16:17 +0000
Kirill Tkhai <[email protected]> wrote:

> Two small patches look better then one huge, so I prefer to send a patch
> on top of yours :)

My one patch has turned into 3 patches. Having the sorted structs gave
me some more crazy ideas. I'll post the series shortly ;-)

-- Steve

2020-06-25 11:55:00

by tip-bot2 for Jacob Pan

Subject: [tip: sched/core] sched: Force the address order of each sched class descriptor

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 590d69796346353878b275c5512c664e3f875f24
Gitweb: https://git.kernel.org/tip/590d69796346353878b275c5512c664e3f875f24
Author: Steven Rostedt (VMware) <[email protected]>
AuthorDate: Thu, 19 Dec 2019 16:44:52 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 25 Jun 2020 13:45:43 +02:00

sched: Force the address order of each sched class descriptor

In order to make a micro optimization in pick_next_task(), the sched class
descriptor addresses must be in the same order as their priorities relative
to each other. That is:

&idle_sched_class < &fair_sched_class < &rt_sched_class <
&dl_sched_class < &stop_sched_class

In order to guarantee this order of the sched class descriptors, add each
one into its own data section and force the order in the linker script.
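
For context, the kind of comparison this ordering enables (a sketch; the
helper name here is hypothetical, while the bare pointer comparisons
themselves appear in the patches earlier in the thread):

	/* Valid only because the linker script lays the class descriptors
	 * out from lowest to highest priority: */
	static inline bool sched_class_above(const struct sched_class *a,
					     const struct sched_class *b)
	{
		return a > b;
	}

	/* e.g. in check_preempt_curr():
	 *
	 *	if (sched_class_above(p->sched_class, rq->curr->sched_class))
	 *		resched_curr(rq);
	 */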

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain
---
include/asm-generic/vmlinux.lds.h | 13 +++++++++++++
kernel/sched/deadline.c | 3 ++-
kernel/sched/fair.c | 3 ++-
kernel/sched/idle.c | 3 ++-
kernel/sched/rt.c | 3 ++-
kernel/sched/stop_task.c | 3 ++-
6 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index db600ef..2186d7b 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -109,6 +109,18 @@
#endif

/*
+ * The order of the sched class addresses are important, as they are
+ * used to determine the order of the priority of each sched class in
+ * relation to each other.
+ */
+#define SCHED_DATA \
+ *(__idle_sched_class) \
+ *(__fair_sched_class) \
+ *(__rt_sched_class) \
+ *(__dl_sched_class) \
+ *(__stop_sched_class)
+
+/*
* Align to a 32 byte boundary equal to the
* alignment gcc 4.5 uses for a struct
*/
@@ -388,6 +400,7 @@
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
+ SCHED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
__start___tracepoints_ptrs = .; \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d4708e2..d9e7946 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2479,7 +2479,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
}
}

-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+ __attribute__((section("__dl_sched_class"))) = {
.next = &rt_sched_class,
.enqueue_task = enqueue_task_dl,
.dequeue_task = dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0424a0a..3365f6b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11122,7 +11122,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
/*
* All the scheduling class methods:
*/
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+ __attribute__((section("__fair_sched_class"))) = {
.next = &idle_sched_class,
.enqueue_task = enqueue_task_fair,
.dequeue_task = dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8d75ca2..f580629 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -453,7 +453,8 @@ static void update_curr_idle(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU idle tasks:
*/
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+ __attribute__((section("__idle_sched_class"))) = {
/* .next is NULL */
/* no enqueue/yield_task for idle tasks */

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f395ddb..6543d44 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2429,7 +2429,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
return 0;
}

-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+ __attribute__((section("__rt_sched_class"))) = {
.next = &fair_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 3e50a6a..f4bbd54 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -109,7 +109,8 @@ static void update_curr_stop(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU stop tasks:
*/
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+ __attribute__((section("__stop_sched_class"))) = {
.next = &dl_sched_class,

.enqueue_task = enqueue_task_stop,