2019-12-19 21:47:07

by Steven Rostedt

[permalink] [raw]
Subject: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

From: "Steven Rostedt (VMware)" <[email protected]>

In order to make a micro optimization in pick_next_task(), the order of the
sched class descriptor address must be in the same order as their priority
to each other. That is:

&idle_sched_class < &fair_sched_class < &rt_sched_class <
&dl_sched_class < &stop_sched_class

In order to guarantee this order of the sched class descriptors, add each
one into their own data section and force the order in the linker script.

Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 19 +++++++++++++++++++
kernel/sched/deadline.c | 3 ++-
kernel/sched/fair.c | 3 ++-
kernel/sched/idle.c | 3 ++-
kernel/sched/rt.c | 3 ++-
kernel/sched/stop_task.c | 3 ++-
6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..772d961c69a5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,24 @@
#define SBSS_MAIN .sbss
#endif

+#ifdef CONFIG_SMP
+#define STOP_SCHED_CLASS *(__stop_sched_class)
+#else
+#define STOP_SCHED_CLASS
+#endif
+
+/*
+ * The order of the sched class addresses are important, as they are
+ * used to determine the order of the priority of each sched class in
+ * relation to each other.
+ */
+#define SCHED_DATA \
+ *(__idle_sched_class) \
+ *(__fair_sched_class) \
+ *(__rt_sched_class) \
+ *(__dl_sched_class) \
+ STOP_SCHED_CLASS
+
/*
* Align to a 32 byte boundary equal to the
* alignment gcc 4.5 uses for a struct
@@ -308,6 +326,7 @@
#define DATA_DATA \
*(.xiptext) \
*(DATA_MAIN) \
+ SCHED_DATA \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
MEM_KEEP(init.data*) \
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
}
}

-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+ __attribute__((section("__dl_sched_class"))) = {
.next = &rt_sched_class,
.enqueue_task = enqueue_task_dl,
.dequeue_task = dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..e745fe0e0cd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
/*
* All the scheduling class methods:
*/
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+ __attribute__((section("__fair_sched_class"))) = {
.next = &idle_sched_class,
.enqueue_task = enqueue_task_fair,
.dequeue_task = dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU idle tasks:
*/
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+ __attribute__((section("__idle_sched_class"))) = {
/* .next is NULL */
/* no enqueue/yield_task for idle tasks */

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
return 0;
}

-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+ __attribute__((section("__rt_sched_class"))) = {
.next = &fair_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
/*
* Simple, special scheduling class for the per-CPU stop tasks:
*/
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+ __attribute__((section("__stop_sched_class"))) = {
.next = &dl_sched_class,

.enqueue_task = enqueue_task_stop,
--
2.24.0



2019-12-20 08:53:38

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On 19/12/2019 22.44, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <[email protected]>
>
> In order to make a micro optimization in pick_next_task(), the order of the
> sched class descriptor address must be in the same order as their priority
> to each other. That is:
>
> &idle_sched_class < &fair_sched_class < &rt_sched_class <
> &dl_sched_class < &stop_sched_class
>
> In order to guarantee this order of the sched class descriptors, add each
> one into their own data section and force the order in the linker script.

I think it would make the code simpler if one reverses these, see other
reply.

> +/*
> + * The order of the sched class addresses are important, as they are
> + * used to determine the order of the priority of each sched class in
> + * relation to each other.
> + */
> +#define SCHED_DATA \
> + *(__idle_sched_class) \
> + *(__fair_sched_class) \
> + *(__rt_sched_class) \
> + *(__dl_sched_class) \
> + STOP_SCHED_CLASS
> +
> /*
> * Align to a 32 byte boundary equal to the
> * alignment gcc 4.5 uses for a struct
> @@ -308,6 +326,7 @@
> #define DATA_DATA \
> *(.xiptext) \
> *(DATA_MAIN) \
> + SCHED_DATA \
> *(.ref.data) \

Doesn't this make the structs end up in .data (writable) rather than
.rodata?

Rasmus

2019-12-20 10:02:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
> On 19/12/2019 22.44, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <[email protected]>
> >
> > In order to make a micro optimization in pick_next_task(), the order of the
> > sched class descriptor address must be in the same order as their priority
> > to each other. That is:
> >
> > &idle_sched_class < &fair_sched_class < &rt_sched_class <
> > &dl_sched_class < &stop_sched_class
> >
> > In order to guarantee this order of the sched class descriptors, add each
> > one into their own data section and force the order in the linker script.
>
> I think it would make the code simpler if one reverses these, see other
> reply.

I started out agreeing, because of that mess around STOP_SCHED_CLASS and
that horrid BEFORE_CRUD thing.

Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
and that ends up looking like:

- if (likely((prev->sched_class == &idle_sched_class ||
- prev->sched_class == &fair_sched_class) &&
+ if (likely(prev->sched_class >= &fair_sched_class &&

And that's just weird.

Then I had a better look and now...

> > +/*
> > + * The order of the sched class addresses are important, as they are
> > + * used to determine the order of the priority of each sched class in
> > + * relation to each other.
> > + */
> > +#define SCHED_DATA \
> > + *(__idle_sched_class) \
> > + *(__fair_sched_class) \
> > + *(__rt_sched_class) \
> > + *(__dl_sched_class) \
> > + STOP_SCHED_CLASS

I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
Doesn't the linker deal with empty sections already by making them 0
sized?

> > /*
> > * Align to a 32 byte boundary equal to the
> > * alignment gcc 4.5 uses for a struct
> > @@ -308,6 +326,7 @@
> > #define DATA_DATA \
> > *(.xiptext) \
> > *(DATA_MAIN) \
> > + SCHED_DATA \
> > *(.ref.data) \
>
> Doesn't this make the structs end up in .data (writable) rather than
> .rodata?

Right! That wants fixing.

2019-12-20 10:14:05

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On 20/12/2019 11.00, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
>> On 19/12/2019 22.44, Steven Rostedt wrote:
>>> From: "Steven Rostedt (VMware)" <[email protected]>
>>>
>>> In order to make a micro optimization in pick_next_task(), the order of the
>>> sched class descriptor address must be in the same order as their priority
>>> to each other. That is:
>>>
>>> &idle_sched_class < &fair_sched_class < &rt_sched_class <
>>> &dl_sched_class < &stop_sched_class
>>>
>>> In order to guarantee this order of the sched class descriptors, add each
>>> one into their own data section and force the order in the linker script.
>>
>> I think it would make the code simpler if one reverses these, see other
>> reply.
>
> I started out agreeing, because of that mess around STOP_SCHED_CLASS and
> that horrid BEFORE_CRUD thing.
>
> Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
> and that ends up looking like:
>
> - if (likely((prev->sched_class == &idle_sched_class ||
> - prev->sched_class == &fair_sched_class) &&
> + if (likely(prev->sched_class >= &fair_sched_class &&
>
> And that's just weird.

I kind of agree, but if one can come up with good enough macro names, I
think all that comparison logic should be in helpers either way the
array is laid out. Something like

#define sched_class_lower_eq [something] /* perhaps comment on the array
order */
sched_class_lower_eq(prev->sched_class, &fair_sched_class)

> Then I had a better look and now...
>
>>> +/*
>>> + * The order of the sched class addresses are important, as they are
>>> + * used to determine the order of the priority of each sched class in
>>> + * relation to each other.
>>> + */
>>> +#define SCHED_DATA \
>>> + *(__idle_sched_class) \
>>> + *(__fair_sched_class) \
>>> + *(__rt_sched_class) \
>>> + *(__dl_sched_class) \
>>> + STOP_SCHED_CLASS
>
> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
> Doesn't the linker deal with empty sections already by making them 0
> sized?

Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
needing some ifdeffery to define highest_sched_class if they are laid
out in (higher sched class <-> higher address) order.

Rasmus

2019-12-20 10:46:44

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On 20.12.2019 13:12, Rasmus Villemoes wrote:
> On 20/12/2019 11.00, Peter Zijlstra wrote:
>> On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
>>> On 19/12/2019 22.44, Steven Rostedt wrote:
>>>> From: "Steven Rostedt (VMware)" <[email protected]>
>>>>
>>>> In order to make a micro optimization in pick_next_task(), the order of the
>>>> sched class descriptor address must be in the same order as their priority
>>>> to each other. That is:
>>>>
>>>> &idle_sched_class < &fair_sched_class < &rt_sched_class <
>>>> &dl_sched_class < &stop_sched_class
>>>>
>>>> In order to guarantee this order of the sched class descriptors, add each
>>>> one into their own data section and force the order in the linker script.
>>>
>>> I think it would make the code simpler if one reverses these, see other
>>> reply.
>>
>> I started out agreeing, because of that mess around STOP_SCHED_CLASS and
>> that horrid BEFORE_CRUD thing.
>>
>> Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
>> and that ends up looking like:
>>
>> - if (likely((prev->sched_class == &idle_sched_class ||
>> - prev->sched_class == &fair_sched_class) &&
>> + if (likely(prev->sched_class >= &fair_sched_class &&
>>
>> And that's just weird.
>
> I kind of agree, but if one can come up with good enough macro names, I
> think all that comparison logic should be in helpers either way the
> array is laid out. Something like
>
> #define sched_class_lower_eq [something] /* perhaps comment on the array
> order */
> sched_class_lower_eq(prev->sched_class, &fair_sched_class)

You started from making code simple argument. But in case of we make the code
simple in this single place, the rest of code, which are the many places, will
become wrapped over huge helpers, which make the readability worse IMO.

After we have implemented a reliable sections order in lds file, we may use
it for a long time without remembering the way they are organized. But in case
of we introduce huge helpers, we always will bump into them during analyzing
of difficult asynchronous code, and these helpers size will overflow internal
buffers in our heads just by their excess symbols.

My opinion is to better make some less beautiful thing in a single synchronous place,
than to distribute the redundancy over all the code (especially, when it is asynchronous).

>> Then I had a better look and now...
>>
>>>> +/*
>>>> + * The order of the sched class addresses are important, as they are
>>>> + * used to determine the order of the priority of each sched class in
>>>> + * relation to each other.
>>>> + */
>>>> +#define SCHED_DATA \
>>>> + *(__idle_sched_class) \
>>>> + *(__fair_sched_class) \
>>>> + *(__rt_sched_class) \
>>>> + *(__dl_sched_class) \
>>>> + STOP_SCHED_CLASS
>>
>> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
>> Doesn't the linker deal with empty sections already by making them 0
>> sized?
>
> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
> needing some ifdeffery to define highest_sched_class if they are laid
> out in (higher sched class <-> higher address) order.
>
> Rasmus
>

2019-12-20 12:21:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On Fri, Dec 20, 2019 at 11:12:37AM +0100, Rasmus Villemoes wrote:
> On 20/12/2019 11.00, Peter Zijlstra wrote:

> >>> +/*
> >>> + * The order of the sched class addresses are important, as they are
> >>> + * used to determine the order of the priority of each sched class in
> >>> + * relation to each other.
> >>> + */
> >>> +#define SCHED_DATA \
> >>> + *(__idle_sched_class) \
> >>> + *(__fair_sched_class) \
> >>> + *(__rt_sched_class) \
> >>> + *(__dl_sched_class) \
> >>> + STOP_SCHED_CLASS
> >
> > I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
> > Doesn't the linker deal with empty sections already by making them 0
> > sized?
>
> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
> needing some ifdeffery to define highest_sched_class if they are laid
> out in (higher sched class <-> higher address) order.

Would not something like:

__begin_sched_classes = .;
*(__idle_sched_class)
*(__fair_sched_class)
*(__rt_sched_class)
*(__dl_sched_class)
*(__stop_sched_class)
__end_sched_classes = .;

combined with something like:

extern struct sched_class *__begin_sched_classes;
extern struct sched_class *__end_sched_classes;

#define sched_class_highest (__end_sched_classes - 1)
#define sched_class_lowest (__begin_sched_classes - 1)

#define for_class_range(class, _from, _to) \
for (class = (_from); class != (_to), class--)

#define for_each_class(class) \
for_class_range(class, sched_class_highest, sched_class_lowest)

just work?

When no __stop_sched_class is present, that section is 0 sized, and
__end_sched_classes points to one past __dl_sched_class, no?

2019-12-20 14:38:26

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On 20/12/2019 13.19, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 11:12:37AM +0100, Rasmus Villemoes wrote:
>> On 20/12/2019 11.00, Peter Zijlstra wrote:
>
>>>>> +/*
>>>>> + * The order of the sched class addresses are important, as they are
>>>>> + * used to determine the order of the priority of each sched class in
>>>>> + * relation to each other.
>>>>> + */
>>>>> +#define SCHED_DATA \
>>>>> + *(__idle_sched_class) \
>>>>> + *(__fair_sched_class) \
>>>>> + *(__rt_sched_class) \
>>>>> + *(__dl_sched_class) \
>>>>> + STOP_SCHED_CLASS
>>>
>>> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
>>> Doesn't the linker deal with empty sections already by making them 0
>>> sized?
>>
>> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
>> needing some ifdeffery to define highest_sched_class if they are laid
>> out in (higher sched class <-> higher address) order.
>
> Would not something like:
>
> __begin_sched_classes = .;
> *(__idle_sched_class)
> *(__fair_sched_class)
> *(__rt_sched_class)
> *(__dl_sched_class)
> *(__stop_sched_class)
> __end_sched_classes = .;
>
> combined with something like:
>
> extern struct sched_class *__begin_sched_classes;
> extern struct sched_class *__end_sched_classes;

extern const struct sched_class __begin_sched_classes[];

but yes, I get the idea.

> #define sched_class_highest (__end_sched_classes - 1)
> #define sched_class_lowest (__begin_sched_classes - 1)
>
> #define for_class_range(class, _from, _to) \
> for (class = (_from); class != (_to), class--)
>
> #define for_each_class(class) \
> for_class_range(class, sched_class_highest, sched_class_lowest)
>
> just work?

Yes, I think so - I was only thinking of the case where all the symbols
would be defined in the linker script, and for this to work you need the
C compiler to subtract the sizeof().

I'd probably not include the -1 in the definition of sched_class_lowest,
but instead put it in the for_each_class definition (i.e. use
sched_class_lowest-1 as _to parameter).

A whole other option is of course to make the whole thing a bona fide C
array defined in sched/core.c, with fair_sched_class being defined as
&sched_classes[1] etc. But that requires giving all the methods extern
linkage. The advantage might be that the compiler can see how much we
iterate over, though I wouldn't expect it to actually unroll the
for_each_class loops five times. So yes, the above is probably the best
way to go.

Rasmus

2019-12-20 15:20:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor

On Fri, 20 Dec 2019 13:44:05 +0300
Kirill Tkhai <[email protected]> wrote:

> My opinion is to better make some less beautiful thing in a single synchronous place,
> than to distribute the redundancy over all the code (especially, when it is asynchronous).

I very much subscribe to this notion. The TRACE_EVENT() and NMI code
does the same. Keep all other use cases simple by making it complex in
one locality.

-- Steve