When compiling the kernel with clang, we have a problem with the code
in __resctrl_sched_in() because of the way the inline is optimized when it
is called from __switch_to(). The effect of the problem is that the
bandwidth restriction driven by the CLOSID is not enforced. The problem
is easy to reproduce on Intel or AMD x86 systems:
1. If resctrl filesystem is not mounted:
$ mount -t resctrl none /sys/fs/resctrl
2. Create resctrl group:
$ mkdir /sys/fs/resctrl/test
3. Move the shell into the resctrl group
$ echo $$ > /sys/fs/resctrl/test/tasks
4. Run a bandwidth-consuming test in the background on CPU0 (a minimal triad sketch follows these steps)
$ taskset -c 0 triad &
5. Monitor bandwidth consumption
Use perf to measure memory bandwidth with the events appropriate for your processor
6. Restrict bandwidth
- Intel: $ echo MB:0=10 > /sys/fs/resctrl/test/schemata
- AMD: $ echo MB:0=240 > /sys/fs/resctrl/test/schemata
7. Monitor bandwidth again
At step 7, you will see that the restriction is not enforced.
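For reference, "triad" above is just a stand-in for any memory-bandwidth-bound
loop. A minimal version (an illustrative sketch, not the exact test used)
could be:

/* triad.c -- saturate memory bandwidth; build: cc -O2 -o triad triad.c */
#include <stddef.h>
#include <stdlib.h>

#define N (16UL * 1024 * 1024)	/* 3 x 128 MB: large enough to miss in cache */

int main(void)
{
	double *a = malloc(N * sizeof(*a));
	double *b = malloc(N * sizeof(*b));
	double *c = malloc(N * sizeof(*c));

	if (!a || !b || !c)
		return 1;
	for (size_t i = 0; i < N; i++)	/* fault all pages in */
		b[i] = c[i] = 1.0;
	for (;;)			/* run until killed */
		for (size_t i = 0; i < N; i++)
			a[i] = b[i] + 3.0 * c[i];
}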
The problem is located in the __resctrl_sched_in() routine, which rewrites
the active CLOSID via the PQR_ASSOC register. Because this is an expensive
operation, the kernel only does it when the context switch involves tasks
with different CLOSIDs. To check that, it needs to access the current
task's closid field using current->closid. current is actually a macro
that reads the per-cpu variable pcpu_hot.current_task.
After an investigation by compiler experts, the problem has been tracked down
to the usage of the get_current() macro in the __resctrl_sched_in() code and
in particular the per-cpu macro:
static __always_inline struct task_struct *get_current(void)
{
return this_cpu_read_stable(pcpu_hot.current_task);
}
And as per percpu.h:
/*
* this_cpu_read() makes gcc load the percpu variable every time it is
* accessed while this_cpu_read_stable() allows the value to be cached.
* this_cpu_read_stable() is more efficient and can be used if its value
* is guaranteed to be valid across cpus. The current users include
* get_current() and get_thread_info() both of which are actually
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
The _stable version of the macro allows the value to be cached, meaning it
does not force a reload.
However in the __switch_to() routine, the current pointer is changed. If
the compiler caches a stale value instead of reloading it, then
__resctrl_sched_in() will look at the previous task instead of the new
current task. This explains why we were not seeing the limit enforced on
the benchmark.
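To see the failure mode in isolation, here is a reduced userspace model
(an illustration of the pattern, not the kernel macros themselves;
whether the hoist actually happens depends on the compiler -- we observed
it with clang v14):

struct task { int closid; };

struct task *cur_task;

/* mimics this_cpu_read_stable(): a non-volatile asm whose only input is
 * the variable's *address* ("p" constraint), so the compiler sees no
 * dependency on the preceding store and may CSE or hoist the load */
static inline struct task *stable_read(void)
{
	struct task *val;

	asm("movq %P1, %0" : "=r"(val) : "p"(&cur_task));
	return val;
}

int switch_to_and_read(struct task *next)
{
	cur_task = next;		/* the raw_cpu_write() analogue   */
	return stable_read()->closid;	/* may still load the *old* task */
}

(Build with something like cc -O2 -fno-pie -c so the kernel-style
absolute addressing assembles.)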
To fix the problem, the __resctrl_sched_in() function must use the
this_cpu_read() form of the macro. Given this is specific to the
__switch_to() routine, we do not change get_current() but instead invoke
the lower-level macro directly from __resctrl_sched_in(). This has no
impact on any other callers of get_current().
Note that the problem was detected when compiling the kernel with clang (v14)
but it could also happen with gcc.
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/include/asm/resctrl.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 52788f79786fa..f791606a8cb15 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -54,6 +54,7 @@ static void __resctrl_sched_in(void)
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
u32 closid = state->default_closid;
u32 rmid = state->default_rmid;
+ struct task_struct *cur;
u32 tmp;
/*
@@ -61,13 +62,15 @@ static void __resctrl_sched_in(void)
* Else use the closid/rmid assigned to this cpu.
*/
if (static_branch_likely(&rdt_alloc_enable_key)) {
- tmp = READ_ONCE(current->closid);
+ cur = this_cpu_read(pcpu_hot.current_task);
+ tmp = READ_ONCE(cur->closid);
if (tmp)
closid = tmp;
}
if (static_branch_likely(&rdt_mon_enable_key)) {
- tmp = READ_ONCE(current->rmid);
+ cur = this_cpu_read(pcpu_hot.current_task);
+ tmp = READ_ONCE(cur->rmid);
if (tmp)
rmid = tmp;
}
--
2.40.0.rc0.216.gc4246ad0f0-goog
> However in the __switch_to() routine, the current pointer is changed. If
> the compiler caches a stale value instead of reloading it, then
> __resctrl_sched_in() will look at the previous task instead of the new
> current task. This explains why we were not seeing the limit enforced on
> the benchmark.
Are there any other uses of "current" during context switch that might
be affected by this?
-Tony
On Fri, Mar 03, 2023 at 03:11:33PM -0800, Stephane Eranian wrote:
> The problem is located in the __resctrl_sched_in() routine, which rewrites
> the active CLOSID via the PQR_ASSOC register. Because this is an expensive
> operation, the kernel only does it when the context switch involves tasks
> with different CLOSIDs. To check that, it needs to access the current
> task's closid field using current->closid. current is actually a macro
> that reads the per-cpu variable pcpu_hot.current_task.
>
> After an investigation by compiler experts, the problem has been tracked down
> to the usage of the get_current() macro in the __resctrl_sched_in() code and
> in particular the per-cpu macro:
>
> static __always_inline struct task_struct *get_current(void)
> {
> return this_cpu_read_stable(pcpu_hot.current_task);
> }
>
> And as per percpu.h:
>
> /*
> * this_cpu_read() makes gcc load the percpu variable every time it is
> * accessed while this_cpu_read_stable() allows the value to be cached.
> * this_cpu_read_stable() is more efficient and can be used if its value
> * is guaranteed to be valid across cpus. The current users include
> * get_current() and get_thread_info() both of which are actually
> * per-thread variables implemented as per-cpu variables and thus
> * stable for the duration of the respective task.
> */
>
> The _stable version of the macro allows the value to be cached, meaning it
> does not force a reload.
Right, so afaict the difference between this_cpu_read() and
this_cpu_read_stable() is the volatile qualifier.
this_cpu_read() is asm volatile(), while this_cpu_read_stable() and
raw_cpu_read() are both an unqualified asm().
Now, afaiu we're inlining all of this into __switch_to(), which has
raw_cpu_write(pcpu_hot.current_task, next_p).
And I suppose what the compiler is doing is lifting the 'current' load
over that store, but how is it allowed to do that? I thought C was
supposed to have PO consistency. That raw_cpu_write() should be seen as
a store to pcpu_hot.current_task, so why can it lift a load over the
store?
Specifically, percpu_to_op() has a "+m" output constraint while
percpu_stable_op() has a "p" input constraint on the same address.
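Reduced to their constraint skeletons (a simplification -- the real
macros add the size and %gs segment plumbing, and an 8-byte variable is
assumed here), the two shapes are roughly:

/* percpu_to_op() style: the store names the memory itself, "+m" */
#define write_var(var, val) \
	asm("movq %1, %0" : "+m"(var) : "r"((unsigned long)(val)))

/* percpu_stable_op() style: only the *address* is an input, "p", and
 * the asm is not volatile -- nothing ties it to the store above */
#define read_var_stable(var)					\
({								\
	typeof(var) val__;					\
	asm("movq %P1, %0" : "=r"(val__) : "p"(&(var)));	\
	val__;							\
})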
Compiler folks help?
Start of Lore thread:
https://lore.kernel.org/lkml/[email protected]/
On Mon, Mar 6, 2023 at 4:01 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Mar 03, 2023 at 03:11:33PM -0800, Stephane Eranian wrote:
>
> > [...]
> >
> > The _stable version of the macro allows the value to be cached, meaning it
> > does not force a reload.
>
> Right, so afaict the difference between this_cpu_read() and
> this_cpu_read_stable() is the volatile qualifier.
>
> this_cpu_read() is asm volatile(), while this_cpu_read_stable() and
> raw_cpu_read() are both an unqualified asm().
>
> Now, afaiu we're inlining all of this into __switch_to(), which has
> raw_cpu_write(pcpu_hot.current_task, next_p).
>
> And I suppose what the compiler is doing is lifting the 'current' load
> over that store, but how is it allowed to do that? I thought C was
> supposed to have PO consistency. That raw_cpu_write() should be seen as
> a store to pcpu_hot.current_task, so why can it lift a load over the
> store?
>
> Specifically, percpu_to_op() has a "+m" output constraint while
> percpu_stable_op() has a "p" input constraint on the same address.
I definitely think the issue is specific to "p" constraints.
https://godbolt.org/z/34YeG6WbY is the test case I reduced which I
think demonstrates the issue.
https://reviews.llvm.org/D145416
-> click "Show Older Changes" for the ongoing discussion.
I don't have a satisfactory answer yet, but am looking into this.
>
> Compiler folks help?
--
Thanks,
~Nick Desaulniers
On Mon, Mar 06, 2023 at 04:16:52PM -0800, Nick Desaulniers wrote:
> Start of Lore thread:
> https://lore.kernel.org/lkml/[email protected]/
>
> On Mon, Mar 6, 2023 at 4:01 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Fri, Mar 03, 2023 at 03:11:33PM -0800, Stephane Eranian wrote:
> >
> > > [...]
> > >
> > > The _stable version of the macro allows the value to be cached, meaning it
> > > does not force a reload.
> >
> > Right, so afaict the difference between this_cpu_read() and
> > this_cpu_read_stable() is the volatile qualifier.
> >
> > this_cpu_read() is asm volatile(), while this_cpu_read_stable() and
> > raw_cpu_read() are both an unqualified asm().
> >
> > Now, afaiu we're inlining all of this into __switch_to(), which has
> > raw_cpu_write(pcpu_hot.current_task, next_p).
> >
> > And I suppose what the compiler is doing is lifting the 'current' load
> > over that store, but how is it allowed to do that? I thought C was
> > supposed to have PO consistency. That raw_cpu_write() should be seen as
> > a store to pcpu_hot.current_task, so why can it lift a load over the
> > store?
> >
> > Specifically, percpu_to_op() has a "+m" output constraint while
> > percpu_stable_op() has a "p" input constraint on the same address.
>
> I definitely think the issue is specific to "p" constraints.
> https://godbolt.org/z/34YeG6WbY is the test case I reduced which I
> think demonstrates the issue.
>
> https://reviews.llvm.org/D145416
> -> click "Show Older Changes" for the ongoing discussion.
So per that summary, I'm going to nit-pick and state we very much want
CSE. CSE good. What we don't want is it violating store-load ordering.
> I don't have a satisfactory answer yet, but am looking into this.
Oh, geez, what a twisty tale that... So Linus knew back in '09 that "p"
was icky, but it sorta was the only thing and it 'worked' -- until now
:/
Is there a way to explicitly order these things? barrier() obviously
isn't going to help here.
So ideally we'd get something that respects the whole store-load
ordering but still allows aggressive CSE. And works for both toolchains.
Small ask, I know :-)
On Tue, Mar 7, 2023 at 3:36 AM Peter Zijlstra <[email protected]> wrote:
>
> > I don't have a satisfactory answer yet, but am looking into this.
>
> Oh, geez, what a twisty tale that... So Linus knew back in '09 that "p"
> was icky, but it sorta was the only thing and it 'worked' -- until now
> :/
Yeah, so 'p' definitely is about the pointer, and I do worry that it
is only a dependency on exactly that - not the memory behind it.
I have this dim memory about us having talked about this with some gcc
person, and coming to the conclusion that it was all fine, but I
suspect it was in some very specific case where it might have been
fine for other reasons.
> Is there a way to explicitly order these things? barrier() obviously
> isn't going to help here.
So one "asm volatile" should always be ordered wrt another "asm volatile".
I have this other dim memory of it not even being clear whether "asm"
and "asm volatile" are ordered. I don't think they necessarily are
(with the obvious caveat that an asm without any arguments - a
so-called "basic asm" - is always volatile whether the "volatile" is
there or not).
I have a lot of dim memories, in other words. Should that worry me?
And then there's the "memory" clobber, of course.
But both of those are also going to disable CSE.
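For concreteness, those two blunt instruments look roughly like this
(simplified sketches, not the kernel's actual macros; both assume an
8-byte variable on x86-64):

/* volatile read, as this_cpu_read() does it: the "m" input makes the
 * dependency on a prior store to var visible, and volatile forbids
 * caching the result -- no CSE */
#define read_var(var)						\
({								\
	typeof(var) val__;					\
	asm volatile("movq %1, %0" : "=r"(val__) : "m"(var));	\
	val__;							\
})

/* "memory" clobber: orders the asm against *all* prior stores, not
 * just stores to var, and likewise defeats CSE */
#define read_var_fenced(var)					\
({								\
	typeof(var) val__;					\
	asm("movq %1, %0" : "=r"(val__) : "m"(var) : "memory");	\
	val__;							\
})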
I do think that percpu_stable_op can use "p", but only when the value
is *truly* stable and there is no question about it being moved around
things that might modify it.
Linus
Hi!
On Tue, Mar 07, 2023 at 12:35:45PM +0100, Peter Zijlstra wrote:
> So per that summary, I'm going to nit-pick and state we very much want
> CSE. CSE good. What we don't want it violating store-load ordering.
> CSE. CSE good. What we don't want is it violating store-load ordering.
So you need to describe exactly what you *do* want. There is no way to
forbid most otherwise valid things. But you can express pretty much all
dependencies.
> Oh, geez, what a twisty tale that... So Linus knew back in '09 that "p"
> was icky, but it sorta was the only thing and it 'worked' -- until now
> :/
The "p" constraint is just like any other address_constraint, in most
aspects. Since this is very specific to "p", that limits what is going
on to really just one thing.
For "p", after reload, strict_memory_address_addr_space_p is used. That
is, targetm.addr_space.legitimate_address_p with strict set to true. On
x86 that limits what registers can be used? So I guess that made things
accidentally work before?
> So ideally we'd get something that respects the whole store-load
> ordering but still allows aggressive CSE. And works for both toolchains.
> Small ask, I know :-)
Well, what is the ordering you need respected, *exactly*?
Segher
On Tue, Mar 07, 2023 at 12:43:16PM -0600, Segher Boessenkool wrote:
> On Tue, Mar 07, 2023 at 12:35:45PM +0100, Peter Zijlstra wrote:
> > So per that summary, I'm going to nit-pick and state we very much want
> > CSE. CSE good. What we don't want is it violating store-load ordering.
>
> So you need to describe exactly what you *do* want. There is no way to
> forbid most otherwise valid things. But you can express pretty much all
> dependencies.
>
> > Oh, geez, what a twisty tale that... So Linus knew back in '09 that "p"
> > was icky, but it sorta was the only thing and it 'worked' -- until now
> > :/
>
> The "p" constraint is just like any other address_constraint, in most
> aspects. Since this is very specific to "p", that limits what is going
> on to really just one thing.
Are we actually talking here about "p" constraint or about p/P (x86) modifiers
(asm ("%p0" : : "i" (42));)?
Jakub
On Tue, Mar 7, 2023 at 12:43 PM Jakub Jelinek <[email protected]> wrote:
>
> Are we actually talking here about "p" constraint or about p/P (x86) modifiers
> (asm ("%p0" : : "i" (42));)?
In this case it's actually the "p" constraint.
And the "percpu_stable_op()" thing uses it exactly because it thinks
it wants the "I don't care about data dependencies in memory, because
this memory location doesn't change".
Of course, that "this memory location doesn't change" is then always
technically a lie. It's exactly things like "current" that obviously
*does* change in the big picture, but from the context of a particular
_thread_, "current" is a fixed value.
Which then causes problems when you use that "percpu_stable_op()"
thing from within the scheduler (generally without being *aware* of
this issue at all, since the use is hidden a few macro expansions
down).
I think the problem is that the <asm/resctrl.h> code is disgusting and
horrible in multiple ways:
(a) it shouldn't define and declare a static function in a header file
(b) the resctrl_sched_in() inline function is misdesigned to begin with
basically, the actual scheduling functions should *never* look at
"current" at all. They are - byu definition - changing it, and they
already *know* what both the "incoming current" (aka "prev_p") and
"new current" (aka "next_p") are.
So any scheduling function that uses "current" is hot garbage.
In this case, that hot garbage is resctrl_sched_in().
I do not believe this is a compiler issue. This is literally just a
kernel bug, and that resctrl code being very very wrong.
Linus
> (a) it shouldn't define and declare a static function in a header file
>
> (b) the resctrl_sched_in() inline function is misdesigned to begin with
Fixing "b" would seem to be to just pass "next_p" to the function to
use instead of "current".
Can you expand about part "a" ... Linux has zillions of static inline functions
in header files to handle CONFIG options. One version is the real McCoy
while the other is just a stub for the CONFIG=n case.
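For instance, the usual shape is (CONFIG name and function names here
are placeholders):

#ifdef CONFIG_FOO
void __foo_hook(struct task_struct *tsk);	/* real version, in foo.c */

static inline void foo_hook(struct task_struct *tsk)
{
	__foo_hook(tsk);
}
#else
static inline void foo_hook(struct task_struct *tsk) {}
#endif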
What's different about this one?
-Tony
On Tue, Mar 7, 2023 at 12:54 PM Linus Torvalds
<[email protected]> wrote:
>
> I think the problem is that the <asm/resctrl.h> code is disgusting and
> horrible in multiple ways:
>
> (a) it shouldn't define and declare a static function in a header file
>
> (b) the resctrl_sched_in() inline function is misdesigned to begin with
Ok, so here's a *totally* untested and mindless patch to maybe fix
what I dislike about that resctrl code.
Does it fix the code generation issue? I have no idea. But this is
what I would suggest is the right answer, without actually knowing the
code any better, and just going on a mindless rampage.
It seems to compile for me, fwiw.
Linus
On Tue, Mar 7, 2023 at 1:11 PM Luck, Tony <[email protected]> wrote:
>
> Can you expand about part "a" ... Linux has zillions of static inline functions
> in header files to handle CONFIG options. One version is the real McCoy
> while the other is just a stub for the CONFIG=n case.
>
> What's different about this one?
See the patch I just sent out.
Linux has a lot of "static inline" functions. But that's not at all
what that function was. It was literally just
static void __resctrl_sched_in(..)
which is disgusting and very wrong.
I hope that compilers then just ignored it ("It's static and not used,
so I'm not generating that code"), and that header file isn't included
in very many places, but it's still very wrong.
Linus
On Tue, Mar 7, 2023 at 1:11 PM Luck, Tony <[email protected]> wrote:
>
> > (a) it shouldn't define and declare a static function in a header file
> >
> > (b) the resctrl_sched_in() inline function is misdesigned to begin with
>
> Fixing "b" would seem to be to just pass "next_p" to the function to
> use instead of "current".
```
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 52788f79786f..f46c0b97334d 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -49,7 +49,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
* simple as possible.
* Must be called with preemption disabled.
*/
-static void __resctrl_sched_in(void)
+static void __resctrl_sched_in(struct task_struct *next)
{
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
u32 closid = state->default_closid;
@@ -61,13 +61,13 @@ static void __resctrl_sched_in(void)
* Else use the closid/rmid assigned to this cpu.
*/
if (static_branch_likely(&rdt_alloc_enable_key)) {
- tmp = READ_ONCE(current->closid);
+ tmp = READ_ONCE(next->closid);
if (tmp)
closid = tmp;
}
if (static_branch_likely(&rdt_mon_enable_key)) {
- tmp = READ_ONCE(current->rmid);
+ tmp = READ_ONCE(next->rmid);
if (tmp)
rmid = tmp;
}
@@ -88,17 +88,17 @@ static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
return val * scale;
}
-static inline void resctrl_sched_in(void)
+static inline void resctrl_sched_in(struct task_struct *next)
{
if (static_branch_likely(&rdt_enable_key))
- __resctrl_sched_in();
+ __resctrl_sched_in(next);
}
void resctrl_cpu_detect(struct cpuinfo_x86 *c);
#else
-static inline void resctrl_sched_in(void) {}
+static inline void resctrl_sched_in(struct task_struct *next) {}
static inline void resctrl_cpu_detect(struct cpuinfo_x86 *c) {}
#endif /* CONFIG_X86_CPU_RESCTRL */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e2c1599d1b37..d970347838a4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -314,7 +314,7 @@ static void update_cpu_closid_rmid(void *info)
* executing task might have its own closid selected. Just reuse
* the context switch code.
*/
- resctrl_sched_in();
+ resctrl_sched_in(current);
}
/*
@@ -530,7 +530,7 @@ static void _update_task_closid_rmid(void *task)
* Otherwise, the MSR is updated when the task is scheduled in.
*/
if (task == current)
- resctrl_sched_in();
+ resctrl_sched_in(current);
}
static void update_task_closid_rmid(struct task_struct *t)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 470c128759ea..708c87b88cc1 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -212,7 +212,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
switch_fpu_finish();
/* Load the Intel cache allocation PQR MSR. */
- resctrl_sched_in();
+ resctrl_sched_in(next_p);
return prev_p;
}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4e34b3b68ebd..bb65a68b4b49 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -656,7 +656,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
}
/* Load the Intel cache allocation PQR MSR. */
- resctrl_sched_in();
+ resctrl_sched_in(next_p);
return prev_p;
}
```
?
>
> Can you expand about part "a" ... Linux has zillions of static inline functions
> in header files to handle CONFIG options. One version is the real McCoy
> while the other is just a stub for the CONFIG=n case.
Right, I had the same question.
Perhaps it's more so that no one calls __resctrl_sched_in, only
resctrl_sched_in, therefore they should be folded into one function?
>
> What's different about this one?
>
> -Tony
--
Thanks,
~Nick Desaulniers
On Tue, Mar 7, 2023 at 1:16 PM Nick Desaulniers <[email protected]> wrote:
>
> >
> > Can you expand about part "a" ... Linux has zillions of static inline functions
> > in header files to handle CONFIG options. One version is the real McCoy
> > while the other is just a stub for the CONFIG=n case.
>
> Right, I had the same question.
>
> Perhaps it's more so that no one calls __resctrl_sched_in, only
> > resctrl_sched_in, therefore they should be folded into one function?
If you think it should be inlined, it should be marked as such.
And if you *don't* think it should be inlined - quite possibly the
right answer here - the definition of that function damn well
shouldn't be in a header file.
So either way, that __resctrl_sched_in() was wrong, wrong, wrong.
Linus
On Tue, Mar 7, 2023 at 1:19 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Mar 7, 2023 at 1:16 PM Nick Desaulniers <[email protected]> wrote:
> >
> > >
> > > Can you expand about part "a" ... Linux has zillions of static inline functions
> > > in header files to handle CONFIG options. One version is the real McCoy
> > > while the other is just a stub for the CONFIG=n case.
> >
> > Right, I had the same question.
> >
> > Perhaps it's more so that no one calls __resctrl_sched_in, only
> > resctrl_sched_in, therefor they should be folded into one function?
>
> If you think it should be inlined, it should be marked as such.
Yep, sorry, I missed that `inline` was missing from that definition!
>
> And if you *don't* think it should be inlined - quite possibly the
> right answer here - the definition of that function damn well
> shouldn't be in a header file.
>
> So either way, that __resctrl_sched_in() was wrong, wrong, wrong.
>
> Linus
>
--
Thanks,
~Nick Desaulniers
> Linux has a lot of "static inline" functions. But that's not at all
> what that function was. It was literally just
>
> static void __resctrl_sched_in(..)
>
> which is disgusting and very wrong.
Ah. I was looking at:
static inline void resctrl_sched_in(void)
{
if (static_branch_likely(&rdt_enable_key))
__resctrl_sched_in();
}
which does have the "inline" attribute.
I wonder if checkpatch could catch missing "inline" on static
function definitions in header files?
-Tony
> Ok, so here's a *totally* untested and mindless patch to maybe fix
> what I dislike about that resctrl code.
>
> Does it fix the code generation issue? I have no idea. But this is
> what I would suggest is the right answer, without actually knowing the
> code any better, and just going on a mindless rampage.
>
> It seems to compile for me, fwiw.
Beyond compiling, it boots and passes the tools/testing/selftests/resctrl test suite.
Tested-by: Tony Luck <[email protected]>
-Tony
On Tue, Mar 7, 2023 at 1:35 PM Luck, Tony <[email protected]> wrote:
>
> > Ok, so here's a *totally* untested and mindless patch to maybe fix
> > what I dislike about that resctrl code.
> >
> > Does it fix the code generation issue? I have no idea. But this is
> > what I would suggest is the right answer, without actually knowing the
> > code any better, and just going on a mindless rampage.
> >
> > It seems to compile for me, fwiw.
>
> Beyond compiling, it boots and passes the tools/testing/selftests/resctrl test suite.
>
> Tested-by: Tony Luck <[email protected]>
LGTM; reloading of current becomes irrelevant now that we're reusing
the existing variable `next_p`.
Reviewed-by: Nick Desaulniers <[email protected]>
Might be nice to tag this for stable.
Cc: <[email protected]>
And credit Stephane, who did a nice job tracking this down and providing
a concise reproducer.
Reported-by: Stephane Eranian <[email protected]>
Perhaps relevant links
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Consider reusing parts of Stephane's message from the initial Link above?
Thanks for the patch.
---
While this issue was specific to the usage of `current` (implemented
as a macro that uses `this_cpu_read_stable`), I wonder if we might hit
issues again in the future (at least on x86 using the "p" constraint)
in code that:
1. uses this_cpu_read_stable to access a per cpu variable
2. updates that per cpu variable
3. uses this_cpu_read_stable to access the variable hoping to get the
new value rather than the old.
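Spelled out as a hypothetical fragment (kernel context assumed, not
compilable stand-alone):

	struct task_struct *t1, *t2;

	t1 = this_cpu_read_stable(pcpu_hot.current_task);	/* (1) */
	raw_cpu_write(pcpu_hot.current_task, next_p);		/* (2) */
	t2 = this_cpu_read_stable(pcpu_hot.current_task);	/* (3) may be
			CSEd with (1) and yield the stale pointer */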
I guess that this_cpu_read should be used rather than
this_cpu_read_stable? Maybe we can beef up the comment in
arch/x86/include/asm/percpu.h to warn about this? The sentence about
get_thread_info() being a user of this_cpu_read_stable() seems
outdated/due for a refresh?
Is __switch_to the only place that should be updating current? Are
there variables other than current that might be afflicted by the
1,2,3 pattern I mention above?
current_top_of_stack() uses this_cpu_read_stable() for instance.
Perhaps there could be a caller that measures the stack depth, grows
the stack, then rereads the wrong value?
--
Thanks,
~Nick Desaulniers
> I wonder if checkpatch could catch missing "inline" on static
> function definitions in header files?
Some casual use of grep shows that resctrl isn't the only offender.
There are many non-inline static functions in header files.
A few hundred scattered across core kernel code, drivers and most
architectures. E.g. a dozen in arch/x86/include/asm/floppy.h.
static irqreturn_t floppy_hardint(int irq, void *dev_id)
static void fd_disable_dma(void)
static int vdma_request_dma(unsigned int dmanr, const char *device_id)
static void vdma_nop(unsigned int dummy)
static int vdma_get_dma_residue(unsigned int dummy)
static int fd_request_irq(void)
static unsigned long dma_mem_alloc(unsigned long size)
static unsigned long vdma_mem_alloc(unsigned long size)
static void _fd_dma_mem_free(unsigned long addr, unsigned long size)
static void _fd_chose_dma_mode(char *addr, unsigned long size)
static int vdma_dma_setup(char *addr, unsigned long size, int mode, int io)
static int hard_dma_setup(char *addr, unsigned long size, int mode, int io)
-Tony
On Tue, Mar 7, 2023 at 1:06 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Mar 7, 2023 at 12:54 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I think the problem is that the <asm/resctrl.h> code is disgusting and
> > horrible in multiple ways:
> >
> > (a) it shouldn't define and declare a static function in a header file
> >
> > (b) the resctrl_sched_in() inline function is misdesigned to begin with
>
> Ok, so here's a *totally* untested and mindless patch to maybe fix
> what I dislike about that resctrl code.
>
> Does it fix the code generation issue? I have no idea. But this is
> what I would suggest is the right answer, without actually knowing the
> code any better, and just going on a mindless rampage.
>
> It seems to compile for me, fwiw.
>
On Tue, Mar 7, 2023 at 3:01 PM Nick Desaulniers <[email protected]> wrote:
>
> On Tue, Mar 7, 2023 at 2:56 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Tue, Mar 7, 2023 at 2:03 PM Nick Desaulniers <[email protected]> wrote:
> > >
> > > Sounds like Stephane is going to re-run the internal tests he used to
> > > discover the issue with your diff applied, if you don't mind holding
> > > out for another Tested-by tag. EOM
> >
> > Ack. I am in no hurry.
> >
> > In fact, I'd prefer to just get the patch sent back to me with a
> > commit message too, if somebody has the energy. I don't need the
> > credit for a trivial thing like that.
>
> Sure, then maybe Stephane you can supply a v2 with updated commit message and a
>
> Suggested-by: Linus Torvalds <[email protected]>
>
I verified Linus' patch on my test case on AMD Zen3 and it works as
expected, i.e., the limit is enforced. I had tried a similar approach
myself and it worked.
I think passing the task pointer is the proper approach: we are in a
sched_in routine, so I would expect the task being scheduled in to be
passed as an argument instead of having the function retrieve it from
the current pointer.
Thanks.
Tested-by: Stephane Eranian <[email protected]>
> Linus
On 3/7/23 15:06, Linus Torvalds wrote:
> On Tue, Mar 7, 2023 at 12:54 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> I think the problem is that the <asm/resctrl.h> code is disgusting and
>> horrible in multiple ways:
>>
>> (a) it shouldn't define and declare a static function in a header file
>>
>> (b) the resctrl_sched_in() inline function is misdesigned to begin with
>
> Ok, so here's a *totally* untested and mindless patch to maybe fix
> what I dislike about that resctrl code.
>
> Does it fix the code generation issue? I have no idea. But this is
> what I would suggest is the right answer, without actually knowing the
> code any better, and just going on a mindless rampage.
>
> It seems to compile for me, fwiw.
>
> Linus
Tested both on GCC and CLANG. Test passes and resctrl limits are working
fine. Thanks for the patch.
Tested-by: Babu Moger <[email protected]>
--
Thanks
Babu Moger
On Tue, Mar 7, 2023 at 10:13 PM Stephane Eranian <[email protected]> wrote:
>
> Tested-by: Stephane Eranian <[email protected]>
It's out as commit 7fef09970252 ("x86/resctl: fix scheduler confusion
with 'current'") if anybody cares.
Linus