The cros_ec_spi driver is realtime priority so that it doesn't get
preempted by other taks while it's talking to the EC but overall it
really doesn't need lots of compute power. Unfortunately, by default,
the kernel assumes that all realtime tasks should cause the cpufreq to
jump to max and burn through power to get things done as quickly as
possible. That's just not the correct behavior for cros_ec_spi.
Switch to manually overriding the default.
This won't help us if our work moves over to the SPI pump thread but
that's not the common code path.
Signed-off-by: Douglas Anderson <[email protected]>
---
NOTE: This would cause a conflict if the patch
https://lore.kernel.org/r/[email protected] lands
first
drivers/platform/chrome/cros_ec_spi.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
index debea5c4c829..76d59d5e7efd 100644
--- a/drivers/platform/chrome/cros_ec_spi.c
+++ b/drivers/platform/chrome/cros_ec_spi.c
@@ -709,8 +709,11 @@ static void cros_ec_spi_high_pri_release(void *worker)
static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
struct cros_ec_spi *ec_spi)
{
- struct sched_param sched_priority = {
- .sched_priority = MAX_RT_PRIO / 2,
+ struct sched_attr sched_attr = {
+ .sched_policy = SCHED_FIFO,
+ .sched_priority = MAX_RT_PRIO / 2,
+ .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
+ .sched_util_min = 0,
};
int err;
@@ -728,8 +731,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
if (err)
return err;
- err = sched_setscheduler_nocheck(ec_spi->high_pri_worker->task,
- SCHED_FIFO, &sched_priority);
+ err = sched_setattr_nocheck(ec_spi->high_pri_worker->task, &sched_attr);
if (err)
dev_err(dev, "Can't set cros_ec high pri priority: %d\n", err);
return err;
--
2.27.0.290.gba653c62da-goog
Hi Doug,
On Wednesday 10 Jun 2020 at 15:18:43 (-0700), Douglas Anderson wrote:
> The cros_ec_spi driver is realtime priority so that it doesn't get
> preempted by other taks while it's talking to the EC but overall it
> really doesn't need lots of compute power. Unfortunately, by default,
> the kernel assumes that all realtime tasks should cause the cpufreq to
> jump to max and burn through power to get things done as quickly as
> possible. That's just not the correct behavior for cros_ec_spi.
Is this specific to this driver, or something you would want applied
more globally to all RT tasks in ChromeOS (which is what we'd like to
have in Android for instance)?
IOW, how do you feel about [email protected] ?
Otherwise, the patch looks good to me.
Thanks,
Quentin
Hi,
On Thu, Jun 11, 2020 at 4:03 AM Quentin Perret <[email protected]> wrote:
>
> Hi Doug,
>
> On Wednesday 10 Jun 2020 at 15:18:43 (-0700), Douglas Anderson wrote:
> > The cros_ec_spi driver is realtime priority so that it doesn't get
> > preempted by other taks while it's talking to the EC but overall it
> > really doesn't need lots of compute power. Unfortunately, by default,
> > the kernel assumes that all realtime tasks should cause the cpufreq to
> > jump to max and burn through power to get things done as quickly as
> > possible. That's just not the correct behavior for cros_ec_spi.
>
> Is this specific to this driver, or something you would want applied
> more globally to all RT tasks in ChromeOS (which is what we'd like to
> have in Android for instance)?
Hrm. I guess my first instinct is to say that we still want this
patch even if we have something that is applied more globally.
Specifically it sounds as if the patch you point at is suggesting that
we'd tweak the boost value to something other than max but we'd still
have a boost value. In the case of cros_ec_spi I don't believe we
need any boost value at all, so my patch would still be useful. The
computational needs of cros_ec_spi are very modest and it can do its
work at lower CPU frequencies just fine. It just can't be interrupted
for large swaths of time.
> IOW, how do you feel about [email protected] ?
I'm not totally a fan, but I'm definitely not an expert in this area
(I've also only read the patch description and not the patch or the
whole thread). I really don't want yet another value that I need to
tune from board to board. Even worse, this tuning value isn't
board-specific but a combination of board and software specific. By
this, I'd imagine a scenario where you're using a real-time task to
get audio decoding done within a certain latency. I guess you'd tune
this value to make sure that you can get all your audio decoding done
in time but also not burn extra power. Now, imagine that the OS
upgrades and the audio task suddenly has to decode more complex
streams. You've got to go across all of your boards and re-tune every
one? ...or, nobody thinks about it and older boards start getting
stuttery audio? Perhaps the opposite happens and someone comes up
with a newer lower-cpu-intensive codec and you could save power.
Sounds like a bit of a nightmare.
I'd rather have a boolean value: boost all RT threads to max vs. don't
boost all RT threads to max. Someone that just wanted RT stuff to run
as fast as possible without any hassle on their system and didn't care
about power efficiency could turn this on. Anyone who really cared
about power could turn this off and then could find a more targeted
way to boost things, hopefully in a way that doesn't require tuning.
One option would be to still boost the CPU to max but only for certain
tasks known to be really latency sensitive. Another might be to
somehow measure whether or not the task is making its deadlines and
boost the CPU frequency up if deadlines are not being met. I'm sure
there are fancier ways.
...of course, I believe your patch allows me to do what I want: I can
just set the default boost to 0. It just leaves in the temptation for
others to require a default boost of something else and then I'm stuck
in my tuning nightmare.
-Doug
-Doug
+CC Qais [FYI]
On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> Hrm. I guess my first instinct is to say that we still want this
> patch even if we have something that is applied more globally.
Fair enough.
> Specifically it sounds as if the patch you point at is suggesting that
> we'd tweak the boost value to something other than max but we'd still
> have a boost value. In the case of cros_ec_spi I don't believe we
> need any boost value at all, so my patch would still be useful. The
> computational needs of cros_ec_spi are very modest and it can do its
> work at lower CPU frequencies just fine. It just can't be interrupted
> for large swaths of time.
>
>
> > IOW, how do you feel about [email protected] ?
>
> I'm not totally a fan, but I'm definitely not an expert in this area
> (I've also only read the patch description and not the patch or the
> whole thread). I really don't want yet another value that I need to
> tune from board to board. Even worse, this tuning value isn't
> board-specific but a combination of board and software specific. By
> this, I'd imagine a scenario where you're using a real-time task to
> get audio decoding done within a certain latency. I guess you'd tune
> this value to make sure that you can get all your audio decoding done
> in time but also not burn extra power. Now, imagine that the OS
> upgrades and the audio task suddenly has to decode more complex
> streams. You've got to go across all of your boards and re-tune every
> one? ...or, nobody thinks about it and older boards start getting
> stuttery audio? Perhaps the opposite happens and someone comes up
> with a newer lower-cpu-intensive codec and you could save power.
> Sounds like a bit of a nightmare.
>
> I'd rather have a boolean value: boost all RT threads to max vs. don't
> boost all RT threads to max. Someone that just wanted RT stuff to run
> as fast as possible without any hassle on their system and didn't care
> about power efficiency could turn this on. Anyone who really cared
> about power could turn this off and then could find a more targeted
> way to boost things, hopefully in a way that doesn't require tuning.
> One option would be to still boost the CPU to max but only for certain
> tasks known to be really latency sensitive. Another might be to
> somehow measure whether or not the task is making its deadlines and
> boost the CPU frequency up if deadlines are not being met. I'm sure
> there are fancier ways.
>
> ...of course, I believe your patch allows me to do what I want: I can
> just set the default boost to 0. It just leaves in the temptation for
> others to require a default boost of something else and then I'm stuck
> in my tuning nightmare.
Right, so I am not disagreeing at all with the above. FWIW, I expect
Android to set this default value to 0 as you mentioned, so that uclamp
basically becomes 'opt-in' for _all_ tasks, including RT.
Now, given that Qais' patch is commiting something to userspace, I think
it makes sense to expose the full range rather than a boolean value, as
it's probably more future-proof. That is, if we expose a boolean knob
now, and somebody wants to be able to set a default value in the middle
in a few years, we'll have to add another knob, and maintain both (which
sucks). But it's just my personal opinion. Feel free to jump in the
other thread if you feel differently :)
Cheers,
Quentin
On 06/12/20 10:24, Quentin Perret wrote:
> +CC Qais [FYI]
Thanks for the CC.
>
> On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> > Hrm. I guess my first instinct is to say that we still want this
> > patch even if we have something that is applied more globally.
>
> Fair enough.
>
> > Specifically it sounds as if the patch you point at is suggesting that
> > we'd tweak the boost value to something other than max but we'd still
> > have a boost value. In the case of cros_ec_spi I don't believe we
> > need any boost value at all, so my patch would still be useful. The
> > computational needs of cros_ec_spi are very modest and it can do its
> > work at lower CPU frequencies just fine. It just can't be interrupted
> > for large swaths of time.
> >
> >
> > > IOW, how do you feel about [email protected] ?
> >
> > I'm not totally a fan, but I'm definitely not an expert in this area
> > (I've also only read the patch description and not the patch or the
> > whole thread). I really don't want yet another value that I need to
> > tune from board to board. Even worse, this tuning value isn't
> > board-specific but a combination of board and software specific. By
> > this, I'd imagine a scenario where you're using a real-time task to
> > get audio decoding done within a certain latency. I guess you'd tune
> > this value to make sure that you can get all your audio decoding done
> > in time but also not burn extra power. Now, imagine that the OS
> > upgrades and the audio task suddenly has to decode more complex
> > streams. You've got to go across all of your boards and re-tune every
> > one? ...or, nobody thinks about it and older boards start getting
> > stuttery audio? Perhaps the opposite happens and someone comes up
> > with a newer lower-cpu-intensive codec and you could save power.
> > Sounds like a bit of a nightmare.
Generally I would expect this global tunable to be part of a vendor's SoC BSP.
People tend to think of the flagship SoCs which are powerful, but if you
consider the low and medium end devices there's a massive spectrum over there
that this range is trying to cover.
I would expect older boards init script to be separate for newer boards init
script. The OS by default boosts all RT tasks unless a platform specific script
overrides that. So I can't see how an OS upgrade would affect older boards.
This knob still allows you to disable the max boosting and use the per-task
uclamp interface to boost only those tasks you care about. AFAIK this is
already done in a hacky way in android devices via special vendors provisions.
> >
> > I'd rather have a boolean value: boost all RT threads to max vs. don't
> > boost all RT threads to max. Someone that just wanted RT stuff to run
If that's what your use case requires, you can certainly treat it like
a boolean if you want.
> > as fast as possible without any hassle on their system and didn't care
> > about power efficiency could turn this on. Anyone who really cared
> > about power could turn this off and then could find a more targeted
> > way to boost things, hopefully in a way that doesn't require tuning.
> > One option would be to still boost the CPU to max but only for certain
> > tasks known to be really latency sensitive. Another might be to
per-task uclamp interface allows you to do that. But SoC vendors/system
integrators need to decide that. I'm saying this with Android in mind
specifically. Linux based laptops that are tuned in similar way are rare. But
hopefully this will change at some point :)
> > somehow measure whether or not the task is making its deadlines and
> > boost the CPU frequency up if deadlines are not being met. I'm sure
> > there are fancier ways.
You need to use SCHED_DEADLINE then :)
Making sure that RT tasks meet their deadlines is a userspace tuning problem.
There's no way the kernel can know if an RT task is meeting its deadline.
> >
> > ...of course, I believe your patch allows me to do what I want: I can
> > just set the default boost to 0. It just leaves in the temptation for
> > others to require a default boost of something else and then I'm stuck
> > in my tuning nightmare.
To cover all possible systems Linux can run on, this variability is required.
On general purpose systems with decent SoCs, I'd expect them to just want to
set this to 0 and selectively boost specific RT tasks.
>
> Right, so I am not disagreeing at all with the above. FWIW, I expect
> Android to set this default value to 0 as you mentioned, so that uclamp
> basically becomes 'opt-in' for _all_ tasks, including RT.
>
> Now, given that Qais' patch is commiting something to userspace, I think
> it makes sense to expose the full range rather than a boolean value, as
> it's probably more future-proof. That is, if we expose a boolean knob
> now, and somebody wants to be able to set a default value in the middle
> in a few years, we'll have to add another knob, and maintain both (which
> sucks). But it's just my personal opinion. Feel free to jump in the
> other thread if you feel differently :)
I guess you're talking about Android/Chromium specific knobs on top of the
sysctl one here. Not sure why you need that since init scripts should be able
to modify this directly since they run as root.
Cheers
--
Qais Yousef
On 06/10/20 15:18, Douglas Anderson wrote:
> The cros_ec_spi driver is realtime priority so that it doesn't get
> preempted by other taks while it's talking to the EC but overall it
> really doesn't need lots of compute power. Unfortunately, by default,
> the kernel assumes that all realtime tasks should cause the cpufreq to
> jump to max and burn through power to get things done as quickly as
> possible. That's just not the correct behavior for cros_ec_spi.
>
> Switch to manually overriding the default.
>
> This won't help us if our work moves over to the SPI pump thread but
> that's not the common code path.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> NOTE: This would cause a conflict if the patch
> https://lore.kernel.org/r/[email protected] lands
> first
>
> drivers/platform/chrome/cros_ec_spi.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
> index debea5c4c829..76d59d5e7efd 100644
> --- a/drivers/platform/chrome/cros_ec_spi.c
> +++ b/drivers/platform/chrome/cros_ec_spi.c
> @@ -709,8 +709,11 @@ static void cros_ec_spi_high_pri_release(void *worker)
> static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> struct cros_ec_spi *ec_spi)
> {
> - struct sched_param sched_priority = {
> - .sched_priority = MAX_RT_PRIO / 2,
> + struct sched_attr sched_attr = {
> + .sched_policy = SCHED_FIFO,
> + .sched_priority = MAX_RT_PRIO / 2,
> + .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
> + .sched_util_min = 0,
Hmm I thought Peter already removed all users that change RT priority directly.
https://lore.kernel.org/lkml/[email protected]/
> };
> int err;
>
> @@ -728,8 +731,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> if (err)
> return err;
>
> - err = sched_setscheduler_nocheck(ec_spi->high_pri_worker->task,
> - SCHED_FIFO, &sched_priority);
> + err = sched_setattr_nocheck(ec_spi->high_pri_worker->task, &sched_attr);
> if (err)
> dev_err(dev, "Can't set cros_ec high pri priority: %d\n", err);
> return err;
If I read the code correctly, if you fail here cros_ec_spi_probe() will fail
too and the whole thing will not be loaded. If you compile without uclamp then
sched_setattr_nocheck() will always fail with -EOPNOTSUPP.
Why can't you manage the priority and boost value of such tasks via your init
scripts instead?
I have to admit I need to think about whether it makes sense to have a generic
API that allows drivers to opt-out of the default boosting behavior for their
RT tasks.
Thanks
--
Qais Yousef
> --
> 2.27.0.290.gba653c62da-goog
>
On Friday 12 Jun 2020 at 13:34:48 (+0100), Qais Yousef wrote:
> > On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> > > I'm not totally a fan, but I'm definitely not an expert in this area
> > > (I've also only read the patch description and not the patch or the
> > > whole thread). I really don't want yet another value that I need to
> > > tune from board to board. Even worse, this tuning value isn't
> > > board-specific but a combination of board and software specific. By
> > > this, I'd imagine a scenario where you're using a real-time task to
> > > get audio decoding done within a certain latency. I guess you'd tune
> > > this value to make sure that you can get all your audio decoding done
> > > in time but also not burn extra power. Now, imagine that the OS
> > > upgrades and the audio task suddenly has to decode more complex
> > > streams. You've got to go across all of your boards and re-tune every
> > > one? ...or, nobody thinks about it and older boards start getting
> > > stuttery audio? Perhaps the opposite happens and someone comes up
> > > with a newer lower-cpu-intensive codec and you could save power.
> > > Sounds like a bit of a nightmare.
>
> Generally I would expect this global tunable to be part of a vendor's SoC BSP.
>
> People tend to think of the flagship SoCs which are powerful, but if you
> consider the low and medium end devices there's a massive spectrum over there
> that this range is trying to cover.
>
> I would expect older boards init script to be separate for newer boards init
> script. The OS by default boosts all RT tasks unless a platform specific script
> overrides that. So I can't see how an OS upgrade would affect older boards.
I think Doug meant that the device-specific values need re-tuning in
case of major OS updates, which is indeed a pain. But yeah, I'm not sure
if we have a better solution than that, though.
> This knob still allows you to disable the max boosting and use the per-task
> uclamp interface to boost only those tasks you care about. AFAIK this is
> already done in a hacky way in android devices via special vendors provisions.
>
> > >
> > > I'd rather have a boolean value: boost all RT threads to max vs. don't
> > > boost all RT threads to max. Someone that just wanted RT stuff to run
>
> If that's what your use case requires, you can certainly treat it like
> a boolean if you want.
+1
> > > as fast as possible without any hassle on their system and didn't care
> > > about power efficiency could turn this on. Anyone who really cared
> > > about power could turn this off and then could find a more targeted
> > > way to boost things, hopefully in a way that doesn't require tuning.
> > > One option would be to still boost the CPU to max but only for certain
> > > tasks known to be really latency sensitive. Another might be to
>
> per-task uclamp interface allows you to do that. But SoC vendors/system
> integrators need to decide that. I'm saying this with Android in mind
> specifically. Linux based laptops that are tuned in similar way are rare. But
> hopefully this will change at some point :)
>
> > > somehow measure whether or not the task is making its deadlines and
> > > boost the CPU frequency up if deadlines are not being met. I'm sure
> > > there are fancier ways.
>
> You need to use SCHED_DEADLINE then :)
Well, not quite :-)
The frequency selection for DL is purely based on the userspace-provided
parameters, from which we derive the bandwidth request. But we don't do
things like 'raise the frequency if the actual runtime gets close to the
WCET', or anything of the sort. All of that would have to be implemented
in userspace ATM.
Cheers,
Quentin
On 06/12/20 14:47, Quentin Perret wrote:
[...]
> > > > somehow measure whether or not the task is making its deadlines and
> > > > boost the CPU frequency up if deadlines are not being met. I'm sure
> > > > there are fancier ways.
> >
> > You need to use SCHED_DEADLINE then :)
>
> Well, not quite :-)
>
> The frequency selection for DL is purely based on the userspace-provided
> parameters, from which we derive the bandwidth request. But we don't do
> things like 'raise the frequency if the actual runtime gets close to the
> WCET', or anything of the sort. All of that would have to be implemented
> in userspace ATM.
Hmm okay. I am not a deadline expert so sorry if I was misleading. But it's the
only scheduling class that has the concept of deadline.
Cheers
--
Qais Yousef
Hi,
On Fri, Jun 12, 2020 at 5:52 AM Qais Yousef <[email protected]> wrote:
>
> On 06/10/20 15:18, Douglas Anderson wrote:
> > The cros_ec_spi driver is realtime priority so that it doesn't get
> > preempted by other taks while it's talking to the EC but overall it
> > really doesn't need lots of compute power. Unfortunately, by default,
> > the kernel assumes that all realtime tasks should cause the cpufreq to
> > jump to max and burn through power to get things done as quickly as
> > possible. That's just not the correct behavior for cros_ec_spi.
> >
> > Switch to manually overriding the default.
> >
> > This won't help us if our work moves over to the SPI pump thread but
> > that's not the common code path.
> >
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> > NOTE: This would cause a conflict if the patch
> > https://lore.kernel.org/r/[email protected] lands
> > first
> >
> > drivers/platform/chrome/cros_ec_spi.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
> > index debea5c4c829..76d59d5e7efd 100644
> > --- a/drivers/platform/chrome/cros_ec_spi.c
> > +++ b/drivers/platform/chrome/cros_ec_spi.c
> > @@ -709,8 +709,11 @@ static void cros_ec_spi_high_pri_release(void *worker)
> > static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > struct cros_ec_spi *ec_spi)
> > {
> > - struct sched_param sched_priority = {
> > - .sched_priority = MAX_RT_PRIO / 2,
> > + struct sched_attr sched_attr = {
> > + .sched_policy = SCHED_FIFO,
> > + .sched_priority = MAX_RT_PRIO / 2,
> > + .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
> > + .sched_util_min = 0,
>
> Hmm I thought Peter already removed all users that change RT priority directly.
>
> https://lore.kernel.org/lkml/[email protected]/
Yeah, I mentioned that patch series "after the cut" in my patch and
also made sure to CC Peter on my patch. I'm not sure what happened to
that series since I don't see it in linuxnext...
> > };
> > int err;
> >
> > @@ -728,8 +731,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > if (err)
> > return err;
> >
> > - err = sched_setscheduler_nocheck(ec_spi->high_pri_worker->task,
> > - SCHED_FIFO, &sched_priority);
> > + err = sched_setattr_nocheck(ec_spi->high_pri_worker->task, &sched_attr);
> > if (err)
> > dev_err(dev, "Can't set cros_ec high pri priority: %d\n", err);
> > return err;
>
> If I read the code correctly, if you fail here cros_ec_spi_probe() will fail
> too and the whole thing will not be loaded. If you compile without uclamp then
> sched_setattr_nocheck() will always fail with -EOPNOTSUPP.
Oops, definitely need to send out a v2 for that if nothing else. Is
there any good way for me to code this up or do I need a big #ifdef
somewhere in my code?
> Why can't you manage the priority and boost value of such tasks via your init
> scripts instead?
I guess I feel like it's weird in this case. Userspace isn't creating
this task and isn't the one marking it as realtime. ...and, if we
ever manage to upgrade the protocol which we use to talk to the EC we
might fully get rid of this task the need to have something boosted up
to realtime.
Said another way: the fact that we even have this task at all and the
fact that it's realtime is an implementation detail of the kernel. It
seems really weird to add initscripts for it.
> I have to admit I need to think about whether it makes sense to have a generic
> API that allows drivers to opt-out of the default boosting behavior for their
> RT tasks.
Seems like it would be useful.
-Doug
Hi,
On Fri, Jun 12, 2020 at 5:34 AM Qais Yousef <[email protected]> wrote:
>
> On 06/12/20 10:24, Quentin Perret wrote:
> > +CC Qais [FYI]
>
> Thanks for the CC.
>
> >
> > On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> > > Hrm. I guess my first instinct is to say that we still want this
> > > patch even if we have something that is applied more globally.
> >
> > Fair enough.
> >
> > > Specifically it sounds as if the patch you point at is suggesting that
> > > we'd tweak the boost value to something other than max but we'd still
> > > have a boost value. In the case of cros_ec_spi I don't believe we
> > > need any boost value at all, so my patch would still be useful. The
> > > computational needs of cros_ec_spi are very modest and it can do its
> > > work at lower CPU frequencies just fine. It just can't be interrupted
> > > for large swaths of time.
> > >
> > >
> > > > IOW, how do you feel about [email protected] ?
> > >
> > > I'm not totally a fan, but I'm definitely not an expert in this area
> > > (I've also only read the patch description and not the patch or the
> > > whole thread). I really don't want yet another value that I need to
> > > tune from board to board. Even worse, this tuning value isn't
> > > board-specific but a combination of board and software specific. By
> > > this, I'd imagine a scenario where you're using a real-time task to
> > > get audio decoding done within a certain latency. I guess you'd tune
> > > this value to make sure that you can get all your audio decoding done
> > > in time but also not burn extra power. Now, imagine that the OS
> > > upgrades and the audio task suddenly has to decode more complex
> > > streams. You've got to go across all of your boards and re-tune every
> > > one? ...or, nobody thinks about it and older boards start getting
> > > stuttery audio? Perhaps the opposite happens and someone comes up
> > > with a newer lower-cpu-intensive codec and you could save power.
> > > Sounds like a bit of a nightmare.
>
> Generally I would expect this global tunable to be part of a vendor's SoC BSP.
I think I'm coming at this from a very different world than the one
you're thinking of. The concept of a BSP makes me think of a SoC
vendor providing a drop of Linux with all their own weird hacks in it
that only ever works on that one SoC and will never, ever, ever be
updated. 5 years down the road if a software update is needed
(security fix?) some poor soul will be in charge of tracking down the
exact ancient code base that was used to build the original kernel,
making the tweak, and building a new kernel. ;-)
In the world of Chrome OS we try very very hard to build everything
from a clean code base and are trying hard to stay even closer to
upstream and away from per-device weirdness...
> People tend to think of the flagship SoCs which are powerful, but if you
> consider the low and medium end devices there's a massive spectrum over there
> that this range is trying to cover.
Yeah, perhaps you're right. When thinking about a laptop it's almost
always a fairly powerful device and things could certainly be
different for very low end chips trying to run Linux.
> I would expect older boards init script to be separate for newer boards init
> script. The OS by default boosts all RT tasks unless a platform specific script
> overrides that. So I can't see how an OS upgrade would affect older boards.
In the Chrome OS world I'm part of, the people supporting the boards
are not always the people adding new whizbang features. We get new
versions of the OS every 6 weeks and those new versions may change the
way things work pretty significantly. We ship those new versions off
to all boards. Certainly they undergo testing, but there are a lot of
boards and if something is not tuned as well as it was when the device
first shipped it's likely nobody will really notice. This is my bias
against needing per-board tuning.
> This knob still allows you to disable the max boosting and use the per-task
> uclamp interface to boost only those tasks you care about. AFAIK this is
> already done in a hacky way in android devices via special vendors provisions.
Yeah. I don't think I have enough skin in the game to really say one
way or the other what the API should be so I'll probably stay off the
other, bigger thread and let others decide what the best API should
be. For Chrome OS I'd probably advocate just disabling the default
boost but even there I'd be willing to defer to the folks doing the
actual work.
-Doug
Hi Doug,
On 06/18/20 14:31, Doug Anderson wrote:
> Hi,
>
> On Fri, Jun 12, 2020 at 5:34 AM Qais Yousef <[email protected]> wrote:
> >
> > On 06/12/20 10:24, Quentin Perret wrote:
> > > +CC Qais [FYI]
> >
> > Thanks for the CC.
> >
> > >
> > > On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> > > > Hrm. I guess my first instinct is to say that we still want this
> > > > patch even if we have something that is applied more globally.
> > >
> > > Fair enough.
> > >
> > > > Specifically it sounds as if the patch you point at is suggesting that
> > > > we'd tweak the boost value to something other than max but we'd still
> > > > have a boost value. In the case of cros_ec_spi I don't believe we
> > > > need any boost value at all, so my patch would still be useful. The
> > > > computational needs of cros_ec_spi are very modest and it can do its
> > > > work at lower CPU frequencies just fine. It just can't be interrupted
> > > > for large swaths of time.
> > > >
> > > >
> > > > > IOW, how do you feel about [email protected] ?
> > > >
> > > > I'm not totally a fan, but I'm definitely not an expert in this area
> > > > (I've also only read the patch description and not the patch or the
> > > > whole thread). I really don't want yet another value that I need to
> > > > tune from board to board. Even worse, this tuning value isn't
> > > > board-specific but a combination of board and software specific. By
> > > > this, I'd imagine a scenario where you're using a real-time task to
> > > > get audio decoding done within a certain latency. I guess you'd tune
> > > > this value to make sure that you can get all your audio decoding done
> > > > in time but also not burn extra power. Now, imagine that the OS
> > > > upgrades and the audio task suddenly has to decode more complex
> > > > streams. You've got to go across all of your boards and re-tune every
> > > > one? ...or, nobody thinks about it and older boards start getting
> > > > stuttery audio? Perhaps the opposite happens and someone comes up
> > > > with a newer lower-cpu-intensive codec and you could save power.
> > > > Sounds like a bit of a nightmare.
> >
> > Generally I would expect this global tunable to be part of a vendor's SoC BSP.
>
> I think I'm coming at this from a very different world than the one
> you're thinking of. The concept of a BSP makes me think of a SoC
> vendor providing a drop of Linux with all their own weird hacks in it
> that only ever works on that one SoC and will never, ever, ever be
> updated. 5 years down the road if a software update is needed
> (security fix?) some poor soul will be in charge of tracking down the
> exact ancient code base that was used to build the original kernel,
> making the tweak, and building a new kernel. ;-)
>
> In the world of Chrome OS we try very very hard to build everything
> from a clean code base and are trying hard to stay even closer to
> upstream and away from per-device weirdness...
Hmm I thought OEMs who ship stuff that are based on Chrome OS would have to do
the final tuning here, which would be based on the recommendation of the SoC
vendor. And I'm being overly generic here to think not only SoC from Intel
which traditionally they have been treated in different ways.
I think you provide a generic stack for OEMs, no?
Generally for RT tasks, Linux always required an admin to do the tuning. And to
provide an automatic solution that fits all is not easy to happen, because what
ultimately is important for userspace, is only known by the userspace.
This is an interesting problem for me personally that I have been trying to
look at in my free time. On general purpose systems, it's hard to reason about
what's important because, as you say, you distribute something that should
target a wide range of platforms. And sometimes the end user (like a person
installing Ubuntu Desktop on their laptop), have no clue what a driver is even
to pick the right tuning for all RT tasks in their system.
But this problem already exists and catching up with this will need more work
from both distros and maybe kernel. I can't see a possible situation where the
kernel can do all the tuning without userspace providing more info anyway.
The per-device weirdness you're talking about is just how the way goes in
a world where there are many SoCs that are created to target different budgets
and use cases.
So I don't see the 2 worlds (laptops/mobile) really different, they just
traditionally started differently where such variations in SoC didn't exist
that much. But there are more Arm based SoCs that are targetting laptop markets
now.. So you never know when they will be dealing with the same variations the
mobile world sees today.
>
>
> > People tend to think of the flagship SoCs which are powerful, but if you
> > consider the low and medium end devices there's a massive spectrum over there
> > that this range is trying to cover.
>
> Yeah, perhaps you're right. When thinking about a laptop it's almost
> always a fairly powerful device and things could certainly be
> different for very low end chips trying to run Linux.
I think there are low end laptops in the market that barely have enough grunt
to do work. So not all laptops are fairly powerful.
>
>
> > I would expect older boards init script to be separate for newer boards init
> > script. The OS by default boosts all RT tasks unless a platform specific script
> > overrides that. So I can't see how an OS upgrade would affect older boards.
>
> In the Chrome OS world I'm part of, the people supporting the boards
> are not always the people adding new whizbang features. We get new
> versions of the OS every 6 weeks and those new versions may change the
> way things work pretty significantly. We ship those new versions off
> to all boards. Certainly they undergo testing, but there are a lot of
> boards and if something is not tuned as well as it was when the device
> first shipped it's likely nobody will really notice. This is my bias
> against needing per-board tuning.
Apologies for my lack knowledge about the exact process in Chrome OS. In my
head, I think what you do is like what Android and other distros do. Provide
a software stack that can be used on any platform. On Android OEMs do have to
do the tuning (with a help from SoC vendor usually). Not many laptops ship with
Linux as their default OS, but this is an increasing trend and LWN did write
something recently about the demand is increasing for Linux based laptops.
I thought Chrome OS is the same, where an OEM will take your deliverables and
do the final integration.
So I agree at your level it might be hard to reason about the full spectrum.
But I thought the OEMs that wants to ship Chrome OS will need to look at init
scripts, drivers etc to ensure their laptop is competitive.
>
>
> > This knob still allows you to disable the max boosting and use the per-task
> > uclamp interface to boost only those tasks you care about. AFAIK this is
> > already done in a hacky way in android devices via special vendors provisions.
>
> Yeah. I don't think I have enough skin in the game to really say one
> way or the other what the API should be so I'll probably stay off the
> other, bigger thread and let others decide what the best API should
> be. For Chrome OS I'd probably advocate just disabling the default
> boost but even there I'd be willing to defer to the folks doing the
> actual work.
Yes I appreciate the challenges you have. I think for most users disabling is
good enough for them as RT tasks could end up causing a lot of power to be
wasted unnecessarily. But if someone wants to take the extra mile and squeeze
the best compromise, they can :)
Thanks
--
Qais Yousef
On 06/18/20 14:18, Doug Anderson wrote:
> Hi,
>
> On Fri, Jun 12, 2020 at 5:52 AM Qais Yousef <[email protected]> wrote:
> >
> > On 06/10/20 15:18, Douglas Anderson wrote:
> > > The cros_ec_spi driver is realtime priority so that it doesn't get
> > > preempted by other taks while it's talking to the EC but overall it
> > > really doesn't need lots of compute power. Unfortunately, by default,
> > > the kernel assumes that all realtime tasks should cause the cpufreq to
> > > jump to max and burn through power to get things done as quickly as
> > > possible. That's just not the correct behavior for cros_ec_spi.
> > >
> > > Switch to manually overriding the default.
> > >
> > > This won't help us if our work moves over to the SPI pump thread but
> > > that's not the common code path.
> > >
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > ---
> > > NOTE: This would cause a conflict if the patch
> > > https://lore.kernel.org/r/[email protected] lands
> > > first
> > >
> > > drivers/platform/chrome/cros_ec_spi.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
> > > index debea5c4c829..76d59d5e7efd 100644
> > > --- a/drivers/platform/chrome/cros_ec_spi.c
> > > +++ b/drivers/platform/chrome/cros_ec_spi.c
> > > @@ -709,8 +709,11 @@ static void cros_ec_spi_high_pri_release(void *worker)
> > > static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > > struct cros_ec_spi *ec_spi)
> > > {
> > > - struct sched_param sched_priority = {
> > > - .sched_priority = MAX_RT_PRIO / 2,
> > > + struct sched_attr sched_attr = {
> > > + .sched_policy = SCHED_FIFO,
> > > + .sched_priority = MAX_RT_PRIO / 2,
> > > + .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
> > > + .sched_util_min = 0,
> >
> > Hmm I thought Peter already removed all users that change RT priority directly.
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> Yeah, I mentioned that patch series "after the cut" in my patch and
> also made sure to CC Peter on my patch. I'm not sure what happened to
> that series since I don't see it in linuxnext...
Apologies I missed that.
>
>
> > > };
> > > int err;
> > >
> > > @@ -728,8 +731,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > > if (err)
> > > return err;
> > >
> > > - err = sched_setscheduler_nocheck(ec_spi->high_pri_worker->task,
> > > - SCHED_FIFO, &sched_priority);
> > > + err = sched_setattr_nocheck(ec_spi->high_pri_worker->task, &sched_attr);
> > > if (err)
> > > dev_err(dev, "Can't set cros_ec high pri priority: %d\n", err);
> > > return err;
> >
> > If I read the code correctly, if you fail here cros_ec_spi_probe() will fail
> > too and the whole thing will not be loaded. If you compile without uclamp then
> > sched_setattr_nocheck() will always fail with -EOPNOTSUPP.
>
> Oops, definitely need to send out a v2 for that if nothing else. Is
> there any good way for me to code this up or do I need a big #ifdef
> somewhere in my code?
A big #ifdef. But this kind of use I don't think was anticipated. And generally
if we want to allow that, it has to be done via a proper API. Drivers picking
random uclamp values is as bad as them picking random RT priority.
>
>
> > Why can't you manage the priority and boost value of such tasks via your init
> > scripts instead?
>
> I guess I feel like it's weird in this case. Userspace isn't creating
> this task and isn't the one marking it as realtime. ...and, if we
> ever manage to upgrade the protocol which we use to talk to the EC we
> might fully get rid of this task the need to have something boosted up
> to realtime.
>
> Said another way: the fact that we even have this task at all and the
> fact that it's realtime is an implementation detail of the kernel. It
> seems really weird to add initscripts for it.
Yes this is the problem of RT for a general purpose systems. It's hard to
reason about their priorities/importance since it's not a special purpose
system with well defined spec of what hardware/software will be running on it
and their precise requirements is not known before hand.
>
>
> > I have to admit I need to think about whether it makes sense to have a generic
> > API that allows drivers to opt-out of the default boosting behavior for their
> > RT tasks.
>
> Seems like it would be useful.
If you propose something that will help the discussion. I think based on the
same approach Peter has taken to prevent random RT priorities. In uclamp case
I think we just want to allow driver to opt RT tasks out of the default
boosting behavior.
I'm a bit wary that this extra layer of tuning might create a confusion, but
I can't reason about why is it bad for a driver to say I don't want my RT task
to be boosted too.
Thanks
--
Qais Yousef
Hi,
On Fri, Jun 19, 2020 at 8:31 AM Qais Yousef <[email protected]> wrote:
>
> Hi Doug,
>
> On 06/18/20 14:31, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Jun 12, 2020 at 5:34 AM Qais Yousef <[email protected]> wrote:
> > >
> > > On 06/12/20 10:24, Quentin Perret wrote:
> > > > +CC Qais [FYI]
> > >
> > > Thanks for the CC.
> > >
> > > >
> > > > On Thursday 11 Jun 2020 at 10:48:40 (-0700), Doug Anderson wrote:
> > > > > Hrm. I guess my first instinct is to say that we still want this
> > > > > patch even if we have something that is applied more globally.
> > > >
> > > > Fair enough.
> > > >
> > > > > Specifically it sounds as if the patch you point at is suggesting that
> > > > > we'd tweak the boost value to something other than max but we'd still
> > > > > have a boost value. In the case of cros_ec_spi I don't believe we
> > > > > need any boost value at all, so my patch would still be useful. The
> > > > > computational needs of cros_ec_spi are very modest and it can do its
> > > > > work at lower CPU frequencies just fine. It just can't be interrupted
> > > > > for large swaths of time.
> > > > >
> > > > >
> > > > > > IOW, how do you feel about [email protected] ?
> > > > >
> > > > > I'm not totally a fan, but I'm definitely not an expert in this area
> > > > > (I've also only read the patch description and not the patch or the
> > > > > whole thread). I really don't want yet another value that I need to
> > > > > tune from board to board. Even worse, this tuning value isn't
> > > > > board-specific but a combination of board and software specific. By
> > > > > this, I'd imagine a scenario where you're using a real-time task to
> > > > > get audio decoding done within a certain latency. I guess you'd tune
> > > > > this value to make sure that you can get all your audio decoding done
> > > > > in time but also not burn extra power. Now, imagine that the OS
> > > > > upgrades and the audio task suddenly has to decode more complex
> > > > > streams. You've got to go across all of your boards and re-tune every
> > > > > one? ...or, nobody thinks about it and older boards start getting
> > > > > stuttery audio? Perhaps the opposite happens and someone comes up
> > > > > with a newer lower-cpu-intensive codec and you could save power.
> > > > > Sounds like a bit of a nightmare.
> > >
> > > Generally I would expect this global tunable to be part of a vendor's SoC BSP.
> >
> > I think I'm coming at this from a very different world than the one
> > you're thinking of. The concept of a BSP makes me think of a SoC
> > vendor providing a drop of Linux with all their own weird hacks in it
> > that only ever works on that one SoC and will never, ever, ever be
> > updated. 5 years down the road if a software update is needed
> > (security fix?) some poor soul will be in charge of tracking down the
> > exact ancient code base that was used to build the original kernel,
> > making the tweak, and building a new kernel. ;-)
> >
> > In the world of Chrome OS we try very very hard to build everything
> > from a clean code base and are trying hard to stay even closer to
> > upstream and away from per-device weirdness...
>
> Hmm I thought OEMs who ship stuff that are based on Chrome OS would have to do
> the final tuning here, which would be based on the recommendation of the SoC
> vendor. And I'm being overly generic here to think not only SoC from Intel
> which traditionally they have been treated in different ways.
>
> I think you provide a generic stack for OEMs, no?
No, that's the Android way. The only way Chrome OS can ship updates
for the whole fleet every ~6 weeks and support it for so many years is
that all the builds and updates happen on Google servers. The only
way Google can maintain this is to not have a separate kernel code
base for every single variant of every single board.
> Generally for RT tasks, Linux always required an admin to do the tuning. And to
> provide an automatic solution that fits all is not easy to happen, because what
> ultimately is important for userspace, is only known by the userspace.
>
> This is an interesting problem for me personally that I have been trying to
> look at in my free time. On general purpose systems, it's hard to reason about
> what's important because, as you say, you distribute something that should
> target a wide range of platforms. And sometimes the end user (like a person
> installing Ubuntu Desktop on their laptop), have no clue what a driver is even
> to pick the right tuning for all RT tasks in their system.
>
> But this problem already exists and catching up with this will need more work
> from both distros and maybe kernel. I can't see a possible situation where the
> kernel can do all the tuning without userspace providing more info anyway.
>
> The per-device weirdness you're talking about is just how the way goes in
> a world where there are many SoCs that are created to target different budgets
> and use cases.
I think it's sane for the OS to do very high level tuning, like "this
platform has an underpowered CPU and being fast is more important than
battery life, so error on the side of running fast" or "this platform
has a fast SSD so tune disk algorithms appropriately". ...but picking
specific values gets tricky.
> So I don't see the 2 worlds (laptops/mobile) really different, they just
> traditionally started differently where such variations in SoC didn't exist
> that much. But there are more Arm based SoCs that are targetting laptop markets
> now.. So you never know when they will be dealing with the same variations the
> mobile world sees today.
The slowest laptop we currently support is a Quad-core A17 running at
~1.8 GHz. On that CPU the cros_ec thread needs to be realtime (so it
doesn't get interrupted) but doesn't need any particular CPU speed.
That being said, that does remind me that on that laptop we did give
up and end up with a hack to set a minimum CPU speed when USB audio
was happening. That CPU has dwc2 for its USB controller and it's
utterly important to service interrupts in < 125 us when USB audio is
going on. So in that case, yes, I was forced to do tuning and pick a
specific value.
> > > People tend to think of the flagship SoCs which are powerful, but if you
> > > consider the low and medium end devices there's a massive spectrum over there
> > > that this range is trying to cover.
> >
> > Yeah, perhaps you're right. When thinking about a laptop it's almost
> > always a fairly powerful device and things could certainly be
> > different for very low end chips trying to run Linux.
>
> I think there are low end laptops in the market that barely have enough grunt
> to do work. So not all laptops are fairly powerful.
>
> >
> >
> > > I would expect older boards init script to be separate for newer boards init
> > > script. The OS by default boosts all RT tasks unless a platform specific script
> > > overrides that. So I can't see how an OS upgrade would affect older boards.
> >
> > In the Chrome OS world I'm part of, the people supporting the boards
> > are not always the people adding new whizbang features. We get new
> > versions of the OS every 6 weeks and those new versions may change the
> > way things work pretty significantly. We ship those new versions off
> > to all boards. Certainly they undergo testing, but there are a lot of
> > boards and if something is not tuned as well as it was when the device
> > first shipped it's likely nobody will really notice. This is my bias
> > against needing per-board tuning.
>
> Apologies for my lack knowledge about the exact process in Chrome OS. In my
> head, I think what you do is like what Android and other distros do. Provide
> a software stack that can be used on any platform. On Android OEMs do have to
> do the tuning (with a help from SoC vendor usually). Not many laptops ship with
> Linux as their default OS, but this is an increasing trend and LWN did write
> something recently about the demand is increasing for Linux based laptops.
>
>
> I thought Chrome OS is the same, where an OEM will take your deliverables and
> do the final integration.
>
> So I agree at your level it might be hard to reason about the full spectrum.
> But I thought the OEMs that wants to ship Chrome OS will need to look at init
> scripts, drivers etc to ensure their laptop is competitive.
>
> >
> >
> > > This knob still allows you to disable the max boosting and use the per-task
> > > uclamp interface to boost only those tasks you care about. AFAIK this is
> > > already done in a hacky way in android devices via special vendors provisions.
> >
> > Yeah. I don't think I have enough skin in the game to really say one
> > way or the other what the API should be so I'll probably stay off the
> > other, bigger thread and let others decide what the best API should
> > be. For Chrome OS I'd probably advocate just disabling the default
> > boost but even there I'd be willing to defer to the folks doing the
> > actual work.
>
> Yes I appreciate the challenges you have. I think for most users disabling is
> good enough for them as RT tasks could end up causing a lot of power to be
> wasted unnecessarily. But if someone wants to take the extra mile and squeeze
> the best compromise, they can :)
Yeah, thinking about the USB audio hack we ended up with makes me
understand that sometimes on lower-end systems you are forced to do
tuning even if it's annoying.
-Doug
Hi,
On Fri, Jun 19, 2020 at 8:38 AM Qais Yousef <[email protected]> wrote:
>
> On 06/18/20 14:18, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Jun 12, 2020 at 5:52 AM Qais Yousef <[email protected]> wrote:
> > >
> > > On 06/10/20 15:18, Douglas Anderson wrote:
> > > > The cros_ec_spi driver is realtime priority so that it doesn't get
> > > > preempted by other taks while it's talking to the EC but overall it
> > > > really doesn't need lots of compute power. Unfortunately, by default,
> > > > the kernel assumes that all realtime tasks should cause the cpufreq to
> > > > jump to max and burn through power to get things done as quickly as
> > > > possible. That's just not the correct behavior for cros_ec_spi.
> > > >
> > > > Switch to manually overriding the default.
> > > >
> > > > This won't help us if our work moves over to the SPI pump thread but
> > > > that's not the common code path.
> > > >
> > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > ---
> > > > NOTE: This would cause a conflict if the patch
> > > > https://lore.kernel.org/r/[email protected] lands
> > > > first
> > > >
> > > > drivers/platform/chrome/cros_ec_spi.c | 10 ++++++----
> > > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
> > > > index debea5c4c829..76d59d5e7efd 100644
> > > > --- a/drivers/platform/chrome/cros_ec_spi.c
> > > > +++ b/drivers/platform/chrome/cros_ec_spi.c
> > > > @@ -709,8 +709,11 @@ static void cros_ec_spi_high_pri_release(void *worker)
> > > > static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > > > struct cros_ec_spi *ec_spi)
> > > > {
> > > > - struct sched_param sched_priority = {
> > > > - .sched_priority = MAX_RT_PRIO / 2,
> > > > + struct sched_attr sched_attr = {
> > > > + .sched_policy = SCHED_FIFO,
> > > > + .sched_priority = MAX_RT_PRIO / 2,
> > > > + .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
> > > > + .sched_util_min = 0,
> > >
> > > Hmm I thought Peter already removed all users that change RT priority directly.
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> >
> > Yeah, I mentioned that patch series "after the cut" in my patch and
> > also made sure to CC Peter on my patch. I'm not sure what happened to
> > that series since I don't see it in linuxnext...
>
> Apologies I missed that.
>
> >
> >
> > > > };
> > > > int err;
> > > >
> > > > @@ -728,8 +731,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
> > > > if (err)
> > > > return err;
> > > >
> > > > - err = sched_setscheduler_nocheck(ec_spi->high_pri_worker->task,
> > > > - SCHED_FIFO, &sched_priority);
> > > > + err = sched_setattr_nocheck(ec_spi->high_pri_worker->task, &sched_attr);
> > > > if (err)
> > > > dev_err(dev, "Can't set cros_ec high pri priority: %d\n", err);
> > > > return err;
> > >
> > > If I read the code correctly, if you fail here cros_ec_spi_probe() will fail
> > > too and the whole thing will not be loaded. If you compile without uclamp then
> > > sched_setattr_nocheck() will always fail with -EOPNOTSUPP.
> >
> > Oops, definitely need to send out a v2 for that if nothing else. Is
> > there any good way for me to code this up or do I need a big #ifdef
> > somewhere in my code?
>
> A big #ifdef. But this kind of use I don't think was anticipated. And generally
> if we want to allow that, it has to be done via a proper API. Drivers picking
> random uclamp values is as bad as them picking random RT priority.
>
> >
> >
> > > Why can't you manage the priority and boost value of such tasks via your init
> > > scripts instead?
> >
> > I guess I feel like it's weird in this case. Userspace isn't creating
> > this task and isn't the one marking it as realtime. ...and, if we
> > ever manage to upgrade the protocol which we use to talk to the EC we
> > might fully get rid of this task the need to have something boosted up
> > to realtime.
> >
> > Said another way: the fact that we even have this task at all and the
> > fact that it's realtime is an implementation detail of the kernel. It
> > seems really weird to add initscripts for it.
>
> Yes this is the problem of RT for a general purpose systems. It's hard to
> reason about their priorities/importance since it's not a special purpose
> system with well defined spec of what hardware/software will be running on it
> and their precise requirements is not known before hand.
>
> >
> >
> > > I have to admit I need to think about whether it makes sense to have a generic
> > > API that allows drivers to opt-out of the default boosting behavior for their
> > > RT tasks.
> >
> > Seems like it would be useful.
>
> If you propose something that will help the discussion. I think based on the
> same approach Peter has taken to prevent random RT priorities. In uclamp case
> I think we just want to allow driver to opt RT tasks out of the default
> boosting behavior.
>
> I'm a bit wary that this extra layer of tuning might create a confusion, but
> I can't reason about why is it bad for a driver to say I don't want my RT task
> to be boosted too.
Right. I was basically just trying to say "turn my boosting off".
...so I guess you're saying that doing a v2 of my patch with the
proper #ifdef protection wouldn't be a good way to go and I'd need to
propose some sort of API for this?
-Doug
On 06/22/20 11:19, Doug Anderson wrote:
[...]
> > Hmm I thought OEMs who ship stuff that are based on Chrome OS would have to do
> > the final tuning here, which would be based on the recommendation of the SoC
> > vendor. And I'm being overly generic here to think not only SoC from Intel
> > which traditionally they have been treated in different ways.
> >
> > I think you provide a generic stack for OEMs, no?
>
> No, that's the Android way. The only way Chrome OS can ship updates
> for the whole fleet every ~6 weeks and support it for so many years is
> that all the builds and updates happen on Google servers. The only
> way Google can maintain this is to not have a separate kernel code
> base for every single variant of every single board.
I was referring to userspace tuning, which I expected to be platform specific.
Assuming the devices share the same kernel. At least in the context of uclamp
and RT tuning that should be possible.
I appreciate that you want to ship a simple generic setup on as many
devices. But there will be a trade-off to make between optimal tuning and
keeping things simple and generic. The latter is perfectly fine goal to have
of course. Others who want to tune more they're free to do so too as well.
>
>
> > Generally for RT tasks, Linux always required an admin to do the tuning. And to
> > provide an automatic solution that fits all is not easy to happen, because what
> > ultimately is important for userspace, is only known by the userspace.
> >
> > This is an interesting problem for me personally that I have been trying to
> > look at in my free time. On general purpose systems, it's hard to reason about
> > what's important because, as you say, you distribute something that should
> > target a wide range of platforms. And sometimes the end user (like a person
> > installing Ubuntu Desktop on their laptop), have no clue what a driver is even
> > to pick the right tuning for all RT tasks in their system.
> >
> > But this problem already exists and catching up with this will need more work
> > from both distros and maybe kernel. I can't see a possible situation where the
> > kernel can do all the tuning without userspace providing more info anyway.
> >
> > The per-device weirdness you're talking about is just how the way goes in
> > a world where there are many SoCs that are created to target different budgets
> > and use cases.
>
> I think it's sane for the OS to do very high level tuning, like "this
> platform has an underpowered CPU and being fast is more important than
> battery life, so error on the side of running fast" or "this platform
> has a fast SSD so tune disk algorithms appropriately". ...but picking
> specific values gets tricky.
You can always use the per-task API to boost that task to maximum, if power is
not your concern. From user-space ;-)
If you're power conscious too, yeah there's no way but to test for the minimum
value that gives you what you want. The task can try to regulate itself too if
it can observe when it's not running fast enough (notices a glitch?).
You can get fancy if you want, depending on your goal.
It's hard to get the best though with one size fits all.
HTH
--
Qais Yousef
On 06/22/20 11:21, Doug Anderson wrote:
[...]
> > If you propose something that will help the discussion. I think based on the
> > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > I think we just want to allow driver to opt RT tasks out of the default
> > boosting behavior.
> >
> > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > I can't reason about why is it bad for a driver to say I don't want my RT task
> > to be boosted too.
>
> Right. I was basically just trying to say "turn my boosting off".
>
> ...so I guess you're saying that doing a v2 of my patch with the
> proper #ifdef protection wouldn't be a good way to go and I'd need to
> propose some sort of API for this?
It's up to Peter really.
It concerns me in general to start having in-kernel users of uclamp that might
end up setting random values (like we ended having random RT priorities), that
really don't mean a lot outside the context of the specific system it was
tested on. Given the kernel could run anywhere, it's hard to rationalize what's
okay or not.
Opting out of default RT boost for a specific task in the kernel, could make
sense though it still concerns me for the same reasons. Is this okay for all
possible systems this can run on?
It feels better for userspace to turn RT boosting off for all tasks if you know
your system is powerful, or use the per task API to switch off boosting for the
tasks you know they don't need it.
But if we want to allow in-kernel users, IMO it needs to be done in
a controlled way, in a similar manner Peter changed how RT priority can be set
in the kernel.
It would be good hear what Peter thinks.
Thanks
--
Qais Yousef
On Tue, Jun 23, 2020 at 12:40 PM Qais Yousef <[email protected]> wrote:
>
> On 06/22/20 11:21, Doug Anderson wrote:
>
> [...]
>
> > > If you propose something that will help the discussion. I think based on the
> > > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > > I think we just want to allow driver to opt RT tasks out of the default
> > > boosting behavior.
> > >
> > > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > > I can't reason about why is it bad for a driver to say I don't want my RT task
> > > to be boosted too.
> >
> > Right. I was basically just trying to say "turn my boosting off".
> >
> > ...so I guess you're saying that doing a v2 of my patch with the
> > proper #ifdef protection wouldn't be a good way to go and I'd need to
> > propose some sort of API for this?
>
> It's up to Peter really.
>
> It concerns me in general to start having in-kernel users of uclamp that might
> end up setting random values (like we ended having random RT priorities), that
> really don't mean a lot outside the context of the specific system it was
> tested on. Given the kernel could run anywhere, it's hard to rationalize what's
> okay or not.
>
> Opting out of default RT boost for a specific task in the kernel, could make
> sense though it still concerns me for the same reasons. Is this okay for all
> possible systems this can run on?
>
> It feels better for userspace to turn RT boosting off for all tasks if you know
> your system is powerful, or use the per task API to switch off boosting for the
> tasks you know they don't need it.
>
> But if we want to allow in-kernel users, IMO it needs to be done in
> a controlled way, in a similar manner Peter changed how RT priority can be set
> in the kernel.
>
> It would be good hear what Peter thinks.
It seems a bit of a hack, but really the commit message says the
driver is not expected to take a lot of CPU capacity so it should be
expected to work across platforms. It is likely to behave better than
how it behaves now.
On Thu, Jun 18, 2020 at 02:18:23PM -0700, Doug Anderson wrote:
> Hi,
>
> On Fri, Jun 12, 2020 at 5:52 AM Qais Yousef <[email protected]> wrote:
> >
> > On 06/10/20 15:18, Douglas Anderson wrote:
> > > + struct sched_attr sched_attr = {
> > > + .sched_policy = SCHED_FIFO,
> > > + .sched_priority = MAX_RT_PRIO / 2,
> > > + .sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN,
> > > + .sched_util_min = 0,
> >
> > Hmm I thought Peter already removed all users that change RT priority directly.
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> Yeah, I mentioned that patch series "after the cut" in my patch and
> also made sure to CC Peter on my patch. I'm not sure what happened to
> that series since I don't see it in linuxnext...
Right, I got busy with other stuff and it languished a bit.
I tried to merge it recently (it's actually in tip/sched/fifo atm), but
then 0day robot found a problem with it and I've not yet had the time to
deal with that.
Specifically these patches unEXPORT the schet_setscheduler() family, but
kernel/trace/ring_buffer_benchmark.c can still be a module :-(
On Tue, Jun 23, 2020 at 05:40:21PM +0100, Qais Yousef wrote:
> On 06/22/20 11:21, Doug Anderson wrote:
>
> [...]
>
> > > If you propose something that will help the discussion. I think based on the
> > > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > > I think we just want to allow driver to opt RT tasks out of the default
> > > boosting behavior.
> > >
> > > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > > I can't reason about why is it bad for a driver to say I don't want my RT task
> > > to be boosted too.
> >
> > Right. I was basically just trying to say "turn my boosting off".
> >
> > ...so I guess you're saying that doing a v2 of my patch with the
> > proper #ifdef protection wouldn't be a good way to go and I'd need to
> > propose some sort of API for this?
>
> It's up to Peter really.
>
> It concerns me in general to start having in-kernel users of uclamp that might
> end up setting random values (like we ended having random RT priorities), that
> really don't mean a lot outside the context of the specific system it was
> tested on. Given the kernel could run anywhere, it's hard to rationalize what's
> okay or not.
>
> Opting out of default RT boost for a specific task in the kernel, could make
> sense though it still concerns me for the same reasons. Is this okay for all
> possible systems this can run on?
>
> It feels better for userspace to turn RT boosting off for all tasks if you know
> your system is powerful, or use the per task API to switch off boosting for the
> tasks you know they don't need it.
>
> But if we want to allow in-kernel users, IMO it needs to be done in
> a controlled way, in a similar manner Peter changed how RT priority can be set
> in the kernel.
>
> It would be good hear what Peter thinks.
Hurmph.. I think I understand the problem, but I'm not sure what to do
about it :-(
Esp. given the uclamp optimization patches now under consideration. That
is, if random drivers are going to install uclamps, then that will
automagically enable the static_key and make the scheduler slower, even
if that driver isn't particularly interesting to the user.
On 06/24/20 11:49, Joel Fernandes wrote:
> On Tue, Jun 23, 2020 at 12:40 PM Qais Yousef <[email protected]> wrote:
> >
> > On 06/22/20 11:21, Doug Anderson wrote:
> >
> > [...]
> >
> > > > If you propose something that will help the discussion. I think based on the
> > > > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > > > I think we just want to allow driver to opt RT tasks out of the default
> > > > boosting behavior.
> > > >
> > > > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > > > I can't reason about why is it bad for a driver to say I don't want my RT task
> > > > to be boosted too.
> > >
> > > Right. I was basically just trying to say "turn my boosting off".
> > >
> > > ...so I guess you're saying that doing a v2 of my patch with the
> > > proper #ifdef protection wouldn't be a good way to go and I'd need to
> > > propose some sort of API for this?
> >
> > It's up to Peter really.
> >
> > It concerns me in general to start having in-kernel users of uclamp that might
> > end up setting random values (like we ended having random RT priorities), that
> > really don't mean a lot outside the context of the specific system it was
> > tested on. Given the kernel could run anywhere, it's hard to rationalize what's
> > okay or not.
> >
> > Opting out of default RT boost for a specific task in the kernel, could make
> > sense though it still concerns me for the same reasons. Is this okay for all
> > possible systems this can run on?
> >
> > It feels better for userspace to turn RT boosting off for all tasks if you know
> > your system is powerful, or use the per task API to switch off boosting for the
> > tasks you know they don't need it.
> >
> > But if we want to allow in-kernel users, IMO it needs to be done in
> > a controlled way, in a similar manner Peter changed how RT priority can be set
> > in the kernel.
> >
> > It would be good hear what Peter thinks.
>
> It seems a bit of a hack, but really the commit message says the
Which part is the hack, the userspace control? It is how Linux expects things
to work AFAIU. But I do agree there's a hole for general purpose userspace that
wants to run and manage a diverse range of hardware.
I still think it's the job of device manufacturers/system integrator. But
not many ship Linux by default. Though I thought ChromeOS is the exception
here.
> driver is not expected to take a lot of CPU capacity so it should be
> expected to work across platforms. It is likely to behave better than
> how it behaves now.
Doing the in-kernel opt-out via API should be fine, I think. But this will
need to be discussed in the wider circle. It will already clash with this for
example
https://lore.kernel.org/lkml/[email protected]/
Thanks
--
Qais Yousef
On 06/24/20 18:01, Peter Zijlstra wrote:
> On Tue, Jun 23, 2020 at 05:40:21PM +0100, Qais Yousef wrote:
> > On 06/22/20 11:21, Doug Anderson wrote:
> >
> > [...]
> >
> > > > If you propose something that will help the discussion. I think based on the
> > > > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > > > I think we just want to allow driver to opt RT tasks out of the default
> > > > boosting behavior.
> > > >
> > > > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > > > I can't reason about why is it bad for a driver to say I don't want my RT task
> > > > to be boosted too.
> > >
> > > Right. I was basically just trying to say "turn my boosting off".
> > >
> > > ...so I guess you're saying that doing a v2 of my patch with the
> > > proper #ifdef protection wouldn't be a good way to go and I'd need to
> > > propose some sort of API for this?
> >
> > It's up to Peter really.
> >
> > It concerns me in general to start having in-kernel users of uclamp that might
> > end up setting random values (like we ended having random RT priorities), that
> > really don't mean a lot outside the context of the specific system it was
> > tested on. Given the kernel could run anywhere, it's hard to rationalize what's
> > okay or not.
> >
> > Opting out of default RT boost for a specific task in the kernel, could make
> > sense though it still concerns me for the same reasons. Is this okay for all
> > possible systems this can run on?
> >
> > It feels better for userspace to turn RT boosting off for all tasks if you know
> > your system is powerful, or use the per task API to switch off boosting for the
> > tasks you know they don't need it.
> >
> > But if we want to allow in-kernel users, IMO it needs to be done in
> > a controlled way, in a similar manner Peter changed how RT priority can be set
> > in the kernel.
> >
> > It would be good hear what Peter thinks.
>
> Hurmph.. I think I understand the problem, but I'm not sure what to do
> about it :-(
>
> Esp. given the uclamp optimization patches now under consideration. That
> is, if random drivers are going to install uclamps, then that will
> automagically enable the static_key and make the scheduler slower, even
> if that driver isn't particularly interesting to the user.
For some reason this didn't land in my inbox.
Yeah I was considering how we can handle this potential new user without
a clash. But I thought we first need to agree this is indeed the right thing to
do.
The easiest way is to make the new API act on the task struct directly rather
than go through sched_setscheduler(). I.e: a wrapper around uclamp_se_set().
But first, I think we need to consider how accessible we want to keep
sched_setscheduler(). IMO it should be hidden because of its potential
misuse/abuse. Happy to help with that if you agree.
Thanks
--
Qais Yousef
On Wed, Jun 24, 2020 at 12:55 PM Qais Yousef <[email protected]> wrote:
>
> On 06/24/20 11:49, Joel Fernandes wrote:
> > On Tue, Jun 23, 2020 at 12:40 PM Qais Yousef <[email protected]> wrote:
> > >
> > > On 06/22/20 11:21, Doug Anderson wrote:
> > >
> > > [...]
> > >
> > > > > If you propose something that will help the discussion. I think based on the
> > > > > same approach Peter has taken to prevent random RT priorities. In uclamp case
> > > > > I think we just want to allow driver to opt RT tasks out of the default
> > > > > boosting behavior.
> > > > >
> > > > > I'm a bit wary that this extra layer of tuning might create a confusion, but
> > > > > I can't reason about why is it bad for a driver to say I don't want my RT task
> > > > > to be boosted too.
> > > >
> > > > Right. I was basically just trying to say "turn my boosting off".
> > > >
> > > > ...so I guess you're saying that doing a v2 of my patch with the
> > > > proper #ifdef protection wouldn't be a good way to go and I'd need to
> > > > propose some sort of API for this?
> > >
> > > It's up to Peter really.
> > >
> > > It concerns me in general to start having in-kernel users of uclamp that might
> > > end up setting random values (like we ended having random RT priorities), that
> > > really don't mean a lot outside the context of the specific system it was
> > > tested on. Given the kernel could run anywhere, it's hard to rationalize what's
> > > okay or not.
> > >
> > > Opting out of default RT boost for a specific task in the kernel, could make
> > > sense though it still concerns me for the same reasons. Is this okay for all
> > > possible systems this can run on?
> > >
> > > It feels better for userspace to turn RT boosting off for all tasks if you know
> > > your system is powerful, or use the per task API to switch off boosting for the
> > > tasks you know they don't need it.
> > >
> > > But if we want to allow in-kernel users, IMO it needs to be done in
> > > a controlled way, in a similar manner Peter changed how RT priority can be set
> > > in the kernel.
> > >
> > > It would be good hear what Peter thinks.
> >
> > It seems a bit of a hack, but really the commit message says the
>
> Which part is the hack, the userspace control? It is how Linux expects things
> to work AFAIU. But I do agree there's a hole for general purpose userspace that
> wants to run and manage a diverse range of hardware.
I meant to say, this patch is a necessary hack of sorts.
> > driver is not expected to take a lot of CPU capacity so it should be
> > expected to work across platforms. It is likely to behave better than
> > how it behaves now.
>
> Doing the in-kernel opt-out via API should be fine, I think. But this will
> need to be discussed in the wider circle. It will already clash with this for
> example
>
> https://lore.kernel.org/lkml/[email protected]/
Have not yet looked closer at that patch, but are you saying this
patch clashes with that work? Sorry I am operating on 2 hours of sleep
here.
On 06/24/20 13:35, Joel Fernandes wrote:
[...]
> > Doing the in-kernel opt-out via API should be fine, I think. But this will
> > need to be discussed in the wider circle. It will already clash with this for
> > example
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> Have not yet looked closer at that patch, but are you saying this
> patch clashes with that work? Sorry I am operating on 2 hours of sleep
> here.
The series is an optimization to remove the uclamp overhead from the scheduler
fastpath until the userspace uses it. It introduces a static key that is
disabled by default and will cause uclamp logic not to execute in the fast
path. Once the userspace starts using util clamp, which we detect by either
1. Changing uclamp value of a task with sched_setattr()
2. Modifying the default sysctl_sched_util_clamp_{min, max}
3. Modifying the default cpu.uclamp.{min, max} value in cgroup
If we start having in-kernel users changing uclamp value this means drivers
will cause the system to opt-in into uclamp automatically even if the
userspace doesn't actually use it.
I think we can solve this by providing a special API to opt-out safely. Which
is the right thing to do anyway even if we didn't have this clash.
Hope this makes sense.
Cheers
--
Qais Yousef
On Wed, Jun 24, 2020 at 1:52 PM Qais Yousef <[email protected]> wrote:
>
> On 06/24/20 13:35, Joel Fernandes wrote:
>
> [...]
>
> > > Doing the in-kernel opt-out via API should be fine, I think. But this will
> > > need to be discussed in the wider circle. It will already clash with this for
> > > example
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> >
> > Have not yet looked closer at that patch, but are you saying this
> > patch clashes with that work? Sorry I am operating on 2 hours of sleep
> > here.
>
> The series is an optimization to remove the uclamp overhead from the scheduler
> fastpath until the userspace uses it. It introduces a static key that is
> disabled by default and will cause uclamp logic not to execute in the fast
> path. Once the userspace starts using util clamp, which we detect by either
>
> 1. Changing uclamp value of a task with sched_setattr()
> 2. Modifying the default sysctl_sched_util_clamp_{min, max}
> 3. Modifying the default cpu.uclamp.{min, max} value in cgroup
>
> If we start having in-kernel users changing uclamp value this means drivers
> will cause the system to opt-in into uclamp automatically even if the
> userspace doesn't actually use it.
>
> I think we can solve this by providing a special API to opt-out safely. Which
> is the right thing to do anyway even if we didn't have this clash.
Makes sense, thanks.
- Joel
Hi,
On Wed, Jun 24, 2020 at 10:55 AM Joel Fernandes <[email protected]> wrote:
>
> On Wed, Jun 24, 2020 at 1:52 PM Qais Yousef <[email protected]> wrote:
> >
> > On 06/24/20 13:35, Joel Fernandes wrote:
> >
> > [...]
> >
> > > > Doing the in-kernel opt-out via API should be fine, I think. But this will
> > > > need to be discussed in the wider circle. It will already clash with this for
> > > > example
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > Have not yet looked closer at that patch, but are you saying this
> > > patch clashes with that work? Sorry I am operating on 2 hours of sleep
> > > here.
> >
> > The series is an optimization to remove the uclamp overhead from the scheduler
> > fastpath until the userspace uses it. It introduces a static key that is
> > disabled by default and will cause uclamp logic not to execute in the fast
> > path. Once the userspace starts using util clamp, which we detect by either
> >
> > 1. Changing uclamp value of a task with sched_setattr()
> > 2. Modifying the default sysctl_sched_util_clamp_{min, max}
> > 3. Modifying the default cpu.uclamp.{min, max} value in cgroup
> >
> > If we start having in-kernel users changing uclamp value this means drivers
> > will cause the system to opt-in into uclamp automatically even if the
> > userspace doesn't actually use it.
> >
> > I think we can solve this by providing a special API to opt-out safely. Which
> > is the right thing to do anyway even if we didn't have this clash.
>
> Makes sense, thanks.
OK, so I think the summary is:
1. There are enough external dependencies that are currently in the
works that it makes sense for those to land first without trying to
cram my patch to cros_ec in.
2. Maybe, as part of the work that's already going on, someone will
add an API that I can use. If so then I can write my patch once that
lands.
3. If nobody adds an API then I could look at adding the API myself
once everything else is settled.
4. It looks as if the patch you mentioned originally [1] that allows
userspace to just fully disable uclamp for RT tasks will land
eventually (if we're stuck for a short term solution we can pick the
existing patch). I believe Chrome OS will use that to just fully
disable the default boosting for RT tasks across our system and (if
needed) add boosts on a case-by-case basis. Once we do that, the
incentive to land a patch for cros_ec will be mostly gone and probably
we could consider my patch abandoned. While it would technically
still be sane to land it won't have any real-world benefit.
[1] https://lore.kernel.org/r/[email protected]
On 06/24/20 11:29, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 24, 2020 at 10:55 AM Joel Fernandes <[email protected]> wrote:
> >
> > On Wed, Jun 24, 2020 at 1:52 PM Qais Yousef <[email protected]> wrote:
> > >
> > > On 06/24/20 13:35, Joel Fernandes wrote:
> > >
> > > [...]
> > >
> > > > > Doing the in-kernel opt-out via API should be fine, I think. But this will
> > > > > need to be discussed in the wider circle. It will already clash with this for
> > > > > example
> > > > >
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > Have not yet looked closer at that patch, but are you saying this
> > > > patch clashes with that work? Sorry I am operating on 2 hours of sleep
> > > > here.
> > >
> > > The series is an optimization to remove the uclamp overhead from the scheduler
> > > fastpath until the userspace uses it. It introduces a static key that is
> > > disabled by default and will cause uclamp logic not to execute in the fast
> > > path. Once the userspace starts using util clamp, which we detect by either
> > >
> > > 1. Changing uclamp value of a task with sched_setattr()
> > > 2. Modifying the default sysctl_sched_util_clamp_{min, max}
> > > 3. Modifying the default cpu.uclamp.{min, max} value in cgroup
> > >
> > > If we start having in-kernel users changing uclamp value this means drivers
> > > will cause the system to opt-in into uclamp automatically even if the
> > > userspace doesn't actually use it.
> > >
> > > I think we can solve this by providing a special API to opt-out safely. Which
> > > is the right thing to do anyway even if we didn't have this clash.
> >
> > Makes sense, thanks.
>
> OK, so I think the summary is:
>
> 1. There are enough external dependencies that are currently in the
> works that it makes sense for those to land first without trying to
> cram my patch to cros_ec in.
+1
>
> 2. Maybe, as part of the work that's already going on, someone will
> add an API that I can use. If so then I can write my patch once that
> lands.
I won't be adding this API. Mainly because I can't argue for it personally as
I'm still not convinced it's a valid way of handling RT default boosting
behavior from within the kernel. But it's a valid discussion to have if you
want to drive it.
>
> 3. If nobody adds an API then I could look at adding the API myself
> once everything else is settled.
>
> 4. It looks as if the patch you mentioned originally [1] that allows
> userspace to just fully disable uclamp for RT tasks will land
> eventually (if we're stuck for a short term solution we can pick the
> existing patch). I believe Chrome OS will use that to just fully
> disable the default boosting for RT tasks across our system and (if
> needed) add boosts on a case-by-case basis. Once we do that, the
> incentive to land a patch for cros_ec will be mostly gone and probably
> we could consider my patch abandoned. While it would technically
> still be sane to land it won't have any real-world benefit.
I think this is the best way forward. I'm interesting in hearing what
difficulties you encounter while doing this work.
Regarding the patch [1], I need to tweak the way it is implemented and send v6,
but there are no objection to the idea and interface, so hopefully once I send
v6 it'd be accepted.
Thanks
--
Qais Yousef
> [1] https://lore.kernel.org/r/[email protected]