From: "Joel Fernandes (Google)" <[email protected]>
Currently there is a chance that a schedutil cpufreq update request is
dropped if there is already a pending update request. The pending request
can be delayed if there is a scheduling delay of the irq_work and the
wake-up of the schedutil governor kthread.

A particularly bad scenario is when a schedutil request has just been made,
for example to reduce the CPU frequency, and then a newer request to
increase the CPU frequency (even an urgent sched-deadline frequency
increase request) is dropped, even though the rate limits suggest that it
is OK to process a request. This is because of the way the work_in_progress
flag is used.

This patch improves the situation by allowing new requests to happen even
though the old one is still being processed. Note that with this approach,
if an irq_work has already been issued, we just update next_freq and don't
queue another request, so no extra work is done to make this happen.

I brought up this issue at the OSPM conference, and Claudio posted a
discussion RFC with an alternate approach [1]. I prefer the approach in the
patch below since it doesn't need any new flags and doesn't cause any other
extra overhead.
[1] https://patchwork.kernel.org/patch/10384261/
LGTMed-by: Viresh Kumar <[email protected]>
LGTMed-by: Juri Lelli <[email protected]>
CC: Viresh Kumar <[email protected]>
CC: Rafael J. Wysocki <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Patrick Bellasi <[email protected]>
CC: Juri Lelli <[email protected]>
Cc: Luca Abeni <[email protected]>
CC: Joel Fernandes <[email protected]>
CC: Todd Kjos <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
v1 -> v2: Minor style related changes.
kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e13df951aca7..5c482ec38610 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
- if (sg_policy->work_in_progress)
- return false;
-
if (unlikely(sg_policy->need_freq_update)) {
sg_policy->need_freq_update = false;
/*
@@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
policy->cur = next_freq;
trace_cpu_frequency(next_freq, smp_processor_id());
- } else {
+ } else if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(&sg_policy->irq_work);
}
@@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
ignore_dl_rate_limit(sg_cpu, sg_policy);
+ /*
+ * For slow-switch systems, single policy requests can't run at the
+ * moment if update is in progress, unless we acquire update_lock.
+ */
+ if (sg_policy->work_in_progress)
+ return;
+
if (!sugov_should_update_freq(sg_policy, time))
return;
@@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
static void sugov_work(struct kthread_work *work)
{
struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
+ unsigned int freq;
+ unsigned long flags;
+
+ /*
+ * Hold sg_policy->update_lock shortly to handle the case where:
+ * incase sg_policy->next_freq is read here, and then updated by
+ * sugov_update_shared just before work_in_progress is set to false
+ * here, we may miss queueing the new update.
+ *
+ * Note: If a work was queued after the update_lock is released,
+ * sugov_work will just be called again by kthread_work code; and the
+ * request will be proceed before the sugov thread sleeps.
+ */
+ raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+ freq = sg_policy->next_freq;
+ sg_policy->work_in_progress = false;
+ raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
mutex_lock(&sg_policy->work_lock);
- __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
- CPUFREQ_RELATION_L);
+ __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
mutex_unlock(&sg_policy->work_lock);
-
- sg_policy->work_in_progress = false;
}
static void sugov_irq_work(struct irq_work *irq_work)
--
2.17.0.441.gb46fe60e1d-goog
On 05/18/2018 11:55 AM, Joel Fernandes (Google.) wrote:
> From: "Joel Fernandes (Google)" <[email protected]>
>
> Currently there is a chance of a schedutil cpufreq update request to be
> dropped if there is a pending update request. This pending request can
> be delayed if there is a scheduling delay of the irq_work and the wake
> up of the schedutil governor kthread.
>
> A very bad scenario is when a schedutil request was already just made,
> such as to reduce the CPU frequency, then a newer request to increase
> CPU frequency (even sched deadline urgent frequency increase requests)
> can be dropped, even though the rate limits suggest that its Ok to
> process a request. This is because of the way the work_in_progress flag
> is used.
>
> This patch improves the situation by allowing new requests to happen
> even though the old one is still being processed. Note that in this
> approach, if an irq_work was already issued, we just update next_freq
> and don't bother to queue another request so there's no extra work being
> done to make this happen.
>
> I had brought up this issue at the OSPM conference and Claudio had a
> discussion RFC with an alternate approach [1]. I prefer the approach as
> done in the patch below since it doesn't need any new flags and doesn't
> cause any other extra overhead.
>
> [1] https://patchwork.kernel.org/patch/10384261/
>
> LGTMed-by: Viresh Kumar <[email protected]>
> LGTMed-by: Juri Lelli <[email protected]>
> CC: Viresh Kumar <[email protected]>
> CC: Rafael J. Wysocki <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Patrick Bellasi <[email protected]>
> CC: Juri Lelli <[email protected]>
> Cc: Luca Abeni <[email protected]>
> CC: Joel Fernandes <[email protected]>
> CC: Todd Kjos <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> v1 -> v2: Minor style related changes.
>
> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
> 1 file changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e13df951aca7..5c482ec38610 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> return false;
>
> - if (sg_policy->work_in_progress)
> - return false;
> -
> if (unlikely(sg_policy->need_freq_update)) {
> sg_policy->need_freq_update = false;
> /*
> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>
> policy->cur = next_freq;
> trace_cpu_frequency(next_freq, smp_processor_id());
> - } else {
> + } else if (!sg_policy->work_in_progress) {
Not really something you added, but if you are modifying it:
Do we really need this work_in_progress flag? irq_work_queue() already
checks if the work is pending and then returns true/false.
Wouldn't the issue you are trying to fix be resolved just by dropping
this flag check entirely?
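
For context, the check being referred to lives inside irq_work_queue()
itself; a simplified sketch of its behaviour (from memory, not the exact
kernel/irq_work.c source) looks something like this:

	/*
	 * Illustrative sketch only: irq_work_queue() refuses to queue a work
	 * item that is already pending and reports that via its return value.
	 */
	bool irq_work_queue(struct irq_work *work)
	{
		/* Claim the work item; this fails if it is already pending. */
		if (!irq_work_claim(work))
			return false;

		/* Queue it on this CPU and raise the irq_work interrupt. */
		preempt_disable();
		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
			arch_irq_work_raise();
		preempt_enable();

		return true;
	}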
> sg_policy->work_in_progress = true;
> irq_work_queue(&sg_policy->irq_work);
> }
> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + /*
> + * For slow-switch systems, single policy requests can't run at the
> + * moment if update is in progress, unless we acquire update_lock.
> + */
> + if (sg_policy->work_in_progress)
> + return;
> +
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> static void sugov_work(struct kthread_work *work)
> {
> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> + unsigned int freq;
> + unsigned long flags;
> +
> + /*
> + * Hold sg_policy->update_lock shortly to handle the case where:
> + * incase sg_policy->next_freq is read here, and then updated by
> + * sugov_update_shared just before work_in_progress is set to false
> + * here, we may miss queueing the new update.
> + *
> + * Note: If a work was queued after the update_lock is released,
> + * sugov_work will just be called again by kthread_work code; and the
> + * request will be proceed before the sugov thread sleeps.
> + */
> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> + freq = sg_policy->next_freq;
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> mutex_lock(&sg_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> - CPUFREQ_RELATION_L);
> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> mutex_unlock(&sg_policy->work_lock);
> -
> - sg_policy->work_in_progress = false;
> }
>
> static void sugov_irq_work(struct irq_work *irq_work)
>
-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Fri, May 18, 2018 at 11:13 PM, Saravana Kannan
<[email protected]> wrote:
> On 05/18/2018 11:55 AM, Joel Fernandes (Google.) wrote:
>>
>> From: "Joel Fernandes (Google)" <[email protected]>
>>
>> Currently there is a chance of a schedutil cpufreq update request to be
>> dropped if there is a pending update request. This pending request can
>> be delayed if there is a scheduling delay of the irq_work and the wake
>> up of the schedutil governor kthread.
>>
>> A very bad scenario is when a schedutil request was already just made,
>> such as to reduce the CPU frequency, then a newer request to increase
>> CPU frequency (even sched deadline urgent frequency increase requests)
>> can be dropped, even though the rate limits suggest that its Ok to
>> process a request. This is because of the way the work_in_progress flag
>> is used.
>>
>> This patch improves the situation by allowing new requests to happen
>> even though the old one is still being processed. Note that in this
>> approach, if an irq_work was already issued, we just update next_freq
>> and don't bother to queue another request so there's no extra work being
>> done to make this happen.
>>
>> I had brought up this issue at the OSPM conference and Claudio had a
>> discussion RFC with an alternate approach [1]. I prefer the approach as
>> done in the patch below since it doesn't need any new flags and doesn't
>> cause any other extra overhead.
>>
>> [1] https://patchwork.kernel.org/patch/10384261/
>>
>> LGTMed-by: Viresh Kumar <[email protected]>
>> LGTMed-by: Juri Lelli <[email protected]>
>> CC: Viresh Kumar <[email protected]>
>> CC: Rafael J. Wysocki <[email protected]>
>> CC: Peter Zijlstra <[email protected]>
>> CC: Ingo Molnar <[email protected]>
>> CC: Patrick Bellasi <[email protected]>
>> CC: Juri Lelli <[email protected]>
>> Cc: Luca Abeni <[email protected]>
>> CC: Joel Fernandes <[email protected]>
>> CC: Todd Kjos <[email protected]>
>> CC: [email protected]
>> CC: [email protected]
>> CC: [email protected]
>> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>> ---
>> v1 -> v2: Minor style related changes.
>>
>> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
>> 1 file changed, 26 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c
>> b/kernel/sched/cpufreq_schedutil.c
>> index e13df951aca7..5c482ec38610 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy
>> *sg_policy, u64 time)
>> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> return false;
>>
>> - if (sg_policy->work_in_progress)
>> - return false;
>> -
>> if (unlikely(sg_policy->need_freq_update)) {
>> sg_policy->need_freq_update = false;
>> /*
>> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy
>> *sg_policy, u64 time,
>>
>> policy->cur = next_freq;
>> trace_cpu_frequency(next_freq, smp_processor_id());
>> - } else {
>> + } else if (!sg_policy->work_in_progress) {
>
>
> Not really something you added, but if you are modifying it:
> Do we really need this work_in_progress flag? irq_work_queue() already
> checks if the work is pending and then returns true/false.
>
> Wouldn't the issue you are trying to fix be resolved just by dropping this
> flag check entirely?
You've missed the entire discussion on that several days ago, sorry.
Thanks,
Rafael
On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> From: "Joel Fernandes (Google)" <[email protected]>
>
> Currently there is a chance of a schedutil cpufreq update request to be
> dropped if there is a pending update request. This pending request can
> be delayed if there is a scheduling delay of the irq_work and the wake
> up of the schedutil governor kthread.
>
> A very bad scenario is when a schedutil request was already just made,
> such as to reduce the CPU frequency, then a newer request to increase
> CPU frequency (even sched deadline urgent frequency increase requests)
> can be dropped, even though the rate limits suggest that its Ok to
> process a request. This is because of the way the work_in_progress flag
> is used.
>
> This patch improves the situation by allowing new requests to happen
> even though the old one is still being processed. Note that in this
> approach, if an irq_work was already issued, we just update next_freq
> and don't bother to queue another request so there's no extra work being
> done to make this happen.
Now that this isn't an RFC anymore, you shouldn't have added the paragraph
below here. It could go to the comments section though.
> I had brought up this issue at the OSPM conference and Claudio had a
> discussion RFC with an alternate approach [1]. I prefer the approach as
> done in the patch below since it doesn't need any new flags and doesn't
> cause any other extra overhead.
>
> [1] https://patchwork.kernel.org/patch/10384261/
>
> LGTMed-by: Viresh Kumar <[email protected]>
> LGTMed-by: Juri Lelli <[email protected]>
Looks like a Tag you just invented ? :)
> CC: Viresh Kumar <[email protected]>
> CC: Rafael J. Wysocki <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Patrick Bellasi <[email protected]>
> CC: Juri Lelli <[email protected]>
> Cc: Luca Abeni <[email protected]>
> CC: Joel Fernandes <[email protected]>
> CC: Todd Kjos <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> v1 -> v2: Minor style related changes.
>
> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
> 1 file changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e13df951aca7..5c482ec38610 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> return false;
>
> - if (sg_policy->work_in_progress)
> - return false;
> -
> if (unlikely(sg_policy->need_freq_update)) {
> sg_policy->need_freq_update = false;
> /*
> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>
> policy->cur = next_freq;
> trace_cpu_frequency(next_freq, smp_processor_id());
> - } else {
> + } else if (!sg_policy->work_in_progress) {
> sg_policy->work_in_progress = true;
> irq_work_queue(&sg_policy->irq_work);
> }
> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + /*
> + * For slow-switch systems, single policy requests can't run at the
> + * moment if update is in progress, unless we acquire update_lock.
> + */
> + if (sg_policy->work_in_progress)
> + return;
> +
I would still want this to go away :)
@Rafael, will it be fine to get locking in place for unshared policy
platforms ?
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> static void sugov_work(struct kthread_work *work)
> {
> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> + unsigned int freq;
> + unsigned long flags;
> +
> + /*
> + * Hold sg_policy->update_lock shortly to handle the case where:
> + * incase sg_policy->next_freq is read here, and then updated by
> + * sugov_update_shared just before work_in_progress is set to false
> + * here, we may miss queueing the new update.
> + *
> + * Note: If a work was queued after the update_lock is released,
> + * sugov_work will just be called again by kthread_work code; and the
> + * request will be proceed before the sugov thread sleeps.
> + */
> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> + freq = sg_policy->next_freq;
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> mutex_lock(&sg_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> - CPUFREQ_RELATION_L);
> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> mutex_unlock(&sg_policy->work_lock);
> -
> - sg_policy->work_in_progress = false;
> }
>
> static void sugov_irq_work(struct irq_work *irq_work)
Fix the commit log and you can add my
Acked-by: Viresh Kumar <[email protected]>
--
viresh
On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> From: "Joel Fernandes (Google)" <[email protected]>
>>
>> Currently there is a chance of a schedutil cpufreq update request to be
>> dropped if there is a pending update request. This pending request can
>> be delayed if there is a scheduling delay of the irq_work and the wake
>> up of the schedutil governor kthread.
>>
>> A very bad scenario is when a schedutil request was already just made,
>> such as to reduce the CPU frequency, then a newer request to increase
>> CPU frequency (even sched deadline urgent frequency increase requests)
>> can be dropped, even though the rate limits suggest that its Ok to
>> process a request. This is because of the way the work_in_progress flag
>> is used.
>>
>> This patch improves the situation by allowing new requests to happen
>> even though the old one is still being processed. Note that in this
>> approach, if an irq_work was already issued, we just update next_freq
>> and don't bother to queue another request so there's no extra work being
>> done to make this happen.
>
> Now that this isn't an RFC anymore, you shouldn't have added below
> paragraph here. It could go to the comments section though.
>
>> I had brought up this issue at the OSPM conference and Claudio had a
>> discussion RFC with an alternate approach [1]. I prefer the approach as
>> done in the patch below since it doesn't need any new flags and doesn't
>> cause any other extra overhead.
>>
>> [1] https://patchwork.kernel.org/patch/10384261/
>>
>> LGTMed-by: Viresh Kumar <[email protected]>
>> LGTMed-by: Juri Lelli <[email protected]>
>
> Looks like a Tag you just invented ? :)
Yeah.
The LGTM from Juri can be converted into an ACK silently IMO. That
said, I have added Looks-good-to: tags to a couple of commits. :-)
>> CC: Viresh Kumar <[email protected]>
>> CC: Rafael J. Wysocki <[email protected]>
>> CC: Peter Zijlstra <[email protected]>
>> CC: Ingo Molnar <[email protected]>
>> CC: Patrick Bellasi <[email protected]>
>> CC: Juri Lelli <[email protected]>
>> Cc: Luca Abeni <[email protected]>
>> CC: Joel Fernandes <[email protected]>
>> CC: Todd Kjos <[email protected]>
>> CC: [email protected]
>> CC: [email protected]
>> CC: [email protected]
>> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>> ---
>> v1 -> v2: Minor style related changes.
>>
>> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
>> 1 file changed, 26 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index e13df951aca7..5c482ec38610 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> return false;
>>
>> - if (sg_policy->work_in_progress)
>> - return false;
>> -
>> if (unlikely(sg_policy->need_freq_update)) {
>> sg_policy->need_freq_update = false;
>> /*
>> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>>
>> policy->cur = next_freq;
>> trace_cpu_frequency(next_freq, smp_processor_id());
>> - } else {
>> + } else if (!sg_policy->work_in_progress) {
>> sg_policy->work_in_progress = true;
>> irq_work_queue(&sg_policy->irq_work);
>> }
>> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>>
>> ignore_dl_rate_limit(sg_cpu, sg_policy);
>>
>> + /*
>> + * For slow-switch systems, single policy requests can't run at the
>> + * moment if update is in progress, unless we acquire update_lock.
>> + */
>> + if (sg_policy->work_in_progress)
>> + return;
>> +
>
> I would still want this to go away :)
>
> @Rafael, will it be fine to get locking in place for unshared policy
> platforms ?
As long as it doesn't affect the fast switch path in any way.
>
>> if (!sugov_should_update_freq(sg_policy, time))
>> return;
>>
>> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>> static void sugov_work(struct kthread_work *work)
>> {
>> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
>> + unsigned int freq;
>> + unsigned long flags;
>> +
>> + /*
>> + * Hold sg_policy->update_lock shortly to handle the case where:
>> + * incase sg_policy->next_freq is read here, and then updated by
>> + * sugov_update_shared just before work_in_progress is set to false
>> + * here, we may miss queueing the new update.
>> + *
>> + * Note: If a work was queued after the update_lock is released,
>> + * sugov_work will just be called again by kthread_work code; and the
>> + * request will be proceed before the sugov thread sleeps.
>> + */
>> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> + freq = sg_policy->next_freq;
>> + sg_policy->work_in_progress = false;
>> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>>
>> mutex_lock(&sg_policy->work_lock);
>> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> - CPUFREQ_RELATION_L);
>> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> mutex_unlock(&sg_policy->work_lock);
>> -
>> - sg_policy->work_in_progress = false;
>> }
>>
>> static void sugov_irq_work(struct irq_work *irq_work)
>
> Fix the commit log and you can add my
I can fix it up.
> Acked-by: Viresh Kumar <[email protected]>
Thanks,
Rafael
On 21/05/18 10:29, Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> >> From: "Joel Fernandes (Google)" <[email protected]>
> >>
> >> Currently there is a chance of a schedutil cpufreq update request to be
> >> dropped if there is a pending update request. This pending request can
> >> be delayed if there is a scheduling delay of the irq_work and the wake
> >> up of the schedutil governor kthread.
> >>
> >> A very bad scenario is when a schedutil request was already just made,
> >> such as to reduce the CPU frequency, then a newer request to increase
> >> CPU frequency (even sched deadline urgent frequency increase requests)
> >> can be dropped, even though the rate limits suggest that its Ok to
> >> process a request. This is because of the way the work_in_progress flag
> >> is used.
> >>
> >> This patch improves the situation by allowing new requests to happen
> >> even though the old one is still being processed. Note that in this
> >> approach, if an irq_work was already issued, we just update next_freq
> >> and don't bother to queue another request so there's no extra work being
> >> done to make this happen.
> >
> > Now that this isn't an RFC anymore, you shouldn't have added below
> > paragraph here. It could go to the comments section though.
> >
> >> I had brought up this issue at the OSPM conference and Claudio had a
> >> discussion RFC with an alternate approach [1]. I prefer the approach as
> >> done in the patch below since it doesn't need any new flags and doesn't
> >> cause any other extra overhead.
> >>
> >> [1] https://patchwork.kernel.org/patch/10384261/
> >>
> >> LGTMed-by: Viresh Kumar <[email protected]>
> >> LGTMed-by: Juri Lelli <[email protected]>
> >
> > Looks like a Tag you just invented ? :)
>
> Yeah.
>
> The LGTM from Juri can be converned into an ACK silently IMO. That
Sure! :)
Thanks,
- Juri
On 18-May 11:55, Joel Fernandes (Google.) wrote:
> From: "Joel Fernandes (Google)" <[email protected]>
>
> Currently there is a chance of a schedutil cpufreq update request to be
> dropped if there is a pending update request. This pending request can
> be delayed if there is a scheduling delay of the irq_work and the wake
> up of the schedutil governor kthread.
>
> A very bad scenario is when a schedutil request was already just made,
> such as to reduce the CPU frequency, then a newer request to increase
> CPU frequency (even sched deadline urgent frequency increase requests)
> can be dropped, even though the rate limits suggest that its Ok to
> process a request. This is because of the way the work_in_progress flag
> is used.
>
> This patch improves the situation by allowing new requests to happen
> even though the old one is still being processed. Note that in this
> approach, if an irq_work was already issued, we just update next_freq
> and don't bother to queue another request so there's no extra work being
> done to make this happen.
Maybe I'm missing something, but... isn't this patch just a partial
mitigation of the issue you describe above?
If a DL freq increase is queued, with this patch we store the request
but we don't actually increase the frequency until the next schedutil
update, which can be one tick away... isn't it?
If that's the case, maybe something like the following can complete
the cure?
---8<---
#define SUGOV_FREQ_NONE 0

static unsigned int sugov_work_update(struct sugov_policy *sg_policy,
				      unsigned int prev_freq)
{
	unsigned long irq_flags;
	bool update_freq = true;
	unsigned int next_freq;

	/*
	 * Hold sg_policy->update_lock shortly to handle the case where:
	 * incase sg_policy->next_freq is read here, and then updated by
	 * sugov_update_shared just before work_in_progress is set to false
	 * here, we may miss queueing the new update.
	 *
	 * Note: If a work was queued after the update_lock is released,
	 * sugov_work will just be called again by kthread_work code; and the
	 * request will be proceed before the sugov thread sleeps.
	 */
	raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags);
	next_freq = sg_policy->next_freq;
	sg_policy->work_in_progress = false;
	if (prev_freq == next_freq)
		update_freq = false;
	raw_spin_unlock_irqrestore(&sg_policy->update_lock, irq_flags);

	/*
	 * Update the frequency only if it has changed since the last call.
	 */
	if (update_freq) {
		mutex_lock(&sg_policy->work_lock);
		__cpufreq_driver_target(sg_policy->policy, next_freq, CPUFREQ_RELATION_L);
		mutex_unlock(&sg_policy->work_lock);
		return next_freq;
	}

	return SUGOV_FREQ_NONE;
}

static void sugov_work(struct kthread_work *work)
{
	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
	unsigned int prev_freq = 0;

	/*
	 * Keep updating the frequency until we end up with a frequency which
	 * satisfies the most recent request we got meanwhile.
	 */
	do {
		prev_freq = sugov_work_update(sg_policy, prev_freq);
	} while (prev_freq != SUGOV_FREQ_NONE);
}
---8<---
--
#include <best/regards.h>
Patrick Bellasi
On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > From: "Joel Fernandes (Google)" <[email protected]>
> >
> > Currently there is a chance of a schedutil cpufreq update request to be
> > dropped if there is a pending update request. This pending request can
> > be delayed if there is a scheduling delay of the irq_work and the wake
> > up of the schedutil governor kthread.
> >
> > A very bad scenario is when a schedutil request was already just made,
> > such as to reduce the CPU frequency, then a newer request to increase
> > CPU frequency (even sched deadline urgent frequency increase requests)
> > can be dropped, even though the rate limits suggest that its Ok to
> > process a request. This is because of the way the work_in_progress flag
> > is used.
> >
> > This patch improves the situation by allowing new requests to happen
> > even though the old one is still being processed. Note that in this
> > approach, if an irq_work was already issued, we just update next_freq
> > and don't bother to queue another request so there's no extra work being
> > done to make this happen.
>
> Maybe I'm missing something but... is not this patch just a partial
> mitigation of the issue you descrive above?
>
> If a DL freq increase is queued, with this patch we store the request
> but we don't actually increase the frequency until the next schedutil
> update, which can be one tick away... isn't it?
>
> If that's the case, maybe something like the following can complete
> the cure?
We already discussed this and thought of this case; I think you missed a
previous thread [1]. The outer loop in the kthread_work subsystem will take
care of calling sugov_work again in case another request was queued that we
happened to miss. So I don't think more complexity is needed to handle the
case you're bringing up.
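
For anyone not familiar with that outer loop, here is a rough paraphrase of
what the kthread_worker does (heavily simplified from kthread_worker_fn()
in kernel/kthread.c, so treat the details as approximate):

	/* Sketch of the kthread_worker main loop, for illustration only. */
	for (;;) {
		struct kthread_work *work = NULL;

		set_current_state(TASK_INTERRUPTIBLE);

		/* Pop the next queued work item, if any (under worker->lock). */
		if (!list_empty(&worker->work_list)) {
			work = list_first_entry(&worker->work_list,
						struct kthread_work, node);
			list_del_init(&work->node);
		}

		if (work) {
			__set_current_state(TASK_RUNNING);
			work->func(work);	/* sugov_work() in our case */
		} else {
			schedule();		/* sleep until new work is queued */
		}
	}

So a request that queues the irq_work right after sugov_work drops the
update_lock simply results in one more iteration of this loop before the
thread goes to sleep.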
thanks!
- Joel
[1] https://lkml.org/lkml/2018/5/17/668
On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> >> From: "Joel Fernandes (Google)" <[email protected]>
> >>
> >> Currently there is a chance of a schedutil cpufreq update request to be
> >> dropped if there is a pending update request. This pending request can
> >> be delayed if there is a scheduling delay of the irq_work and the wake
> >> up of the schedutil governor kthread.
> >>
> >> A very bad scenario is when a schedutil request was already just made,
> >> such as to reduce the CPU frequency, then a newer request to increase
> >> CPU frequency (even sched deadline urgent frequency increase requests)
> >> can be dropped, even though the rate limits suggest that its Ok to
> >> process a request. This is because of the way the work_in_progress flag
> >> is used.
> >>
> >> This patch improves the situation by allowing new requests to happen
> >> even though the old one is still being processed. Note that in this
> >> approach, if an irq_work was already issued, we just update next_freq
> >> and don't bother to queue another request so there's no extra work being
> >> done to make this happen.
> >
> > Now that this isn't an RFC anymore, you shouldn't have added below
> > paragraph here. It could go to the comments section though.
> >
> >> I had brought up this issue at the OSPM conference and Claudio had a
> >> discussion RFC with an alternate approach [1]. I prefer the approach as
> >> done in the patch below since it doesn't need any new flags and doesn't
> >> cause any other extra overhead.
> >>
> >> [1] https://patchwork.kernel.org/patch/10384261/
> >>
> >> LGTMed-by: Viresh Kumar <[email protected]>
> >> LGTMed-by: Juri Lelli <[email protected]>
> >
> > Looks like a Tag you just invented ? :)
>
> Yeah.
>
> The LGTM from Juri can be converned into an ACK silently IMO. That
> said I have added Looks-good-to: tags to a couple of commits. :-)
Cool, I'll convert them to Acks :-)
[..]
> >> v1 -> v2: Minor style related changes.
> >>
> >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
> >> 1 file changed, 26 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> >> index e13df951aca7..5c482ec38610 100644
> >> --- a/kernel/sched/cpufreq_schedutil.c
> >> +++ b/kernel/sched/cpufreq_schedutil.c
> >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> >> return false;
> >>
> >> - if (sg_policy->work_in_progress)
> >> - return false;
> >> -
> >> if (unlikely(sg_policy->need_freq_update)) {
> >> sg_policy->need_freq_update = false;
> >> /*
> >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >>
> >> policy->cur = next_freq;
> >> trace_cpu_frequency(next_freq, smp_processor_id());
> >> - } else {
> >> + } else if (!sg_policy->work_in_progress) {
> >> sg_policy->work_in_progress = true;
> >> irq_work_queue(&sg_policy->irq_work);
> >> }
> >> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >>
> >> ignore_dl_rate_limit(sg_cpu, sg_policy);
> >>
> >> + /*
> >> + * For slow-switch systems, single policy requests can't run at the
> >> + * moment if update is in progress, unless we acquire update_lock.
> >> + */
> >> + if (sg_policy->work_in_progress)
> >> + return;
> >> +
> >
> > I would still want this to go away :)
> >
> > @Rafael, will it be fine to get locking in place for unshared policy
> > platforms ?
>
> As long as it doesn't affect the fast switch path in any way.
I just realized that for a single-policy system that does slow switching
through the governor kthread, there will be 1 thread per CPU.
sugov_update_single will be called on the same CPU with interrupts
disabled. In sugov_work, we are doing a raw_spin_lock_irqsave, which also
disables interrupts. So I don't think there's any possibility of a race
happening on the same CPU between the frequency update request and
sugov_work executing. In other words, I feel we can drop the above if (..)
statement for single policies completely and only keep the changes for the
shared policy. Viresh, since you brought up the single-policy issue
initially, which made me add this if statement, could you let me know if
you agree with what I just said?
> >> if (!sugov_should_update_freq(sg_policy, time))
> >> return;
> >>
> >> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >> static void sugov_work(struct kthread_work *work)
> >> {
> >> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> >> + unsigned int freq;
> >> + unsigned long flags;
> >> +
> >> + /*
> >> + * Hold sg_policy->update_lock shortly to handle the case where:
> >> + * incase sg_policy->next_freq is read here, and then updated by
> >> + * sugov_update_shared just before work_in_progress is set to false
> >> + * here, we may miss queueing the new update.
> >> + *
> >> + * Note: If a work was queued after the update_lock is released,
> >> + * sugov_work will just be called again by kthread_work code; and the
> >> + * request will be proceed before the sugov thread sleeps.
> >> + */
> >> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> >> + freq = sg_policy->next_freq;
> >> + sg_policy->work_in_progress = false;
> >> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >>
> >> mutex_lock(&sg_policy->work_lock);
> >> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> >> - CPUFREQ_RELATION_L);
> >> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >> mutex_unlock(&sg_policy->work_lock);
> >> -
> >> - sg_policy->work_in_progress = false;
> >> }
> >>
> >> static void sugov_irq_work(struct irq_work *irq_work)
> >
> > Fix the commit log and you can add my
>
> I can fix it up.
>
> > Acked-by: Viresh Kumar <[email protected]>
Thanks!
- Joel
On 21-May 08:49, Joel Fernandes wrote:
> On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> > On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > > From: "Joel Fernandes (Google)" <[email protected]>
> > >
> > > Currently there is a chance of a schedutil cpufreq update request to be
> > > dropped if there is a pending update request. This pending request can
> > > be delayed if there is a scheduling delay of the irq_work and the wake
> > > up of the schedutil governor kthread.
> > >
> > > A very bad scenario is when a schedutil request was already just made,
> > > such as to reduce the CPU frequency, then a newer request to increase
> > > CPU frequency (even sched deadline urgent frequency increase requests)
> > > can be dropped, even though the rate limits suggest that its Ok to
> > > process a request. This is because of the way the work_in_progress flag
> > > is used.
> > >
> > > This patch improves the situation by allowing new requests to happen
> > > even though the old one is still being processed. Note that in this
> > > approach, if an irq_work was already issued, we just update next_freq
> > > and don't bother to queue another request so there's no extra work being
> > > done to make this happen.
> >
> > Maybe I'm missing something but... is not this patch just a partial
> > mitigation of the issue you descrive above?
> >
> > If a DL freq increase is queued, with this patch we store the request
> > but we don't actually increase the frequency until the next schedutil
> > update, which can be one tick away... isn't it?
> >
> > If that's the case, maybe something like the following can complete
> > the cure?
>
> We already discussed this and thought of this case, I think you missed a
> previous thread [1]. The outer loop in the kthread_work subsystem will take
> care of calling sugov_work again incase another request was queued which we
> happen to miss.
Ok, I missed that thread... my bad.
However, [1] made me notice that your solution works under the
assumption that we keep queuing a new kworker job for each request we
get, doesn't it?
If that's the case, this means that if, for example, during a
frequency switch you get a request to reduce the frequency (e.g.
deadline task passing the 0-lag time) and right after a request to
increase the frequency (e.g. the current FAIR task tick)... you will
enqueue a freq drop followed by a freq increase and actually do two
frequency hops?
> So I don't think more complexity is needed to handle the case
> you're bringing up.
>
> thanks!
>
> - Joel
>
> [1] https://lkml.org/lkml/2018/5/17/668
>
--
#include <best/regards.h>
Patrick Bellasi
Hi Patrick,
On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote:
> On 21-May 08:49, Joel Fernandes wrote:
> > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> > > On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > > > From: "Joel Fernandes (Google)" <[email protected]>
> > > >
> > > > Currently there is a chance of a schedutil cpufreq update request to be
> > > > dropped if there is a pending update request. This pending request can
> > > > be delayed if there is a scheduling delay of the irq_work and the wake
> > > > up of the schedutil governor kthread.
> > > >
> > > > A very bad scenario is when a schedutil request was already just made,
> > > > such as to reduce the CPU frequency, then a newer request to increase
> > > > CPU frequency (even sched deadline urgent frequency increase requests)
> > > > can be dropped, even though the rate limits suggest that its Ok to
> > > > process a request. This is because of the way the work_in_progress flag
> > > > is used.
> > > >
> > > > This patch improves the situation by allowing new requests to happen
> > > > even though the old one is still being processed. Note that in this
> > > > approach, if an irq_work was already issued, we just update next_freq
> > > > and don't bother to queue another request so there's no extra work being
> > > > done to make this happen.
> > >
> > > Maybe I'm missing something but... is not this patch just a partial
> > > mitigation of the issue you descrive above?
> > >
> > > If a DL freq increase is queued, with this patch we store the request
> > > but we don't actually increase the frequency until the next schedutil
> > > update, which can be one tick away... isn't it?
> > >
> > > If that's the case, maybe something like the following can complete
> > > the cure?
> >
> > We already discussed this and thought of this case, I think you missed a
> > previous thread [1]. The outer loop in the kthread_work subsystem will take
> > care of calling sugov_work again incase another request was queued which we
> > happen to miss.
>
> Ok, I missed that thread... my bad.
Sure, no problem. Sorry, I was just pointing out the thread, not blaming
you for not reading it ;)
> However, [1] made me noticing that your solution works under the
> assumption that we keep queuing a new kworker job for each request we
> get, isn't it?
Not at each request, but at each request after work_in_progress has been
cleared by sugov_work. Any requests that happen between when
work_in_progress is set and when it is cleared only result in updating
next_freq.
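
Concretely, with this patch applied the commit path ends up looking roughly
like below (reconstructed from the hunks in the patch, so consider it a
sketch of the result rather than the authoritative code):

	static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
					unsigned int next_freq)
	{
		struct cpufreq_policy *policy = sg_policy->policy;

		if (sg_policy->next_freq == next_freq)
			return;

		/* Always record the most recent request. */
		sg_policy->next_freq = next_freq;
		sg_policy->last_freq_update_time = time;

		if (policy->fast_switch_enabled) {
			next_freq = cpufreq_driver_fast_switch(policy, next_freq);
			if (!next_freq)
				return;

			policy->cur = next_freq;
			trace_cpu_frequency(next_freq, smp_processor_id());
		} else if (!sg_policy->work_in_progress) {
			/*
			 * Only queue the kthread work when none is in flight;
			 * an in-flight sugov_work picks up the updated
			 * next_freq before the kthread sleeps.
			 */
			sg_policy->work_in_progress = true;
			irq_work_queue(&sg_policy->irq_work);
		}
	}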
> If that's the case, this means that if, for example, during a
> frequency switch you get a request to reduce the frequency (e.g.
> deadline task passing the 0-lag time) and right after a request to
> increase the frequency (e.g. the current FAIR task tick)... you will
> enqueue a freq drop followed by a freq increase and actually do two
> frequency hops?
Yes, possibly. I see your point, but I'm not sure the tight loop around
that is worth the complexity, or at least whether it is within the scope of
my patch. Perhaps the problem you describe can be looked at as a future
enhancement?
thanks,
- Joel
On 21-May 10:20, Joel Fernandes wrote:
> Hi Patrick,
>
> On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote:
> > On 21-May 08:49, Joel Fernandes wrote:
> > > On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> > > > On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > > > > From: "Joel Fernandes (Google)" <[email protected]>
> > > > >
> > > > > Currently there is a chance of a schedutil cpufreq update request to be
> > > > > dropped if there is a pending update request. This pending request can
> > > > > be delayed if there is a scheduling delay of the irq_work and the wake
> > > > > up of the schedutil governor kthread.
> > > > >
> > > > > A very bad scenario is when a schedutil request was already just made,
> > > > > such as to reduce the CPU frequency, then a newer request to increase
> > > > > CPU frequency (even sched deadline urgent frequency increase requests)
> > > > > can be dropped, even though the rate limits suggest that its Ok to
> > > > > process a request. This is because of the way the work_in_progress flag
> > > > > is used.
> > > > >
> > > > > This patch improves the situation by allowing new requests to happen
> > > > > even though the old one is still being processed. Note that in this
> > > > > approach, if an irq_work was already issued, we just update next_freq
> > > > > and don't bother to queue another request so there's no extra work being
> > > > > done to make this happen.
> > > >
> > > > Maybe I'm missing something but... is not this patch just a partial
> > > > mitigation of the issue you descrive above?
> > > >
> > > > If a DL freq increase is queued, with this patch we store the request
> > > > but we don't actually increase the frequency until the next schedutil
> > > > update, which can be one tick away... isn't it?
> > > >
> > > > If that's the case, maybe something like the following can complete
> > > > the cure?
> > >
> > > We already discussed this and thought of this case, I think you missed a
> > > previous thread [1]. The outer loop in the kthread_work subsystem will take
> > > care of calling sugov_work again incase another request was queued which we
> > > happen to miss.
> >
> > Ok, I missed that thread... my bad.
>
> Sure no problem, sorry I was just pointing out the thread, not blaming you
> for not reading it ;)
Sure, np here too ;)
> > However, [1] made me noticing that your solution works under the
> > assumption that we keep queuing a new kworker job for each request we
> > get, isn't it?
>
> Not at each request, but each request after work_in_progress was cleared by the
> sugov_work. Any requests that happen between work_in_progress is set and
> cleared only result in updating of the next_freq.
I see, so we enqueue for the duration of:

	mutex_lock(&sg_policy->work_lock);
	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
	mutex_unlock(&sg_policy->work_lock);
> > If that's the case, this means that if, for example, during a
> > frequency switch you get a request to reduce the frequency (e.g.
> > deadline task passing the 0-lag time) and right after a request to
> > increase the frequency (e.g. the current FAIR task tick)... you will
> > enqueue a freq drop followed by a freq increase and actually do two
> > frequency hops?
>
> Yes possibly,
Not sure about the time window above, I can try to get some
measurements tomorrow.
> I see your point but I'm not sure if the tight loop around that
> is worth the complexity, or atleast is within the scope of my patch.
> Perhaps the problem you describe can be looked at as a future enhancement?
Sure, I already have it as a patch on top of yours. I can post it
afterwards and we can discuss whether it makes sense or not.

I still have to check more carefully, but I think that maybe we can skip
the queueing altogether if some work is already pending... in case we want
to go for a dedicated inner loop like the one I was proposing.

Apart from that, I think your patch already fixes 90% of the issue we have
now.
> thanks,
>
> - Joel
--
#include <best/regards.h>
Patrick Bellasi
On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > From: "Joel Fernandes (Google)" <[email protected]>
> >
> > Currently there is a chance of a schedutil cpufreq update request to be
> > dropped if there is a pending update request. This pending request can
> > be delayed if there is a scheduling delay of the irq_work and the wake
> > up of the schedutil governor kthread.
> >
> > A very bad scenario is when a schedutil request was already just made,
> > such as to reduce the CPU frequency, then a newer request to increase
> > CPU frequency (even sched deadline urgent frequency increase requests)
> > can be dropped, even though the rate limits suggest that its Ok to
> > process a request. This is because of the way the work_in_progress flag
> > is used.
> >
> > This patch improves the situation by allowing new requests to happen
> > even though the old one is still being processed. Note that in this
> > approach, if an irq_work was already issued, we just update next_freq
> > and don't bother to queue another request so there's no extra work being
> > done to make this happen.
>
> Maybe I'm missing something but... is not this patch just a partial
> mitigation of the issue you descrive above?
>
> If a DL freq increase is queued, with this patch we store the request
> but we don't actually increase the frequency until the next schedutil
> update, which can be one tick away... isn't it?
>
> If that's the case, maybe something like the following can complete
> the cure?
>
> ---8<---
> #define SUGOV_FREQ_NONE 0
>
> static unsigned int sugov_work_update(struct sugov_policy *sg_policy,
> unsigned int prev_freq)
> {
> unsigned long irq_flags;
> bool update_freq = true;
> unsigned int next_freq;
>
> /*
> * Hold sg_policy->update_lock shortly to handle the case where:
> * incase sg_policy->next_freq is read here, and then updated by
> * sugov_update_shared just before work_in_progress is set to false
> * here, we may miss queueing the new update.
> *
> * Note: If a work was queued after the update_lock is released,
> * sugov_work will just be called again by kthread_work code; and the
> * request will be proceed before the sugov thread sleeps.
> */
> raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags);
> next_freq = sg_policy->next_freq;
> sg_policy->work_in_progress = false;
> if (prev_freq == next_freq)
> update_freq = false;
About this patch on top of mine, I believe this check is already being done
by sugov_update_commit? :
static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
				unsigned int next_freq)
{
	struct cpufreq_policy *policy = sg_policy->policy;

	if (sg_policy->next_freq == next_freq)
		return;

	sg_policy->next_freq = next_freq;
	sg_policy->last_freq_update_time = time;
----
thanks,
- Joel
On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <[email protected]> wrote:
> On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
>> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
>> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> >> From: "Joel Fernandes (Google)" <[email protected]>
>> >>
>> >> Currently there is a chance of a schedutil cpufreq update request to be
>> >> dropped if there is a pending update request. This pending request can
>> >> be delayed if there is a scheduling delay of the irq_work and the wake
>> >> up of the schedutil governor kthread.
>> >>
>> >> A very bad scenario is when a schedutil request was already just made,
>> >> such as to reduce the CPU frequency, then a newer request to increase
>> >> CPU frequency (even sched deadline urgent frequency increase requests)
>> >> can be dropped, even though the rate limits suggest that its Ok to
>> >> process a request. This is because of the way the work_in_progress flag
>> >> is used.
>> >>
>> >> This patch improves the situation by allowing new requests to happen
>> >> even though the old one is still being processed. Note that in this
>> >> approach, if an irq_work was already issued, we just update next_freq
>> >> and don't bother to queue another request so there's no extra work being
>> >> done to make this happen.
>> >
>> > Now that this isn't an RFC anymore, you shouldn't have added below
>> > paragraph here. It could go to the comments section though.
>> >
>> >> I had brought up this issue at the OSPM conference and Claudio had a
>> >> discussion RFC with an alternate approach [1]. I prefer the approach as
>> >> done in the patch below since it doesn't need any new flags and doesn't
>> >> cause any other extra overhead.
>> >>
>> >> [1] https://patchwork.kernel.org/patch/10384261/
>> >>
>> >> LGTMed-by: Viresh Kumar <[email protected]>
>> >> LGTMed-by: Juri Lelli <[email protected]>
>> >
>> > Looks like a Tag you just invented ? :)
>>
>> Yeah.
>>
>> The LGTM from Juri can be converned into an ACK silently IMO. That
>> said I have added Looks-good-to: tags to a couple of commits. :-)
>
> Cool, I'll covert them to Acks :-)
So it looks like I should expect an update of this patch, right?
Or do you prefer the current one to be applied and work on top of it?
> [..]
>> >> v1 -> v2: Minor style related changes.
>> >>
>> >> kernel/sched/cpufreq_schedutil.c | 34 ++++++++++++++++++++++++--------
>> >> 1 file changed, 26 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> >> index e13df951aca7..5c482ec38610 100644
>> >> --- a/kernel/sched/cpufreq_schedutil.c
>> >> +++ b/kernel/sched/cpufreq_schedutil.c
>> >> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> >> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> >> return false;
>> >>
>> >> - if (sg_policy->work_in_progress)
>> >> - return false;
>> >> -
>> >> if (unlikely(sg_policy->need_freq_update)) {
>> >> sg_policy->need_freq_update = false;
>> >> /*
>> >> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> >>
>> >> policy->cur = next_freq;
>> >> trace_cpu_frequency(next_freq, smp_processor_id());
>> >> - } else {
>> >> + } else if (!sg_policy->work_in_progress) {
>> >> sg_policy->work_in_progress = true;
>> >> irq_work_queue(&sg_policy->irq_work);
>> >> }
>> >> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>> >>
>> >> ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >>
>> >> + /*
>> >> + * For slow-switch systems, single policy requests can't run at the
>> >> + * moment if update is in progress, unless we acquire update_lock.
>> >> + */
>> >> + if (sg_policy->work_in_progress)
>> >> + return;
>> >> +
>> >
>> > I would still want this to go away :)
>> >
>> > @Rafael, will it be fine to get locking in place for unshared policy
>> > platforms ?
>>
>> As long as it doesn't affect the fast switch path in any way.
>
> I just realized that on a single policy switch that uses the governor thread,
> there will be 1 thread per-CPU. The sugov_update_single will be called on the
> same CPU with interrupts disabled.
sugov_update_single() doesn't have to run on the target CPU.
> In sugov_work, we are doing a
> raw_spin_lock_irqsave which also disables interrupts. So I don't think
> there's any possibility of a race happening on the same CPU between the
> frequency update request and the sugov_work executing. In other words, I feel
> we can drop the above if (..) statement for single policies completely and
> only keep the changes for the shared policy. Viresh since you brought up the
> single policy issue initially which made me add this if statememnt, could you
> let me know if you agree with what I just said?
Which is why you need the spinlock too.
On 21-05-18, 10:20, Joel Fernandes wrote:
> On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote:
> > If that's the case, this means that if, for example, during a
> > frequency switch you get a request to reduce the frequency (e.g.
> > deadline task passing the 0-lag time) and right after a request to
> > increase the frequency (e.g. the current FAIR task tick)... you will
> > enqueue a freq drop followed by a freq increase and actually do two
> > frequency hops?
I don't think so.
Consider the kthread as currently running and having just cleared the
work_in_progress flag. A sched update comes at that time and we decide to
reduce the frequency, so we queue another work item and update next_freq.

Now if another sched update comes before the kthread finishes its previous
loop, we simply update next_freq and return. So the next time the kthread
runs, it will pick up the most recent update.

Where is the problem that both of you see?
--
viresh
On 21-May 11:05, Joel Fernandes wrote:
> On Mon, May 21, 2018 at 11:50:55AM +0100, Patrick Bellasi wrote:
> > On 18-May 11:55, Joel Fernandes (Google.) wrote:
> > > From: "Joel Fernandes (Google)" <[email protected]>
> > >
> > > Currently there is a chance of a schedutil cpufreq update request to be
> > > dropped if there is a pending update request. This pending request can
> > > be delayed if there is a scheduling delay of the irq_work and the wake
> > > up of the schedutil governor kthread.
> > >
> > > A very bad scenario is when a schedutil request was already just made,
> > > such as to reduce the CPU frequency, then a newer request to increase
> > > CPU frequency (even sched deadline urgent frequency increase requests)
> > > can be dropped, even though the rate limits suggest that its Ok to
> > > process a request. This is because of the way the work_in_progress flag
> > > is used.
> > >
> > > This patch improves the situation by allowing new requests to happen
> > > even though the old one is still being processed. Note that in this
> > > approach, if an irq_work was already issued, we just update next_freq
> > > and don't bother to queue another request so there's no extra work being
> > > done to make this happen.
> >
> > Maybe I'm missing something but... is not this patch just a partial
> > mitigation of the issue you descrive above?
> >
> > If a DL freq increase is queued, with this patch we store the request
> > but we don't actually increase the frequency until the next schedutil
> > update, which can be one tick away... isn't it?
> >
> > If that's the case, maybe something like the following can complete
> > the cure?
> >
> > ---8<---
> > #define SUGOV_FREQ_NONE 0
> >
> > static unsigned int sugov_work_update(struct sugov_policy *sg_policy,
> > unsigned int prev_freq)
> > {
> > unsigned long irq_flags;
> > bool update_freq = true;
> > unsigned int next_freq;
> >
> > /*
> > * Hold sg_policy->update_lock shortly to handle the case where:
> > * incase sg_policy->next_freq is read here, and then updated by
> > * sugov_update_shared just before work_in_progress is set to false
> > * here, we may miss queueing the new update.
> > *
> > * Note: If a work was queued after the update_lock is released,
> > * sugov_work will just be called again by kthread_work code; and the
> > * request will be proceed before the sugov thread sleeps.
> > */
> > raw_spin_lock_irqsave(&sg_policy->update_lock, irq_flags);
> > next_freq = sg_policy->next_freq;
> > sg_policy->work_in_progress = false;
> > if (prev_freq == next_freq)
> > update_freq = false;
>
> Regarding this patch on top of mine, I believe this check is already being
> done by sugov_update_commit():
No, that check is different...
>
> static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> if (sg_policy->next_freq == next_freq)
> return;
>
> sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
> ----
... in my snippet the check is required to verify whether, once a freq
switch has been completed by the kthread, sugov_update_commit has
actually committed a new and different frequency with respect to the one
the kthread has just configured.
It means we will have two async paths:
1. sugov_update_commit()
which updates sg_policy->next_freq
2. sugov_work_update()
which will run in a loop until the last freq it configures matches
with the current value of sg_policy->next_freq
But again, as we were discussing yesterday, we can have these
additional bits in a follow-up patch on top of yours.
--
#include <best/regards.h>
Patrick Bellasi
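For reference, a rough standalone sketch of the loop Patrick outlines, where
the work side keeps re-reading next_freq until the value it last configured
matches the most recent request. This is a userspace model with a pthread
mutex standing in for update_lock, and the helper names are invented for
illustration only:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define SUGOV_FREQ_NONE 0U      /* "nothing configured yet", as in the snippet above */

static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int next_freq;
static bool work_in_progress;

/* Stand-in for __cpufreq_driver_target(). */
static void driver_target(unsigned int freq)
{
        printf("switching to %u\n", freq);
}

/*
 * Loop until the frequency we last configured matches the most recent
 * request; any request committed while we were switching is picked up
 * by the next iteration instead of waiting for the next schedutil update.
 */
static void work_update(void)
{
        unsigned int prev_freq = SUGOV_FREQ_NONE;

        for (;;) {
                unsigned int freq;

                pthread_mutex_lock(&update_lock);
                freq = next_freq;
                work_in_progress = false;
                pthread_mutex_unlock(&update_lock);

                if (freq == prev_freq)
                        break;                  /* nothing new was committed meanwhile */

                driver_target(freq);            /* the slow part, done unlocked */
                prev_freq = freq;
        }
}

int main(void)
{
        pthread_mutex_lock(&update_lock);
        next_freq = 1000000;
        work_in_progress = true;
        pthread_mutex_unlock(&update_lock);

        work_update();
        return 0;
}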
Okay, Rafael and I were discussing this patch, and the locking and races around it.
On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e13df951aca7..5c482ec38610 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> return false;
>
> - if (sg_policy->work_in_progress)
> - return false;
> -
> if (unlikely(sg_policy->need_freq_update)) {
> sg_policy->need_freq_update = false;
> /*
> @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>
> policy->cur = next_freq;
> trace_cpu_frequency(next_freq, smp_processor_id());
> - } else {
> + } else if (!sg_policy->work_in_progress) {
> sg_policy->work_in_progress = true;
> irq_work_queue(&sg_policy->irq_work);
> }
> @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + /*
> + * For slow-switch systems, single policy requests can't run at the
> + * moment if update is in progress, unless we acquire update_lock.
> + */
> + if (sg_policy->work_in_progress)
> + return;
> +
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> static void sugov_work(struct kthread_work *work)
> {
> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> + unsigned int freq;
> + unsigned long flags;
> +
> + /*
> + * Hold sg_policy->update_lock shortly to handle the case where:
> + * incase sg_policy->next_freq is read here, and then updated by
> + * sugov_update_shared just before work_in_progress is set to false
> + * here, we may miss queueing the new update.
> + *
> + * Note: If a work was queued after the update_lock is released,
> + * sugov_work will just be called again by kthread_work code; and the
> + * request will be proceed before the sugov thread sleeps.
> + */
> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> + freq = sg_policy->next_freq;
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> mutex_lock(&sg_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> - CPUFREQ_RELATION_L);
> + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> mutex_unlock(&sg_policy->work_lock);
> -
> - sg_policy->work_in_progress = false;
> }
And I do see a race here for single policy systems doing slow switching.
Kthread                                         Sched update

sugov_work()                                    sugov_update_single()

        lock();
        // The CPU is free to rearrange below
        // two in any order, so it may clear
        // the flag first and then read next
        // freq. Lets assume it does.
        work_in_progress = false

                                                if (work_in_progress)
                                                        return;

                                                sg_policy->next_freq = 0;
        freq = sg_policy->next_freq;
                                                sg_policy->next_freq = real-next-freq;
        unlock();
Is the above theory right or am I day dreaming ? :)
--
viresh
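To make the suspected reordering concrete, here is a small C11-atomics model
of the two sides (illustrative only; the real code uses plain C loads and
stores, which is exactly why the CPU is allowed to reorder them, and the
relaxed atomics below are just a way to express that in standard, compilable C):

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned int next_freq;
static atomic_bool work_in_progress;

/*
 * Model of the kthread side. With relaxed ordering, nothing stops the
 * store to work_in_progress from becoming visible before the load of
 * next_freq has completed, which is the window described above: the
 * sched-update side sees work_in_progress == false, starts writing a
 * new next_freq, and the kthread ends up applying a stale value.
 */
static unsigned int kthread_side(void)
{
        unsigned int freq;

        freq = atomic_load_explicit(&next_freq, memory_order_relaxed);
        atomic_store_explicit(&work_in_progress, false, memory_order_relaxed);
        return freq;
}

/* Model of the sugov_update_single() side with the early bail-out. */
static void sched_update_side(unsigned int new_freq)
{
        if (atomic_load_explicit(&work_in_progress, memory_order_relaxed))
                return;
        atomic_store_explicit(&next_freq, new_freq, memory_order_relaxed);
        atomic_store_explicit(&work_in_progress, true, memory_order_relaxed);
}

int main(void)
{
        /* Trivial single-threaded driver so the model builds and runs. */
        sched_update_side(1200000);
        return kthread_side() == 1200000 ? 0 : 1;
}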
Hi Viresh,
thanks for clarifying...
On 22-May 15:53, Viresh Kumar wrote:
> On 21-05-18, 10:20, Joel Fernandes wrote:
> > On Mon, May 21, 2018 at 06:00:50PM +0100, Patrick Bellasi wrote:
> > > If that's the case, this means that if, for example, during a
> > > frequency switch you get a request to reduce the frequency (e.g.
> > > deadline task passing the 0-lag time) and right after a request to
> > > increase the frequency (e.g. the current FAIR task tick)... you will
> > > enqueue a freq drop followed by a freq increase and actually do two
> > > frequency hops?
>
> I don't think so.
>
> Consider the kthread as running currently and has just cleared the
> work_in_progress flag. Sched update comes at that time and we decide
> to reduce the frequency, we queue another work and update next_freq.
> Now if another sched update comes before the kthread finishes its
> previous loop, we will simply update next_freq and return. So when the
> next time kthread runs, it will pick the most recent update.
Mmm... right... looking more closely at the two execution contexts:
// A) Frequency update requests
sugov_update_commit() {
        sg_policy->next_freq = next_freq;
        if (!sg_policy->work_in_progress) {
                sg_policy->work_in_progress = true;
                irq_work_queue(&sg_policy->irq_work);
        }
}

// B) Actual frequency updates
sugov_work() {
        freq = sg_policy->next_freq;
        sg_policy->work_in_progress = false;
        __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
}
It's true that A will enqueue only one B, at the first next_freq update,
and after that it will just keep updating next_freq.
Thus, we are ensured to always have at most one kwork pending in the
queue.
> Where is the problem both of you see ?
Perhaps the confusion comes just from the naming of
"work_in_progress", which is confusing since we now use it to mean
that we have enqueued a frequency change and are waiting for the
kwork to pick it up.
Maybe it would help to rename it to something like kwork_queued,
update_pending, update_queued... ?
--
#include <best/regards.h>
Patrick Bellasi
On Tuesday, May 22, 2018 12:50:06 PM CEST Viresh Kumar wrote:
> On 22-05-18, 16:04, Viresh Kumar wrote:
> > Okay, me and Rafael were discussing this patch, locking and races around this.
> >
> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index e13df951aca7..5c482ec38610 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > > !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> > > return false;
> > >
> > > - if (sg_policy->work_in_progress)
> > > - return false;
> > > -
> > > if (unlikely(sg_policy->need_freq_update)) {
> > > sg_policy->need_freq_update = false;
> > > /*
> > > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> > >
> > > policy->cur = next_freq;
> > > trace_cpu_frequency(next_freq, smp_processor_id());
> > > - } else {
> > > + } else if (!sg_policy->work_in_progress) {
> > > sg_policy->work_in_progress = true;
> > > irq_work_queue(&sg_policy->irq_work);
> > > }
> > > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > >
> > > ignore_dl_rate_limit(sg_cpu, sg_policy);
> > >
> > > + /*
> > > + * For slow-switch systems, single policy requests can't run at the
> > > + * moment if update is in progress, unless we acquire update_lock.
> > > + */
> > > + if (sg_policy->work_in_progress)
> > > + return;
> > > +
> > > if (!sugov_should_update_freq(sg_policy, time))
> > > return;
> > >
> > > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> > > static void sugov_work(struct kthread_work *work)
> > > {
> > > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > > + unsigned int freq;
> > > + unsigned long flags;
> > > +
> > > + /*
> > > + * Hold sg_policy->update_lock shortly to handle the case where:
> > > + * incase sg_policy->next_freq is read here, and then updated by
> > > + * sugov_update_shared just before work_in_progress is set to false
> > > + * here, we may miss queueing the new update.
> > > + *
> > > + * Note: If a work was queued after the update_lock is released,
> > > + * sugov_work will just be called again by kthread_work code; and the
> > > + * request will be proceed before the sugov thread sleeps.
> > > + */
> > > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > > + freq = sg_policy->next_freq;
> > > + sg_policy->work_in_progress = false;
> > > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> > >
> > > mutex_lock(&sg_policy->work_lock);
> > > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > > - CPUFREQ_RELATION_L);
> > > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> > > mutex_unlock(&sg_policy->work_lock);
> > > -
> > > - sg_policy->work_in_progress = false;
> > > }
> >
> > And I do see a race here for single policy systems doing slow switching.
> >
> > Kthread Sched update
> >
> > sugov_work() sugov_update_single()
> >
> > lock();
> > // The CPU is free to rearrange below
> > // two in any order, so it may clear
> > // the flag first and then read next
> > // freq. Lets assume it does.
> > work_in_progress = false
> >
> > if (work_in_progress)
> > return;
> >
> > sg_policy->next_freq = 0;
> > freq = sg_policy->next_freq;
> > sg_policy->next_freq = real-next-freq;
> > unlock();
> >
> >
> >
> > Is the above theory right or am I day dreaming ? :)
>
> And here comes the ugly fix:
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 35826f4ec43c..1665da31862e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -283,6 +283,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + if (!policy->fast_switch_enabled)
> + raw_spin_lock(&sg_policy->update_lock);
> +
> /*
> * For slow-switch systems, single policy requests can't run at the
> * moment if update is in progress, unless we acquire update_lock.
> @@ -312,6 +315,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> }
>
> sugov_update_commit(sg_policy, time, next_f);
> +
> + if (!policy->fast_switch_enabled)
> + raw_spin_unlock(&sg_policy->update_lock);
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>
>
>
Ugly indeed.
On 22-05-18, 16:04, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> > return false;
> >
> > - if (sg_policy->work_in_progress)
> > - return false;
> > -
> > if (unlikely(sg_policy->need_freq_update)) {
> > sg_policy->need_freq_update = false;
> > /*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >
> > policy->cur = next_freq;
> > trace_cpu_frequency(next_freq, smp_processor_id());
> > - } else {
> > + } else if (!sg_policy->work_in_progress) {
> > sg_policy->work_in_progress = true;
> > irq_work_queue(&sg_policy->irq_work);
> > }
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >
> > ignore_dl_rate_limit(sg_cpu, sg_policy);
> >
> > + /*
> > + * For slow-switch systems, single policy requests can't run at the
> > + * moment if update is in progress, unless we acquire update_lock.
> > + */
> > + if (sg_policy->work_in_progress)
> > + return;
> > +
> > if (!sugov_should_update_freq(sg_policy, time))
> > return;
> >
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> > static void sugov_work(struct kthread_work *work)
> > {
> > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > + unsigned int freq;
> > + unsigned long flags;
> > +
> > + /*
> > + * Hold sg_policy->update_lock shortly to handle the case where:
> > + * incase sg_policy->next_freq is read here, and then updated by
> > + * sugov_update_shared just before work_in_progress is set to false
> > + * here, we may miss queueing the new update.
> > + *
> > + * Note: If a work was queued after the update_lock is released,
> > + * sugov_work will just be called again by kthread_work code; and the
> > + * request will be proceed before the sugov thread sleeps.
> > + */
> > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > + freq = sg_policy->next_freq;
> > + sg_policy->work_in_progress = false;
> > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >
> > mutex_lock(&sg_policy->work_lock);
> > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > - CPUFREQ_RELATION_L);
> > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> > mutex_unlock(&sg_policy->work_lock);
> > -
> > - sg_policy->work_in_progress = false;
> > }
>
> And I do see a race here for single policy systems doing slow switching.
>
> Kthread Sched update
>
> sugov_work() sugov_update_single()
>
> lock();
> // The CPU is free to rearrange below
> // two in any order, so it may clear
> // the flag first and then read next
> // freq. Lets assume it does.
> work_in_progress = false
>
> if (work_in_progress)
> return;
>
> sg_policy->next_freq = 0;
> freq = sg_policy->next_freq;
> sg_policy->next_freq = real-next-freq;
> unlock();
>
>
>
> Is the above theory right or am I day dreaming ? :)
And here comes the ugly fix:
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 35826f4ec43c..1665da31862e 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -283,6 +283,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
ignore_dl_rate_limit(sg_cpu, sg_policy);
+ if (!policy->fast_switch_enabled)
+ raw_spin_lock(&sg_policy->update_lock);
+
/*
* For slow-switch systems, single policy requests can't run at the
* moment if update is in progress, unless we acquire update_lock.
@@ -312,6 +315,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
}
sugov_update_commit(sg_policy, time, next_f);
+
+ if (!policy->fast_switch_enabled)
+ raw_spin_unlock(&sg_policy->update_lock);
}
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
--
viresh
On 22-May 16:04, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> > static void sugov_work(struct kthread_work *work)
> > {
> > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > + unsigned int freq;
> > + unsigned long flags;
> > +
> > + /*
> > + * Hold sg_policy->update_lock shortly to handle the case where:
> > + * incase sg_policy->next_freq is read here, and then updated by
> > + * sugov_update_shared just before work_in_progress is set to false
> > + * here, we may miss queueing the new update.
> > + *
> > + * Note: If a work was queued after the update_lock is released,
> > + * sugov_work will just be called again by kthread_work code; and the
> > + * request will be proceed before the sugov thread sleeps.
> > + */
> > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > + freq = sg_policy->next_freq;
> > + sg_policy->work_in_progress = false;
> > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >
> > mutex_lock(&sg_policy->work_lock);
> > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > - CPUFREQ_RELATION_L);
> > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> > mutex_unlock(&sg_policy->work_lock);
> > -
> > - sg_policy->work_in_progress = false;
> > }
>
> And I do see a race here for single policy systems doing slow switching.
>
> Kthread Sched update
>
> sugov_work() sugov_update_single()
>
> lock();
> // The CPU is free to rearrange below
> // two in any order, so it may clear
> // the flag first and then read next
> // freq. Lets assume it does.
> work_in_progress = false
>
> if (work_in_progress)
> return;
>
> sg_policy->next_freq = 0;
> freq = sg_policy->next_freq;
> sg_policy->next_freq = real-next-freq;
> unlock();
>
>
>
> Is the above theory right or am I day dreaming ? :)
It could happen, but using:
        raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
        freq = READ_ONCE(sg_policy->next_freq);
        WRITE_ONCE(sg_policy->work_in_progress, false);
        raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);

        if (!READ_ONCE(sg_policy->work_in_progress)) {
                WRITE_ONCE(sg_policy->work_in_progress, true);
                irq_work_queue(&sg_policy->irq_work);
        }
should fix it by enforcing the ordering as well as documenting the
concurrent access.
However, on the "sched update" side, where do we have the sequence:
        sg_policy->next_freq = 0;
        sg_policy->next_freq = real-next-freq;
AFAICS we always use locals for next_freq and do one single assignment
in sugov_update_commit(), don't we?
--
#include <best/regards.h>
Patrick Bellasi
On 22-05-18, 12:50, Rafael J. Wysocki wrote:
> Ugly indeed.
Hehe. I was thinking that maybe we can write wrapper helpers around lock/unlock
which are stored as pointers in sg_policy, so that they are only set to
non-NULL values (or non-noop routines) for slow-switching single-policy or
any-switching shared-policy systems. Then we can get rid of such conditional
locking attempts :)
--
viresh
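Something along these lines, perhaps. This is a standalone sketch of the
wrapper-helper idea only; the pthread mutex, the init arguments and the helper
names are assumptions made for illustration, not actual schedutil code:

#include <pthread.h>

struct sg_policy {
        pthread_mutex_t update_lock;
        /* Chosen once at init time; call sites become unconditional. */
        void (*lock)(struct sg_policy *sg);
        void (*unlock)(struct sg_policy *sg);
};

static void sugov_lock_noop(struct sg_policy *sg) { (void)sg; }
static void sugov_unlock_noop(struct sg_policy *sg) { (void)sg; }

static void sugov_lock_real(struct sg_policy *sg)
{
        pthread_mutex_lock(&sg->update_lock);
}

static void sugov_unlock_real(struct sg_policy *sg)
{
        pthread_mutex_unlock(&sg->update_lock);
}

/*
 * Pick the helpers once, when the governor starts: fast-switching
 * single-policy systems get the no-ops, everything else gets the real
 * lock. Call sites then just do sg->lock(sg) / sg->unlock(sg) without
 * any conditionals.
 */
static void sugov_init_locking(struct sg_policy *sg, int fast_switch, int shared)
{
        pthread_mutex_init(&sg->update_lock, NULL);
        if (fast_switch && !shared) {
                sg->lock = sugov_lock_noop;
                sg->unlock = sugov_unlock_noop;
        } else {
                sg->lock = sugov_lock_real;
                sg->unlock = sugov_unlock_real;
        }
}

int main(void)
{
        struct sg_policy sg;

        sugov_init_locking(&sg, /* fast_switch */ 0, /* shared */ 0);
        sg.lock(&sg);
        /* ... update sg_policy state ... */
        sg.unlock(&sg);
        return 0;
}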
On 22-05-18, 11:51, Patrick Bellasi wrote:
> It could happen, but using:
>
> raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> freq = READ_ONCE(sg_policy->next_freq)
> WRITE_ONCE(sg_policy->work_in_progress, false);
> raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> if (!READ_ONCE(sg_policy->work_in_progress)) {
> WRITE_ONCE(sg_policy->work_in_progress, true);
> irq_work_queue(&sg_policy->irq_work);
> }
I think it's better to get locking in place for the non-fast-switching case on
single-policy systems right now.
> should fix it by enforcing the ordering as well as documenting the
> concurrent access.
>
> However, in the "sched update" side, where do we have the sequence:
>
> sg_policy->next_freq = 0;
> sg_policy->next_freq = real-next-freq;
Ah, that was just an example of what a compiler may do (though it shouldn't).
--
viresh
On Tuesday, May 22, 2018 12:02:24 PM CEST Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <[email protected]> wrote:
> > On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
> >> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
> >> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> >> >> From: "Joel Fernandes (Google)" <[email protected]>
> >> >>
> >> >> Currently there is a chance of a schedutil cpufreq update request to be
> >> >> dropped if there is a pending update request. This pending request can
> >> >> be delayed if there is a scheduling delay of the irq_work and the wake
> >> >> up of the schedutil governor kthread.
> >> >>
> >> >> A very bad scenario is when a schedutil request was already just made,
> >> >> such as to reduce the CPU frequency, then a newer request to increase
> >> >> CPU frequency (even sched deadline urgent frequency increase requests)
> >> >> can be dropped, even though the rate limits suggest that its Ok to
> >> >> process a request. This is because of the way the work_in_progress flag
> >> >> is used.
> >> >>
> >> >> This patch improves the situation by allowing new requests to happen
> >> >> even though the old one is still being processed. Note that in this
> >> >> approach, if an irq_work was already issued, we just update next_freq
> >> >> and don't bother to queue another request so there's no extra work being
> >> >> done to make this happen.
> >> >
> >> > Now that this isn't an RFC anymore, you shouldn't have added below
> >> > paragraph here. It could go to the comments section though.
> >> >
> >> >> I had brought up this issue at the OSPM conference and Claudio had a
> >> >> discussion RFC with an alternate approach [1]. I prefer the approach as
> >> >> done in the patch below since it doesn't need any new flags and doesn't
> >> >> cause any other extra overhead.
> >> >>
> >> >> [1] https://patchwork.kernel.org/patch/10384261/
> >> >>
> >> >> LGTMed-by: Viresh Kumar <[email protected]>
> >> >> LGTMed-by: Juri Lelli <[email protected]>
> >> >
> >> > Looks like a Tag you just invented ? :)
> >>
> >> Yeah.
> >>
> >> The LGTM from Juri can be converned into an ACK silently IMO. That
> >> said I have added Looks-good-to: tags to a couple of commits. :-)
> >
> > Cool, I'll covert them to Acks :-)
>
> So it looks like I should expect an update of this patch, right?
>
> Or do you prefer the current one to be applied and work on top of it?
Well, sorry, I can't apply this one as it is racy.
On Tuesday, May 22, 2018 12:54:29 PM CEST Viresh Kumar wrote:
> On 22-05-18, 12:50, Rafael J. Wysocki wrote:
> > Ugly indeed.
>
> Hehe. I was thinking, maybe we can write wrapper helpers around lock/unlock
> which are stored as pointers in sg_policy. So that those are only set to
> non-NULL values (or non-Noop routines) for slow-switching single policy or
> any-switching shared policy systems. Then we can get rid of such conditional
> locking attempts :)
>
>
So below is my (compiled-only) version of the $subject patch, obviously based
on Joel's work.
Roughly, what it does is to move the fast_switch_enabled path entirely to
sugov_update_single() and take the spinlock around sugov_update_commit()
in the one-CPU case too.
---
kernel/sched/cpufreq_schedutil.c | 57 ++++++++++++++++++++++++++-------------
1 file changed, 38 insertions(+), 19 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(str
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
- if (sg_policy->work_in_progress)
- return false;
-
if (unlikely(sg_policy->need_freq_update))
return true;
@@ -103,25 +100,25 @@ static bool sugov_should_update_freq(str
return delta_ns >= sg_policy->freq_update_delay_ns;
}
-static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
- unsigned int next_freq)
+static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
{
- struct cpufreq_policy *policy = sg_policy->policy;
-
if (sg_policy->next_freq == next_freq)
- return;
+ return false;
sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
- if (policy->fast_switch_enabled) {
- next_freq = cpufreq_driver_fast_switch(policy, next_freq);
- if (!next_freq)
- return;
+ return true;
+}
- policy->cur = next_freq;
- trace_cpu_frequency(next_freq, smp_processor_id());
- } else {
+static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
+{
+ if (!sugov_update_next_freq(sg_policy, time, next_freq))
+ return;
+
+ if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(&sg_policy->irq_work);
}
@@ -277,6 +274,7 @@ static void sugov_update_single(struct u
{
struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+ struct cpufreq_policy *policy = sg_policy->policy;
unsigned long util, max;
unsigned int next_f;
bool busy;
@@ -307,7 +305,23 @@ static void sugov_update_single(struct u
sg_policy->cached_raw_freq = 0;
}
- sugov_update_commit(sg_policy, time, next_f);
+ if (policy->fast_switch_enabled) {
+ if (!sugov_update_next_freq(sg_policy, time, next_f))
+ return;
+
+ next_f = cpufreq_driver_fast_switch(policy, next_f);
+ if (!next_f)
+ return;
+
+ policy->cur = next_f;
+ trace_cpu_frequency(next_f, smp_processor_id());
+ } else {
+ raw_spin_lock(&sg_policy->update_lock);
+
+ sugov_update_commit(sg_policy, time, next_f);
+
+ raw_spin_unlock(&sg_policy->update_lock);
+ }
}
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
@@ -376,13 +390,18 @@ sugov_update_shared(struct update_util_d
static void sugov_work(struct kthread_work *work)
{
struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
+ unsigned int next_freq;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+ next_freq = sg_policy->next_freq;
+ sg_policy->work_in_progress = false;
+ raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
mutex_lock(&sg_policy->work_lock);
- __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
+ __cpufreq_driver_target(sg_policy->policy, next_freq,
CPUFREQ_RELATION_L);
mutex_unlock(&sg_policy->work_lock);
-
- sg_policy->work_in_progress = false;
}
static void sugov_irq_work(struct irq_work *irq_work)
On 22-05-18, 13:31, Rafael J. Wysocki wrote:
> So below is my (compiled-only) version of the $subject patch, obviously based
> on the Joel's work.
>
> Roughly, what it does is to move the fast_switch_enabled path entirely to
> sugov_update_single() and take the spinlock around sugov_update_commit()
> in the one-CPU case too.
>
> ---
> kernel/sched/cpufreq_schedutil.c | 57 ++++++++++++++++++++++++++-------------
> 1 file changed, 38 insertions(+), 19 deletions(-)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(str
> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> return false;
>
> - if (sg_policy->work_in_progress)
> - return false;
> -
> if (unlikely(sg_policy->need_freq_update))
> return true;
>
> @@ -103,25 +100,25 @@ static bool sugov_should_update_freq(str
> return delta_ns >= sg_policy->freq_update_delay_ns;
> }
>
> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> - unsigned int next_freq)
> +static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> + unsigned int next_freq)
> {
> - struct cpufreq_policy *policy = sg_policy->policy;
> -
> if (sg_policy->next_freq == next_freq)
> - return;
> + return false;
>
> sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
>
> - if (policy->fast_switch_enabled) {
> - next_freq = cpufreq_driver_fast_switch(policy, next_freq);
> - if (!next_freq)
> - return;
> + return true;
> +}
>
> - policy->cur = next_freq;
> - trace_cpu_frequency(next_freq, smp_processor_id());
> - } else {
> +static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> + unsigned int next_freq)
> +{
> + if (!sugov_update_next_freq(sg_policy, time, next_freq))
> + return;
> +
> + if (!sg_policy->work_in_progress) {
> sg_policy->work_in_progress = true;
> irq_work_queue(&sg_policy->irq_work);
> }
> @@ -277,6 +274,7 @@ static void sugov_update_single(struct u
> {
> struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> + struct cpufreq_policy *policy = sg_policy->policy;
> unsigned long util, max;
> unsigned int next_f;
> bool busy;
> @@ -307,7 +305,23 @@ static void sugov_update_single(struct u
> sg_policy->cached_raw_freq = 0;
> }
>
> - sugov_update_commit(sg_policy, time, next_f);
> + if (policy->fast_switch_enabled) {
Why do you assume that fast switch isn't possible in shared policy
cases? In fact it is already enabled for a few drivers.
--
viresh
On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
> On 22-05-18, 13:31, Rafael J. Wysocki wrote:
>> So below is my (compiled-only) version of the $subject patch, obviously based
>> on the Joel's work.
>>
>> Roughly, what it does is to move the fast_switch_enabled path entirely to
>> sugov_update_single() and take the spinlock around sugov_update_commit()
>> in the one-CPU case too.
>>
>> ---
>> kernel/sched/cpufreq_schedutil.c | 57 ++++++++++++++++++++++++++-------------
>> 1 file changed, 38 insertions(+), 19 deletions(-)
>>
>> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
>> ===================================================================
>> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
>> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
>> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(str
>> !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> return false;
>>
>> - if (sg_policy->work_in_progress)
>> - return false;
>> -
>> if (unlikely(sg_policy->need_freq_update))
>> return true;
>>
>> @@ -103,25 +100,25 @@ static bool sugov_should_update_freq(str
>> return delta_ns >= sg_policy->freq_update_delay_ns;
>> }
>>
>> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> - unsigned int next_freq)
>> +static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>> + unsigned int next_freq)
>> {
>> - struct cpufreq_policy *policy = sg_policy->policy;
>> -
>> if (sg_policy->next_freq == next_freq)
>> - return;
>> + return false;
>>
>> sg_policy->next_freq = next_freq;
>> sg_policy->last_freq_update_time = time;
>>
>> - if (policy->fast_switch_enabled) {
>> - next_freq = cpufreq_driver_fast_switch(policy, next_freq);
>> - if (!next_freq)
>> - return;
>> + return true;
>> +}
>>
>> - policy->cur = next_freq;
>> - trace_cpu_frequency(next_freq, smp_processor_id());
>> - } else {
>> +static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> + unsigned int next_freq)
>> +{
>> + if (!sugov_update_next_freq(sg_policy, time, next_freq))
>> + return;
>> +
>> + if (!sg_policy->work_in_progress) {
>> sg_policy->work_in_progress = true;
>> irq_work_queue(&sg_policy->irq_work);
>> }
>> @@ -277,6 +274,7 @@ static void sugov_update_single(struct u
>> {
>> struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
>> struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> + struct cpufreq_policy *policy = sg_policy->policy;
>> unsigned long util, max;
>> unsigned int next_f;
>> bool busy;
>> @@ -307,7 +305,23 @@ static void sugov_update_single(struct u
>> sg_policy->cached_raw_freq = 0;
>> }
>>
>> - sugov_update_commit(sg_policy, time, next_f);
>> + if (policy->fast_switch_enabled) {
>
> Why do you assume that fast switch isn't possible in shared policy
> cases ? It infact is already enabled for few drivers.
OK, so the fast_switch thing needs to be left outside of the spinlock
in the single case only. Fair enough.
On Tuesday, May 22, 2018 1:42:05 PM CEST Rafael J. Wysocki wrote:
> On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
> > On 22-05-18, 13:31, Rafael J. Wysocki wrote:
> >> So below is my (compiled-only) version of the $subject patch, obviously based
> >> on the Joel's work.
> >>
> >> Roughly, what it does is to move the fast_switch_enabled path entirely to
> >> sugov_update_single() and take the spinlock around sugov_update_commit()
> >> in the one-CPU case too.
[cut]
> >
> > Why do you assume that fast switch isn't possible in shared policy
> > cases ? It infact is already enabled for few drivers.
I hope that fast_switch is not used with dvfs_possible_from_any_cpu set in the
one-CPU policy case, as that looks racy even without any patching.
> OK, so the fast_switch thing needs to be left outside of the spinlock
> in the single case only. Fair enough.
That would be something like the patch below (again, compiled-only).
---
kernel/sched/cpufreq_schedutil.c | 67 +++++++++++++++++++++++++++------------
1 file changed, 47 insertions(+), 20 deletions(-)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(str
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
- if (sg_policy->work_in_progress)
- return false;
-
if (unlikely(sg_policy->need_freq_update))
return true;
@@ -103,25 +100,41 @@ static bool sugov_should_update_freq(str
return delta_ns >= sg_policy->freq_update_delay_ns;
}
-static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
- unsigned int next_freq)
+static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
{
- struct cpufreq_policy *policy = sg_policy->policy;
-
if (sg_policy->next_freq == next_freq)
- return;
+ return false;
sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
- if (policy->fast_switch_enabled) {
- next_freq = cpufreq_driver_fast_switch(policy, next_freq);
- if (!next_freq)
- return;
+ return true;
+}
- policy->cur = next_freq;
- trace_cpu_frequency(next_freq, smp_processor_id());
- } else {
+static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
+{
+ struct cpufreq_policy *policy = sg_policy->policy;
+
+ if (!sugov_update_next_freq(sg_policy, time, next_freq))
+ return;
+
+ next_freq = cpufreq_driver_fast_switch(policy, next_freq);
+ if (!next_freq)
+ return;
+
+ policy->cur = next_freq;
+ trace_cpu_frequency(next_freq, smp_processor_id());
+}
+
+static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
+{
+ if (!sugov_update_next_freq(sg_policy, time, next_freq))
+ return;
+
+ if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(&sg_policy->irq_work);
}
@@ -307,7 +320,13 @@ static void sugov_update_single(struct u
sg_policy->cached_raw_freq = 0;
}
- sugov_update_commit(sg_policy, time, next_f);
+ if (sg_policy->policy->fast_switch_enabled) {
+ sugov_fast_switch(sg_policy, time, next_f);
+ } else {
+ raw_spin_lock(&sg_policy->update_lock);
+ sugov_update_commit(sg_policy, time, next_f);
+ raw_spin_unlock(&sg_policy->update_lock);
+ }
}
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
@@ -367,7 +386,10 @@ sugov_update_shared(struct update_util_d
if (sugov_should_update_freq(sg_policy, time)) {
next_f = sugov_next_freq_shared(sg_cpu, time);
- sugov_update_commit(sg_policy, time, next_f);
+ if (sg_policy->policy->fast_switch_enabled)
+ sugov_fast_switch(sg_policy, time, next_f);
+ else
+ sugov_update_commit(sg_policy, time, next_f);
}
raw_spin_unlock(&sg_policy->update_lock);
@@ -376,13 +398,18 @@ sugov_update_shared(struct update_util_d
static void sugov_work(struct kthread_work *work)
{
struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
+ unsigned int next_freq;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+ next_freq = sg_policy->next_freq;
+ sg_policy->work_in_progress = false;
+ raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
mutex_lock(&sg_policy->work_lock);
- __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
+ __cpufreq_driver_target(sg_policy->policy, next_freq,
CPUFREQ_RELATION_L);
mutex_unlock(&sg_policy->work_lock);
-
- sg_policy->work_in_progress = false;
}
static void sugov_irq_work(struct irq_work *irq_work)
On Tue, May 22, 2018 at 2:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, May 22, 2018 1:42:05 PM CEST Rafael J. Wysocki wrote:
>> On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
>> > On 22-05-18, 13:31, Rafael J. Wysocki wrote:
>> >> So below is my (compiled-only) version of the $subject patch, obviously based
>> >> on the Joel's work.
>> >>
>> >> Roughly, what it does is to move the fast_switch_enabled path entirely to
>> >> sugov_update_single() and take the spinlock around sugov_update_commit()
>> >> in the one-CPU case too.
>
> [cut]
>
>> >
>> > Why do you assume that fast switch isn't possible in shared policy
>> > cases ? It infact is already enabled for few drivers.
>
> I hope that fast_switch is not used with devfs_possible_from_any_cpu set in the
> one-CPU policy case, as that looks racy even without any patching.
Which would be the only case in which sugov_update_single() would run
on a CPU that is not the target.
And running sugov_update_single() concurrently on two different CPUs
for the same target is a no-no, as we don't prevent concurrent updates
from occurring in that path.
Which means that the original patch from Joel will be sufficient as
long as we ensure that sugov_update_single() can only run on one CPU
at a time.
On Tue, May 22, 2018 at 12:02 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <[email protected]> wrote:
>> On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
>>> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
>>> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>>> >> From: "Joel Fernandes (Google)" <[email protected]>
>>> >>
>>> >> Currently there is a chance of a schedutil cpufreq update request to be
>>> >> dropped if there is a pending update request. This pending request can
>>> >> be delayed if there is a scheduling delay of the irq_work and the wake
>>> >> up of the schedutil governor kthread.
>>> >>
>>> >> A very bad scenario is when a schedutil request was already just made,
>>> >> such as to reduce the CPU frequency, then a newer request to increase
>>> >> CPU frequency (even sched deadline urgent frequency increase requests)
>>> >> can be dropped, even though the rate limits suggest that its Ok to
>>> >> process a request. This is because of the way the work_in_progress flag
>>> >> is used.
>>> >>
>>> >> This patch improves the situation by allowing new requests to happen
>>> >> even though the old one is still being processed. Note that in this
>>> >> approach, if an irq_work was already issued, we just update next_freq
>>> >> and don't bother to queue another request so there's no extra work being
>>> >> done to make this happen.
>>> >
>>> > Now that this isn't an RFC anymore, you shouldn't have added below
>>> > paragraph here. It could go to the comments section though.
>>> >
>>> >> I had brought up this issue at the OSPM conference and Claudio had a
>>> >> discussion RFC with an alternate approach [1]. I prefer the approach as
>>> >> done in the patch below since it doesn't need any new flags and doesn't
>>> >> cause any other extra overhead.
>>> >>
>>> >> [1] https://patchwork.kernel.org/patch/10384261/
>>> >>
>>> >> LGTMed-by: Viresh Kumar <[email protected]>
>>> >> LGTMed-by: Juri Lelli <[email protected]>
>>> >
>>> > Looks like a Tag you just invented ? :)
>>>
>>> Yeah.
>>>
>>> The LGTM from Juri can be converned into an ACK silently IMO. That
>>> said I have added Looks-good-to: tags to a couple of commits. :-)
>>
>> Cool, I'll covert them to Acks :-)
>
> So it looks like I should expect an update of this patch, right?
>
> Or do you prefer the current one to be applied and work on top of it?
>
[cut]
>>
>> I just realized that on a single policy switch that uses the governor thread,
>> there will be 1 thread per-CPU. The sugov_update_single will be called on the
>> same CPU with interrupts disabled.
>
> sugov_update_single() doesn't have to run on the target CPU.
Which sadly is a bug IMO. :-/
>> In sugov_work, we are doing a
>> raw_spin_lock_irqsave which also disables interrupts. So I don't think
>> there's any possibility of a race happening on the same CPU between the
>> frequency update request and the sugov_work executing. In other words, I feel
>> we can drop the above if (..) statement for single policies completely and
>> only keep the changes for the shared policy. Viresh since you brought up the
>> single policy issue initially which made me add this if statememnt, could you
>> let me know if you agree with what I just said?
>
> Which is why you need the spinlock too.
And you totally have a point. With the above bug fixed, disabling
interrupts should be sufficient to prevent concurrent updates from
occurring in the one-CPU policy case and the work_in_progress check in
sugov_update_single() isn't necessary.
On Tue, May 22, 2018 at 5:30 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tue, May 22, 2018 at 12:02 PM, Rafael J. Wysocki <[email protected]> wrote:
>> On Mon, May 21, 2018 at 6:13 PM, Joel Fernandes <[email protected]> wrote:
>>> On Mon, May 21, 2018 at 10:29:52AM +0200, Rafael J. Wysocki wrote:
>>>> On Mon, May 21, 2018 at 7:14 AM, Viresh Kumar <[email protected]> wrote:
>>>> > On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>>>> >> From: "Joel Fernandes (Google)" <[email protected]>
>>>> >>
>>>> >> Currently there is a chance of a schedutil cpufreq update request to be
>>>> >> dropped if there is a pending update request. This pending request can
>>>> >> be delayed if there is a scheduling delay of the irq_work and the wake
>>>> >> up of the schedutil governor kthread.
>>>> >>
>>>> >> A very bad scenario is when a schedutil request was already just made,
>>>> >> such as to reduce the CPU frequency, then a newer request to increase
>>>> >> CPU frequency (even sched deadline urgent frequency increase requests)
>>>> >> can be dropped, even though the rate limits suggest that its Ok to
>>>> >> process a request. This is because of the way the work_in_progress flag
>>>> >> is used.
>>>> >>
>>>> >> This patch improves the situation by allowing new requests to happen
>>>> >> even though the old one is still being processed. Note that in this
>>>> >> approach, if an irq_work was already issued, we just update next_freq
>>>> >> and don't bother to queue another request so there's no extra work being
>>>> >> done to make this happen.
>>>> >
>>>> > Now that this isn't an RFC anymore, you shouldn't have added below
>>>> > paragraph here. It could go to the comments section though.
>>>> >
>>>> >> I had brought up this issue at the OSPM conference and Claudio had a
>>>> >> discussion RFC with an alternate approach [1]. I prefer the approach as
>>>> >> done in the patch below since it doesn't need any new flags and doesn't
>>>> >> cause any other extra overhead.
>>>> >>
>>>> >> [1] https://patchwork.kernel.org/patch/10384261/
>>>> >>
>>>> >> LGTMed-by: Viresh Kumar <[email protected]>
>>>> >> LGTMed-by: Juri Lelli <[email protected]>
>>>> >
>>>> > Looks like a Tag you just invented ? :)
>>>>
>>>> Yeah.
>>>>
>>>> The LGTM from Juri can be converned into an ACK silently IMO. That
>>>> said I have added Looks-good-to: tags to a couple of commits. :-)
>>>
>>> Cool, I'll covert them to Acks :-)
>>
>> So it looks like I should expect an update of this patch, right?
>>
>> Or do you prefer the current one to be applied and work on top of it?
>>
>
> [cut]
>
>>>
>>> I just realized that on a single policy switch that uses the governor thread,
>>> there will be 1 thread per-CPU. The sugov_update_single will be called on the
>>> same CPU with interrupts disabled.
>>
>> sugov_update_single() doesn't have to run on the target CPU.
>
> Which sadly is a bug IMO. :-/
My bad.
sugov_update_single() runs under rq->lock, so it need not run on a
target CPU so long as the CPU running it can update the frequency for
the target and there is the requisite check for that in
sugov_should_update_freq().
That means that sugov_update_single() will not run concurrently on two
different CPUs for the same target, but it may be running concurrently
with the kthread (as pointed out by Viresh).
>>> In sugov_work, we are doing a
>>> raw_spin_lock_irqsave which also disables interrupts. So I don't think
>>> there's any possibility of a race happening on the same CPU between the
>>> frequency update request and the sugov_work executing. In other words, I feel
>>> we can drop the above if (..) statement for single policies completely and
>>> only keep the changes for the shared policy. Viresh since you brought up the
>>> single policy issue initially which made me add this if statememnt, could you
>>> let me know if you agree with what I just said?
>>
>> Which is why you need the spinlock too.
>
> And you totally have a point.
Not really, sorry about that.
It is necessary to take the spinlock in the non-fast-switch case,
because of the possible race with the kthread, so something like my
patch at https://patchwork.kernel.org/patch/10418551/ is needed after
all.
On Tue, May 22, 2018 at 05:27:11PM +0200, Rafael J. Wysocki wrote:
> On Tue, May 22, 2018 at 2:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Tuesday, May 22, 2018 1:42:05 PM CEST Rafael J. Wysocki wrote:
> >> On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
> >> > On 22-05-18, 13:31, Rafael J. Wysocki wrote:
> >> >> So below is my (compiled-only) version of the $subject patch, obviously based
> >> >> on the Joel's work.
> >> >>
> >> >> Roughly, what it does is to move the fast_switch_enabled path entirely to
> >> >> sugov_update_single() and take the spinlock around sugov_update_commit()
> >> >> in the one-CPU case too.
> >
> > [cut]
> >
> >> >
> >> > Why do you assume that fast switch isn't possible in shared policy
> >> > cases ? It infact is already enabled for few drivers.
> >
> > I hope that fast_switch is not used with devfs_possible_from_any_cpu set in the
> > one-CPU policy case, as that looks racy even without any patching.
>
> Which would be the only case in which sugov_update_single() would run
> on a CPU that is not the target.
>
> And running sugov_update_single() concurrently on two different CPUs
> for the same target is a no-no, as we don't prevent concurrent updates
> from occurring in that path.
>
> Which means that the original patch from Joel will be sufficient as
> long as we ensure that sugov_update_single() can only run on one CPU
> at a time.
Since the target CPU's runqueue lock is held, I don't see how we can run
sugov_update_single concurrently with any other CPU for a single policy, so
protecting against such a race shouldn't be necessary.
Also, the "if (work_in_progress)" check I added to sugov_update_single
doesn't change the behavior of single policy from what it is in mainline,
since we were already doing the same thing in sugov_should_update_freq.
thanks,
- Joel
On Tue, May 22, 2018 at 11:41 PM, Joel Fernandes <[email protected]> wrote:
> On Tue, May 22, 2018 at 05:27:11PM +0200, Rafael J. Wysocki wrote:
>> On Tue, May 22, 2018 at 2:22 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Tuesday, May 22, 2018 1:42:05 PM CEST Rafael J. Wysocki wrote:
>> >> On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
>> >> > On 22-05-18, 13:31, Rafael J. Wysocki wrote:
>> >> >> So below is my (compiled-only) version of the $subject patch, obviously based
>> >> >> on the Joel's work.
>> >> >>
>> >> >> Roughly, what it does is to move the fast_switch_enabled path entirely to
>> >> >> sugov_update_single() and take the spinlock around sugov_update_commit()
>> >> >> in the one-CPU case too.
>> >
>> > [cut]
>> >
>> >> >
>> >> > Why do you assume that fast switch isn't possible in shared policy
>> >> > cases ? It infact is already enabled for few drivers.
>> >
>> > I hope that fast_switch is not used with devfs_possible_from_any_cpu set in the
>> > one-CPU policy case, as that looks racy even without any patching.
>>
>> Which would be the only case in which sugov_update_single() would run
>> on a CPU that is not the target.
>>
>> And running sugov_update_single() concurrently on two different CPUs
>> for the same target is a no-no, as we don't prevent concurrent updates
>> from occurring in that path.
>>
>> Which means that the original patch from Joel will be sufficient as
>> long as we ensure that sugov_update_single() can only run on one CPU
>> at a time.
>
> Since target CPU's runqueue lock is held, I don't see how we can run
> sugov_update_single concurrently with any other CPU for single policy, so
> protecting such race shouldn't be necessary.
If dvfs_possible_from_any_cpu is set, any CPU can run
sugov_update_single(), but the kthread will only run on the target
itself. So another CPU running sugov_update_single() for the target
may be racing with the target's kthread.
> Also the "if (work_in_progress)" check I added to the sugov_update_single
> doesn't change the behavior of single policy from what it is in mainline
> since we were doing the same thing in already sugov_should_update_freq.
No, it doesn't, which doesn't mean that this is all OK. :-)
On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> > return false;
> >
> > - if (sg_policy->work_in_progress)
> > - return false;
> > -
> > if (unlikely(sg_policy->need_freq_update)) {
> > sg_policy->need_freq_update = false;
> > /*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >
> > policy->cur = next_freq;
> > trace_cpu_frequency(next_freq, smp_processor_id());
> > - } else {
> > + } else if (!sg_policy->work_in_progress) {
> > sg_policy->work_in_progress = true;
> > irq_work_queue(&sg_policy->irq_work);
> > }
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >
> > ignore_dl_rate_limit(sg_cpu, sg_policy);
> >
> > + /*
> > + * For slow-switch systems, single policy requests can't run at the
> > + * moment if update is in progress, unless we acquire update_lock.
> > + */
> > + if (sg_policy->work_in_progress)
> > + return;
> > +
> > if (!sugov_should_update_freq(sg_policy, time))
> > return;
> >
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> > static void sugov_work(struct kthread_work *work)
> > {
> > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > + unsigned int freq;
> > + unsigned long flags;
> > +
> > + /*
> > + * Hold sg_policy->update_lock shortly to handle the case where:
> > + * incase sg_policy->next_freq is read here, and then updated by
> > + * sugov_update_shared just before work_in_progress is set to false
> > + * here, we may miss queueing the new update.
> > + *
> > + * Note: If a work was queued after the update_lock is released,
> > + * sugov_work will just be called again by kthread_work code; and the
> > + * request will be proceed before the sugov thread sleeps.
> > + */
> > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > + freq = sg_policy->next_freq;
> > + sg_policy->work_in_progress = false;
> > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >
> > mutex_lock(&sg_policy->work_lock);
> > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > - CPUFREQ_RELATION_L);
> > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> > mutex_unlock(&sg_policy->work_lock);
> > -
> > - sg_policy->work_in_progress = false;
> > }
>
> And I do see a race here for single policy systems doing slow switching.
>
> Kthread Sched update
>
> sugov_work() sugov_update_single()
>
> lock();
> // The CPU is free to rearrange below
> // two in any order, so it may clear
> // the flag first and then read next
> // freq. Lets assume it does.
> work_in_progress = false
>
> if (work_in_progress)
> return;
>
> sg_policy->next_freq = 0;
> freq = sg_policy->next_freq;
> sg_policy->next_freq = real-next-freq;
> unlock();
>
I agree with the race you describe for the single-policy slow-switch case. Good find :)
Mainline's sugov_work could also do such reordering, I think. Even with the
mutex_unlock in mainline's sugov_work, that work_in_progress write could
be reordered by the CPU to happen before the read of next_freq. AIUI,
mutex_unlock is expected to be only a release barrier.
Although to be safe, I could just put an smp_mb() there. I believe with that,
no locking would be needed for such a case.
I'll send out a v3 with Acks for the original patch, and then send out the
smp_mb() as a separate patch if that's OK.
thanks,
- Joel
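For reference, a minimal C11-atomics model of where such a full barrier would
sit on the kthread side (illustrative only: atomic_thread_fence stands in for
smp_mb(), the variables are module-local stand-ins for the sg_policy fields,
and whether a barrier alone is sufficient was still being debated in this
thread):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic unsigned int next_freq;
static atomic_bool work_in_progress;

/*
 * Kthread side only: read the latest request, then a full barrier (the
 * smp_mb() being proposed), then publish that the work is no longer in
 * progress. The barrier stops the flag-clear from becoming visible
 * before the next_freq read, which is the reordering Viresh pointed out.
 */
static unsigned int work_side(void)
{
        unsigned int freq;

        freq = atomic_load_explicit(&next_freq, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* stands in for smp_mb() */
        atomic_store_explicit(&work_in_progress, false, memory_order_relaxed);
        return freq;
}

int main(void)
{
        /* Trivial single-threaded driver so the model builds and runs. */
        atomic_store(&next_freq, 1200000);
        atomic_store(&work_in_progress, true);
        printf("applying %u\n", work_side());
        return 0;
}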
On Tue, May 22, 2018 at 11:52:46PM +0200, Rafael J. Wysocki wrote:
> On Tue, May 22, 2018 at 11:41 PM, Joel Fernandes <[email protected]> wrote:
> > On Tue, May 22, 2018 at 05:27:11PM +0200, Rafael J. Wysocki wrote:
> >> On Tue, May 22, 2018 at 2:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> > On Tuesday, May 22, 2018 1:42:05 PM CEST Rafael J. Wysocki wrote:
> >> >> On Tue, May 22, 2018 at 1:38 PM, Viresh Kumar <[email protected]> wrote:
> >> >> > On 22-05-18, 13:31, Rafael J. Wysocki wrote:
> >> >> >> So below is my (compiled-only) version of the $subject patch, obviously based
> >> >> >> on the Joel's work.
> >> >> >>
> >> >> >> Roughly, what it does is to move the fast_switch_enabled path entirely to
> >> >> >> sugov_update_single() and take the spinlock around sugov_update_commit()
> >> >> >> in the one-CPU case too.
> >> >
> >> > [cut]
> >> >
> >> >> >
> >> >> > Why do you assume that fast switch isn't possible in shared policy
> >> >> > cases ? It infact is already enabled for few drivers.
> >> >
> >> > I hope that fast_switch is not used with devfs_possible_from_any_cpu set in the
> >> > one-CPU policy case, as that looks racy even without any patching.
> >>
> >> Which would be the only case in which sugov_update_single() would run
> >> on a CPU that is not the target.
> >>
> >> And running sugov_update_single() concurrently on two different CPUs
> >> for the same target is a no-no, as we don't prevent concurrent updates
> >> from occurring in that path.
> >>
> >> Which means that the original patch from Joel will be sufficient as
> >> long as we ensure that sugov_update_single() can only run on one CPU
> >> at a time.
> >
> > Since target CPU's runqueue lock is held, I don't see how we can run
> > sugov_update_single concurrently with any other CPU for single policy, so
> > protecting such race shouldn't be necessary.
>
> If dvfs_possible_from_any_cpu is set, any CPU can run
> sugov_update_single(), but the kthread will only run on the target
> itself. So another CPU running sugov_update_single() for the target
> may be racing with the target's kthread.
>
Yes, I agree. I thought you meant the case of sugov_update_single running
concurrently with another sugov_update_single. So, just to be on the same page,
I'll fix the commit log and repost this one as is.
And then I'll post the smp_rmb() patch separately to address the memory-ordering
issue (which I believe is in mainline as well). Basically, to address Viresh's
issue, I was thinking there should be an smp_mb() after next_freq is
read, but before the write to work_in_progress.
thanks,
- Joel
On Wed, May 23, 2018 at 12:09 AM, Joel Fernandes <[email protected]> wrote:
> On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
>> Okay, me and Rafael were discussing this patch, locking and races around this.
>>
>> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e13df951aca7..5c482ec38610 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> > !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> > return false;
>> >
>> > - if (sg_policy->work_in_progress)
>> > - return false;
>> > -
>> > if (unlikely(sg_policy->need_freq_update)) {
>> > sg_policy->need_freq_update = false;
>> > /*
>> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> >
>> > policy->cur = next_freq;
>> > trace_cpu_frequency(next_freq, smp_processor_id());
>> > - } else {
>> > + } else if (!sg_policy->work_in_progress) {
>> > sg_policy->work_in_progress = true;
>> > irq_work_queue(&sg_policy->irq_work);
>> > }
>> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>> >
>> > ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >
>> > + /*
>> > + * For slow-switch systems, single policy requests can't run at the
>> > + * moment if update is in progress, unless we acquire update_lock.
>> > + */
>> > + if (sg_policy->work_in_progress)
>> > + return;
>> > +
>> > if (!sugov_should_update_freq(sg_policy, time))
>> > return;
>> >
>> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>> > static void sugov_work(struct kthread_work *work)
>> > {
>> > struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
>> > + unsigned int freq;
>> > + unsigned long flags;
>> > +
>> > + /*
>> > + * Hold sg_policy->update_lock shortly to handle the case where:
>> > + * incase sg_policy->next_freq is read here, and then updated by
>> > + * sugov_update_shared just before work_in_progress is set to false
>> > + * here, we may miss queueing the new update.
>> > + *
>> > + * Note: If a work was queued after the update_lock is released,
>> > + * sugov_work will just be called again by kthread_work code; and the
>> > + * request will be proceed before the sugov thread sleeps.
>> > + */
>> > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> > + freq = sg_policy->next_freq;
>> > + sg_policy->work_in_progress = false;
>> > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>> >
>> > mutex_lock(&sg_policy->work_lock);
>> > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> > - CPUFREQ_RELATION_L);
>> > + __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> > mutex_unlock(&sg_policy->work_lock);
>> > -
>> > - sg_policy->work_in_progress = false;
>> > }
>>
>> And I do see a race here for single policy systems doing slow switching.
>>
>> Kthread Sched update
>>
>> sugov_work() sugov_update_single()
>>
>> lock();
>> // The CPU is free to rearrange below
>> // two in any order, so it may clear
>> // the flag first and then read next
>> // freq. Lets assume it does.
>> work_in_progress = false
>>
>> if (work_in_progress)
>> return;
>>
>> sg_policy->next_freq = 0;
>> freq = sg_policy->next_freq;
>> sg_policy->next_freq = real-next-freq;
>> unlock();
>>
>
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work could also do such reordering, I think. Even with the
> mutex_unlock in mainline's sugov_work, that work_in_progress write could be
> reordered by the CPU to happen before the read of next_freq. AIUI, mutex_unlock
> is expected to be only a release barrier.
>
> Although to be safe, I could just put an smp_mb() there. I believe with that,
> no locking would be needed for such case.
Yes, but leaving the work_in_progress check in sugov_update_single()
means that the original problem is still there in the one-CPU policy
case. Namely, utilization updates coming in between setting
work_in_progress in sugov_update_commit() and clearing it in
sugov_work() will be discarded in the one-CPU policy case, but not in
the shared policy case.
> I'll send out a v3 with Acks for the original patch,
OK
> and then send out the smp_mb() as a separate patch, if that's OK.
I would prefer to use a spinlock in the one-CPU policy non-fast-switch
case and remove the work_in_progress check from sugov_update_single().
I can do a patch on top of yours for that. In fact, I've done that already. :-)
Thanks,
Rafael
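For illustration, taking the update_lock in the one-CPU slow-switch path could
look roughly like the sketch below. This is not the patch Rafael refers to: the
utilization bookkeeping is elided, the exact structure is an assumption, and
compute_next_freq() is a placeholder name for the existing util-to-frequency
computation, not a real helper.

static void sugov_update_single(struct update_util_data *hook, u64 time,
				unsigned int flags)
{
	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
	unsigned int next_f;

	/* ... utilization bookkeeping and rate-limit checks as before ... */

	if (!sugov_should_update_freq(sg_policy, time))
		return;

	next_f = compute_next_freq(sg_cpu, time);	/* placeholder, see note above */

	if (sg_policy->policy->fast_switch_enabled) {
		/* Fast switch: the kthread is not involved, commit directly. */
		sugov_update_commit(sg_policy, time, next_f);
	} else {
		/*
		 * Slow switch: serialize against sugov_work() with the same
		 * update_lock the shared-policy path uses, making the
		 * work_in_progress early return unnecessary here.
		 */
		raw_spin_lock(&sg_policy->update_lock);
		sugov_update_commit(sg_policy, time, next_f);
		raw_spin_unlock(&sg_policy->update_lock);
	}
}

With sugov_work() already taking update_lock around the next_freq read in the
patch under discussion, both sides of the single-policy slow-switch path would
then be serialized on the same lock.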
On 22-05-18, 15:09, Joel Fernandes wrote:
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work could also do such reordering, I think. Even with the
> mutex_unlock in mainline's sugov_work, that work_in_progress write could be
> reordered by the CPU to happen before the read of next_freq. AIUI, mutex_unlock
> is expected to be only a release barrier.
>
> Although to be safe, I could just put an smp_mb() there. I believe with that,
> no locking would be needed for such case.
>
> I'll send out a v3 with Acks for the original patch, and then send out the
> smp_mb() as a separate patch, if that's OK.
Maybe it would be better to get the fix (with smp_mb) in first and then this
optimization patch on top? That would mean that the fix can get applied to
stable kernels easily.
--
viresh
On May 23, 2018 2:01:01 AM PDT, Viresh Kumar <[email protected]> wrote:
>On 22-05-18, 15:09, Joel Fernandes wrote:
>> I agree with the race you describe for single policy slow-switch. Good find :)
>>
>> The mainline sugov_work could also do such reordering, I think. Even with the
>> mutex_unlock in mainline's sugov_work, that work_in_progress write could be
>> reordered by the CPU to happen before the read of next_freq. AIUI,
>> mutex_unlock is expected to be only a release barrier.
>>
>> Although to be safe, I could just put an smp_mb() there. I believe with that,
>> no locking would be needed for such case.
>>
>> I'll send out a v3 with Acks for the original patch, and then send out the
>> smp_mb() as a separate patch, if that's OK.
>
>Maybe it would be better to get the fix (with smp_mb) in first and then this
>optimization patch on top? That would mean that the fix can get applied to
>stable kernels easily.
Probably. But then Rafael is changing the single-policy case to use the lock,
so the barrier wouldn't be needed at all. In that case, both my patch and
Rafael's new patch can go into stable, which handles your race
(optimization == fix in this case :P).
thanks,
- Joel
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 23-05-18, 02:42, Joel Fernandes wrote:
> Probably. But then Rafael is changing the single-policy case to use the
> lock, so the barrier wouldn't be needed at all. In that case, both my
> patch and Rafael's new patch can go into stable, which handles your race
> (optimization == fix in this case :P)
Yeah, we discussed that offline.
Go get some sleep. There are no barriers in this world :)
--
viresh