From: jun qian <[email protected]>
Once softirq processing starts, __do_softirq() handles all of the
pending softirqs in its loop. If processing the pending softirqs takes
more than 2 msec in total, or one softirq action runs for a long time,
the original code logic still processes all pending softirqs without
waking up ksoftirqd, which can cause a relatively large scheduling
delay on the corresponding CPU, which we do not wish to see. This
patch checks the total time spent processing pending softirqs; if it
exceeds 2 ms, wake up ksoftirqd to avoid a large scheduling delay.
Signed-off-by: jun qian <[email protected]>
---
kernel/softirq.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c4201b7f..d572ce4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
#include <linux/smpboot.h>
#include <linux/tick.h>
#include <linux/irq.h>
+#include <linux/sched/clock.h>
#define CREATE_TRACE_POINTS
#include <trace/events/irq.h>
@@ -200,17 +201,15 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
/*
* We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
* but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
+ * In the loop, if the processing time of the softirq has exceeded 2
+ * milliseconds, we also need to break the loop to wakeup the ksofirqd.
*
* These limits have been established via experimentation.
* The two things to balance is latency against fairness -
* we want to handle softirqs as soon as possible, but they
* should not be able to lock up the box.
*/
-#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME_NS 2000000
#define MAX_SOFTIRQ_RESTART 10
#ifdef CONFIG_TRACE_IRQFLAGS
@@ -248,7 +247,7 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
asmlinkage __visible void __softirq_entry __do_softirq(void)
{
- unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+ u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
unsigned long old_flags = current->flags;
int max_restart = MAX_SOFTIRQ_RESTART;
struct softirq_action *h;
@@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
}
h++;
pending >>= softirq_bit;
+
+ /*
+ * the softirq's action has been running for too much time
+ * so it may need to wakeup the ksoftirqd
+ */
+ if (need_resched() && sched_clock() > end) {
+ /*
+ * Ensure that the remaining pending bits are
+ * handled.
+ */
+ or_softirq_pending(pending << (vec_nr + 1));
+ break;
+ }
}
if (__this_cpu_read(ksoftirqd) == current)
@@ -307,8 +319,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
pending = local_softirq_pending();
if (pending) {
- if (time_before(jiffies, end) && !need_resched() &&
- --max_restart)
+ if (!need_resched() && --max_restart &&
+ sched_clock() <= end)
goto restart;
wakeup_softirqd();
--
1.8.3.1
On Fri, Jul 24, 2020 at 10:31:23AM -0400, [email protected] wrote:
> From: jun qian <[email protected]>
>
> When get the pending softirqs, it need to process all the pending
> softirqs in the while loop. If the processing time of each pending
> softirq is need more than 2 msec in this loop, or one of the softirq
> will running a long time, according to the original code logic, it
> will process all the pending softirqs without wakeuping ksoftirqd,
> which will cause a relatively large scheduling delay on the
> corresponding CPU, which we do not wish to see. The patch will check
> the total time to process pending softirq, if the time exceeds 2 ms
> we need to wakeup the ksofirqd to aviod large sched delay.
>
> Signed-off-by: jun qian <[email protected]>
> ---
> kernel/softirq.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c4201b7f..d572ce4 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -25,6 +25,7 @@
> #include <linux/smpboot.h>
> #include <linux/tick.h>
> #include <linux/irq.h>
> +#include <linux/sched/clock.h>
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/irq.h>
> @@ -200,17 +201,15 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
> /*
> * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> * but break the loop if need_resched() is set or after 2 ms.
> - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> - * certain cases, such as stop_machine(), jiffies may cease to
> - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> - * well to make sure we eventually return from this method.
> + * In the loop, if the processing time of the softirq has exceeded 2
> + * milliseconds, we also need to break the loop to wakeup the ksofirqd.
> *
> * These limits have been established via experimentation.
> * The two things to balance is latency against fairness -
> * we want to handle softirqs as soon as possible, but they
> * should not be able to lock up the box.
> */
> -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME_NS 2000000
> #define MAX_SOFTIRQ_RESTART 10
>
> #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -248,7 +247,7 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
>
> asmlinkage __visible void __softirq_entry __do_softirq(void)
> {
> - unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> + u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
> unsigned long old_flags = current->flags;
> int max_restart = MAX_SOFTIRQ_RESTART;
> struct softirq_action *h;
> @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> }
> h++;
> pending >>= softirq_bit;
> +
> + /*
> + * the softirq's action has been running for too much time
> + * so it may need to wakeup the ksoftirqd
> + */
> + if (need_resched() && sched_clock() > end) {
> + /*
> + * Ensure that the remaining pending bits are
> + * handled.
> + */
> + or_softirq_pending(pending << (vec_nr + 1));
> + break;
> + }
> }
>
> if (__this_cpu_read(ksoftirqd) == current)
> @@ -307,8 +319,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>
> pending = local_softirq_pending();
> if (pending) {
> - if (time_before(jiffies, end) && !need_resched() &&
> - --max_restart)
> + if (!need_resched() && --max_restart &&
> + sched_clock() <= end)
> goto restart;
>
> wakeup_softirqd();
>
To me it looks OK.
Thank you for fixing the "2 msec resolution case".
--
Vlad Rezki
Qian,
[email protected] writes:
> /*
> * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> * but break the loop if need_resched() is set or after 2 ms.
> - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> - * certain cases, such as stop_machine(), jiffies may cease to
> - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> - * well to make sure we eventually return from this method.
> + * In the loop, if the processing time of the softirq has exceeded 2
> + * milliseconds, we also need to break the loop to wakeup the
> ksofirqd.
You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
gets adjusted later on. Also while sched_clock() is granular on many
systems it still can be jiffies based and then the above problem
persists.
> @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> }
> h++;
> pending >>= softirq_bit;
> +
> + /*
> + * the softirq's action has been running for too much time
> + * so it may need to wakeup the ksoftirqd
> + */
> + if (need_resched() && sched_clock() > end) {
> + /*
> + * Ensure that the remaining pending bits are
> + * handled.
> + */
> + or_softirq_pending(pending << (vec_nr + 1));
To or the value interrupts need to be disabled because otherwise you can
lose a bit when an interrupt happens in the middle of the RMW operation
and raises a softirq which is not in @pending and not in the per CPU
local softirq pending storage.
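For illustration only, here is a minimal sketch (not from the posted
patch) of guarding that write-back so a softirq raised from interrupt
context during the read-modify-write is not lost; the helpers
local_irq_save()/local_irq_restore() and or_softirq_pending() exist,
the placement is an assumption:
	if (need_resched() && sched_clock() > end) {
		unsigned long flags;
		/*
		 * Disable interrupts around the RMW so a softirq
		 * raised from interrupt context cannot be lost.
		 */
		local_irq_save(flags);
		or_softirq_pending(pending << (vec_nr + 1));
		local_irq_restore(flags);
		break;
	}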
There is another problem. Assume bit 0 and 1 are pending when the
processing starts. Now it breaks out after bit 0 has been handled and
stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
again. ksoftirqd runs and handles bit 0, which takes more than the
timeout. As a result the bit 0 processing can starve all other softirqs.
So this needs more thought.
Thanks,
tglx
On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <[email protected]> wrote:
>
> Qian,
>
> [email protected] writes:
> > /*
> > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> > * but break the loop if need_resched() is set or after 2 ms.
> > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> > - * certain cases, such as stop_machine(), jiffies may cease to
> > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> > - * well to make sure we eventually return from this method.
> > + * In the loop, if the processing time of the softirq has exceeded 2
> > + * milliseconds, we also need to break the loop to wakeup the
> > ksofirqd.
>
> You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
> have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
> gets adjusted later on. Also while sched_clock() is granular on many
> systems it still can be jiffies based and then the above problem
> persists.
>
> > @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> > }
> > h++;
> > pending >>= softirq_bit;
> > +
> > + /*
> > + * the softirq's action has been running for too much time
> > + * so it may need to wakeup the ksoftirqd
> > + */
> > + if (need_resched() && sched_clock() > end) {
> > + /*
> > + * Ensure that the remaining pending bits are
> > + * handled.
> > + */
> > + or_softirq_pending(pending << (vec_nr + 1));
>
> To or the value interrupts need to be disabled because otherwise you can
> lose a bit when an interrupt happens in the middle of the RMW operation
> and raises a softirq which is not in @pending and not in the per CPU
> local softirq pending storage.
>
> There is another problem. Assume bit 0 and 1 are pending when the
> processing starts. Now it breaks out after bit 0 has been handled and
> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> again. ksoftirqd runs and handles bit 0, which takes more than the
> timeout. As a result the bit 0 processing can starve all other softirqs.
>
I got it. I will try to solve this problem. Thanks.
> So this needs more thought.
>
> Thanks,
>
> tglx
On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <[email protected]> wrote:
>
> Qian,
>
> [email protected] writes:
> > /*
> > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> > * but break the loop if need_resched() is set or after 2 ms.
> > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> > - * certain cases, such as stop_machine(), jiffies may cease to
> > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> > - * well to make sure we eventually return from this method.
> > + * In the loop, if the processing time of the softirq has exceeded 2
> > + * milliseconds, we also need to break the loop to wakeup the
> > ksofirqd.
>
> You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
> have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
> gets adjusted later on. Also while sched_clock() is granular on many
> systems it still can be jiffies based and then the above problem
> persists.
>
> > @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> > }
> > h++;
> > pending >>= softirq_bit;
> > +
> > + /*
> > + * the softirq's action has been running for too much time
> > + * so it may need to wakeup the ksoftirqd
> > + */
> > + if (need_resched() && sched_clock() > end) {
> > + /*
> > + * Ensure that the remaining pending bits are
> > + * handled.
> > + */
> > + or_softirq_pending(pending << (vec_nr + 1));
>
> To or the value interrupts need to be disabled because otherwise you can
> lose a bit when an interrupt happens in the middle of the RMW operation
> and raises a softirq which is not in @pending and not in the per CPU
> local softirq pending storage.
>
I can't understand the problem described above, could you explain it
in more detail?
Thanks.
> There is another problem. Assume bit 0 and 1 are pending when the
> processing starts. Now it breaks out after bit 0 has been handled and
> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> again. ksoftirqd runs and handles bit 0, which takes more than the
> timeout. As a result the bit 0 processing can starve all other softirqs.
>
May I use a per-CPU value to record the order of processing softirqs?
With that value, ksoftirqd can process the pending softirqs in order.
In the scenario you described above, where bit 0 gets raised again
before ksoftirqd runs, ksoftirqd would handle bit 1 first according to
the per-CPU value, just like a ring.
Looking forward to your suggestions.
Thanks
> So this needs more thought.
>
> Thanks,
>
> tglx
Qian,
jun qian <[email protected]> writes:
> On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <[email protected]> wrote:
>> > + or_softirq_pending(pending << (vec_nr + 1));
>>
>> To or the value interrupts need to be disabled because otherwise you can
>> lose a bit when an interrupt happens in the middle of the RMW operation
>> and raises a softirq which is not in @pending and not in the per CPU
>> local softirq pending storage.
>>
> I can't understand the problem described above, could you explain in
> detail.
Oring a value to a memory location is a Read Modify Write (RMW)
operation.
x |= a;
translates to pseudo code:
R1 = x // Load content of memory location x into register R1
R1 |= a // Or value a on R1
x = R1 // Write back result
So assume:
x = 0
a = 1
R1 = x --> R1 == 0
R1 |= a --> R1 == 1
interrupt sets bit 1 in x --> x == 0x02
x = R1 --> x == 1
So you lost the set bit 1, right? Not really what you want.
>> There is another problem. Assume bit 0 and 1 are pending when the
>> processing starts. Now it breaks out after bit 0 has been handled and
>> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
>> again. ksoftirqd runs and handles bit 0, which takes more than the
>> timeout. As a result the bit 0 processing can starve all other softirqs.
>>
> May I use a percpu val to record the order of processing softirq, by the order
> val, when it is in ksoftirqd we can process the pending softirq in order. In the
> scenario you described above, before ksoftirqd runs, bit 0 gets raised again,
> ksoftirqd runs and handles bit 1 by the percpu val. just like a ring.
Yes, you need something to save information about the not-processed
bits. Keeping track of which bit to process next works, but whether
that's going to result in efficient and simple code is a different
question.
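A rough sketch of that bookkeeping (hypothetical names, not existing
kernel code): a per-CPU cursor recording the next vector to scan, so a
later pass resumes where the previous one stopped instead of always
starting at bit 0.
	/* Hypothetical per-CPU cursor; illustration only. */
	static DEFINE_PER_CPU(unsigned int, softirq_resume_vec);

	static void handle_pending_round_robin(u32 pending, u64 end)
	{
		unsigned int start = __this_cpu_read(softirq_resume_vec);
		unsigned int i;

		for (i = 0; i < NR_SOFTIRQS; i++) {
			unsigned int vec = (start + i) % NR_SOFTIRQS;

			if (!(pending & (1U << vec)))
				continue;

			/* Run the handler as __do_softirq() does. */
			softirq_vec[vec].action(&softirq_vec[vec]);

			if (need_resched() && sched_clock() > end) {
				/* Resume after this vector next time. */
				__this_cpu_write(softirq_resume_vec,
						 (vec + 1) % NR_SOFTIRQS);
				break;
			}
		}
	}
This leaves out the accounting, tracing and pending-bit clearing that
the real __do_softirq() does; it only shows the cursor idea.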
Thanks,
tglx
On Wed, Jul 29, 2020 at 8:16 PM Thomas Gleixner <[email protected]> wrote:
>
> Qian,
>
> jun qian <[email protected]> writes:
> > On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <[email protected]> wrote:
> >> > + or_softirq_pending(pending << (vec_nr + 1));
> >>
> >> To or the value interrupts need to be disabled because otherwise you can
> >> lose a bit when an interrupt happens in the middle of the RMW operation
> >> and raises a softirq which is not in @pending and not in the per CPU
> >> local softirq pending storage.
> >>
> > I can't understand the problem described above, could you explain in
> > detail.
>
> Oring a value to a memory location is a Read Modify Write (RMW)
> operation.
>
> x |= a;
>
> translates to pseudo code:
>
> R1 = x // Load content of memory location x into register R1
> R1 |= a // Or value a on R1
> x = R1 // Write back result
>
> So assume:
>
> x = 0
> a = 1
>
> R1 = x --> R1 == 0
> R1 |= a --> R1 == 1
>
> interrupt sets bit 1 in x --> x == 0x02
>
> x = R1 --> x == 1
>
> So you lost the set bit 1, right? Not really what you want.
>
Wow, thanks a lot, I got it.
> >> There is another problem. Assume bit 0 and 1 are pending when the
> >> processing starts. Now it breaks out after bit 0 has been handled and
> >> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> >> again. ksoftirqd runs and handles bit 0, which takes more than the
> >> timeout. As a result the bit 0 processing can starve all other softirqs.
> >>
> > May I use a percpu val to record the order of processing softirq, by the order
> > val, when it is in ksoftirqd we can process the pending softirq in order. In the
> > scenario you described above, before ksoftirqd runs, bit 0 gets raised again,
> > ksoftirqd runs and handles bit 1 by the percpu val. just like a ring.
>
> Yes, you need something to save information about the not-processed
> bits. Keeping track of which bit to process next works, but whether
> that's going to result in efficient and simple code is a different
> question.
>
OK, I will modify it in the next version.
> Thanks,
>
> tglx
>
Thomas Gleixner <[email protected]> 于2020年7月29日周三 下午8:16写道:
>
> Qian,
>
> jun qian <[email protected]> writes:
> > On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <[email protected]> wrote:
> >> > + or_softirq_pending(pending << (vec_nr + 1));
> >>
> >> To or the value interrupts need to be disabled because otherwise you can
> >> lose a bit when an interrupt happens in the middle of the RMW operation
> >> and raises a softirq which is not in @pending and not in the per CPU
> >> local softirq pending storage.
> >>
> > I can't understand the problem described above, could you explain in
> > detail.
>
> Oring a value to a memory location is a Read Modify Write (RMW)
> operation.
>
> x |= a;
>
> translates to pseudo code:
>
> R1 = x // Load content of memory location x into register R1
> R1 |= a // Or value a on R1
> x = R1 // Write back result
>
> So assume:
>
> x = 0
> a = 1
>
> R1 = x --> R1 == 0
> R1 |= a --> R1 == 1
>
> interrupt sets bit 1 in x --> x == 0x02
>
> x = R1 --> x == 1
>
> So you lost the set bit 1, right? Not really what you want.
>
> >> There is another problem. Assume bit 0 and 1 are pending when the
> >> processing starts. Now it breaks out after bit 0 has been handled and
> >> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> >> again. ksoftirqd runs and handles bit 0, which takes more than the
> >> timeout. As a result the bit 0 processing can starve all other softirqs.
> >>
> > May I use a percpu val to record the order of processing softirq, by the order
> > val, when it is in ksoftirqd we can process the pending softirq in order. In the
> > scenario you described above, before ksoftirqd runs, bit 0 gets raised again,
> > ksoftirqd runs and handles bit 1 by the percpu val. just like a ring.
>
> Yes, you need something to save information about the not-processed
> bits. Keeping track of which bit to process next works, but whether
> that's going to result in efficient and simple code is a different
> question.
>
> Thanks,
>
> tglx
>
Hi Thomas, I am so sorry; for personal reasons, the modification of
this patch was delayed. I will submit the next version within the next
two days. Sorry again.
Thomas Gleixner <[email protected]> 于2020年7月27日周一 下午11:41写道:
>
> Qian,
>
> [email protected] writes:
> > /*
> > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> > * but break the loop if need_resched() is set or after 2 ms.
> > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> > - * certain cases, such as stop_machine(), jiffies may cease to
> > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> > - * well to make sure we eventually return from this method.
> > + * In the loop, if the processing time of the softirq has exceeded 2
> > + * milliseconds, we also need to break the loop to wakeup the
> > ksofirqd.
>
> You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
> have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
> gets adjusted later on. Also while sched_clock() is granular on many
> systems it still can be jiffies based and then the above problem
> persists.
>
> > @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> > }
> > h++;
> > pending >>= softirq_bit;
> > +
> > + /*
> > + * the softirq's action has been running for too much time
> > + * so it may need to wakeup the ksoftirqd
> > + */
> > + if (need_resched() && sched_clock() > end) {
> > + /*
> > + * Ensure that the remaining pending bits are
> > + * handled.
> > + */
> > + or_softirq_pending(pending << (vec_nr + 1));
>
> To or the value interrupts need to be disabled because otherwise you can
> lose a bit when an interrupt happens in the middle of the RMW operation
> and raises a softirq which is not in @pending and not in the per CPU
> local softirq pending storage.
>
> There is another problem. Assume bit 0 and 1 are pending when the
> processing starts. Now it breaks out after bit 0 has been handled and
> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> again. ksoftirqd runs and handles bit 0, which takes more than the
> timeout. As a result the bit 0 processing can starve all other softirqs.
>
> So this needs more thought.
Hi Thomas, I am trying to solve the problem you described, but I found
another problem, for example:
the pending bits are 1010110110. When bit 4 has been handled and the
loop processing time exceeds 2 ms, with my patch the loop will break
early. The next time, the loop starts processing from bit 5; if bits
6, 8, 0 and 3 are raised before the softirq actions run again, the
loop will scan bits 5, 6, 7, 8, 9, 0, 1, 2, 3, so bit 6 and bit 8 will
be processed before bit 0 and bit 3 (a small stand-alone illustration
follows below).
Do we need to consider this scenario? Do you have any good suggestions?
Thanks.
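A stand-alone illustration (plain userspace C, not kernel code) of the
ordering above, with the remaining old bits 5, 7, 9 plus the newly
raised bits 0, 3, 6, 8 and the scan resuming at bit 5:
	#include <stdio.h>

	int main(void)
	{
		/* old pending bits 5, 7, 9 plus newly raised 0, 3, 6, 8 */
		unsigned int pending = (1u << 5) | (1u << 7) | (1u << 9) |
				       (1u << 0) | (1u << 3) |
				       (1u << 6) | (1u << 8);
		unsigned int start = 5, nr_bits = 10, i;

		printf("processing order:");
		for (i = 0; i < nr_bits; i++) {
			unsigned int vec = (start + i) % nr_bits;

			if (pending & (1u << vec))
				printf(" %u", vec);
		}
		/* prints 5 6 7 8 9 0 3: bits 6 and 8 run before 0 and 3 */
		printf("\n");
		return 0;
	}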
>
> Thanks,
>
> tglx