> -----Original Message-----
> From: liujian (CE)
> Sent: Wednesday, February 15, 2023 4:34 PM
> To: 'John Stultz' <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; Paul E. McKenney <[email protected]>
> Subject: RE: [Question] softlockup in run_timer_softirq
>
>
>
> > -----Original Message-----
> > From: John Stultz [mailto:[email protected]]
> > Sent: Tuesday, February 14, 2023 4:01 AM
> > To: liujian (CE) <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; Paul E. McKenney
> > <[email protected]>
> > Subject: Re: [Question] softlockup in run_timer_softirq
> >
> > On Fri, Feb 10, 2023 at 1:51 AM liujian (CE) <[email protected]> wrote:
> > >
> > > During the syz test, we encountered many problems with various timer
> > > handler functions softlockup.
> > >
> > > We analyze __run_timers() and find the following problem.
> > >
> > > In the while loop of __run_timers(), because there are too many
> > > timers or improper timer handler functions, if the processing time
> > > of the expired timers is always greater than the time wheel's
> > > next_expiry, the function will loop infinitely.
> > >
> > > The following extreme test case can be used to reproduce the problem.
> > > An extreme test case[1] is constructed to reproduce the problem.
> >
> > Thanks for reporting and sending out this data:
> >
> > First, any chance you might submit this as a in-kernel-stress test?
> > Maybe utilizing the kernel/torture.c framework?
> >
> Okay, I'll learn this framework and do this thing.
> > (Though the test may need to occasionally take a break so the system
> > can eventually catch up)
> >
> > > Is this a problem or an unreasonable use?
> > >
> > > Can we limit the running time of __run_timers() [2]?
> > >
> > > Does anyone have a good idea to solve this problem?
> >
> > So your patch reminds me of Peter's softirq_needs_break() logic:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?
> > h=co
> > re/softirq
> >
> > Maybe it could extend that series for the timer softirq as well?
> >
> Thank you. Yes.
> Base on the patchset and the extended patch for timer [1], the soft lockup
> problem does not occur.
>
> By the way, I see this is a very old patchset? Will this patchset push the main
> line? @John @Peter
>
Hi, peter,
Do you have an upstream plan for this patchset? Or other ideas.
I want to use softirq_needs_break() to limit the runtime of timer soft interrupt handler function, wonder if this is appropriate?
Thank you~
>
> [1]
> Author: Liu Jian <[email protected]>
> Date: Tue Feb 14 09:53:46 2023 +0800
>
> softirq, timer: Use softirq_needs_break()
>
> In the while loop of __run_timers(), because there are too many timers or
> improper timer handler functions, if the processing time of the expired
> timers is always greater than the time wheel's next_expiry, the function
> will loop infinitely.
>
> To prevent this, use the timeout/break logic provided by SoftIRQs.If the
> running time exceeds the limit, break the loop and an additional
> TIMER_SOFTIRQ is triggered.
>
> Signed-off-by: Liu Jian <[email protected]>
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c index
> 63a8ce7177dd..70744a469a39 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1992,7 +1992,7 @@ void timer_clear_idle(void)
> * __run_timers - run all expired timers (if any) on this CPU.
> * @base: the timer vector to be processed.
> */
> -static inline void __run_timers(struct timer_base *base)
> +static inline void __run_timers(struct timer_base *base, struct
> +softirq_action *h)
> {
> struct hlist_head heads[LVL_DEPTH];
> int levels;
> @@ -2020,6 +2020,12 @@ static inline void __run_timers(struct timer_base
> *base)
>
> while (levels--)
> expire_timers(base, heads + levels);
> +
> + if (softirq_needs_break(h)) {
> + if (time_after_eq(jiffies, base->next_expiry))
> + __raise_softirq_irqoff(TIMER_SOFTIRQ);
> + break;
> + }
> }
> raw_spin_unlock_irq(&base->lock);
> timer_base_unlock_expiry(base);
> @@ -2032,9 +2038,9 @@ static __latent_entropy void
> run_timer_softirq(struct softirq_action *h) {
> struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
>
> - __run_timers(base);
> + __run_timers(base, h);
> if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
> - __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
> + __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]), h);
> }
>
> /*
> > thanks
> > -john
On Fri, Mar 3, 2023 at 2:55 AM liujian (CE) <[email protected]> wrote:
> From: liujian (CE)
> > From: John Stultz [mailto:[email protected]]
> > > On Fri, Feb 10, 2023 at 1:51 AM liujian (CE) <[email protected]> wrote:
> > > > Can we limit the running time of __run_timers() [2]?
> > > >
> > > > Does anyone have a good idea to solve this problem?
> > >
> > > So your patch reminds me of Peter's softirq_needs_break() logic:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?
> > > h=co
> > > re/softirq
> > >
> > > Maybe it could extend that series for the timer softirq as well?
> > >
> > Thank you. Yes.
> > Base on the patchset and the extended patch for timer [1], the soft lockup
> > problem does not occur.
> >
> > By the way, I see this is a very old patchset? Will this patchset push the main
> > line? @John @Peter
> >
> Hi, peter,
> Do you have an upstream plan for this patchset? Or other ideas.
> I want to use softirq_needs_break() to limit the runtime of timer soft interrupt handler function, wonder if this is appropriate?
My understanding is that the series was proposed but maybe caused some
regressions for the networking softirqs it was initially targeting, so
it's been stalled out.
However, if you utilize the logic to help with the timer softirq
first, that might help land the logic, and then the networking folks
can try to adapt and slowly sort out the regressions.
So I'd suggest if you have the patches working for softirqs and
showing benefit, submit them for review.
I personally was looking at the series for the block softirq (which I
got working against a 5.10 kernel), but unfortunately that code has
since been reworked to use lockless lists for its work items, it's
hard to stop if the need_break signal is set, as once we've dequeued
list entries from the list, we have to run them, as we can't easily
re-add them back in the proper list order. :/ Something I need to
re-visit eventually.
thanks
-john