Date:   Fri, 3 Mar 2023 17:25:35 -0800
From:   "Paul E. McKenney" <paulmck@kernel.org>
To:     Jakub Kicinski <kuba@kernel.org>
Cc:     Thomas Gleixner <tglx@linutronix.de>, peterz@infradead.org,
        jstultz@google.com, edumazet@google.com, netdev@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()
Message-ID: <20230304012535.GF1301832@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20221222221244.1290833-1-kuba@kernel.org>
 <20221222221244.1290833-3-kuba@kernel.org>
 <87r0u6j721.ffs@tglx>
 <20230303133143.7b35433f@kernel.org>
 <20230303223739.GC1301832@paulmck-ThinkPad-P17-Gen-1>
 <20230303233627.GA2136520@paulmck-ThinkPad-P17-Gen-1>
 <20230303154413.1d846ac3@kernel.org>
In-Reply-To: <20230303154413.1d846ac3@kernel.org>

On Fri, Mar 03, 2023 at 03:44:13PM -0800, Jakub Kicinski wrote:
> On Fri, 3 Mar 2023 15:36:27 -0800 Paul E. McKenney wrote:
> > On Fri, Mar 03, 2023 at 02:37:39PM -0800, Paul E. McKenney wrote:
> > > On Fri, Mar 03, 2023 at 01:31:43PM -0800, Jakub Kicinski wrote:  
> > > > Now, about the max loop count. I ORed the pending softirqs every
> > > > time we get to the end of the loop. Looks like the vast majority of the
> > > > loop-counter wakeups are exclusively due to RCU:
> > > > 
> > > > @looped[512]: 5516
> > > > 
> > > > Where 512 is the ORed pending mask over all iterations
> > > > 512 == 1 << RCU_SOFTIRQ.
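For anyone decoding these masks: bit N of the pending mask is softirq vector N. With the vector order in include/linux/interrupt.h, RCU_SOFTIRQ is vector 9, so 512 == 1 << 9 has only the RCU bit set. A small user-space sketch of the decoding; the enum and name table are copied here for illustration only and should be checked against the headers of the kernel being traced:

```c
#include <assert.h>
#include <stdio.h>

/* Softirq vector numbers, in include/linux/interrupt.h order (copied for
 * illustration; verify against the traced kernel's headers). */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,
	NR_SOFTIRQS
};

static const char *const softirq_names[NR_SOFTIRQS] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

/* Print the name of every softirq whose bit is set in @mask and return
 * how many bits were set. */
static int decode_softirq_mask(unsigned int mask)
{
	int n = 0;

	assert(mask < (1u << NR_SOFTIRQS));	/* no bits past RCU_SOFTIRQ */
	for (int nr = 0; nr < NR_SOFTIRQS; nr++) {
		if (mask & (1u << nr)) {
			printf("%s ", softirq_names[nr]);
			n++;
		}
	}
	printf("\n");
	return n;
}
```

Calling decode_softirq_mask(512) prints only "RCU", matching the @looped[512] bucket above.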
> > > > 
> > > > And they usually take less than 100us to consume the 10 iterations.
> > > > Histogram of usecs consumed when we run out of loop iterations:
> > > > 
> > > > [16, 32)               3 |                                                    |
> > > > [32, 64)            4786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > [64, 128)            871 |@@@@@@@@@                                           |
> > > > [128, 256)            34 |                                                    |
> > > > [256, 512)             9 |                                                    |
> > > > [512, 1K)            262 |@@                                                  |
> > > > [1K, 2K)              35 |                                                    |
> > > > [2K, 4K)               1 |                                                    |
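The "loop counter" limit here is the restart check at the end of __do_softirq() in kernel/softirq.c, which stops re-running pending softirqs and wakes ksoftirqd after MAX_SOFTIRQ_RESTART (10) passes, once MAX_SOFTIRQ_TIME (nominally 2 ms, measured in jiffies) has passed, or when need_resched() is set. The following is a hypothetical user-space model of that decision, not kernel code: the function name, the exit enum, and the per-pass inputs are made up for illustration, and time is modeled in microseconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits from kernel/softirq.c; the time budget is modeled in usecs. */
#define MAX_SOFTIRQ_RESTART	10
#define MAX_SOFTIRQ_TIME_US	2000

enum softirq_exit {
	EXIT_DRAINED,	/* nothing left pending */
	EXIT_TIME,	/* time budget used up: wake ksoftirqd */
	EXIT_RESCHED,	/* need_resched() set: wake ksoftirqd */
	EXIT_RESTARTS,	/* all restarts used up: wake ksoftirqd */
};

/*
 * Hypothetical model of the restart decision. For each pass i:
 *   pass_us[i]       - how long the pass took
 *   still_pending[i] - whether softirqs were raised again during it
 *   resched[i]       - whether need_resched() was true afterwards
 * Each array must hold MAX_SOFTIRQ_RESTART entries.
 */
static enum softirq_exit model_do_softirq(const unsigned int pass_us[],
					  const bool still_pending[],
					  const bool resched[])
{
	unsigned int elapsed_us = 0;
	int max_restart = MAX_SOFTIRQ_RESTART;

	for (int pass = 0; ; pass++) {
		assert(pass < MAX_SOFTIRQ_RESTART);
		elapsed_us += pass_us[pass];
		if (!still_pending[pass])
			return EXIT_DRAINED;
		/* Same order as the kernel's short-circuited condition:
		 * time, then need_resched(), then --max_restart. */
		if (elapsed_us >= MAX_SOFTIRQ_TIME_US)
			return EXIT_TIME;
		if (resched[pass])
			return EXIT_RESCHED;
		if (--max_restart == 0)
			return EXIT_RESTARTS;
	}
}
```

With ten short passes that keep re-raising RCU_SOFTIRQ, as in the histogram above, the model exits via EXIT_RESTARTS well before the time budget matters.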
> > > > 
> > > > Paul, is this expected? Is RCU not trying too hard to be nice?  
> > > 
> > > This is from way back in the day, so it is quite possible that better
> > > tuning and/or better heuristics should be applied.
> > > 
> > > On the other hand, 100 microseconds is a good long time from a
> > > CONFIG_PREEMPT_RT=y perspective!
> > >   
> > > > # cat /sys/module/rcutree/parameters/blimit
> > > > 10
> > > > 
> > > > Or should we perhaps just raise the loop limit? Breaking after less 
> > > > than 100usec seems excessive :(  
> > > 
> > > But note that RCU also has rcutree.rcu_divisor, which defaults to 7,
> > > and rcutree.rcu_resched_ns, which defaults to three milliseconds
> > > (3,000,000 nanoseconds).  This means that RCU will do:
> > > 
> > > o	All the callbacks if there are fewer than ten.
> > > 
> > > o	Ten callbacks (blimit) or 1/128th of them (2^7 == 128, per
> > > 	rcu_divisor), whichever is larger.
> > > 
> > > o	Unless the larger of them is more than 100 callbacks, in which
> > > 	case there is an additional limit of three milliseconds' worth
> > > 	of callback invocation.
> > > 
> > > Except that if a given CPU ends up with more than 10,000 callbacks
> > > (rcutree.qhimark), that CPU's blimit is set to 10,000.  
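The limits listed above can be sketched as a small C function. This is a simplified model under stated assumptions: the macro, struct, and function names are invented for this sketch, only the default values come from the text above, and the qhimark rule is reduced to a plain "more than 10,000 callbacks" check, which simplifies how and where the kernel actually applies it.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults quoted above (rcutree.* parameters); names are made up. */
#define MODEL_BLIMIT		10
#define MODEL_RCU_DIVISOR	7		/* 1/2^7 == 1/128 */
#define MODEL_RCU_RESCHED_NS	3000000LL	/* 3 ms */
#define MODEL_QHIMARK		10000
#define MODEL_BLIMIT_MAX	10000		/* blimit once past qhimark */

struct rcu_batch_limits {
	long count_limit;	/* callbacks to invoke before a stop is allowed */
	bool time_limited;	/* whether the rcu_resched_ns cap applies */
	long long time_limit_ns;
};

/* Simplified model of the batch limits for a CPU with @pending callbacks. */
static struct rcu_batch_limits rcu_batch_limits(long pending)
{
	struct rcu_batch_limits lim = { 0 };
	long blimit = MODEL_BLIMIT;
	long share;

	assert(pending >= 0);
	if (pending > MODEL_QHIMARK)
		blimit = MODEL_BLIMIT_MAX;	/* overloaded CPU */
	share = pending >> MODEL_RCU_DIVISOR;	/* 1/128th of the callbacks */
	lim.count_limit = share > blimit ? share : blimit;
	if (lim.count_limit > 100) {
		/* Large batch: also cap it at rcu_resched_ns. */
		lim.time_limited = true;
		lim.time_limit_ns = MODEL_RCU_RESCHED_NS;
	}
	return lim;
}
```

With these defaults, 1/128th of the callbacks exceeds 100 only above 12,800 callbacks, which is already past qhimark, so in this model the three-millisecond cap comes into play only after blimit has been raised to 10,000.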
> > 
> > Also, if we are in the context of a softirq handler (as opposed to
> > ksoftirqd) that interrupted the idle task with no other task pending,
> > the count of callbacks is ignored and only the 3-millisecond limit
> > counts.  In the context of ksoftirqd, the only limit is that which the
> > scheduler chooses to impose.
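The context-dependent stopping rule just described, written as a hypothetical model that follows this description rather than the actual rcu_do_batch() code. The enum, function, and parameter names are illustrative only, and the count and time limits are assumed to come from something like the batch-limit computation above.

```c
#include <assert.h>
#include <stdbool.h>

enum rcu_cb_context {
	CTX_IRQ_EXIT_SOFTIRQ,	/* softirq run on irq exit, not in ksoftirqd */
	CTX_KSOFTIRQD,		/* softirq run from the ksoftirqd kthread */
};

/*
 * Hypothetical model of when the callback loop stops, per the text above.
 *   count, count_limit:    callbacks invoked so far and the batch limit
 *   idle_nothing_pending:  softirq interrupted the idle task and no other
 *                          task is waiting for the CPU
 *   elapsed_ns, time_limit_ns: time used so far and the cap (0 = no cap)
 */
static bool rcu_cb_loop_should_stop(enum rcu_cb_context ctx, long count,
				    long count_limit,
				    bool idle_nothing_pending,
				    long long elapsed_ns,
				    long long time_limit_ns)
{
	assert(count >= 0 && count_limit > 0);
	if (ctx == CTX_KSOFTIRQD)
		return false;	/* only the scheduler limits ksoftirqd */
	if (time_limit_ns && elapsed_ns >= time_limit_ns)
		return true;	/* time cap reached */
	if (idle_nothing_pending)
		return false;	/* count ignored; only the time cap counts */
	return count >= count_limit;
}
```

The ksoftirqd branch returning false unconditionally is exactly the behavior questioned in the next paragraph.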
> > 
> > But it sure seems like the ksoftirqd case should also pay attention to
> > that 3-millisecond limit.  I will queue a patch to that effect, and maybe
> > Eric Dumazet will show me the error of my ways.
> 
> Just to be sure - have you seen Peter's patches?
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/softirq
> 
> I think it feeds the time limit to the callback from softirq,
> so the local 3ms is no more?

I might or might not have, back in September of 2020.  ;-)

But either way, the question remains:  Should RCU_SOFTIRQ do time checking
in ksoftirqd context?  Seems like the answer should be "yes", independently
of Peter's patches.
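One way to read a "yes" answer is to have the callback loop honor an rcu_resched_ns-style time limit whether it runs on irq exit or in ksoftirqd. A minimal user-space sketch, assuming a hypothetical array of callbacks; all names are made up, and reading the clock only every 32 callbacks is a design choice of this sketch to keep clock reads cheap.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for an RCU callback. */
typedef void (*cb_fn)(void *arg);

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Example callback for trying this out: bumps a counter. */
static void count_cb(void *arg)
{
	++*(long *)arg;
}

/*
 * Invoke callbacks until all @n are done or @time_limit_ns has elapsed,
 * with no dependence on the calling context. The clock is read only every
 * 32 callbacks. Returns the number of callbacks invoked; the caller is
 * responsible for the rest.
 */
static long invoke_cbs_time_limited(cb_fn fns[], void *args[], long n,
				    long long time_limit_ns)
{
	long long deadline = now_ns() + time_limit_ns;
	long count = 0;

	assert(n >= 0 && time_limit_ns > 0);
	while (count < n) {
		fns[count](args[count]);
		count++;
		if ((count & 31) == 0 && now_ns() >= deadline)
			break;	/* out of time: leave the rest for later */
	}
	return count;
}
```

Because the function takes no context argument, the same time check applies in both contexts; a real change would also have to arrange for the leftover callbacks to be invoked later, which this sketch leaves to the caller.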

							Thanx, Paul