Date: Fri, 12 Jan 2018 19:54:29 +0100
From: Frederic Weisbecker
To: Linus Torvalds
Cc: Eric Dumazet, LKML, Levin Alexander, Peter Zijlstra, Hannes Frederic Sowa, "Paul E. McKenney", Wanpeng Li, Dmitry Safonov, Thomas Gleixner, Radu Rendec, Ingo Molnar, Stanislaw Gruszka, Paolo Abeni, Rik van Riel, Andrew Morton, David Miller
Subject: Re: [RFC PATCH 1/2] softirq: Account time and iteration stats per vector
Message-ID: <20180112185428.GE1950@lerouge>
References: <1515735354-19279-1-git-send-email-frederic@kernel.org> <1515735354-19279-2-git-send-email-frederic@kernel.org> <20180112143448.GA1950@lerouge>

On Fri, Jan 12, 2018 at 10:12:32AM -0800, Linus Torvalds wrote:
> On Fri, Jan 12, 2018 at 6:34 AM, Frederic Weisbecker
> wrote:
> >
> > That's right. But I thought it was a bit large for the stack:
> >
> > struct {
> >         u64 time;
> >         u64 count;
> > } [NR_SOFTIRQS]
>
> Note that you definitely don't want "u64" here.
>
> Both of these values had better be very limited. The "count" is on the
> order of 10 - it fits in 4 _bits_ without any overflow.
>
> And 'time' is on the order of 2ms, so even if it's in nanoseconds, we
> already know that we want to limit it to a single ms or so (yes, yes,
> right now our limit is 2ms, but I think that's long). So even that
> doesn't need 64-bit.

Ok.

>
> Finally, I think you can join them. If we do a "time or count" limit,
> let's just make the "count" act as some arbitrary fixed time, so that
> we limit things that way.
>
> Say, if we want to limit it to 2ms, consider one count to be 0.2ms. So
> instead of keeping track of count at all, just say "make each softirq
> call count as at least 200,000ns even if the scheduler clock says it's
> less". End result: we'd loop at most ten times.
>
> So now you only need one value, and you know it can't be bigger than 2
> million, so it can be a 32-bit one. Boom. Done.

Right.

Now, I believe the time was added as a limit because the count alone was
not reliable enough to diagnose a softirq overrun. But if everyone is fine
with keeping the count as the single metric, I would be much happier,
because that means less overhead, no need to fetch the clock, etc.

>
> Also, don't you want these to be percpu, and keep accumulating them
> until you decide to either age them away (just clear it in timer
> interrupt?) or if the value gets so big that you want to fall back to
> the thread instead (and then the thread can clear it every iteration,
> so you don't need to track whether the thread is active or not).
>
> I don't know. I'm traveling today, so I didn't actually have time to
> really look at the patches, I'm just reacting to Eric's reaction.

Clearing the accumulation on tick and on flush sounds like a good plan.
Well, I'm probably not going to use the tick for that because of nohz
(again), but I can check whether jiffies changed since we started the
accumulation and reset it if so.

I'm going to respin, thanks!