In-Reply-To: <1515620880.3350.44.camel@arista.com>
References: <20180109133623.10711-1-dima@arista.com>
    <20180109133623.10711-2-dima@arista.com>
    <1515620880.3350.44.camel@arista.com>
From: Linus Torvalds
Date: Wed, 10 Jan 2018 18:13:01 -0800
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
To: Dmitry Safonov
Cc: Eric Dumazet, LKML, Dmitry Safonov <0x7f454c46@gmail.com>,
    Andrew Morton, David Miller, Frederic Weisbecker,
    Hannes Frederic Sowa, Ingo Molnar,
    "Levin, Alexander (Sasha Levin)", Paolo Abeni, "Paul E. McKenney",
    Peter Zijlstra, Radu Rendec, Rik van Riel, Stanislaw Gruszka,
    Thomas Gleixner, Wanpeng Li
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 10, 2018 at 1:48 PM, Dmitry Safonov wrote:
> Hmm, what if we use some other logic for deferring/non-deferring
> like checking how many softirqs where serviced during process's
> timeslice and decide if proceed with __do_softirq() or defer it
> not to starve a task? Might that make sense?

Yes, but it might also be hard to come up with a good heuristic.

We actually *have* a fairly good heuristic right now: we end up
punting to ksoftirqd if we have too much work at one synchronous
event. We simply count, and refuse to do too much, and say "Ok, wake
up ksoftirqd".

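As a rough illustration of that counting heuristic, here is a toy
userspace sketch. It is not the kernel code: struct work_queue,
run_pending_with_budget() and the budget parameter are hypothetical
names made up for this example, and it counts individual work items,
whereas __do_softirq() bounds loop restarts and elapsed time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's pending softirq work. */
struct work_queue {
    size_t pending;        /* items still waiting to be handled */
    size_t handled_inline; /* items handled synchronously so far */
    bool   daemon_woken;   /* set once we punt to the ksoftirqd stand-in */
};

/*
 * Handle at most `budget` pending items synchronously. If work is
 * still left afterwards, stop and mark the daemon as woken instead
 * of continuing to loop. Returns the number of items handled here.
 */
static size_t run_pending_with_budget(struct work_queue *q, size_t budget)
{
    size_t done = 0;

    assert(q != NULL);

    while (q->pending > 0 && done < budget) {
        q->pending--;
        done++;
    }
    q->handled_inline += done;

    if (q->pending > 0)
        q->daemon_woken = true; /* "Ok, wake up ksoftirqd" */

    return done;
}
```

With 25 pending items and a budget of 10, one call handles 10 items
inline and leaves the remaining 15 to the daemon; with only 3 pending
items the daemon is never woken.
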
That has worked fairly well for a long time, and I think it's
fundamentally the right thing to do.

I think that the problem with the "once you punt to ksoftirqd, _keep_
punting to it" in commit 4cd13c21b207 ("softirq: Let ksoftirqd do its
job") was that it simply went much too far.

Doing it under heavy load once is fine. But then what happens is that
ksoftirqd keeps running (for the same reason that we woke it up in the
first place), and then eventually it gets scheduled away because it's
doing a lot of work.

And I think THAT is when the ksoftirqd scheduling latencies get bad.
Not on initial "push things to ksoftirqd". If ksoftirqd hasn't been
running, then the scheduler will be pretty eager to pick it. But if
ksoftirqd has been using CPU time, and gets preempted by other things
(and it's pretty eager to do so - see the whole "need_resched()" in
__do_softirq()), now we're really talking long latencies when there
are other runnable processes.

And dammit, softirq latencies are *MORE IMPORTANT* than some random
user process scheduling. But the ksoftirqd_running() code will just
see "ok, it's runnable, I'm not going to run anything synchronously",
and let those softirq things wait.

We're talking packet scheduling, but we're talking other things too.
So just saying "hey, ksoftirqd is runnable - but maybe not running
_now_" and ignoring softirqs entirely is just stupid. Even if we could
easily do another small bunch of them, at least the non-networking
ones.

So maybe that "ksoftirqd_running()" check should actually be something
like

    static bool ksoftirqd_running(void)
    {
            struct task_struct *tsk = __this_cpu_read(ksoftirqd);

            return tsk == current;
    }

which actually checks that ksoftirqd is running right *now*, and not
scheduled away because somebody is running a perl script.

                Linus
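
The difference between "runnable" and "running right now" can be
shown with a small userspace model. This is only a sketch under
stated assumptions: struct toy_task, struct toy_cpu and both helper
functions are hypothetical and merely mimic the two styles of check
discussed in the mail; none of them are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task: only the field this sketch needs. */
struct toy_task {
    bool runnable; /* on a run queue, whether or not it holds the CPU */
};

/* Hypothetical per-CPU view: the softirq daemon and the task on the CPU. */
struct toy_cpu {
    const struct toy_task *ksoftirqd;
    const struct toy_task *current;
};

/* "It's runnable" style check: true even if the daemon was preempted. */
static bool daemon_runnable(const struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    return cpu->ksoftirqd != NULL && cpu->ksoftirqd->runnable;
}

/* "Running right now" style check: true only if the daemon holds the CPU. */
static bool daemon_is_current(const struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    return cpu->ksoftirqd != NULL && cpu->ksoftirqd == cpu->current;
}
```

When the daemon is runnable but some other task currently holds the
CPU, daemon_runnable() still returns true, which is the case where
softirqs would be left waiting, while daemon_is_current() returns
false and would allow a small batch to be handled synchronously.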