MIME-Version: 1.0
In-Reply-To: <1515620880.3350.44.camel@arista.com>
References: <20180109133623.10711-1-dima@arista.com> <20180109133623.10711-2-dima@arista.com>
 <CANn89iK3M97MN0Pf3nXb+UAqqhUWOdSthHRBTYCwP75Ax_hO8Q@mail.gmail.com> <1515620880.3350.44.camel@arista.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 10 Jan 2018 18:13:01 -0800
Message-ID: <CA+55aFyKKt4_5RT9RT8ZH-W26hC8=AvRYf8YxBm98dGSWwFs8g@mail.gmail.com>
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
To: Dmitry Safonov <dima@arista.com>
Cc: Eric Dumazet <edumazet@google.com>,
        LKML <linux-kernel@vger.kernel.org>,
        Dmitry Safonov <0x7f454c46@gmail.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        David Miller <davem@davemloft.net>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Hannes Frederic Sowa <hannes@stressinduktion.org>,
        Ingo Molnar <mingo@kernel.org>,
        "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>,
        Paolo Abeni <pabeni@redhat.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Radu Rendec <rrendec@arista.com>,
        Rik van Riel <riel@redhat.com>,
        Stanislaw Gruszka <sgruszka@redhat.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Wanpeng Li <wanpeng.li@hotmail.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org

On Wed, Jan 10, 2018 at 1:48 PM, Dmitry Safonov <dima@arista.com> wrote:
> Hmm, what if we use some other logic for deferring/non-deferring
> like checking how many softirqs where serviced during process's
> timeslice and decide if proceed with __do_softirq() or defer it
> not to starve a task? Might that make sense?

Yes, but it might also be hard to come up with a good heuristic.

We actually *have* a fairly good heuristic right now: we end up
punting to softirqd if we have too much work at one synchronous event.
We simply count, and refuse to do too much, and say "Ok, wake up
ksoftirqd".

That has worked fairly well for a long time, and I think it's
fundamentally the right thing to do.

I think that the problem with the "once you punt to ksoftirqd, _keep_
punting to it" in commit 4cd13c21b207 ("softirq: Let ksoftirqd do its
job") was that it simply went much too far.

Doing it under heavy load once is fine. But then what happens is that
ksoftirqd keeps running (for the same reason that we woke it up in the
first place), and then eventually it gets scheduled away because it's
doing a lot of work.

And I think THAT is when the ksoftirqd scheduling latencies get bad.
Not on initial "push things to ksoftirqd". If ksoftirqd hasn't been
running, then the scheduler will be pretty eager to pick it.

But if ksoftirqd has been using CPU time, and gets preempted by other
things (and it's pretty eager to do so - see the whole
"need_resched()" in __do_softirq()), now we're really talking long
latencies when there are other runnable processes.

And dammit, softirq latencies are *MORE IMPORTANT* than some random
user process scheduling. But the ksoftirqd_running() code will just
see "ok, it's runnable, I'm not going to run anything synchronously",
and let those softirq things wait. We're talking packet scheduling,
but we're talking other things too.

So just saying "hey, ksoftirq is runnable - but maybe not running
_now"" and ignoring softirqs entirely is just stupid. Even if we could
easily do another small bunch of them, at least the non-networking
ones.

So maybe that "ksoftirqd_running()" check should actually be something like

  static bool ksoftirqd_running(void)
  {
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);

        return tsk == current;
  }

which actually checks that ksoftirq is running right *now*, and not
scheduled away because somebody is running a perl script.

                Linus