From: Eric Dumazet
Date: Thu, 11 Jan 2018 12:16:42 -0800
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
To: Linus Torvalds
Cc: Peter Zijlstra, Dmitry Safonov, Frederic Weisbecker, LKML,
 Dmitry Safonov <0x7f454c46@gmail.com>, Andrew Morton, David Miller,
 Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar,
 "Levin, Alexander (Sasha Levin)", Paolo Abeni, "Paul E. McKenney",
 Radu Rendec, Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

On Thu, Jan 11, 2018 at 12:03 PM, Linus Torvalds wrote:
> On Thu, Jan 11, 2018 at 11:48 AM, Eric Dumazet wrote:
>> That was the purpose of the last patch: as soon as ksoftirqd is scheduled
>> (by some kind of jitter in the 99,000 pps workload, or an antagonist
>> wakeup), we then switch to a mode where the process scheduler can make
>> decisions based on thread prios and cpu usage.
>
> Yeah, but that really screws up everybody else.
>
> It really is a soft *interrupt*. That was what it was designed for.
> The thread handling is not primary, it's literally a fallback to avoid
> complete starvation.
>
> The fact that networking has now - for several years - tried to make
> it some kind of thread and get fairness with user threads is all
> entirely antithetical to what softirq was designed for.
>
>> Then, as soon as the load was able to finish the pending irqs within its
>> quantum, we re-enter the mode where softirqs are immediately serviced.
>
> Except that's not at all how the code works.
>
> As I pointed out, the softirq thread can be scheduled away, but
> "softirq_running()" will still return true - and the networking code
> has now screwed up all the *other* softirqs too!
>
> I really suspect that what networking wants is more like the
> workqueues. Or at least more isolation between different softirq
> users, but that's fairly fundamentally hard, given how softirqs are
> designed.

Note that when I implemented TCP Small Queues, I ran experiments using
either a workqueue or a tasklet, and workqueues added unacceptable P99
latencies when many user threads are competing with kernel threads.

I suspect that firing a workqueue for networking RX will likely have the
same effect :/

Note that the current __do_softirq() implementation suffers from the
following:

Say we receive a NET_RX softirq -> While processing the packet, we wake up
a thread (thus need_resched() becomes true), but also raise a tasklet
(because a particular driver needs some extra processing in tasklet context
instead of NET_RX ???)
-> Then we exit __do_softirq() _and_ schedule ksoftirqd (because the
tasklet needs to be serviced).
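
To make that sequence concrete, here is a simplified sketch of the bail-out
path in __do_softirq() as it looked around that time (paraphrased from
kernel/softirq.c; accounting, tracing, lockdep and preemption details are
omitted, so this is an illustration of the logic, not the exact upstream
code). The point is the exit condition: once need_resched() becomes true,
any softirq re-raised during the pass - including the tasklet from the
scenario above - is not run inline but handed off to ksoftirqd via
wakeup_softirqd():

asmlinkage __visible void __softirq_entry __do_softirq(void)
{
	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;	/* time budget */
	int max_restart = MAX_SOFTIRQ_RESTART;		/* restart budget */
	struct softirq_action *h;
	__u32 pending;
	int softirq_bit;

	pending = local_softirq_pending();

restart:
	/* Clear the pending mask before re-enabling interrupts. */
	set_softirq_pending(0);
	local_irq_enable();

	/* Run each pending handler once: NET_RX, TASKLET, TIMER, ... */
	h = softirq_vec;
	while ((softirq_bit = ffs(pending))) {
		h += softirq_bit - 1;
		h->action(h);	/* e.g. net_rx_action(): may wake a thread
				 * and may raise TASKLET_SOFTIRQ */
		h++;
		pending >>= softirq_bit;
	}

	local_irq_disable();

	pending = local_softirq_pending();
	if (pending) {
		/*
		 * The window described above: the NET_RX handler woke a
		 * thread (need_resched() is now true) and a driver raised a
		 * tasklet, so 'pending' is non-zero.  We do not loop again;
		 * everything left over is deferred to ksoftirqd.
		 */
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		wakeup_softirqd();
	}
}

So the freshly raised tasklet (and any other pending softirq) now waits for
ksoftirqd to be scheduled, which is exactly the extra latency being
discussed here.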