From: Eric Dumazet
Date: Thu, 11 Jan 2018 12:16:42 -0800
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
To: Linus Torvalds
Cc: Peter Zijlstra, Dmitry Safonov, Frederic Weisbecker, LKML,
 Dmitry Safonov <0x7f454c46@gmail.com>, Andrew Morton, David Miller,
 Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar,
 "Levin, Alexander (Sasha Levin)", Paolo Abeni, "Paul E. McKenney",
 Radu Rendec, Rik van Riel, Stanislaw Gruszka, Thomas Gleixner, Wanpeng Li

On Thu, Jan 11, 2018 at 12:03 PM, Linus Torvalds wrote:
> On Thu, Jan 11, 2018 at 11:48 AM, Eric Dumazet wrote:
>> That was the purpose of the last patch: as soon as ksoftirqd is scheduled
>> (by some kind of jitter in the 99,000 pps workload, or an antagonist
>> wakeup), we then switch to a mode where the process scheduler can make
>> decisions based on thread prios and cpu usage.
>
> Yeah, but that really screws up everybody else.
>
> It really is a soft *interrupt*. That was what it was designed for.
> The thread handling is not primary, it's literally a fallback to avoid
> complete starvation.
>
> The fact that networking has now - for several years - tried to make
> it some kind of thread and get fairness with user threads is all
> entirely antithetical to what softirq was designed for.
>
>> Then, as soon as the load was able to finish the pending irqs within its
>> quantum, we re-enter the mode where softirqs are immediately serviced.
>
> Except that's not at all how the code works.
>
> As I pointed out, the softirq thread can be scheduled away, but
> "softirq_running()" will still return true - and the networking code
> has now screwed up all the *other* softirqs too!
>
> I really suspect that what networking wants is more like the
> workqueues. Or at least more isolation between different softirq
> users, but that's fairly fundamentally hard, given how softirqs are
> designed.

Note that when I implemented TCP Small Queues, I ran experiments using
either a workqueue or a tasklet, and workqueues added unacceptable P99
latencies when many user threads are competing with kernel threads.

I suspect that firing a workqueue for networking RX will likely have the
same effect :/

Note that the current __do_softirq() implementation suffers from the
following:

Say we receive a NET_RX softirq -> While processing the packet, we wake up
a thread (thus need_resched() becomes true), but also raise a tasklet
(because a particular driver needs some extra processing in tasklet context
instead of NET_RX ???)
-> Then we exit __do_softirq() _and_ schedule ksoftirqd (because the
tasklet needs to be serviced).
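
To make that sequence concrete, here is a simplified sketch of the bail-out
path in __do_softirq() as it looked around that time (paraphrased from
kernel/softirq.c; accounting, tracing, lockdep and preemption details are
omitted, so this is an illustration of the logic, not the exact upstream
code). The point is the exit condition: once need_resched() becomes true,
any softirq re-raised during the pass - including the tasklet from the
scenario above - is not run inline but handed off to ksoftirqd via
wakeup_softirqd():

asmlinkage __visible void __softirq_entry __do_softirq(void)
{
	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;	/* time budget */
	int max_restart = MAX_SOFTIRQ_RESTART;		/* restart budget */
	struct softirq_action *h;
	__u32 pending;
	int softirq_bit;

	pending = local_softirq_pending();

restart:
	/* Clear the pending mask before re-enabling interrupts. */
	set_softirq_pending(0);
	local_irq_enable();

	/* Run each pending handler once: NET_RX, TASKLET, TIMER, ... */
	h = softirq_vec;
	while ((softirq_bit = ffs(pending))) {
		h += softirq_bit - 1;
		h->action(h);	/* e.g. net_rx_action(): may wake a thread
				 * and may raise TASKLET_SOFTIRQ */
		h++;
		pending >>= softirq_bit;
	}

	local_irq_disable();

	pending = local_softirq_pending();
	if (pending) {
		/*
		 * The window described above: the NET_RX handler woke a
		 * thread (need_resched() is now true) and a driver raised a
		 * tasklet, so 'pending' is non-zero.  We do not loop again;
		 * everything left over is deferred to ksoftirqd.
		 */
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		wakeup_softirqd();
	}
}

So the freshly raised tasklet (and any other pending softirq) now waits for
ksoftirqd to be scheduled, which is exactly the extra latency being
discussed here.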