Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935428AbeAKSsv (ORCPT + 1 other); Thu, 11 Jan 2018 13:48:51 -0500 Received: from mail-it0-f68.google.com ([209.85.214.68]:44128 "EHLO mail-it0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932703AbeAKSst (ORCPT ); Thu, 11 Jan 2018 13:48:49 -0500 X-Google-Smtp-Source: ACJfBotKAjNATpyjd4RYObch9me5iXsDNgOEStENrZb3ptmpKP9Cgqsavc60kGxbNr54jEAVwgph2h8chthCpePDUns= MIME-Version: 1.0 In-Reply-To: <20180111163204.GE6176@hirez.programming.kicks-ass.net> References: <20180109133623.10711-1-dima@arista.com> <20180109133623.10711-2-dima@arista.com> <1515620880.3350.44.camel@arista.com> <20180111032232.GA11633@lerouge> <20180111044456.GC11633@lerouge> <1515681091.3039.21.camel@arista.com> <20180111163204.GE6176@hirez.programming.kicks-ass.net> From: Linus Torvalds Date: Thu, 11 Jan 2018 10:48:48 -0800 X-Google-Sender-Auth: vVQllUOUmVlqyC9l_JkwR_8EisY Message-ID: Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context To: Peter Zijlstra Cc: Eric Dumazet , Dmitry Safonov , Frederic Weisbecker , LKML , Dmitry Safonov <0x7f454c46@gmail.com>, Andrew Morton , David Miller , Frederic Weisbecker , Hannes Frederic Sowa , Ingo Molnar , "Levin, Alexander (Sasha Levin)" , Paolo Abeni , "Paul E. McKenney" , Radu Rendec , Rik van Riel , Stanislaw Gruszka , Thomas Gleixner , Wanpeng Li Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Thu, Jan 11, 2018 at 8:32 AM, Peter Zijlstra wrote: > On Thu, Jan 11, 2018 at 08:20:18AM -0800, Eric Dumazet wrote: >> diff --git a/kernel/softirq.c b/kernel/softirq.c >> index 2f5e87f1bae22f3df44fa4493fcc8b255882267f..d2f20daf77d14dc8ebde00d7c4a0237152d082ba >> 100644 >> --- a/kernel/softirq.c >> +++ b/kernel/softirq.c >> @@ -192,7 +192,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip); >> >> /* >> * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, >> - * but break the loop if need_resched() is set or after 2 ms. >> + * but break the loop after 2 ms. >> * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in >> * certain cases, such as stop_machine(), jiffies may cease to >> * increment and so we need the MAX_SOFTIRQ_RESTART limit as >> @@ -299,8 +299,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) >> >> pending = local_softirq_pending(); >> if (pending) { >> - if (time_before(jiffies, end) && !need_resched() && >> - --max_restart) >> + if (time_before(jiffies, end) && --max_restart) >> goto restart; >> >> wakeup_softirqd(); > > You just introduced a 2ms preempt-disable region I think, that's not > cool for PREEMPT and a plain bug on PREEMPT_RT. I actually think that he whole "time_before()" is garbage and should be removed. If doing 10 rounds of softirq processing takes several jiffies, you have bigger issue. Looking at the history, the original time_before() was _replacing_ the "max_retry". See commit c10d73671ad3 ("softirq: reduce latencies"). But then "max_restart" was put back in because jiffies just isn't reliable in the first place. And I think *this* is the real issue. Networking does a *shitton* of things in softirq context, and the problem is not the softirq code, it's the networking code. The softirq code doesn't understand that the networking code does billions of operations, and one softirq can take ages and ages because it is handling millions of packets. My feeling is that it's the networking code that should notice "I have a billion packets in my softirq queue, and I just keep getting more, now *I* should start doing things in a thread instead". Because none of those limits really make sense for any of the other softirq users. Everybody else just wants a low-latency "soft interrupt", not some stupid thread. Who knew? Maybe the name should have given the thing away? If you want a thread, use a tasklet and add_task_work(). Maybe networking should reconsider its use of softirqs entirely? Linus