From: Eric Dumazet
Date: Thu, 11 Jan 2018 08:20:18 -0800
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
To: Dmitry Safonov
Cc: Frederic Weisbecker, Linus Torvalds, LKML,
    Dmitry Safonov <0x7f454c46@gmail.com>, Andrew Morton, David Miller,
    Frederic Weisbecker, Hannes Frederic Sowa, Ingo Molnar,
    "Levin, Alexander (Sasha Levin)", Paolo Abeni, "Paul E. McKenney",
    Peter Zijlstra, Radu Rendec, Rik van Riel, Stanislaw Gruszka,
    Thomas Gleixner, Wanpeng Li
In-Reply-To: <1515681091.3039.21.camel@arista.com>
References: <20180109133623.10711-1-dima@arista.com>
    <20180109133623.10711-2-dima@arista.com>
    <1515620880.3350.44.camel@arista.com>
    <20180111032232.GA11633@lerouge>
    <20180111044456.GC11633@lerouge>
    <1515681091.3039.21.camel@arista.com>

On Thu, Jan 11, 2018 at 6:31 AM, Dmitry Safonov wrote:
> On Thu, 2018-01-11 at 05:44 +0100, Frederic Weisbecker wrote:
>> On Wed, Jan 10, 2018 at 08:19:49PM -0800, Linus Torvalds wrote:
>> > On Wed, Jan 10, 2018 at 7:22 PM, Frederic Weisbecker wrote:
>> > >
>> > > Makes sense, but I think you need to keep the TASK_RUNNING check.
>> >
>> > Yes, good point.
>> >
>> > > So perhaps it should be:
>> > >
>> > > -	return tsk && (tsk->state == TASK_RUNNING);
>> > > +	return (tsk == current) && (tsk->state == TASK_RUNNING);
>> >
>> > Looks good to me - definitely worth trying.
>> >
>> > Maybe that weakens the thing so much that it doesn't actually help
>> > the UDP packet storm case?
>> >
>> > And maybe it's not sufficient for the dvb issue.
>> >
>> > But I think it's worth at least testing. Maybe it makes neither side
>> > entirely happy, but maybe it might be a good halfway point?
>>
>> Yes I believe Dmitry is facing a different problem where he would rather
>> see ksoftirqd scheduled more often to handle the queue as a deferred batch
>> instead of having it served one by one on the tails of IRQ storms.
>> (Dmitry correct me if I misunderstood).
>
> Quite so, what I see is that ksoftirqd is rarely (close to never)
> scheduled in case of UDP packet storm. That's because the up coming irq
> is too late in __do_softirq().
> So, there is no wakeup on UDP storm here:
> :	pending = local_softirq_pending();
> :	if (pending & mask) {
> :		if (time_before(jiffies, end) && !need_resched() &&
> :		    --max_restart)
> :			goto restart;
> :
> :		wakeup_softirqd();
> :	}
> (as there is yet no pending softirq). It comes a bit late to schedule
> ksoftirqd and in result the next softirq is processed on the context of
> the task again, not in the scheduled ksoftirqd.
> That results in cpu-time starvation for the process on irq storm.
>
> While I saw that on out-of-tree driver, I believe that on some
> frequencies (lower than storm) one can observe the same on mainstream
> drivers. And I *think* that I've reproduced that on mainstream with
> virtio driver and package size of 1500 in VMs (thou I don't quite like
> the perf testing in VMs).
>
> So, ITOW, maybe there is a bit better way to *detect* that cpu time
> spent on serving softirqs is close to storm and that userspace starts
> starving?
> (and launch ksoftirqd in the result or balance between
> deferring and serving softirq right-there).
>
>> But your patch still seems to make sense for the case you described:
>> when ksoftirqd is voluntarily preempted off and the current IRQ could
>> handle the queue.

Note that ksoftirqd being kicked (TASK_RUNNING) is the sign of softirq
pressure.

Or maybe we lack one bit to signal that __do_softirq() had to call
wakeup_softirqd() because of pressure.

(If I remember well, I added such state when submitting my first patch,
https://www.spinics.net/lists/netdev/msg377172.html
then Peter suggested to use tsk->state == TASK_RUNNING
https://www.spinics.net/lists/netdev/msg377210.html)

Maybe the problem is not the new patch, but the use of need_resched() in
__do_softirq() that I added in 2013
(commit c10d73671ad30f54692f7f69f0e09e75d3a8926a), combined with the new
patch.

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2f5e87f1bae22f3df44fa4493fcc8b255882267f..d2f20daf77d14dc8ebde00d7c4a0237152d082ba 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -192,7 +192,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
 
 /*
  * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
+ * but break the loop after 2 ms.
  * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
  * certain cases, such as stop_machine(), jiffies may cease to
  * increment and so we need the MAX_SOFTIRQ_RESTART limit as
@@ -299,8 +299,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
-		    --max_restart)
+		if (time_before(jiffies, end) && --max_restart)
 			goto restart;
 
 		wakeup_softirqd();
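
For readers following the diff, the effect of dropping need_resched() from
the restart condition can be modelled in user space. The sketch below is
not kernel code: struct softirq_pass, enum softirq_action, decide_current()
and decide_proposed() are hypothetical names made up for illustration, and
the booleans stand in for local_softirq_pending(), time_before(jiffies, end)
and need_resched(). Only the value of MAX_SOFTIRQ_RESTART (10) is taken from
kernel/softirq.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SOFTIRQ_RESTART 10	/* same value as kernel/softirq.c */

/* Hypothetical snapshot of what __do_softirq() looks at after one pass. */
struct softirq_pass {
	bool pending;		/* stands in for local_softirq_pending() != 0 */
	bool before_deadline;	/* stands in for time_before(jiffies, end) */
	bool need_resched;	/* stands in for need_resched() */
};

/* Hypothetical outcome of the check at the end of a pass. */
enum softirq_action {
	SOFTIRQ_DONE,		/* nothing pending, return to the task */
	SOFTIRQ_RESTART,	/* goto restart */
	SOFTIRQ_WAKE_KSOFTIRQD,	/* wakeup_softirqd() */
};

/* Model of the condition before the diff: need_resched() breaks the loop. */
enum softirq_action decide_current(const struct softirq_pass *p,
				   int *max_restart)
{
	assert(p != NULL && max_restart != NULL);

	if (!p->pending)
		return SOFTIRQ_DONE;
	if (p->before_deadline && !p->need_resched && --(*max_restart))
		return SOFTIRQ_RESTART;
	return SOFTIRQ_WAKE_KSOFTIRQD;
}

/* Model of the condition after the diff: only time and restart budget. */
enum softirq_action decide_proposed(const struct softirq_pass *p,
				    int *max_restart)
{
	assert(p != NULL && max_restart != NULL);

	if (!p->pending)
		return SOFTIRQ_DONE;
	if (p->before_deadline && --(*max_restart))
		return SOFTIRQ_RESTART;
	return SOFTIRQ_WAKE_KSOFTIRQD;
}
```

With need_resched set and work still pending, decide_current() hands off to
ksoftirqd at once, while decide_proposed() keeps restarting until the time
or restart budget runs out. Neither variant wakes ksoftirqd when nothing is
pending at the moment of the check, which is the situation Dmitry describes
above.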