Message-ID: <1515681091.3039.21.camel@arista.com>
Subject: Re: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context
From: Dmitry Safonov
To: Frederic Weisbecker, Linus Torvalds
Cc: Eric Dumazet, LKML, Dmitry Safonov <0x7f454c46@gmail.com>,
	Andrew Morton, David Miller, Frederic Weisbecker,
	Hannes Frederic Sowa, Ingo Molnar,
	"Levin, Alexander (Sasha Levin)", Paolo Abeni, "Paul E. McKenney",
	Peter Zijlstra, Radu Rendec, Rik van Riel, Stanislaw Gruszka,
	Thomas Gleixner, Wanpeng Li
Date: Thu, 11 Jan 2018 14:31:31 +0000
In-Reply-To: <20180111044456.GC11633@lerouge>
References: <20180109133623.10711-1-dima@arista.com>
	<20180109133623.10711-2-dima@arista.com>
	<1515620880.3350.44.camel@arista.com>
	<20180111032232.GA11633@lerouge>
	<20180111044456.GC11633@lerouge>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2018-01-11 at 05:44 +0100, Frederic Weisbecker wrote:
> On Wed, Jan 10, 2018 at 08:19:49PM -0800, Linus Torvalds wrote:
> > On Wed, Jan 10, 2018 at 7:22 PM, Frederic Weisbecker wrote:
> > >
> > > Makes sense, but I think you need to keep the TASK_RUNNING check.
> >
> > Yes, good point.
> >
> > > So perhaps it should be:
> > >
> > > -	return tsk && (tsk->state == TASK_RUNNING);
> > > +	return (tsk == current) && (tsk->state == TASK_RUNNING);
> >
> > Looks good to me - definitely worth trying.
> >
> > Maybe that weakens the thing so much that it doesn't actually help
> > the UDP packet storm case?
> >
> > And maybe it's not sufficient for the dvb issue.
> >
> > But I think it's worth at least testing. Maybe it makes neither side
> > entirely happy, but maybe it might be a good halfway point?
>
> Yes I believe Dmitry is facing a different problem where he would
> rather see ksoftirqd scheduled more often to handle the queue as a
> deferred batch instead of having it served one by one on the tails of
> IRQ storms.
> (Dmitry correct me if I misunderstood).

Quite so. What I see is that ksoftirqd is rarely (close to never)
scheduled during a UDP packet storm. That's because the upcoming IRQ
arrives too late for the check in __do_softirq(). So there is no wakeup
on a UDP storm here:

:	pending = local_softirq_pending();
:	if (pending & mask) {
:		if (time_before(jiffies, end) && !need_resched() &&
:		    --max_restart)
:			goto restart;
:
:		wakeup_softirqd();
:	}

(as there is no pending softirq yet). The IRQ comes a bit too late to
get ksoftirqd scheduled, and as a result the next softirq is again
processed in the context of the interrupted task, not in ksoftirqd.
That results in CPU-time starvation for the process during an IRQ
storm.

While I saw that on an out-of-tree driver, I believe that at some IRQ
frequencies (lower than a storm) one can observe the same on mainline
drivers. And I *think* that I've reproduced that on mainline with the
virtio driver and a packet size of 1500 in VMs (though I don't quite
like perf testing in VMs).

So, IOW, maybe there is a somewhat better way to *detect* that the CPU
time spent on serving softirqs is close to a storm and that userspace
is starting to starve? (And, as a result, launch ksoftirqd or balance
between deferring softirqs and serving them right there.)
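
To make the *detect* question a bit more concrete, here is a minimal
userspace sketch of one possible heuristic: account the time spent
serving softirqs within a fixed window and ask for deferral once it
crosses a share of that window. This is not kernel code and not part of
the RFC. struct softirq_load, softirq_load_account(),
softirq_should_defer(), WINDOW_NS and STORM_PERCENT are all hypothetical
names and values chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU accounting; nothing here exists in the kernel. */
struct softirq_load {
	uint64_t window_start_ns;	/* start of the current accounting window */
	uint64_t softirq_ns;		/* time spent serving softirqs in this window */
};

#define WINDOW_NS	10000000ULL	/* 10 ms window, arbitrary choice */
#define STORM_PERCENT	50		/* defer at >= 50% of the window, arbitrary choice */

/* Account one softirq run that took 'spent_ns' and ended at 'now_ns'. */
static void softirq_load_account(struct softirq_load *l, uint64_t now_ns,
				 uint64_t spent_ns)
{
	/* Start a fresh window once the current one has expired. */
	if (now_ns - l->window_start_ns >= WINDOW_NS) {
		l->window_start_ns = now_ns;
		l->softirq_ns = 0;
	}
	l->softirq_ns += spent_ns;
}

/* True if the next softirq should be handed to a deferred context. */
static bool softirq_should_defer(const struct softirq_load *l)
{
	return l->softirq_ns * 100 >= WINDOW_NS * STORM_PERCENT;
}
```

With such accounting the deferral decision no longer depends on a
softirq being pending at the exact moment of the check, which is the
timing problem described above. Whether the accounting is cheap enough
for the real softirq path is an open question that this sketch does not
answer.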

> But your patch still seems to make sense for the case you described:
> when ksoftirqd is voluntarily preempted off and the current IRQ could
> handle the queue.