Date: Mon, 3 Jul 2023 16:29:08 +0200
From: Sebastian Andrzej Siewior
To: Wander Lairson Costa
Cc: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org, juri.lelli@redhat.com
Subject: Re: Splat in kernel RT while processing incoming network packets
Message-ID: <20230703142908.RcxjjF_E@linutronix.de>

On 2023-07-03 09:47:26 [-0300], Wander Lairson Costa wrote:
> Dear all,

Hi,

> I am writing to report a splat issue we encountered while running the
> Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet
> Steering).
>
> During some testing of the RT kernel version 6.4.0 with Network RPS enabled,
> we observed a splat occurring in the SoftIRQ subsystem.
> The splat message is as follows:
>
> [ 37.168920] ------------[ cut here ]------------
> [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60
…
> [ 37.169060] ---[ end trace 0000000000000000 ]---
>
> It comes from [1].
>
> The issue lies in the mechanism RPS uses to defer network packet processing to
> other CPUs. It sends an IPI to the target CPU. The registered callback
> is rps_trigger_softirq, which will raise a softirq, leading to the following
> scenario:
>
> CPU0                                    CPU1
> | netif_rx()                            |
> | | enqueue_to_backlog(cpu=1)           |
> | | | net_rps_send_ipi()                |
> |                                       | flush_smp_call_function_queue()
> |                                       | | was_pending = local_softirq_pending()
> |                                       | | __flush_smp_call_function_queue()
> |                                       | | rps_trigger_softirq()
> |                                       | | | __raise_softirq_irqoff()
> |                                       | | do_softirq_post_smp_call_flush()
>
> That has the undesired side effect of raising a softirq in a function call,
> leading to the aforementioned splat.

Correct.

> The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the
> upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can
> provide the .config.

It is fine, I see it.

> The only solution I imagined so far was to modify RPS to process packets in a
> kernel thread in RT. But I wonder how that would be different from processing
> them in ksoftirqd.
>
> Any inputs on the issue?

Not sure how to proceed. One thing you could do is a hack similar to
net-Avoid-the-IPI-to-free-the.patch, which does it for defer_csd.
On the other hand, we could drop net-Avoid-the-IPI-to-free-the.patch and
remove the warning, because we now have commit
   d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")

Prior to that, raising a softirq from hardirq context would wake ksoftirqd,
which in turn would collect all pending softirqs.
As a consequence, all following softirqs (networking, …) would run as
SCHED_OTHER and compete with SCHED_OTHER tasks for resources. That is not
good, because the networking work is then no longer processed within the
networking interrupt thread. It is also not a DDoS kind of situation where
one might want to delay processing.

With that change, this isn't the case anymore. The remaining downside is
that an "unrelated" IRQ thread could pick up the networking work, which is
less than ideal. That is because the softirq is added to the global
pending set, ksoftirqd is marked for a wakeup, and that wakeup could be
delayed because other tasks are busy. Then the disk interrupt (for
instance) could pick it up as part of its threaded interrupt.

Now that I think about it, we could make the backlog pseudo device a
thread. NAPI threading enables one thread, but here we would need one
thread per CPU, so it would remain somewhat special. But we would avoid
clobbering the global state and deferring everything to ksoftirqd.
Processing it in ksoftirqd might not be ideal from a performance point of
view.

> [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306
>
> Cheers,
> Wander

Sebastian
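The CPU0/CPU1 sequence quoted in this thread can be illustrated with a small user-space model. This is a sketch for illustration only, not kernel code: the `model_*` functions, the fixed two-CPU arrays, the tiny call queue and the `warnings` counter are all hypothetical stand-ins. The model assumes the RT check simply compares a snapshot of the pending-softirq mask taken before the flush with the mask afterwards. It shows why the warning fires: CPU1 records `was_pending`, the queued RPS callback sets the NET_RX bit while the queue is being flushed, and the comparison then sees a mask that changed inside a function call.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2
#define MODEL_QUEUE_LEN 4

/* Bit number used for "NET_RX" in this model. */
enum { MODEL_NET_RX_SOFTIRQ = 3 };

/* Per-CPU pending-softirq mask, standing in for local_softirq_pending(). */
unsigned int softirq_pending[MODEL_NR_CPUS];

/* Incremented whenever the post-flush check would have warned. */
int warnings;

/* A queued cross-CPU function call, as delivered by the IPI. */
typedef void (*model_smp_call_fn)(int cpu);

model_smp_call_fn call_queue[MODEL_NR_CPUS][MODEL_QUEUE_LEN];
int call_queue_len[MODEL_NR_CPUS];

/* Set a softirq bit on the given CPU (models __raise_softirq_irqoff()). */
void model_raise_softirq_irqoff(int cpu, int nr)
{
    softirq_pending[cpu] |= 1u << nr;
}

/* Model of the RPS IPI callback: it raises NET_RX on the target CPU. */
void model_rps_trigger_softirq(int cpu)
{
    model_raise_softirq_irqoff(cpu, MODEL_NET_RX_SOFTIRQ);
}

/* CPU0 side: queue the callback for the target CPU ("send the IPI"). */
void model_enqueue_to_backlog(int target_cpu)
{
    assert(target_cpu >= 0 && target_cpu < MODEL_NR_CPUS);
    assert(call_queue_len[target_cpu] < MODEL_QUEUE_LEN);
    call_queue[target_cpu][call_queue_len[target_cpu]++] =
        model_rps_trigger_softirq;
}

/* CPU1 side: snapshot the mask, run queued calls, then compare. */
void model_flush_smp_call_function_queue(int cpu)
{
    unsigned int was_pending = softirq_pending[cpu];

    for (int i = 0; i < call_queue_len[cpu]; i++)
        call_queue[cpu][i](cpu);
    call_queue_len[cpu] = 0;

    /* Assumed shape of the RT check: any change counts as a warning. */
    if (was_pending != softirq_pending[cpu])
        warnings++;
}
```

Enqueuing once for CPU1 and then flushing CPU1's queue leaves `warnings` at 1 and only the NET_RX bit set on CPU1, mirroring the splat in the report.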
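The idea of making the backlog pseudo device a per-CPU thread can be sketched as a user-space model. This is hypothetical and not a proposed patch: `struct model_backlog_thread`, its fields and the `model_*` functions are invented for illustration, and a boolean flag stands in for a real kernel-thread wakeup. The property it shows is that the IPI callback only marks the CPU's thread for wakeup, so the pending-softirq mask is identical before and after the flush and a snapshot-and-compare check stays quiet; the queued packets are then handled later by that CPU's own thread instead of being left in the shared pending state for ksoftirqd or an unrelated IRQ thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2

/* Per-CPU pending-softirq mask; this variant never modifies it. */
unsigned int softirq_pending[MODEL_NR_CPUS];

/* Incremented whenever a snapshot-and-compare check would have warned. */
int warnings;

/* Hypothetical state of one per-CPU backlog thread. */
struct model_backlog_thread {
    int queued_packets;  /* packets waiting in this CPU's backlog */
    bool wakeup_pending; /* stands in for waking the thread */
    int processed;       /* packets the thread has handled so far */
};

struct model_backlog_thread backlog[MODEL_NR_CPUS];

/* CPU0 side: enqueue one packet for the target CPU's backlog. */
void model_enqueue_to_backlog(int target_cpu)
{
    assert(target_cpu >= 0 && target_cpu < MODEL_NR_CPUS);
    backlog[target_cpu].queued_packets++;
}

/* IPI callback in this variant: wake the thread, raise no softirq. */
void model_rps_wake_backlog_thread(int cpu)
{
    backlog[cpu].wakeup_pending = true;
}

/* CPU1 side: run the callback between the snapshot and the check. */
void model_flush_ipi(int cpu)
{
    unsigned int was_pending = softirq_pending[cpu];

    model_rps_wake_backlog_thread(cpu);
    if (was_pending != softirq_pending[cpu])
        warnings++;
}

/* One iteration of the per-CPU thread's loop, run later in task context. */
void model_backlog_thread_run(int cpu)
{
    struct model_backlog_thread *t = &backlog[cpu];

    if (!t->wakeup_pending)
        return;
    t->wakeup_pending = false;
    t->processed += t->queued_packets;
    t->queued_packets = 0;
}
```

After two packets are enqueued for CPU1 and the IPI is flushed, `warnings` stays 0 and CPU1's pending mask stays empty; one pass of `model_backlog_thread_run(1)` then processes both packets.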