References: <20231015033750.1747-1-hdanton@sina.com>
From: Ian Kumlien
Date: Mon, 16 Oct 2023 12:56:58 +0200
Subject: Re: [bug] 6.5.7 - ixbe freezes and causes RCU deadlock?
To: Hillf Danton
Cc: linux-kernel@vger.kernel.org

And again, no oops visible this time

[135476.059611] ixgbe 0000:07:00.0 eno3: Reset adapter
[135483.747803] rcu: INFO: rcu_preempt self-detected stall on CPU
[135483.753749] rcu: 3-....: (20999 ticks this GP) idle=ddf4/1/0x4000000000000000 softirq=997198/997198 fqs=3594
[135483.763852] rcu: (t=21015 jiffies g=4687825 q=371 ncpus=12)
[135483.769694] rcu: rcu_preempt kthread timer wakeup didn't happen for 6637 jiffies! g4687825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[135483.781436] rcu: Possible timer handling issue on cpu=8 timer-softirq=960866
[135483.788752] rcu: rcu_preempt kthread starved for 6660 jiffies! g4687825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=8
[135483.799540] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[135483.808849] rcu: RCU grace-period kthread stack dump:
[135483.814249] rcu: Stack dump where RCU GP kthread last ran:
[135546.819253] rcu: INFO: rcu_preempt self-detected stall on CPU
[135546.825177] rcu: 3-....: (83999 ticks this GP) idle=ddf4/1/0x4000000000000000 softirq=997198/997198 fqs=3594
[135546.835276] rcu: (t=84088 jiffies g=4687825 q=802 ncpus=12)
[135546.841114] rcu: rcu_preempt kthread starved for 69713 jiffies!
g4687825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=8
[135546.851835] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[135546.861177] rcu: RCU grace-period kthread stack dump:
[135546.866576] rcu: Stack dump where RCU GP kthread last ran:

On Sun, Oct 15, 2023 at 2:01 PM Ian Kumlien wrote:
>
> On Sun, Oct 15, 2023 at 5:38 AM Hillf Danton wrote:
> >
> > On Sun, 15 Oct 2023 00:11:41 +0200 Ian Kumlien
> > > So, this keeps happening - it's happened for quite some time now...
> > > I can't really reproduce it but it starts with a network adapter
> > > freezing and ends with RCU errors
> > > and watchdog reboot... :/
> > >
> > > cat bug.txt | ./scripts/decode_stacktrace.sh vmlinux
> > > [185433.169006] ------------[ cut here ]------------
> > > [185433.169018] NETDEV WATCHDOG: eno3 (ixgbe): transmit queue 2 timed out 9736 ms
> > > [185433.169094] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:525
> > > dev_watchdog (net/sched/sch_generic.c:525 (discriminator 3))
> >
> > Watchdog reported eno3 tx hang.
> > ...
> > >
> > > And in the IPMI console:
> > > [185433.169621] ixgbe 0000:07:00.0 eno3: Reset adapter
> > > [185444.166717] rcu: INFO: rcu_preempt self-detected stall on CPU
> > > [185444.172665] rcu: 0-...!: (20999 ticks this GP)
> > > idle=8d84/1/0x4000000000000000 softirq=1976223/1976223 fqs=2
> > > [185444.182681] rcu: (t=21015 jiffies g=6787421 q=738 ncpus=12)
> > > [185444.188523] rcu: rcu_preempt kthread timer wakeup didn't happen
> > > for 21009 jiffies! g6787421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > [185444.200361] rcu: Possible timer handling issue on cpu=8 timer-softirq=1196063
> >
> > Timer on CPU8 is suspected to cause RCU stall.
> >
> > > [185444.207761] rcu: rcu_preempt kthread starved for 21032 jiffies!
> > > g6787421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=8
> > > [185444.218639] rcu: Unless rcu_preempt kthread gets sufficient CPU
> > > time, OOM is now expected behavior.
> > > [185444.227946] rcu: RCU grace-period kthread stack dump:
> > > [185444.233347] rcu: Stack dump where RCU GP kthread last ran:
> > > [185507.243156] rcu: INFO: rcu_preempt self-detected stall on CPU
> > > [185507.249098] rcu: 0-....: (84002 ticks this GP)
> > > idle=8d84/1/0x4000000000000000 softirq=1976223/1976223 fqs=1559
> > > [185507.259375] rcu: (t=84094 jiffies g=6787421 q=1213 ncpus=12)
> > > [185570.265595] rcu: INFO: rcu_preempt self-detected stall on CPU
> > > [185570.271532] rcu: 0-....: (147002 ticks this GP)
> > > idle=8d84/1/0x4000000000000000 softirq=1976223/1976223 fqs=13844
> > > [185570.282016] rcu: (t=147117 jiffies g=6787421 q=1273 ncpus=12)
> > > [185570.288049] rcu: rcu_preempt kthread timer wakeup didn't happen
> > > for 13787 jiffies! g6787421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > > [185570.299914] rcu: Possible timer handling issue on cpu=9 timer-softirq=1211534
> >
> > Ditto on CPU9.
> >
> > No answer yet to why rcu stall was reported without any info about the timers
> > on CPU8/9.
>
> Well... I can't really give you anymore information, all i can say is
> that it leads to complete deadlock and eventual reboot by the hardware
> watchdog...
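
A rough sanity check on the numbers in the quoted report, in case it helps with reading the stall lines. The sketch below (Python, not part of the original report) parses the three "t=... jiffies" summary lines for grace period g6787421 and estimates the tick rate and the point at which the stalled grace period began. The function names are made up for illustration, and it assumes the dmesg timestamps and the jiffies counter both advance at a constant rate.

```python
import re

# Summary lines copied from the quoted report (timestamp, then t= in jiffies).
LOG = """\
[185444.182681] rcu: (t=21015 jiffies g=6787421 q=738 ncpus=12)
[185507.259375] rcu: (t=84094 jiffies g=6787421 q=1213 ncpus=12)
[185570.282016] rcu: (t=147117 jiffies g=6787421 q=1273 ncpus=12)
"""

# Captures: dmesg timestamp in seconds, t= value in jiffies, grace-period number.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+rcu:\s+\(t=(\d+) jiffies g=(\d+)")


def parse_stalls(text):
    """Return (timestamp_seconds, t_jiffies, grace_period) per summary line."""
    return [(float(m.group(1)), int(m.group(2)), int(m.group(3)))
            for m in STALL_RE.finditer(text)]


def implied_hz(stalls):
    """Estimate jiffies per second from the first and last report."""
    (ts0, t0, _), (ts1, t1, _) = stalls[0], stalls[-1]
    return (t1 - t0) / (ts1 - ts0)


def grace_period_start(stalls, hz):
    """Approximate dmesg timestamp at which the grace period began."""
    ts0, t0, _ = stalls[0]
    return ts0 - t0 / hz


stalls = parse_stalls(LOG)
hz = implied_hz(stalls)
start = grace_period_start(stalls, hz)
print(f"implied HZ ~ {hz:.0f}, grace period began near [{start:.3f}]")
```

On these three lines the estimate comes out near 1000 jiffies per second, which would put the start of the grace period about 21 s before the first stall report. That is only arithmetic on the log, not a diagnosis.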