Date: Tue, 5 Jul 2022 12:07:25 +0100
From: Will Deacon
To: Kajetan Puchalski
Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mel Gorman, lukasz.luba@arm.com, dietmar.eggemann@arm.com, mark.rutland@arm.com, mark.brown@arm.com, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, stable@vger.kernel.org, regressions@lists.linux.dev, linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: Re: [Regression] stress-ng udp-flood causes kernel panic on Ampere Altra
Message-ID: <20220705110724.GB711@willie-the-truck>
References: <20220701200110.GA15144@breakpoint.cc> <20220702205651.GB15144@breakpoint.cc> <20220705105749.GA711@willie-the-truck>
In-Reply-To: <20220705105749.GA711@willie-the-truck>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 05, 2022 at 11:57:49AM +0100, Will Deacon wrote:
> On Tue, Jul 05, 2022 at 11:53:22AM +0100, Kajetan Puchalski wrote:
> > On Mon, Jul 04, 2022 at 10:22:24AM +0100, Kajetan Puchalski wrote:
> > > On Sat, Jul 02, 2022 at 10:56:51PM +0200, Florian Westphal wrote:
> > > > > That would make sense, from further experiments I ran it somehow seems
> > > > > to be related to the number of workers being spawned by stress-ng along
> > > > > with the CPUs/cores involved.
> > > > >
> > > > > For instance, running the test with <=25 workers (--udp-flood 25 etc.)
> > > > > results in the test running fine for at least 15 minutes.
> > > >
> > > > Ok. I will let it run for longer on the machines I have access to.
> > > >
> > > > In the meantime, you could test the attached patch; it's a simple
> > > > s/refcount_/atomic_/ in nf_conntrack.
> > > >
> > > > If mainline (patch vs. HEAD 69cb6c6556ad89620547318439) crashes for you
> > > > but works with the attached patch, someone who understands aarch64 memory
> > > > ordering would have to look more closely at the refcount_XXX functions to
> > > > see where they might differ from the atomic_ ones.
> > >
> > > I can confirm that the patch seems to solve the issue.
> > > With it applied on top of the 5.19-rc5 tag the test runs fine for at
> > > least 15 minutes, which was not the case before, so it looks like it is
> > > that aarch64 memory ordering problem.
> >
> > I'm CCing some people who should be able to help with aarch64 memory
> > ordering, maybe they could take a look.
> >
> > (re-sending due to a typo in CC, sorry for duplicate emails!)
>
> Sorry, but I have absolutely no context here. We have a handy document
> describing the differences between atomic_t and refcount_t:
>
> Documentation/core-api/refcount-vs-atomic.rst
>
> What else do you need to know?

Hmm, and I see a tonne of *_inc_not_zero() conversions in 719774377622
("netfilter: conntrack: convert to refcount_t api") which mean that you
no longer have ordering to subsequent reads in the absence of an address
dependency.

Will