Message-ID: <9fa0fcae-a857-eca4-6aea-2213af62d1ef@amazon.com>
Date: Wed, 14 Jun 2023 09:08:51 -0400
Subject: Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
From: Luiz Capitulino
To: "gregkh@linuxfoundation.org", "Bhatnagar, Rishabh"
CC: "linux-kernel@vger.kernel.org", "tglx@linutronix.de", "sashal@kernel.org", "stable@vger.kernel.org"
References: <12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com> <2023061457-king-broadcast-f47e@gregkh>
In-Reply-To: <2023061457-king-broadcast-f47e@gregkh>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023-06-14 05:14, gregkh@linuxfoundation.org wrote:
>
> On Tue, Jun 13, 2023 at 11:58:05AM -0700, Bhatnagar, Rishabh wrote:
>>
>> On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
>>> Hi Sebastian/Greg
>>>
>>> We are seeing RCU stall warnings from recent stable tree updates:
>>> 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
>>> This is seen in the upstream stable trees without any downstream patches.
>>>
>>> The issue is seen a few minutes after booting without any workload.
>>> We launch hundreds of virtual instances and this shows up in 1-2
>>> instances, so it's hard to reproduce.
>>> Attaching a few stack traces below.
>>>
>>> The issue can be seen on virtual and baremetal instances.
>>> Another interesting point is we only see this on x86 based instances.
>>> We also did test this on linux-mainline but were not able to reproduce
>>> the issue.
>>> So maybe there's a fixup or related commit that has gone in?
>>>
>>> We tried bisecting the stable trees and found that after reverting the
>>> below commit we couldn't reproduce this in any of the kernels
>>> consistently.
>>>
>>> tick/common: Align tick period with the HZ tick. [ Upstream commit
>>> e9523a0d81899361214d118ad60ef76f0e92f71d ]
>>>
>>>
>>> Not exactly sure how this commit is affecting all stable kernels.
>>> Can you take a look at this issue and share your insight?
>
> Does this issue also show up in 6.3.y and in 6.4-rc5?

We haven't tried those yet, will try it today.

Just to give you a bit of context: we have a quick reproducer and a
long-duration reproducer for this (both are part of our internal testing
infrastructure). With the quick reproducer we can more or less reliably
reproduce the issue on 5.4.246 and 5.10.183, but not on 5.15.113, 6.1.33
or the latest Linus tree (64569520920a3ca5d456ddd9f4f95fc6ea9b8b45).
However, we did reproduce something similar with the long-duration
reproducer on our downstream versions of 5.15.113 and 6.1.33 (starting
with 6.1.28).

- Luiz