Subject: Re: Tasks stuck jbd2 for a long time
Date: Wed, 16 Aug 2023 11:32:47 -0700
From: "Bhatnagar, Rishabh"
To: Jan Kara
CC: Theodore Ts'o, "linux-kernel@vger.kernel.org", "gregkh@linuxfoundation.org", "Park, SeongJae"
In-Reply-To: <20230816145310.giogco2nbzedgak2@quack3>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/16/23 7:53 AM, Jan Kara wrote:
> On Tue 15-08-23 20:57:14, Bhatnagar, Rishabh wrote:
>> On 8/15/23 7:28 PM, Theodore Ts'o wrote:
>>> It would be helpful if you can translate address in the stack trace to
>>> line numbers.  See [1] and the script in
>>> ./scripts/decode_stacktrace.sh in the kernel sources.  (It is
>>> referenced in the web page at [1].)
>>>
>>> [1] https://docs.kernel.org/admin-guide/bug-hunting.html
>>>
>>> Of course, in order to interpret the line numbers, we'll need a
>>> pointer to the git repo of your kernel sources and the git commit ID
>>> you were using that presumably corresponds to 5.10.184-175.731.amzn2.x86_64.
>>>
>>> The stack trace for which I am particularly interested is the one for
>>> the jbd2/md0-8 task, e.g.:
>> Thanks for checking, Ted.
>>
>> We don't have the fast_commit feature enabled. So it should correspond to this
>> line:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/jbd2/commit.c?h=linux-5.10.y#n496
>>
>>>> Not tainted 5.10.184-175.731.amzn2.x86_64 #1
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:jbd2/md0-8 state:D stack: 0 pid: 8068 ppid: 2 flags:0x00004080
>>>> Call Trace:
>>>>  __schedule+0x1f9/0x660
>>>>  schedule+0x46/0xb0
>>>>  jbd2_journal_commit_transaction+0x35d/0x1880 [jbd2] <--------- line #?
>>>>  ? update_load_avg+0x7a/0x5d0
>>>>  ? add_wait_queue_exclusive+0x70/0x70
>>>>  ? lock_timer_base+0x61/0x80
>>>>  ? kjournald2+0xcf/0x360 [jbd2]
>>>>  kjournald2+0xcf/0x360 [jbd2]
>>> Most of the other stack traces you referenced are tasks that are waiting
>>> for the transaction commit to complete so they can proceed with some
>>> file system operation.  The stack traces which have
>>> start_this_handle() in them are examples of this going on.  Stack
>>> traces of tasks that do *not* have start_this_handle() would be
>>> especially interesting.
>> I see all other stacks apart from kjournald have "start_this_handle".
> That would be strange. Can you post full output of
> "echo w > /proc/sysrq-trigger" to dmesg, ideally passed through
> scripts/faddr2line as Ted suggests. Thanks!

Sure, I'll try to collect that. The system freezes when this
happens, so I'm not able to collect much information. I'll try to crash
the kernel, collect a kdump, and see if I can get that info from it.

Can low available memory be a reason for a thread to not be able to close
the transaction handle for a long time?
Maybe some writeback thread starts the handle but is not able to complete
writeback?

>
> Honza
> --
> Jan Kara
> SUSE Labs, CR
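
For reference, once the "echo w > /proc/sysrq-trigger" output is available,
the check Ted describes (finding blocked tasks whose stacks do *not* contain
start_this_handle()) could be scripted roughly as below. This is only a
sketch under assumptions: the helper names split_tasks() and tasks_without()
are made up for illustration, the parser assumes each task block starts with
a "task:<name> state:<X>" line as in the trace quoted above, and the second
task in SAMPLE (including its offsets) is invented test data, not output from
the affected system.

```python
import re

# Assumed layout: each blocked task starts with "task:<name> state:<X> ...",
# optionally prefixed by a dmesg timestamp such as "[ 1234.567890] ".
TASK_RE = re.compile(r"task:(\S+)\s+state:(\S+)")
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def split_tasks(dmesg_text):
    """Group dmesg lines into (task_name, state, trace_lines) records."""
    records = []
    current = None
    for raw in dmesg_text.splitlines():
        line = TS_RE.sub("", raw.strip()).strip()
        match = TASK_RE.search(line)
        if match:
            current = (match.group(1), match.group(2), [])
            records.append(current)
        elif current is not None and line:
            current[2].append(line)
    return records


def tasks_without(records, symbol="start_this_handle"):
    """Return names of tasks whose collected trace never mentions `symbol`."""
    return [name for name, _state, trace in records
            if not any(symbol in frame for frame in trace)]


# First block is taken from the trace quoted in this thread; the second
# block is hypothetical and exists only to exercise the filter.
SAMPLE = """\
task:jbd2/md0-8 state:D stack: 0 pid: 8068 ppid: 2 flags:0x00004080
Call Trace:
 __schedule+0x1f9/0x660
 schedule+0x46/0xb0
 jbd2_journal_commit_transaction+0x35d/0x1880 [jbd2]
 kjournald2+0xcf/0x360 [jbd2]
[ 1234.567890] task:example-writer state:D stack: 0 pid: 9000 ppid: 1
[ 1234.567891] Call Trace:
[ 1234.567892]  __schedule+0x1f9/0x660
[ 1234.567893]  start_this_handle+0x100/0x400 [jbd2]
"""

if __name__ == "__main__":
    for name in tasks_without(split_tasks(SAMPLE)):
        print(name)
```

In practice the text would come from the saved dmesg or from the log buffer
extracted from the kdump rather than from SAMPLE; any task the script prints
besides the jbd2 thread would be a candidate for whoever is holding the
transaction open.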