Received: by 2002:a05:7412:d8a:b0:e2:908c:2ebd with SMTP id b10csp3537042rdg; Tue, 17 Oct 2023 19:57:13 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFFfP6pSgaLHM2WoFXPQAiR2eGmtpy6FQvI+hSzyAidSAofjdfFp70tvWfeFCFDJcW4R00G X-Received: by 2002:a81:6d56:0:b0:5a7:c640:7003 with SMTP id i83-20020a816d56000000b005a7c6407003mr4504970ywc.23.1697597832969; Tue, 17 Oct 2023 19:57:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697597832; cv=none; d=google.com; s=arc-20160816; b=lMPx8jtc++sV6+nQ+M9Jpylrmfd12+YrHBBoo3O9Afkg+oBHGhovbw+1ynYzb2BiSW Rn5RlJOpEnpy5JxWOx7OBOVEknWPlaDOZfN+phNpw2auyUXCLhxZtMRKlAfvkXGEM0KY AJhwSfKDJXOZrJ47eHOFaEQpbIYpS42hWpFet+/1Y+AjvsBqBSPX07q2uka19/53lPkY HI2Yo1KgSNWR4zNrzBSF5VSZ5DbygzgeaI9Ofbcjfi69AV75WhA4C5pjSKnRg+rkihIE uBrxrMkn4Yzp6Bh66LfME4FefvKq9mc4nQP3beWSKOdpLtqAFuY69hFp5uuNe/4fRh2x SJmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:feedback-id :dkim-signature:dkim-signature; bh=z4oHDsicD1rLu4S9/g9h5gETwcA1PGj1NpkRk4ugOM0=; fh=OSvGIxcoAo+KDEwmC0g+6dMJYkchcYF2EMPA0SPMA08=; b=Qg5jdFQvF4SHBd/YLyHF4crRdmMHSUHFhhFJtoU4axe5L0v7x/lTAJ+KgEhrsji7MC N/E3EBNs0n3yW+uIiMmyzl5yHa+iWwbPUk0hN4hMs42kygchyb7y3y2Hh3Br7fkrXP19 HaU1UzEkzQjpFEFIq7v+uSbRNrTAcjqWJgtT3XuZ+dpZTY9Qatb9Ume+AHEdbRDb82gf /9bAwAk9k1Kd1+k1+GOuA3YoH5QaKxB3JtTUb2ysmLKzIsXx8f7nkuGZw9ofpIVlZ4hC TaP16wd4viBnphSm5KJxUhLJ9k7H59bfv/atAKw+aQTmCpyD9a8rmIfSew05hcXeVKNT fQjg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm3 header.b=fuUoZRRK; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=aYKCMRYE; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id bs2-20020a632802000000b005855bd1ebbasi1127883pgb.204.2023.10.17.19.57.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Oct 2023 19:57:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@anarazel.de header.s=fm3 header.b=fuUoZRRK; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=aYKCMRYE; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 3BEBD80781EE; Tue, 17 Oct 2023 19:56:08 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229451AbjJRCuS (ORCPT + 99 others); Tue, 17 Oct 2023 22:50:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229450AbjJRCuR (ORCPT ); Tue, 17 Oct 2023 22:50:17 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7837CAB for ; Tue, 17 Oct 2023 19:50:15 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 85C403200A12; Tue, 17 Oct 2023 22:50:12 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 17 Oct 2023 22:50:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm3; t=1697597412; x=1697683812; bh=z4 oHDsicD1rLu4S9/g9h5gETwcA1PGj1NpkRk4ugOM0=; b=fuUoZRRKm+XgMDA2XT huSFR7PIJUjgIXmT0hCna1UVNf5OB2D9tiK+amsyl3zLLNPXYu0/t6V/8rwok+do CjXAQEKom2776T+2jhjfp6d1j21IbypnQ0IfcmiRaRsdiFPrbLq+7MneZ0u3gRVb PTpdDsCP2iDE/HewhvkGjSH82JwsJ0viDaF8suaJpRB4k9LO5/JHFQia125UxAfh 48/4oYSA2wyI7KAUlWMm97orvzvyWiPbCgnM9nbh6XPmvQQ3oAYNGRCpna685A/e pq+9hf+EsD6lCm8qbDb9UxIj76Z2xAYIyyabsakPYfez6ooWm32nx+XUyoAZZBYr AhGA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1697597412; x=1697683812; bh=z4oHDsicD1rLu 4S9/g9h5gETwcA1PGj1NpkRk4ugOM0=; b=aYKCMRYENItiD0nqLypQckdompVuG kvZprG75SEGiZS2LRua1nfE4tD/3xB6WklML2qSUCyzqTm2MxLd2q2BCNRqeudpp kSe/sMBdHCk3tAP0aszcIsEQZiAIEVOV3MJgfHje/Zi4m51ZRq3xgx/YtULTd1Po +2pLiQFaUJj0WplQ4XMxzdMQJlmN3cl3N22VTSueu1RHFJ/zpRFae/ZVzTtXToZx 2G5cOFjmBx7wg05I2aZNu6mDUqmpN61kz+VQSVJas7XKb8YSpkjPTFEVxwilLKMW DWdrg7GtWBSoHfLriqP5k8o7Lgf6kQvoGNl/BFF4pl4kTziGqSViBIFMQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrjeefgdeiudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvvefukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedvffefvefhteevffegieetfefhtddvffejvefhueetgeeludehteevudei tedtudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 17 Oct 2023 22:50:11 -0400 (EDT) Date: Tue, 17 Oct 2023 19:50:09 -0700 From: Andres Freund To: Theodore Ts'o Cc: Thorsten Leemhuis , Shreeya Patel , linux-ext4@vger.kernel.org, Ricardo =?utf-8?Q?Ca=C3=B1uelo?= , gustavo.padovan@collabora.com, zsm@google.com, garrick@google.com, Linux regressions mailing list Subject: Re: task hung in ext4_fallocate #2 Message-ID: <20231018025009.ulkykpefwdgpfvzf@awork3.anarazel.de> References: <20231017033725.r6pfo5a4ayqisct7@awork3.anarazel.de> <20231018004335.GA593012@mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231018004335.GA593012@mit.edu> X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 17 Oct 2023 19:56:08 -0700 (PDT) Hi, On 2023-10-17 20:43:35 -0400, Theodore Ts'o wrote: > On Mon, Oct 16, 2023 at 08:37:25PM -0700, Andres Freund wrote: > > I just was able to reproduce the issue, after upgrading to 6.6-rc6 - this time > > it took ~55min of high load (io_uring using branch of postgres, running a > > write heavy transactional workload concurrently with concurrent bulk data > > load) to trigger the issue. > > > > For now I have left the system running, in case there's something you would > > like me to check while the system is hung. > > > > The first hanging task that I observed: > > > > cat /proc/57606/stack > > [<0>] inode_dio_wait+0xd5/0x100 > > [<0>] ext4_fallocate+0x12f/0x1040 > > [<0>] vfs_fallocate+0x135/0x360 > > [<0>] __x64_sys_fallocate+0x42/0x70 > > [<0>] do_syscall_64+0x38/0x80 > > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > This stack trace is from some process (presumably postgres) trying to > do a fallocate() system call: Yes, it's indeed postgres. > > [ 3194.579297] INFO: task iou-wrk-58004:58874 blocked for more than 122 seconds. > > [ 3194.579304] Not tainted 6.6.0-rc6-andres-00001-g01edcfe38260 #77 > > [ 3194.579310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 3194.579314] task:iou-wrk-58004 state:D stack:0 pid:58874 ppid:52606 flags:0x00004000 > > [ 3194.579325] Call Trace: > > [ 3194.579329] > > [ 3194.579334] __schedule+0x388/0x13e0 > > [ 3194.579349] schedule+0x5f/0xe0 > > [ 3194.579361] schedule_preempt_disabled+0x15/0x20 > > [ 3194.579374] rwsem_down_read_slowpath+0x26e/0x4c0 > > [ 3194.579385] down_read+0x44/0xa0 > > [ 3194.579393] ext4_file_write_iter+0x432/0xa80 > > [ 3194.579407] io_write+0x129/0x420 > > This could potentially be a interesting stack trace; but this is where > we really need to map the stack address to line numbers. Is that > something you could do? Converting the above with scripts/decode_stacktrace.sh yields: __schedule (kernel/sched/core.c:5382 kernel/sched/core.c:6695) schedule (./arch/x86/include/asm/preempt.h:85 (discriminator 13) kernel/sched/core.c:6772 (discriminator 13)) schedule_preempt_disabled (./arch/x86/include/asm/preempt.h:80 (discriminator 10) kernel/sched/core.c:6831 (discriminator 10)) rwsem_down_read_slowpath (kernel/locking/rwsem.c:1073 (discriminator 4)) down_read (./arch/x86/include/asm/preempt.h:95 kernel/locking/rwsem.c:1257 kernel/locking/rwsem.c:1263 kernel/locking/rwsem.c:1522) ext4_file_write_iter (./include/linux/fs.h:1073 fs/ext4/file.c:57 fs/ext4/file.c:564 fs/ext4/file.c:715) io_write (./include/linux/fs.h:1956 io_uring/rw.c:926) But I'm not sure that that stacktrace is quite right (e.g. what's ./include/linux/fs.h:1073 doing in this stacktrace?). But with an optimized build it's a bit hard to tell what's an optimization artifact and what's an off stack trace... I had the idea to look at the stacks (via /proc/$pid/stack) for all postgres processes and the associated io-uring threads, and then to deduplicate them. 22x: ext4_file_write_iter (./include/linux/fs.h:1073 fs/ext4/file.c:57 fs/ext4/file.c:564 fs/ext4/file.c:715) io_write (./include/linux/fs.h:1956 io_uring/rw.c:926) io_issue_sqe (io_uring/io_uring.c:1878) io_wq_submit_work (io_uring/io_uring.c:1960) io_worker_handle_work (io_uring/io-wq.c:596) io_wq_worker (io_uring/io-wq.c:258 io_uring/io-wq.c:648) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:312) 8x: __do_sys_io_uring_enter (io_uring/io_uring.c:2553 (discriminator 1) io_uring/io_uring.c:2622 (discriminator 1) io_uring/io_uring.c:3706 (discriminator 1)) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 2x: io_wq_worker (./arch/x86/include/asm/current.h:41 (discriminator 5) io_uring/io-wq.c:668 (discriminator 5)) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:312) 2x: futex_wait_queue (./arch/x86/include/asm/current.h:41 (discriminator 5) kernel/futex/waitwake.c:357 (discriminator 5)) futex_wait (kernel/futex/waitwake.c:660) do_futex (kernel/futex/syscalls.c:106 (discriminator 1)) __x64_sys_futex (kernel/futex/syscalls.c:183 kernel/futex/syscalls.c:164 kernel/futex/syscalls.c:164) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 1x: do_epoll_wait (fs/eventpoll.c:1921 (discriminator 1) fs/eventpoll.c:2318 (discriminator 1)) __x64_sys_epoll_wait (fs/eventpoll.c:2325) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 1x: inode_dio_wait (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:444 ./include/linux/atomic/atomic-instrumented.h:33 fs/inode.c:2429 fs/inode.c:2446) ext4_fallocate (fs/ext4/extents.c:4752) vfs_fallocate (fs/open.c:324) __x64_sys_fallocate (./include/linux/file.h:45 fs/open.c:348 fs/open.c:355 fs/open.c:353 fs/open.c:353) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) > > Once I hear that you don't want me to test something out on the running > > system, I think a sensible next step could be to compile with lockdep and see > > if that finds a problem? > > That's certainly a possibiity. But also please make sure that you can > compile with with debugging information enabled so that we can get > reliable line numbers. I use: > > CONFIG_DEBUG_INFO=y > CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y > CONFIG_DEBUG_INFO_REDUCED=y The kernel from above had debug info enabled, albeit forced to dwarf4 (IIRC I ran into issues with pahole not understanding all of dwarf5): $ zgrep DEBUG_INFO /proc/config.gz CONFIG_DEBUG_INFO=y # CONFIG_DEBUG_INFO_NONE is not set # CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set CONFIG_DEBUG_INFO_DWARF4=y # CONFIG_DEBUG_INFO_DWARF5 is not set # CONFIG_DEBUG_INFO_REDUCED is not set CONFIG_DEBUG_INFO_COMPRESSED_NONE=y # CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set # CONFIG_DEBUG_INFO_COMPRESSED_ZSTD is not set # CONFIG_DEBUG_INFO_SPLIT is not set CONFIG_DEBUG_INFO_BTF=y CONFIG_DEBUG_INFO_BTF_MODULES=y I switched it to CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y and am rebuilding with lockdep enabled. Curious to see how long it'll take to trigger the problem with it enabled... Greetings, Andres Freund