Date: Tue, 17 Oct 2023 20:43:35 -0400
From: "Theodore Ts'o"
To: Andres Freund
Cc: Thorsten Leemhuis, Shreeya Patel, linux-ext4@vger.kernel.org,
	Ricardo Cañuelo, gustavo.padovan@collabora.com, zsm@google.com,
	garrick@google.com, Linux regressions mailing list
Subject: Re: task hung in ext4_fallocate #2
Message-ID: <20231018004335.GA593012@mit.edu>
In-Reply-To: <20231017033725.r6pfo5a4ayqisct7@awork3.anarazel.de>

On Mon, Oct 16, 2023 at 08:37:25PM -0700, Andres Freund wrote:
> I just was able to reproduce the issue, after upgrading to 6.6-rc6 - this time
> it took ~55min of high load (io_uring using branch of postgres, running a
> write heavy transactional workload concurrently with concurrent bulk data
> load) to trigger the issue.
>
> For now I have left the system running, in case there's something you would
> like me to check while the system is hung.
>
> The first hanging task that I observed:
>
> cat /proc/57606/stack
> [<0>] inode_dio_wait+0xd5/0x100
> [<0>] ext4_fallocate+0x12f/0x1040
> [<0>] vfs_fallocate+0x135/0x360
> [<0>] __x64_sys_fallocate+0x42/0x70
> [<0>] do_syscall_64+0x38/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

This stack trace is from some process (presumably postgres) trying to
do a fallocate() system call.  It is waiting here, in ext4_fallocate():

	/* Wait all existing dio workers, newcomers will block on i_rwsem */
	inode_dio_wait(inode);

The reason for the wait is that we can't manipulate the extent tree
until any in-flight data block I/Os complete.  The call blocks until
iomap_dio_complete() in fs/iomap/direct-io.c calls inode_dio_end().

> [ 3194.579297] INFO: task iou-wrk-58004:58874 blocked for more than 122 seconds.
> [ 3194.579304] Not tainted 6.6.0-rc6-andres-00001-g01edcfe38260 #77
> [ 3194.579310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3194.579314] task:iou-wrk-58004 state:D stack:0 pid:58874 ppid:52606 flags:0x00004000
> [ 3194.579325] Call Trace:
> [ 3194.579329]
> [ 3194.579334] __schedule+0x388/0x13e0
> [ 3194.579349] schedule+0x5f/0xe0
> [ 3194.579361] schedule_preempt_disabled+0x15/0x20
> [ 3194.579374] rwsem_down_read_slowpath+0x26e/0x4c0
> [ 3194.579385] down_read+0x44/0xa0
> [ 3194.579393] ext4_file_write_iter+0x432/0xa80
> [ 3194.579407] io_write+0x129/0x420

This could potentially be an interesting stack trace, but this is
where we really need to map the stack addresses to line numbers.  Is
that something you could do?

> Once I hear that you don't want me to test something out on the running
> system, I think a sensible next step could be to compile with lockdep and see
> if that finds a problem?

That's certainly a possibility.  But also please make sure that you
compile with debugging information enabled so that we can get
reliable line numbers.
I use:

CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_REDUCED=y

Cheers,

					- Ted
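
For mapping the quoted frames to line numbers, here is a minimal sketch,
assuming a vmlinux built with debugging information from the same
source tree as the running kernel.  The kernel tree ships
scripts/faddr2line, which takes an object file followed by one or more
func+0xoff/0xsize symbols.  The helper names below (extract_frames,
faddr2line_command) and the default "vmlinux" path are hypothetical,
chosen only for illustration; the script pulls the symbols out of
pasted trace text and builds the command line without running it.

```python
import re
import shlex

# Matches kernel symbol+offset/size tokens as printed in stack traces,
# e.g. "ext4_file_write_iter+0x432/0xa80".
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+)\b")


def extract_frames(trace_text):
    """Return the func+0xoff/0xsize tokens in the order they appear."""
    return FRAME_RE.findall(trace_text)


def faddr2line_command(trace_text, vmlinux="vmlinux"):
    """Build an argv list for scripts/faddr2line; nothing is executed.

    The vmlinux path is an assumption; point it at the image that
    matches the kernel that produced the trace.
    """
    frames = extract_frames(trace_text)
    if not frames:
        raise ValueError("no func+0xoff/0xsize frames found")
    return ["scripts/faddr2line", vmlinux, *frames]


if __name__ == "__main__":
    sample = """\
> [ 3194.579385] down_read+0x44/0xa0
> [ 3194.579393] ext4_file_write_iter+0x432/0xa80
> [ 3194.579407] io_write+0x129/0x420
"""
    # Print a command to run from the top of the kernel source tree.
    print(shlex.join(faddr2line_command(sample)))
```

The printed command is meant to be run from the top of the kernel
source tree.  For a whole dmesg excerpt, scripts/decode_stacktrace.sh
in the same tree reads the trace on standard input and may be more
convenient.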