Date: Thu, 26 Oct 2023 09:28:29 +1100
From: Dave Chinner
To: Jens Axboe
Cc: Andres Freund, Theodore Ts'o, Thorsten Leemhuis, Shreeya Patel,
    linux-ext4@vger.kernel.org, Ricardo Cañuelo,
    gustavo.padovan@collabora.com, zsm@google.com, garrick@google.com,
    Linux regressions mailing list, io-uring@vger.kernel.org
Subject: Re: task hung in ext4_fallocate #2

On Tue, Oct 24, 2023 at 06:34:05PM -0600, Jens Axboe wrote:
> On 10/24/23 6:06 PM, Dave Chinner wrote:
> > On Tue, Oct 24, 2023 at 12:35:26PM -0600, Jens Axboe wrote:
> >> On 10/24/23 8:30 AM, Jens Axboe wrote:
> >>> I don't think this is related to the io-wq workers doing non-blocking
> >>> IO.
> >
> > The io-wq worker that has deadlocked _must_ be doing blocking IO. If
> > it was doing non-blocking IO (i.e. IOCB_NOWAIT) then it would have
> > done a trylock and returned -EAGAIN to the worker for it to try
> > again later. I'm not sure that would avoid the issue, however - it
> > seems to me like it might just turn it into a livelock rather than a
> > deadlock....
>
> Sorry typo, yes they are doing blocking IO, that's all they ever do. My
> point is that it's not related to the issue.
>
> >>> The callback is eventually executed by the task that originally
> >>> submitted the IO, which is the owner and not the async workers. But...
> >>> If that original task is blocked in e.g. fallocate, then I can see how
> >>> that would potentially be an issue.
> >>>
> >>> I'll take a closer look.
> >>
> >> I think the best way to fix this is likely to have inode_dio_wait() be
> >> interruptible, and return -ERESTARTSYS if it should be restarted. Now
> >> the below is obviously not a full patch, but I suspect it'll make ext4
> >> and xfs tick, because they should both be affected.
> >
> > How does that solve the problem? Nothing will issue a signal to the
> > process that is waiting in inode_dio_wait() except userspace, so I
> > can't see how this does anything to solve the problem at hand...
>
> Except task_work, which when it completes, will increment the i_dio
> count again. This is the whole point of the half assed patch I sent out.

What task_work is that? When does that actually run?

Please don't assume that everyone is intimately familiar with the
subtle complexities of io_uring infrastructure - if the fix relies on a
signal from -somewhere- then you need to explain where that signal
comes from and why we should be able to rely on that...

>
> > I'm also very leery of adding new error handling complexity to paths
> > like truncate, extent cloning, fallocate, etc which expect to block
> > on locks until they can perform the operation safely.
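A user-space analogy of the non-blocking path described above: with IOCB_NOWAIT the submitter only tries the lock and bounces with -EAGAIN, whereas a blocking submitter (such as an io-wq worker) sleeps on it. This is a minimal sketch, assuming a pthread mutex as a stand-in for the inode lock; model_submit_dio(), MODEL_NOWAIT and model_inode_lock are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical flag modelling IOCB_NOWAIT; not a kernel definition. */
#define MODEL_NOWAIT 0x1

/* Hypothetical stand-in for the inode lock a DIO submission must take. */
static pthread_mutex_t model_inode_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical submission path: a non-blocking caller only tries the lock
 * and returns -EAGAIN so it can be retried later; a blocking caller (like
 * an io-wq worker) sleeps until the lock becomes free.
 */
static int model_submit_dio(int flags)
{
    if (flags & MODEL_NOWAIT) {
        /* trylock fails with EBUSY if anyone holds the lock */
        if (pthread_mutex_trylock(&model_inode_lock) != 0)
            return -EAGAIN;
    } else {
        pthread_mutex_lock(&model_inode_lock);
    }

    /* ... the IO would be issued here while holding the lock ... */

    pthread_mutex_unlock(&model_inode_lock);
    return 0;
}
```

The -EAGAIN path only helps if whoever holds the lock can make progress without this caller. If the holder is itself waiting on a completion that only this caller can run, retrying on -EAGAIN never succeeds, which is the livelock-instead-of-deadlock concern raised above.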
>
> I actually looked at all of them, ext4 and xfs specifically. It really
> doesn't seem too bad.
>
> > On further thinking, this could be a self deadlock with
> > just async direct IO submission - submit an async DIO with
> > IOCB_DIO_CALLER_COMP, then run an unaligned async DIO that attempts to
> > drain in-flight DIO before continuing. Then the thread waits in
> > inode_dio_wait() because it can't run the completion that will drop
> > the i_dio_count to zero.
>
> No, because those will be non-blocking. Any blocking IO will go via
> io-wq, and that won't then hit the deadlock. If you're doing
> inode_dio_wait() from the task itself for a non-blocking issue, then
> that would surely be an issue. But we should not be doing that, and we
> are checking for it.

There's no documentation that says IO submission inside an
IOCB_DIO_CALLER_COMP context must be IOCB_NOWAIT.

I don't recall it being mentioned during patch submission or review,
and if it was, the implications certainly didn't register with me - I
would not have given a rvb without such a landmine either being removed
or very well documented.

I don't see anywhere that is checked and I don't see how it can be,
because the filesystem IO submission path itself has no idea if the
caller already has an IOCB_DIO_CALLER_COMP IO in flight and pending
completion.

> > Hence it appears to me that we've missed some critical constraints
> > around nesting IO submission and completion when using
> > IOCB_DIO_CALLER_COMP. Further, it really isn't clear to me how deep the
> > scope of this problem is yet, let alone what the solution might be.
>
> I think you're missing exactly what the deadlock is.

Then you need to explain exactly what it is, not send undocumented
hacks that appear to do absolutely nothing to fix the problem.

> > With all this in mind, and how late this is in the 6.6 cycle, can we
> > just revert the IOCB_DIO_CALLER_COMP changes for now?
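The self-deadlock scenario discussed in this thread can be reduced to a toy single-task model: a DIO submitted with IOCB_DIO_CALLER_COMP bumps the inode's in-flight count, but its completion is queued back to the submitting task, so that same task draining the count in an uninterruptible inode_dio_wait() never gets to run it. This is an illustration under stated assumptions, not kernel code: every model_* name is hypothetical, and the interruptible variant assumes, as Jens describes, that queued task_work interrupts the wait so the caller can run it and restart. Whether that interruption can actually be relied upon is exactly what is being questioned here.

```c
#include <assert.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal (include/linux/errno.h); defined here
 * only for the model. */
#define MODEL_ERESTARTSYS 512
/* Hypothetical result: the waiter would sleep until another context completes IO. */
#define MODEL_WOULD_SLEEP 1

/* Hypothetical model of the inode's in-flight direct IO counter. */
struct model_inode {
    int dio_count;
};

/* Hypothetical model of the submitting task and its deferred task_work. */
struct model_task {
    int pending_completions;   /* completions queued to run in this task */
    struct model_inode *inode;
};

/* Submit an async DIO whose completion is deferred to the submitting task,
 * as with IOCB_DIO_CALLER_COMP. The device has finished immediately in this
 * model, so the completion is already queued to the task. */
static void model_submit_caller_comp(struct model_task *t)
{
    t->inode->dio_count++;
    t->pending_completions++;
}

/* Run the task's queued completions; each one drops the DIO count. */
static void model_run_task_work(struct model_task *t)
{
    while (t->pending_completions > 0) {
        t->pending_completions--;
        t->inode->dio_count--;
    }
}

/* Uninterruptible drain, like inode_dio_wait(). Returns true if already
 * drained; false means the task would sleep forever, because the only
 * context able to drop the count is this (sleeping) task. */
static bool model_dio_wait(const struct model_task *t)
{
    return t->inode->dio_count == 0;
}

/* Interruptible drain along the lines proposed in the thread: pending
 * task_work interrupts the wait with -ERESTARTSYS. */
static int model_dio_wait_interruptible(const struct model_task *t)
{
    if (t->inode->dio_count == 0)
        return 0;
    if (t->pending_completions > 0)
        return -MODEL_ERESTARTSYS;
    return MODEL_WOULD_SLEEP;
}

/* Caller-side restart loop: run the queued work, then retry the drain. */
static int model_drain_with_restart(struct model_task *t)
{
    int ret;

    while ((ret = model_dio_wait_interruptible(t)) == -MODEL_ERESTARTSYS)
        model_run_task_work(t);
    return ret;
}
```

In the model, the uninterruptible drain reports that it would sleep forever, while the restart loop runs the queued completion and then finds the count at zero. The model says nothing about the extra -ERESTARTSYS handling this would push into truncate, fallocate and the other callers, which is the other half of the objection.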
>
> Yeah I'm going to do a revert of the io_uring side, which effectively
> disables it. Then a revised series can be done, and when done, we could
> bring it back.

Please revert the whole lot; I'm now unconvinced that this is
functionality we can sanely support at the filesystem level without a
whole lot more thought.

-Dave.
-- 
Dave Chinner
david@fromorbit.com