Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp19862387rwd; Wed, 28 Jun 2023 15:45:35 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6jV6mSDXuH0MCYnDpd8cRnPJ1RdxKACuCpyn56ID92fJcD3znM6zS3zLQSvPWW2Z4bWgrg X-Received: by 2002:a05:6a20:748c:b0:11f:39e2:d09a with SMTP id p12-20020a056a20748c00b0011f39e2d09amr26550183pzd.33.1687992335165; Wed, 28 Jun 2023 15:45:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687992335; cv=none; d=google.com; s=arc-20160816; b=uRhI1nIWHNJuqaKQKuIwNUiNMPZ73TjNrkL/jIiMsSgZTdlM5FkWgzJlgLBqFVkNkL 72vgEhj8I22n/KEd2QzZIlfRJbycp14Xmw0AaHLCC+StQsOsoeckH8WJ2c8Ue6nJw2ns pWLCeh5EpY8yLwpfwjvnRpUpzlPiSXi9RrPwNYqF/CxMeMCVIaFp79rm5AkdMPzH5yMK cWAXVV50GvxT4OVRo8BjyI3CmIhaZaXxtwwZmSVAdB8R21OADnqbOo1AgENJxzqJZr7r rhHu6xE2wSQOkVZkZAMoWN9ZkpUoffWnX/H1No07OsAQbfxusyqDzQ7HCte5zI7x1NJS ZdJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=0oW5BXhQovw4TV6zdZd424J+I/WFKk4cAc5zD1n255c=; fh=GlLie4/BvyG9HFmLRy3ceDcVH12sYJwMavTUwn1HIzg=; b=S0+8xserIAk8zDPElsZ+Gyxd991EZs4b94NWLVshFdxVD5jdE5khdPef3WEvJsL9vQ 0LxctNmvu5oDfvzOHASixoQW3yCH5vxCo+Lpa0tM3Dd5lZYN6Dip/zczW8cbhXQNVvYs 0CyzwxUvYb70nqq2hfC0EBOg58dlYtNW6HHElpL2h0NgRF8/PQSWUDoRLLWbWiPSDBC+ G5qA0OVB8MPdjoL/dSqf9ZaPHYbsnxKYnodfJGXN95uZvqR8v8F7yos4SMPncGuKRLs/ 6jTaHOyt4/qW3s1pWKWx3ewb4vEvvUwW9X0EE0q0BKDk8a2WxES2cdt39/wtzE2IwLd6 FIuQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel-dk.20221208.gappssmtp.com header.s=20221208 header.b=c9gdujX9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 26-20020a63155a000000b00537c6c285a0si9458093pgv.167.2023.06.28.15.45.22; Wed, 28 Jun 2023 15:45:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20221208.gappssmtp.com header.s=20221208 header.b=c9gdujX9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231207AbjF1WeC (ORCPT + 99 others); Wed, 28 Jun 2023 18:34:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46082 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229581AbjF1WeA (ORCPT ); Wed, 28 Jun 2023 18:34:00 -0400 Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71A9726BF for ; Wed, 28 Jun 2023 15:33:58 -0700 (PDT) Received: by mail-pf1-x436.google.com with SMTP id d2e1a72fcca58-656bc570a05so13912b3a.0 for ; Wed, 28 Jun 2023 15:33:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20221208.gappssmtp.com; s=20221208; t=1687991638; x=1690583638; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=0oW5BXhQovw4TV6zdZd424J+I/WFKk4cAc5zD1n255c=; b=c9gdujX9zi75jg4hmjr8oFrVx/hG1FqUKaICQS9v4vWmbwHtPoIvFmN2HPuH1nKOmH Pr1msq3OxLAmq2VcgBoES9iaK9Zc8z+b5ZQZd71b7s5lgDlgAHwyTujMIW1MRU8a4rr2 p4/A8htbaVVzcTDo2KN3o+qS2x186ciX4lMjB8XYBjVnPZZh9ljhVdjpitMuL4FYAxBM qGV5qyKojK31W9x0w3Y5llYh7hoHMKOqibL5dsKzLQhJs5KSD1DmJqhmRHbLM5tbRk6E S9JurP+nkWdOKQDd9z/CfYnXAE2hhs8knpCXHLcZ978FdyBF8flULECq7F+jYV0tgouY dGxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687991638; x=1690583638; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=0oW5BXhQovw4TV6zdZd424J+I/WFKk4cAc5zD1n255c=; b=SZ+8FSqAtNBg98BfoowmMWVQn5W7uh4elRhe10JnAIVncWGuwir8iWBAbeV/uU5aYP MQwxdkBzPbrvD9cvyYZOmuexIN8Q7BksWVR9BlnaGh1e6NO64gb5uWySg25IpWUnox1S 2729q6xCT04NDXwaAWBV8frTdNI7qt4C1ssJxDwTdIEGKDx+WsPnVtjXtxGw2mbiE6yu E1TruOcJ0WBjtULyphOQZ3aUMv+b99B8A15VYWh2EMyXObRctlsd4FQS/2n5pAwzDbDI 34kdfa2F5eqbP4ShPLO2G9Jgak1xW9i3HXbVg6psu3pGvhbSYY3TPyjugRTpp4N6U+zK LxaA== X-Gm-Message-State: AC+VfDyLRbGL4HGqhdiCNzc4MIFRncVxBPboG2g4oqq7hK4YuCnWe5LC pwU/b6cmHuBPTHElkFq5JnYrGw0zXpb1OliiSYA= X-Received: by 2002:a05:6a20:8f04:b0:121:b5af:acbc with SMTP id b4-20020a056a208f0400b00121b5afacbcmr38497363pzk.3.1687991637811; Wed, 28 Jun 2023 15:33:57 -0700 (PDT) Received: from [192.168.1.136] ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id z6-20020aa791c6000000b0065434edd521sm1236833pfa.196.2023.06.28.15.33.56 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 28 Jun 2023 15:33:56 -0700 (PDT) Message-ID: Date: Wed, 28 Jun 2023 16:33:55 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [GIT PULL] bcachefs Content-Language: en-US To: Kent Overstreet Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Christoph Hellwig , Christian Brauner , Al Viro References: <23922545-917a-06bd-ec92-ff6aa66118e2@kernel.dk> <20230627201524.ool73bps2lre2tsz@moria.home.lan> <20230628040114.oz46icbsjpa4egpp@moria.home.lan> <4b863e62-4406-53e4-f96a-f4d1daf098ab@kernel.dk> <20230628175204.oeek4nnqx7ltlqmg@moria.home.lan> <2e635579-37ba-ddfc-a2ab-e6c080ab4971@kernel.dk> <20230628221342.4j3gr3zscnsu366p@moria.home.lan> From: Jens Axboe In-Reply-To: <20230628221342.4j3gr3zscnsu366p@moria.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/28/23 4:13?PM, Kent Overstreet wrote: > On Wed, Jun 28, 2023 at 03:17:43PM -0600, Jens Axboe wrote: >> Case in point, just changed my reproducer to use aio instead of >> io_uring. Here's the full script: >> >> #!/bin/bash >> >> DEV=/dev/nvme1n1 >> MNT=/data >> ITER=0 >> >> while true; do >> echo loop $ITER >> sudo mount $DEV $MNT >> fio --name=test --ioengine=aio --iodepth=2 --filename=$MNT/foo --size=1g --buffered=1 --overwrite=0 --numjobs=12 --minimal --rw=randread --output=/dev/null & >> Y=$(($RANDOM % 3)) >> X=$(($RANDOM % 10)) >> VAL="$Y.$X" >> sleep $VAL >> ps -e | grep fio > /dev/null 2>&1 >> while [ $? -eq 0 ]; do >> killall -9 fio > /dev/null 2>&1 >> echo will wait >> wait > /dev/null 2>&1 >> echo done waiting >> ps -e | grep "fio " > /dev/null 2>&1 >> done >> sudo umount /data >> if [ $? -ne 0 ]; then >> break >> fi >> ((ITER++)) >> done >> >> and if I run that, fails on the first umount attempt in that loop: >> >> axboe@m1max-kvm ~> bash test2.sh >> loop 0 >> will wait >> done waiting >> umount: /data: target is busy. >> >> So yeah, this is _nothing_ new. I really don't think trying to address >> this in the kernel is the right approach, it'd be a lot saner to harden >> the xfstest side to deal with the umount a bit more sanely. There are >> obviously tons of other ways that a mount could get pinned, which isn't >> too relevant here since the bdev and mount point are basically exclusive >> to the test being run. But the kill and delayed fput is enough to make >> that case imho. > > Uh, count me very much not in favor of hacking around bugs elsewhere. > > Al, do you know if this has been considered before? We've got fput() > being called from aio completion, which often runs out of a worqueue (if > not a workqueue, a bottom half of some sort - what happens then, I > wonder) - so the effect is that it goes on the global list, not the task > work list. > > hence, kill -9ing a process doing aio (or io_uring io, for extra > reasons) causes umount to fail with -EBUSY. > > and since there's no mechanism for userspace to deal with this besides > sleep and retry, this seems pretty gross. But there is, as Christian outlined. I would not call it pretty or intuitive, but you can in fact make it work just fine and not just for the deferred fput() case but also in the presence of other kinds of pins. Of which there are of course many. > I'd be willing to tackle this for aio since I know that code... But it's not aio (or io_uring or whatever), it's simply the fact that doing an fput() from an exiting task (for example) will end up being done async. And hence waiting for task exits is NOT enough to ensure that all file references have been released. Since there are a variety of other reasons why a mount may be pinned and fail to umount, perhaps it's worth considering that changing this behavior won't buy us that much. Especially since it's been around for more than 10 years: commit 4a9d4b024a3102fc083c925c242d98ac27b1c5f6 Author: Al Viro Date: Sun Jun 24 09:56:45 2012 +0400 switch fput to task_work_add though that commit message goes on to read: We are guaranteed that __fput() will be done before we return to userland (or exit). Note that for fput() from a kernel thread we get an async behaviour; it's almost always OK, but sometimes you might need to have __fput() completed before you do anything else. There are two mechanisms for that - a general barrier (flush_delayed_fput()) and explicit __fput_sync(). Both should be used with care (as was the case for fput() from kernel threads all along). See comments in fs/file_table.c for details. where that first sentence isn't true if the task is indeed exiting. I guess you can say that it is as it doesn't return to userland, but splitting hairs. Though the commit in question doesn't seem to handle that case, but assuming that came in with a later fixup. It is true if the task_work gets added, as that will get run before returning to userspace. If a case were to be made that we also guarantee that fput has been done by the time to task returns to userspace, or exits, then we'd probably want to move that deferred fput list to the task_struct and ensure that it gets run if the task exits rather than have a global deferred list. Currently we have: 1) If kthread or in interrupt 1a) add to global fput list 2) task_work_add if not. If that fails, goto 1a. which would then become: 1) If kthread or in interrupt 1a) add to global fput list 2) task_work_add if not. If that fails, we know task is existing. add to per-task defer list to be run at a convenient time before task has exited. and seems a lot saner than hacking around this in umount specifically. -- Jens Axboe