Date: Fri, 28 Apr 2023 01:47:22 -0400
From: "Theodore Ts'o"
To: Baokun Li
Cc: Ming Lei, Matthew Wilcox, linux-ext4@vger.kernel.org, Andreas Dilger,
    linux-block@vger.kernel.org, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Dave Chinner, Eric Sandeen, Christoph Hellwig,
    Zhang Yi, yangerkun
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

On Fri, Apr 28, 2023 at 11:47:26AM +0800, Baokun Li wrote:
> Ext4 just detects I/O Error and remounts it as read-only, it doesn't know
> if the current disk is dead or not.
>
> I asked Yu Kuai and he said that disk_live() can be used to determine
> whether a disk has been removed based on the status of the inode
> corresponding to the block device, but this is generally not done in
> file systems.

What really needs to happen is that del_gendisk() needs to inform file
systems that the disk is gone, so that the file system can shut itself
down and tear everything down.

disk_live() is relatively new; it was added in August 2021. Back in
2015, I had added the following in fs/ext4/super.c:

/*
 * The del_gendisk() function uninitializes the disk-specific data
 * structures, including the bdi structure, without telling anyone
 * else.  Once this happens, any attempt to call mark_buffer_dirty()
 * (for example, by ext4_commit_super), will cause a kernel OOPS.
 * This is a kludge to prevent these oops until we can put in a proper
 * hook in del_gendisk() to inform the VFS and file system layers.
 */
static int block_device_ejected(struct super_block *sb)
{
	struct inode *bd_inode = sb->s_bdev->bd_inode;
	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);

	return bdi->dev == NULL;
}

As the comment states, it's rather awkward to have the file system
check to see if the block device is dead in various places; the real
problem is that the block device shouldn't just *vanish*, with the
block device structures getting partially de-initialized, without the
block layer being polite enough to let the file system know.

> Those dirty pages that are already there are piling up and can't be
> written back, which I think is a real problem. Can the block layer
> clear those dirty pages when it detects that the disk is deleted?

Well, the dirty pages belong to the file system, and so it needs to be
up to the file system to clear out the dirty pages. But I'll also note
that the right thing to do when a disk gets removed is not necessarily
obvious.

For example, suppose some process has a file mmap'ed into its address
space, and that file is on the disk which the user has rudely yanked
out from their laptop; what is the right thing to do? Do we kill the
process? Do we let the process write to the mmap'ed region, and
silently let the modified data go *poof* when the process exits?

What if there is an executable file on the removable disk, and there
are one or more processes running that executable when the device
disappears? Do we kill the process? Do we let the process run until it
tries to access a page which hasn't been paged in and then kill the
process?

We should design a proper solution for What Should Happen when a
removable disk gets removed unceremoniously without unmounting the
file system first.
It's not just a matter of making some tests go green....

- Ted
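
The difference between the polling kludge and a removal hook can be
sketched as a small userspace model in C. This is only an illustration
under stated assumptions: every name here (mock_disk, mock_fs,
on_removed, mock_del_disk, and so on) is hypothetical and none of them
is a kernel API. The choice to discard dirty pages on removal is just
one possible policy, picked to keep the sketch short; it is not a
claim about what ext4 or the block layer does or should do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

struct mock_disk;

/* A mounted file system, reduced to the state that matters here. */
struct mock_fs {
	struct mock_disk *disk;	/* backing device */
	bool shut_down;		/* set once the fs is told the disk is gone */
	int dirty_pages;	/* pages waiting for writeback */
};

struct mock_disk {
	void *dev;		/* NULL once torn down, loosely like bdi->dev */
	void (*on_removed)(struct mock_fs *fs);	/* hypothetical removal hook */
	struct mock_fs *fs;	/* file system to notify, if any */
};

/* Polling approach, in the spirit of block_device_ejected(). */
bool mock_device_ejected(const struct mock_fs *fs)
{
	return fs->disk->dev == NULL;
}

/* Write path: refuse to dirty a page if the fs is shut down or the disk is gone. */
int mock_fs_write(struct mock_fs *fs)
{
	if (fs->shut_down || mock_device_ejected(fs))
		return -1;	/* stand-in for an I/O error */
	fs->dirty_pages++;
	return 0;
}

/* Hook approach: the file system owns its dirty pages and decides their fate. */
void mock_fs_disk_removed(struct mock_fs *fs)
{
	fs->shut_down = true;
	fs->dirty_pages = 0;	/* assumed policy for this sketch: discard */
}

/* Removal that notifies the file system before tearing the device down. */
void mock_del_disk(struct mock_disk *disk)
{
	if (disk->on_removed && disk->fs)
		disk->on_removed(disk->fs);
	disk->dev = NULL;
}
```

In this model the file system learns about the removal exactly once,
from mock_del_disk(), instead of re-checking the device pointer at
every call site. The polling check is kept in the write path only as a
fallback for a disk that has no hook registered.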