Date: Wed, 21 Apr 2021 12:57:39 -0400
From: "Theodore Ts'o"
To: Jan Kara
Cc: Christoph Hellwig, Zhang Yi, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, adilger.kernel@dilger.ca, yukuai3@huawei.com
Subject: Re: [RFC PATCH v2 7/7] ext4: fix race between blkdev_releasepage() and ext4_put_super()

On Wed, Apr 21, 2021 at 03:46:34PM +0200, Jan Kara wrote:
>
> Indeed, after 12 years in kernel .bdev_try_to_free_page is implemented only
> by ext4. So maybe it is not that important? I agree with Zhang and
> Christoph that getting the lifetime rules sorted out will be hairy and it
> is questionable, whether it is worth the additional pages we can reclaim.
> Ted, do you remember what was the original motivation for this?

The comment in fs/ext4/super.c is, I thought, a pretty good explanation:

/*
 * Try to release metadata pages (indirect blocks, directories) which are
 * mapped via the block device.  Since these pages could have journal heads
 * which would prevent try_to_free_buffers() from freeing them, we must use
 * jbd2 layer's try_to_free_buffers() function to release them.
 */

When we modify a metadata block, we attach a journal_head (jh) structure to the buffer_head and bump the ref count to prevent the buffer from being freed. Before the transaction is committed, the buffer is marked jbddirty, but the dirty bit is not set until the transaction commits. From that point on, writeback happens entirely at the discretion of the buffer cache. The jbd layer doesn't get a notification when the I/O is completed, nor when there is an I/O error. (There was an attempt to add a callback, but that was NACK'ed because of a complaint that it was jbd-specific.)

So we don't actually know when it's safe to detach the jh from the buffer_head and drop the refcount so that the buffer_head can be freed. When the space in the journal starts getting low, we'll look at the jh's attached to completed transactions, see how many of them have clean bh's, and at that point we can release the buffer heads. The other time we'll attempt to detach jh's from clean buffers is via bdev_try_to_free_page().
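A minimal user-space model may make the lifecycle described above more concrete. It assumes a simplified buffer carrying only a refcount, a jbddirty flag, a dirty flag and an optional attached jh; the `model_*` names are hypothetical and are not the kernel's jbd2 or buffer-cache API. It is a sketch of the polling problem, not of how jbd2 is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: these names are NOT the kernel's jbd2 or
 * buffer-cache API, just a simplified illustration of the lifecycle. */
struct model_jh {
    unsigned tid;            /* transaction the block was modified in */
};

struct model_bh {
    int refcount;            /* the buffer can be freed only at 0 */
    bool jbddirty;           /* modified in a not-yet-committed transaction */
    bool dirty;              /* dirty bit: owned by buffer-cache writeback */
    struct model_jh *jh;     /* attached journal head, or NULL */
};

/* Modifying a metadata block: attach a jh and pin the buffer. */
static void model_modify(struct model_bh *bh, struct model_jh *jh)
{
    if (bh->jh == NULL) {
        bh->jh = jh;
        bh->refcount++;      /* the attached jh holds a reference */
    }
    bh->jbddirty = true;
}

/* Transaction commit: only now does the real dirty bit get set. */
static void model_commit(struct model_bh *bh)
{
    if (bh->jbddirty) {
        bh->jbddirty = false;
        bh->dirty = true;
    }
}

/* Buffer-cache writeback finished. Nothing here tells the journal,
 * which models the missing completion notification. */
static void model_writeback_done(struct model_bh *bh)
{
    bh->dirty = false;
}

/* What a journal-space scan or a try-to-free hook has to do: poll the
 * buffer and, only if it is clean, detach the jh and drop its ref. */
static bool model_try_release_jh(struct model_bh *bh)
{
    if (bh->jh == NULL || bh->jbddirty || bh->dirty)
        return false;
    assert(bh->refcount > 0);
    bh->jh = NULL;
    bh->refcount--;
    return true;
}

static bool model_can_free(const struct model_bh *bh)
{
    return bh->refcount == 0;
}
```

In this model, after model_writeback_done() the block is already on disk, yet model_can_free() stays false until something polls it with model_try_release_jh(). That pinned-but-clean state is the memory a try-to-free hook would reclaim under pressure.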
So if we drop the bdev_try_to_free_page hook, then under memory pressure a potentially large percentage of the buffer cache could be impossible to free, and so the OOM killer might trigger more often.

Now, if we could get a callback on I/O completion on a per-bh basis, then we could detach the jh as soon as the buffer is clean. As a bonus, we'd get a notification when there was an I/O error writing back a metadata block, which would be even better.

So how about an even swap? If we can get a buffer I/O completion callback, we can drop the bdev_try_to_free_page hook.

- Ted
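The per-bh I/O completion callback proposed in this message could look roughly like the following user-space sketch. It assumes a simplified buffer with a function-pointer completion hook; `struct cb_bh`, `journal_bh_end_io()` and `complete_write()` are hypothetical illustration names, not an existing kernel interface, and the sketch says nothing about how such a hook would actually be wired into the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: every name here is illustrative, not a kernel API. */
struct cb_bh;
typedef void (*cb_end_io_fn)(struct cb_bh *bh, bool uptodate);

struct cb_jh {
    unsigned tid;                /* transaction the block was logged in */
};

struct cb_bh {
    int refcount;                /* freeable only when this drops to 0 */
    bool dirty;                  /* queued for metadata writeback */
    bool write_error;            /* set when writeback failed */
    struct cb_jh *jh;            /* attached journal head, or NULL */
    cb_end_io_fn end_io;         /* per-bh completion callback, or NULL */
};

/* Journal-side callback: runs when writeback of this bh completes. */
static void journal_bh_end_io(struct cb_bh *bh, bool uptodate)
{
    if (!uptodate) {
        /* The journal now learns about metadata write errors. */
        bh->write_error = true;
        return;                  /* keep the jh; the block is still dirty */
    }
    bh->dirty = false;
    if (bh->jh != NULL) {
        /* Clean buffer: detach the jh immediately instead of waiting for
         * a journal-space scan or a memory-pressure hook to poll it. */
        assert(bh->refcount > 0);
        bh->jh = NULL;
        bh->refcount--;
    }
}

/* Buffer-cache side: finish a write and invoke the callback if set. */
static void complete_write(struct cb_bh *bh, bool uptodate)
{
    if (bh->end_io != NULL)
        bh->end_io(bh, uptodate);
    else if (uptodate)
        bh->dirty = false;
}
```

The design point being illustrated is that release becomes event-driven: a successful write drops the journal's reference at completion time, and a failed write is reported instead of going unnoticed, so no separate polling path is needed to unpin clean buffers.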