Date: Fri, 22 Jan 2021 15:32:32 +0100
From: Jan Kara
To: Matthew Wilcox
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 0/3 RFC] fs: Hole punch vs page cache filling races
Message-ID: <20210122143232.GA1175@quack2.suse.cz>
References: <20210120160611.26853-1-jack@suse.cz> <20210121192755.GC4127393@casper.infradead.org>
In-Reply-To: <20210121192755.GC4127393@casper.infradead.org>
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu 21-01-21 19:27:55, Matthew Wilcox wrote:
> On Wed, Jan 20, 2021 at 05:06:08PM +0100, Jan Kara wrote:
> > Hello,
> >
> > Amir has reported [1] that ext4 has potential issues when reads can race
> > with hole punching, possibly exposing stale data from freed blocks
> > or even corrupting filesystem when stale mapping data gets used for writeout. The
> > problem is that during hole punching, new page cache pages can get instantiated
> > in a punched range after truncate_inode_pages() has run but before the
> > filesystem removes blocks from the file. In principle any filesystem
> > implementing hole punching thus needs to implement a mechanism to block
> > instantiating page cache pages during hole punching to avoid this race. This is
> > further complicated by the fact that there are multiple places that can
> > instantiate pages in page cache. We can have regular read(2) or page fault
> > doing this but fadvise(2) or madvise(2) can also result in reading in page
> > cache pages through force_page_cache_readahead().
>
> Doesn't this indicate that we're doing truncates in the wrong order?
> ie first we should deallocate the blocks, then we should free the page
> cache that was caching the contents of those blocks. We'd need to
> make sure those pages in the page cache don't get written back to disc
> (either by taking pages in the page cache off the lru list or having
> the filesystem handle writeback of pages to a freed extent as a no-op).

Well, it depends on how much you wish to complicate matters :).
Filesystems have metadata information attached to pages (e.g. buffer
heads); once you remove blocks from a file, this information becomes
stale. So it makes perfect sense to first evict the page cache to remove
this cached metadata and then remove the blocks from the on-disk
structures.

You can obviously try to do it the other way around, i.e., first remove
blocks from the on-disk structures and then remove the cached information
from the page cache. But then you have to make sure stale cached
information isn't used until everything is in sync.
So whichever way you slice it, you have information in two places, you
need to keep it in sync, and you need some synchronization between the
different updaters of this information in both places so that they cannot
get those two places out of sync...

TLDR: I don't see much benefit in switching the ordering.

								Honza
-- 
Jan Kara
SUSE Labs, CR
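
The race discussed in this thread can be illustrated with a small userspace toy model. This is only a sketch under loud assumptions: `ToyFile`, `punch_lock`, `racy_punch()` and `locked_punch()` are hypothetical names invented for illustration, a plain mutex stands in for whatever synchronization a real filesystem would use, and nothing here is kernel code or the method of the patch series. It shows a reader refilling the page cache between "evict pages" and "free blocks", and how holding one lock across both steps keeps the two places in sync.

```python
import threading


class ToyFile:
    """Toy model: an 'on-disk' block map plus a page cache caching its contents."""

    def __init__(self, nblocks):
        self.blocks = {i: f"data{i}" for i in range(nblocks)}  # on-disk mapping
        self.page_cache = {}                                   # cached copies
        self.punch_lock = threading.Lock()  # hypothetical lock, not a kernel name

    def read(self, idx):
        """Fill the page cache from the block map if needed (read or page fault)."""
        if idx not in self.page_cache:
            # A hole (no block mapped) reads back as zeros.
            self.page_cache[idx] = self.blocks.get(idx, "zero")
        return self.page_cache[idx]

    def evict_pages(self, start, end):
        """Stand-in for evicting the page cache over [start, end)."""
        for i in range(start, end):
            self.page_cache.pop(i, None)

    def free_blocks(self, start, end):
        """Stand-in for the filesystem removing blocks from the file."""
        for i in range(start, end):
            self.blocks.pop(i, None)


def racy_punch(f, start, end, racing_read=None):
    """Evict, then free, with nothing stopping a reader in between."""
    f.evict_pages(start, end)
    if racing_read is not None:
        racing_read()  # the reader slips in between the two steps
    f.free_blocks(start, end)


def locked_read(f, idx):
    """Reader that takes the lock before it may instantiate a page."""
    with f.punch_lock:
        return f.read(idx)


def locked_punch(f, start, end, racing_read=None):
    """Hold the lock across both steps so no page can be instantiated between."""
    reader = None
    with f.punch_lock:
        f.evict_pages(start, end)
        if racing_read is not None:
            reader = threading.Thread(target=racing_read)
            reader.start()  # blocks on punch_lock until the punch is complete
        f.free_blocks(start, end)
    if reader is not None:
        reader.join()


def demo():
    racy = ToyFile(4)
    racy_punch(racy, 1, 3, racing_read=lambda: racy.read(1))

    safe = ToyFile(4)
    locked_punch(safe, 1, 3, racing_read=lambda: locked_read(safe, 1))

    # Cached content of index 1 after each punch.
    return racy.page_cache.get(1), safe.page_cache.get(1)


if __name__ == "__main__":
    print(demo())
```

In the racy variant the cache ends up holding the old contents of a block that no longer belongs to the file; in the locked variant the reader only runs after the blocks are gone and sees a hole. Swapping the order of the two steps inside `racy_punch()` does not remove the need for the lock: pages cached before the punch would then stay stale until they are evicted, which is the point made above about keeping two places in sync.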