Date: Thu, 15 Apr 2021 07:57:39 +1000
From: Dave Chinner
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
 linux-xfs@vger.kernel.org, Ted Tso, Christoph Hellwig, Amir Goldstein
Subject: Re: [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock
Message-ID: <20210414215739.GH63242@dread.disaster.area>
References: <20210413105205.3093-1-jack@suse.cz>
 <20210413112859.32249-2-jack@suse.cz>
 <20210414000113.GG63242@dread.disaster.area>
 <20210414122319.GD31323@quack2.suse.cz>
In-Reply-To: <20210414122319.GD31323@quack2.suse.cz>

On Wed, Apr 14, 2021 at 02:23:19PM +0200, Jan Kara wrote:
> On Wed 14-04-21 10:01:13, Dave Chinner wrote:
> > On Tue, Apr 13, 2021 at 01:28:46PM +0200, Jan Kara wrote:
> > > index c5b0457415be..ac5bb50b3a4c 100644
> > > --- a/mm/readahead.c
> > > +++ b/mm/readahead.c
> > > @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  	 */
> > >  	unsigned int nofs = memalloc_nofs_save();
> > >  
> > > +	down_read(&mapping->host->i_mapping_sem);
> > >  	/*
> > >  	 * Preallocate as many pages as we will need.
> > >  	 */
> >
> > I can't say I'm a great fan of having the mapping reach back up to
> > the host to lock the host. This seems the wrong way around to me,
> > given that most of the locking in the IO path is in "host locks
> > mapping" and "mapping locks internal mapping structures" order...
> >
> > I also come back to the naming confusion here, in that when we look
> > at this in longhand from the inode perspective, this chain actually
> > looks like:
> >
> > 	lock(inode->i_mapping->inode->i_mapping_sem)
> >
> > i.e. the mapping is reaching back up outside its scope to lock
> > itself against other inode->i_mapping operations. Smells of layering
> > violations to me.
> >
> > So, next question: should this truncate semaphore actually be part
> > of the address space, not the inode? This patch is actually moving
> > the page fault serialisation from the inode into the address space
> > operations when page faults and page cache operations are done, so
> > maybe the lock should also make that move? That would help clear up
> > the naming problem, because now we can name it based around what it
> > serialises in the address space, not the address space as a whole...
>
> I think that moving the lock to address_space makes some sense, although
> the lock actually protects consistency of inode->i_mapping->i_pages with
> whatever the filesystem has in its file_offset->disk_block mapping
> structures (which are generally associated with the inode).

Well, I look at it as a mechanism that the filesystem uses to ensure
coherency of page cache accesses w.r.t. physical layout changes.
The layout is a property of the inode, but changes to the physical
layout of the inode are serialised by other inode-based mechanisms.

The page cache isn't part of the inode - it's part of the address
space - but coherency with the inode is required. Hence inode
operations need to be able to ensure coherency of the address space
content and accesses w.r.t. physical layout changes of the inode,
but the address space really knows nothing about the physical
layout of the inode or how it gets changed...

Hence it's valid for the inode operations to lock the address space
to ensure coherency of the page cache when making physical layout
changes, but locking the address space, by itself, is not
sufficient to safely serialise against physical changes to the
inode layout.

> So it is not only about inode->i_mapping contents, but I agree that
> struct address_space is probably a somewhat more logical place than
> struct inode.

Yup.
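To make that a bit more concrete, here's a rough sketch of the locking
pattern that moving the rwsem into the address_space would give us.
The "invalidate_lock" field name and the example helpers below are
placeholders I'm making up for illustration - nothing here is from
this patch set:

/*
 * Sketch only: assumes struct address_space grows a
 * "struct rw_semaphore invalidate_lock" - placeholder name.
 */

/* Page cache instantiation side (readahead, fault) takes it shared. */
static void pagecache_add_example(struct address_space *mapping)
{
	down_read(&mapping->invalidate_lock);
	/* ... add pages to mapping->i_pages ... */
	up_read(&mapping->invalidate_lock);
}

/*
 * Inode operations that change the physical layout (truncate, hole
 * punch) take it exclusive around the page cache invalidation, in
 * addition to whatever inode-level serialisation the filesystem
 * already does for layout changes.
 */
static void layout_change_example(struct inode *inode, loff_t newsize)
{
	struct address_space *mapping = inode->i_mapping;

	down_write(&mapping->invalidate_lock);
	truncate_pagecache(inode, newsize);
	/* ... update the file_offset->disk_block mapping ... */
	up_write(&mapping->invalidate_lock);
}

That also gets rid of the mapping->host->i_mapping_sem back-reference
in page_cache_ra_unbounded() - the readahead path only ever needs the
mapping it is already working on.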
Remember that the XFS_MMAPLOCK arose at the inode level because that
was the only way the filesystem could achieve the necessary
serialisation of page cache accesses whilst doing physical layout
changes. So the lock became an "inode property" because of
implementation constraints, not because it was the best way to
implement the necessary coherency hooks.

> Regarding the name: How about i_pages_rwsem? The lock is protecting
> invalidation of mapping->i_pages and needs to be held until insertion
> of pages into i_pages is safe again...

I don't actually have a good name for this right now. :( The i_pages
structure has its own internal locking, so i_pages_rwsem implies
things that aren't necessarily true, and taking a read lock for
insertion on something that is named like a structure protection
lock creates cognitive dissonance...

I keep wanting to say "lock for invalidation" and "lock to exclude
invalidation", because those are the two actions that we need for
coherency of operations. But they are way too verbose for an actual
API...

So I want to call this an "invalidation lock" of some kind (no need
to encode the type in the name!), but I haven't worked out a good
shorthand for "address space invalidation coherency mechanism"...

Naming is hard. :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com