Date: Fri, 7 Dec 2018 12:01:38 +0100
From: Jan Kara
To: Josef Bacik
Cc: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com,
	jack@suse.cz
Subject: Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations
Message-ID: <20181207110138.GE13008@quack2.suse.cz>
References: <20181130195812.19536-1-josef@toxicpanda.com>
	<20181130195812.19536-4-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-4-josef@toxicpanda.com>

On Fri 30-11-18
14:58:11, Josef Bacik wrote:
> Currently we only drop the mmap_sem if there is contention on the page
> lock. The idea is that we issue readahead and then go to lock the page
> while it is under IO and we want to not hold the mmap_sem during the IO.
>
> The problem with this is the assumption that the readahead does
> anything. In the case that the box is under extreme memory or IO
> pressure we may end up not reading anything at all for readahead, which
> means we will end up reading in the page under the mmap_sem.
>
> Instead rework filemap fault path to drop the mmap sem at any point that
> we may do IO or block for an extended period of time. This includes
> while issuing readahead, locking the page, or needing to call ->readpage
> because readahead did not occur. Then once we have a fully uptodate
> page we can return with VM_FAULT_RETRY and come back again to find our
> nicely in-cache page that was gotten outside of the mmap_sem.
>
> Signed-off-by: Josef Bacik
> ---
>  mm/filemap.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 93 insertions(+), 20 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index f068712c2525..5e76b24b2a0f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter);
>
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +static struct file *maybe_unlock_mmap_for_io(struct file *fpin,
> +					     struct vm_area_struct *vma,
> +					     int flags)
> +{
> +	if (fpin)
> +		return fpin;
> +	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
> +	    FAULT_FLAG_ALLOW_RETRY) {
> +		fpin = get_file(vma->vm_file);
> +		up_read(&vma->vm_mm->mmap_sem);
> +	}
> +	return fpin;
> +}
>
>  /*
>   * Synchronous readahead happens when we don't even find
>   * a page in the page cache at all.
>   */
> -static void do_sync_mmap_readahead(struct vm_area_struct *vma,
> -				   struct file_ra_state *ra,
> -				   struct file *file,
> -				   pgoff_t offset)
> +static struct file *do_sync_mmap_readahead(struct vm_area_struct *vma,
> +					   struct file_ra_state *ra,
> +					   struct file *file,
> +					   pgoff_t offset,
> +					   int flags)
>  {

IMO it would be nicer to pass vmf here at this point. Everything this
function needs is there and the number of arguments is already quite big.
But I don't insist.

>  /*
>   * Asynchronous readahead happens when we find the page and PG_readahead,
>   * so we want to possibly extend the readahead further..
>   */
> -static void do_async_mmap_readahead(struct vm_area_struct *vma,
> -				    struct file_ra_state *ra,
> -				    struct file *file,
> -				    struct page *page,
> -				    pgoff_t offset)
> +static struct file *do_async_mmap_readahead(struct vm_area_struct *vma,
> +					    struct file_ra_state *ra,
> +					    struct file *file,
> +					    struct page *page,
> +					    pgoff_t offset, int flags)
>  {

The same here (except for 'page' which needs to be kept).

> @@ -2433,9 +2458,32 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  		return vmf_error(-ENOMEM);
>  	}
>
> -	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
> -		put_page(page);
> -		return ret | VM_FAULT_RETRY;
> +	/*
> +	 * We are open-coding lock_page_or_retry here because we want to do
> +	 * the readpage if necessary while the mmap_sem is dropped. If there
> +	 * happens to be a lock on the page but it wasn't being faulted in
> +	 * we'd come back around without ALLOW_RETRY set and then have to do
> +	 * the IO under the mmap_sem, which would be a bummer.
> +	 */

Hum, lock_page_or_retry() has two callers and you've just killed one. I
think it would be better to modify the function to suit both callers
rather than opencoding? Maybe something like lock_page_maybe_drop_mmap()
which would unconditionally acquire the lock and return whether it has
dropped mmap sem or not? Callers can then decide what to do.
BTW I'm not sure this complication is really worth it. The "drop mmap_sem
for IO" is never going to be a 100% thing, if nothing else because only one
retry is allowed in do_user_addr_fault(). So the second time we get to
filemap_fault(), we will not have FAULT_FLAG_ALLOW_RETRY set and thus do
blocking locking. So I think your code needs to catch the common cases you
observe in practice, but not those super-rare corner cases...

								Honza
-- 
Jan Kara
SUSE Labs, CR