Received: by 2002:ac0:aa62:0:0:0:0:0 with SMTP id w31-v6csp2560857ima; Mon, 22 Oct 2018 11:48:50 -0700 (PDT) X-Google-Smtp-Source: ACcGV60ZYVEtOD15nemijukzfpLlyJA4x0VjtyAS9wz2i8sGm41KorAI7xq7sphHSv8K8dVM/3dJ X-Received: by 2002:a17:902:1ab:: with SMTP id b40-v6mr46067912plb.82.1540234130092; Mon, 22 Oct 2018 11:48:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1540234130; cv=none; d=google.com; s=arc-20160816; b=b8i1NfOkj7iiplsnAWN4Rx0YAdtllA4pZ6qMzwdflGXwZ6db6EAzaMjC5gEhHeTZQb DvIVPHrr7S10D5FTPrjMpB01liHGJroWOJtmDp7qxBKV7unM4qP9u+fCfYJAwN1KVmcf Aq+FN51JSVAUv8gpfwwQupPIB7A/GviL6hCjDapGMC+Hh1T11NE1lQFlVPL4dWxDNN2e DX7EcKAlpb/D9lrYzWLL8XsAMrg5RUKhptDmmvHbffXb6INyDWSRtui91yxr5QEfLv/r BjOTp7bufMJ9Jwq4+Z3s1kZWg46trghbDzYtSD0FBX321wTVVGsln99SWu3eiuRRcmk0 iSdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=my5wh6X5Z2sj/VWBRX7uC0ClqBKdVPmhq2qlgotFP0U=; b=q4zrHB0p2mVIXNntd8yFaYtfN9D0W/aXku6op8S9yQHBskyYF5FzAJCYFivEUuTZuq VvK9LIMtlqXPO4GT9ulzUxgBWua+w50UCXLTafTl31b5mafByI5Dn/9wAYfrRTp86C9/ kFcE7G2uR7RBuFRv54RTyw8Sr4jUNKQP+tDfJ3StYxW3zoM4ZabxFzzZb5lE9625UnG7 j1YpP5wsdeHukPrBV7rHZkJVDYY7v8VC4/7Y/cghxgzHTkGjTwGSqvryk1y46AtLHFh+ 9wM0Ogw7ngUhDX3jgHXgc+c657ftgyr420glzcXhE9EbgNmPGAy3HQXkfmqGvKW0M3jA 4tdw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@toxicpanda-com.20150623.gappssmtp.com header.s=20150623 header.b=0AiVyJMl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a6-v6si37480911pln.78.2018.10.22.11.48.34; Mon, 22 Oct 2018 11:48:49 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@toxicpanda-com.20150623.gappssmtp.com header.s=20150623 header.b=0AiVyJMl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728924AbeJWCQ2 (ORCPT + 99 others); Mon, 22 Oct 2018 22:16:28 -0400 Received: from mail-qt1-f194.google.com ([209.85.160.194]:36403 "EHLO mail-qt1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728876AbeJWCQ2 (ORCPT ); Mon, 22 Oct 2018 22:16:28 -0400 Received: by mail-qt1-f194.google.com with SMTP id u34-v6so47354588qth.3 for ; Mon, 22 Oct 2018 10:56:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=my5wh6X5Z2sj/VWBRX7uC0ClqBKdVPmhq2qlgotFP0U=; b=0AiVyJMlWD2ykObNuH0HBjsZhep/bkFPv9/FWWz8mtuDFhtkqPOWAdpZD5pjUnyQ1I /zLuH2UQS2KSMAH6YTbZaNtQnV0ffkEAGsLFnjfQSVk6KGrIOfbSnKsw31mXbxuhdh+2 A+lMVxVjANuTYyF2Jo1TBwzCovT64sYO/WXSOB/sxCRc60t8Hak5vCdLrSmq1yo8IlZ2 hybMy+NwIQXXYHcQKPyZy4CJkxQdRqbQ7DOfd4PWNU4hRnCo28V4lftCmupIq5ok/lzP I18ApYV/U8HaFyYx3Srq4KJ9et7x50O9bxYJwdO2EmIFhSOUWSAr2JB0RYiylhpGEhvU +ICA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=my5wh6X5Z2sj/VWBRX7uC0ClqBKdVPmhq2qlgotFP0U=; b=nWrKRV86pawupkucOr3pCIZFz/VQadPzIdPOT0Mdp/xmpvY2N1zrhN7YgdWPNQwHRE TPhPDjWorLPdMp26l3wbqtxcG5GvDdUOCTkXTalnTHAvlZEfRu5lAWqXHWIzqC5nLlx7 5U35aw4fdccPzUbvZ2gR1ybc25kVcx2dbqq+7baFXzCcfTvKgr/UK2EUdoBO9r5Oah/e LXdS7FVhiLtxsTdj/gKxBG30Ap3YE6/P6hbAabHYmp4KrmMuttzh+GSu2xHfv8HUElk0 14YqmOUVSXKLXMoW35OXr6Wac6X6QwMl5F8DOYyhadXGoiqGnLNZT1VCt28rwcqq20cw eUvA== X-Gm-Message-State: AGRZ1gIEoPu18+zggOEoNTALp+EpvskOPoPyY/BVEqZ6Urjyryjr5wDi lX2jG9OWp+w3Jh8FXSBYWp8neA== X-Received: by 2002:a0c:f7c8:: with SMTP id f8mr6000977qvo.154.1540231016322; Mon, 22 Oct 2018 10:56:56 -0700 (PDT) Received: from localhost ([107.15.81.208]) by smtp.gmail.com with ESMTPSA id 70-v6sm7352212qkf.35.2018.10.22.10.56.54 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 22 Oct 2018 10:56:55 -0700 (PDT) Date: Mon, 22 Oct 2018 13:56:54 -0400 From: Josef Bacik To: Dave Chinner Cc: Josef Bacik , kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org, riel@fb.com, linux-mm@kvack.org Subject: Re: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs Message-ID: <20181022175652.ase7u23uzizqtlao@destiny> References: <20181018202318.9131-1-josef@toxicpanda.com> <20181018202318.9131-8-josef@toxicpanda.com> <20181019034847.GM18822@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181019034847.GM18822@dastard> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 19, 2018 at 02:48:47PM +1100, Dave Chinner wrote: > On Thu, Oct 18, 2018 at 04:23:18PM -0400, Josef Bacik wrote: > > ->page_mkwrite is extremely expensive in btrfs. We have to reserve > > space, which can take 6 lifetimes, and we could possibly have to wait on > > writeback on the page, another several lifetimes. To avoid this simply > > drop the mmap_sem if we didn't have the cached page and do all of our > > work and return the appropriate retry error. If we have the cached page > > we know we did all the right things to set this page up and we can just > > carry on. > > > > Signed-off-by: Josef Bacik > > --- > > fs/btrfs/inode.c | 41 +++++++++++++++++++++++++++++++++++++++-- > > include/linux/mm.h | 14 ++++++++++++++ > > mm/filemap.c | 3 ++- > > 3 files changed, 55 insertions(+), 3 deletions(-) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 3ea5339603cf..6b723d29bc0c 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -8809,7 +8809,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, > > vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) > > { > > struct page *page = vmf->page; > > - struct inode *inode = file_inode(vmf->vma->vm_file); > > + struct file *file = vmf->vma->vm_file, *fpin; > > + struct mm_struct *mm = vmf->vma->vm_mm; > > + struct inode *inode = file_inode(file); > > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > > struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; > > struct btrfs_ordered_extent *ordered; > > @@ -8828,6 +8830,29 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) > > > > reserved_space = PAGE_SIZE; > > > > + /* > > + * We have our cached page from a previous mkwrite, check it to make > > + * sure it's still dirty and our file size matches when we ran mkwrite > > + * the last time. If everything is OK then return VM_FAULT_LOCKED, > > + * otherwise do the mkwrite again. > > + */ > > + if (vmf->flags & FAULT_FLAG_USED_CACHED) { > > + lock_page(page); > > + if (vmf->cached_size == i_size_read(inode) && > > + PageDirty(page)) > > + return VM_FAULT_LOCKED; > > + unlock_page(page); > > + } > > What does the file size have to do with whether we can use the > initialised page or not? The file can be extended by other > data operations (like write()) while a page fault is in progress, > so I'm not sure how or why this check makes any sense. > > I also don't see anything btrfs specific here, so.... > First thanks for the review, I've gone through and addressed everything you mentioned, however this one is subtle. The problem is the vmf->vma->vm_file access. Once we drop the mmap_sem we can no longer safely go into vmf->vma, so I'd have to fix all the page_mkwrite()'s to not touch vma, and add a vmf->fpin instead to mess with. Plus I didn't want to miss some subtlety in other fs's page_mkwrite()'s and inavertedly break them. If I break btrfs I can fix it, I'm not as good with xfs. If you want this in the generic layer and not in the fs I can go back and add a vmf->fpin and vmf->mm, and then fix up all the mkwrite()'s to use that instead of vmf->vma. I think that's the only things we care about so it wouldn't be hard. Does that sound reasonable to you? Also the size thing was just a paranoid way to make sure everything had stayed exactly the same, but you're right, I'll just keep it consistent with all of our other "is this page ok" checks. Thanks, Josef