From: Vivek Goyal
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, sweil@redhat.com, swhiteho@redhat.com
Subject: [PATCH 33/52] fuse, dax: Take ->i_mmap_sem lock during dax page fault
Date: Mon, 10 Dec 2018 12:12:59 -0500
Message-Id: <20181210171318.16998-34-vgoyal@redhat.com>
In-Reply-To: <20181210171318.16998-1-vgoyal@redhat.com>
References: <20181210171318.16998-1-vgoyal@redhat.com>

We need some kind of locking here. Normal file systems like ext4 and xfs
take their own semaphore to protect against truncate while a page fault
is in progress.

fuse dax has an additional requirement: protect against fuse dax memory
range reclaim. When a range has been selected for reclaim, no other
read/write/fault may access that memory range while reclaim is in
progress. Once reclaim is complete, the lock is released and a
subsequent read/write/fault will trigger allocation of a fresh dax
range.

Taking inode_lock() in the fault path is not an option, as lockdep
complains about circular dependencies. So define a new per-inode
rw_semaphore, fuse_inode->i_mmap_sem: the fault path takes it for
reading, while truncate, punch hole and range reclaim take it for
writing.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dir.c    |  2 ++
 fs/fuse/file.c   | 17 +++++++++++++----
 fs/fuse/fuse_i.h |  7 +++++++
 fs/fuse/inode.c  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index b7e6e421f6bb..8aa4ff82ea7a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1553,8 +1553,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
	 */
	if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) &&
	    oldsize != outarg.attr.size) {
+		down_write(&fi->i_mmap_sem);
		truncate_pagecache(inode, outarg.attr.size);
		invalidate_inode_pages2(inode->i_mapping);
+		up_write(&fi->i_mmap_sem);
	}

	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index eb12776f5ff6..73068289f62e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2523,13 +2523,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
	if (write)
		sb_start_pagefault(sb);

-	/* TODO inode semaphore to protect faults vs truncate */
-
+	/*
+	 * We need to serialize against not only truncate but also against
+	 * fuse dax memory range reclaim. While a range is being reclaimed,
+	 * we do not want any read/write/mmap to make progress and try
+	 * to populate page cache or access memory we are trying to free.
+	 */
+	down_read(&get_fuse_inode(inode)->i_mmap_sem);
	ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops);

	if (ret & VM_FAULT_NEEDDSYNC)
		ret = dax_finish_sync_fault(vmf, pe_size, pfn);

+	up_read(&get_fuse_inode(inode)->i_mmap_sem);
+
	if (write)
		sb_end_pagefault(sb);

@@ -3476,9 +3483,11 @@ static long __fuse_file_fallocate(struct file *file, int mode,
		file_update_time(file);
	}

-	if (mode & FALLOC_FL_PUNCH_HOLE)
+	if (mode & FALLOC_FL_PUNCH_HOLE) {
+		down_write(&fi->i_mmap_sem);
		truncate_pagecache_range(inode, offset, offset + length - 1);
-
+		up_write(&fi->i_mmap_sem);
+	}
	fuse_invalidate_attr(inode);

out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e32b0059493b..280f717deb57 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -211,6 +211,13 @@ struct fuse_inode {
	 */
	struct rw_semaphore i_dmap_sem;

+	/**
+	 * Can't take inode lock in fault path (leads to circular dependency).
+	 * So take this in fuse dax fault path to make sure truncate and
+	 * punch hole etc. can't make progress in parallel.
+	 */
+	struct rw_semaphore i_mmap_sem;
+
	/** Sorted rb tree of struct fuse_dax_mapping elements */
	struct rb_root_cached dmap_tree;
	unsigned long nr_dmaps;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 234b9c0c80ab..59fc5a7a18fc 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
	fi->state = 0;
	fi->nr_dmaps = 0;
	mutex_init(&fi->mutex);
+	init_rwsem(&fi->i_mmap_sem);
	init_rwsem(&fi->i_dmap_sem);
	fi->forget = fuse_alloc_forget();
	if (!fi->forget) {
-- 
2.13.6
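
For reference, below is a minimal userspace sketch of the locking pattern this patch introduces, not fuse code: all names (demo_inode, demo_dax_fault, demo_truncate) are invented for the illustration. Fault paths take the lock shared; truncate, punch hole and dax range reclaim take it exclusive.

/*
 * Illustrative userspace model of the i_mmap_sem pattern. Hypothetical
 * names; not part of the fuse patch itself.
 *
 * Build: gcc -pthread demo.c -o demo
 */
#include <pthread.h>
#include <stdio.h>

struct demo_inode {
	pthread_rwlock_t i_mmap_sem;	/* stands in for fuse_inode->i_mmap_sem */
	long size;
};

/*
 * Models the fault side: shared (read) lock, so many faults may run
 * concurrently, but never while truncate or range reclaim holds the
 * lock exclusively.
 */
static void demo_dax_fault(struct demo_inode *inode, long offset)
{
	pthread_rwlock_rdlock(&inode->i_mmap_sem);
	if (offset < inode->size)
		printf("fault at %ld: range still mapped, serviced\n", offset);
	else
		printf("fault at %ld: beyond EOF, not serviced\n", offset);
	pthread_rwlock_unlock(&inode->i_mmap_sem);
}

/*
 * Models the truncate/punch-hole/reclaim side: exclusive (write) lock,
 * so no fault can populate a mapping while ranges are being dropped.
 */
static void demo_truncate(struct demo_inode *inode, long new_size)
{
	pthread_rwlock_wrlock(&inode->i_mmap_sem);
	inode->size = new_size;		/* page cache truncation would go here */
	pthread_rwlock_unlock(&inode->i_mmap_sem);
}

int main(void)
{
	struct demo_inode inode = { .size = 4096 };

	pthread_rwlock_init(&inode.i_mmap_sem, NULL);
	demo_dax_fault(&inode, 1024);	/* runs shared with other faults */
	demo_truncate(&inode, 0);	/* excludes all faults while it runs */
	demo_dax_fault(&inode, 1024);	/* sees the truncated size */
	pthread_rwlock_destroy(&inode.i_mmap_sem);
	return 0;
}

Using a reader/writer lock rather than a plain mutex is what lets multiple faults proceed concurrently while still fully excluding truncate and reclaim.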