Date: Mon, 2 Dec 2019 09:09:26 -0800
From: "Darrick J. Wong"
To: Dave Chinner
Cc: Trond Myklebust, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: Question about clone_range() metadata stability
Message-ID: <20191202170926.GA7323@magnolia>
In-Reply-To: <20191201210519.GB2418@dread.disaster.area>

On Mon, Dec 02, 2019 at 08:05:19AM +1100, Dave Chinner wrote:
> On Wed, Nov 27, 2019 at 12:21:36PM -0800, Darrick J. Wong wrote:
> > On Wed, Nov 27, 2019 at 06:38:46PM +0000, Trond Myklebust wrote:
> > > Hi all
> > >
> > > A quick question about clone_range() and guarantees around metadata
> > > stability.
> > >
> > > Are users required to call fsync/fsync_range() after calling
> > > clone_range() in order to guarantee that the cloned range metadata is
> > > persisted?
> >
> > Yes.
> >
> > > I'm assuming that it is required in order to guarantee that
> > > data is persisted.
> >
> > Data and metadata. XFS and ocfs2's reflink implementations will flush
> > the page cache before starting the remap, but they both require fsync to
> > force the log/journal to disk.
>
> So we need to call xfs_fs_nfs_commit_metadata() to get that done
> post vfs_clone_file_range() completion on the server side, yes?

That sounds like a much better/less hastily researched answer! :)

>
> > (AFAICT the same reasoning applies to btrfs, but don't trust my word for
> > it.)
> >
> > > I'm asking because knfsd currently just does a call to
> > > vfs_clone_file_range() when parsing a NFSv4.2 CLONE operation. It does
> > > not call fsync()/fsync_range() on the destination file, and since the
> > > NFSv4.2 protocol does not require you to perform any other operation in
> > > order to persist data/metadata, I'm worried that we may be corrupting
> > > the cloned file if the NFS server crashes at the wrong moment after the
> > > client has been told the clone completed.
>
> Yup, that's exactly what server side calls to commit_metadata() are
> supposed to address.
>
> I suspect to be correct, this might require commit_metadata() to be
> called on both the source and destination inodes, as both of them
> may have modified metadata as a result of the clone operation. For
For > XFS one of them will be a no-op, Hmm. If xfs had to set its reflink flag on the source inode then we want to ->commit_metadata the source inode to push the log forward far enough to record the metadata change. That said, we set the reflink flag on both inodes before we remap anything, so chances are that ->commit_metadata on the dest inode will be enough to push the log forward. I suspect that from NFS' point of view it probably ought to ->commit_metadata both inodes to insulate itself from fs-specific behaviors and avoid weird crash dataloss bugs. Someday, someone will design a filesystem with per-inode logs /and/ hook it up to NFS. > but for other filesystems that > don't implement ->commit_metadata, we'll need to call > sync_inode_metadata() on both inodes... --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com