Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp3139493pxj; Mon, 10 May 2021 20:10:44 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxEpd2p4GUZhHLrdyTGZLxxhWiKamPLPouWhUV68OwT3xc8v8b8fVa5iM6WWVECSxidLbef X-Received: by 2002:a05:6e02:118b:: with SMTP id y11mr24247373ili.163.1620702644541; Mon, 10 May 2021 20:10:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620702644; cv=none; d=google.com; s=arc-20160816; b=urId54+DYtGete1VrYWTGHxePIkVlbjtCb2oxnSpWNTqmZhPZRZia1V0l71uNgpGjL vhj7gJcfbsIop+9qvcwcIlDc+A0JIXqkTSdyXWRUqcY4wZ96JtTxvYuR74qqnbeYUsOg s789AkB7A6VjQC4qUX0KMiZGiZ0FlqS1ruLzQmXDWXU0M5O+UcB/qVUYci3x3BjDjH/3 F89RC5lqhgkhA0KWfXYsHYsSH5AC5gH1kjWDIHQrJD+DhuPg3oD8FCBkgF1PNumGKF6n zmWs6Up76IA1Yjnwj/U9lEm/HD4ulaQSU1uP/xMDcfER6HRpgLuUjsQ0OTK5qlFYUhL1 xSBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :ironport-hdrordr; bh=I5Me7px3DteeMDfuZ42heuN4IIjE6JBwtOKIfiwrKso=; b=cnfftA9xMs36oDs8TOfjyJlqUXp9KISZesvk4M8GlNnjlHMNTtRdWYyRTsnzSzDsxu DHOv+Yh8ePNKOugpF2POd6e9hkdFHcuePnZgec+zt/ZCNKtKC7K7Ibx0IvUCWajnLxNK Wb3pUoKteSy3eXa+ZGQezIwfliVsxC5pKaT/J2PC5hdNwTQ+OP00w2WjH2KRI4vLNCo0 xyZDYaN5tJ8rd7QUg8HGk3r3E248B7m12BN6eILgyQcn+WJV2T+zFmOGSxtAOygIqmlr t4l0ZOPvp1jz1xm2ATV367agaYMlzpFsmOutLPF81KAGnP56GgCLR85Iuggfsjk3QqjA 9Ojg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q16si18899851iow.52.2021.05.10.20.10.32; Mon, 10 May 2021 20:10:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230118AbhEKDLB (ORCPT + 99 others); Mon, 10 May 2021 23:11:01 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:41042 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S230073AbhEKDLA (ORCPT ); Mon, 10 May 2021 23:11:00 -0400 IronPort-HdrOrdr: =?us-ascii?q?A9a23=3Aoe65pq4rK8lAY93lRwPXwCDXdLJyesId70hD?= =?us-ascii?q?6qhwISY6TiX+rbHLoB17726StN9/YhEdcLy7VJVoIkmskKKdg7NhXotKNTOO0A?= =?us-ascii?q?DDQb2KhrGC/9SPIULDH5ZmpMVdmrZFeabNJGk/ncDn+xO5Dtpl5NGG9ZqjjeDY?= =?us-ascii?q?w2wFd3ASV4hQqxd+Fh2AElB7AC1PBZ8CHpKa4cZd4xW6f3B/VLXCOlA1G/jEu8?= =?us-ascii?q?bQlI/rJToPBxsc4gGIij+yrJ7WeiLouCsjbw=3D=3D?= X-IronPort-AV: E=Sophos;i="5.82,290,1613404800"; d="scan'208";a="108110522" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 11 May 2021 11:09:53 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 04DF94D0BA67; Tue, 11 May 2021 11:09:48 +0800 (CST) Received: from G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.80) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 11 May 2021 11:09:36 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Tue, 11 May 2021 11:09:36 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , Subject: [PATCH v5 1/7] fsdax: Introduce dax_iomap_cow_copy() Date: Tue, 11 May 2021 11:09:27 +0800 Message-ID: <20210511030933.3080921-2-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210511030933.3080921-1-ruansy.fnst@fujitsu.com> References: <20210511030933.3080921-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-yoursite-MailScanner-ID: 04DF94D0BA67.A4A19 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com X-Spam-Status: No Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org In the case where the iomap is a write operation and iomap is not equal to srcmap after iomap_begin, we consider it is a CoW operation. The destance extent which iomap indicated is new allocated extent. So, it is needed to copy the data from srcmap to new allocated extent. In theory, it is better to copy the head and tail ranges which is outside of the non-aligned area instead of copying the whole aligned range. But in dax page fault, it will always be an aligned range. So, we have to copy the whole range in this case. Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig --- fs/dax.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 81 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index bf3fc8242e6c..f0249bb1d46a 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1038,6 +1038,61 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size, return rc; } +/** + * dax_iomap_cow_copy(): Copy the data from source to destination before write. + * @pos: address to do copy from. + * @length: size of copy operation. + * @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE) + * @srcmap: iomap srcmap + * @daddr: destination address to copy to. + * + * This can be called from two places. Either during DAX write fault, to copy + * the length size data to daddr. Or, while doing normal DAX write operation, + * dax_iomap_actor() might call this to do the copy of either start or end + * unaligned address. In this case the rest of the copy of aligned ranges is + * taken care by dax_iomap_actor() itself. + * Also, note DAX fault will always result in aligned pos and pos + length. + */ +static int dax_iomap_cow_copy(loff_t pos, loff_t length, size_t align_size, + struct iomap *srcmap, void *daddr) +{ + loff_t head_off = pos & (align_size - 1); + size_t size = ALIGN(head_off + length, align_size); + loff_t end = pos + length; + loff_t pg_end = round_up(end, align_size); + bool copy_all = head_off == 0 && end == pg_end; + void *saddr = 0; + int ret = 0; + + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); + if (ret) + return ret; + + if (copy_all) { + ret = copy_mc_to_kernel(daddr, saddr, length); + return ret ? -EIO : 0; + } + + /* Copy the head part of the range. Note: we pass offset as length. */ + if (head_off) { + ret = copy_mc_to_kernel(daddr, saddr, head_off); + if (ret) + return -EIO; + } + + /* Copy the tail part of the range */ + if (end < pg_end) { + loff_t tail_off = head_off + length; + loff_t tail_len = pg_end - end; + + ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, + tail_len); + if (ret) + return -EIO; + } + return 0; +} + /* * The user has performed a load from a hole in the file. Allocating a new * page in the file would cause excessive storage usage for workloads with @@ -1167,11 +1222,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, struct dax_device *dax_dev = iomap->dax_dev; struct iov_iter *iter = data; loff_t end = pos + length, done = 0; + bool write = iov_iter_rw(iter) == WRITE; ssize_t ret = 0; size_t xfer; int id; - if (iov_iter_rw(iter) == READ) { + if (!write) { end = min(end, i_size_read(inode)); if (pos >= end) return 0; @@ -1180,7 +1236,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return iov_iter_zero(min(length, end - pos), iter); } - if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) + /* + * In DAX mode, we allow either pure overwrites of written extents, or + * writes to unwritten extents as part of a copy-on-write operation. + */ + if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED && + !(iomap->flags & IOMAP_F_SHARED))) return -EIO; /* @@ -1219,6 +1280,13 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, break; } + if (write && srcmap->addr != iomap->addr) { + ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, + kaddr); + if (ret) + break; + } + map_len = PFN_PHYS(map_len); kaddr += offset; map_len -= offset; @@ -1230,7 +1298,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, * validated via access_ok() in either vfs_read() or * vfs_write(), depending on which operation we are doing. */ - if (iov_iter_rw(iter) == WRITE) + if (write) xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr, map_len, iter); else @@ -1382,6 +1450,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, unsigned long entry_flags = pmd ? DAX_PMD : 0; int err = 0; pfn_t pfn; + void *kaddr; /* if we are reading UNWRITTEN and HOLE, return a hole. */ if (!write && @@ -1392,18 +1461,25 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, return dax_pmd_load_hole(xas, vmf, iomap, entry); } - if (iomap->type != IOMAP_MAPPED) { + if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) { WARN_ON_ONCE(1); return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS; } - err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn); + err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn); if (err) return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err); *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags, write && !sync); + if (write && + srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) { + err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); + if (err) + return dax_fault_return(err); + } + if (sync) return dax_fault_synchronous_pfnp(pfnp, pfn); -- 2.31.1