From: Luis Henriques
To: Jeff Layton, Ilya Dryomov, Sage Weil, "Yan, Zheng"
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, Luis Henriques
Subject: [RFC PATCH 1/2] ceph: add support for sending truncate_{seq,size} in 'copy-from' Op
Date: Fri, 8 Nov 2019 14:15:54 +0000
Message-Id: <20191108141555.31176-2-lhenriques@suse.com>
In-Reply-To: <20191108141555.31176-1-lhenriques@suse.com>
References: <20191108141555.31176-1-lhenriques@suse.com>

By default, an object copy in Ceph copies not only the data but also the
truncate_seq and truncate_size values.
This may make sense for generic RADOS object copies, but for the specific
case of copying a file it results in data corruption in the destination
file.  To fix this, the 'copy-from' operation has been modified so that it
can receive two extra parameters: the destination object's truncate_seq
and truncate_size.  This patch adds support for these extra parameters to
the kernel client.  Unfortunately, the modified operation is only
available starting with Ceph Octopus, so it is also necessary to ensure
that the OSD performing the copy actually supports it.

Link: https://tracker.ceph.com/issues/37378
Signed-off-by: Luis Henriques
---
 fs/ceph/file.c                     |  4 +++-
 include/linux/ceph/ceph_features.h |  6 ++++-
 include/linux/ceph/osd_client.h    |  1 +
 include/linux/ceph/rados.h         |  1 +
 net/ceph/osd_client.c              | 37 +++++++++++++++++++++++++++++-
 5 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d277f71abe0b..e21a8eaabeb1 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2075,7 +2075,9 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 				CEPH_OSD_OP_FLAG_FADVISE_NOCACHE,
 				&dst_oid, &dst_oloc,
 				CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |
-				CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
+				CEPH_OSD_OP_FLAG_FADVISE_DONTNEED,
+				dst_ci->i_truncate_seq, dst_ci->i_truncate_size,
+				CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ);
 		if (err) {
 			dout("ceph_osdc_copy_from returned %d\n", err);
 			if (!ret)
diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h
index 39e6f4c57580..232257f6b60c 100644
--- a/include/linux/ceph/ceph_features.h
+++ b/include/linux/ceph/ceph_features.h
@@ -9,6 +9,7 @@
  */
 #define CEPH_FEATURE_INCARNATION_1 (0ull)
 #define CEPH_FEATURE_INCARNATION_2 (1ull<<57) // CEPH_FEATURE_SERVER_JEWEL
+#define CEPH_FEATURE_INCARNATION_3 ((1ull<<57)|(1ull<<28)) // SERVER_MIMIC
 
 #define DEFINE_CEPH_FEATURE(bit, incarnation, name) \
 	static const uint64_t CEPH_FEATURE_##name = (1ULL<
[...]
>client->monc);
 }
 
+/*
+ * Check, for each OSD operation in the request, if the features it
+ * requires are supported by the connection.
+ */
+static bool check_con_features(struct ceph_connection *con,
+			       struct ceph_osd_request *req)
+{
+	int i;
+
+	for (i = 0; i < req->r_num_ops; i++) {
+		switch (req->r_ops[i].op) {
+		case CEPH_OSD_OP_COPY_FROM:
+			/*
+			 * The 'copy-from' implementation in OSDs before the
+			 * Octopus release had a bug where file data would get
+			 * corrupted when the file was truncated.
+			 */
+			if (!CEPH_HAVE_FEATURE(con->peer_features,
+					       SERVER_OCTOPUS))
+				return false;
+			break;
+		}
+	}
+	return true;
+}
+
 static void complete_request(struct ceph_osd_request *req, int err);
 static void send_map_check(struct ceph_osd_request *req);
 
@@ -2336,6 +2362,10 @@ static void __submit_request(struct ceph_osd_request *req, bool wrlocked)
 	}
 
 	mutex_lock(&osd->lock);
+	if (!check_con_features(&osd->o_con, req)) {
+		err = -EOPNOTSUPP;
+		need_send = false;
+	}
 	/*
 	 * Assign the tid atomically with send_request() to protect
 	 * multiple writes to the same object from racing with each
@@ -5315,6 +5345,7 @@ static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
 				     struct ceph_object_locator *src_oloc,
 				     u32 src_fadvise_flags,
 				     u32 dst_fadvise_flags,
+				     u32 truncate_seq, u64 truncate_size,
 				     u8 copy_from_flags)
 {
 	struct ceph_osd_req_op *op;
@@ -5335,6 +5366,8 @@ static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
 	end = p + PAGE_SIZE;
 	ceph_encode_string(&p, end, src_oid->name, src_oid->name_len);
 	encode_oloc(&p, end, src_oloc);
+	ceph_encode_32(&p, truncate_seq);
+	ceph_encode_64(&p, truncate_size);
 	op->indata_len = PAGE_SIZE - (end - p);
 
 	ceph_osd_data_pages_init(&op->copy_from.osd_data, pages,
@@ -5350,6 +5383,7 @@ int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
 			struct ceph_object_id *dst_oid,
 			struct ceph_object_locator *dst_oloc,
 			u32 dst_fadvise_flags,
+			u32 truncate_seq, u64 truncate_size,
 			u8 copy_from_flags)
 {
 	struct ceph_osd_request *req;
@@ -5366,7 +5400,8 @@ int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
 
 	ret = osd_req_op_copy_from_init(req, src_snapid, src_version,
 					src_oid, src_oloc,
 					src_fadvise_flags,
-					dst_fadvise_flags, copy_from_flags);
+					dst_fadvise_flags, truncate_seq,
+					truncate_size, copy_from_flags);
 	if (ret)
 		goto out;
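
As a usage illustration (not part of the patch): the code path changed in
fs/ceph/file.c backs the copy_file_range(2) syscall on CephFS.  A minimal
userspace sketch that exercises it is below; the /mnt/cephfs paths are only
placeholders for an assumed CephFS mount:

/*
 * Illustrative only.  Copies one file to another on an assumed CephFS
 * mount via copy_file_range(2), which CephFS may serve with the OSD
 * 'copy-from' operation touched by this patch.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical paths on a CephFS mount. */
	int src = open("/mnt/cephfs/src", O_RDONLY);
	int dst = open("/mnt/cephfs/dst", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	struct stat st;
	off_t off_in = 0, off_out = 0;

	if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
		perror("setup");
		return 1;
	}

	/* Keep copying until the whole source length has been transferred. */
	while (off_in < st.st_size) {
		ssize_t n = copy_file_range(src, &off_in, dst, &off_out,
					    st.st_size - off_in, 0);
		if (n <= 0) {
			perror("copy_file_range");
			return 1;
		}
	}

	close(src);
	close(dst);
	return 0;
}

With an OSD that predates Octopus, the check_con_features() check added
above makes such a request fail with -EOPNOTSUPP instead of being sent.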