From: Gregory Farnum
Date: Mon, 9 Sep 2019 15:22:10 -0700
Subject: Re: [PATCH v2] ceph: allow object copies across different filesystems in the same cluster
To: Luis Henriques
Cc: Jeff Layton, Ilya Dryomov, Sage Weil, ceph-devel, linux-kernel
In-Reply-To: <87d0g9oh4r.fsf@suse.com>
References: <87k1ahojri.fsf@suse.com> <20190909102834.16246-1-lhenriques@suse.com> <3f838e42a50575595c7310386cf698aca8f89607.camel@kernel.org> <87d0g9oh4r.fsf@suse.com>

On Mon, Sep 9, 2019 at 4:15 AM Luis Henriques wrote:
>
> "Jeff Layton" writes:
>
> > On Mon, 2019-09-09 at 11:28 +0100, Luis Henriques wrote:
> >> OSDs are able to perform object copies across different pools. Thus,
> >> there's no need to prevent copy_file_range from doing remote copies if the
> >> source and destination superblocks are different. Only return -EXDEV if
> >> they have different fsids (the cluster ID).
> >>
> >> Signed-off-by: Luis Henriques
> >> ---
> >>  fs/ceph/file.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> Hi,
> >>
> >> Here's the patch changelog since the initial submission:
> >>
> >> - Dropped have_fsid checks on client structs
> >> - Use %pU to print the fsid instead of raw hex strings (%*ph)
> >> - Fixed 'To:' field in email so that this time the patch hits vger
> >>
> >> Cheers,
> >> --
> >> Luis
> >>
> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> index 685a03cc4b77..4a624a1dd0bb 100644
> >> --- a/fs/ceph/file.c
> >> +++ b/fs/ceph/file.c
> >> @@ -1904,6 +1904,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >>  	struct ceph_inode_info *src_ci = ceph_inode(src_inode);
> >>  	struct ceph_inode_info *dst_ci = ceph_inode(dst_inode);
> >>  	struct ceph_cap_flush *prealloc_cf;
> >> +	struct ceph_fs_client *src_fsc = ceph_inode_to_client(src_inode);
> >>  	struct ceph_object_locator src_oloc, dst_oloc;
> >>  	struct ceph_object_id src_oid, dst_oid;
> >>  	loff_t endoff = 0, size;
> >> @@ -1915,8 +1916,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >>
> >>  	if (src_inode == dst_inode)
> >>  		return -EINVAL;
> >> -	if (src_inode->i_sb != dst_inode->i_sb)
> >> -		return -EXDEV;
> >> +	if (src_inode->i_sb != dst_inode->i_sb) {
> >> +		struct ceph_fs_client *dst_fsc = ceph_inode_to_client(dst_inode);
> >> +
> >> +		if (ceph_fsid_compare(&src_fsc->client->fsid,
> >> +				      &dst_fsc->client->fsid)) {
> >> +			dout("Copying object across different clusters:");
> >> +			dout(" src fsid: %pU dst fsid: %pU\n",
> >> +			     &src_fsc->client->fsid, &dst_fsc->client->fsid);
> >> +			return -EXDEV;
> >> +		}
> >> +	}
> >
> > Just to be clear: what happens here if I mount two entirely separate
> > clusters, and their OSDs don't have any access to one another? Will this
> > fail at some later point with an error that we can catch so that we can
> > fall back?
>
> This is exactly what this check prevents: if we have two CephFS from two
> unrelated clusters mounted and we try to copy a file across them, the
> operation will fail with -EXDEV[1] because the FSIDs for these two
> ceph_fs_client will be different. OTOH, if these two filesystems are
> within the same cluster (and thus have the same FSID), then the OSDs are
> able to do 'copy-from' operations between them.
>
> I've tested all these scenarios and they seem to be handled correctly.
> Now, I'm assuming that *all* OSDs within the same ceph cluster can
> communicate between themselves; if this assumption is false, then this
> patch is broken. But again, I'm not aware of any mechanism that
> prevents two OSDs from communicating with each other.

Your assumption is correct: all OSDs in a Ceph cluster can communicate
with each other, and I'm not aware of any plans to change this.

I spent a bit of time trying to work out how this could break security
models and didn't come up with anything, so I think it's functionally
fine, even though I find it a bit scary. Also, yes, cluster FSIDs are
UUIDs, so they shouldn't collide.
-Greg

>
> [1] Actually, the files will still be copied, because we'll fall back to
> the default VFS generic_copy_file_range behaviour, which is to do
> read+write operations.
>
> Cheers,
> --
> Luis
>
>
>
>
>
> >>  	if (ceph_snap(dst_inode) != CEPH_NOSNAP)
> >>  		return -EROFS;
> >>
> >> @@ -1928,7 +1938,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >>  	 * efficient).
> >>  	 */
> >>
> >> -	if (ceph_test_mount_opt(ceph_inode_to_client(src_inode), NOCOPYFROM))
> >> +	if (ceph_test_mount_opt(src_fsc, NOCOPYFROM))
> >>  		return -EOPNOTSUPP;
> >>
> >>  	if ((src_ci->i_layout.stripe_unit != dst_ci->i_layout.stripe_unit) ||
> >> @@ -2044,7 +2054,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> >>  	     dst_ci->i_vino.ino, dst_objnum);
> >>  	/* Do an object remote copy */
> >>  	err = ceph_osdc_copy_from(
> >> -			&ceph_inode_to_client(src_inode)->client->osdc,
> >> +			&src_fsc->client->osdc,
> >>  			src_ci->i_vino.snap, 0,
> >>  			&src_oid, &src_oloc,
> >>  			CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |
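
For readers following the fsid discussion above: the patch decides whether the
two mounts point at the same cluster by comparing their cluster UUIDs. As a
point of reference, ceph_fsid_compare() amounts to a bytewise comparison of the
two 16-byte fsids; the sketch below paraphrases that helper and is not quoted
verbatim from the kernel tree.

#include <linux/string.h>	/* memcmp */

/* The fsid is the cluster's 16-byte UUID. */
struct ceph_fsid {
	unsigned char fsid[16];
};

static inline int ceph_fsid_compare(const struct ceph_fsid *a,
				    const struct ceph_fsid *b)
{
	/* 0 when both mounts talk to the same cluster, non-zero otherwise */
	return memcmp(a, b, sizeof(*a));
}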
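
On Luis's footnote [1]: the -EXDEV returned by __ceph_copy_file_range() is
caught by the ceph copy_file_range entry point, which then falls back to the
generic VFS read+write implementation, so userspace still sees the copy
succeed. A minimal sketch of that wrapper, assuming the 5.3-era layout of
fs/ceph/file.c (an approximation, not the exact code):

static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off,
				    struct file *dst_file, loff_t dst_off,
				    size_t len, unsigned int flags)
{
	ssize_t ret;

	/* Try the OSD-side 'copy-from' path first. */
	ret = __ceph_copy_file_range(src_file, src_off, dst_file, dst_off,
				     len, flags);

	/* Different cluster, or copy-from not usable: fall back to plain
	 * reads and writes through the generic VFS helper. */
	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = generic_copy_file_range(src_file, src_off, dst_file,
					      dst_off, len, flags);
	return ret;
}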