From: Luis Henriques
To: Gregory Farnum
Cc: Ilya Dryomov, Jeff Layton, Sage Weil, ceph-devel, linux-kernel
Subject: Re: [PATCH v2] ceph: allow object copies across different filesystems in the same cluster
References: <87k1ahojri.fsf@suse.com> <20190909102834.16246-1-lhenriques@suse.com> <3f838e42a50575595c7310386cf698aca8f89607.camel@kernel.org> <87d0g9oh4r.fsf@suse.com>
Date: Tue, 10 Sep 2019 11:45:41 +0100
In-Reply-To: (Gregory Farnum's message of "Mon, 9 Sep 2019 15:22:10 -0700")
Message-ID: <871rwoo2ei.fsf@suse.com>

Gregory Farnum writes:

> On Mon, Sep 9, 2019 at 4:15 AM Luis Henriques wrote:
>>
>> "Jeff Layton" writes:
>>
>> > On Mon, 2019-09-09 at 11:28 +0100, Luis Henriques wrote:
>> >> OSDs are able to perform object copies across different pools. Thus,
>> >> there's no need to prevent copy_file_range from doing remote copies if the
>> >> source and destination superblocks are different. Only return -EXDEV if
>> >> they have different fsid (the cluster ID).
>> >>
>> >> Signed-off-by: Luis Henriques
>> >> ---
>> >>  fs/ceph/file.c | 18 ++++++++++++++----
>> >>  1 file changed, 14 insertions(+), 4 deletions(-)
>> >>
>> >> Hi,
>> >>
>> >> Here's the patch changelog since the initial submission:
>> >>
>> >> - Dropped have_fsid checks on client structs
>> >> - Use %pU to print the fsid instead of raw hex strings (%*ph)
>> >> - Fixed 'To:' field in email so that this time the patch hits vger
>> >>
>> >> Cheers,
>> >> --
>> >> Luis
>> >>
>> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >> index 685a03cc4b77..4a624a1dd0bb 100644
>> >> --- a/fs/ceph/file.c
>> >> +++ b/fs/ceph/file.c
>> >> @@ -1904,6 +1904,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>> >>  	struct ceph_inode_info *src_ci = ceph_inode(src_inode);
>> >>  	struct ceph_inode_info *dst_ci = ceph_inode(dst_inode);
>> >>  	struct ceph_cap_flush *prealloc_cf;
>> >> +	struct ceph_fs_client *src_fsc = ceph_inode_to_client(src_inode);
>> >>  	struct ceph_object_locator src_oloc, dst_oloc;
>> >>  	struct ceph_object_id src_oid, dst_oid;
>> >>  	loff_t endoff = 0, size;
>> >> @@ -1915,8 +1916,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>> >>
>> >>  	if (src_inode == dst_inode)
>> >>  		return -EINVAL;
>> >> -	if (src_inode->i_sb != dst_inode->i_sb)
>> >> -		return -EXDEV;
>> >> +	if (src_inode->i_sb != dst_inode->i_sb) {
>> >> +		struct ceph_fs_client *dst_fsc = ceph_inode_to_client(dst_inode);
>> >> +
>> >> +		if (ceph_fsid_compare(&src_fsc->client->fsid,
>> >> +				      &dst_fsc->client->fsid)) {
>> >> +			dout("Copying object across different clusters:");
>> >> +			dout(" src fsid: %pU dst fsid: %pU\n",
>> >> +			     &src_fsc->client->fsid, &dst_fsc->client->fsid);
>> >> +			return -EXDEV;
>> >> +		}
>> >> +	}
>> >
>> > Just to be clear: what happens here if I mount two entirely separate
>> > clusters, and their OSDs don't have any access to one another? Will this
>> > fail at some later point with an error that we can catch so that we can
>> > fall back?
>>
>> This is exactly what this check prevents: if we have two CephFS filesystems
>> from two unrelated clusters mounted and we try to copy a file across them,
>> the operation will fail with -EXDEV[1] because the FSIDs for these two
>> ceph_fs_client will be different.  OTOH, if these two filesystems are
>> within the same cluster (and thus with the same FSID), then the OSDs are
>> able to do 'copy-from' operations between them.
>>
>> I've tested all these scenarios and they seem to be handled correctly.
>> Now, I'm assuming that *all* OSDs within the same ceph cluster can
>> communicate between themselves; if this assumption is false, then this
>> patch is broken.  But again, I'm not aware of any mechanism that
>> prevents 2 OSDs from communicating between them.
>
> Your assumption is correct: all OSDs in a Ceph cluster can communicate
> with each other. I'm not aware of any plans to change this.
>
> I spent a bit of time trying to figure out how this could break
> security models and things and didn't come up with anything, so I
> think functionally it's fine even though I find it a bit scary.
>
> Also, yes, cluster FSIDs are UUIDs so they shouldn't collide.

Awesome, thanks for clarifying these points!
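
As a side note, for anyone who wants to try the scenarios above from
userspace, something along these lines should be enough to exercise the
code path.  It's only a sketch: the mount paths and file names are made
up, error handling is minimal, and I haven't polished it at all.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	/* Example paths only: two CephFS mounts, possibly from different
	 * clusters.  With this patch the copy can be offloaded to the OSDs
	 * when both mounts share the same fsid; otherwise the filesystem
	 * returns -EXDEV internally, the kernel falls back to a regular
	 * read/write copy, and the syscall still succeeds. */
	int src = open("/mnt/cephfs-a/src.bin", O_RDONLY);
	int dst = open("/mnt/cephfs-b/dst.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	struct stat st;
	ssize_t ret;

	if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
		perror("setup");
		return 1;
	}

	/* A real tool would loop here, since the call may copy fewer bytes
	 * than requested. */
	ret = copy_file_range(src, NULL, dst, NULL, st.st_size, 0);
	if (ret < 0)
		perror("copy_file_range");
	else
		printf("copied %zd bytes\n", ret);

	close(src);
	close(dst);
	return ret < 0;
}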
Cheers,
--
Luis

> -Greg
>
>>
>> [1] Actually, the files will still be copied because we'll fall back to
>> the default VFS generic_copy_file_range behaviour, which is to do
>> read+write operations.
>>
>> Cheers,
>> --
>> Luis
>>
>>
>> >
>> >
>> >>  	if (ceph_snap(dst_inode) != CEPH_NOSNAP)
>> >>  		return -EROFS;
>> >>
>> >> @@ -1928,7 +1938,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>> >>  	 * efficient).
>> >>  	 */
>> >>
>> >> -	if (ceph_test_mount_opt(ceph_inode_to_client(src_inode), NOCOPYFROM))
>> >> +	if (ceph_test_mount_opt(src_fsc, NOCOPYFROM))
>> >>  		return -EOPNOTSUPP;
>> >>
>> >>  	if ((src_ci->i_layout.stripe_unit != dst_ci->i_layout.stripe_unit) ||
>> >> @@ -2044,7 +2054,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>> >>  		     dst_ci->i_vino.ino, dst_objnum);
>> >>  	/* Do an object remote copy */
>> >>  	err = ceph_osdc_copy_from(
>> >> -		&ceph_inode_to_client(src_inode)->client->osdc,
>> >> +		&src_fsc->client->osdc,
>> >>  		src_ci->i_vino.snap, 0,
>> >>  		&src_oid, &src_oloc,
>> >>  		CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |
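
P.S. To expand a bit on footnote [1] above: the fallback isn't something
userspace ever has to deal with.  From memory, the ceph wrapper around
__ceph_copy_file_range does roughly the following -- this is a sketch only,
the exact code in fs/ceph/file.c may differ slightly:

static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off,
				    struct file *dst_file, loff_t dst_off,
				    size_t len, unsigned int flags)
{
	ssize_t ret;

	/* Try the OSD 'copy-from' based implementation first. */
	ret = __ceph_copy_file_range(src_file, src_off, dst_file, dst_off,
				     len, flags);

	/* Remote copies were refused (different cluster, NOCOPYFROM, ...):
	 * fall back to the generic read/write copy so the syscall still
	 * succeeds. */
	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = generic_copy_file_range(src_file, src_off, dst_file,
					      dst_off, len, flags);
	return ret;
}

So the -EXDEV returned by the fsid check never reaches the caller.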