From: Stéphane Graber
Date: Thu, 27 Jul 2023 10:46:18 -0400
Subject: Re: [PATCH v7 03/11] ceph: handle idmapped mounts in create_request_message()
To: Aleksandr Mikhalitsyn
Cc: Christian Brauner, Xiubo Li, linux-fsdevel@vger.kernel.org, Jeff Layton, Ilya Dryomov, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230726141026.307690-1-aleksandr.mikhalitsyn@canonical.com> <20230726141026.307690-4-aleksandr.mikhalitsyn@canonical.com> <6ea8bf93-b456-bda4-b39d-a43328987ac9@redhat.com> <20230727-bedeuten-endkampf-22c87edd132b@brauner>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 27, 2023 at 5:48 AM Aleksandr Mikhalitsyn wrote:
>
> On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner wrote:
> >
> > On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote:
> > > On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li wrote:
> > > >
> > > >
> > > > On 7/26/23 22:10, Alexander Mikhalitsyn wrote:
> > > > > Inode operations that create a new filesystem object such as ->mknod,
> > > > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > > > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > > > > filesystem object.
> > > > >
> > > > > In order to ensure that the correct {g,u}id is used, map the caller's
> > > > > fs{g,u}id for creation requests. This doesn't require complex changes.
> > > > > It suffices to pass in the relevant idmapping recorded in the request
> > > > > message. If this request message was triggered from an inode operation
> > > > > that creates filesystem objects it will have passed down the relevant
> > > > > idmapping. If this is a request message that was triggered from an inode
> > > > > operation that doesn't need to take idmappings into account the initial
> > > > > idmapping is passed down which is an identity mapping.
> > > > >
> > > > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > > > > which adds two new fields (owner_{u,g}id) to the request head structure.
> > > > > So, we need to ensure that MDS supports it, otherwise we need to fail
> > > > > any IO that comes through an idmapped mount because we can't process it
> > > > > in a proper way. An MDS server without such an extension will use caller_{u,g}id
> > > > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> > > > > values are unmapped. At the same time we can't map these fields with an
> > > > > idmapping as it can break UID/GID-based permission check logic on the
> > > > > MDS side. This problem was described in detail at [1], [2].
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > > > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> > > > >
> > > > > Cc: Xiubo Li
> > > > > Cc: Jeff Layton
> > > > > Cc: Ilya Dryomov
> > > > > Cc: ceph-devel@vger.kernel.org
> > > > > Co-Developed-by: Alexander Mikhalitsyn
> > > > > Signed-off-by: Christian Brauner
> > > > > Signed-off-by: Alexander Mikhalitsyn
> > > > > ---
> > > > > v7:
> > > > >       - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> > > > > ---
> > > > >  fs/ceph/mds_client.c         | 20 ++++++++++++++++++++
> > > > >  fs/ceph/mds_client.h         |  5 ++++-
> > > > >  include/linux/ceph/ceph_fs.h |  4 +++-
> > > > >  3 files changed, 27 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > index c641ab046e98..ac095a95f3d0 100644
> > > > > --- a/fs/ceph/mds_client.c
> > > > > +++ b/fs/ceph/mds_client.c
> > > > > @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > > >  {
> > > > >  	int mds = session->s_mds;
> > > > >  	struct ceph_mds_client *mdsc = session->s_mdsc;
> > > > > +	struct ceph_client *cl = mdsc->fsc->client;
> > > > >  	struct ceph_msg *msg;
> > > > >  	struct ceph_mds_request_head_legacy *lhead;
> > > > >  	const char *path1 = NULL;
> > > > > @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > > >  	lhead = find_legacy_request_head(msg->front.iov_base,
> > > > >  					 session->s_con.peer_features);
> > > > >
> > > > > +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > > > > +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > > > > +		pr_err_ratelimited_client(cl,
> > > > > +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > > > > +			" is not supported by MDS. Fail request with -EIO.\n");
> > > > > +
> > > > > +		ret = -EIO;
> > > > > +		goto out_err;
> > > > > +	}
> > > > > +
> > > > >
> > > > I think this couldn't fail the mounting operation, right?
> > >
> > > This won't fail mounting. First of all, an idmapped mount is always an
> > > additional mount: you always
> > > start by doing a "normal" mount, and only after that can you use this
> > > mount to create an idmapped one.
> > > ( example: https://github.com/brauner/mount-idmapped/tree/master )
> > >
> > > >
> > > > IMO we should fail the mounting from the beginning.
> > >
> > > Unfortunately, we can't fail the mount from the beginning. The procedure
> > > for creating idmapped mounts
> > > is handled not at the filesystem level, but at the VFS level
> >
> > Correct. It's a generic vfsmount feature.
> >
> > > (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277)
> > >
> > > The kernel performs all the required checks, such as:
> > > - the filesystem type has declared that it supports idmappings
> > > (fs_type->fs_flags & FS_ALLOW_IDMAP)
> > > - the user who creates the idmapped mount must have CAP_SYS_ADMIN in the user
> > > namespace that owns the superblock of the filesystem
> > > (for cephfs it's always init_user_ns => user should be root on the host)
> > >
> > > So I would like to go this way because of the reasons mentioned above:
> > > - the root user is someone who understands what he does.
> > > - idmapped mounts are never "first" mounts. They are always created
> > > after a "normal" mount.
> > > - effectively this check lets the "normal" mount work normally and
> > > fails only requests that come through an idmapped mount,
> > > with a reasonable error message. Obviously, all read operations will
> > > work perfectly well; only the operations that create new inodes will
> > > fail.
> > > Btw, we already have analogous semantics at the VFS level for users
> > > who have no UID/GID mapping to the host. Filesystem requests for
> > > such users will fail with -EOVERFLOW. Here we have something close.
> >
> > Refusing requests coming from an idmapped mount if the server misses
> > appropriate features is good enough as a first step imho. And yes, we do
> > have similar logic on the vfs level for unmapped uid/gid.
>
> Thanks, Christian!
>
> I wanted to add that an alternative here is to modify the caller_{u,g}id
> fields as was done in the first approach.
> It will break the UID/GID-based permissions model for old MDS versions
> (we can put a printk_once to inform the user about this),
> but at the same time it will allow us to support idmapped mounts in
> all cases. This support will not be fully ideal for old MDS versions,
> but it will work perfectly well for new MDS versions.
>
> Alternatively, we can introduce a cephfs mount option like
> "idmap_with_old_mds": if it's enabled then we set caller_{u,g}id
> for an MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID; if it's disabled
> (default) we fail requests with -EIO. For a
> new MDS everything works the right way.
>
> Kind regards,
> Alex

Hey there,

A very strong +1 on there needing to be some way to make this work
with older Ceph releases. Ceph Reef isn't out yet and we're in July
2023, so I'd really rather not have to wait until Ceph Squid in mid 2024
to be able to make use of this!

Some kind of mount option, module option or the like would all be fine for this.

Stéphane
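
For illustration only, the behaviour discussed in this thread (fail with -EIO by default, or opt in through something like the proposed "idmap_with_old_mds" option) could be modelled roughly as below. This is a userspace sketch, not kernel code: struct req_model, enum id_action and choose_id_action() are hypothetical names made up for the example, and the non-idmapped case is simplified to "no translation".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name below is hypothetical, not kernel API. */
struct req_model {
	bool via_idmapped_mount;   /* stands in for req->r_mnt_idmap != &nop_mnt_idmap */
	bool mds_has_owner_uidgid; /* stands in for the CEPHFS_FEATURE_HAS_OWNER_UIDGID bit */
	bool idmap_with_old_mds;   /* the mount option proposed in the thread (not merged) */
};

enum id_action {
	ID_NO_TRANSLATION,    /* identity mapping: ids are sent unchanged */
	ID_FILL_OWNER_FIELDS, /* new MDS: mapped ids go into owner_{u,g}id */
	ID_MAP_CALLER_FIELDS, /* old MDS, opt-in: map caller_{u,g}id instead */
};

/* Returns 0 and stores the chosen action, or -EIO when the request must fail. */
static int choose_id_action(const struct req_model *req, enum id_action *action)
{
	assert(req != NULL && action != NULL);

	if (!req->via_idmapped_mount) {
		*action = ID_NO_TRANSLATION;
		return 0;
	}
	if (req->mds_has_owner_uidgid) {
		*action = ID_FILL_OWNER_FIELDS;
		return 0;
	}
	if (req->idmap_with_old_mds) {
		/* Known trade-off: breaks UID/GID-based permission checks on the MDS. */
		*action = ID_MAP_CALLER_FIELDS;
		return 0;
	}
	return -EIO; /* default: same outcome as the check in the quoted patch */
}
```

With the option left off, a request through an idmapped mount to an MDS without the feature yields -EIO, matching the quoted patch. Turning the option on selects the legacy caller_{u,g}id mapping instead, with the permission-check caveat Alex describes.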