Subject: Re: [PATCH v3 03/14] ceph: handle idmapped mounts in create_request_message()
From: Aleksandr Mikhalitsyn
Date: Wed, 7 Jun 2023 17:33:09 +0200
To: xiubli@redhat.com
Cc: brauner@kernel.org, stgraber@ubuntu.com, linux-fsdevel@vger.kernel.org,
    Christian Brauner, Jeff Layton, Ilya Dryomov,
    ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230607152038.469739-1-aleksandr.mikhalitsyn@canonical.com>
    <20230607152038.469739-4-aleksandr.mikhalitsyn@canonical.com>
In-Reply-To: <20230607152038.469739-4-aleksandr.mikhalitsyn@canonical.com>

On Wed, Jun 7, 2023 at 5:21 PM Alexander Mikhalitsyn wrote:
>
> From: Christian Brauner
>
> Inode operations that create a new filesystem object such as ->mknod(),
> ->create(), ->mkdir() and others don't take a {g,u}id argument
> explicitly. Instead the caller's fs{g,u}id is used for the {g,u}id of
> the new filesystem object.
>
> Cephfs mds creation request argument structures mirror this filesystem
> behavior. They don't encode a {g,u}id explicitly. Instead the caller's
> fs{g,u}id that is always sent as part of any mds request is used by the
> servers to set the {g,u}id of the new filesystem object.
>
> In order to ensure that the correct {g,u}id is used, map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmapping. If this is a request message that was triggered from an inode
> operation that doesn't need to take idmappings into account, the initial
> idmapping is passed down, which is an identity mapping and thus is
> guaranteed to leave the caller's fs{g,u}id unchanged.
>
> I spent the last few weeks before Christmas 2021 not just reading and
> poking the cephfs kernel code but also taking a look at the ceph mds
> server userspace to ensure I didn't miss some subtlety.
>
> This made me aware of one complication to solve. All requests send the
> caller's fs{g,u}id over the wire. The caller's fs{g,u}id matters for the
> server in exactly two cases:
>
> 1. to set the ownership for creation requests
> 2. to determine whether this client is allowed access on this server
>
> Case 1. we already covered and explained. Case 2. is only relevant for
> servers where an explicit uid access restriction has been set. That is
> to say the mds server restricts access to requests coming from a
> specific uid. Servers without uid restrictions will grant access to
> requests from any uid by setting MDS_AUTH_UID_ANY.
>
> Case 2. introduces the complication because the caller's fs{g,u}id is
> not just used to record ownership but also serves as the {g,u}id used
> when checking access to the server.
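To make case 2. concrete, here is a rough userspace sketch of the kind of
uid-restricted access check being described. This is illustration only:
the real check lives in the ceph mds userspace server, and apart from the
MDS_AUTH_UID_ANY constant (whose placeholder value below is an
assumption) every name is invented.

#include <errno.h>

#define MDS_AUTH_UID_ANY ((unsigned int)-1)     /* placeholder value */

/* Hypothetical access check on a uid-restricted mds server. */
static int mds_check_caller_uid(unsigned int allowed_uid,
                                unsigned int caller_uid)
{
        if (allowed_uid == MDS_AUTH_UID_ANY)
                return 0;       /* no uid restriction configured */
        if (caller_uid == allowed_uid)
                return 0;       /* caller matches the restriction */
        return -EACCES;         /* refuse requests from any other uid */
}

The inconsistency laid out below is that caller_uid here ends up being
the mapped value for creation requests but the unmapped value for
everything else, for one and the same caller.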
>
> Consider a user mounting a cephfs client and creating an idmapped mount
> from it that maps files owned by uid 1000 to be owned by uid 0:
>
>   mount -t cephfs -o [...] /unmapped
>   mount-idmapped --map-mount 1000:0:1 /idmapped
>
> That is to say, if the mounted cephfs filesystem contains a file "file1"
> which is owned by uid 1000:
>
> - looking at it via /unmapped/file1 will report it as owned by uid 1000
>   (one can think of this as the on-disk value)
> - looking at it via /idmapped/file1 will report it as owned by uid 0
>
> Now, consider creating new files via the idmapped mount at /idmapped.
> When a caller with fs{g,u}id 1000 creates a file "file2" by going
> through the idmapped mount mounted at /idmapped it will create a file
> that is owned by uid 1000 on-disk, i.e.:
>
> - looking at it via /unmapped/file2 will report it as owned by uid 1000
> - looking at it via /idmapped/file2 will report it as owned by uid 0
>
> Now consider an mds server that has a uid access restriction set and
> only grants access to requests from uid 0.
>
> If the client sends a creation request for a file, e.g. /idmapped/file2,
> it will send the caller's fs{g,u}id idmapped according to the idmapped
> mount. So if the caller has fs{g,u}id 1000 it will be mapped to {g,u}id
> 0 in the idmapped mount and will be sent over the wire, allowing the
> caller access to the mds server.
>
> However, if the caller is not issuing a creation request the caller's
> fs{g,u}id will be sent without the mount's idmapping applied. So if the
> caller that just successfully created a new file on the restricted mds
> server sends a request as fs{g,u}id 1000, access will be refused. This,
> however, is inconsistent.
>
> From my perspective the root of the problem lies in the fact that
> creation requests implicitly infer the ownership from the {g,u}id that
> gets sent along with every mds request.
>
> I have thought of multiple ways of addressing this problem but the one I
> prefer is to give all mds requests that create a filesystem object a
> proper, separate {g,u}id field entry in the argument struct. This is,
> for example, how ->setattr mds requests work.
>
> This way the caller's fs{g,u}id can be used consistently for server
> access checks and is separated from the ownership for new filesystem
> objects.
>
> Servers could then be updated to refuse creation requests whenever the
> {g,u}id used for access checking doesn't match the {g,u}id used for
> creating the filesystem object, just as is done for setattr requests on
> a uid restricted server. But I am, of course, open to other suggestions.
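To sketch what such a separate field could look like (a hypothetical
sketch only: this is not the actual ceph wire format, and the struct and
field names below are invented; only the ceph_mds_request_args union
mentioned in the patch below is real):

/* Hypothetical: creation-request args that carry explicit ownership,
 * analogous to how the setattr args already have their own uid/gid. */
struct ceph_mds_request_args_mkdir_with_owner {
        __le32 mode;        /* as today */
        __le32 owner_uid;   /* idmapped ownership for the new object */
        __le32 owner_gid;
};

With something like this, the per-request caller_{g,u}id could be used
consistently for access checks, while the ownership for the new object
travels in its own field.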
>
> Cc: Xiubo Li
> Cc: Jeff Layton
> Cc: Ilya Dryomov
> Cc: ceph-devel@vger.kernel.org
> Signed-off-by: Christian Brauner
> Signed-off-by: Alexander Mikhalitsyn
> ---
>  fs/ceph/mds_client.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 810c3db2e369..e4265843b838 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2583,6 +2583,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>         void *p, *end;
>         int ret;
>         bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> +       kuid_t caller_fsuid;
> +       kgid_t caller_fsgid;
>
>         ret = set_request_path_attr(req->r_inode, req->r_dentry,
>                                     req->r_parent, req->r_path1, req->r_ino1.ino,
> @@ -2651,10 +2653,22 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>
>         head->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
>         head->op = cpu_to_le32(req->r_op);
> -       head->caller_uid = cpu_to_le32(from_kuid(&init_user_ns,
> -                                                req->r_cred->fsuid));
> -       head->caller_gid = cpu_to_le32(from_kgid(&init_user_ns,
> -                                                req->r_cred->fsgid));
> +       /*
> +        * Inode operations that create filesystem objects based on the
> +        * caller's fs{g,u}id like ->mknod(), ->create(), ->mkdir() etc. don't
> +        * have separate {g,u}id fields in their respective structs in the
> +        * ceph_mds_request_args union. Instead the caller_{g,u}id field is
> +        * used to set ownership of the newly created inode by the mds server.
> +        * For these inode operations we need to send the mapped fs{g,u}id over
> +        * the wire. For other cases we simply set req->r_mnt_idmap to the
> +        * initial idmapping, meaning the unmapped fs{g,u}id is sent.
> +        */
> +       caller_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> +                                  VFSUIDT_INIT(req->r_cred->fsuid));
> +       caller_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> +                                  VFSGIDT_INIT(req->r_cred->fsgid));
> +       head->caller_uid = cpu_to_le32(from_kuid(&init_user_ns, caller_fsuid));
> +       head->caller_gid = cpu_to_le32(from_kgid(&init_user_ns, caller_fsgid));
>         head->ino = cpu_to_le64(req->r_deleg_ino);
>         head->args = req->r_args;
>
> --
> 2.34.1
>

This is probably worth adding to the commit message or cover letter, but
let it be here for now. Explanation/demonstration from this thread:
https://lore.kernel.org/lkml/CAEivzxefBRPozUPQxYgVh0gOpjsovtBuJ3w9BoqSizpST_YxTA@mail.gmail.com/#t

1. Mount cephfs:

mount.ceph admin@XYZ.cephfs=/ /mnt/ceph -o mon_addr=127.0.0.1:6789,secret=very_secret_key

2. Make 1000:1000 the owner of the root dentry (convenient because we
want to use the mapping 1000:0:1 for simplicity):

chown 1000:1000 /mnt/ceph

3. Create an idmapped mount based on the regular /mnt/ceph mount, using
the mount-idmapped tool written by Christian
[ taken from https://raw.githubusercontent.com/brauner/mount-idmapped/master/mount-idmapped.c ]:

./mount-idmapped --map-mount b:1000:0:1 /mnt/ceph /mnt/ceph_idmapped

"b" stands for "both", so we are creating a mapping of length 1 for both
UID and GID. 1000 is the "on-disk" UID/GID, 0 is the mapped UID/GID.
(The arithmetic this mapping implies is spelled out in the sketch after
step 4.)

4. Just to be precise, let's look at which UID/GID we have now:

root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph
total 4
drwxrwxrwx 2 1000 1000    0 Jun  1 17:51 .
drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..
root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph_idmapped
total 4
drwxrwxrwx 2    0    0    0 Jun  1 17:51 .
drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..
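Before creating files, it is worth spelling out the arithmetic that
b:1000:0:1 implies in both directions. A self-contained userspace sketch
follows (it mirrors, but is not, the kernel's mnt_idmap code; all names
here are made up):

#include <stdio.h>

/* One extent of an idmapping: [on_disk_first, on_disk_first + count)
 * on disk corresponds to [mapped_first, mapped_first + count) in the
 * idmapped mount. For "b:1000:0:1": on_disk_first = 1000,
 * mapped_first = 0, count = 1. */
struct extent { unsigned int on_disk_first, mapped_first, count; };

/* On-disk id -> id seen through the idmapped mount; -1 if unmapped
 * (the observer then sees overflow{u,g}id, i.e. 65534). */
static long map_up(const struct extent *e, unsigned int id)
{
        if (id >= e->on_disk_first && id - e->on_disk_first < e->count)
                return e->mapped_first + (id - e->on_disk_first);
        return -1;
}

/* Id used through the idmapped mount -> on-disk id; -1 if it has no
 * on-disk counterpart (creation then fails with EOVERFLOW). */
static long map_down(const struct extent *e, unsigned int id)
{
        if (id >= e->mapped_first && id - e->mapped_first < e->count)
                return e->on_disk_first + (id - e->mapped_first);
        return -1;
}

int main(void)
{
        const struct extent m = { 1000, 0, 1 };

        printf("%ld\n", map_up(&m, 1000));   /* 0: on-disk 1000 shows as 0 */
        printf("%ld\n", map_up(&m, 0));      /* -1: shows as 65534, see 5.3. */
        printf("%ld\n", map_down(&m, 0));    /* 1000: see 5.1. */
        printf("%ld\n", map_down(&m, 1000)); /* -1: EOVERFLOW, see 5.2. */
        return 0;
}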
5. Now let's create a bunch of files with different owners, through the
different mounts (idmapped/non-idmapped).

5.1. Create a file as 0:0 through the idmapped mount (it should appear
as 1000:1000 on disk):

root@ubuntu:/home/ubuntu# sudo -u#0 -g#0 touch /mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid0

5.2. Create a file as 1000:1000 through the idmapped mount (this should
fail, because 1000:1000 is not a valid UID/GID here: it can't be mapped
back to the "on-disk" UID/GID set):

root@ubuntu:/home/ubuntu# sudo -u#1000 -g#1000 touch /mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid1000
touch: cannot touch '/mnt/ceph_idmapped/created_through_idmapped_mnt_with_uid1000': Value too large for defined data type

... and we get EOVERFLOW. That's correct!

5.3. Create a file as 0:0 but through the regular mount (it should
appear as overflowuid (= 65534) in the idmapped mount, because 0:0 on
disk is not mapped into the UID/GID set):

root@ubuntu:/home/ubuntu# sudo -u#0 -g#0 touch /mnt/ceph/created_directly_with_uid0

5.4. Create a file as 1000:1000 but through the regular mount (it should
appear as 0:0 in the idmapped mount, because 1000 on disk is mapped
to 0):

root@ubuntu:/home/ubuntu# sudo -u#1000 -g#1000 touch /mnt/ceph/created_directly_with_uid1000

6. Now let's look at the result:

root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph
total 4
drwxrwxrwx 2 1000 1000    3 Jun  1 17:54 .
drwxr-xr-x 4    0    0 4096 Jun  1 16:55 ..
-rw-r--r-- 1    0    0    0 Jun  1 17:54 created_directly_with_uid0
-rw-rw-r-- 1 1000 1000    0 Jun  1 17:54 created_directly_with_uid1000
-rw-r--r-- 1 1000 1000    0 Jun  1 17:53 created_through_idmapped_mnt_with_uid0
root@ubuntu:/home/ubuntu# ls -lan /mnt/ceph_idmapped
total 4
drwxrwxrwx 2     0     0    3 Jun  1 17:54 .
drwxr-xr-x 4     0     0 4096 Jun  1 16:55 ..
-rw-r--r-- 1 65534 65534    0 Jun  1 17:54 created_directly_with_uid0
-rw-rw-r-- 1     0     0    0 Jun  1 17:54 created_directly_with_uid1000
-rw-r--r-- 1     0     0    0 Jun  1 17:53 created_through_idmapped_mnt_with_uid0
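For completeness, the same double view can be checked programmatically.
A small sketch using stat(2), with the paths from the demo above (run it
on the box where both mounts exist):

#include <stdio.h>
#include <sys/stat.h>

/* Print the ownership a path appears to have from this mount's view. */
static void show_owner(const char *path)
{
        struct stat st;

        if (stat(path, &st) == 0)
                printf("%s: uid=%u gid=%u\n", path,
                       (unsigned int)st.st_uid, (unsigned int)st.st_gid);
        else
                perror(path);
}

int main(void)
{
        /* One and the same on-disk file, seen through both mounts. */
        show_owner("/mnt/ceph/created_directly_with_uid1000");          /* uid=1000 */
        show_owner("/mnt/ceph_idmapped/created_directly_with_uid1000"); /* uid=0 */
        return 0;
}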