Subject: Re: [RFC PATCH v2] ceph: prevent a client from exceeding the MDS maximum xattr size
From: Jeff Layton
To: Xiubo Li, Luís Henriques, Ilya Dryomov
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 26 May 2022 20:44:07 -0400
In-Reply-To: <3cb96552-9747-c6b4-c8d3-81af60e5ae6a@redhat.com>
References: <20220525172427.3692-1-lhenriques@suse.de> <3cb96552-9747-c6b4-c8d3-81af60e5ae6a@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2022-05-27 at 08:36 +0800, Xiubo Li wrote:
> On 5/27/22 2:39 AM, Jeff Layton wrote:
> > On Wed, 2022-05-25 at 18:24 +0100, Luís Henriques wrote:
> > > The MDS tries to enforce a limit on the total key/values in extended
> > > attributes.  However, this limit is enforced only if doing a synchronous
> > > operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS
> > > doesn't have a chance to enforce these limits.
> > >
> > > This patch adds support for an extra feature bit that will allow the
> > > client to get the MDS max_xattr_pairs_size setting in the session message.
> > > Then, when setting an xattr, the kernel will revert to do a synchronous
> > > operation if that maximum size is exceeded.
> > >
> > > While there, fix a dout() that would trigger a printk warning:
> > >
> > > [   98.718078] ------------[ cut here ]------------
> > > [   98.719012] precision 65536 too large
> > > [   98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600
> > > ...
> > >
> > > URL: https://tracker.ceph.com/issues/55725
> > > Signed-off-by: Luís Henriques
> > > ---
> > >  fs/ceph/mds_client.c | 12 ++++++++++++
> > >  fs/ceph/mds_client.h | 15 ++++++++++++++-
> > >  fs/ceph/xattr.c      | 12 ++++++++----
> > >  3 files changed, 34 insertions(+), 5 deletions(-)
> > >
> > > * Changes since v1
> > >
> > > Added support for new feature bit to get the MDS max_xattr_pairs_size
> > > setting.
> > >
> > > Also note that this patch relies on a patch that hasn't been merged yet
> > > ("ceph: use correct index when encoding client supported features"),
> > > otherwise the new feature bit won't be correctly encoded.
> > >
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index 35597fafb48c..87a25b7cf496 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -3500,6 +3500,7 @@ static void handle_session(struct ceph_mds_session *session,
> > >  	struct ceph_mds_session_head *h;
> > >  	u32 op;
> > >  	u64 seq, features = 0;
> > > +	u64 max_xattr_pairs_size = 0;
> > >  	int wake = 0;
> > >  	bool blocklisted = false;
> > >
> > > @@ -3545,6 +3546,9 @@ static void handle_session(struct ceph_mds_session *session,
> > >  		}
> > >  	}
> > >
> > > +	if (msg_version >= 6)
> > > +		ceph_decode_64_safe(&p, end, max_xattr_pairs_size, bad);
> > > +
> > >  	mutex_lock(&mdsc->mutex);
> > >  	if (op == CEPH_SESSION_CLOSE) {
> > >  		ceph_get_mds_session(session);
> > > @@ -3552,6 +3556,12 @@ static void handle_session(struct ceph_mds_session *session,
> > >  	}
> > >  	/* FIXME: this ttl calculation is generous */
> > >  	session->s_ttl = jiffies + HZ*mdsc->mdsmap->m_session_autoclose;
> > > +
> > > +	if (max_xattr_pairs_size && (op == CEPH_SESSION_OPEN)) {
> > > +		dout("Changing MDS max xattrs pairs size: %llu => %llu\n",
> > > +		     mdsc->max_xattr_pairs_size, max_xattr_pairs_size);
> > > +		mdsc->max_xattr_pairs_size = max_xattr_pairs_size;
> > > +	}
> > >  	mutex_unlock(&mdsc->mutex);
> > >
> > >  	mutex_lock(&session->s_mutex);
> > > @@ -4761,6 +4771,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
> > >  	strscpy(mdsc->nodename, utsname()->nodename,
> > >  		sizeof(mdsc->nodename));
> > >
> > > +	mdsc->max_xattr_pairs_size = MDS_MAX_XATTR_PAIRS_SIZE;
> > > +
> > >  	fsc->mdsc = mdsc;
> > >  	return 0;
> > >
> > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > index ca32f26f5eed..3db777df6d88 100644
> > > --- a/fs/ceph/mds_client.h
> > > +++ b/fs/ceph/mds_client.h
> > > @@ -29,8 +29,11 @@ enum ceph_feature_type {
> > >  	CEPHFS_FEATURE_MULTI_RECONNECT,
> > >  	CEPHFS_FEATURE_DELEG_INO,
> > >  	CEPHFS_FEATURE_METRIC_COLLECT,
> > > +	CEPHFS_FEATURE_ALTERNATE_NAME,
> > > +	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > > +	CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE,
> > Having to make this feature-bit-dependent kind of sucks. I wonder if it
> > could be avoided...
> >
> > A question:
> >
> > How do the MDS's discover this setting? Do they get it from the mons? If
> > so, I wonder if there is a way for the clients to query the mon for this
> > instead of having to extend the MDS protocol?
>
> It sounds like what the "max_file_size" does, which will be recorded in
> the 'mdsmap'.
>
> While currently the "max_xattr_pairs_size" is one MDS's option for each
> daemon and could set different values for each MDS.
>
>

Right, but the MDS's in general don't use local config files. Where are
these settings stored? Could the client (potentially) query for them?

I'm pretty sure the client does fetch and parse the mdsmap. If it's
there then it could grab the setting for all of the MDS's at mount time
and settle on the lowest one.

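As a rough illustration of the "settle on the lowest one" idea, here is a
minimal userspace C sketch. It assumes the per-MDS limits have somehow been
made available to the client as a plain array, which is not the case today
(the setting is a per-daemon option and is not in the mdsmap). The helper
names lowest_xattr_limit() and needs_sync_setxattr() are made up for this
example and are not existing kernel functions; only the 64K default and the
">=" comparison are taken from the patch quoted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Default from the patch under review: 64K. */
#define MDS_MAX_XATTR_PAIRS_SIZE (1 << 16)

/*
 * Hypothetical helper: given the limits advertised by each MDS, return
 * the smallest one. A zero entry is treated as "not advertised" and is
 * skipped; if nothing is advertised, fall back to the default.
 */
static uint64_t lowest_xattr_limit(const uint64_t *limits, size_t n)
{
	uint64_t lowest = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (limits[i] == 0)
			continue;
		if (lowest == 0 || limits[i] < lowest)
			lowest = limits[i];
	}
	return lowest ? lowest : MDS_MAX_XATTR_PAIRS_SIZE;
}

/*
 * Hypothetical helper mirroring the comparison in the patch: fall back
 * to a synchronous setxattr once the blob reaches the limit.
 */
static int needs_sync_setxattr(uint64_t required_blob_size, uint64_t limit)
{
	return required_blob_size >= limit;
}
```

Taking the minimum is the conservative choice: a buffered update sized
against the smallest limit should be acceptable to whichever MDS ends up
handling it.
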
I think a solution like that might be more resilient than having to
fiddle with feature bits and such...

> > >
> > > -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
> > > +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE,
> > >  };
> > >
> > >  /*
> > > @@ -45,9 +48,16 @@ enum ceph_feature_type {
> > >  	CEPHFS_FEATURE_MULTI_RECONNECT, \
> > >  	CEPHFS_FEATURE_DELEG_INO, \
> > >  	CEPHFS_FEATURE_METRIC_COLLECT, \
> > > +	CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE, \
> > >  }
> > >  #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
> > >
> > > +/*
> > > + * Maximum size of xattrs the MDS can handle per inode by default.  This
> > > + * includes the attribute name and 4+4 bytes for the key/value sizes.
> > > + */
> > > +#define MDS_MAX_XATTR_PAIRS_SIZE (1<<16) /* 64K */
> > > +
> > >  /*
> > >   * Some lock dependencies:
> > >   *
> > > @@ -404,6 +414,9 @@ struct ceph_mds_client {
> > >  	struct rb_root quotarealms_inodes;
> > >  	struct mutex quotarealms_inodes_mutex;
> > >
> > > +	/* maximum aggregate size of extended attributes on a file */
> > > +	u64 max_xattr_pairs_size;
> > > +
> > >  	/*
> > >  	 * snap_rwsem will cover cap linkage into snaprealms, and
> > >  	 * realm snap contexts.  (later, we can do per-realm snap
> > > diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> > > index 8c2dc2c762a4..175a8c1449aa 100644
> > > --- a/fs/ceph/xattr.c
> > > +++ b/fs/ceph/xattr.c
> > > @@ -1086,7 +1086,7 @@ static int ceph_sync_setxattr(struct inode *inode, const char *name,
> > >  			flags |= CEPH_XATTR_REMOVE;
> > >  	}
> > >
> > > -	dout("setxattr value=%.*s\n", (int)size, value);
> > > +	dout("setxattr value size: %ld\n", size);
> > >
> > >  	/* do request */
> > >  	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > > @@ -1184,8 +1184,14 @@ int __ceph_setxattr(struct inode *inode, const char *name,
> > >  	spin_lock(&ci->i_ceph_lock);
> > >  retry:
> > >  	issued = __ceph_caps_issued(ci, NULL);
> > > -	if (ci->i_xattrs.version == 0 || !(issued & CEPH_CAP_XATTR_EXCL))
> > > +	required_blob_size = __get_required_blob_size(ci, name_len, val_len);
> > > +	if ((ci->i_xattrs.version == 0) || !(issued & CEPH_CAP_XATTR_EXCL) ||
> > > +	    (required_blob_size >= mdsc->max_xattr_pairs_size)) {
> > > +		dout("%s do sync setxattr: version: %llu size: %d max: %llu\n",
> > > +		     __func__, ci->i_xattrs.version, required_blob_size,
> > > +		     mdsc->max_xattr_pairs_size);
> > >  		goto do_sync;
> > > +	}
> > >
> > >  	if (!lock_snap_rwsem && !ci->i_head_snapc) {
> > >  		lock_snap_rwsem = true;
> > > @@ -1201,8 +1207,6 @@ int __ceph_setxattr(struct inode *inode, const char *name,
> > >  	     ceph_cap_string(issued));
> > >  	__build_xattrs(inode);
> > >
> > > -	required_blob_size = __get_required_blob_size(ci, name_len, val_len);
> > > -
> > >  	if (!ci->i_xattrs.prealloc_blob ||
> > >  	    required_blob_size > ci->i_xattrs.prealloc_blob->alloc_len) {
> > >  		struct ceph_buffer *blob;
>

-- 
Jeff Layton