From: "Yan, Zheng"
Date: Mon, 18 Mar 2019 17:01:07 +0800
Subject: Re: [PATCH v2 2/2] ceph: quota: fix quota subdir mounts
To: Luis Henriques
Cc: "Yan, Zheng", Sage Weil, Ilya Dryomov, ceph-devel, Linux Kernel Mailing List, Hendrik Peyerl
In-Reply-To: <20190312142019.30936-3-lhenriques@suse.com>
References: <20190312142019.30936-1-lhenriques@suse.com> <20190312142019.30936-3-lhenriques@suse.com>

On Tue, Mar 12, 2019 at 10:22 PM Luis Henriques wrote:
>
> The CephFS kernel client does not enforce quotas set in a directory that
> isn't visible from the mount point.  For example, given the path
> '/dir1/dir2', if quotas are set in 'dir1' and the filesystem is mounted with
>
>   mount -t ceph ::/dir1/ /mnt
>
> then the client won't be able to access 'dir1' inode, even if 'dir2' belongs
> to a quota realm that points to it.
>
> This patch fixes this issue by simply doing an MDS LOOKUPINO operation for
> unknown inodes.  Any inode reference obtained this way will be added to a
> list in ceph_mds_client, and will only be released when the filesystem is
> umounted.
>
> Link: https://tracker.ceph.com/issues/38482
> Reported-by: Hendrik Peyerl
> Signed-off-by: Luis Henriques
> ---
>  fs/ceph/mds_client.c |  15 ++++++
>  fs/ceph/mds_client.h |   2 +
>  fs/ceph/quota.c      | 106 +++++++++++++++++++++++++++++++++++++++----
>  fs/ceph/super.h      |   2 +
>  4 files changed, 115 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 163fc74bf221..1dc24c3525fe 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3656,6 +3656,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
>          mdsc->max_sessions = 0;
>          mdsc->stopping = 0;
>          atomic64_set(&mdsc->quotarealms_count, 0);
> +        INIT_LIST_HEAD(&mdsc->quotarealms_inodes_list);
> +        spin_lock_init(&mdsc->quotarealms_inodes_lock);
>          mdsc->last_snap_seq = 0;
>          init_rwsem(&mdsc->snap_rwsem);
>          mdsc->snap_realms = RB_ROOT;
> @@ -3726,6 +3728,8 @@ static void wait_requests(struct ceph_mds_client *mdsc)
>   */
>  void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
>  {
> +        struct ceph_inode_info *ci;
> +
>          dout("pre_umount\n");
>          mdsc->stopping = 1;
>
> @@ -3738,6 +3742,17 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
>           * their inode/dcache refs
>           */
>          ceph_msgr_flush();
> +        /*
> +         * It's now safe to clean quotarealms_inode_list without holding
> +         * mdsc->quotarealms_inodes_lock
> +         */
> +        while (!list_empty(&mdsc->quotarealms_inodes_list)) {
> +                ci = list_first_entry(&mdsc->quotarealms_inodes_list,
> +                                      struct ceph_inode_info,
> +                                      i_quotarealms_inode_item);
> +                list_del(&ci->i_quotarealms_inode_item);
> +                iput(&ci->vfs_inode);
> +        }
>  }
>
>  /*
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 729da155ebf0..58968fb338ec 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -329,6 +329,8 @@ struct ceph_mds_client {
>          int                  stopping;           /* true if shutting down */
>
>          atomic64_t           quotarealms_count;  /* # realms with quota */
> +        struct list_head     quotarealms_inodes_list;
> +        spinlock_t           quotarealms_inodes_lock;
>
>          /*
>           * snap_rwsem will cover cap linkage into snaprealms, and
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index 9455d3aef0c3..d1ab1b331c0d 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -22,7 +22,16 @@ void ceph_adjust_quota_realms_count(struct inode *inode, bool inc)
>  static inline bool ceph_has_realms_with_quotas(struct inode *inode)
>  {
>          struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
> -        return atomic64_read(&mdsc->quotarealms_count) > 0;
> +        struct super_block *sb = mdsc->fsc->sb;
> +
> +        if (atomic64_read(&mdsc->quotarealms_count) > 0)
> +                return true;
> +        /* if root is the real CephFS root, we don't have quota realms */
> +        if (sb->s_root->d_inode &&
> +            (sb->s_root->d_inode->i_ino == CEPH_INO_ROOT))
> +                return false;
> +        /* otherwise, we can't know for sure */
> +        return true;
>  }
>
>  void ceph_handle_quota(struct ceph_mds_client *mdsc,
> @@ -68,6 +77,37 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
>          iput(inode);
>  }
>
> +/*
> + * This function will try to lookup a realm inode.  If the inode is found
> + * (through an MDS LOOKUPINO operation), the realm->inode will be updated and
> + * the inode will also be added to an mdsc list which will be freed only when
> + * the filesystem is umounted.
> + */
> +static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
> +                                             struct super_block *sb,
> +                                             struct ceph_snap_realm *realm)
> +{
> +        struct inode *in;
> +
> +        in = ceph_lookup_inode(sb, realm->ino);
> +        if (IS_ERR(in)) {
> +                pr_warn("Can't lookup inode %llx (err: %ld)\n",
> +                        realm->ino, PTR_ERR(in));
> +                return in;
> +        }
> +
> +        spin_lock(&mdsc->quotarealms_inodes_lock);
> +        list_add(&ceph_inode(in)->i_quotarealms_inode_item,
> +                 &mdsc->quotarealms_inodes_list);
> +        spin_unlock(&mdsc->quotarealms_inodes_lock);
> +
> +        spin_lock(&realm->inodes_with_caps_lock);
> +        realm->inode = in;
> +        spin_unlock(&realm->inodes_with_caps_lock);
> +
> +        return in;
> +}
> +
>  /*
>   * This function walks through the snaprealm for an inode and returns the
>   * ceph_snap_realm for the first snaprealm that has quotas set (either max_files
> @@ -76,9 +116,15 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
>   *
>   * Note that the caller is responsible for calling ceph_put_snap_realm() on the
>   * returned realm.
> + *
> + * Callers of this function need to hold mdsc->snap_rwsem.  However, if there's
> + * a need to do an inode lookup, this rwsem will be temporarily dropped.  Hence
> + * the 'retry' argument: if rwsem needs to be dropped and 'retry' is 'false'
> + * this function will return -EAGAIN; otherwise, the snaprealms walk-through
> + * will be restarted.
>   */
>  static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
> -                                               struct inode *inode)
> +                                               struct inode *inode, bool retry)
>  {
>          struct ceph_inode_info *ci = NULL;
>          struct ceph_snap_realm *realm, *next;
> @@ -88,6 +134,7 @@ static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
>          if (ceph_snap(inode) != CEPH_NOSNAP)
>                  return NULL;
>
> +restart:
>          realm = ceph_inode(inode)->i_snap_realm;
>          if (realm)
>                  ceph_get_snap_realm(mdsc, realm);
> @@ -95,11 +142,25 @@ static struct ceph_snap_realm *get_quota_realm(struct ceph_mds_client *mdsc,
>                  pr_err_ratelimited("get_quota_realm: ino (%llx.%llx) "
>                                     "null i_snap_realm\n", ceph_vinop(inode));
>          while (realm) {
> +                bool has_inode;
> +
>                  spin_lock(&realm->inodes_with_caps_lock);
> -                in = realm->inode ? igrab(realm->inode) : NULL;
> +                has_inode = realm->inode;
> +                in = has_inode ? igrab(realm->inode) : NULL;
>                  spin_unlock(&realm->inodes_with_caps_lock);
> -                if (!in)
> +                if (has_inode && !in)
>                          break;
> +                if (!in) {
> +                        up_read(&mdsc->snap_rwsem);
> +                        in = lookup_quotarealm_inode(mdsc, inode->i_sb, realm);
> +                        down_read(&mdsc->snap_rwsem);
> +                        if (IS_ERR(in))
> +                                break;
> +                        ceph_put_snap_realm(mdsc, realm);
> +                        if (!retry)
> +                                return ERR_PTR(-EAGAIN);
> +                        goto restart;
> +                }
>
>                  ci = ceph_inode(in);
>                  has_quota = __ceph_has_any_quota(ci);
> @@ -125,9 +186,22 @@ bool ceph_quota_is_same_realm(struct inode *old, struct inode *new)
>          struct ceph_snap_realm *old_realm, *new_realm;
>          bool is_same;
>
> +restart:
> +        /*
> +         * We need to lookup 2 quota realms atomically, i.e. with snap_rwsem.
> +         * However, get_quota_realm may drop it temporarily.  By setting the
> +         * 'retry' parameter to 'false', we'll get -EAGAIN if the rwsem was
> +         * dropped and we can then restart the whole operation.
> +         */
>          down_read(&mdsc->snap_rwsem);
> -        old_realm = get_quota_realm(mdsc, old);
> -        new_realm = get_quota_realm(mdsc, new);
> +        old_realm = get_quota_realm(mdsc, old, true);
> +        new_realm = get_quota_realm(mdsc, new, false);
> +        if (PTR_ERR(new_realm) == -EAGAIN) {
> +                up_read(&mdsc->snap_rwsem);
> +                if (old_realm)
> +                        ceph_put_snap_realm(mdsc, old_realm);
> +                goto restart;
> +        }
>          is_same = (old_realm == new_realm);
>          up_read(&mdsc->snap_rwsem);
>
> @@ -166,6 +240,7 @@ static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
>                  return false;
>
>          down_read(&mdsc->snap_rwsem);
> +restart:
>          realm = ceph_inode(inode)->i_snap_realm;
>          if (realm)
>                  ceph_get_snap_realm(mdsc, realm);
> @@ -173,12 +248,23 @@ static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
>                  pr_err_ratelimited("check_quota_exceeded: ino (%llx.%llx) "
>                                     "null i_snap_realm\n", ceph_vinop(inode));
>          while (realm) {
> +                bool has_inode;
> +
>                  spin_lock(&realm->inodes_with_caps_lock);
> -                in = realm->inode ? igrab(realm->inode) : NULL;
> +                has_inode = realm->inode;
> +                in = has_inode ? igrab(realm->inode) : NULL;
>                  spin_unlock(&realm->inodes_with_caps_lock);
> -                if (!in)
> +                if (has_inode && !in)
>                          break;
> -
> +                if (!in) {
> +                        up_read(&mdsc->snap_rwsem);
> +                        in = lookup_quotarealm_inode(mdsc, inode->i_sb, realm);
> +                        down_read(&mdsc->snap_rwsem);
> +                        if (IS_ERR(in))
> +                                break;
> +                        ceph_put_snap_realm(mdsc, realm);
> +                        goto restart;
> +                }
>                  ci = ceph_inode(in);
>                  spin_lock(&ci->i_ceph_lock);
>                  if (op == QUOTA_CHECK_MAX_FILES_OP) {
> @@ -314,7 +400,7 @@ bool ceph_quota_update_statfs(struct ceph_fs_client *fsc, struct kstatfs *buf)
>          bool is_updated = false;
>
>          down_read(&mdsc->snap_rwsem);
> -        realm = get_quota_realm(mdsc, d_inode(fsc->sb->s_root));
> +        realm = get_quota_realm(mdsc, d_inode(fsc->sb->s_root), true);
>          up_read(&mdsc->snap_rwsem);
>          if (!realm)
>                  return false;
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index ce51e98b08ec..cc7766aeb73b 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -375,6 +375,8 @@ struct ceph_inode_info {
>          struct list_head i_snap_realm_item;
>          struct list_head i_snap_flush_item;
>
> +        struct list_head i_quotarealms_inode_item;
> +
>          struct work_struct i_wb_work;  /* writeback work */
>          struct work_struct i_pg_inv_work;  /* page invalidation work */
>

After reading the code carefully, I feel a little uncomfortable with the
"lookup_ino" in get_quota_realm.  How about populating the directories above
the 'mount subdir' during mounting instead (similar to cifs_get_root)?

Regards
Yan, Zheng
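[For readers who haven't seen the cifs_get_root() pattern Zheng refers to: the
alternative would be to resolve each component of the mount subdir at mount
time, so every ancestor inode, and with it the snap/quota realm chain, is
already instantiated before any quota check runs.  The sketch below is only an
illustration of that idea, not part of the patch; the function name and the
place it would be called from are hypothetical, and error handling and realm
bookkeeping are elided.  lookup_one_len_unlocked() is the generic VFS helper
that this kind of per-component walk typically uses.]

/*
 * Rough sketch only (not part of this patch): walk the mount subdir path
 * component by component, forcing an MDS lookup for any dentry that is not
 * already cached.  Returns the dentry for the last component (or ERR_PTR);
 * the caller is expected to dput() it when done.
 */
static struct dentry *ceph_walk_mount_subdir(struct super_block *sb,
                                             const char *path)
{
        struct dentry *dentry = dget(sb->s_root);
        const char *s, *p;

        for (s = path; *s; s = p) {
                struct dentry *child;

                /* skip consecutive '/' and find the end of this component */
                while (*s == '/')
                        s++;
                if (!*s)
                        break;
                p = s;
                while (*p && *p != '/')
                        p++;

                /* issues an MDS lookup if the dentry isn't already cached */
                child = lookup_one_len_unlocked(s, dentry, p - s);
                dput(dentry);
                dentry = child;
                if (IS_ERR(dentry))
                        break;
        }
        return dentry;
}

[The trade-off versus the on-demand LOOKUPINO approach in this patch is,
roughly, that the extra MDS round-trips happen once at mount time rather than
lazily during quota checks, and no long-lived quotarealms inode list is
needed, at the cost of instantiating ancestor inodes the client might
otherwise never touch.]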