Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp796027pxb; Tue, 3 Nov 2020 12:46:22 -0800 (PST) X-Google-Smtp-Source: ABdhPJwoXmGvhX4VlXxzbSktRfNEVkEr0EXaxcu8Bj75vY+HzOf2y0cfnLkUCPcoxcE5WKhN5X43 X-Received: by 2002:aa7:d888:: with SMTP id u8mr22837880edq.210.1604436381866; Tue, 03 Nov 2020 12:46:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604436381; cv=none; d=google.com; s=arc-20160816; b=hykHFN1buGsJ5RG0PyFTuxslHqvT+Bx1YnxQo7azpHA/mWa7VzZegk1DGzKxbXDI2t +GtETsCnMLu4Al+gBACHBfOas+8Xbd57CdsVFWb1Z0uvvHgc6M9ub1pbn8RRgXYKM2LI Z5DI+tLhBZbvWBCO8uu+pJUWHGkixKNE8UaIub62t66xSJaE4JG8DEeMQ6ZJX7KC/4sz suhijuuKpDaFyc1A3MPu02NXsoRjDbXyTeaHMzpZV0SXiXCjeICOed5FHh46Q/BhI+e1 U7WOAQA2t+MiVM7b25nCc57Zdai+PX17ikPJtI2dHKrll0QEUHjR5CXqQhur7J9w6d+w 2aJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=3qwbFG+mLo6XshHGkqp9vPJk7vveikzwma8du3wjWdw=; b=zXOyw9ARuEvFtblA2sHvKLz1dMQdb1r/q/ueL8Q4mdAqSyQgQNnauO72VqoVEG1Pvt K4YlvWtOvakQkDMiUyQCtNz8MtHOTX8LnFpM2Huv9i0xUoVJpj7ar/usqLbq1fkgqLFa a693yH4dNmqWv7C0XSyCkxmSF3zWfT0mRk8mjR8rqRBvcU4qaATvLgEuIHc84fqek4Kg ZoE6LNTmTXmvHjQrcEJgowW3/5tM03Zj6fUTY9s86h0wb74EXsNT25YcQWDCh+6Jwtfp kbd276ZkU//OH0sWouTbKL6fQUViyt4HgOSbFKEs0a45dwkYlVkjP8HIJifBolkq71cx bMDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=FpP6mqzZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id 7si1010106edk.565.2020.11.03.12.45.58; Tue, 03 Nov 2020 12:46:21 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=FpP6mqzZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730727AbgKCUnf (ORCPT + 99 others); Tue, 3 Nov 2020 15:43:35 -0500 Received: from mail.kernel.org ([198.145.29.99]:57354 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730132AbgKCUnd (ORCPT ); Tue, 3 Nov 2020 15:43:33 -0500 Received: from localhost (83-86-74-64.cable.dynamic.v4.ziggo.nl [83.86.74.64]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E3F8A223AC; Tue, 3 Nov 2020 20:43:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1604436212; bh=XDayW03aY4At8iRA9J/d/8dVStdh2pZqyX8RZep60aQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FpP6mqzZzxsEAd0c/nKbk8QeuI3YM7/uIKZOkCXsLILAjsArRT94e8+UU9glXr4zJ fjDxZF1mBtktOAzBz15S5XUbQ+5yDdvts/JgAx8UTF+9NvKiY6wHNQOGRAK03XfIVr fDnW7n9V0idOEsh1QWBzyqw/DVKrbLTl1AiGS5zM= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Yan, Zheng" , Jeff Layton , Ilya Dryomov , Sasha Levin Subject: [PATCH 5.9 152/391] ceph: encode inodes parent/d_name in cap reconnect message Date: Tue, 3 Nov 2020 21:33:23 +0100 Message-Id: <20201103203357.089038652@linuxfoundation.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201103203348.153465465@linuxfoundation.org> References: <20201103203348.153465465@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Yan, Zheng [ Upstream commit a33f6432b3a63a4909dbbb0967f7c9df8ff2de91 ] Since nautilus, MDS tracks dirfrags whose child inodes have caps in open file table. When MDS recovers, it prefetches all of these dirfrags. This avoids using backtrace to load inodes. But dirfrags prefetch may load lots of useless inodes into cache, and make MDS run out of memory. Recent MDS adds an option that disables dirfrags prefetch. When dirfrags prefetch is disabled. Recovering MDS only prefetches corresponding dir inodes. Including inodes' parent/d_name in cap reconnect message can help MDS to load inodes into its cache. Signed-off-by: "Yan, Zheng" Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov Signed-off-by: Sasha Levin --- fs/ceph/mds_client.c | 89 ++++++++++++++++++++++++++++++-------------- 1 file changed, 61 insertions(+), 28 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 4a26862d7667e..76d8d9495d1d4 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -3612,6 +3612,39 @@ fail_msg: return err; } +static struct dentry* d_find_primary(struct inode *inode) +{ + struct dentry *alias, *dn = NULL; + + if (hlist_empty(&inode->i_dentry)) + return NULL; + + spin_lock(&inode->i_lock); + if (hlist_empty(&inode->i_dentry)) + goto out_unlock; + + if (S_ISDIR(inode->i_mode)) { + alias = hlist_entry(inode->i_dentry.first, struct dentry, d_u.d_alias); + if (!IS_ROOT(alias)) + dn = dget(alias); + goto out_unlock; + } + + hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) { + spin_lock(&alias->d_lock); + if (!d_unhashed(alias) && + (ceph_dentry(alias)->flags & CEPH_DENTRY_PRIMARY_LINK)) { + dn = dget_dlock(alias); + } + spin_unlock(&alias->d_lock); + if (dn) + break; + } +out_unlock: + spin_unlock(&inode->i_lock); + return dn; +} + /* * Encode information about a cap for a reconnect with the MDS. */ @@ -3625,13 +3658,32 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap, struct ceph_inode_info *ci = cap->ci; struct ceph_reconnect_state *recon_state = arg; struct ceph_pagelist *pagelist = recon_state->pagelist; - int err; + struct dentry *dentry; + char *path; + int pathlen, err; + u64 pathbase; u64 snap_follows; dout(" adding %p ino %llx.%llx cap %p %lld %s\n", inode, ceph_vinop(inode), cap, cap->cap_id, ceph_cap_string(cap->issued)); + dentry = d_find_primary(inode); + if (dentry) { + /* set pathbase to parent dir when msg_version >= 2 */ + path = ceph_mdsc_build_path(dentry, &pathlen, &pathbase, + recon_state->msg_version >= 2); + dput(dentry); + if (IS_ERR(path)) { + err = PTR_ERR(path); + goto out_err; + } + } else { + path = NULL; + pathlen = 0; + pathbase = 0; + } + spin_lock(&ci->i_ceph_lock); cap->seq = 0; /* reset cap seq */ cap->issue_seq = 0; /* and issue_seq */ @@ -3652,7 +3704,7 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap, rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci)); rec.v2.issued = cpu_to_le32(cap->issued); rec.v2.snaprealm = cpu_to_le64(ci->i_snap_realm->ino); - rec.v2.pathbase = 0; + rec.v2.pathbase = cpu_to_le64(pathbase); rec.v2.flock_len = (__force __le32) ((ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) ? 0 : 1); } else { @@ -3663,7 +3715,7 @@ static int reconnect_caps_cb(struct inode *inode, struct ceph_cap *cap, ceph_encode_timespec64(&rec.v1.mtime, &inode->i_mtime); ceph_encode_timespec64(&rec.v1.atime, &inode->i_atime); rec.v1.snaprealm = cpu_to_le64(ci->i_snap_realm->ino); - rec.v1.pathbase = 0; + rec.v1.pathbase = cpu_to_le64(pathbase); } if (list_empty(&ci->i_cap_snaps)) { @@ -3725,7 +3777,7 @@ encode_again: sizeof(struct ceph_filelock); rec.v2.flock_len = cpu_to_le32(struct_len); - struct_len += sizeof(u32) + sizeof(rec.v2); + struct_len += sizeof(u32) + pathlen + sizeof(rec.v2); if (struct_v >= 2) struct_len += sizeof(u64); /* snap_follows */ @@ -3749,7 +3801,7 @@ encode_again: ceph_pagelist_encode_8(pagelist, 1); ceph_pagelist_encode_32(pagelist, struct_len); } - ceph_pagelist_encode_string(pagelist, NULL, 0); + ceph_pagelist_encode_string(pagelist, path, pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v2)); ceph_locks_to_pagelist(flocks, pagelist, num_fcntl_locks, num_flock_locks); @@ -3758,39 +3810,20 @@ encode_again: out_freeflocks: kfree(flocks); } else { - u64 pathbase = 0; - int pathlen = 0; - char *path = NULL; - struct dentry *dentry; - - dentry = d_find_alias(inode); - if (dentry) { - path = ceph_mdsc_build_path(dentry, - &pathlen, &pathbase, 0); - dput(dentry); - if (IS_ERR(path)) { - err = PTR_ERR(path); - goto out_err; - } - rec.v1.pathbase = cpu_to_le64(pathbase); - } - err = ceph_pagelist_reserve(pagelist, sizeof(u64) + sizeof(u32) + pathlen + sizeof(rec.v1)); - if (err) { - goto out_freepath; - } + if (err) + goto out_err; ceph_pagelist_encode_64(pagelist, ceph_ino(inode)); ceph_pagelist_encode_string(pagelist, path, pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v1)); -out_freepath: - ceph_mdsc_free_path(path, pathlen); } out_err: - if (err >= 0) + ceph_mdsc_free_path(path, pathlen); + if (!err) recon_state->nr_caps++; return err; } -- 2.27.0