From: Vinicius Costa Gomes
To: brauner@kernel.org, amir73il@gmail.com, hu1.chen@intel.com
Cc: miklos@szeredi.hu, malini.bhandaru@intel.com, tim.c.chen@intel.com,
    mikko.ylinen@intel.com, lizhen.you@intel.com,
    linux-unionfs@vger.kernel.org, linux-kernel@vger.kernel.org,
    Vinicius Costa Gomes
Subject: [RFC v2 0/4] overlayfs: Optimize override/revert creds
Date: Thu, 25 Jan 2024 15:57:19 -0800
Message-ID: <20240125235723.39507-1-vinicius.gomes@intel.com>

Hi,

It was noticed that some workloads suffer from contention on
incrementing/decrementing the ->usage counter in their credentials.
Those refcount operations are associated with overriding/reverting the
current task credentials. (The linked thread adds more context.)

In some specialized cases, overlayfs being one of them, the credentials
in question have a longer lifetime than the override/revert "critical
section". In the overlayfs case, the credentials are created when the
fs is mounted and destroyed when it's unmounted. For such long-lived
credentials, the usage counter doesn't need to be
incremented/decremented.

Add a lighter version of credentials override/revert to be used in
these specialized cases.

To make sure that the override/revert calls are paired, add a cleanup
guard macro.
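As an illustration of the idea only (this is not code from the series),
the difference between the regular and the light variants can be
sketched in userspace C. Everything here is a toy stand-in: the
'struct cred' with a single 'usage' field, the 'current_cred_ptr'
global standing in for the task's current credentials, the
'guard_cred()' macro and the 'usage_during_light_override()' helper are
hypothetical names chosen for the sketch, and the guard relies on the
GCC/Clang 'cleanup' attribute rather than the kernel's DEFINE_GUARD():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct cred: only the reference count. */
struct cred {
	atomic_long usage;
};

/* Toy stand-in for the current task's credentials pointer. */
static const struct cred *current_cred_ptr;

/* Regular override: take a reference on the new credentials. */
static const struct cred *override_creds(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	/* The counter is written through a const pointer, hence the cast. */
	atomic_fetch_add(&((struct cred *)new)->usage, 1);
	current_cred_ptr = new;
	return old;
}

/* Regular revert: restore the old credentials and drop the reference. */
static void revert_creds(const struct cred *old)
{
	const struct cred *override = current_cred_ptr;

	current_cred_ptr = old;
	atomic_fetch_sub(&((struct cred *)override)->usage, 1);
}

/*
 * Light override: only swap the pointer. This is safe only when the
 * caller guarantees that 'new' outlives the override/revert section.
 */
static const struct cred *override_creds_light(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new;
	return old;
}

/* Light revert: only restore the pointer, no counter is touched. */
static void revert_creds_light(const struct cred *old)
{
	current_cred_ptr = old;
}

/* Called automatically when the guarded variable goes out of scope. */
static void cred_guard_exit(const struct cred **old)
{
	revert_creds_light(*old);
}

/* Hypothetical scope guard: override now, revert at scope exit. */
#define guard_cred(new)							\
	const struct cred *saved_cred					\
	__attribute__((cleanup(cred_guard_exit), unused)) =		\
		override_creds_light(new)

/* Returns the usage count observed while the light override is active. */
static long usage_during_light_override(struct cred *mounter)
{
	guard_cred(mounter);

	assert(current_cred_ptr == mounter);
	/* cred_guard_exit() runs on the way out of this function. */
	return atomic_load(&mounter->usage);
}
```

With credentials whose usage count starts at 1, the helper still
observes a count of 1 while the override is active, and the previous
credentials are back in place once it returns. The regular pair raises
the count to 2 for the duration of the override, which is the shared
write that shows up as contention.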
This approach was suggested here:

https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/

With a small number of tweaks:

 - Used inline functions instead of macros;

 - A small change to store the credentials into the passed argument,
   the guard is now defined as (note the added '_T ='):

      DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
                   revert_creds_light(_T));

 - Allow "const" arguments to be used with these kinds of guards;

Some comments:

 - If patch 1/4 is not a good idea (adding the cast), the alternative
   I can see is using some kind of container for the credentials;

 - The only user for the backing file ops is overlayfs, so these
   changes make sense, but may not make sense in the most general
   case;

For the numbers, some from 'perf c2c', before this series:
(edited to fit)

#
#        ----- HITM -----                              Shared
#   Num  RmtHitm  LclHitm  Symbol                      Object            Source:Line         Node
# .....  .......  .......  ..........................  ................  ..................  ....
#
  -----------------------
      0      412     1028
  -----------------------
          41.50%   42.22%  [k] revert_creds            [kernel.vmlinux]  atomic64_64.h:39    0 1
          15.05%   10.60%  [k] override_creds          [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.73%    0.58%  [k] init_file               [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.24%    0.10%  [k] revert_creds            [kernel.vmlinux]  cred.h:266          0 1
          32.28%   37.16%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           9.47%    8.75%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           0.49%    0.58%  [k] inode_owner_or_capable  [kernel.vmlinux]  mnt_idmapping.h:81  0 1
           0.24%    0.00%  [k] generic_permission      [kernel.vmlinux]  namei.c:354         0

  -----------------------
      1       50      103
  -----------------------
         100.00%  100.00%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1

  -----------------------
      2       50       98
  -----------------------
          96.00%   96.94%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
           2.00%    1.02%  [k] update_load_avg         [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.00%    2.04%  [k] update_load_avg         [kernel.vmlinux]  fair.c:4118         0
           2.00%    0.00%  [k] update_cfs_group        [kernel.vmlinux]  fair.c:3932         0 1

after this series:

#
#        ----- HITM -----                              Shared
#   Num  RmtHitm  LclHitm  Symbol                      Object            Source:Line         Node
# .....  .......  .......  ..........................  ................  ..................  ....
#
  -----------------------
      0       54       88
  -----------------------
         100.00%  100.00%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1

  -----------------------
      1       48       83
  -----------------------
          97.92%   97.59%  [k] update_cfs_group        [kernel.vmlinux]  atomic64_64.h:15    0 1
           2.08%    1.20%  [k] update_load_avg         [kernel.vmlinux]  atomic64_64.h:25    0 1
           0.00%    1.20%  [k] update_load_avg         [kernel.vmlinux]  fair.c:4118         0 1

  -----------------------
      2       28       44
  -----------------------
          85.71%   79.55%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1
          14.29%   20.45%  [k] generic_permission      [kernel.vmlinux]  mnt_idmapping.h:81  0 1

The contention is practically gone.

Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@intel.com/

Vinicius Costa Gomes (4):
  cleanup: Fix discarded const warning when defining guards
  cred: Add a light version of override/revert_creds()
  overlayfs: Optimize credentials usage
  fs: Optimize credentials reference count for backing file ops

 fs/backing-file.c       | 124 +++++++++++++++++++---------------------
 fs/overlayfs/copy_up.c  |   4 +-
 fs/overlayfs/dir.c      |  22 +++----
 fs/overlayfs/file.c     |  70 ++++++++++++-----------
 fs/overlayfs/inode.c    |  60 ++++++++++---------
 fs/overlayfs/namei.c    |  21 ++++---
 fs/overlayfs/readdir.c  |  18 +++---
 fs/overlayfs/util.c     |  23 ++++----
 fs/overlayfs/xattrs.c   |  34 +++++------
 include/linux/cleanup.h |   2 +-
 include/linux/cred.h    |  21 +++++++
 kernel/cred.c           |   6 +-
 12 files changed, 215 insertions(+), 190 deletions(-)

-- 
2.43.0