Received: by 2002:a05:6358:a55:b0:ec:fcf4:3ecf with SMTP id 21csp3223502rwb; Fri, 20 Jan 2023 13:00:51 -0800 (PST) X-Google-Smtp-Source: AMrXdXv7wvIehjWH+5OYg4UatTnaW4fBKJIM1Wrq/21nAnHabQ9sk4vKFTRPkTojULtiEkv0pttX X-Received: by 2002:a05:6a20:4294:b0:b5:a64d:a181 with SMTP id o20-20020a056a20429400b000b5a64da181mr22606691pzj.19.1674248451740; Fri, 20 Jan 2023 13:00:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674248451; cv=none; d=google.com; s=arc-20160816; b=be4kwcEcVo+6TYZ7bccmOFd6xwmnjlIfE+Y6cTXjNQ3vtIpsD4eT4WR/62z/+rMH0S ita7zSiBw7DA8vdfpRN8mLZVSZNZnYQN0Ar8SucnFn89HNRUdCnt8WGDY91PXnVJFzo1 /Fp3MEO8j7CbpVP2zKAJuxqzo8eQAZFPRVollqSh44VPojpFxIpSthLjS1xCF3WSfC12 PCC7S+lRUIk0InBvOnqy+rFu2+sF75uB72t3kknLD/lBmjwDP+9leibb+wXqy33OAcZX 2Z3sH/WBI21x3ltCPTb25ZLByu5KW8CJ59Q3WY7MaxYVC225WDmvrKJeLYlhxZeTWOm3 exHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=x62E/nl+ONbICiSsmar4KjRWoAfoTSBdrOdf9qKQras=; b=Qy4KuSTSTHdj5ct90d6rTI+4a+0d+Iy6ElfzwI+EUBExUoC2W+C8358Qvrxy/KI7FN m5vHspM2XpM0/NupUu+eY3HsqxklhAGJRFCbkw/zHhEPJPLRze8AyxghP2ceJG3ZX/P+ 4fWtHkYjbxM6mvJIehlHEmY8R4Fi9+ohYNqm250pKWjLlcP6NzzQGnfjXc1/c6uG7SQ4 JIv779NvLR+7q/4d2/QrW8wRb4E9KUf4QBTPNDlSRVXRFSBhxM6hCSvDiCMhziCbsZpx YSEIeG+pNSQ7DkSR4A54uMfVqfC7QS++c7YVrHEsS8wGXfeQUHRp/bpEkOyNxNJdS3RB 688A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@paul-moore.com header.s=google header.b=LlMl9c3N; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=paul-moore.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b17-20020a631b11000000b00476d7732344si10911340pgb.741.2023.01.20.13.00.45; Fri, 20 Jan 2023 13:00:51 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@paul-moore.com header.s=google header.b=LlMl9c3N; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=paul-moore.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229447AbjATUpt (ORCPT + 50 others); Fri, 20 Jan 2023 15:45:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229448AbjATUpr (ORCPT ); Fri, 20 Jan 2023 15:45:47 -0500 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA7549B135 for ; Fri, 20 Jan 2023 12:45:46 -0800 (PST) Received: by mail-pj1-x102c.google.com with SMTP id k10-20020a17090a590a00b0022ba875a1a4so3177962pji.3 for ; Fri, 20 Jan 2023 12:45:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paul-moore.com; s=google; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=x62E/nl+ONbICiSsmar4KjRWoAfoTSBdrOdf9qKQras=; b=LlMl9c3NNGZmycFs5fSTn3CIyY+wd5nQUAS9vIqtV2A1S10Snz9ztD6ABsdBAIJMRX oOqyZ9S+OA9lpbBymTACh/JWRKDGwvqeTHfjPnQKyCRpvfE8/kR08j3fH1/fBHcmzXW+ TameGWN6nJVw0vttTbxWtt9aJX7iXSs3sRQbx3v6Im2BnGLE8bw+ILl7oI/tGr8nE3K9 w0W39/8ApnDEXH2dAz1PRBoVc6nRzQCJet7n6Dt4cJAHBnHCBFVZCcL2Cuuj+41U6mTN 5YH9HJS4Zk1qsDfsPa38q/o9UzgT13gMNA75WHbP8gGNlwakrQWhF/PC1Ia7CBB96akN hkbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=x62E/nl+ONbICiSsmar4KjRWoAfoTSBdrOdf9qKQras=; b=HhOuZ2EvUy3pO1+1vhtjmMzowFbK5B59nBxIlkkm7vgDjZeZa9kPdS1u+H0zb5KkoL AFESXyuvpglMeRVkdSMIaTaX03/ldPSwQ9sB+YXCz7YqNBDLUQqNawe4JilAK+WB3fZe caeolGzmX5xXJfEZigyibUgO5pIdaa9vCJMNHFgIqnwRqxQv+1NxPHlTYXAmMXAadL3Y YmbSTFQc2ktQ7obFer1Eypat2kfT4MHAzcF2h7DDEq0rMGRI4EH9UxyraI7hXvxovRgo r/MRj2R7XH6PnLi5DsbbUaIv3GQExQA6C19KgTw9CINdieNeg0hTPGiE7OTeNVvegRoq qbjQ== X-Gm-Message-State: AFqh2kpgjNpMtv5+t2uVA98TFCvBKgfYf5xxH19C8qwFRZmatXyLvJqZ SDORaJs2MGkLCC9lHFk0o1aOCKSfNF9Im+5SA5jJ X-Received: by 2002:a17:90a:17a1:b0:229:9ffe:135b with SMTP id q30-20020a17090a17a100b002299ffe135bmr1730636pja.72.1674247546190; Fri, 20 Jan 2023 12:45:46 -0800 (PST) MIME-Version: 1.0 References: <20230116212105.1840362-1-mjguzik@gmail.com> <20230116212105.1840362-2-mjguzik@gmail.com> In-Reply-To: <20230116212105.1840362-2-mjguzik@gmail.com> From: Paul Moore Date: Fri, 20 Jan 2023 15:45:34 -0500 Message-ID: Subject: Re: [PATCH v2 2/2] vfs: avoid duplicating creds in faccessat if possible To: Mateusz Guzik Cc: viro@zeniv.linux.org.uk, serge@hallyn.com, torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 16, 2023 at 4:21 PM Mateusz Guzik wrote: > > access(2) remains commonly used, for example on exec: > access("/etc/ld.so.preload", R_OK) > > or when running gcc: strace -c gcc empty.c > % time seconds usecs/call calls errors syscall > ------ ----------- ----------- --------- --------- ---------------- > 0.00 0.000000 0 42 26 access > > It falls down to do_faccessat without the AT_EACCESS flag, which in turn > results in allocation of new creds in order to modify fsuid/fsgid and > caps. This is a very expensive process single-threaded and most notably > multi-threaded, with numerous structures getting refed and unrefed on > imminent new cred destruction. > > Turns out for typical consumers the resulting creds would be identical > and this can be checked upfront, avoiding the hard work. > > An access benchmark plugged into will-it-scale running on Cascade Lake > shows: > test proc before after > access1 1 1310582 2908735 (+121%) # distinct files > access1 24 4716491 63822173 (+1353%) # distinct files > access2 24 2378041 5370335 (+125%) # same file Out of curiosity, do you have any measurements of the impact this patch has on the AT_EACCESS case when the creds do need to be modified? > The above benchmarks are not integrated into will-it-scale, but can be > found in a pull request: > https://github.com/antonblanchard/will-it-scale/pull/36/files > > Signed-off-by: Mateusz Guzik > > v2: > - fix current->cred usage warn reported by the kernel test robot > Link: https://lore.kernel.org/all/202301150709.9EC6UKBT-lkp@intel.com/ > --- > fs/open.c | 32 +++++++++++++++++++++++++++++++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/fs/open.c b/fs/open.c > index 82c1a28b3308..3c068a38044c 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -367,7 +367,37 @@ COMPAT_SYSCALL_DEFINE6(fallocate, int, fd, int, mode, compat_arg_u64_dual(offset > * access() needs to use the real uid/gid, not the effective uid/gid. > * We do this by temporarily clearing all FS-related capabilities and > * switching the fsuid/fsgid around to the real ones. > + * > + * Creating new credentials is expensive, so we try to skip doing it, > + * which we can if the result would match what we already got. > */ > +static bool access_need_override_creds(int flags) > +{ > + const struct cred *cred; > + > + if (flags & AT_EACCESS) > + return false; > + > + cred = current_cred(); > + if (!uid_eq(cred->fsuid, cred->uid) || > + !gid_eq(cred->fsgid, cred->gid)) > + return true; > + > + if (!issecure(SECURE_NO_SETUID_FIXUP)) { > + kuid_t root_uid = make_kuid(cred->user_ns, 0); > + if (!uid_eq(cred->uid, root_uid)) { > + if (!cap_isclear(cred->cap_effective)) > + return true; > + } else { > + if (!cap_isidentical(cred->cap_effective, > + cred->cap_permitted)) > + return true; > + } > + } > + > + return false; > +} I worry a little that with nothing connecting access_need_override_creds() to access_override_creds() there is a bug waiting to happen if/when only one of the functions is updated. Given the limited credential changes in access_override_creds(), I wonder if a better solution would be to see if we could create a light(er)weight prepare_creds()/override_creds() that would avoid some of the prepare_creds() hotspots (I'm assuming that is where most of the time is being spent). It's possible this could help improve the performance of other, similar operations that need to modify task creds for a brief, and synchronous, period of time. Have you done any profiling inside of access_override_creds() to see where the most time is spent? Looking at the code I have some gut feelings on the hotspots, but it would be good to see some proper data before jumping to any conclusions. > static const struct cred *access_override_creds(void) > { > const struct cred *old_cred; > @@ -436,7 +466,7 @@ static long do_faccessat(int dfd, const char __user *filename, int mode, int fla > if (flags & AT_EMPTY_PATH) > lookup_flags |= LOOKUP_EMPTY; > > - if (!(flags & AT_EACCESS)) { > + if (access_need_override_creds(flags)) { > old_cred = access_override_creds(); > if (!old_cred) > return -ENOMEM; > -- > 2.34.1 -- paul-moore.com