Received: by 2002:a05:6359:6284:b0:131:369:b2a3 with SMTP id se4csp4738222rwb; Tue, 8 Aug 2023 13:03:39 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG9UrLjzpZWRVafQDwFkOzWxvitdghPeyIP6Il6OHCLwmyApBca1PPoLqqP8mC9Wv3LOCI3 X-Received: by 2002:a05:6a21:47cb:b0:133:f5c1:57bb with SMTP id as11-20020a056a2147cb00b00133f5c157bbmr504000pzc.20.1691525018425; Tue, 08 Aug 2023 13:03:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691525018; cv=none; d=google.com; s=arc-20160816; b=yRxhi7YE6I4qJoIhtjIMIKgNFj2UT4pZUVUG/fu8bKOimvSZURWxtG7VnlIyRRKI/4 ZId+LuH80hy/1BfHWRYQpmJuJwbdlbUlf2dZy1C7GDbpFIVRwhVAY2TrhFDFLaYYmc8u a3df2Zx0uOS1cuZ2Czwg2HUwX5jJTnuff5uD4u6lidvP+QUnZoaucbMb8ggh0Jt6uq4g 0mjwOhkJpK3u7L/ftF43OuwrHjfYn5zVXYaN920O5WXxaYsBbLuNBCCj+Tez7iBKbSYl rehw9uxBxZQDZUzSaTKPYbIP6vHwXwWYEIkuAVYcqmGcrDTIHVsOME6LjoUH+tcCKLTW cOZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=ocrLZldiKRRjCpTMnAcgLfaZcJrHDSgwSZ6a+ojKCrY=; fh=/hgM5IvRE5AP6VP2ZdfJoDNUYtWW0ZNnvVItVZVwthk=; b=oguB+OePYlmpFkmNZIvzKbCeiFzIBmfGxmzj4dAlMKOZCk70nT2i/zNLZz0tIFLrR1 HapuoztXPU954XG3CZVs7Lad7lwdFuLzEtez+936HGif4gTjijlQX72XDynEmm/Rxho8 bBxoDn77/PxPAktw5bSP5+V/0Gxwav6sFlT/u7OIY91RM/nHIVdNqS841xQx9AQWGTtB kb1ZYOTLB4crhEON1Nn7gp3mjyA315+IBu2NM/KRFw4n00pTvUM4PBca53fVHDZ65OUs bnCQaVFs0AhTcjRL35TDJYwR0Xl5kr2TouSbOASTnXxPh7H/5Fq6SU5KrnHNzH6uGzMO pFkw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=SPUyRYpG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f22-20020a631016000000b005641315d956si7796370pgl.147.2023.08.08.13.03.26; Tue, 08 Aug 2023 13:03:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=SPUyRYpG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236482AbjHHTdd (ORCPT + 99 others); Tue, 8 Aug 2023 15:33:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236300AbjHHTdQ (ORCPT ); Tue, 8 Aug 2023 15:33:16 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88336222CE; Tue, 8 Aug 2023 10:36:03 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 27E1F62788; Tue, 8 Aug 2023 17:36:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BE0D4C433CA; Tue, 8 Aug 2023 17:36:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1691516162; bh=hcG4JmKbckgBfZQK+co26W7c8YuOsWmymvwei5Re2CI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SPUyRYpGDpWAlgSxVECNisRsgXC4PoBQ+eBMJhF7XdSHKz2E9gBDN+Pxg0lP0wU1V 2lCOH7/ZROGbo7lJNKqYS9RJlrtdSi9smaFktr90RigvyiEvGzVVCm5dpND2jX22zd 8NJKXEoMaFZb9ccH0lYCgytrz67B6xoVVUNH/0MfOlPmA9ff8GSUu0D/loc0LbmkgX wNwJDJeQdfQz9XhnL0oGLvCMPPRiQd6RTV+g/iirsuTEYGJPIEE0QWR9N45M7bNEso 7KlV8wPWejHTzbQnemdbRMBwAH5SjYxixq07kmZCGvgWl5oaC4UBaneWhn3o4DZsUy Dg2hwoKZa706w== Date: Tue, 8 Aug 2023 19:35:58 +0200 From: Christian Brauner To: Mateusz Guzik , Linus Torvalds Cc: "Eric W. Biederman" , viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, oleg@redhat.com, Matthew Wilcox Subject: Re: [PATCH] fs: use __fput_sync in close(2) Message-ID: <20230808-divers-verehren-02abcc37fe60@brauner> References: <20230806230627.1394689-1-mjguzik@gmail.com> <87o7jidqlg.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="ooa7r4rkjff7kak6" Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --ooa7r4rkjff7kak6 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline > you guys sort it out ;) You're all driving me nuts. ;) Fixed patch appended and put on vfs.misc for testing... @Linus, you ok with the appended thing? --ooa7r4rkjff7kak6 Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="0001-fs-use-__fput_sync-in-close-2.patch" From 9b0919d12c74b3de6c23c52dc5b3d6ffd24fecee Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 8 Aug 2023 19:26:35 +0200 Subject: [PATCH] fs: use __fput_sync in close(2) close(2) is a special case which guarantees a shallow kernel stack, making delegation to task_work machinery unnecessary. Said delegation is problematic as it involves atomic ops and interrupt masking trips, none of which are cheap on x86-64. Forcing close(2) to do it looks like an oversight in the original work. Moreover presence of CONFIG_RSEQ adds an additional overhead as fput() -> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes the thread returning to userspace land in resume_user_mode_work(), where rseq_handle_notify_resume takes a SMAP round-trip if rseq is enabled for the thread (and it is by default with contemporary glibc). Sample result when benchmarking open1_processes -t 1 from will-it-scale (that's an open + close loop) + tmpfs on /tmp, running on the Sapphire Rapid CPU (ops/s): stock+RSEQ: 1329857 stock-RSEQ: 1421667 (+7%) patched: 1523521 (+14.5% / +7%) (with / without rseq) Patched result is the same regardless of rseq as the codepath is avoided. Signed-off-by: Linus Torvalds Signed-off-by: Christian Brauner --- fs/file_table.c | 5 +---- fs/open.c | 27 ++++++++++++++++++++++++--- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index fc7d677ff5ad..ee21b3da9d08 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -461,11 +461,8 @@ void fput(struct file *file) */ void __fput_sync(struct file *file) { - if (atomic_long_dec_and_test(&file->f_count)) { - struct task_struct *task = current; - BUG_ON(!(task->flags & PF_KTHREAD)); + if (atomic_long_dec_and_test(&file->f_count)) __fput(file); - } } EXPORT_SYMBOL(fput); diff --git a/fs/open.c b/fs/open.c index e6ead0f19964..1f5b3a5f7f18 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1503,7 +1503,7 @@ SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode) * "id" is the POSIX thread ID. We use the * files pointer for this.. */ -int filp_close(struct file *filp, fl_owner_t id) +static int filp_flush(struct file *filp, fl_owner_t id) { int retval = 0; @@ -1520,10 +1520,18 @@ int filp_close(struct file *filp, fl_owner_t id) dnotify_flush(filp, id); locks_remove_posix(filp, id); } - fput(filp); return retval; } +int filp_close(struct file *filp, fl_owner_t id) +{ + int retval; + + retval = filp_flush(filp, id); + fput(filp); + + return retval; +} EXPORT_SYMBOL(filp_close); /* @@ -1533,7 +1541,20 @@ EXPORT_SYMBOL(filp_close); */ SYSCALL_DEFINE1(close, unsigned int, fd) { - int retval = close_fd(fd); + int retval; + struct file *file; + + file = close_fd_get_file(fd); + if (!file) + return -EBADF; + + retval = filp_flush(file, current->files); + + /* + * We're returning to user space. Don't bother + * with any delayed fput() cases. + */ + __fput_sync(file); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || -- 2.34.1 --ooa7r4rkjff7kak6--