Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp119168pxb; Tue, 15 Feb 2022 06:43:34 -0800 (PST) X-Google-Smtp-Source: ABdhPJzgLZ2ZnJY6os1tdUVSsvWKafQiomKwCAsKGpwy45HLAUodA/f1yFA/I/UbQd7ADqbo1bdu X-Received: by 2002:a05:6402:702:: with SMTP id w2mr1507012edx.128.1644936213816; Tue, 15 Feb 2022 06:43:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644936213; cv=none; d=google.com; s=arc-20160816; b=rcGUanjPFr01YOldG022pCDwoiAxANPXlem3Bib2BxlpNJmlCQF5XOfhVomaK09yOz S8wc3yu0qFpI3bRFWbktHZF05zLd3vl820gdm1CXN1yF2GIuZ6R2PvHbZNF9Cx0hRiZw Mq0sKklWZA66KHyO3X/pOuUsg2e7L5DyNo67LfO000uiOc0Lmewd4bIzWJxaxNvhzyRL qwN6ybCy1qRBjh4+bqQ0i7QCI9hcs45FA/YeB9ltpsloFvod6I37WdoGucjk2fAl6gWr kcQSx3EB7NKPxIyIDUKdOqXJ26rTzFjIkDYyE7bHRyMoQQasnI96XXOv+wfSnLKNJ7iG KHNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=RV8Ho8pCQoB+zUICiqBodx5YAeRUvW3kx7DwwxzZi0A=; b=DGmsOxrjsIM858bcAfy9ZOEE0BjrN2AcvVWt/3Zk3XnN0dJB77XZBA0XG+nY+W9sFv yzpZ5i3UWizJy60Wja2noLIoHcmDRB8wFjCIgvB6fZo6pt7Tb8Hy3M2H12t7bvmtEL+m RseeSLhmdLbx4W3dhYYfHFrdMnO5PQvuV9AgCDIGw0Uux73gFFRanyrNMjoY+VLaQzPy KVyIBrf4V+7B3UGbOqf9DTRSx22OyV61hDRr25P8DRguMnD24QtOjkUIZZIY/C2wavOu h6j1SmmU7iQjH841rJGbpE9zvp2Ggw4R80wvxAJd7B90Kj+4HCUR3Zhj1TjUJqicevmn jZgg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hz1si2253917ejc.57.2022.02.15.06.43.10; Tue, 15 Feb 2022 06:43:33 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234520AbiBOGff (ORCPT + 99 others); Tue, 15 Feb 2022 01:35:35 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46994 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233731AbiBOGfe (ORCPT ); Tue, 15 Feb 2022 01:35:34 -0500 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E16DB16C9; Mon, 14 Feb 2022 22:35:23 -0800 (PST) Received: from dggpeml500020.china.huawei.com (unknown [172.30.72.53]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4JyWSN6srjz1FCsp; Tue, 15 Feb 2022 14:31:00 +0800 (CST) Received: from huawei.com (10.175.127.227) by dggpeml500020.china.huawei.com (7.185.36.88) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Tue, 15 Feb 2022 14:35:20 +0800 From: Baokun Li To: CC: , , , , , , , , Subject: [PATCH 5.10] fget: clarify and improve __fget_files() implementation Date: Tue, 15 Feb 2022 14:51:07 +0800 Message-ID: <20220215065107.3045023-1-libaokun1@huawei.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.175.127.227] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To dggpeml500020.china.huawei.com (7.185.36.88) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Linus Torvalds commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream. Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached. But I was wrong, as pointed out by the kernel test robot. The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads. That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result it shows the extra "re-check fd" cost as a sore thumb. Happily, the regression is fixable by just writing the code to loook up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too. [ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience of histogram diff algorithm ] Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot Tested-by: Carel Si Cc: Jann Horn Cc: Miklos Szeredi Signed-off-by: Linus Torvalds Signed-off-by: Baokun Li --- fs/file.c | 72 ++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 56 insertions(+), 16 deletions(-) diff --git a/fs/file.c b/fs/file.c index 9d02352fa18c..79a76d04c7c3 100644 --- a/fs/file.c +++ b/fs/file.c @@ -817,28 +817,68 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static struct file *__fget_files(struct files_struct *files, unsigned int fd, - fmode_t mask, unsigned int refs) +static inline struct file *__fget_files_rcu(struct files_struct *files, + unsigned int fd, fmode_t mask, unsigned int refs) { - struct file *file; + for (;;) { + struct file *file; + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + struct file __rcu **fdentry; - rcu_read_lock(); -loop: - file = fcheck_files(files, fd); - if (file) { - /* File object ref couldn't be taken. - * dup2() atomicity guarantee is the reason - * we loop to catch the new file (or NULL pointer) + if (unlikely(fd >= fdt->max_fds)) + return NULL; + + fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); + file = rcu_dereference_raw(*fdentry); + if (unlikely(!file)) + return NULL; + + if (unlikely(file->f_mode & mask)) + return NULL; + + /* + * Ok, we have a file pointer. However, because we do + * this all locklessly under RCU, we may be racing with + * that file being closed. + * + * Such a race can take two forms: + * + * (a) the file ref already went down to zero, + * and get_file_rcu_many() fails. Just try + * again: */ - if (file->f_mode & mask) - file = NULL; - else if (!get_file_rcu_many(file, refs)) - goto loop; - else if (__fcheck_files(files, fd) != file) { + if (unlikely(!get_file_rcu_many(file, refs))) + continue; + + /* + * (b) the file table entry has changed under us. + * Note that we don't need to re-check the 'fdt->fd' + * pointer having changed, because it always goes + * hand-in-hand with 'fdt'. + * + * If so, we need to put our refs and try again. + */ + if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || + unlikely(rcu_dereference_raw(*fdentry) != file)) { fput_many(file, refs); - goto loop; + continue; } + + /* + * Ok, we have a ref to the file, and checked that it + * still exists. + */ + return file; } +} + +static struct file *__fget_files(struct files_struct *files, unsigned int fd, + fmode_t mask, unsigned int refs) +{ + struct file *file; + + rcu_read_lock(); + file = __fget_files_rcu(files, fd, mask, refs); rcu_read_unlock(); return file; -- 2.31.1