From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, oleg@redhat.com, paulmck@kernel.org, tglx@linutronix.de, linux-kernel@vger.kernel.org, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 1/2] kernel/sys: only rely on rcu for getpriority(2)
Date: Mon, 11 May 2020 17:03:52 -0700
Message-Id: <20200512000353.23653-2-dave@stgolabs.net>
In-Reply-To: <20200512000353.23653-1-dave@stgolabs.net>
References: <20200512000353.23653-1-dave@stgolabs.net>

Currently the tasklist_lock is taken for reading mainly in order to
observe the list atomically for the PRIO_PGRP and PRIO_USER cases, as
the actual lookups are already RCU-safe and provide a stable task
pointer.
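For illustration only, here is a user-space model (not kernel code) of what the PRIO_PGRP and PRIO_USER loops compute. Each matching task's nice value is mapped through nice_to_rlimit() (MAX_NICE - nice + 1, so nice -20..19 becomes 40..1), and the largest result is returned, or -ESRCH when no task matched. Because the answer is a maximum over whatever tasks the walk observes, the set of tasks on the list during the walk decides the result. struct task_model, its exiting field (a stand-in for the PF_EXITING check added by the diff below) and group_getpriority() are hypothetical names, so treat this as a sketch under those assumptions.

```c
/*
 * User-space model (not kernel code) of the PRIO_PGRP / PRIO_USER loops
 * in getpriority(2). struct task_model and group_getpriority() are
 * hypothetical; nice_to_rlimit() follows the kernel helper's formula.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_NICE (-20)
#define MAX_NICE 19

struct task_model {
	long nice;    /* nice value in [MIN_NICE, MAX_NICE] */
	bool exiting; /* stand-in for (p->flags & PF_EXITING) */
};

/* Map nice [-20, 19] onto the rlimit-style range [40, 1]. */
static long nice_to_rlimit(long nice)
{
	assert(nice >= MIN_NICE && nice <= MAX_NICE);
	return MAX_NICE - nice + 1;
}

/*
 * Fold over the tasks the walk observes and keep the largest value;
 * -ESRCH is returned if no task was counted.
 */
static long group_getpriority(const struct task_model *tasks, size_t n,
			      bool skip_exiting)
{
	long retval = -ESRCH;

	for (size_t i = 0; i < n; i++) {
		long niceval;

		/* Models the PF_EXITING skip the patch adds. */
		if (skip_exiting && tasks[i].exiting)
			continue;
		niceval = nice_to_rlimit(tasks[i].nice);
		if (niceval > retval)
			retval = niceval;
	}
	return retval;
}
```

With tasks at nice 0 (exiting), 5 and 10, the model returns 20 when exiting tasks are counted and 15 when they are skipped. That difference is the kind of return-value change that depends on whether the highest-priority task is still visible during the walk.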
By removing the lock, we can race with:

(i) fork (insertion): this is benign, as the child's nice value is
inherited and the new task is not yet observable by the user either.
The return semantics do not differ; and

(ii) exit (deletion): this window is small, but if the task that
determines the result (the one with the highest priority, i.e. the
lowest nice value) is deleted and not observed, the return semantics
would change. To further reduce the window, we ignore any tasks that
are PF_EXITING in the 'old' version of the list.

The PRIO_PROCESS case does not need the lock at all, as it only looks
up a single task pointer.

The following raw microbenchmark improvements were seen on a 40-core
box running the stressng-get workload, which pathologically pounds on
various syscalls that get information from the kernel. Higher thread
counts show larger wins, although that level of contention is probably
not something a real workload would see.

                              5.7.0-rc3              5.7.0-rc3
                                                getpriority-v1
Hmean     get-1      3443.65 (   0.00%)     3314.08 *  -3.76%*
Hmean     get-2      7809.99 (   0.00%)     8547.60 *   9.44%*
Hmean     get-4     15498.01 (   0.00%)    17396.85 *  12.25%*
Hmean     get-8     28001.37 (   0.00%)    31137.53 *  11.20%*
Hmean     get-16    31460.88 (   0.00%)    40284.35 *  28.05%*
Hmean     get-32    30036.64 (   0.00%)    40657.88 *  35.36%*
Hmean     get-64    31429.86 (   0.00%)    41021.73 *  30.52%*
Hmean     get-80    31804.13 (   0.00%)    39188.55 *  23.22%*

Signed-off-by: Davidlohr Bueso
---
 kernel/sys.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..0b72184f5e3e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -277,7 +277,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -296,6 +295,9 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		else
 			pgrp = task_pgrp(current);
 		do_each_pid_thread(pgrp, PIDTYPE_PGID, p) {
+			if (unlikely(p->flags & PF_EXITING))
+				continue;
+
 			niceval = nice_to_rlimit(task_nice(p));
 			if (niceval > retval)
 				retval = niceval;
@@ -313,6 +315,9 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		}
 		do_each_thread(g, p) {
 			if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
+				if (unlikely(p->flags & PF_EXITING))
+					continue;
+
 				niceval = nice_to_rlimit(task_nice(p));
 				if (niceval > retval)
 					retval = niceval;
@@ -323,7 +328,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;
-- 
2.26.1
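The starred percentages in the stressng-get table above appear to be the relative change of the patched Hmean against the 5.7.0-rc3 baseline, (patched - baseline) / baseline * 100. The following small sketch assumes that formula and recomputes the column from the table's own throughput figures. struct bench_row, rows, pct_change() and print_deltas() are illustrative names only. Every recomputed value rounds to the printed percentage.

```c
/*
 * Recompute the relative-change column of the stressng-get table,
 * assuming it is (patched - baseline) / baseline * 100.
 * struct bench_row, rows, pct_change() and print_deltas() are
 * illustrative names.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct bench_row {
	int threads;     /* get-N: number of worker threads */
	double baseline; /* Hmean, 5.7.0-rc3 */
	double patched;  /* Hmean, 5.7.0-rc3 getpriority-v1 */
};

/* Throughput figures copied from the table. */
static const struct bench_row rows[] = {
	{ 1, 3443.65, 3314.08 },    { 2, 7809.99, 8547.60 },
	{ 4, 15498.01, 17396.85 },  { 8, 28001.37, 31137.53 },
	{ 16, 31460.88, 40284.35 }, { 32, 30036.64, 40657.88 },
	{ 64, 31429.86, 41021.73 }, { 80, 31804.13, 39188.55 },
};

#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* Relative change in percent; a positive value is a throughput win. */
static double pct_change(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}

/* Print one line per thread count, rounded to two decimals. */
void print_deltas(void)
{
	for (size_t i = 0; i < NROWS; i++)
		printf("get-%-3d %7.2f%%\n", rows[i].threads,
		       pct_change(rows[i].baseline, rows[i].patched));
}
```

print_deltas() prints one line per thread count, and only the single-threaded row comes out negative.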