From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, "Eric W.
 Biederman", "Huang, Ying", Philip Li, Andi Kleen, Jiri Olsa,
 Peter Zijlstra, Linus Torvalds, Sasha Levin, Feng Tang
Subject: [PATCH 4.14 91/99] signal: avoid double atomic counter increments for user accounting
Date: Thu, 19 Mar 2020 14:04:09 +0100
Message-Id: <20200319124006.961377061@linuxfoundation.org>
In-Reply-To: <20200319123941.630731708@linuxfoundation.org>
References: <20200319123941.630731708@linuxfoundation.org>

From: Linus Torvalds

[ Upstream commit fda31c50292a5062332fa0343c084bd9f46604d9 ]

When queueing a signal, we increment both the user's count of pending
signals (for RLIMIT_SIGPENDING tracking) and the refcount of the user
struct itself (because we keep a reference to the user in the signal
structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued. That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, it seems for some reason to be much worse on that machine
when the contention happens in the upper 32-byte half of the cacheline.
As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which
  half of the cacheline was hot was 5%, so with the reduced contention
  the odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang
Cc: Eric W. Biederman
Cc: Huang, Ying
Cc: Philip Li
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Peter Zijlstra
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 kernel/signal.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8fee1f2eba2f9..c066168f88541 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -379,27 +379,32 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimi
 {
 	struct sigqueue *q = NULL;
 	struct user_struct *user;
+	int sigpending;
 
 	/*
 	 * Protect access to @t credentials. This can go away when all
 	 * callers hold rcu read lock.
+	 *
+	 * NOTE! A pending signal will hold on to the user refcount,
+	 * and we get/put the refcount only when the sigpending count
+	 * changes from/to zero.
 	 */
 	rcu_read_lock();
-	user = get_uid(__task_cred(t)->user);
-	atomic_inc(&user->sigpending);
+	user = __task_cred(t)->user;
+	sigpending = atomic_inc_return(&user->sigpending);
+	if (sigpending == 1)
+		get_uid(user);
 	rcu_read_unlock();
 
-	if (override_rlimit ||
-	    atomic_read(&user->sigpending) <=
-			task_rlimit(t, RLIMIT_SIGPENDING)) {
+	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
 		q = kmem_cache_alloc(sigqueue_cachep, flags);
 	} else {
 		print_dropped_signal(sig);
 	}
 
 	if (unlikely(q == NULL)) {
-		atomic_dec(&user->sigpending);
-		free_uid(user);
+		if (atomic_dec_and_test(&user->sigpending))
+			free_uid(user);
 	} else {
 		INIT_LIST_HEAD(&q->list);
 		q->flags = 0;
@@ -413,8 +418,8 @@ static void __sigqueue_free(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
-	atomic_dec(&q->user->sigpending);
-	free_uid(q->user);
+	if (atomic_dec_and_test(&q->user->sigpending))
+		free_uid(q->user);
 	kmem_cache_free(sigqueue_cachep, q);
 }
 
-- 
2.20.1
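
For readers who want to try the idea outside the kernel, the "pin on the first pending signal, unpin on the last" scheme from the patch can be modelled in user space with C11 atomics. This is only a rough sketch, not kernel code. struct demo_user, demo_user_init(), demo_queue() and demo_dequeue() are hypothetical names invented for the illustration. The sketch also assumes the caller already holds its own reference to the object while queueing, much as the patch reads the user from the task's credentials under rcu_read_lock().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct user_struct; not the kernel definition. */
struct demo_user {
	atomic_int refcount;	/* lifetime references to the object */
	atomic_int sigpending;	/* number of currently queued signals */
};

/* Start with one reference (the caller's own) and nothing pending. */
static void demo_user_init(struct demo_user *u)
{
	atomic_init(&u->refcount, 1);
	atomic_init(&u->sigpending, 0);
}

/*
 * Queue side: always bump the pending count, but take a reference only
 * on the 0 -> 1 transition. Returns the new pending count, which the
 * caller could compare against a limit without a second atomic read.
 */
static int demo_queue(struct demo_user *u)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
	int pending = atomic_fetch_add(&u->sigpending, 1) + 1;

	if (pending == 1)
		atomic_fetch_add(&u->refcount, 1);
	return pending;
}

/*
 * Dequeue side: always drop the pending count, but release the reference
 * only on the 1 -> 0 transition (old value 1 means the new value is 0).
 */
static void demo_dequeue(struct demo_user *u)
{
	if (atomic_fetch_sub(&u->sigpending, 1) == 1)
		atomic_fetch_sub(&u->refcount, 1);
}
```

With several signals pending, each further queue or dequeue touches only the sigpending counter, which is the effect the commit message describes. The refcount changes only at the two edges.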