From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra, Ingo Molnar, Will Deacon
Cc: linux-kernel@vger.kernel.org, Joe Mario, Davidlohr Bueso, Waiman Long
Subject: [PATCH v3] locking/rwsem: Exit read lock slowpath if queue empty & no writer
Date: Tue, 24 Jul 2018 15:10:25 -0400
Message-Id: <1532459425-19204-1-git-send-email-longman@redhat.com>
It was discovered that a constant stream of readers with occasional
writers pounding on a rwsem may cause many of the readers to enter the
slowpath unnecessarily, thus increasing latency and lowering performance.

In the current code, a reader entering the slowpath critical section
will unconditionally set the WAITING_BIAS, if not set yet, and clear
its active count even if no one is in the wait queue and no writer is
present. This causes some incoming readers to observe the presence of
waiters in the wait queue and hence have to go into the slowpath
themselves.

With a sufficient number of readers and a relatively short lock hold
time, the WAITING_BIAS may be repeatedly turned on and off, and a
substantial portion of the readers will go into the slowpath, sustaining
a rather long queue on the wait queue spinlock and repeated WAITING_BIAS
on/off cycles until the logjam is broken opportunistically.

To prevent this situation from happening, an additional check is added
to detect the special case that the reader in the critical section is
the only one in the wait queue and no writer is present. When that
happens, it can just exit the slowpath and return immediately, as its
active count has already been set in the lock count. Other incoming
readers won't observe the presence of waiters and so will not be forced
into the slowpath.

The issue was found at a customer site where an application pounded
heavily on the pread64 syscalls on an XFS filesystem. The application
was run on recent 4-socket boxes with a lot of CPUs. They saw
significant spinlock contention in the rwsem_down_read_failed() call.
With this patch applied, the system CPU usage went down from 85% to 57%,
and the spinlock contention in the pread64 syscalls was gone.

v3: Revise the commit log and comment again.
v2: Add customer testing results and remove wording that may cause
    confusion.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3064c50..01fcb80 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	waiter.type = RWSEM_WAITING_FOR_READ;
 
 	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		/*
+		 * In case the wait queue is empty and the lock isn't owned
+		 * by a writer, this reader can exit the slowpath and return
+		 * immediately as its RWSEM_ACTIVE_READ_BIAS has already
+		 * been set in the count.
+		 */
+		if (atomic_long_read(&sem->count) >= 0) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return sem;
+		}
 		adjustment += RWSEM_WAITING_BIAS;
+	}
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-- 
1.8.3.1