From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra, Ingo Molnar, Will Deacon
Cc: linux-kernel@vger.kernel.org, Mark Ray, Joe Mario, Scott Norton, Waiman Long
Subject: [PATCH v2] locking/rwsem: Take read lock immediately if queue empty with no writer
Date: Fri, 13 Jul 2018 14:30:53 -0400
Message-Id: <1531506653-5244-1-git-send-email-longman@redhat.com>
It was discovered that a constant stream of readers could keep the count negative most of the time after an initial trigger by a writer, even if no writer was present afterward. As a result, most of the readers had to go through the slowpath, reducing their performance.

To prevent that from happening, an additional check is added to detect the special case where the reader in the critical section is the only one in the wait queue and no writer is present. When that happens, it can just take the lock and return immediately without further action, so other incoming readers won't see a waiter and be forced into the slowpath.

The additional code is in the slowpath and so should not affect rwsem performance in general; in the special case described above, however, it can greatly improve performance.

The issue was found at a customer site running an application that pounded heavily on the pread64 syscall on an XFS filesystem. The application ran on recent 4-socket boxes with many CPUs, and significant spinlock contention was seen in the rwsem_down_read_failed() call. With this patch applied, system CPU usage went from 85% to 57%, and the spinlock contention in the pread64 syscalls was gone.

v2: Add customer testing results and remove wording that may cause confusion.
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3064c50..bf0570e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	waiter.type = RWSEM_WAITING_FOR_READ;
 
 	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
+		/*
+		 * In the unlikely event that the task is the only one in
+		 * the wait queue and a writer isn't present, it can have
+		 * the lock and return immediately without going through
+		 * the remaining slowpath code.
+		 */
+		if (unlikely(atomic_long_read(&sem->count) >= 0)) {
+			raw_spin_unlock_irq(&sem->wait_lock);
+			return sem;
+		}
 		adjustment += RWSEM_WAITING_BIAS;
+	}
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-- 
1.8.3.1