From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Davidlohr Bueso,
    Linus Torvalds, Tim Chen, Waiman Long
Subject: [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2
Date: Fri, 5 Apr 2019 15:21:03 -0400
Message-Id: <20190405192115.17416-1-longman@redhat.com>
X-Mailing-List:
linux-kernel@vger.kernel.org

v2:
 - Move the negative reader count checking patch (patch 12->10) forward
   to before the merge owner to count patch as suggested by Linus and
   expand the comment.
 - Change the reader-owned rwsem spinning from count based to time
   based to have better control of the max time allowed.

This is part 2 of a 3-part (0/1/2) series to rearchitect the internal
operation of rwsem.

part 0: merged into tip
part 1: https://lore.kernel.org/lkml/20190404174320.22416-1-longman@redhat.com/

This patchset revamps the current rwsem-xadd implementation to make
it saner and easier to work with. It also implements the following 3
new features:

 1) Waiter lock handoff
 2) Reader optimistic spinning
 3) Store write-lock owner in the atomic count (x86-64 only)

Waiter lock handoff is similar to the mechanism currently in the mutex
code. This ensures that lock starvation won't happen.

Reader optimistic spinning enables readers to acquire the lock more
quickly. So workloads that use a mix of readers and writers should
see an increase in performance as long as the reader critical sections
are short.

Finally, storing the write-lock owner into the count will allow
optimistic spinners to get to the lock holder's task structure more
quickly and eliminate the timing gap where the write lock is acquired
but the owner isn't known yet. This is important for RT tasks where
spinning on a lock with an unknown owner is not allowed.

Because multiple readers can share the same lock, there is a natural
preference for readers when measuring in terms of locking throughput,
as more readers are likely to get into the locking fast path than
writers. With waiter lock handoff, we are not going to starve the
writers.
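
To illustrate the third feature above, here is a minimal user-space
sketch of why folding the owner into the lock word closes the "lock
taken but owner unknown" gap. It is not the code from this series.
The names (sketch_task, owner_sketch_*) and the bit layout are
hypothetical, it assumes task structures are at least 8-byte aligned
so that the low 3 bits of the pointer are free for flags, and it
leaves out the reader count entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a task structure, 8-byte aligned. */
struct sketch_task {
	_Alignas(8) int pid;
};

/* Hypothetical layout: low 3 bits are flags, the rest is the owner pointer. */
#define OWNER_SKETCH_FLAG_MASK		0x7ULL
#define OWNER_SKETCH_WRITER_LOCKED	0x1ULL

/*
 * Take the write lock and publish the owner with a single cmpxchg, so
 * there is never a moment where the lock is held but the owner is unset.
 */
static bool owner_sketch_write_trylock(_Atomic uint64_t *count,
				       struct sketch_task *me)
{
	uint64_t expected = 0;
	uint64_t word = (uint64_t)(uintptr_t)me | OWNER_SKETCH_WRITER_LOCKED;

	return atomic_compare_exchange_strong(count, &expected, word);
}

/* A spinner recovers the owner from the same word it is spinning on. */
static struct sketch_task *owner_sketch_owner(_Atomic uint64_t *count)
{
	uint64_t c = atomic_load(count);

	if (!(c & OWNER_SKETCH_WRITER_LOCKED))
		return NULL;
	return (struct sketch_task *)(uintptr_t)(c & ~OWNER_SKETCH_FLAG_MASK);
}

/* Releasing the lock clears the flag and the owner in one store. */
static void owner_sketch_write_unlock(_Atomic uint64_t *count)
{
	atomic_store(count, 0);
}
```

With a separate owner field, a spinner can observe the lock as taken
before the owner store becomes visible. In the sketch, both are
published by the same atomic operation.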

On an 8-socket 120-core 240-thread IvyBridge-EX system with 120 reader
and writer locking threads, the min/mean/max locking operations done
in a 5-second testing window before the patchset were:

  120 readers, Iterations Min/Mean/Max = 399/400/401
  120 writers, Iterations Min/Mean/Max = 400/33,389/211,359

After the patchset, they became:

  120 readers, Iterations Min/Mean/Max = 584/10,266/26,609
  120 writers, Iterations Min/Mean/Max = 22,080/29,016/38,728

So it was much fairer to readers. With fewer locking threads, readers
were preferred over writers.

Patch 1 implements a new rwsem locking scheme similar to what qrwlock
is currently doing. Write lock is done by atomic_cmpxchg() while read
lock is still being done by atomic_add().

Patch 2 implements lock handoff to prevent lock starvation.

Patch 3 removes the rwsem_wake() wakeup optimization as it doesn't work
with lock handoff.

Patch 4 makes rwsem_spin_on_owner() return the owner state.

Patch 5 disallows RT tasks from spinning on a rwsem with unknown owner.

Patch 6 makes reader wakeup wake almost all the readers in the wait
queue instead of just those in the front.

Patch 7 enables readers to spin on a writer-owned rwsem.

Patch 8 enables a writer to spin on a reader-owned rwsem for at most
25us.

Patch 9 adds some new rwsem owner access helper functions.

Patch 10 handles the case of too many readers by reserving the sign
bit to designate that a reader lock attempt will fail and the locking
reader will be put to sleep. This will ensure that we will not overflow
the reader count.

Patch 11 merges the write-lock owner task pointer into the count.
Only a 64-bit count has enough space to provide a reasonable number of
bits for the reader count. This is for x86-64 only for the time being.

Patch 12 eliminates redundant computation of the merged owner-count.
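
The fast paths described for patches 1 and 10 can be sketched in user
space with C11 atomics. This is only an illustration under stated
assumptions, not the code in the series. The sketch_* names and the
bit positions are hypothetical, and the slow paths (wait queue,
handoff, spinning) are left out, so a failed attempt simply returns
false where the real code would queue the task.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; not the layout in the patches. */
#define SKETCH_WRITER_LOCKED	(1ULL << 0)	/* a writer holds the lock */
#define SKETCH_READER_BIAS	(1ULL << 8)	/* added once per reader */
#define SKETCH_READ_FAIL_BIT	(1ULL << 63)	/* reserved "sign" bit */

struct sketch_rwsem {
	_Atomic uint64_t count;
};

/* Writer fast path: a cmpxchg that only succeeds on a completely free lock. */
static bool sketch_down_write_trylock(struct sketch_rwsem *sem)
{
	uint64_t expected = 0;

	return atomic_compare_exchange_strong(&sem->count, &expected,
					      SKETCH_WRITER_LOCKED);
}

static void sketch_up_write(struct sketch_rwsem *sem)
{
	uint64_t old = atomic_fetch_and(&sem->count, ~SKETCH_WRITER_LOCKED);

	assert(old & SKETCH_WRITER_LOCKED);
	(void)old;
}

/*
 * Reader fast path: add the reader bias unconditionally, then inspect the
 * result. If a writer holds the lock, or the reserved top bit is set
 * (too many readers), undo the add and report failure.
 */
static bool sketch_down_read_trylock(struct sketch_rwsem *sem)
{
	uint64_t old = atomic_fetch_add(&sem->count, SKETCH_READER_BIAS);
	uint64_t now = old + SKETCH_READER_BIAS;

	if (now & (SKETCH_WRITER_LOCKED | SKETCH_READ_FAIL_BIT)) {
		atomic_fetch_sub(&sem->count, SKETCH_READER_BIAS);
		return false;
	}
	return true;
}

static void sketch_up_read(struct sketch_rwsem *sem)
{
	atomic_fetch_sub(&sem->count, SKETCH_READER_BIAS);
}
```

Readers keep the cheaper unconditional add. The reserved top bit means
a reader can detect an over-full count right after its add and back
out before the reader count wraps.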

With a locking microbenchmark running on a 5.1-based kernel, the total
locking rates (in kops/s) on an 8-socket IvyBridge-EX system with equal
numbers of readers and writers (mixed) before and after this patchset
were:

  # of Threads   Before Patch   After Patch
  ------------   ------------   -----------
        2            1,179         9,436
        4            1,505         8,268
        8              721         7,041
       16              575         7,652
       32               70         2,189
       64               39           534

Waiman Long (12):
  locking/rwsem: Implement a new locking scheme
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Remove rwsem_wake() wakeup optimization
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  locking/rwsem: Ensure an RT task will not spin on reader
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Add more rwsem owner access helpers
  locking/rwsem: Guard against making count negative
  locking/rwsem: Merge owner into count on x86-64
  locking/rwsem: Remove redundant computation of writer lock word

 kernel/locking/lock_events_list.h |   4 +
 kernel/locking/rwsem-xadd.c       | 635 +++++++++++++++++++-----------
 kernel/locking/rwsem.c            |   3 +-
 kernel/locking/rwsem.h            | 290 +++++++++++---
 4 files changed, 647 insertions(+), 285 deletions(-)

-- 
2.18.1